Airflow on Raspberry Pi, Spark SQL and Flink Streaming

Welcome to the second issue of the data engineering newsletter. In this issue, we start with a detailed look at the role of a data engineer, then set up Airflow on a Raspberry Pi, run Spark SQL on encrypted data, and finish with autoscaling Apache Flink applications.

Who are you Data Engineer?


In this article, Einav Baraban outlines the different data roles in the industry, then explains the different types of data engineers and the key skills the role requires.

Install Airflow 2 on a Raspberry Pi


Apache Airflow is one of the most widely used workflow orchestration tools, and this tutorial by Pedro Madruga helps you get started with Airflow running on a Raspberry Pi.

How to run Spark SQL on Encrypted Data


This tutorial by Octavian Sima introduces Opaque SQL, an open-source platform for securely running Spark SQL on encrypted data. The objective of Opaque SQL is to enable analytics on sensitive data using Spark DataFrames.

Autoscaling Apache Flink Applications

Apache Flink, a distributed stream processing engine, introduced the much-anticipated autoscaling feature in its latest release, 1.13. In this talk, Robert Metzger explains different deployment scenarios for streaming applications and how autoscaling can reduce cost and improve operations.