Welcome to the second issue of the Data Engineering newsletter. In this issue, we'll start with a detailed look at the role of a Data Engineer, then move on to setting up Airflow on a Raspberry Pi, and then look at running Spark SQL on encrypted data.
In this article, Einav Baraban highlights the different data roles in the industry, then explains the different types of Data Engineers and the key skills required for the role.
Apache Airflow is one of the most widely used workflow orchestration tools, and this tutorial by Pedro Madruga helps you get started with Airflow running on a Raspberry Pi.
This tutorial by Octavian Sima introduces Opaque SQL, an open source platform for securely running Spark SQL on encrypted data. The objective of Opaque SQL is to enable analytics on sensitive data using Spark dataframes.
Apache Flink, a distributed stream processing engine, introduced the much-anticipated auto-scaling feature in its latest release, 1.13. In this talk, Robert Metzger explains different deployment scenarios for streaming applications, and how auto-scaling can reduce cost and improve operations.