
Jul 15, 2020

Archive · 4 min read

The data engineering guide

A learning roadmap I wish I'd had when I started. Programming & data structures, SQL and database internals, big-data systems, data stores, and the DevOps surface area you can't avoid. Five years later most of it still holds.

  • data-engineering
  • learning
  • notes

A 2020 roadmap for someone breaking into data engineering: what to learn, in roughly what order, and which resources I found worth the time. The five sections are programming & data structures, SQL and database internals, big-data systems (Hadoop, Spark, Kafka), data stores (warehouses vs lakes, NoSQL, Delta), and DevOps fundamentals (Linux, CI/CD, containers).

Five years later, the parts I'd update are the streaming chapter (Kafka is still good, but I'd add Iceberg and a modern lakehouse) and the storage chapter (DuckDB and Iceberg deserve top billing). The skeleton holds up.

Original on Medium.