Jul 15, 2020
Archive · 4 min read
The data engineering guide
A learning roadmap I wish I'd had when I started. Programming & data structures, SQL and database internals, big-data systems, data stores, and the DevOps surface area you can't avoid. Five years later most of it still holds.
- data-engineering
- learning
- notes
A 2020 roadmap aimed at someone breaking into data engineering. What to learn, in roughly what order, and which resources I found worth the time. The five sections are programming & data structures, SQL and database internals, big-data systems (Hadoop, Spark, Kafka), data stores (warehouses vs lakes, NoSQL, Delta), and DevOps fundamentals (Linux, CI/CD, containers).
Five years later, the chapters I'd revise are streaming (Kafka is still worth learning, but I'd add Iceberg and a modern lakehouse) and storage (DuckDB and Iceberg deserve top billing). The skeleton holds up.
Original on Medium.