DE Camp Week 2 Recap

Week 2 at DE Camp was all about SQL. We started the week by wrapping up our discussion of config-driven pipelines and logging, and how to modularize our code in this regard using yml files for a consistent structure.

Then, it was all about SQL. Though I’ve worked with SQL regularly throughout my career, it’s been at a pretty basic level - schema migrations, the basics of indexing, and joins, views and function. I haven’t had much of a chance to use SQL for analytics purposes, so the time spent this week on CTEs and window functions was a good refresher on more advanced SQL concepts. I felt the gaps in my knowledge as we started discussing window functions, so I spent some additional time outside of class going through Learn SQL’s window functions path. I learned a lot more about how over() works for aggregating data without using a group by clause, and the use of rank() and ntile to order and group rows in relation to the rows around them (i.e, the window).

We also spent some time reviewing the major concepts of ELT, and its (dis)advantages compared to ETL. Unlike wtih ETL, where we’re transforming the data before loading it into a target data store, with ELT we’re more concerned with preserving the raw data in its original form, so the data is typically loaded directly into a store after loading it, after which transformations can be executed to get the data into the correct shape for analytics or other use cases. As part of the extract process, we reviewed the use of Jinja to help with abstraction and reusability of code. Nothing too new here; I’ve used Jinja a fair amount in the past with a few different tech stacks, including Django, Flask, and HubSpot templates.

Lastly, we practiced incrementally loading data from one data source to another, using our configuration options and Jinja to set the appropriate config values.

Overall, the bulk of the week was spent on using SQL to move data around; it was a great review of SQL as a whole, and how we can use software development best practices (as well as tools like Jinja) to create robust data loading processes.

Written by

Leo Rubiano

Reader, programmer, traveler. Experienced back-end dev proficient with Python, Go, Elixir, Ecto, and Postgres.