Early in my time in the Data Engineering Zoomcamp, something super useful has already popped up: Docker.

I’ve worked with Docker on a regular basis for several years at this point. Spinning up a new container to launch an app locally, putting together a docker-compose file, using volumes to persist data: these are all things I’m comfortable with. But at a lot of modern orgs, containers are abstracted away by DevOps teams, and I’d forgotten just how cool Docker’s underlying technology is.

More practically, my time in the zoomcamp has let me refresh my memory on some particularly useful things we can do with Docker, and it has also taught me something new.

For starters, as part of the week 1 curriculum, we had to set up our environments correctly in anticipation of future projects, and this involved loading a dataset. There were a handful of lessons and steps involved, but by following along I had re-learned how to spin up a Docker-based instance of Postgres, as well as a Docker-based instance of pgAdmin to connect to the database served by that Postgres container. Very useful stuff!

I’d done that before on another personal computer, and I’m pretty sure I had connected the two containers via a Docker network. That’s the first route the zoomcamp suggests, but I kept running into issues with pgAdmin failing to reach the database at localhost. So I went with the second route: a docker-compose file that defines both services and links them that way.
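The localhost trouble is a common one: inside a container, `localhost` refers to that container itself, not to the host machine or to a sibling container. Compose places the services from one file on a shared default network, where each service is reachable by its service name. Here is a minimal Python sketch of that rule, assuming a hypothetical Postgres service named `pgdatabase`, a `"5432:5432"`-style port mapping, and placeholder credentials:

```python
def postgres_url(
    from_container: bool,
    *,
    service: str = "pgdatabase",   # hypothetical Compose service name
    container_port: int = 5432,    # port Postgres listens on inside its container
    published_port: int = 5432,    # host port from a "5432:5432"-style mapping
    user: str = "root",            # placeholder credentials, not real defaults
    password: str = "root",
    db: str = "postgres",
) -> str:
    """Build a Postgres connection URL for a client inside or outside Compose.

    Inside a container, "localhost" means that container itself, so a client
    on the same Compose network uses the service name and the container port.
    From the host machine, the published port is reached via localhost.
    """
    if from_container:
        host, port = service, container_port
    else:
        host, port = "localhost", published_port
    return f"postgresql://{user}:{password}@{host}:{port}/{db}"


if __name__ == "__main__":
    print(postgres_url(from_container=True))   # e.g. from pgAdmin's container
    print(postgres_url(from_container=False))  # e.g. from a script on the host
```

In pgAdmin's connection dialog, this means entering the service name rather than `localhost` as the host, since pgAdmin runs in its own container.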

I also learned how to use Docker to run an ingestion script, something I’m sure will come up often in the world of data engineering. I ended up with a simple Dockerfile that references a local Python script, which in turn loads the CSV data found in the directory, uses pandas to generate the appropriate SQL for the data in the CSV, and loads it into the Postgres database served by the Docker container.
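The core of such a script can be sketched in a few lines of pandas. This is an illustrative sketch rather than the zoomcamp's exact script: the function name `ingest_csv`, the chunk size, and the table name are assumptions. It reads the CSV in chunks, prints the `CREATE TABLE` statement pandas infers from the column dtypes, creates an empty table, and appends each chunk:

```python
import pandas as pd


def ingest_csv(csv_path: str, table_name: str, con, chunksize: int = 100_000) -> int:
    """Load a CSV into a SQL table chunk by chunk; return the number of rows.

    `con` may be a SQLAlchemy engine (e.g. for Postgres) or, for quick local
    testing, a sqlite3 connection; pandas' to_sql accepts both.
    """
    total = 0
    first = True
    # Reading in chunks keeps memory bounded for large CSV files.
    with pd.read_csv(csv_path, chunksize=chunksize) as reader:
        for chunk in reader:
            if first:
                # Show the DDL pandas generates from the inferred dtypes.
                print(pd.io.sql.get_schema(chunk, table_name, con=con))
                # Create (or replace) an empty table with those columns.
                chunk.head(0).to_sql(table_name, con, if_exists="replace", index=False)
                first = False
            # Append this chunk's rows to the table.
            chunk.to_sql(table_name, con, if_exists="append", index=False)
            total += len(chunk)
    return total
```

Against the Dockerized Postgres, `con` would be a SQLAlchemy engine such as `create_engine("postgresql://root:root@localhost:5432/postgres")` (placeholder credentials; this needs SQLAlchemy plus a Postgres driver like psycopg2). When the script itself runs in a container on the same Compose network, the host part becomes the Postgres service name instead of `localhost`.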

Looking back, all of it seems logical, reproducible, and like an obvious solution. But it was pretty cool to learn how to create containers for specific purposes like these, as opposed to the more traditional use I’ve run into in my career of serving up a web app. And now that I’ve got my pgAdmin-Postgres connection running on Docker, I don’t think I’m going back to the locally installed versions of these apps.

Written by

Leo Rubiano

Reader, programmer, traveler. Experienced back-end dev proficient with Python, Go, Elixir, Ecto, and Postgres.