Today I discovered…
dbt
A CLI for writing SQL data-transformation queries that you can version, test, and document before safely deploying them to production
💖 What I like about dbt:
Version control - I could use the same git workflow for data models that I use for my code
Testing - Just like code tests, I could write tests for SQL queries, which boosts my confidence in the changes made to a model
Reusable models - I could reference one SQL model to build another one on top of it
Programmatically building queries - I could use code (dbt uses Jinja templating) to generate SQL models, which opens up many more possibilities than plain SQL
No additional database/infrastructure - All queries run directly in the target database, and the generated views/tables are saved in the same database under a different schema.
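To make the testing and "runs in the target database" points concrete, here is a minimal Python sketch. It uses an in-memory SQLite database as a stand-in for the target database (an assumption; my own setup was Postgres) and mirrors the idea behind dbt's built-in `not_null` and `unique` tests: a test is just a SELECT that returns failing rows, and it passes when that query returns nothing. The helper names `build_model`, `not_null_test`, `unique_test` and `run_tests`, and the `raw_orders`/`stg_orders` tables, are hypothetical; this is not dbt's code or API.

```python
import sqlite3


def build_model(conn, name, select_sql):
    """Materialize a model as a view in the same database (dbt's default materialization is also a view)."""
    # Identifiers are interpolated directly; acceptable only for this sketch with trusted, hard-coded names.
    conn.execute(f"DROP VIEW IF EXISTS {name}")
    conn.execute(f"CREATE VIEW {name} AS {select_sql}")


def not_null_test(conn, model, column):
    """Return the failing rows: every row where `column` is NULL."""
    return conn.execute(f"SELECT * FROM {model} WHERE {column} IS NULL").fetchall()


def unique_test(conn, model, column):
    """Return the failing rows: every non-NULL value of `column` that appears more than once."""
    return conn.execute(
        f"SELECT {column}, COUNT(*) FROM {model} "
        f"WHERE {column} IS NOT NULL GROUP BY {column} HAVING COUNT(*) > 1"
    ).fetchall()


def run_tests(conn, model, column):
    """A test passes when its query returns zero failing rows."""
    return {
        "not_null": len(not_null_test(conn, model, column)) == 0,
        "unique": len(unique_test(conn, model, column)) == 0,
    }


conn = sqlite3.connect(":memory:")  # stand-in for the target database
conn.execute("CREATE TABLE raw_orders (id INTEGER, customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?)",
    [(1, "alice", 10.0), (2, "bob", 5.5), (2, "bob", 5.5), (3, None, 7.0)],
)

# The hypothetical "model": a SELECT that deduplicates raw rows, saved as a view in the same database.
build_model(conn, "stg_orders", "SELECT DISTINCT id AS order_id, customer, amount FROM raw_orders")

print(run_tests(conn, "stg_orders", "order_id"))  # {'not_null': True, 'unique': True}
print(run_tests(conn, "stg_orders", "customer"))  # {'not_null': False, 'unique': True}
```

The point of the design is that nothing leaves the database: both the model and its checks are plain SQL executed where the data already lives, which is why no extra infrastructure is needed.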
👎 What I dislike about dbt:
Limited workflow scheduling features - I needed to trigger dbt based on certain conditions/events and expected dbt to help with that. It is debatable whether dbt should take a more active role here. In hindsight, leaving this to other tools looks like the right decision by dbt.
No NoSQL support - In my use case, I had unstructured data. I had to wrap my head around Postgres and dbt to get something done that would have been much easier if I hadn't needed to think about whether a column is text or a number. For dbt, I see 10x more value in NoSQL support than in SQL.
Dependency hell - A dbt project can run into dependency hell when a huge number of SQL models depend on each other in all directions, which sometimes leads to surprises, especially with inter-project and cross-team collaboration. I see room for improvement in how dependencies are managed, and in better tools that help an average user avoid these problems.
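dbt builds a dependency graph (DAG) from the `ref()` calls between models, and that graph decides the run order. The sketch below uses Python's standard `graphlib` module to show the same idea on a hypothetical project of four models: computing a valid build order, finding everything downstream of a model (the models a change could surprise), and detecting a cycle. The model names and helper functions are made up for illustration; this is not how dbt implements it.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical project: each model maps to the models it references via ref().
models = {
    "stg_orders": set(),
    "stg_customers": set(),
    "orders_enriched": {"stg_orders", "stg_customers"},
    "revenue_by_customer": {"orders_enriched"},
}


def build_order(deps):
    """Return an order in which every model comes after all of its upstream models."""
    return list(TopologicalSorter(deps).static_order())


def downstream_of(deps, target):
    """Return every model that directly or transitively depends on `target`."""
    affected, frontier = set(), [target]
    while frontier:
        current = frontier.pop()
        for model, parents in deps.items():
            if current in parents and model not in affected:
                affected.add(model)
                frontier.append(model)
    return affected


def has_cycle(deps):
    """True if the refs form a loop, which makes a valid build order impossible."""
    try:
        TopologicalSorter(deps).prepare()
        return False
    except CycleError:
        return True


print(build_order(models))
print(sorted(downstream_of(models, "stg_orders")))  # ['orders_enriched', 'revenue_by_customer']

# Someone on another team makes stg_orders read from a downstream model.
tangled = {**models, "stg_orders": {"revenue_by_customer"}}
print(has_cycle(models), has_cycle(tangled))  # False True
```

In a real project, dbt's node selection syntax (for example `dbt run --select stg_orders+`) covers the downstream case, and dbt refuses to run a project whose graph contains a cycle. The harder problem in larger, multi-team setups is seeing these edges before they surprise you.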
⭐ Ratings and metrics
Based on my experience, I would rate this project as follows:
Production readiness: 9/10
Docs rating: 7/10
Time to POC (proof of concept): less than 1 day
Author: Drew Banin @drewbanin and dbt Labs team
Demo | Source
🛡 License: Apache-2.0
Tech Stack: Python
🗣️ What people say about dbt around the web
What is wrong with dbt on Reddit
Why dbt is so popular on Reddit
You can also join the discussion by responding to this post on Substack
If you discovered an interesting Open-Source project and want me to feature it in the newsletter, get in touch via the form above. To support this newsletter and Open-Source authors, follow #OpenSourceDiscovery on LinkedIn and Twitter