2 min read

What is DBT?

What is DBT? Data Build Tool makes data engineering activities straightforward and accessible.
3:41

DBT (data build tool) makes data engineering activities straightforward and accessible. It is an open-source tool that can handle data modelling, query orchestration, and documentation. With it, a data analyst need only write simple select statements and dbt handles the rest – transforming those statements into tables and views.

 

 

DBT focusses on the T in any given ETL (extract, transform, load) process. The transformations themselves are written as SQL code, meaning SQL coding skills and knowledge of data modelling are the most important skills for using dbt. This enables a wider range of data professionals to bridge the gap between ad-hoc analysis and durable, robust data engineering work – something that is sorely needed when data engineers are in short supply. Because the code is modularised, it allows a complex chain of queries to be debugged easily, saving time and making project onboarding easier for new team members.

The tool supports a wide variety of table materialisations, known as dbt models, including table, view, incremental, and ephemeral. This allows table maintenance to be handled with minimal repetition of code and simplifies complex table management controls into flexible and accessible SQL wrappers. At the same time, dbt uses references to other models to determine the order of execution in any given transformational process, simultaneously documenting this lineage to help analysts understand the flow of data.

 

How does QuantSpark use dbt?

Our teams have used dbt across several recent projects, creating both novel enterprise data models and separate data cubes:

 

For a motorbike brand:

Here dbt operated on top of Amazon Redshift, handling the transformation of data from several databases and raw data sources. It ingested the data, cleaned and standardised it, whilst documenting the source systems so future analyst teams could understand the process.
The data was then modelled and transformed, outputting a series of analyst-friendly tables: customers, purchases, sales, targets, stock etc. These were then fed into BI and MI dashboards which allowed the management team to make more data-backed strategic choices and respond to real-time changes in trends throughout the business.

 

For a software-as-a-service (SaaS) client:

We created three data cubes using dbt. Each served a specific purpose: monitor their sales and contract renewals; identify which sales came from which marketing channel; and analyse the health of each customer account.

In this instance dbt sat on top of a Postgres database and was responsible for storing SQL transformations, determining query execution order and documenting data flow lineage. A set of dashboards for each cube were created from the resulting data cubes. This gave heavily increased visibility on key revenue related metrics from the marketing funnel to customer lifetime behaviours and key drivers of churn and revenue retention.

 

For an asset manager:

Just as for the motorbike brand project, here dbt operated on top of Amazon Redshift, handling the transformation of data from multiple 3rd party data sources, including Bloomberg and Charles River. The transformation, cleaning and documentation of this data is handled by dbt, streamlining the process. The modelled data was then used to feed BI dashboards which refresh every hour.

Not only did this allow the client-side analysts to make more informed investment decisions, but dbt reduced testing time dramatically by unifying business logic into a single spot, instead of several custom scripts.

 

Written with contributions from Stephanie Boyle