Mastering dbt: Your Complete Step-by-Step Handbook

Setting Up dbt Core

Setting up dbt Core involves installing the tool, configuring your environment, and running your first dbt project. Below is a step-by-step guide.

Step 1: Install dbt Core

1.1 Install Python

dbt Core requires Python 3.8 or later (check the release notes of your dbt version for the exact supported range). Check your Python version:

python --version

If Python is not installed, download and install it from python.org.

1.2 Create a Virtual Environment (Recommended)

Creating a virtual environment helps manage dependencies. Different Python projects might require different versions of the same package. If you install everything globally, you might break one project while setting up another.

For example:

  • Project A requires dbt-core==1.6.0
  • Project B requires dbt-core==1.7.2

If you install both globally, they will conflict. A virtual environment ensures each project gets the right versions, so creating one is not mandatory, but it is strongly recommended.

python -m venv dbt-env
source dbt-env/bin/activate  # For Mac/Linux
dbt-env\Scripts\activate     # For Windows

1.3 Install dbt Core

Run the following command to install dbt Core:

pip install dbt-core

If you are using a specific database, install the corresponding adapter.

Why Do You Need a Database Adapter in dbt?

dbt (Data Build Tool) is not a database itself—it connects to your database and runs SQL transformations. To do this, dbt needs an adapter that tells it how to communicate with a specific database.

Each adapter is a Python package that enables dbt to translate its SQL code into the database-specific SQL dialect.

Available dbt Adapters (and When to Use Them)

Here’s a breakdown of commonly used dbt adapters and when you need them:

Database     Adapter Package   Install Command
PostgreSQL   dbt-postgres      pip install dbt-postgres
Snowflake    dbt-snowflake     pip install dbt-snowflake
BigQuery     dbt-bigquery      pip install dbt-bigquery
Redshift     dbt-redshift      pip install dbt-redshift
Databricks   dbt-databricks    pip install dbt-databricks
MySQL        dbt-mysql         pip install dbt-mysql
SQL Server   dbt-sqlserver     pip install dbt-sqlserver
Oracle       dbt-oracle        pip install dbt-oracle
Trino        dbt-trino         pip install dbt-trino

Without the correct adapter, dbt cannot connect to your database!

Check if dbt is installed correctly:

dbt --version

How to Install PostgreSQL on Your Local Machine

Optionally, download DBeaver (a free SQL client for browsing your database) from dbeaver.io.

For macOS (Using Homebrew)

Steps:

  1. Install Homebrew (if not installed).
  2. Use Homebrew to install PostgreSQL.
  3. Start PostgreSQL service.
  4. Verify installation.
  5. Create a database and schema.
  6. Connect PostgreSQL to dbt.

Commands:

# Install Homebrew (if not installed)
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

# Install PostgreSQL
brew install postgresql

# Start PostgreSQL service
brew services start postgresql

# Verify PostgreSQL installation
psql --version

# Access the PostgreSQL shell
psql postgres

-- Inside the psql shell: create a new database and schema
CREATE DATABASE dbt_db;
\c dbt_db
CREATE SCHEMA schema_db;

-- Create a new database user (optional)
CREATE USER myuser WITH PASSWORD 'mypassword';
ALTER ROLE myuser WITH SUPERUSER;

For Windows (Using PostgreSQL Installer)

Steps:

  1. Download the PostgreSQL installer from the official website.
  2. Run the installer and follow the setup wizard.
  3. Set a password for the PostgreSQL postgres user.
  4. Start PostgreSQL using pgAdmin or Command Prompt.
  5. Create a database and schema.

Commands:

# Open Windows Command Prompt (cmd) and access the PostgreSQL shell
psql -U postgres

-- Inside the psql shell: create a new database and schema
CREATE DATABASE dbt_db;
\c dbt_db
CREATE SCHEMA schema_db;

-- Create a new database user (optional)
CREATE USER myuser WITH PASSWORD 'mypassword';
ALTER ROLE myuser WITH SUPERUSER;

For Linux (Ubuntu/Debian-based systems)

Steps:

  1. Update package lists.
  2. Install PostgreSQL.
  3. Start and enable the PostgreSQL service.
  4. Switch to the PostgreSQL user.
  5. Create a database and schema.

Commands:

# Update package lists
sudo apt update

# Install PostgreSQL
sudo apt install postgresql postgresql-contrib -y

# Start PostgreSQL service
sudo systemctl start postgresql

# Enable PostgreSQL to start on boot (optional)
sudo systemctl enable postgresql

# Switch to the PostgreSQL user and access the shell
sudo -i -u postgres
psql

-- Inside the psql shell: create a new database and schema
CREATE DATABASE dbt_db;
\c dbt_db
CREATE SCHEMA schema_db;

-- Create a new database user (optional)
CREATE USER myuser WITH PASSWORD 'mypassword';
GRANT ALL PRIVILEGES ON DATABASE dbt_db TO myuser;

Step 2: Initialize a dbt Project

2.1 Create a New dbt Project

Run the following command:

dbt init my_dbt_project
cd my_dbt_project

This creates a project folder with the following structure:

my_dbt_project/
│-- dbt_project.yml
│-- models/
│-- seeds/
│-- tests/
│-- macros/
│-- snapshots/
│-- analyses/   <-- (named analysis/ in older versions)
│-- target/     <-- (Appears after running dbt)
│-- logs/       <-- (Appears after running dbt)
└── ~/.dbt/profiles.yml  <-- (Lives outside the project; created manually)

Step 3: Configure dbt Profiles

3.1 Locate the profiles.yml file

The profiles.yml file is usually found in:

  • Mac/Linux: ~/.dbt/profiles.yml
  • Windows: C:\Users\YOUR_USER\.dbt\profiles.yml

If it doesn’t exist, create it manually.

3.2 Configure Database Connection

Modify profiles.yml based on your database. The top-level key in this file (the profile name) must match the profile: value in your project's dbt_project.yml.

PostgreSQL Example:

my_dbt_project:
  target: dev
  outputs:
    dev:
      type: postgres
      host: my-database-host
      user: my-user
      password: my-password
      port: 5432
      dbname: my_database
      schema: public

Understanding dbt Project Structure and Why It’s Needed

When you run dbt init my_dbt_project, dbt automatically creates a structured project folder. This structure is designed to organize your SQL models, configurations, and transformations efficiently.

Let’s break down each folder and file and understand why dbt uses this structure.

1. dbt_project.yml (Core Configuration File)

This is the main configuration file for the dbt project. It contains:

  • Project name
  • Model configurations (e.g., schema, materializations)
  • File paths for models, seeds, and tests

dbt_project.yml tells dbt where to find models, how to execute them, and what settings to apply.
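
As a rough illustration, a minimal dbt_project.yml might look like this (the project and profile names are placeholders):

name: 'my_dbt_project'
version: '1.0.0'
profile: 'my_dbt_project'   # must match a profile in ~/.dbt/profiles.yml

model-paths: ["models"]
seed-paths: ["seeds"]
test-paths: ["tests"]
macro-paths: ["macros"]
snapshot-paths: ["snapshots"]

models:
  my_dbt_project:
    +materialized: view   # default materialization for every model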

2. models/ (SQL Transformations)

This folder contains SQL files, where you define how to transform raw data into cleaned and structured tables.

Why do you need it?
Instead of writing raw SQL queries every time, dbt allows you to define models as reusable SQL scripts.

When you run dbt run, dbt compiles and executes this SQL against your database, as in the sketch below.
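
For instance, a hypothetical model file models/stg_customers.sql (table and column names are illustrative) could be:

-- models/stg_customers.sql
-- Standardizes raw customer data into a clean staging model
SELECT
    customer_id,
    LOWER(email) AS email,
    created_at
FROM my_schema.raw_customers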

3. seeds/ (Static CSV Data for Reference)

Seeds are CSV files stored in your dbt project that can be loaded into the database.

Why do you need it?
If you have small reference datasets (like country codes, discount rates, and business rules), you can store them in CSV files instead of an external table.
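
For example, a hypothetical seeds/country_codes.csv (contents illustrative):

country_code,country_name
US,United States
IN,India
DE,Germany

Any model can then reference the seed with ref(), e.g. SELECT * FROM {{ ref('country_codes') }}.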

To load this into the database, run:

dbt seed

All important dbt seed commands:

# Load all seed data into the database
dbt seed

# Load a specific seed file 
dbt seed --select your_seed_name

# Re-run all seeds, even if they already exist
dbt seed --full-refresh

# Re-run a specific seed with full refresh
dbt seed --select your_seed_name --full-refresh

# Show logs while running seeds
dbt seed --debug

4. tests/ (Data Validation)

This folder contains SQL test cases to validate your data.

Why do you need it?
Data quality is critical! dbt tests help catch missing values, duplicates, or inconsistent data.

To run tests:

dbt test

All important dbt test commands:

# Run all tests (models, seeds, sources)
dbt test

# Run tests only on a specific test file 
dbt test --select test_file_name
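
As an illustration, a singular test is simply a SQL file in tests/ that returns the rows that violate your expectation; this hypothetical tests/assert_no_negative_amounts.sql fails if any order total is negative:

-- tests/assert_no_negative_amounts.sql
-- dbt marks the test as failed if this query returns any rows
SELECT *
FROM {{ ref('orders') }}
WHERE total_amount < 0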

5. macros/ (Reusable SQL Functions)

Macros are custom SQL functions that help avoid repeating SQL code.

Why do you need it?
Instead of writing the same logic in multiple places, you can create a macro (similar to a function) and use it across different models.
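
For example, a hypothetical macro macros/cents_to_dollars.sql (name and logic are illustrative) can be reused in any model:

{% macro cents_to_dollars(column_name) %}
    ({{ column_name }} / 100.0)
{% endmacro %}

-- Usage inside a model:
-- SELECT {{ cents_to_dollars('amount_cents') }} AS amount_dollars
-- FROM {{ ref('payments') }}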

6. snapshots/ (Historical Tracking)

Snapshots help track historical changes in a table over time.

Why do you need it?
If your source table changes over time (e.g., customer status updates), a snapshot records each change.

To capture historical data, run:

dbt snapshot

dbt Snapshot Strategies

dbt supports two snapshot strategies for tracking data changes over time:

  • timestamp: Tracks changes using an updated_at column. A new version is recorded only if the timestamp changes. Best when your table has a reliable updated_at column.
  • check: Tracks changes based on specified columns (check_cols). A new version is recorded if any of these column values change. Best when there is no reliable updated_at column.

Timestamp-based Snapshots

strategy='timestamp',
updated_at='updated_at'

  • Compares the updated_at column in each row.
  • If the timestamp changes, dbt records a new version of the row.
  • If the timestamp does not change, dbt does nothing.

Example:

{% snapshot customer_snapshot %}
{{
    config(
        target_schema='snapshots',
        unique_key='customer_id',
        strategy='timestamp',
        updated_at='updated_at'
    )
}}
SELECT * FROM my_schema.customers
{% endsnapshot %}

Column-based Snapshots

strategy='check',
check_cols=['column1', 'column2']

  • Compares the specified columns in check_cols.
  • If any value in check_cols changes, dbt records a new version of the row.
  • If no value in check_cols changes, dbt does nothing.

Example:

{% snapshot order_snapshot %}
{{
    config(
        target_schema='snapshots',
        unique_key='order_id',
        strategy='check',
        check_cols=['order_status', 'total_amount']
    )
}}
SELECT * FROM my_schema.orders
{% endsnapshot %}

7. profiles.yml (Database Connection Settings)

By default, profiles.yml does not live inside your project folder. Instead, it is stored in:

  • Mac/Linux: ~/.dbt/profiles.yml
  • Windows: %USERPROFILE%\.dbt\profiles.yml

This file:

  • Stores database connection settings (hostname, username, password, schema, etc.).
  • Tells dbt which database and schema to use.
  • Supports multiple environments (e.g., dev, prod), as shown in the sketch below.
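
For instance, a profiles.yml with both a dev and a prod target might look like this (hostnames and credentials are placeholders):

my_dbt_project:
  target: dev          # default target
  outputs:
    dev:
      type: postgres
      host: localhost
      user: myuser
      password: mypassword
      port: 5432
      dbname: dbt_db
      schema: dev_schema
    prod:
      type: postgres
      host: prod-db-host
      user: prod_user
      password: prod_password
      port: 5432
      dbname: dbt_db
      schema: analytics

Run dbt run --target prod to execute against the prod output instead of the default dev.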

To test your database connection:

dbt debug

8. logs/ (Execution Logs & Debugging)

This folder contains log files generated every time you run a dbt command (dbt run, dbt test, etc.).

  • Stores logs of dbt runs (dbt.log, debug.log).
  • Helps with troubleshooting errors and debugging issues.

9. target/ (Compiled SQL & Run Results)

This folder is created when you run dbt (dbt run, dbt compile, dbt test). It stores:

  • Compiled SQL before execution, letting you see the exact SQL dbt runs.
  • Run results, such as success, failure, and execution time for previous runs.
  • Documentation metadata (the output of dbt docs generate).

What’s Inside the target/ Folder?

File/Folder        Description
compiled/          Compiled SQL queries before execution.
run/               The executed SQL used to build models.
manifest.json      Metadata about project details, dependencies, and relationships; useful for documentation and debugging.
catalog.json       Metadata about database tables, columns, and their types; helps with documentation and lineage tracking.
run_results.json   Information about the last dbt run, including model statuses (success, failure, error).

10. analyses/ (Exploratory SQL Queries)

This folder (named analysis/ in older dbt versions) holds ad-hoc queries that don't become dbt models.

  • Stores one-off SQL analyses that don’t need to be production models.
  • Keeps exploratory queries separate from core dbt transformations.
  • Allows teams to collaborate on SQL research before turning queries into models.
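
For example, a hypothetical analyses/top_customers.sql; dbt compiles it with dbt compile but never materializes it:

-- analyses/top_customers.sql
-- Compiled (dbt compile) but never built as a table or view
SELECT
    customer_id,
    SUM(total_amount) AS lifetime_value
FROM {{ ref('orders') }}
GROUP BY customer_id
ORDER BY lifetime_value DESC
LIMIT 10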

Step 4: Define and Run Models
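
If you haven't defined a model yet, create one first; for example, this hypothetical models/customer_orders.sql builds on other models through ref(), which is how dbt wires up its dependency graph:

-- models/customer_orders.sql
-- Counts orders per customer; ref() makes dbt build the upstream models first
SELECT
    c.customer_id,
    COUNT(o.order_id) AS order_count
FROM {{ ref('stg_customers') }} AS c
LEFT JOIN {{ ref('orders') }} AS o
    ON o.customer_id = c.customer_id
GROUP BY c.customer_id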

To execute the model:

dbt run

To view the generated SQL:

dbt compile

Step 5: Generate Documentation

To generate and view dbt documentation:

dbt docs generate
dbt docs serve --port 8081

This will launch an interactive UI with your project’s documentation.

Conclusion

In this article, we’ve covered everything you need to get started with dbt Core and how to navigate its project structure efficiently. From installing dbt to understanding its fundamental components, you now have a solid foundation to begin leveraging dbt’s capabilities for your data transformation tasks.

By setting up dbt and familiarizing yourself with essential concepts like models, seeds, tests, snapshots, and analyses, you can enhance your workflow and build reliable, maintainable data pipelines. Understanding the dbt project structure—which includes files like dbt_project.yml, models/, tests/, analyses/, and more—ensures that you work within an organized framework that allows for scalability, collaboration, and version control.

Key takeaways:

  1. dbt Installation: You’ve learned how to install dbt Core, configure a virtual environment, and set up a project, preparing you for a seamless data transformation experience.
  2. Project Structure: We’ve broken down the different components of a dbt project, including essential folders like models/ for SQL transformations, tests/ for validation, and snapshots/ for tracking historical changes.
  3. Connecting Databases: You now know how to integrate dbt with databases like PostgreSQL or BigQuery, enabling smooth data transformations in your environment.
  4. Testing and Data Integrity: You’ve learned how to add tests, ensuring the reliability and accuracy of your datasets, and how to configure seeds and snapshots for consistent, historical data tracking.

With this knowledge in hand, you’re ready to confidently implement dbt in your projects. Whether you’re a data engineer, data analyst, or data scientist, dbt offers a powerful framework to streamline and automate your ETL pipeline, ensuring your data is consistently up-to-date, accurate, and ready for analysis. Happy modeling and transforming!
