Data Transformation using dbt Cloud
dbt Cloud is a hosted, enterprise-ready platform that simplifies managing and running transformation pipelines while providing visibility and governance around them. Integrating dbt Cloud with source control repositories simplifies version control of dbt code. This makes collaboration within the development team easier, while CI/CD automation enables testing of the dbt code before it is pushed to production.
Calibo Accelerate Platform supports data transformation using dbt Cloud. You can import dbt projects and either run existing jobs or create and run new jobs in dbt Cloud.
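For orientation only, the sketch below shows what triggering an existing dbt Cloud job looks like against the dbt Cloud v2 Administrative API, using Python and the requests library. The host URL, account ID, job ID, and token are placeholder values, and this is not the Calibo Accelerate platform's own integration code.

import requests

# Placeholder values; substitute details for your own dbt Cloud account.
DBT_CLOUD_HOST = "https://cloud.getdbt.com"  # the access URL varies by region and plan
ACCOUNT_ID = 12345
JOB_ID = 67890
API_TOKEN = "<your-dbt-cloud-token>"


def trigger_job_run(cause: str) -> int:
    """Trigger a run of an existing dbt Cloud job and return the run ID."""
    url = f"{DBT_CLOUD_HOST}/api/v2/accounts/{ACCOUNT_ID}/jobs/{JOB_ID}/run/"
    response = requests.post(
        url,
        headers={"Authorization": f"Token {API_TOKEN}"},
        json={"cause": cause},  # a free-text reason recorded with the run
    )
    response.raise_for_status()
    return response.json()["data"]["id"]


if __name__ == "__main__":
    run_id = trigger_job_run("Manual trigger (example)")
    print(f"Started dbt Cloud run {run_id}")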
See Configure dbt Cloud Connection Details
To create a dbt Cloud custom transformation job
1. Sign in to the Calibo Accelerate platform and navigate to Products.
2. Select a product and feature. Click the Develop stage of the feature and navigate to Data Pipeline Studio.
3. Add a data transformation stage and add a dbt Cloud node to the stage.
4. Click the dbt Cloud node. You can either run an existing job or create a new one.
5. To run an existing job, perform step 6. To create a new job, perform step 7.
6. To run an existing job, do the following:
Job Name
- Existing Projects - Select the project that contains the job that you want to run.
- Existing Jobs - Select the job that you want to run.
- Node Rerun Attempts - Specify the number of times a rerun is attempted on this node of the pipeline in case of failure. The default is set at the pipeline level. You can change the rerun attempts to 1, 2, or 3.
Database Connections
The warehouse name and the database name are auto-populated and non-editable.
Repository Name
The repository name associated with the existing job is displayed along with the repository path. Both fields are auto-populated and non-editable.
Execution Settings
In the Commands section, do the following (the sketch after this list shows how these settings map onto a dbt Cloud job definition):
- Run Source Freshness - Enable this setting to run a source freshness check as the first step of the job, which verifies that the data in your source tables is up to date.
- Click the Delete icon next to a command to remove it from the job.
- Rearrange the sequence of commands by dragging and dropping them using the drag handle.
- + Add Command - Click this option to add new commands to the existing job.
- Generate Docs on Run - Enable this option to automatically generate documentation when a command is executed. The documentation includes the schema structure, test results, and versioning information.
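To make these settings concrete, the following sketch shows how the Commands options roughly correspond to fields on a dbt Cloud job definition (field names follow the dbt Cloud v2 Administrative API; the commands themselves are examples, not values taken from your job):

# Illustrative only: how the Commands settings roughly correspond to fields
# on a dbt Cloud job definition (dbt Cloud v2 Administrative API).
job_execution_settings = {
    # Each entry added with "+ Add Command" becomes one step, run in order.
    "execute_steps": [
        "dbt run --select staging",
        "dbt test --select staging",
    ],
    # "Run Source Freshness" runs a source freshness check before the steps above.
    "run_generate_sources": True,
    # "Generate Docs on Run" generates project documentation as part of the run.
    "generate_docs": True,
}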
Advanced Settings
- Environment Variables - Development - For an existing job, this field is auto-populated and non-editable.
- Target Name - Specify a different target name if your dbt code contains logic that depends on the selected target.
- Run Timeout - The maximum number of seconds for which the run executes before it is canceled. If set to 0 (the default), the job run is canceled after running for 24 hours (a related status-polling sketch follows this step).
- dbt Version - The default setting is Latest. Change it as required.
- Threads - The number of parallel threads that dbt can use to run multiple models simultaneously while running the job.
Click Complete.
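As a companion to the Run Timeout setting, here is a minimal sketch that polls a triggered run until it reaches a terminal status, using the dbt Cloud v2 runs endpoint. It assumes the DBT_CLOUD_HOST, ACCOUNT_ID, and API_TOKEN placeholders from the earlier sketch, and the status codes shown are those documented for the dbt Cloud API.

import time

import requests

# Assumes the DBT_CLOUD_HOST, ACCOUNT_ID, and API_TOKEN placeholders defined
# in the earlier sketch.

# Terminal run statuses as documented for the dbt Cloud v2 API:
# 10 = Success, 20 = Error, 30 = Cancelled (a run that exceeds its Run Timeout
# is cancelled).
TERMINAL_STATUSES = {10, 20, 30}


def get_run_status(run_id: int) -> int:
    """Return the numeric status of a dbt Cloud run."""
    url = f"{DBT_CLOUD_HOST}/api/v2/accounts/{ACCOUNT_ID}/runs/{run_id}/"
    response = requests.get(url, headers={"Authorization": f"Token {API_TOKEN}"})
    response.raise_for_status()
    return response.json()["data"]["status"]


def wait_for_run(run_id: int, poll_seconds: int = 30) -> int:
    """Poll a run until it reaches a terminal status and return that status."""
    while True:
        status = get_run_status(run_id)
        if status in TERMINAL_STATUSES:
            return status
        time.sleep(poll_seconds)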
7. To create a new job, do the following (an illustrative API-level sketch of the resulting job definition follows this step):
Job Name
- Existing Projects - Select the project to which you want to add the new dbt job.
- Job Name - Provide a name for the job.
- Description - Provide a description for the job.
- Environment - Select the environment in which you want to run the job. Environments are created in each dbt project.
- Node Rerun Attempts - Specify the number of times a rerun is attempted on this node of the pipeline in case of failure. The default is set at the pipeline level. You can change the rerun attempts to 1, 2, or 3.
Database Connections
- Warehouse Name - This field is auto-populated and non-editable.
- Database Name - This field is auto-populated and non-editable.
Repository Name
The repository name associated with the selected project is displayed along with the repository path. Both fields are auto-populated and non-editable.
Execution Settings
In the Commands section, do the following:
- Run Source Freshness - Enable this setting to run a source freshness check as the first step of the job, which verifies that the data in your source tables is up to date.
- + Add Command - Click this option to add commands to the job.
- Generate Docs on Run - Enable this option to automatically generate documentation when a command is executed. The documentation includes the schema structure, test results, and versioning information.
Advanced Settings
- Environment Variables - Development - For an existing job, this field is auto-populated and non-editable.
- Target Name - Specify a different target name if your dbt code contains logic that depends on the selected target.
- Run Timeout - The maximum number of seconds for which the run executes before it is canceled. If set to 0 (the default), the job run is canceled after running for 24 hours.
- dbt Version - The default setting is Latest. Change it as required.
- Threads - The number of parallel threads that dbt can use to run multiple models simultaneously while running the job.
Click Complete.
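For comparison with the fields configured above, the sketch below creates a dbt Cloud job through the v2 Administrative API with the same kinds of settings (name, environment, commands, threads, target name, run timeout, dbt version). All IDs and values are placeholders, it reuses the DBT_CLOUD_HOST, ACCOUNT_ID, and API_TOKEN placeholders from the first sketch, and it is not the call the Calibo Accelerate platform makes internally; verify the field names against the dbt Cloud API reference for your account.

import requests

# Assumes the DBT_CLOUD_HOST, ACCOUNT_ID, and API_TOKEN placeholders defined
# in the first sketch; all other values are illustrative.


def create_job(project_id: int, environment_id: int) -> dict:
    """Create a dbt Cloud job whose fields mirror the settings in step 7."""
    url = f"{DBT_CLOUD_HOST}/api/v2/accounts/{ACCOUNT_ID}/jobs/"
    payload = {
        "account_id": ACCOUNT_ID,
        "project_id": project_id,             # Existing Projects
        "environment_id": environment_id,     # Environment
        "name": "nightly_transform",          # Job Name (placeholder)
        "execute_steps": ["dbt build"],       # Commands
        "generate_docs": True,                # Generate Docs on Run
        "run_generate_sources": True,         # Run Source Freshness
        "dbt_version": None,                  # None = inherit the environment's version
        "settings": {"threads": 4, "target_name": "default"},
        "execution": {"timeout_seconds": 0},  # 0 = cancel after 24 hours
        "triggers": {"github_webhook": False, "schedule": False},
        "state": 1,  # active
    }
    response = requests.post(
        url, headers={"Authorization": f"Token {API_TOKEN}"}, json=payload
    )
    response.raise_for_status()
    return response.json()["data"]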
8. To run the dbt job, do one of the following (an illustrative API equivalent follows these options):
- Publish the pipeline, and then click Run Pipeline on the Data Pipeline Studio (DPS) home page.
- Publish the pipeline, click the dbt Cloud node, and then click Start in the side drawer.
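For completeness, an illustrative API-level equivalent of starting the node, reusing the trigger_job_run and wait_for_run helpers sketched earlier, is to trigger the job and wait for a terminal status:

# Assumes the trigger_job_run and wait_for_run helpers from the earlier sketches.
run_id = trigger_job_run("Started outside the platform (example)")
final_status = wait_for_run(run_id)
print("dbt Cloud run", run_id, "finished with status", final_status)  # 10 = Success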
What's next? Snowflake Custom Transformation Job