Data Pipeline Stages

A data pipeline in the Lazsa Platform may consist of the following stages:

  • Data Sources
    The Lazsa Platform supports a variety of data sources from which you can fetch data into your pipeline, including:

    • CSV

    • REST API

    • FTP/SFTP

    • Relational Database Management Systems (RDBMS)

    • Enterprise systems like ServiceNow and Salesforce

    • MS Excel

    • Parquet

    • Amazon S3

    • Amazon Kinesis Streams

    For a detailed list of supported data sources, see Data Sources.
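
    As a quick illustration of what fetching from such sources can look like in code, the following Python sketch reads a CSV file from Amazon S3 and records from a REST API. The bucket, key, and endpoint names are hypothetical placeholders, not Lazsa defaults.

      import boto3
      import pandas as pd
      import requests

      # Fetch a CSV file from an S3 bucket into a DataFrame.
      s3 = boto3.client("s3")
      obj = s3.get_object(Bucket="example-source-bucket", Key="exports/orders.csv")
      orders = pd.read_csv(obj["Body"])

      # Fetch JSON records from a REST API endpoint.
      response = requests.get("https://api.example.com/v1/customers", timeout=30)
      response.raise_for_status()
      customers = pd.DataFrame(response.json())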

  • Data Integration
    Data integration tools help you extract data from various sources and ingest it into a data lake in the required format. The Lazsa Platform currently supports the following data integration tools:
    • Databricks
    • Snowflake
    • Amazon AppFlow

    For more details on data integration tools, see Data Integration.
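
    The sketch below illustrates the extract-and-ingest pattern this stage performs, as a hypothetical PySpark job of the kind you might run on Databricks: it reads a table from an RDBMS over JDBC and lands it in an S3 data lake as Parquet. Connection details and paths are placeholders.

      from pyspark.sql import SparkSession

      spark = SparkSession.builder.appName("ingest-orders").getOrCreate()

      # Extract: read a relational table over JDBC (connection details are placeholders).
      orders = (
          spark.read.format("jdbc")
          .option("url", "jdbc:postgresql://db.example.com:5432/sales")
          .option("dbtable", "public.orders")
          .option("user", "etl_user")
          .option("password", "<password>")
          .load()
      )

      # Ingest: land the table in the raw zone of the data lake as Parquet.
      orders.write.mode("overwrite").parquet("s3://example-data-lake/raw/orders/")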

  • Data Lake
    A data lake is used to store raw data from various sources as well as processed data after transformations or data quality improvements. The Lazsa Platform currently supports the following data lakes:
    • Amazon S3
    • Snowflake

    For more details, see Data Lake.
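
    A common convention, shown in the hypothetical sketch below, is to keep raw and processed data in separate zones (prefixes) of the same S3 bucket. Bucket and prefix names are illustrative only.

      import boto3

      s3 = boto3.client("s3")
      BUCKET = "example-data-lake"

      # Land raw source data untouched, partitioned by ingestion date.
      s3.upload_file("orders.csv", BUCKET, "raw/orders/dt=2024-01-15/orders.csv")

      # After transformation or quality checks, write curated output to a separate zone.
      s3.upload_file(
          "orders_clean.parquet", BUCKET, "processed/orders/dt=2024-01-15/orders_clean.parquet"
      )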

  • Data Transformation

    Data transformation is the process of converting data into a usable format by aggregating, grouping, or combining it to enrich the dataset and make it suitable for analytics and reporting. The Lazsa Platform currently supports the following data transformation tools:
    • Databricks
    • Snowflake

    For more details, see Data Transformation.
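
    The following PySpark sketch illustrates a typical transformation: joining two raw datasets, then grouping and aggregating them into an analytics-ready table. All paths and column names are assumptions for illustration.

      from pyspark.sql import SparkSession
      from pyspark.sql import functions as F

      spark = SparkSession.builder.appName("transform-orders").getOrCreate()

      orders = spark.read.parquet("s3://example-data-lake/raw/orders/")
      customers = spark.read.parquet("s3://example-data-lake/raw/customers/")

      # Enrich orders with customer attributes, then aggregate revenue per region.
      revenue_by_region = (
          orders.join(customers, on="customer_id", how="inner")
          .groupBy("region")
          .agg(
              F.sum("order_total").alias("total_revenue"),
              F.countDistinct("customer_id").alias("active_customers"),
          )
      )

      revenue_by_region.write.mode("overwrite").parquet(
          "s3://example-data-lake/processed/revenue_by_region/"
      )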

  • Data Quality
    Maintaining high data quality is critical for ensuring the accuracy, completeness, and reliability of your data. The data quality stage uses the data profiler, data analyzer, and data issue resolver to ensure that your data meets quality standards. The Lazsa Platform currently supports the following tools for improving data quality:
    • Databricks
    • Snowflake

    For more details, see Data Quality.
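
    The pandas sketch below illustrates the kinds of checks this stage performs: profiling null rates, flagging duplicates and invalid values, and applying a simple remediation. Column names and rules are illustrative assumptions, not Lazsa APIs.

      import pandas as pd

      df = pd.read_parquet("orders_clean.parquet")

      # Profile: null rate per column (completeness).
      null_rates = df.isna().mean()

      # Analyze: rows that violate simple validity rules.
      bad_totals = df[df["order_total"] < 0]
      duplicates = df[df.duplicated(subset=["order_id"], keep=False)]

      # Resolve: filter out invalid rows and drop duplicates.
      clean = df[df["order_total"] >= 0].drop_duplicates(subset=["order_id"])

      print(null_rates)
      print(f"{len(bad_totals)} invalid totals, {len(duplicates)} duplicated order IDs")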

  • Data Analytics
    Data analytics involves applying algorithms and machine learning models to data to identify patterns and generate predictions based on these patterns. The Lazsa Platform currently supports Python with JupyterLab for data analytics.

    For more details, see Data Analytics.
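
    For example, an analytics notebook might train a model on a processed dataset and score held-out data, as in this hypothetical scikit-learn sketch (feature and file names are assumptions):

      import pandas as pd
      from sklearn.ensemble import RandomForestClassifier
      from sklearn.metrics import accuracy_score
      from sklearn.model_selection import train_test_split

      df = pd.read_parquet("processed/customer_features.parquet")
      X = df[["orders_last_90d", "avg_order_total", "days_since_last_order"]]
      y = df["churned"]

      X_train, X_test, y_train, y_test = train_test_split(
          X, y, test_size=0.2, random_state=42
      )

      # Fit a model to identify patterns, then generate predictions on held-out data.
      model = RandomForestClassifier(n_estimators=200, random_state=42)
      model.fit(X_train, y_train)
      predictions = model.predict(X_test)

      print(f"Holdout accuracy: {accuracy_score(y_test, predictions):.2f}")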

  • Data Visualization
    Data visualization converts complex datasets into clear, actionable insights by using charts, graphs, maps, dashboards, and other visual elements. The Lazsa Platform currently supports QlikSense as a data visualization tool. Additionally, you can create custom dashboards or charts by using Angular and Java.

    For more details, see Data Visualization.
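
    As a generic illustration of turning an aggregated dataset into a chart (not a Lazsa feature), the following matplotlib sketch plots the hypothetical revenue-by-region dataset from the transformation example; in the platform itself you would build the equivalent view in QlikSense or a custom Angular or Java dashboard.

      import pandas as pd
      import matplotlib.pyplot as plt

      revenue = pd.read_parquet("processed/revenue_by_region/")
      revenue = revenue.sort_values("total_revenue", ascending=False)

      # Render the aggregated dataset as a simple bar chart.
      plt.figure(figsize=(8, 4))
      plt.bar(revenue["region"], revenue["total_revenue"])
      plt.title("Total revenue by region")
      plt.xlabel("Region")
      plt.ylabel("Revenue")
      plt.tight_layout()
      plt.show()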

  • Analytical Data Store
    An analytical data store is used for efficient processing and analysis of large datasets, including structured, unstructured, and semi-structured data. The Lazsa Platform supports the following analytical data stores:

    • Amazon RDS for MySQL

    • Amazon RDS for PostgreSQL

    • Microsoft SQL Server

    • MySQL

    • Oracle

    • PostgreSQL

    • Snowflake

    For more details, see Analytical Data Store.
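
    For instance, curated results can be loaded into one of these stores and queried for reporting. The sketch below uses PostgreSQL via SQLAlchemy; the connection string, table, and column names are placeholders.

      import pandas as pd
      from sqlalchemy import create_engine, text

      # Hypothetical connection string; replace with your analytical store's details.
      engine = create_engine(
          "postgresql+psycopg2://analyst:<password>@db.example.com:5432/analytics"
      )

      # Load a processed dataset into an analytical table.
      revenue = pd.read_parquet("processed/revenue_by_region/")
      revenue.to_sql("revenue_by_region", engine, if_exists="replace", index=False)

      # Query it back for reporting.
      with engine.connect() as conn:
          rows = conn.execute(
              text(
                  "SELECT region, total_revenue FROM revenue_by_region "
                  "ORDER BY total_revenue DESC LIMIT 5"
              )
          ).fetchall()
      print(rows)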

  • Workflow
    Workflow is a native capability of the Lazsa Platform that is used to initiate an approval process. In Data Pipeline Studio, workflows manage the approval of data residing in a data lake before it is used for further processing. Currently, workflows are supported only for Amazon S3 data lakes.

    To use workflows in Data Pipeline Studio, you must:

    • Create a Workflow Template and define the type of workflow, stages, and approvers.

    • Apply the template in the workflow stage of Data Pipeline Studio.

      For more information, see Workflows in Data Pipeline Studio.

 


What's next? Create a Data Pipeline