Data Pipeline Stages

A data pipeline in the Lazsa Platform may consist of the following stages:

  • Data Sources
    The Lazsa Platform supports a variety of data sources from which you can fetch data into your pipeline, including:

    • CSV

    • REST API

    • FTP/SFTP

    • Relational Database Management Systems (RDBMS)

    • Enterprise systems like ServiceNow and Salesforce

    • MS Excel

    • Parquet

    • Amazon S3

    • Amazon Kinesis Streams

    For a detailed list of supported data sources, see Data Sources.
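
    As a quick illustration of what fetching from such sources can look like in code, the following Python sketch reads a CSV file from Amazon S3 and records from a REST API. The bucket, key, and endpoint names are hypothetical placeholders, not Lazsa defaults.

      import boto3
      import pandas as pd
      import requests

      # Fetch a CSV file from an S3 bucket into a DataFrame.
      s3 = boto3.client("s3")
      obj = s3.get_object(Bucket="example-source-bucket", Key="exports/orders.csv")
      orders = pd.read_csv(obj["Body"])

      # Fetch JSON records from a REST API endpoint.
      response = requests.get("https://api.example.com/v1/customers", timeout=30)
      response.raise_for_status()
      customers = pd.DataFrame(response.json())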

  • Data Integration
    Data integration tools help you extract data from various sources and ingest it into a data lake in the required format. The Lazsa Platform currently supports the following data integration tools:
    • Databricks
    • Snowflake
    • Amazon AppFlow

    For more details on data integration tools, see Data Integration.
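
    The sketch below illustrates the extract-and-ingest pattern this stage performs, as a hypothetical PySpark job of the kind you might run on Databricks: it reads a table from an RDBMS over JDBC and lands it in an S3 data lake as Parquet. Connection details and paths are placeholders.

      from pyspark.sql import SparkSession

      spark = SparkSession.builder.appName("ingest-orders").getOrCreate()

      # Extract: read a relational table over JDBC (connection details are placeholders).
      orders = (
          spark.read.format("jdbc")
          .option("url", "jdbc:postgresql://db.example.com:5432/sales")
          .option("dbtable", "public.orders")
          .option("user", "etl_user")
          .option("password", "<password>")
          .load()
      )

      # Ingest: land the table in the raw zone of the data lake as Parquet.
      orders.write.mode("overwrite").parquet("s3://example-data-lake/raw/orders/")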

  • Data Lake
    A data lake is used to store raw data from various sources as well as processed data after transformations or data quality improvements. The Lazsa Platform currently supports the following data lakes:
    • Amazon S3
    • Snowflake

    For more details, see Data Lake.
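
    A common convention, shown in the hypothetical sketch below, is to keep raw and processed data in separate zones (prefixes) of the same S3 bucket. Bucket and prefix names are illustrative only.

      import boto3

      s3 = boto3.client("s3")
      BUCKET = "example-data-lake"

      # Land raw source data untouched, partitioned by ingestion date.
      s3.upload_file("orders.csv", BUCKET, "raw/orders/dt=2024-01-15/orders.csv")

      # After transformation or quality checks, write curated output to a separate zone.
      s3.upload_file(
          "orders_clean.parquet", BUCKET, "processed/orders/dt=2024-01-15/orders_clean.parquet"
      )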

  • Data Transformation

    Data transformation is the process of converting data into a usable format by aggregating, grouping, or combining it to enrich the dataset and make it suitable for analytics and reporting. The Lazsa Platform currently supports the following data transformation tools:
    • Databricks
    • Snowflake

    For more details, see Data Transformation.
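
    The following PySpark sketch illustrates a typical transformation: joining two raw datasets, then grouping and aggregating them into an analytics-ready table. All paths and column names are assumptions for illustration.

      from pyspark.sql import SparkSession
      from pyspark.sql import functions as F

      spark = SparkSession.builder.appName("transform-orders").getOrCreate()

      orders = spark.read.parquet("s3://example-data-lake/raw/orders/")
      customers = spark.read.parquet("s3://example-data-lake/raw/customers/")

      # Enrich orders with customer attributes, then aggregate revenue per region.
      revenue_by_region = (
          orders.join(customers, on="customer_id", how="inner")
          .groupBy("region")
          .agg(
              F.sum("order_total").alias("total_revenue"),
              F.countDistinct("customer_id").alias("active_customers"),
          )
      )

      revenue_by_region.write.mode("overwrite").parquet(
          "s3://example-data-lake/processed/revenue_by_region/"
      )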

  • Data Quality
    Maintaining high data quality is critical for ensuring the accuracy, completeness, and reliability of your data. The data quality stage uses the data profiler, data analyzer, and data issue resolver to ensure that your data meets quality standards. The Lazsa Platform currently supports the following tools for improving data quality:
    • Databricks
    • Snowflake

    For more details, see Data Quality.
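
    The pandas sketch below illustrates the kinds of checks this stage performs: profiling null rates, flagging duplicates and invalid values, and applying a simple remediation. Column names and rules are illustrative assumptions, not Lazsa APIs.

      import pandas as pd

      df = pd.read_parquet("orders_clean.parquet")

      # Profile: null rate per column (completeness).
      null_rates = df.isna().mean()

      # Analyze: rows that violate simple validity rules.
      bad_totals = df[df["order_total"] < 0]
      duplicates = df[df.duplicated(subset=["order_id"], keep=False)]

      # Resolve: filter out invalid rows and drop duplicates.
      clean = df[df["order_total"] >= 0].drop_duplicates(subset=["order_id"])

      print(null_rates)
      print(f"{len(bad_totals)} invalid totals, {len(duplicates)} duplicated order IDs")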

  • Data Analytics
    Data analytics involves applying algorithms and machine learning models to data to identify patterns and generate predictions based on these patterns. The Lazsa Platform currently supports Python with JupyterLab for data analytics.

    For more details, see Data Analytics.
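
    For example, an analytics notebook might train a model on a processed dataset and score held-out data, as in this hypothetical scikit-learn sketch (feature and file names are assumptions):

      import pandas as pd
      from sklearn.ensemble import RandomForestClassifier
      from sklearn.metrics import accuracy_score
      from sklearn.model_selection import train_test_split

      df = pd.read_parquet("processed/customer_features.parquet")
      X = df[["orders_last_90d", "avg_order_total", "days_since_last_order"]]
      y = df["churned"]

      X_train, X_test, y_train, y_test = train_test_split(
          X, y, test_size=0.2, random_state=42
      )

      # Fit a model to identify patterns, then generate predictions on held-out data.
      model = RandomForestClassifier(n_estimators=200, random_state=42)
      model.fit(X_train, y_train)
      predictions = model.predict(X_test)

      print(f"Holdout accuracy: {accuracy_score(y_test, predictions):.2f}")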

  • Data Visualization
    Data visualization converts complex datasets into clear, actionable insights by using charts, graphs, maps, dashboards, and other visual elements. The Lazsa Platform currently supports QlikSense as a data visualization tool. Additionally, you can create custom dashboards or charts by using Angular and Java.

    For more details, see Data Visualization.
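
    As a generic illustration of turning an aggregated dataset into a chart (not a Lazsa feature), the following matplotlib sketch plots the hypothetical revenue-by-region dataset from the transformation example; in the platform itself you would build the equivalent view in QlikSense or a custom Angular or Java dashboard.

      import pandas as pd
      import matplotlib.pyplot as plt

      revenue = pd.read_parquet("processed/revenue_by_region/")
      revenue = revenue.sort_values("total_revenue", ascending=False)

      # Render the aggregated dataset as a simple bar chart.
      plt.figure(figsize=(8, 4))
      plt.bar(revenue["region"], revenue["total_revenue"])
      plt.title("Total revenue by region")
      plt.xlabel("Region")
      plt.ylabel("Revenue")
      plt.tight_layout()
      plt.show()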

  • Analytical Data Store
    An analytical data store is used for efficient processing and analysis of large datasets, including structured, unstructured, and semi-structured data. The Lazsa Platform supports the following analytical data stores:

    • Amazon RDS for MySQL

    • Amazon RDS for PostgreSQL

    • Microsoft SQL Server

    • MySQL

    • Oracle

    • PostgreSQL

    • Snowflake

    For more details, see Analytical Data Store.
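
    For instance, curated results can be loaded into one of these stores and queried for reporting. The sketch below uses PostgreSQL via SQLAlchemy; the connection string, table, and column names are placeholders.

      import pandas as pd
      from sqlalchemy import create_engine, text

      # Hypothetical connection string; replace with your analytical store's details.
      engine = create_engine(
          "postgresql+psycopg2://analyst:<password>@db.example.com:5432/analytics"
      )

      # Load a processed dataset into an analytical table.
      revenue = pd.read_parquet("processed/revenue_by_region/")
      revenue.to_sql("revenue_by_region", engine, if_exists="replace", index=False)

      # Query it back for reporting.
      with engine.connect() as conn:
          rows = conn.execute(
              text(
                  "SELECT region, total_revenue FROM revenue_by_region "
                  "ORDER BY total_revenue DESC LIMIT 5"
              )
          ).fetchall()
      print(rows)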

  • Workflow
    Workflow is a native capability of the Lazsa Platform that is used to initiate an approval process. In Data Pipeline Studio, workflows manage the approval of data residing in a data lake before it is used for further processing. Currently, workflows are supported only for Amazon S3 data lakes.

    To use workflows in Data Pipeline Studio, you must:

    • Create a Workflow Template and define the type of workflow, stages, and approvers.

    • Apply the template in the workflow stage of Data Pipeline Studio.

      For more information, see Workflows in Data Pipeline Studio.

 


What's next? Create a Data Pipeline