Data Ingestion of Unstructured Data into Unity Catalog

Calibo Accelerate supports ingesting unstructured data into Unity Catalog, using Databricks for data integration. Storing unstructured data in managed volumes brings benefits such as scalability, performance optimization, and simplified management.

The data ingestion pipeline for unstructured data has the following nodes:

SFTP/FTP (data source) > Databricks (data integration) > Unity Catalog (data lake)

Data ingestion of unstructured data
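
To make the last hop of this flow concrete, here is a minimal Python sketch of landing files in a Unity Catalog volume. It is not the code that Calibo Accelerate generates. It assumes the files have already been fetched from the SFTP/FTP source into a local staging directory, and it relies on Databricks exposing volumes under paths of the form /Volumes/<catalog>/<schema>/<volume>. The function names volume_path and copy_tree and the catalog, schema, and volume names are hypothetical placeholders.

```python
import shutil
from pathlib import Path, PurePosixPath
from typing import List


def volume_path(catalog: str, schema: str, volume: str, relative: str = "") -> str:
    """Build a path of the form /Volumes/<catalog>/<schema>/<volume>/<relative>."""
    for part in (catalog, schema, volume):
        if not part or "/" in part:
            raise ValueError(f"invalid path component: {part!r}")
    # Strip leading slashes so an absolute 'relative' cannot replace the volume root.
    return str(PurePosixPath("/Volumes", catalog, schema, volume, relative.lstrip("/")))


def copy_tree(staging_dir: str, target_dir: str) -> List[str]:
    """Copy every file under staging_dir into target_dir, keeping the folder layout.

    Returns the relative paths of the copied files, sorted.
    """
    src_root = Path(staging_dir)
    dst_root = Path(target_dir)
    copied = []
    for src in sorted(p for p in src_root.rglob("*") if p.is_file()):
        rel = src.relative_to(src_root)
        dst = dst_root / rel
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # binary-safe copy; works for PDFs, images, audio
        copied.append(rel.as_posix())
    return copied


# Hypothetical usage on a Databricks cluster (names are placeholders):
# target = volume_path("main", "raw", "documents", "sftp_drop")
# copy_tree("/tmp/sftp_staging", target)
```

Because the copy works on plain file paths, the same function can be tried locally against a temporary directory before pointing it at a volume.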

To create a data ingestion job for unstructured data

  1. Log on to the Calibo Accelerate platform and navigate to Products.

  2. Select a product and feature. Click the Develop stage of the feature and navigate to Data Pipeline Studio.

  3. Create a pipeline with the following nodes:

    • Data Source - FTP/SFTP

    • Data Integration - Databricks (Unity Catalog enabled)

    • Data Lake - Unity Catalog (Data Lakehouse)

  4. Configure the data source and data lake nodes and connect them to the data integration node.
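
The wiring described in steps 3 and 4 can be pictured as a small graph check. The following Python sketch is purely illustrative and does not use any Calibo Accelerate API: the Node and Pipeline classes and the stage names are hypothetical, chosen only to mirror the three nodes listed above. It reports which nodes or connections are still missing.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Stage order taken from the pipeline described in this topic.
EXPECTED_ORDER = ["data_source", "data_integration", "data_lake"]


@dataclass(frozen=True)
class Node:
    stage: str        # one of EXPECTED_ORDER
    technology: str   # for example "FTP/SFTP", "Databricks", "Unity Catalog"


@dataclass
class Pipeline:
    nodes: List[Node] = field(default_factory=list)
    edges: List[Tuple[str, str]] = field(default_factory=list)  # (from_stage, to_stage)

    def connect(self, src: Node, dst: Node) -> None:
        """Record a connection from one node to the next."""
        self.edges.append((src.stage, dst.stage))

    def problems(self) -> List[str]:
        """Return a list of missing nodes and missing connections."""
        issues = []
        stages = [n.stage for n in self.nodes]
        for stage in EXPECTED_ORDER:
            if stage not in stages:
                issues.append(f"missing node: {stage}")
        for a, b in zip(EXPECTED_ORDER, EXPECTED_ORDER[1:]):
            if (a, b) not in self.edges:
                issues.append(f"missing connection: {a} -> {b}")
        return issues


source = Node("data_source", "FTP/SFTP")
integration = Node("data_integration", "Databricks (Unity Catalog enabled)")
lake = Node("data_lake", "Unity Catalog (Data Lakehouse)")

pipeline = Pipeline(nodes=[source, integration, lake])
pipeline.connect(source, integration)
pipeline.connect(integration, lake)
print(pipeline.problems())  # an empty list means all three nodes are wired in order
```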

Complete the following steps to create a data ingestion job for unstructured data:

 

What's next? Data Ingestion from Data Catalogs to Unity Catalog