Custom Integration with Amazon S3 Data Lake as the Target

The Calibo Accelerate platform now provides the option to write custom code to read data from a supported data source and ingest it into an Amazon S3 data lake. In addition to templatized integration jobs, you can now perform complex operations on data by creating custom integration jobs.
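The following is a minimal sketch of the kind of complex operation a custom job makes possible, in this case keeping only the latest record per key with a window function. The sample data and column names are hypothetical, and spark refers to the SparkSession that Databricks provides in every notebook.

  from pyspark.sql import functions as F
  from pyspark.sql.window import Window

  # Hypothetical sample data; in a real job this would be read from the data source.
  source_df = spark.createDataFrame(
      [("c1", "2024-01-01", 100), ("c1", "2024-02-01", 120), ("c2", "2024-01-15", 80)],
      ["customer_id", "updated_at", "amount"],
  )

  # Keep only the most recent record for each customer_id.
  w = Window.partitionBy("customer_id").orderBy(F.col("updated_at").desc())
  latest_records = (
      source_df
      .withColumn("row_num", F.row_number().over(w))
      .filter(F.col("row_num") == 1)
      .drop("row_num")
  )
  latest_records.show()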

To create a Databricks custom integration job

  1. Sign in to the Calibo Accelerate platform and navigate to Products.

  2. Select a product and feature. Click the Develop stage of the feature to navigate to Data Pipeline Studio.

  3. Create a pipeline with the following nodes:

    Note: The stages and technologies used in this pipeline are provided only as an example.

    • Data Source - Amazon S3

    • Data Integration - Databricks

    • Data Lake - Amazon S3

    Custom data integration pipeline using Databricks

  4. Configure the Amazon S3 nodes in the data source and data lake stages.

  5. In the data integration stage, click the Databricks node and select Create Custom Job to create a custom integration job.

    Select job type for custom integration job

  6. Complete the steps to create the Databricks custom integration job.

To replace the placeholder custom code

After you have created the custom integration job, click the Databricks Notebook icon. This opens the custom integration job in the Databricks UI. Replace the placeholder code with your custom code, and then run the job.

Custom Integration Databricks Notebook Link
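The sketch below shows what the replacement code might look like when both the source and the data lake are Amazon S3, as in the example pipeline above. The bucket names, prefixes, file format, and transformation are placeholders; replace them with the locations configured on your pipeline nodes, and make sure the cluster already has access to both buckets.

  from pyspark.sql import functions as F

  # Placeholder locations; substitute the S3 paths configured on your nodes.
  source_path = "s3a://example-source-bucket/raw/orders/"
  target_path = "s3a://example-data-lake-bucket/curated/orders/"

  # Read the raw data from the source Amazon S3 bucket.
  orders_df = (
      spark.read
      .option("header", "true")
      .csv(source_path)
  )

  # Example transformation: normalize column names and add a load timestamp.
  curated_df = (
      orders_df
      .toDF(*[c.strip().lower().replace(" ", "_") for c in orders_df.columns])
      .withColumn("load_ts", F.current_timestamp())
  )

  # Write the curated data to the Amazon S3 data lake in Parquet format.
  curated_df.write.mode("overwrite").parquet(target_path)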

What's next? Databricks Templatized Data Integration Jobs