Data Crawler Adapters

Data crawlers crawl data from various types of sources and create a data catalog from what they find. A data crawler fetches both data and metadata, providing wider visibility into and deeper access to the data. Data crawlers are used for the following:

  • Discovery

  • Viewing the schema

  • Previewing sample data

  • Creating a data catalog

  • Creating data lineage

Implementation

The primary function of a crawler is to fetch data and metadata.
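As a reference point, the following minimal sketch shows the kind of result a crawl could produce: schema-level metadata alongside a small data sample. The record types (CrawlResult, TableMetadata, ColumnMetadata) are illustrative assumptions, not part of the platform's actual model.

    // Illustrative sketch only: a possible shape for a crawl result that carries
    // both metadata (for the catalog and lineage) and a small data sample (for preview).
    // These record types are assumptions, not the platform's actual model.
    import java.util.List;
    import java.util.Map;

    public record CrawlResult(
            String sourceName,                                  // e.g. database or bucket name
            List<TableMetadata> tables,                         // metadata discovered by the crawler
            Map<String, List<Map<String, Object>>> samples) {   // preview rows keyed by table name

        public record TableMetadata(String name, List<ColumnMetadata> columns) {}

        public record ColumnMetadata(String name, String type, boolean nullable) {}
    }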

Identifying the technology

The first step is to identify the category of the technology, for example, whether the data source is an RDBMS. The logic and the set of classes and interfaces you implement depend on this category (a dispatch sketch follows the list below). Currently, the Calibo Accelerate platform supports the following categories for data crawling:

  • RDBMS

    • Oracle

    • MySQL

    • PostgreSQL

    • Snowflake

    • MSSQL

  • CSV

  • REST API

  • FTP/SFTP

  • MS Excel

  • AWS S3

  • Parquet
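
For illustration, the following sketch shows one way such category-based selection could be organized: an enum that mirrors the categories above and a factory that returns a matching crawler implementation. The DataCrawler interface and the concrete classes are assumptions made for this example, not the platform's actual classes.

    // Hypothetical sketch: choosing a crawler implementation from the identified category.
    // The enum mirrors the list above; the interface and classes are illustrative only.
    public class CrawlerFactory {

        public enum SourceCategory { RDBMS, CSV, REST_API, FTP_SFTP, MS_EXCEL, AWS_S3, PARQUET }

        public interface DataCrawler {
            void crawl();
        }

        public static DataCrawler forCategory(SourceCategory category) {
            switch (category) {
                case RDBMS:
                    return new RdbmsCrawler();
                case CSV:
                    return new CsvCrawler();
                // Other categories would be handled the same way.
                default:
                    throw new IllegalArgumentException("Unsupported category: " + category);
            }
        }

        // Placeholder implementations.
        static class RdbmsCrawler implements DataCrawler {
            public void crawl() { /* fetch schema and sample data over JDBC */ }
        }

        static class CsvCrawler implements DataCrawler {
            public void crawl() { /* parse headers and sample rows from the file */ }
        }
    }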

In addition to the categories that are currently supported, you can also integrate a new RDBMS type. The section below provides the information required to integrate a new RDBMS, such as MariaDB.
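
As a rough sketch of such an integration, the example below shows a MariaDB crawler that lists tables and fetches a sample data preview over JDBC. The class structure, method names, and the jdbc:mariadb URL are assumptions for illustration; the actual interfaces to implement and the registration steps are defined by the platform.

    // Hypothetical sketch of a new RDBMS crawler for MariaDB. Method names and
    // structure are assumptions; only the JDBC calls are standard.
    import java.sql.*;
    import java.util.*;

    public class MariaDbCrawler {

        private final String url;        // e.g. "jdbc:mariadb://host:3306/db" (placeholder)
        private final Properties creds;  // user/password supplied by the platform configuration

        public MariaDbCrawler(String url, Properties creds) {
            this.url = url;
            this.creds = creds;
        }

        // Metadata: list the tables visible in the given schema.
        public List<String> listTables(String schema) throws SQLException {
            List<String> tables = new ArrayList<>();
            try (Connection conn = DriverManager.getConnection(url, creds);
                 ResultSet rs = conn.getMetaData()
                         .getTables(null, schema, "%", new String[] {"TABLE"})) {
                while (rs.next()) {
                    tables.add(rs.getString("TABLE_NAME"));
                }
            }
            return tables;
        }

        // Data: fetch up to `limit` rows from a table for the sample data preview.
        public List<Map<String, Object>> sample(String table, int limit) throws SQLException {
            List<Map<String, Object>> rows = new ArrayList<>();
            String query = "SELECT * FROM " + table + " LIMIT " + limit; // table name assumed trusted here
            try (Connection conn = DriverManager.getConnection(url, creds);
                 Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery(query)) {
                ResultSetMetaData md = rs.getMetaData();
                while (rs.next()) {
                    Map<String, Object> row = new LinkedHashMap<>();
                    for (int i = 1; i <= md.getColumnCount(); i++) {
                        row.put(md.getColumnLabel(i), rs.getObject(i));
                    }
                    rows.add(row);
                }
            }
            return rows;
        }
    }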

 


What's next? Data Lake Adapters