Data Crawler Adapters

Data crawlers crawl data from various types of sources and create a data catalog from what they find. A data crawler fetches both data and metadata, providing wider visibility into and deeper access to the data. Data crawlers are used for the following:

  • Discovery

  • Viewing the schema

  • Previewing sample data

  • Creating a data catalog

  • Creating data lineage

Implementation

The primary function of a crawler is to fetch data and metadata.
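As a reference point, the following minimal sketch shows the kind of result a crawl could produce: schema-level metadata alongside a small data sample. The record types (CrawlResult, TableMetadata, ColumnMetadata) are illustrative assumptions, not part of the platform's actual model.

    // Illustrative sketch only: a possible shape for a crawl result that carries
    // both metadata (for the catalog and lineage) and a small data sample (for preview).
    // These record types are assumptions, not the platform's actual model.
    import java.util.List;
    import java.util.Map;

    public record CrawlResult(
            String sourceName,                                  // e.g. database or bucket name
            List<TableMetadata> tables,                         // metadata discovered by the crawler
            Map<String, List<Map<String, Object>>> samples) {   // preview rows keyed by table name

        public record TableMetadata(String name, List<ColumnMetadata> columns) {}

        public record ColumnMetadata(String name, String type, boolean nullable) {}
    }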

Identifying the technology

The first step is to identify the category of the technology, for example, whether the data source is an RDBMS. The logic and the set of classes and interfaces you implement depend on this category (a dispatch sketch follows the list below). Currently, the Calibo Accelerate platform supports the following categories for data crawling:

  • RDBMS

    • Oracle

    • MySQL

    • PostgreSQL

    • Snowflake

    • MSSQL

  • CSV

  • REST API

  • FTP/SFTP

  • MS Excel

  • AWS S3

  • Parquet
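
For illustration, the following sketch shows one way such category-based selection could be organized: an enum that mirrors the categories above and a factory that returns a matching crawler implementation. The DataCrawler interface and the concrete classes are assumptions made for this example, not the platform's actual classes.

    // Hypothetical sketch: choosing a crawler implementation from the identified category.
    // The enum mirrors the list above; the interface and classes are illustrative only.
    public class CrawlerFactory {

        public enum SourceCategory { RDBMS, CSV, REST_API, FTP_SFTP, MS_EXCEL, AWS_S3, PARQUET }

        public interface DataCrawler {
            void crawl();
        }

        public static DataCrawler forCategory(SourceCategory category) {
            switch (category) {
                case RDBMS:
                    return new RdbmsCrawler();
                case CSV:
                    return new CsvCrawler();
                // Other categories would be handled the same way.
                default:
                    throw new IllegalArgumentException("Unsupported category: " + category);
            }
        }

        // Placeholder implementations.
        static class RdbmsCrawler implements DataCrawler {
            public void crawl() { /* fetch schema and sample data over JDBC */ }
        }

        static class CsvCrawler implements DataCrawler {
            public void crawl() { /* parse headers and sample rows from the file */ }
        }
    }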

In addition to the categories that are currently supported, you can also integrate a new RDBMS type. The section below provides the information required to integrate a new RDBMS, such as MariaDB.
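
As a rough sketch of such an integration, the example below shows a MariaDB crawler that lists tables and fetches a sample data preview over JDBC. The class structure, method names, and the jdbc:mariadb URL are assumptions for illustration; the actual interfaces to implement and the registration steps are defined by the platform.

    // Hypothetical sketch of a new RDBMS crawler for MariaDB. Method names and
    // structure are assumptions; only the JDBC calls are standard.
    import java.sql.*;
    import java.util.*;

    public class MariaDbCrawler {

        private final String url;        // e.g. "jdbc:mariadb://host:3306/db" (placeholder)
        private final Properties creds;  // user/password supplied by the platform configuration

        public MariaDbCrawler(String url, Properties creds) {
            this.url = url;
            this.creds = creds;
        }

        // Metadata: list the tables visible in the given schema.
        public List<String> listTables(String schema) throws SQLException {
            List<String> tables = new ArrayList<>();
            try (Connection conn = DriverManager.getConnection(url, creds);
                 ResultSet rs = conn.getMetaData()
                         .getTables(null, schema, "%", new String[] {"TABLE"})) {
                while (rs.next()) {
                    tables.add(rs.getString("TABLE_NAME"));
                }
            }
            return tables;
        }

        // Data: fetch up to `limit` rows from a table for the sample data preview.
        public List<Map<String, Object>> sample(String table, int limit) throws SQLException {
            List<Map<String, Object>> rows = new ArrayList<>();
            String query = "SELECT * FROM " + table + " LIMIT " + limit; // table name assumed trusted here
            try (Connection conn = DriverManager.getConnection(url, creds);
                 Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery(query)) {
                ResultSetMetaData md = rs.getMetaData();
                while (rs.next()) {
                    Map<String, Object> row = new LinkedHashMap<>();
                    for (int i = 1; i <= md.getColumnCount(); i++) {
                        row.put(md.getColumnLabel(i), rs.getObject(i));
                    }
                    rows.add(row);
                }
            }
            return rows;
        }
    }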

 


What's next? Data Lake Adapters