Data Crawler Adapters
Data crawlers crawl data from various types of sources and create a data catalog from it. A data crawler fetches both data and metadata, providing wider visibility into and deeper access to the data. Data crawlers are used for the following:
- Discovery
- Viewing the schema
- Viewing a sample data preview
- Creating a data catalog (see the sketch after this list)
- Creating data lineage
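The sketch below shows what a crawled catalog entry might hold once data and metadata have been fetched. The class and field names are illustrative assumptions, not the platform's actual model.

```java
import java.util.List;

// Illustrative sketch of a crawled catalog entry.
// Class and field names are assumptions, not the platform's actual model.
public class CatalogEntry {
    private String sourceName;             // e.g. "orders-db"
    private String sourceCategory;         // e.g. "RDBMS", "CSV", "REST API"
    private String schemaName;             // logical grouping within the source
    private String objectName;             // table, file, or endpoint name
    private List<ColumnMetadata> columns;  // schema used for viewing and lineage
    private List<List<Object>> sampleRows; // small preview of the data

    public static class ColumnMetadata {
        private String name;
        private String dataType;
        private boolean nullable;
    }
}
```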
Implementation
The primary function of a crawler is to fetch data and metadata.
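A hypothetical adapter contract that a crawler implementation might satisfy is sketched below. The interface and method names are assumptions for illustration; they are not the platform's API.

```java
import java.util.List;
import java.util.Map;

// Hypothetical adapter contract for a data crawler; the interface and
// method names are assumptions for illustration, not the platform's API.
public interface DataCrawlerAdapter {

    // Establish a connection to the source using source-specific properties.
    void connect(Map<String, String> connectionProperties) throws Exception;

    // Fetch metadata: schemas, tables/objects, and column definitions.
    Map<String, List<String>> fetchMetadata() throws Exception;

    // Fetch a small sample of rows for the preview, limited to `limit` rows.
    List<Map<String, Object>> fetchSampleData(String objectName, int limit) throws Exception;

    // Release any resources held against the source.
    void close() throws Exception;
}
```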
Identifying the technology
The first step is to identify the category of the technology, for example, whether the source data belongs to an RDBMS. The logic, set of classes, and interfaces depend on the category. The Calibo Accelerate platform currently supports the following categories for data crawling (a sketch of how a technology might map to a category follows the list):
- RDBMS
  - Oracle
  - MySQL
  - PostgreSQL
  - Snowflake
  - MSSQL
- CSV
- REST API
- FTP/SFTP
- MS Excel
- AWS S3
- Parquet
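The sketch below illustrates one way a technology name could be resolved to its crawling category. The enum values mirror the list above; the resolver logic and names are assumptions for illustration only.

```java
// Illustrative sketch of mapping a source technology to a crawling category;
// the resolver logic and names are assumptions, not the platform's implementation.
public enum SourceCategory {
    RDBMS, CSV, REST_API, FTP_SFTP, MS_EXCEL, AWS_S3, PARQUET;

    // Map a technology name to its crawling category.
    public static SourceCategory of(String technology) {
        switch (technology.toLowerCase()) {
            case "oracle":
            case "mysql":
            case "postgresql":
            case "snowflake":
            case "mssql":
                return RDBMS;
            case "csv":
                return CSV;
            case "parquet":
                return PARQUET;
            // ... remaining technologies map to their own categories
            default:
                throw new IllegalArgumentException("Unsupported technology: " + technology);
        }
    }
}
```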
Apart from the categories that are currently supported, you can also integrate a new RDBMS type. The section below provides the information required to integrate a new RDBMS, such as MariaDB.
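As a rough sketch of what metadata crawling for a new RDBMS such as MariaDB could look like, the example below uses plain JDBC. The class name and structure are assumptions; only the JDBC calls (DriverManager.getConnection, DatabaseMetaData.getTables, DatabaseMetaData.getColumns) are standard API, and the MariaDB JDBC driver must be on the classpath.

```java
import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.util.ArrayList;
import java.util.List;

// Sketch of metadata crawling for a new RDBMS such as MariaDB using plain JDBC.
// The class name and structure are assumptions; only the JDBC calls are standard API.
public class MariaDbCrawler {

    public List<String> crawlTables(String url, String user, String password) throws Exception {
        List<String> tables = new ArrayList<>();
        // url example: jdbc:mariadb://localhost:3306/sales
        try (Connection connection = DriverManager.getConnection(url, user, password)) {
            DatabaseMetaData metaData = connection.getMetaData();
            try (ResultSet rs = metaData.getTables(null, null, "%", new String[] {"TABLE"})) {
                while (rs.next()) {
                    String tableName = rs.getString("TABLE_NAME");
                    tables.add(tableName);
                    // Column metadata for each table can be read the same way:
                    // metaData.getColumns(null, null, tableName, "%")
                }
            }
        }
        return tables;
    }
}
```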
What's next? Data Lake Adapters