Data Crawler and Data Catalog

The Lazsa Data Pipeline Studio (DPS) lets you crawl data from various types of sources and create a data catalog from it. You can then use the catalog in the data source stage of a data pipeline. A data crawler fetches metadata along with the data. Creating a data catalog from the data crawler provides wider visibility and deeper access to the data. After you create a data crawler and fetch the data, you can filter the data before creating a data catalog.
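To illustrate what "fetching metadata along with the data" can mean, here is a minimal Python sketch. It is not how the Lazsa Platform implements crawling; it assumes an in-memory SQLite database as a stand-in source, and the `crawl` function, the `customers` table, and its columns are hypothetical names used only for illustration.

```python
import sqlite3


def crawl(conn):
    """Collect column metadata and a few sample rows for every table.

    Hypothetical helper for illustration only; not a Lazsa API.
    """
    result = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info returns (cid, name, type, notnull, dflt_value, pk).
        info = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        columns = [(col[1], col[2]) for col in info]
        # Fetch a small sample of the data alongside the metadata.
        rows = conn.execute(f'SELECT * FROM "{table}" LIMIT 5').fetchall()
        result[table] = {"columns": columns, "sample_rows": rows}
    return result


# Stand-in data source (assumption: a single hypothetical table).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Ana", 31), (2, "Ben", 22)],
)

crawled = crawl(conn)
print(crawled["customers"]["columns"])
```

The returned dictionary pairs each table name with its column names, declared types, and sample rows, which is roughly the kind of information a catalog entry would need.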

Data Crawler supports the following types of data sources:

  • Amazon Redshift

  • Amazon S3

  • CSV

  • FTP

  • SFTP

  • REST API

  • MS Excel

  • Microsoft SQL Server

  • MySQL

  • Oracle database

  • PostgreSQL

  • Snowflake

How do I create a data crawler?

In the Lazsa Platform, you can create a data crawler at the product level, in Data Pipeline Studio, or at the platform level, where it can be used across the platform.

To create a data crawler at the platform level

  1. Sign in to the Lazsa Platform and click Configuration in the left navigation pane.

  2. On the Platform Setup screen, click Data Pipeline Studio.

  3. Click + New Crawler.

  4. Provide a name for the crawler and click Next.

  5. Select a data source from the dropdown and click Save and Crawl. On the crawler screen, you can view the data that was fetched by the crawler.

Can I filter the data from the data crawler before I create a data catalog?

After you crawl the data from your data source, you can apply conditions to specific columns of the tables to filter the data for your use case. For example, if you have the sales data for a product and you want to filter customers based on age, you can use a condition to do that.

  1. On the Data Crawlers tab of Data Pipeline Studio, click the data crawler for which you want to filter data.

  2. On the data crawler screen, select a table. In the list of columns, click the pencil icon next to the name of the column that you want to filter.

    Edit column of Data Crawler

  3. On the Column Details side drawer, under Constraint, specify the following:

    1. Condition - Select > (greater than).

    2. Value - Enter 25.

Data Crawler Filter Constraint

  4. Click Save. The condition and value are added. Close the side drawer.

  5. Click the preview icon of the table to view the filtered data. The preview shows data as per the applied condition.

    Note:

    Currently, the preview option is available only for RDBMS data crawlers: MSSQL, MySQL, PostgreSQL, Oracle, and Snowflake.

    Data Crawler Filter Preview
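The effect of the constraint configured in the steps above (condition >, value 25) can be sketched in Python as a simple row filter. This is only an illustration under stated assumptions, not the platform's implementation: the `apply_constraint` function, the `age` column, and the sample rows are hypothetical, and the operators other than `>` are assumed for completeness rather than taken from the product.

```python
import operator

# Map condition symbols to comparison functions.
# Only ">" appears in the steps above; the others are assumptions.
CONDITIONS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "=": operator.eq,
}


def apply_constraint(rows, column, condition, value):
    """Keep only the rows whose column value satisfies the condition."""
    compare = CONDITIONS[condition]
    return [
        row
        for row in rows
        # Skip missing values so the comparison never sees None.
        if row.get(column) is not None and compare(row[column], value)
    ]


# Hypothetical sales rows with a customer age column.
sales = [
    {"customer": "Ana", "age": 31},
    {"customer": "Ben", "age": 22},
    {"customer": "Cy", "age": 25},
]

filtered = apply_constraint(sales, "age", ">", 25)
print(filtered)
```

Because the condition is strictly greater than, a customer aged exactly 25 is excluded from the filtered rows.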

How do I create a data catalog from a data crawler?

  1. On the data crawler screen, in the Add to Data Ingestion Catalog field, provide a name for the catalog and click Add. The catalog that you created is listed on the Data Ingestion Catalogs tab, where you can also see which data crawler is associated with the data catalog.

  2. If the data crawler associated with the data catalog is updated, you can either update the data catalog or create another version of it.

 

What's next? Data Sources