Data Issue Resolver using Unity Catalog

In the Data Issue Resolver stage, you improve data quality through operations such as removing duplicate records, handling missing data and outliers, specifying the partitioning order, normalizing case sensitivity, and performing string operations.
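For context, the following sketch shows in plain PySpark the kinds of resolutions this stage applies. It is illustrative only: the table and column names (catalog.schema.customers, customer_id, email, age, country) are hypothetical placeholders, and the actual job generates equivalent logic from the options you configure in the UI.

    # Illustrative PySpark sketch of typical issue resolutions; all table and
    # column names below are hypothetical placeholders.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.getOrCreate()
    df = spark.table("catalog.schema.customers")  # hypothetical Unity Catalog table

    # Duplicate data: keep a single row per business key.
    df = df.dropDuplicates(["customer_id"])

    # Missing data: fill defaults or drop rows that cannot be repaired.
    df = df.fillna({"country": "UNKNOWN"}).dropna(subset=["email"])

    # Outliers: null out values outside a plausible range.
    df = df.withColumn(
        "age", F.when(F.col("age") > 120, F.lit(None)).otherwise(F.col("age"))
    )

    # Case sensitivity and string operations: trim and normalize case.
    df = df.withColumn("email", F.lower(F.trim(F.col("email"))))

    df.write.mode("overwrite").saveAsTable("catalog.schema.customers_resolved")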

Prerequisites

You must complete the following prerequisites before creating a data issue resolver job:

  • The data quality nodes have specific requirements for the Databricks Runtime version and access mode of the cluster. The following requirements apply when Unity Catalog-enabled Databricks is used as a data issue resolver node in the data pipeline (a cluster provisioning sketch follows this list):

    Data Quality Node      Databricks Cluster Runtime Version    Access Mode
    Data Issue Resolver    14.3 LTS                              Dedicated/Standard
  • Access to a Databricks Unity Catalog node that is used as a data lake in the data ingestion pipeline.
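If you need to provision a cluster that meets these requirements, the following sketch uses the Databricks Python SDK (not part of this product) to create one. The cluster name and node type are placeholders for your workspace; in the SDK, SINGLE_USER corresponds to the Dedicated access mode and USER_ISOLATION to Standard.

    # Sketch: create a Unity Catalog-capable cluster on 14.3 LTS with the
    # Databricks Python SDK. Names and node type are workspace-specific placeholders.
    from databricks.sdk import WorkspaceClient
    from databricks.sdk.service import compute

    w = WorkspaceClient()  # credentials come from the environment or .databrickscfg

    cluster = w.clusters.create(
        cluster_name="dq-issue-resolver",      # placeholder name
        spark_version="14.3.x-scala2.12",      # 14.3 LTS runtime
        node_type_id="i3.xlarge",              # choose a type valid in your cloud
        num_workers=1,
        # Dedicated (formerly single-user) access mode; use USER_ISOLATION for Standard.
        data_security_mode=compute.DataSecurityMode.SINGLE_USER,
        autotermination_minutes=30,
    ).result()                                 # block until the cluster is running

    print(cluster.cluster_id)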

Creating a data issue resolver job

  1. In the data quality stage, add a Data Issue Resolver node and connect it to and from the data lake node.

  2. Click the Data Issue Resolver node and then click Create Job to create an issue resolver job.

    Unity Catalog Data Issue Resolver job

  3. Complete the following steps to create the job:

    DQ Issue Resolver Create Job

  4. With this, the job creation is done. You can run the job in either of the following ways:

    • Click the Data Issue Resolver node and click Start to initiate the job run.

    • Publish the pipeline and then click Run Pipeline.

  5. On completion of the job, click the Issue Resolver Result tab and then click View Resolver Results.

    DQ Issue Resolver Result tab

  6. View the output of the Issue Resolver job. Click the download option to save the Issue Resolver results to a CSV file (a scripted export is sketched after this step).

    DQ Issue Resolver Output
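As an alternative to the UI download, you can export the resolver output yourself if you know the table the job wrote it to. The sketch below assumes a hypothetical output table (catalog.schema.issue_resolver_output) and a Unity Catalog volume path; substitute the names from your own job.

    # Sketch: export resolver results to CSV. The table and volume path are
    # hypothetical placeholders; take the real names from your job's details.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    results = spark.table("catalog.schema.issue_resolver_output")

    # Small result sets: collect to pandas and write a single CSV file.
    results.toPandas().to_csv("/tmp/issue_resolver_results.csv", index=False)

    # Larger result sets: let Spark write partitioned CSV files to a UC volume.
    results.write.mode("overwrite").option("header", True).csv(
        "/Volumes/catalog/schema/exports/issue_resolver_results"
    )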

What's next? Snowflake Custom Transformation Job