Data Lake Adapters

A data lake is a centralized repository designed to store, process, and secure large volumes of structured, semistructured, and unstructured data. Scalability is a key feature: organizations can store and manage massive, diverse datasets in one place. Data lakes often use distributed computing frameworks to process and analyze data in parallel, which speeds up analytics. Typical use cases include big data analytics, machine learning, and data exploration.

Currently, the Lazsa Platform supports Amazon S3 and Snowflake as data lakes. You can extend this functionality to support additional data lakes.

APIs and Interfaces

Use the following APIs to extend data lake functionality as required.

getRowsFromFile ()
  • This method takes a RowsRequestBean as a parameter.

  • Reads the records from the file.

  • Returns a RowsResponseBean containing the result dataset.
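
The request/response flow described above can be sketched as follows. This is a minimal illustration, not the platform's implementation: the bean fields (filePath, maxRows, rows), the InMemoryLakeAdapter class, and the in-memory map that stands in for a data lake are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical request bean: the real RowsRequestBean fields are not documented here.
class RowsRequestBean {
    final String filePath;
    final int maxRows;

    RowsRequestBean(String filePath, int maxRows) {
        this.filePath = filePath;
        this.maxRows = maxRows;
    }
}

// Hypothetical response bean holding the result dataset as rows of string cells.
class RowsResponseBean {
    final List<List<String>> rows;

    RowsResponseBean(List<List<String>> rows) {
        this.rows = rows;
    }
}

// Toy adapter backed by an in-memory map (path -> CSV text) instead of a real data lake.
class InMemoryLakeAdapter {
    private final Map<String, String> files;

    InMemoryLakeAdapter(Map<String, String> files) {
        this.files = files;
    }

    // Reads up to maxRows non-empty lines of the file and splits each line on commas.
    RowsResponseBean getRowsFromFile(RowsRequestBean request) {
        String content = files.get(request.filePath);
        if (content == null) {
            throw new IllegalArgumentException("No such file: " + request.filePath);
        }
        List<List<String>> rows = new ArrayList<>();
        for (String line : content.split("\n")) {
            if (rows.size() >= request.maxRows) {
                break;
            }
            if (line.isEmpty()) {
                continue;
            }
            rows.add(List.of(line.split(",")));
        }
        return new RowsResponseBean(rows);
    }
}
```

A real adapter would replace the map lookup with a call to the storage service and would likely need a proper CSV parser to handle quoted fields.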

GetRowsFromAllPathFiles ()
  • This method takes a list of RowsRequestBean objects as a parameter.

  • Reads the data from every file path in the data lake.

  • Returns a list of RowsResponseBean objects.

GetRowsFromMultisheetFile ()
  • This method takes RowsRequestBeans as a request parameter.

  • Reads all rows from multisheet files such as Excel and Parquet.

  • Returns a RowsResponseBean as the response.

GetFileSchema ()
  • This method takes a RowsRequestBean as a request parameter.

  • Reads the schema of files such as CSV, Parquet, Excel, and JSON.

  • Returns a list of FileSchemaBean objects.
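
As a rough illustration of schema reading for the CSV case only, the sketch below infers one schema entry per column from the header line and the first data line. The FileSchemaBean fields (columnName, dataType), the CsvSchemaReader class, and the type names are assumptions for the example; the platform's actual bean and inference rules may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical schema bean: one entry per column. Field names are assumptions.
class FileSchemaBean {
    final String columnName;
    final String dataType;

    FileSchemaBean(String columnName, String dataType) {
        this.columnName = columnName;
        this.dataType = dataType;
    }
}

class CsvSchemaReader {
    // Infers a schema from the header line and the first data line of CSV text.
    static List<FileSchemaBean> getFileSchema(String csvContent) {
        String[] lines = csvContent.split("\n");
        String[] header = lines[0].split(",");
        String[] sample = lines.length > 1 ? lines[1].split(",") : new String[0];
        List<FileSchemaBean> schema = new ArrayList<>();
        for (int i = 0; i < header.length; i++) {
            // Columns with no sample value fall back to "string".
            String value = i < sample.length ? sample[i].trim() : "";
            schema.add(new FileSchemaBean(header[i].trim(), inferType(value)));
        }
        return schema;
    }

    // Very small type guesser: integer, decimal, or string.
    private static String inferType(String value) {
        if (value.matches("-?\\d+")) {
            return "long";
        }
        if (value.matches("-?\\d+\\.\\d+")) {
            return "double";
        }
        return "string";
    }
}
```

Self-describing formats such as Parquet carry their schema in the file metadata, so an adapter for those formats would read the schema directly rather than infer it from sample values.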

GetFileAndFolderList ()
  • This method takes a ListRequestBean and a fileType as parameters.

  • Lists all the files and folders in the data lake that match the selected file type.

  • Returns a ListResponseBean containing the resulting file and folder records.
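
The listing behavior can be sketched as below, assuming an object-store style layout in which folder keys end with "/". The bean fields (folderPath, files, folders) and the LakeLister class are hypothetical, and this toy version lists everything under the requested prefix rather than only direct children.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical request bean: the folder to list. The real fields may differ.
class ListRequestBean {
    final String folderPath;

    ListRequestBean(String folderPath) {
        this.folderPath = folderPath;
    }
}

// Hypothetical response bean with separate file and folder result sets.
class ListResponseBean {
    final List<String> files = new ArrayList<>();
    final List<String> folders = new ArrayList<>();
}

class LakeLister {
    // All object keys in the toy lake; folder keys end with "/".
    private final List<String> keys;

    LakeLister(List<String> keys) {
        this.keys = keys;
    }

    // Returns folders under the requested path plus files whose extension matches fileType.
    ListResponseBean getFileAndFolderList(ListRequestBean request, String fileType) {
        ListResponseBean response = new ListResponseBean();
        String suffix = "." + fileType.toLowerCase();
        for (String key : keys) {
            // Skip keys outside the requested folder, and the folder entry itself.
            if (!key.startsWith(request.folderPath) || key.equals(request.folderPath)) {
                continue;
            }
            if (key.endsWith("/")) {
                response.folders.add(key);
            } else if (key.toLowerCase().endsWith(suffix)) {
                response.files.add(key);
            }
        }
        return response;
    }
}
```

For example, listing "raw/" with fileType "csv" over the keys "raw/", "raw/a.csv", "raw/b.parquet", and "raw/2024/" yields the file "raw/a.csv" and the folder "raw/2024/".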

GetDataQualiltyOutput ()
  • This method takes a RowsRequestBean as a request parameter.

  • Reads the data quality output from the first file only, because the output can contain any number of records.

  • Returns a RowsResponseBean containing a minimal set of data quality output records.

GetFolderList ()
  • This method takes a ListRequestBean as a parameter.

  • Lists all the folders in the data lake.

  • Returns a ListResponseBean containing the resulting folder records.

GetDeltaLogVersionList ()
  • This method takes a ListRequestBean and a fileType as parameters.

  • Lists all the delta log versions.

  • Returns a list of DeltaLogVersion objects.
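
If the delta log here follows the Delta Lake transaction log layout (an assumption, since the page does not say), each commit is a JSON file in the _delta_log folder named with a zero-padded 20-digit version number. The sketch below extracts and sorts version numbers from such file names. The DeltaLogVersions class is hypothetical, and it returns plain Long values rather than the platform's DeltaLogVersion type.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DeltaLogVersions {
    // Commit files look like 00000000000000000003.json; checkpoint and other files are ignored.
    private static final Pattern COMMIT_FILE = Pattern.compile("(\\d{20})\\.json$");

    // Returns the commit versions found in the given log file names, in ascending order.
    static List<Long> listVersions(List<String> logFileNames) {
        List<Long> versions = new ArrayList<>();
        for (String name : logFileNames) {
            Matcher matcher = COMMIT_FILE.matcher(name);
            if (matcher.find()) {
                versions.add(Long.parseLong(matcher.group(1)));
            }
        }
        Collections.sort(versions);
        return versions;
    }
}
```

An adapter would typically obtain the file names by listing the table's _delta_log folder and then wrap each version in whatever bean the caller expects.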

GetDeltaLogVersionDetails ()
  • This method takes a ListRequestBean as a parameter.

  • Retrieves the details of a delta log version.

  • Returns a DeltaLogVersionDetails bean as the response.

GetDeltaLogSchemaDetails ()
  • This method takes a DeltaSchemaRequestBean as a parameter.

  • Retrieves the delta schema.

  • Returns a DeltaSchemaResponseBean.


What's next? Data Quality Adapters