Data Lake Adapters

A data lake is a centralized repository that stores, processes, and secures large volumes of structured, semi-structured, and unstructured data. Scalability is a key feature: organizations can store and manage massive volumes of diverse data in one place. Data lakes often use distributed computing frameworks to process and analyze data in parallel, which speeds up insights and analytics. Typical use cases include big data analytics, machine learning, and data exploration.

Currently, the Calibo Accelerate platform supports Amazon S3 and Snowflake data lakes. You can extend the functionality to support additional data lakes.

APIs and Interfaces

You can use the following APIs to extend data lake functionality as required.

Method	Description
getRowsFromFile()	Takes RowsRequestBean as a parameter. Reads the records from the file. Returns RowsResponseBean with the result dataset.
GetRowsFromAllPathFiles()	Takes a list of RowsRequestBean as a parameter. Reads the data from all the file paths in the data lake. Returns a list of RowsResponseBean.
GetRowsFromMultisheetFile()	Takes RowsRequestBeans as a request parameter. Reads all the multisheet rows from files such as Excel, Parquet, and so on. Returns RowsResponseBean as a response.
GetFileSchema()	Takes RowsRequestBean as a request parameter. Reads the schema of files such as CSV, Parquet, Excel, and JSON. Returns a list of FileSchemaBean.
GetFileAndFolderList()	Takes ListRequestBean and fileType as parameters. Lists all the files and folders in the data lake for the selected file type. Returns ListResponseBean with the file and folder records.
GetDataQualiltyOutput()	Takes RowsRequestBean as a request parameter. Reads the data quality output from only the first file, because the output can contain any number of records. Returns RowsResponseBean with a minimal set of data quality output records.
GetFolderList()	Takes ListRequestBean as a parameter. Lists all the folders in the data lake. Returns ListResponseBean with the resulting records.
GetDeltaLogVersionList()	Takes ListRequestBean and fileType as parameters. Lists all the delta log versions. Returns a list of DeltaLogVersion.
GetDeltaLogVersionDetails()	Takes ListRequestBean as a parameter. Retrieves the delta log version details. Returns DeltaLogVersionDetails as a response.
GetDeltaLogSchemaDetails()	Takes DeltaSchemaRequestBean as a parameter. Retrieves the delta schema. Returns DeltaSchemaResponseBean.
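
To illustrate how an adapter for an additional data lake might implement the methods listed above, the following is a minimal Java sketch covering two of them. It is not the platform's actual interface. The DataLakeAdapter interface name, the lower camel case method names, the bean fields (filePath, maxRows, rows), and the in-memory storage are all assumptions made for illustration. Only the method names and the bean type names come from the table above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for the platform beans. The real classes and
// their fields are not described in this document.
class RowsRequestBean {
    final String filePath; // assumed field: path of the file to read
    final int maxRows;     // assumed field: maximum number of rows to return

    RowsRequestBean(String filePath, int maxRows) {
        this.filePath = filePath;
        this.maxRows = maxRows;
    }
}

class RowsResponseBean {
    final List<String> rows; // assumed field: one string per record

    RowsResponseBean(List<String> rows) {
        this.rows = rows;
    }
}

// Assumed shape of an adapter contract, limited to two methods from the table.
interface DataLakeAdapter {
    RowsResponseBean getRowsFromFile(RowsRequestBean request);

    List<RowsResponseBean> getRowsFromAllPathFiles(List<RowsRequestBean> requests);
}

// Toy adapter backed by an in-memory map instead of a real data lake.
class InMemoryDataLakeAdapter implements DataLakeAdapter {
    private final Map<String, List<String>> files = new LinkedHashMap<>();

    // Helper for the sketch only: registers a file and its records.
    void putFile(String path, List<String> rows) {
        files.put(path, new ArrayList<>(rows));
    }

    @Override
    public RowsResponseBean getRowsFromFile(RowsRequestBean request) {
        // An unknown path yields an empty result rather than an error.
        List<String> all = files.getOrDefault(request.filePath, new ArrayList<>());
        int limit = Math.max(0, Math.min(request.maxRows, all.size()));
        return new RowsResponseBean(new ArrayList<>(all.subList(0, limit)));
    }

    @Override
    public List<RowsResponseBean> getRowsFromAllPathFiles(List<RowsRequestBean> requests) {
        // One response per request, in the same order as the requests.
        List<RowsResponseBean> responses = new ArrayList<>();
        for (RowsRequestBean request : requests) {
            responses.add(getRowsFromFile(request));
        }
        return responses;
    }
}

class AdapterDemo {
    public static void main(String[] args) {
        InMemoryDataLakeAdapter adapter = new InMemoryDataLakeAdapter();
        adapter.putFile("sales/2024.csv", List.of("id,amount", "1,10", "2,20"));
        RowsResponseBean response =
                adapter.getRowsFromFile(new RowsRequestBean("sales/2024.csv", 2));
        System.out.println(response.rows);
    }
}
```

A real adapter would replace the in-memory map with calls to the target data lake's client library and would implement the remaining methods (schema, listing, data quality, and delta log operations) in the same request-bean-in, response-bean-out style.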