Configure AWS Glue Data Catalog Connection Details
AWS Glue Data Catalog is a repository that stores the metadata of the assets of your data source. It organizes the metadata into databases and tables, providing a logical structure for managing the metadata. This structure also provides data access control at both the table level and database level by using AWS IAM policies.
Lazsa leverages AWS Glue Data Catalog to maintain comprehensive metadata for all data assets within your pipeline. This metadata includes essential information such as column names, data types, and partitioning keys, enabling efficient data searching, querying, and auditing.
After you save your AWS Glue Data Catalog connection details in the Lazsa Platform, you can start using Glue Data Catalog as a centralized metadata store in your data integration and data transformation jobs.
To configure the connection details of AWS Glue Data Catalog, do the following:
- Sign in to the Lazsa Platform and click Configuration in the left navigation pane.
- On the Platform Setup screen, on the Cloud Platform, Tools & Technologies tile, click Configure.
- On the Cloud Platform, Tools & Technologies screen, in the Databases and Data Warehouses section, click Configure.
  (After you save your first connection details in this section, you see the Modify button here.)
- On the Databases and Data Warehouses screen, click the AWS Glue Data Catalog logo.
- On the Glue Catalog screen, do the following:
In the Details section, provide the following information:
Name: Give a unique name to your AWS Glue Data Catalog configuration. This name is used to save and identify your AWS Glue Data Catalog connection details in the Lazsa Platform.
Description: Provide a brief description that helps you identify the purpose or context of this AWS Glue Data Catalog configuration.
In the Configuration section, depending on how you want the Lazsa Platform to retrieve and use the credentials to connect to your AWS Glue Data Catalog, do one of the following:
Connect using Lazsa Orchestrator Agent
Enable Lazsa Orchestrator Agent
Turn on this toggle to use Lazsa Orchestrator Agent to programmatically resolve AWS Glue Data Catalog secrets stored in your secrets management tool within your private network and to connect to Glue Data Catalog. These secrets include the external ID and the ARN of the IAM role that allows the Lazsa Platform to access your Glue Data Catalog.
Select the Orchestrator Agent
In the Lazsa Orchestrator Agent dropdown list, all your configured agents deployed in EKS clusters are displayed. Select the one you want to use to connect to Glue Data Catalog.
The AWS Secrets Manager that the selected Orchestrator Agent is authorized to access for retrieving secrets is auto-selected. Specify the following details:
Secret Name: Provide the name of the secret in AWS Secrets Manager where you store your AWS Glue Data Catalog secrets.
AWS Account ID: The unique identifier of the AWS account in which the selected Orchestrator Agent is installed is auto-populated.
Region: Select the AWS region where the selected Orchestrator Agent is installed.
Creating an IAM Role to Access Glue Data Catalog
You must create an IAM role to allow the Lazsa Platform to access Glue Data Catalog from your AWS account. You can create this IAM role manually or use the CloudFormation template (CFT) provided by Calibo on this screen.
In the role's trust relationship, you must add a Condition element and provide an external ID, so that a request to assume the role is accepted only from the authorized entity that supplies the correct external ID.
If you manually create the IAM role, copy the following permissions policy and attach it to the role:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Condition": {
        "StringEquals": {
          "aws:RequestedRegion": "<Region where Lazsa Platform is installed>"
        }
      },
      "Action": [
        "glue:GetDatabase",
        "glue:CreateDatabase",
        "glue:GetDatabases",
        "glue:CreateTable",
        "glue:GetTables",
        "glue:GetTable"
      ],
      "Resource": "*",
      "Effect": "Allow"
    }
  ]
}
To create the IAM role by using the Calibo-provided CFT, download the CFT and upload it when creating a stack in CloudFormation. The external ID is already included in the template. In this case, the IAM policy is attached automatically to the role.
Note:
The Download CFT button is enabled only after you specify the AWS region where the selected Orchestrator Agent is installed.
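As an illustration of the trust relationship mentioned above, a trust policy with an external ID condition typically looks like the following sketch. The sts:ExternalId condition key is the standard AWS key for this purpose. The principal ARN and the placeholder values are assumptions for illustration only: use the trusted account or role that Calibo specifies for your Lazsa installation, and the external ID that applies to your configuration.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::<trusted-account-id>:root"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "sts:ExternalId": "<your-external-id>"
        }
      }
    }
  ]
}
```

If you use the Calibo-provided CFT instead, you do not need to write this policy yourself.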
After your IAM role is created, store the external ID and the Glue IAM role ARN in the AWS Secrets Manager that you selected earlier. Then provide the following values:
External ID Key: Provide the key for the external ID stored in AWS Secrets Manager to enable the Lazsa Orchestrator Agent to resolve secrets and access your AWS Glue Data Catalog.
Glue IAM Role Key: Provide the key for the IAM Role ARN stored in AWS Secrets Manager.
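As a rough sketch of how these keys relate to the stored secret, assume the secret is saved in AWS Secrets Manager as key/value pairs. The key names glue-external-id and glue-role-arn below are hypothetical, and the values are placeholders; any key names work as long as they match what you enter on this screen.

```json
{
  "glue-external-id": "<your-external-id>",
  "glue-role-arn": "arn:aws:iam::<your-account-id>:role/<your-glue-role-name>"
}
```

With a secret laid out this way, you would enter glue-external-id as the External ID Key and glue-role-arn as the Glue IAM Role Key.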
Select Secret Manager
- Lazsa
With this option, your AWS Glue Data Catalog connection secrets are securely stored in the Lazsa-managed secrets store.
Select Lazsa and do the following:
AWS Account: From the list of your AWS accounts configured in the Lazsa Platform, select the account from which you want to access Glue Data Catalog.
AWS Account ID: The account ID of the selected AWS account is auto-populated.
Region: The region of your AWS account is auto-populated.
External ID: The external ID generated by Calibo is auto-populated.
Cross Account Role ARN: To enter the role ARN in this field, you must first create an IAM role to allow the Lazsa Platform to access Glue Data Catalog from your AWS account. You can create this IAM role manually or use the CloudFormation template (CFT) provided by Calibo on this screen.
In the role's trust relationship, you must add a Condition element and provide an external ID to validate that the request is from the authorized entity with the correct external ID.
If you manually create the IAM role, copy the following permissions policy and attach it to the role:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Condition": {
        "StringEquals": {
          "aws:RequestedRegion": "<Region where Lazsa Platform is installed>"
        }
      },
      "Action": [
        "glue:GetDatabase",
        "glue:CreateDatabase",
        "glue:GetDatabases",
        "glue:CreateTable",
        "glue:GetTables",
        "glue:GetTable"
      ],
      "Resource": "*",
      "Effect": "Allow"
    }
  ]
}
To create the IAM role by using the Calibo-provided CFT, download the CFT and upload it when creating a stack in CloudFormation. The external ID is already included in the template. In this case, the IAM policy is attached automatically to the role.
After your IAM role is created, provide the role ARN.
- AWS Secrets Manager
Select AWS Secrets Manager and do the following:
Secrets Management Tool: In this dropdown list, the AWS Secrets Manager configurations that you save and activate in the Secret Management section on the Cloud Platform, Tools & Technologies screen are listed for selection. Select the AWS Secrets Manager where you have stored your Glue Data Catalog secrets.
Secret Name: Provide the name of the secret so that the Lazsa Platform can retrieve the secrets.
AWS Account: From the list of your AWS accounts configured in the Lazsa Platform, select the account from which you want to access Glue Data Catalog.
AWS Account ID: The account ID of the selected AWS account is auto-populated.
Region: The region of the selected AWS account is auto-populated.
Creating an IAM Role to Access Glue Data Catalog
You must create an IAM role to allow the Lazsa Platform to access Glue Data Catalog from your AWS account. You can create this IAM role manually or use the CloudFormation template (CFT) provided by Calibo on this screen.
In the role's trust relationship, you must add a Condition element and provide an external ID to validate that the request is from the authorized entity with the correct external ID.
If you manually create the IAM role, copy the following permissions policy and attach it to the role:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Condition": {
        "StringEquals": {
          "aws:RequestedRegion": "<Region where Lazsa Platform is installed>"
        }
      },
      "Action": [
        "glue:GetDatabase",
        "glue:CreateDatabase",
        "glue:GetDatabases",
        "glue:CreateTable",
        "glue:GetTables",
        "glue:GetTable"
      ],
      "Resource": "*",
      "Effect": "Allow"
    }
  ]
}
To create the IAM role by using the Calibo-provided CFT, download the CFT and upload it when creating a stack in CloudFormation. The external ID is already included in the template. In this case, the IAM policy is attached automatically to the role.
After your IAM role is created, store the external ID and the IAM role ARN in the AWS Secrets Manager that you selected earlier. Then provide the following values:
External ID Key: Provide the key for the external ID stored in AWS Secrets Manager to enable the Lazsa Platform to resolve secrets and access your AWS Glue Data Catalog.
Glue IAM Role Key: Provide the key for the IAM Role ARN stored in AWS Secrets Manager.
- Azure Key Vault
Select Azure Key Vault and do the following.
In the Vault Configuration dropdown list, the Azure Key Vault configurations that you save and activate in the Secret Management section on the Cloud Platform, Tools & Technologies screen are listed for selection. Select the Azure Key Vault where you have stored the secrets for Glue Data Catalog.
Vault Name: Provide the vault name so that the Lazsa Platform can retrieve your Glue Data Catalog secrets.
AWS Account: From the list of your AWS accounts configured in the Lazsa Platform, select the account from which you want to access Glue Data Catalog.
AWS Account ID: The account ID of the selected AWS account is auto-populated.
Region: The region of the selected AWS account is auto-populated.
Creating an IAM Role to Access Glue Data Catalog
You must create an IAM role to allow the Lazsa Platform to access Glue Data Catalog from your AWS account. You can create this IAM role manually or use the CloudFormation template (CFT) provided by Calibo on this screen.
In the role's trust relationship, you must add a Condition element and provide an external ID to validate that the request is from the authorized entity with the correct external ID.
If you manually create the IAM role, copy the following permissions policy and attach it to the role:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Condition": {
        "StringEquals": {
          "aws:RequestedRegion": "<Region where Lazsa Platform is installed>"
        }
      },
      "Action": [
        "glue:GetDatabase",
        "glue:CreateDatabase",
        "glue:GetDatabases",
        "glue:CreateTable",
        "glue:GetTables",
        "glue:GetTable"
      ],
      "Resource": "*",
      "Effect": "Allow"
    }
  ]
}
To create the IAM role by using the Calibo-provided CFT, download the CFT and upload it when creating a stack in CloudFormation. The external ID is already included in the template. In this case, the IAM policy is attached automatically to the role.
After your IAM role is created, store the external ID and the IAM role ARN in the Azure Key Vault that you selected earlier. Then provide the following values:
External ID Secret: Provide the secret name where you have stored the external ID in Azure Key Vault to enable the Lazsa Platform to resolve secrets and access your AWS Glue Data Catalog.
Glue IAM Role Secret: Provide the secret name where you have stored the IAM Role ARN in Azure Key Vault.
Click Test Connection to verify that the connection details are correct and that you can connect to AWS Glue Data Catalog successfully.
Database
After a successful test connection, a list of existing databases within the Glue Data Catalog is displayed, each associated with an Amazon S3 location where metadata is stored. Select an existing database to store metadata in your data integration and data transformation jobs.
You can also create a new Glue Data Catalog database from within the Lazsa Platform. Click the icon for creating a database, and then in the Create New Database side drawer, provide a unique name for your database, an optional description, and the S3 URI where metadata will be stored.
Secure configuration details with a password
To password-protect your AWS Glue Data Catalog connection details, turn on this toggle, enter a password, and then retype it to confirm. This is optional but recommended. When you share the connection details with multiple users, password protection helps ensure that only authorized users can access them.
Click Save Configuration. You can now see the configuration listed on the Databases and Data Warehouses screen.
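For orientation, the three values entered in the Create New Database side drawer correspond to fields of the DatabaseInput structure in the AWS Glue CreateDatabase API (Name, Description, and LocationUri). The sketch below shows what an equivalent input might look like if the database were created directly in AWS Glue. The database name, description, and bucket path are placeholders, and how the Lazsa Platform builds this request internally is an assumption.

```json
{
  "Name": "<your_database_name>",
  "Description": "<optional description>",
  "LocationUri": "s3://<your-bucket>/<your-prefix>/"
}
```

Creating a database this way relies on the glue:CreateDatabase action, which is part of the permissions policy attached to the IAM role.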
Once you save your AWS Glue Data Catalog connection details successfully, you can start using Glue Data Catalog as a metastore in your data integration and data transformation jobs within your data pipeline.
To understand how you can use AWS Glue Data Catalog as a metastore with Databricks and S3 located across different AWS accounts, see Using Glue Data Catalog as Metastore for Databricks in Cross-Account Setup.
What's next? Configure Technologies and Testing Tools