Data Ingestion from Salesforce to Snowflake using Amazon AppFlow
Amazon AppFlow is a data integration service that lets you ingest data from SaaS applications like Salesforce or ServiceNow into a Snowflake data lake.
Prerequisites
To use Amazon AppFlow for data integration, you must complete the following prerequisites:
- Configure an Amazon AppFlow instance in the Data Integration section of Cloud Platform Tools & Technologies.
- Configure Salesforce as a data source in the Databases and Data Warehouses section of Cloud Platform Tools & Technologies.
- Configure Snowflake as a data lake in the Databases and Data Warehouses section of Cloud Platform Tools & Technologies.
- Create an AppFlow to Snowflake connector in the Databases and Data Warehouses section of Cloud Platform Tools & Technologies. This connector is used to connect to the Snowflake data lake (see the sketch after this list for what an equivalent connector profile looks like).
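The Lazsa Platform creates and manages this connector for you through its UI. Purely as a reference for what gets configured on your behalf, the following is a minimal sketch of an equivalent Snowflake connector profile created directly with the AWS SDK for Python (boto3); the profile name, warehouse, stage, bucket, and credentials are placeholder assumptions, not values used by the platform.

```python
import boto3

appflow = boto3.client("appflow", region_name="us-east-1")

appflow.create_connector_profile(
    connectorProfileName="snowflake-datalake-connector",  # assumed name
    connectorType="Snowflake",
    connectionMode="Public",
    connectorProfileConfig={
        "connectorProfileProperties": {
            "Snowflake": {
                "warehouse": "ANALYTICS_WH",               # assumed warehouse
                "stage": "SALES_DB.PUBLIC.APPFLOW_STAGE",  # external stage backed by the S3 bucket below
                "bucketName": "my-appflow-staging-bucket", # assumed intermediate S3 bucket
            }
        },
        "connectorProfileCredentials": {
            "Snowflake": {
                "username": "APPFLOW_USER",                # assumed Snowflake user
                "password": "<retrieve from a secrets manager>",
            }
        },
    },
)
```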
To create a data integration job using Amazon AppFlow
- Click the Amazon AppFlow node in the data integration stage of the pipeline, and then click Create Job.
- Complete the following steps to create the job:
Job Name
- Job Name - Provide an appropriate name for the data integration job.
- Node Rerun Attempts - The number of times a pipeline rerun is attempted on this node in case of failure. You can select 1, 2, or 3. If you do not set the rerun attempts for this node, the default setting configured at the pipeline level is used.
Source
- Source - The source is auto-populated based on the data source node that you have added to the pipeline.
- Datastore - The datastore is auto-populated based on the datastore that you have configured for the data source node.
- Objects - The selected objects of the Salesforce instance are displayed.
Click Next.
Target
- Target - The target node that is added to the pipeline is auto-populated.
- Datastore - The datastore that you have configured for the target node is auto-populated.
- Target Connector - The connector that you created for the selected target node. See Creating an Amazon AppFlow connector. While creating the connector, ensure that you select the same Snowflake datastore that is selected for the target node.
- Target Table - Select an existing table from the dropdown list or create a new table.
To add a new table, click New Table and provide the following information:
- Database - Select the database in which you want to create the new table.
- Schema - Select the schema in which you want to create the new table.
- Table Name - Provide a name for the new table. To follow the Snowflake recommended naming convention, see Identifier Requirements (a sketch of an equivalent table definition follows this step).
- You can view the columns of the target table with their data types. This list is populated based on the selected source. You can delete columns from the target table based on your requirement.
- Add Custom Columns - Apart from the default columns in the target table, you can also add custom columns. To add a custom column, do the following:
  - Column Name - Provide a name for the custom column.
  - Type - Choose the column type from the dropdown list.
  - Click Add. You can view the added custom columns.
- Click Add to add the new table.
Click Next.
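If you prefer to create the target table directly in Snowflake rather than through the New Table option, the following is a minimal sketch using the snowflake-connector-python package. The account, credentials, database, schema, table, and column names are illustrative assumptions; the unquoted identifiers follow Snowflake's identifier requirements (letters, digits, and underscores, resolved case-insensitively).

```python
import snowflake.connector

# Connection details are placeholders; use the Snowflake datastore configured
# for the target node in the Lazsa Platform.
conn = snowflake.connector.connect(
    account="xy12345.us-east-1",
    user="APPFLOW_USER",
    password="<retrieve from a secrets manager>",
    warehouse="ANALYTICS_WH",
    database="SALES_DB",
    schema="PUBLIC",
)

conn.cursor().execute(
    """
    CREATE TABLE IF NOT EXISTS SALESFORCE_ACCOUNTS (
        ID            VARCHAR,
        NAME          VARCHAR,
        ANNUALREVENUE NUMBER(18, 2),
        CREATEDDATE   TIMESTAMP_NTZ,
        INGESTED_AT   TIMESTAMP_NTZ DEFAULT CURRENT_TIMESTAMP()  -- example custom column
    )
    """
)
conn.close()
```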
Field Mapping
In this step, you map the source fields to the columns in the target table.
- Source Table - Select the source table for which you want to do the field mapping.
- Source Field - Select the field from the dropdown list that you want to map to a target field.
- Destination Field - Select the field from the dropdown list to which you want to map the source field.
- Click Map Fields.
- Modify Values - You can mask or truncate the values in the mapped fields in the following ways:
  - Select Mask from the dropdown options. Under Action, select All, First #, or Last #. If you select First # or Last #, provide the number of characters to be masked.
  - Select Truncate from the dropdown options. Specify the number of characters to be retained; the remaining text is truncated.
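Under the hood, mapping, masking, and truncation correspond to Amazon AppFlow tasks. The following is a minimal sketch of what equivalent task definitions might look like in the AppFlow API; the Salesforce field names, target column names, and lengths are hypothetical, and the Lazsa UI builds the actual configuration for you.

```python
# Hypothetical task list for a flow definition (passed as the "tasks" argument
# to the AppFlow CreateFlow API).
tasks = [
    # Project only the source fields that are used in the mapping.
    {
        "taskType": "Filter",
        "sourceFields": ["Name", "Phone", "Description"],
        "connectorOperator": {"Salesforce": "PROJECTION"},
        "taskProperties": {},
    },
    # Map a Salesforce field to a target column.
    {
        "taskType": "Map",
        "sourceFields": ["Name"],
        "destinationField": "NAME",
        "connectorOperator": {"Salesforce": "NO_OP"},
        "taskProperties": {"DESTINATION_DATA_TYPE": "VARCHAR"},
    },
    # Mask the last 4 characters of the phone number (Action: Last #).
    {
        "taskType": "Mask",
        "sourceFields": ["Phone"],
        "connectorOperator": {"Salesforce": "MASK_LAST_N"},
        "taskProperties": {"MASK_LENGTH": "4", "MASK_VALUE": "*"},
    },
    # Keep only the first 50 characters of the description (Truncate).
    {
        "taskType": "Truncate",
        "sourceFields": ["Description"],
        "connectorOperator": {"Salesforce": "NO_OP"},
        "taskProperties": {"TRUNCATE_LENGTH": "50"},
    },
]
```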
Filter/Validation (Optional)
In this step, you can apply filters or validations to certain columns while ingesting the data.
- Source Table - Select the source table for which you want to apply filters or validate the values in the columns.
- Validations - Provide the following information to run validations:
  - Field Name - Select the column on which you want to run the validation.
  - Condition - Choose a condition from the following options:
    - Values are negative
    - Values are missing or null
    - Value is zero
    - Values contain text
  - Action - Choose one of the following actions:
    - Terminate Flow - The job execution is terminated.
    - Ignore Record - The record that does not meet the specified condition is ignored.
- Filters - Use this option to filter the data that is ingested into the data lake. Provide the following information:
  - Field Name - Select the field from the dropdown list on which you want to run the filter condition.
  - Condition - Select a condition to filter the data, based on the selected field. See the examples below and the sketch that follows them.
Examples:
- Field: Date
  Conditions: is same, is before, is same or before, is after, is same or after, is between
  Criteria: Specify the date.
- Field: String
  Conditions: contains, is exactly, is not equal to
  Criteria: Specify the text.
- Field: Fields that contain decimal values, such as rating or years of experience (type: double)
  Conditions: is not equal to (!=), is equal to (=), is less than (<), is less than or equal to (<=), is greater than (>), is greater than or equal to (>=)
  Criteria: Specify the value.
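As with field mapping, validations and filters translate into AppFlow tasks. The sketch below shows one possible shape of a validation and a date filter; the field names, the epoch-millisecond date format, and the VALIDATION_ACTION value are assumptions, so treat it as illustrative rather than a definitive definition.

```python
# Hypothetical validation and filter tasks for a flow definition.
tasks = [
    # Validation: handle records where AnnualRevenue is missing or null.
    {
        "taskType": "Validate",
        "sourceFields": ["AnnualRevenue"],
        "connectorOperator": {"Salesforce": "VALIDATE_NON_NULL"},
        # Assumed value for the "Ignore Record" action; check the AppFlow Task
        # reference for the exact accepted strings.
        "taskProperties": {"VALIDATION_ACTION": "DropRecord"},
    },
    # Filter: ingest only records created on or after 2023-01-01.
    {
        "taskType": "Filter",
        "sourceFields": ["CreatedDate"],
        "connectorOperator": {"Salesforce": "GREATER_THAN_OR_EQUAL_TO"},
        "taskProperties": {
            "DATA_TYPE": "datetime",
            "VALUE": "1672531200000",  # assumed epoch-millisecond representation
        },
    },
]
```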
Tags (Optional)
In this step, you can add tags as key-value pairs to Amazon AppFlow. Tags help you organize and track your resources.
To create a tag, do the following:
- Key - Provide a key for the tag; this is similar to a name.
- Value (Optional) - Provide a value for the tag.
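For reference, tags on an AppFlow flow are plain key-value string pairs. A minimal boto3 sketch of tagging an existing flow is shown below; the flow ARN and tag names are placeholders.

```python
import boto3

appflow = boto3.client("appflow")

# The ARN and tag values below are placeholders.
appflow.tag_resource(
    resourceArn="arn:aws:appflow:us-east-1:123456789012:flow/salesforce-to-snowflake",
    tags={"cost-center": "data-platform", "environment": "dev"},
)
```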
Notifications
SQS and SNS
- Configurations - Select an SQS or SNS configuration that is integrated with the Lazsa Platform.
- Events - Select the events for which you want to enable SQS or SNS notifications:
  - Select All
  - Node Execution Failed
  - Node Execution Succeeded
  - Node Execution Running
  - Node Execution Rejected
- Event Details - Select the details of the events for which notifications are enabled.
- Additional Parameters - Provide any additional parameters to be considered for the SQS and SNS queues.
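The Lazsa Platform publishes these node-execution events to the SQS or SNS configuration you select. If you want to provision such a destination yourself, the following boto3 sketch creates an SNS topic, an SQS queue, and a subscription between them; the topic and queue names are placeholders, and this is only one possible setup.

```python
import json
import boto3

sns = boto3.client("sns")
sqs = boto3.client("sqs")

# Placeholder names for a topic/queue pair that could receive node-execution events.
topic_arn = sns.create_topic(Name="lazsa-appflow-node-events")["TopicArn"]
queue_url = sqs.create_queue(QueueName="lazsa-appflow-node-events")["QueueUrl"]
queue_arn = sqs.get_queue_attributes(
    QueueUrl=queue_url, AttributeNames=["QueueArn"]
)["Attributes"]["QueueArn"]

# Allow the topic to deliver messages to the queue, then subscribe the queue.
sqs.set_queue_attributes(
    QueueUrl=queue_url,
    Attributes={
        "Policy": json.dumps({
            "Version": "2012-10-17",
            "Statement": [{
                "Effect": "Allow",
                "Principal": {"Service": "sns.amazonaws.com"},
                "Action": "sqs:SendMessage",
                "Resource": queue_arn,
                "Condition": {"ArnEquals": {"aws:SourceArn": topic_arn}},
            }],
        })
    },
)
sns.subscribe(TopicArn=topic_arn, Protocol="sqs", Endpoint=queue_arn)
```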
With this, the job creation is complete. You can run the job in the following ways:
- Click the data integration node and click Start to initiate the job run.
- Publish the pipeline and then click Run Pipeline.
After the job run is successful, you can view the ingested data from the Lazsa Platform. Click the data lake node and browse to the target table. File Preview shows the data ingested into the Snowflake data lake.
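If you also want to trigger or monitor the underlying flow outside the Lazsa UI, the following boto3 sketch starts a flow on demand and reads the status of its most recent run; the flow name is a placeholder.

```python
import boto3

appflow = boto3.client("appflow")

# Placeholder flow name; the Lazsa node triggers the equivalent run for you.
appflow.start_flow(flowName="salesforce-to-snowflake")

# Inspect the most recent execution record for this flow.
records = appflow.describe_flow_execution_records(
    flowName="salesforce-to-snowflake", maxResults=1
)
for run in records["flowExecutions"]:
    print(run["executionStatus"], run.get("executionResult", {}))
```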
What's next? Databricks Templatized Data Integration Jobs