Merge Stage in DataStage - Data Warehousing Data Warehousing

 

The Merge stage is a processing stage. It can have any number of input links, a single output link, and the same number of reject links as there are update input links.The Merge stage is one of three stages that join tables based on the values of key columns.

The three stages differ mainly in the memory they use, the treatment of rows with unmatched keys, and their requirements for data being input (for example, whether it is sorted).

Merge stage combines a master dataset with one or more update datasets based on the key columns.the output record contains all the columns from master record plus any additional columns from each update record that are required.

A master record and update record will be merged only if both have same key column values.

The data sets input to the Merge stage must be key partitioned and sorted. As part of preprocessing your data for the Merge stage, you should also remove duplicate records from the master data set. If you have more than one update data set, you must remove duplicate records from the update data sets as well.Unlike Join stages and Lookup stages, the Merge stage allows you to specify several reject links. You must have the same number of reject links as you have update links.Reject Link contains data from respective input Update link which are failed to match with Master.

Different options:Unmatched Masters Mode: Keep means that unmatched rows (those without any updates) from the master link are output; Drop means that unmatched rows are dropped instead.Warn On Reject Updates:True to generate a warning when bad records from any update links are rejected.Warn On Unmatched Masters: True to generate a warning when there are unmatched rows from the master link.

Example:Master dataset:CUSTOMER_ID CUSTOMER_NAME1                     Peter2            MariaUpdate datasetCUSTOMER_ID CITY ZIP_CODE SEX1                    Mexico 90630 M2                    Mexico 90630 FOutput:CUSTOMER_ID CUSTOMER_NAME CITY ZIP_CODE SEX1                     Peter     Mexico 90630 M2                     Maria     Mexico 90630 F



Meet Ananth Tirumanur. Hi there 👋

I work on projects in data science, big data, data engineering, data modeling, software engineering, and system design.

Connect with me:

My Resources:

Languages and Tools:

AWS, Bash, Docker, Elasticsearch, Git, Grafana, Hadoop, Hive, EMR, Glue, Athena, Lambda, Step Functions, Airflow/MWAA, DynamoDB, Kafka, Kubernetes, Linux, MariaDB, MySQL, Pandas, PostgreSQL, Python, Redis, Scala, SQLite