What is a staging area, and why do we need it in a DWH?
If the source and target databases are different and the target table is large (say, millions of records), we can create staging tables in the target database and load the incoming data there first. A simple outer join between the staging table and the target table then tells us which rows to insert and which to update. This approach usually gives good performance.
It avoids scanning the full target table for each incoming record to decide between insert and update.
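The outer-join idea above can be sketched as follows. This is only a minimal illustration using Python's built-in sqlite3 module with an in-memory database; the table names `stg_customer` and `dim_customer`, the columns, and the sample rows are all hypothetical and not taken from any real warehouse.

```python
import sqlite3


def classify_staged_rows(conn):
    """Split staged rows into inserts and updates with one LEFT OUTER JOIN."""
    rows = conn.execute(
        """
        SELECT s.customer_id, s.name, t.customer_id IS NULL AS is_new
        FROM stg_customer AS s
        LEFT OUTER JOIN dim_customer AS t
          ON t.customer_id = s.customer_id
        ORDER BY s.customer_id
        """
    ).fetchall()
    # No match in the target means the row is new; a match means update.
    inserts = [(cid, name) for cid, name, is_new in rows if is_new]
    updates = [(cid, name) for cid, name, is_new in rows if not is_new]
    return inserts, updates


def apply_changes(conn, inserts, updates):
    """Apply the classified rows to the (hypothetical) target table."""
    conn.executemany(
        "INSERT INTO dim_customer (customer_id, name) VALUES (?, ?)", inserts
    )
    conn.executemany(
        "UPDATE dim_customer SET name = ? WHERE customer_id = ?",
        [(name, cid) for cid, name in updates],
    )
    conn.commit()


def demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE stg_customer (customer_id INTEGER PRIMARY KEY, name TEXT);
        INSERT INTO dim_customer VALUES (1, 'Asha'), (2, 'Ravi');
        INSERT INTO stg_customer VALUES (2, 'Ravi K'), (3, 'Meena');
        """
    )
    inserts, updates = classify_staged_rows(conn)
    apply_changes(conn, inserts, updates)
    final = conn.execute(
        "SELECT customer_id, name FROM dim_customer ORDER BY customer_id"
    ).fetchall()
    conn.close()
    return inserts, updates, final
```

Calling `demo()` classifies customer 3 as an insert and customer 2 as an update. In a real warehouse the same join would normally run inside the target database itself (for example as a MERGE or as set-based INSERT and UPDATE statements) rather than pulling rows into Python.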
While loading flat files into the data warehouse, we can also perform data cleansing in the staging area.
Data cleansing, also known as data scrubbing, is the process of ensuring that a set of data is correct and accurate. During data cleansing, records are checked for accuracy and consistency.
It means weeding out unnecessary or unwanted content (stray characters, extra spaces, etc.) from incoming data to make it more meaningful and informative.
Data gathered from heterogeneous systems can also be brought together in one place.
Data scrubbing is the process of fixing or eliminating individual pieces of data that are incorrect, incomplete, or duplicated before the data is passed to the end user.
Data scrubbing is aimed at more than eliminating errors and redundancy. The goal is also to bring consistency to various data sets that may have been created with different, incompatible business rules.
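As a rough sketch of the cleansing steps described above, the following Python function removes unwanted characters and extra spaces, drops incomplete and duplicate records, and applies one consistency rule. The field names (`id`, `name`), the set of allowed characters, and the title-casing rule are assumptions made for illustration; real cleansing rules depend on the business.

```python
import re

# Hypothetical rule: keep letters, digits, underscores, whitespace, and @ . -
_UNWANTED = re.compile(r"[^\w\s@.\-]")


def clean_value(value):
    """Remove unwanted characters and collapse runs of whitespace."""
    text = _UNWANTED.sub("", value)
    return " ".join(text.split())


def cleanse(records, required=("id", "name")):
    """Return cleaned records, skipping incomplete and duplicate ones."""
    seen_ids = set()
    cleaned = []
    for record in records:
        row = {
            key: clean_value("" if val is None else str(val))
            for key, val in record.items()
        }
        # Incomplete: a required field is missing or empty after cleaning.
        if any(not row.get(field) for field in required):
            continue
        # Consistency: store names in one agreed format (assumed: title case).
        row["name"] = row["name"].title()
        # Duplicate: the same id was already accepted earlier.
        if row["id"] in seen_ids:
            continue
        seen_ids.add(row["id"])
        cleaned.append(row)
    return cleaned
```

For example, passing a record with the name `"  asha   rao## "` and a second record with the same id keeps a single row whose name is `"Asha Rao"`.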
ODS (Operational Data Store)
My understanding of an ODS is that it is a replica of the OLTP system. Its purpose is to reduce the burden on the production (OLTP) system when fetching data to load the targets. Hence, in my view, it is a mandatory requirement for every warehouse.
So do we transfer data from the OLTP system to the ODS every day to keep it up to date?
OLTP is a sensitive database, so it should not be hit with many SELECT statements, because that may impact its performance. Also, if something goes wrong while fetching data from the OLTP system into the data warehouse, it directly impacts the business.
The ODS is a replica of the OLTP system.
— Pawan Kumar