Data Warehouse – Slowly Changing Dimensions SCD (Type 1 vs Type 2)

Data warehouse (DW) structure may differ depending on what Slowly Changing Dimension (SCD) model we choose. Among all SCD approaches there are two that are the most frequent: so called SCD type 1 and SCD type 2. I will discuss how to implement these two SCD types from ETL perspective in CloverETL.

**SCD Types
The difference between type 1 and type 2 is usually only in history tracking. While in the first type (SCD type 1) DW does not preserve historical states of dimensional data at all, in the second one DW holds the entire history of dimensional data (records are enriched by additional flags, such as date ranges that describes the period when the record is valid for).

For those who are not familiar with SCD types, I recommend reading the Wikipedia article at first.

**Data Warehouse Structure
In this example, there are the following tables:

  • fact_item (Fact table with quantitative sales data: Units Sold, Units Paid, Units Shipped)

  • dim_date (Date dimension – Type 0 SCD type used)

  • dim_item (Product dimension that were sold)

  • dim_store (Dimension of places where aforementioned products were sold)

  • dim_customer (Dimension of customers that bought above products)

Fact table that holds information on how many items were sold, paid and shipped is connected with dimensional tables using so called surrogate keys – these are PRIMARY KEYS of dimensional entries.

Depending on what SCD type is used dim tables may contain other fields. This is the case of SCD type 2 and Item, Store and Customer dimensions, which besides the regular data (surrogate keys and dimensional values) contains also two other columns (start_date and end_date). These two fields describe the period when a particular entry was used.

**ETL Process
Regular ETL process may usually be divided into three steps:

  • Extraction

  • Transformation

  • Loading

Loading is the last phase that is subject of this example. The previous two phases are not described in this example. I have already prepared source data – the files are located in data-in folder in both projects.

SCD2-DEMO.zip