Data Warehousing - Schemas
A schema is a logical description of the entire database. It includes the name and description of records of all record types, including all associated data items and aggregates. Much like a database, a data warehouse also needs to maintain a schema. A database uses the relational model, while a data warehouse uses the Star, Snowflake, and Fact Constellation schemas. In this chapter, we will discuss the schemas used in a data warehouse.
Star Schema
Each dimension in a star schema is represented by only one dimension table.
This dimension table contains a set of attributes.
The following diagram shows the sales data of a company with respect to the four dimensions, namely time, item, branch, and location.
There is a fact table at the center. It contains the keys to each of the four dimensions.
The fact table also contains two measures, namely dollars sold and units sold.
Note − Each dimension has only one dimension table, and each table holds a set of attributes. For example, the location dimension table contains the attribute set {location_key, street, city, province_or_state, country}. This constraint may cause data redundancy. For example, the cities "Vancouver" and "Victoria" are both in the Canadian province of British Columbia, so the entries for these cities repeat the same values in the attributes province_or_state and country.
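The redundancy described in the note can be seen in a small sketch. The following Python example uses the standard sqlite3 module to build a reduced star schema with only the location dimension and the sales fact table; the time, item, and branch dimension tables are left out for brevity, and all row values (streets, amounts) are made up for illustration.

```python
import sqlite3

# In-memory database; nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE location (
    location_key      INTEGER PRIMARY KEY,
    street            TEXT,
    city              TEXT,
    province_or_state TEXT,
    country           TEXT
);
-- Fact table: one key per dimension plus the two measures.
CREATE TABLE sales (
    time_key     INTEGER,
    item_key     INTEGER,
    branch_key   INTEGER,
    location_key INTEGER REFERENCES location(location_key),
    dollars_sold REAL,
    units_sold   INTEGER
);
""")

# Hypothetical sample rows: both cities repeat the same province and country.
conn.executemany("INSERT INTO location VALUES (?, ?, ?, ?, ?)", [
    (1, "101 Main St", "Vancouver", "British Columbia", "Canada"),
    (2, "22 Bay St", "Victoria", "British Columbia", "Canada"),
])
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?, ?, ?)", [
    (1, 1, 1, 1, 500.0, 5),
    (1, 2, 1, 1, 250.0, 2),
    (2, 1, 1, 2, 100.0, 1),
])

# A typical star join: fact table joined directly to one dimension table.
sales_by_city = conn.execute("""
    SELECT l.city, SUM(s.dollars_sold), SUM(s.units_sold)
    FROM sales AS s
    JOIN location AS l ON s.location_key = l.location_key
    GROUP BY l.city
    ORDER BY l.city
""").fetchall()

# Number of dimension rows that store the same (province, country) pair.
redundant_rows = conn.execute("""
    SELECT COUNT(*) FROM location
    WHERE province_or_state = 'British Columbia' AND country = 'Canada'
""").fetchone()[0]

print(sales_by_city)
print(redundant_rows)
```

Every query needs only one join per dimension, which is the main attraction of the star layout; the price is that the province and country values are stored once per location row.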
Snowflake Schema
Some dimension tables in the Snowflake schema are normalized.
The normalization splits up the data into additional tables.
Unlike the star schema, the dimension tables in a snowflake schema are normalized. For example, the item dimension table of the star schema is normalized and split into two dimension tables, namely the item and supplier tables.
Now the item dimension table contains the attributes item_key, item_name, type, brand, and supplier_key.
The supplier key is linked to the supplier dimension table. The supplier dimension table contains the attributes supplier_key and supplier_type.
Note − Due to normalization in the Snowflake schema, redundancy is reduced; therefore, the schema becomes easier to maintain and saves storage space.
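As a rough illustration of this split, the sketch below creates the normalized item and supplier tables with Python's sqlite3 module. The column names follow the attributes listed above; the sample rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Supplier details live in their own table after normalization.
CREATE TABLE supplier (
    supplier_key  INTEGER PRIMARY KEY,
    supplier_type TEXT
);
CREATE TABLE item (
    item_key     INTEGER PRIMARY KEY,
    item_name    TEXT,
    type         TEXT,
    brand        TEXT,
    supplier_key INTEGER REFERENCES supplier(supplier_key)
);
""")

# Hypothetical rows: two items share a single supplier row.
conn.execute("INSERT INTO supplier VALUES (1, 'wholesale')")
conn.executemany("INSERT INTO item VALUES (?, ?, ?, ?, ?)", [
    (10, "Laptop", "computer", "Acme", 1),
    (11, "Monitor", "display", "Acme", 1),
])

# Reaching supplier_type now takes one extra join through supplier_key.
items_with_supplier = conn.execute("""
    SELECT i.item_name, s.supplier_type
    FROM item AS i
    JOIN supplier AS s ON i.supplier_key = s.supplier_key
    ORDER BY i.item_key
""").fetchall()

supplier_rows = conn.execute("SELECT COUNT(*) FROM supplier").fetchone()[0]

print(items_with_supplier)
print(supplier_rows)
```

The supplier type is stored once however many items refer to it, which is where the storage saving comes from; the trade-off is the additional join in queries.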
Fact Constellation Schema
A fact constellation has multiple fact tables. It is also known as galaxy schema.
The following diagram shows two fact tables, namely sales and shipping.
The sales fact table is the same as that in the star schema.
The shipping fact table has five dimensions, namely item_key, time_key, shipper_key, from_location, and to_location.
The shipping fact table also contains two measures, namely dollars cost and units shipped.
It is also possible to share dimension tables between fact tables. For example, the time, item, and location dimension tables are shared between the sales and shipping fact tables.
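The idea of shared dimensions can be sketched with two fact tables that both reference one item dimension table. The example below again uses Python's sqlite3 module; only the item dimension is materialized, and the rows are hypothetical. Each fact table is aggregated on its own before joining, so that rows from one fact table do not multiply rows from the other.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Shared dimension table used by both fact tables.
CREATE TABLE item (item_key INTEGER PRIMARY KEY, item_name TEXT);

CREATE TABLE sales (
    time_key     INTEGER,
    item_key     INTEGER REFERENCES item(item_key),
    branch_key   INTEGER,
    location_key INTEGER,
    dollars_sold REAL,
    units_sold   INTEGER
);
CREATE TABLE shipping (
    item_key      INTEGER REFERENCES item(item_key),
    time_key      INTEGER,
    shipper_key   INTEGER,
    from_location INTEGER,
    to_location   INTEGER,
    dollars_cost  REAL,
    units_shipped INTEGER
);
""")

conn.execute("INSERT INTO item VALUES (10, 'Laptop')")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?, ?, ?)", [
    (1, 10, 1, 1, 900.0, 1),
    (2, 10, 1, 2, 1800.0, 2),
])
conn.execute("INSERT INTO shipping VALUES (10, 2, 7, 1, 2, 40.0, 3)")

# Aggregate each fact table separately, then join both on the shared dimension.
summary = conn.execute("""
    SELECT i.item_name, s.units_sold, sh.units_shipped
    FROM item AS i
    JOIN (SELECT item_key, SUM(units_sold) AS units_sold
          FROM sales GROUP BY item_key) AS s ON s.item_key = i.item_key
    JOIN (SELECT item_key, SUM(units_shipped) AS units_shipped
          FROM shipping GROUP BY item_key) AS sh ON sh.item_key = i.item_key
""").fetchall()

print(summary)
```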
Schema Definition
A multidimensional schema can be defined using the Data Mining Query Language (DMQL). Its two primitives, cube definition and dimension definition, can be used to define data warehouses and data marts.
Syntax for Cube Definition
define cube <cube_name> [<dimension_list>]: <measure_list>
Syntax for Dimension Definition
define dimension <dimension_name> as (<attribute_or_dimension_list>)
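To make the shape of the cube definition syntax concrete, here is a toy Python parser for a single define cube statement. It is only a sketch for reading the syntax, not a DMQL implementation: the function name parse_cube is hypothetical, identifiers are assumed to be written with underscores, and each measure is assumed to have the form name = aggregate(expression).

```python
import re

# define cube <cube_name> [<dimension_list>]: <measure_list>
CUBE_RE = re.compile(
    r"define\s+cube\s+(\w+)\s*\[(.*?)\]\s*:\s*(.+)",
    re.IGNORECASE | re.DOTALL,
)
# One measure: name = aggregate(expression), e.g. units_sold = count(*)
MEASURE_RE = re.compile(r"(\w+)\s*=\s*(\w+\([^)]*\))")


def parse_cube(statement):
    """Split a 'define cube' statement into name, dimensions, and measures."""
    match = CUBE_RE.fullmatch(statement.strip())
    if match is None:
        raise ValueError("not a 'define cube' statement")
    name, dimension_list, measure_list = match.groups()
    dimensions = [d.strip() for d in dimension_list.split(",")]
    measures = dict(MEASURE_RE.findall(measure_list))
    return name, dimensions, measures


cube = parse_cube(
    "define cube sales_star [time, item, branch, location]: "
    "dollars_sold = sum(sales_in_dollars), units_sold = count(*)"
)
print(cube)
```

The three parts returned by the parser correspond directly to the cube name, the dimension list in square brackets, and the measure list after the colon.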
Star Schema Definition
The star schema that we have discussed can be defined using Data Mining Query Language (DMQL) as follows −
define cube sales_star [time, item, branch, location]:
   dollars_sold = sum(sales_in_dollars), units_sold = count(*)
define dimension time as (time_key, day, day_of_week, month, quarter, year)
define dimension item as (item_key, item_name, brand, type, supplier_type)
define dimension branch as (branch_key, branch_name, branch_type)
define dimension location as (location_key, street, city, province_or_state, country)
Snowflake Schema Definition
Snowflake schema can be defined using DMQL as follows −
define cube sales_snowflake [time, item, branch, location]:
   dollars_sold = sum(sales_in_dollars), units_sold = count(*)
define dimension time as (time_key, day, day_of_week, month, quarter, year)
define dimension item as (item_key, item_name, brand, type, supplier (supplier_key, supplier_type))
define dimension branch as (branch_key, branch_name, branch_type)
define dimension location as (location_key, street, city (city_key, city, province_or_state, country))
Fact Constellation Schema Definition
Fact constellation schema can be defined using DMQL as follows −
define cube sales [time, item, branch, location]:
   dollars_sold = sum(sales_in_dollars), units_sold = count(*)
define dimension time as (time_key, day, day_of_week, month, quarter, year)
define dimension item as (item_key, item_name, brand, type, supplier_type)
define dimension branch as (branch_key, branch_name, branch_type)
define dimension location as (location_key, street, city, province_or_state, country)

define cube shipping [time, item, shipper, from_location, to_location]:
   dollars_cost = sum(cost_in_dollars), units_shipped = count(*)
define dimension time as time in cube sales
define dimension item as item in cube sales
define dimension shipper as (shipper_key, shipper_name, location as location in cube sales, shipper_type)
define dimension from_location as location in cube sales
define dimension to_location as location in cube sales