
The Data Warehouse Architecture

August 17, 2012

From a high-level perspective, the data warehouse architecture can be represented as a block diagram with five main components:

  1. The data sources,
  2. The integration area,
  3. The storage area,
  4. The presentation layer, and
  5. The hardware infrastructure.

The data flows from the data sources to the integration area, then to the storage area, and finally to the presentation layer.



Every component is too complex to be described in a single blog post. The blocks in the figure are sized for clarity; neither the size of a block nor its position represents its order of importance. A brief description of each block follows:

The Data Sources

The Data Sources are not part of the Data Warehouse Architecture, but of the Enterprise Architecture.

The Data Sources are all the containers of information in the enterprise. This data can be structured (e.g. relational databases, Excel spreadsheets), unstructured (e.g. Word documents, text files, flat files), held in big data repositories (e.g. sensor reading records, website logs) or multimedia (e.g. videos, images, voice recordings).

Two important components are the Master Data Repository and the Enterprise Data Repository. If a Master Data Repository has been implemented, the main data warehouse dimensions can be derived directly from it. If the Enterprise Data Repository has been implemented, we can use it as the main source for populating the Operational Reporting Data Repository and for obtaining the new data to be transferred to the Staging Area. One common limitation of using the Enterprise Data Repository for analytics is that it often does not contain temporal information (timestamps of the events).

The ETL Processes

Even if the bulk of the data preparation occurs in the Integration Area, ETL (Extract, Transform, Load) processes are necessary at every step in the data flow.
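As a rough illustration of the three ETL steps, here is a minimal sketch in Python. Everything in it is an assumption made for the example: the CSV content, the `orders` table and the cleaning rules are hypothetical, and an in-memory SQLite database stands in for the target repository.

```python
import csv
import io
import sqlite3

# Hypothetical source extract: a CSV export from an operational system.
RAW_CSV = """order_id,customer,amount
1, alice ,10.50
2,BOB,20.00
3,carol,not_a_number
"""

def extract(text):
    """Extract: read the rows of the source file as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: normalise names, convert amounts, skip rows that fail to parse."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # a real process would log or quarantine the row
        clean.append((int(row["order_id"]), row["customer"].strip().lower(), amount))
    return clean

def load(conn, rows):
    """Load: write the cleaned rows into the (hypothetical) target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")  # stand-in for the target repository
load(conn, transform(extract(RAW_CSV)))
loaded = conn.execute(
    "SELECT order_id, customer, amount FROM orders ORDER BY order_id"
).fetchall()
```

Keeping the three steps as separate functions is only one possible design; it makes it easier to reuse the same transform between different sources and targets along the data flow.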

The Integration Area

The integration area is where the data, originating from disparate sources, is linked, transformed (if needed) and structured in a format suitable for storage in the Enterprise Data Warehouse.

This area has two main components, the Data Reduction Area and the Staging Area. The Data Reduction Area is where we apply different techniques to extract insight from the Big Data Repository. These insights are summarized, transformed into relational structures and moved to the Staging Area before becoming part of the Enterprise Data Warehouse. The Staging Area is where we make the final preparation of the new data so that it conforms to the structures and semantics of the Enterprise Data Warehouse.

The data in the Integration Area is highly volatile; most of it consists only of deltas that will be added to the Enterprise Data Warehouse. Once moved to the Enterprise Data Warehouse, this data can be discarded to make room for new data from the Data Sources.
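The move-then-discard cycle can be sketched as follows. This is only an illustration under assumed names: the `staging_sales` and `edw_sales` tables are hypothetical, and an in-memory SQLite database stands in for both areas.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE staging_sales (sale_id INTEGER PRIMARY KEY, amount REAL);
    CREATE TABLE edw_sales (sale_id INTEGER PRIMARY KEY, amount REAL);
    INSERT INTO edw_sales VALUES (1, 100.0), (2, 250.0);    -- already persisted
    INSERT INTO staging_sales VALUES (3, 75.0), (4, 30.0);  -- new deltas
""")

def move_deltas(conn):
    """Append the staged deltas to the warehouse table, then empty staging."""
    with conn:  # one transaction: commit on success, roll back on error
        moved = conn.execute("SELECT COUNT(*) FROM staging_sales").fetchone()[0]
        conn.execute(
            "INSERT INTO edw_sales (sale_id, amount) "
            "SELECT sale_id, amount FROM staging_sales"
        )
        conn.execute("DELETE FROM staging_sales")
    return moved

moved = move_deltas(conn)
edw_count = conn.execute("SELECT COUNT(*) FROM edw_sales").fetchone()[0]
staging_count = conn.execute("SELECT COUNT(*) FROM staging_sales").fetchone()[0]
```

Doing the insert and the delete in the same transaction means a failed load leaves the deltas in staging, so they can be retried instead of being lost.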

The Storage Area

The storage area is where the persistent data resides. Except for the Operational Reporting Data Repository, the repositories in this area grow constantly: new data is added but no old data is deleted. This area contains the enterprise memory.

The Operational Reporting Data Repository is composed of a federation of databases replicated (published) from the Operational Systems databases. The function of this component is to provide the data needed to produce the operational reports of the enterprise; it exists so that the resources those reports require (e.g. complex and expensive queries) do not disrupt the Operational Systems. The Operational Reporting Data Repository only contains data for a limited period of time.
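One way to keep only a limited period of data in such a repository is a periodic purge. The sketch below assumes a hypothetical `ords_orders` table and a 90-day window; neither the table nor the figure comes from a real system, and the retention period in practice depends on the reporting needs.

```python
import sqlite3
from datetime import date, timedelta

RETENTION_DAYS = 90  # assumed window, chosen only for the example

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ords_orders (order_id INTEGER PRIMARY KEY, order_date TEXT)"
)
conn.executemany(
    "INSERT INTO ords_orders VALUES (?, ?)",
    [(1, "2012-01-15"), (2, "2012-06-01"), (3, "2012-08-10")],
)

def purge_expired(conn, today, retention_days=RETENTION_DAYS):
    """Delete rows older than the retention window; return how many were removed."""
    cutoff = (today - timedelta(days=retention_days)).isoformat()
    with conn:
        # ISO dates stored as text compare correctly as strings.
        cur = conn.execute(
            "DELETE FROM ords_orders WHERE order_date < ?", (cutoff,)
        )
    return cur.rowcount

purged = purge_expired(conn, today=date(2012, 8, 17))
remaining = [
    row[0]
    for row in conn.execute("SELECT order_id FROM ords_orders ORDER BY order_id")
]
```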

The Unstructured Data Repository is probably the biggest storage area; it contains different types of documents.

The Stream Data Repository contains data from Stream Data Sources, such as real-time sensors. High volumes of data arrive at high frequency, but we reduce the space used for this kind of data by identifying patterns, variations and tendencies; instead of saving the raw data, we save the processed results.
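A very simple form of this reduction is to replace each window of raw readings with a handful of summary values. The sketch below is only one possible approach under assumed choices: the window size, the summary fields and the crude "last minus first" trend measure are illustrative, not a prescribed method.

```python
from statistics import mean

def summarize_window(readings):
    """Reduce one window of raw readings to a few descriptive values."""
    return {
        "count": len(readings),
        "min": min(readings),
        "max": max(readings),
        "mean": mean(readings),
        "trend": readings[-1] - readings[0],  # crude tendency: last minus first
    }

def reduce_stream(readings, window_size):
    """Yield one summary per consecutive, non-overlapping window."""
    for start in range(0, len(readings), window_size):
        yield summarize_window(readings[start:start + window_size])

# Hypothetical sensor readings; only the summaries would be stored.
raw = [20, 21, 23, 22, 30, 31, 29, 30]
summaries = list(reduce_stream(raw, window_size=4))
```

Here eight raw readings become two summary records. The trade-off is that any question the summaries cannot answer would need the raw data, which is no longer kept.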

Some may argue that the Operational Reporting Data Repository and the Big Data Repository are not part of the Data Warehouse Architecture. However, all the reporting needs of the enterprise, including the operational reports, and the repositories used for data mining (as is the case for the big data store) should be the responsibility of the data warehouse team.

The Enterprise Data Warehouse is a mix of normalized and de-normalized data structures that contain the memory of the enterprise; it is implemented using one or more relational databases. This component is highly complex, and I will describe it in more detail in future posts.

The Data Marts are de-normalized data structures that have been pre-processed and organized to serve as a high-performance source for the Business Intelligence and Decision Support Systems.
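To make the idea of a pre-processed, de-normalized structure concrete, here is a small sketch that builds a mart table from a fact table and a dimension. The `dim_product`, `fact_sales` and `mart_sales_by_category` tables are hypothetical, and SQLite is used only to keep the example self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE fact_sales (product_id INTEGER, sale_month TEXT, amount REAL);
    INSERT INTO dim_product VALUES (1, 'books'), (2, 'games');
    INSERT INTO fact_sales VALUES
        (1, '2012-07', 10.0), (1, '2012-07', 15.0),
        (2, '2012-07', 40.0), (1, '2012-08', 5.0);

    -- The mart: one wide, pre-aggregated table so reports need no joins.
    CREATE TABLE mart_sales_by_category AS
    SELECT p.category,
           f.sale_month,
           SUM(f.amount) AS total_amount,
           COUNT(*)      AS sale_count
    FROM fact_sales f
    JOIN dim_product p ON p.product_id = f.product_id
    GROUP BY p.category, f.sale_month;
""")

mart_rows = conn.execute(
    "SELECT category, sale_month, total_amount, sale_count "
    "FROM mart_sales_by_category ORDER BY category, sale_month"
).fetchall()
```

The join and the aggregation are paid once, when the mart is built, so the reporting tools that read it only run simple lookups.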

The Presentation Layer

Business Intelligence applications and Decision Support Systems are not shown as separate blocks in the Presentation Layer because they are mainly composed of the components included in this section, which makes them implicitly part of the Presentation Layer.

The presentation layer is the front end of the Data Warehouse; it is composed of all the tools required to obtain insight from the data stored in the Storage Area of the Data Warehouse Architecture, from simple reporting tools to complex data mining tools.

The Hardware Infrastructure

The hardware infrastructure includes all the equipment (e.g. servers, clusters, storage devices) necessary to implement all the other Data Warehouse components.

In future posts, I will give more detail on the different components and subcomponents of the Data Warehouse Architecture.

Read Further

DW Architecture: Operational Reports Data Repository

DW Architecture: The Enterprise Data Repository

DW Architecture: The Unstructured Data Repository

DW Architecture: The Streamed Data Repository

DW Architecture: The ETL Process

DW Architecture: The Enterprise Data Warehouse
