Data Warehouse Architecture
The Data Warehouse system is the main repository of all data from the organization, regardless of its source (sales data, financial results, marketing, etc.). In order to provide adequate reporting capabilities, flexibility and required data aggregation and processing capacity, the Data Warehouse architecture consists of 5 basic components:
Data Sources
- Internal data sources / operational databases. Any information produced, reported and consolidated within the organization; examples: orders and sales, transactions made by the organization, leads and prospects, data from CRM system
- External data sources.Any information coming from outside the organization; examples: information collected during customer surveys, information from business intelligence agencies
ETL tools
A Data Warehouse using typical ETL (Extract – Transform – Load) tools is based on processes of structuring, integration and levels of access to data depending on the role a user plays in the organization
- Structuring server: a database storing raw source data for further processing before it is loaded into the Warehouse. Possibility to create many different databases depending on needs (e.g. separate databases for separate departments of an organization)
- Integration layer – integration of unrelated data takes place on this layer by means of Operational Data Store (ODS)
- Integrated data is then loaded into the database Warehouse, which is used for further processing
Each ETL step provides monitoring and control of possible errors to eliminate them before the integrated data is loaded into the Data Warehouse.
Data Warehouse Server (DWH)
The Data Warehouse Server (DWH server) is the physical memory used to store the data models for the Data Warehouse. Mapping data from source systems (Section 1) to the data model used by the Data Warehouse allows you to ensure that you have enough data from specific ranges to meet your reporting requirements. Optimization of DWH server configuration is crucial to ensure adequate capacity to perform the set analytical processes while storing, managing and protecting historical data. The main task of Data Warehouse is to report and publish specific, required data sets in order to support all decision-making processes in the organization in the most effective way.
OLAP cubes
OLAP (Online Analytical Processing) is the process of forming data into a multidimensional model defined by an OLAP cube, for future use in analysis. Using OLAP cubes, an organization can:
- Generate and simultaneously validate complex calculations and analyses
- Recognize and define relationships and differences
- Recognize and define trends and exceptions
Data Mining / Reporting Tools
Data Mining is the analysis of data sets to uncover the meaning behind the data – predicting behavior, forecasting trends – and reporting that information in an accessible and understandable format.
If you want to know more about Data Warehouses, write to us info@cogit.pl or call +48 22 496 60 00.
Also, check out this video about the capabilities that the Microsoft Azure cloud provides for Data Warehousing. Although the cloud may not always be the best solution… this option is also worth considering when deciding on implementation or migration.