Data design – the building blocks

It is common knowledge that good BI is based on a standardized and consistent data model. It goes without saying that developing ideal systems is too expensive for the real world. We cannot achieve the standardization and consistency right away because of the complexities and change associated with running a real life business, but there is a core of information in any business that is fairly well-defined and stable that can be used as a foundation.

The first stage is capturing and assembling data into a data warehouse. Whatever the data, it must come from valid and reliable sources. Data design begins with understanding of what are the questions that data will allow users to answer and what data do you have available in various data sources throughout your organization. Generally, the core is developed as a de-normalized, flat structure, with the most basic, initial fact tables. Ideally from here you build the structure as a star formation (see Figure 1), identifying separate logical entities as their own dimensions, and isolating them from the flat structure.


Figure 1: Star structure

Eventually, your star could turn into a snowflake formation (see Figure 2) with the central fact table at the hub, and more and more granular data extending out from there. Such a structure is not only good for logical data storage, but also for reporting and future transformations. It is important to identify the right level of granularity for your facts and dimensions very early on, as granularity changes have quite profound impact on ETL processes, reporting, and analytics.

At this point it is also worth mentioning other approaches to DW modeling, such as Data Vault Modeling, that could be more resilient to both changes in the business as well as analytics requirements. It is a good topic for another blog post, however.

Figure 2: Snowflake structure


Reporting – building out from ad-hoc queries

Having captured and catalogued the data, you now need to put it into reports for review and analysis. With the wide range of users and needs, you will find that you need to create many different types of reports. Upper management may prefer to see scorecards and dashboards – static tables and charts that summarize key performance indicators. But yield management departments may need dynamic analytical models to sort and compose data according to their needs.

Initially reports are usually built based on ad hoc queries, but as the number of users and reports grow, simple querying will become difficult to manage. At this point it’s time to work on a common reporting model which can be made using Oracle, Cognos, SSRS/SSAS, etc. Within one model you can dramatically reduce the amount of work for solving issues (in case you have several data marts with +/- same data). It is also helpful to have direct dialog between business and technical people. They often use different terminology and view the business’s needs differently, which can result in misunderstood targets.

From what I see, BI is developing the way I’ve described above. On a recent project we started from flat structured tables in a data warehouse (and similar structures in Cognos Framework Manager and other technologies) and arrived at a true snowflake model. There is a deepening understanding of data structure and its use as people on all sides continue to develop better systems and processes. It’s exciting to be part of the process!

What are your issues in structuring your data?

Is your data warehouse built for optimal reporting? 

–Alex M