OLAP: the art of slicing and dicing data

Typical COE projects involve developing some kind of OR model to better understand a given problem. In this task, consolidating and exploring data is a crucial step to understand the problem and build math models, although it’s not always the focus of project outcomes. Online Analytical Processing (OLAP) has been tackling the problem of systematically consolidating and delivering data for analysis, and provides a couple of tools that can be applied in COE projects to easily slice and dice data, and provide more value to industry partners.

Facts, Measures and Dimensions

OLAP views data as dimensions and measures that characterize business facts. Facts are the business events to be analyzed, such as sales or shippings. Measures represent the numerical properties of facts that users wants to analyze and optimize, such as number of orders or total revenues. Dimensions provide context for facts and are used to specify level of detail required by the analysis.

Distinguishing between measures and dimensions can be a helpful first step when analyzing data, especially when your dataset has many fields and you are not sure how to start analyzing it. Start Schemas materialize these concepts in a data structure and can be very useful -even if it’s not materialized- to understand the analysis directions provided by a dataset.

Star Schema model for a sales dataset. The fact table in the center contains the measures and references to the corresponding dimensions. Dimensions have their own tables with the attributes which characterize facts. Source: DWH Wiki.

Data Cubes

An example of an OLAP cube with three dimensions. They are also called hypercubes since they can have multiple dimensions. Source: wikipedia.org.

Another useful concept from OLAP are data cubes. A cube is a structure for data analysis, which helps to look for trends by slicing and dicing the data. They summarize data across a set of dimensions, using aggregation functions, such as sum, count, and average.

An Excel pivot table is an example of a data cube. By adding and removing fields you can  get the desired view of your data. However, when your data is too large or too complex (because it has data from different tables, contains non-additive dimensions, or uses distinct counts to aggregate), a pivot table might not do the work. You can still get the slicing and dicing functionality using a database system, such as SQL Server or Oracle, or business intelligence software such as Tableau or SAS. Even Excel offers more advanced cube functionalities with Power Pivot.

So, how can cubes help in COE projects?

Cubes provide a framework to structure and analyze the data that can be easily used by  users with little or no knowledge of data analysis techniques. Slicing and dicing are simple operations that can enable a wider audience to explore multidimensional datasets. Using cubes you can provide not only insights from a dataset, but the possibility to customize the analysis to respond particular questions from the end user, which is the one that has more knowledge about the context and meaning of facts. Using cubes, the analyst can provide a template for analysis, that the end user can then refine by dragging and dropping dimensions and that way look for trends in specific periods of time, narrow the analysis to particular groups or explore the difference between different categories.

As Michael Trick, former president of INFORMS, puts it “sometimes just getting great data and doing predictions followed by a simple decision model is enough to make better decisions”. Not everything requires a sophisticated data mining algorithm or integer programming optimization. Putting the data on the users hands in a structure that allows them to answer their own questions might be the best (and simplest) way to deliver value.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.