Long Live the Traditional Data Warehouse
A data warehouse is more than a storage repository. Don't lose sight of the benefits a traditional warehouse provides.
By Mike Schiff
April 9, 2018
I have recently seen several articles, whitepapers, and advertisements alleging that traditional data warehouses are no longer appropriate for a variety of reasons. Some argue that "virtual data warehouses" (which directly access operational systems) obviate the need to store a copy of the data in a data warehouse. Others claim that data warehouses require relational databases and therefore cannot support other kinds of data. Proponents of the latter argument are often vendors of NoSQL databases or consultants specializing in data warehouse modernization.
The Benefits of Data Warehousing
Although I have stated it before, I would like to again emphasize that a data warehouse is much more than a repository for storing data. It is a refinery for consolidating and purifying data sourced from multiple heterogeneous operational systems.
To fully appreciate this, it may be helpful to remember the days when most organizations allowed departments (as many still do!) to develop siloed operational systems that described the same entities with different data definitions, units of measure, edit rules, value lists, etc. It was not unheard of for organizations to build a separate reconciliation system for each pair of systems that needed to exchange data.
Data warehouse practitioners resolved this problem by leading or facilitating efforts to create enterprisewide data definitions. Data extracted from each operational system was then transformed to conform to these definitions and their associated value lists as it was loaded into the data warehouse. Transforming the data at load time, rather than changing the source systems, was a practical necessity: each department considered its parochial definitions to be the correct ones and was quite reluctant to modify existing operational systems to conform to corporate definitions that might provide little direct benefit for departmental needs.
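To make the idea of conforming data at load time concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the two source systems ("sales" and "shipping"), their field names, their status codes, and the enterprise definition itself (weight in kilograms, a two-value status list) are hypothetical and do not describe any particular product or organization.

```python
from dataclasses import dataclass

LB_TO_KG = 0.45359237  # international pound, by definition


@dataclass(frozen=True)
class ConformedProduct:
    """Hypothetical enterprisewide definition of a product record."""
    product_id: str   # upper-case, no surrounding whitespace
    weight_kg: float  # enterprise unit of measure: kilograms
    status: str       # enterprise value list: "ACTIVE" or "INACTIVE"


# Hypothetical departmental value lists mapped onto the enterprise list.
STATUS_MAPS = {
    "sales": {"A": "ACTIVE", "I": "INACTIVE"},
    "shipping": {"active": "ACTIVE", "discontinued": "INACTIVE"},
}


def conform_sales(row: dict) -> ConformedProduct:
    """Sales stores weight in pounds and one-letter status codes."""
    return ConformedProduct(
        product_id=row["sku"].strip().upper(),
        weight_kg=round(row["weight_lb"] * LB_TO_KG, 3),
        status=STATUS_MAPS["sales"][row["stat"]],
    )


def conform_shipping(row: dict) -> ConformedProduct:
    """Shipping already uses kilograms but spells out its status values."""
    return ConformedProduct(
        product_id=row["item_no"].strip().upper(),
        weight_kg=round(row["kg"], 3),
        status=STATUS_MAPS["shipping"][row["state"]],
    )


sales_row = {"sku": " ab-100 ", "weight_lb": 10.0, "stat": "A"}
shipping_row = {"item_no": "ab-100", "kg": 4.536, "state": "active"}

from_sales = conform_sales(sales_row)
from_shipping = conform_shipping(shipping_row)
```

Neither source system is modified. Each keeps its own definitions, and only the load step knows how to translate them, which is the arrangement departments were usually willing to accept. A status code missing from the mapping raises a `KeyError` here; a real load process would more likely route such rows to an exception report.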
The concept of a virtual data warehouse, with its appeal of directly accessing multiple operational systems simultaneously instead of moving data to a data warehouse, ignores both the possibility of inconsistencies among the data sources and the need to access historical data values. In fact, one of the fundamental purposes of a data warehouse is to hold time-variant snapshots of data so that trends can be discovered and current values compared with historical ones.
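The value of time-variant snapshots can be illustrated with a small Python sketch. It is only one simple way to model the idea, under assumed names: the `SnapshotStore` class, its methods, and the sample inventory figures are hypothetical. An operational system typically holds only the current value, whereas the store below keeps every dated value it is given.

```python
from bisect import bisect_right
from datetime import date


class SnapshotStore:
    """Hypothetical append-only store of dated values per key."""

    def __init__(self):
        self._history = {}  # key -> list of (as_of, value), sorted by date

    def load(self, key, as_of, value):
        """Record the value observed for `key` on date `as_of`."""
        rows = self._history.setdefault(key, [])
        rows.append((as_of, value))
        rows.sort(key=lambda row: row[0])

    def value_as_of(self, key, when):
        """Return the latest value recorded on or before `when`, or None."""
        rows = self._history.get(key, [])
        dates = [as_of for as_of, _ in rows]
        i = bisect_right(dates, when)
        return rows[i - 1][1] if i else None

    def change(self, key, start, end):
        """Difference between the values in effect on two dates."""
        before = self.value_as_of(key, start)
        after = self.value_as_of(key, end)
        if before is None or after is None:
            return None
        return after - before


store = SnapshotStore()
store.load("units_on_hand", date(2018, 1, 31), 120)
store.load("units_on_hand", date(2018, 2, 28), 150)
store.load("units_on_hand", date(2018, 3, 31), 135)

mid_march = store.value_as_of("units_on_hand", date(2018, 3, 15))
q1_change = store.change("units_on_hand", date(2018, 1, 31), date(2018, 3, 31))
```

A query that goes straight to the operational system could answer "what is the value now?" but not "what was it in mid-March?" or "how did it change over the quarter?", because the earlier values would already have been overwritten.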
Much has been written about data lakes and their ability to store data in systems such as the Hadoop Distributed File System (HDFS) for potential use in data mining and other analytics. However, simply saving data without procedures for ensuring data quality often leads to the creation of a data swamp or, even worse, a polluted data cesspool.
The Multiplatform Future
We must also remember that a successful data warehouse environment does not necessarily consist of a single platform. Although technology advances and cost reductions in processing power, memory, storage, cloud computing, and database and analytics software have made it possible to perform tasks in near real time that were not feasible just a few years ago, this does not displace the need for traditional data warehouses.
Our data warehouse architectures will likely embrace multiple platforms including data marts, operational data stores, data lakes, and traditional data warehouses. Our architectures can even accommodate some of the benefits of a virtual data warehouse as long as it's integrated with a traditional data warehouse containing historical values and each architectural component conforms to the organization's data definitions.
About the Author
Michael A. Schiff is founder and principal analyst of MAS Strategies, which specializes in formulating effective data warehousing strategies. With more than four decades of industry experience as a developer, user, consultant, vendor, and industry analyst, Mike is an expert in developing, marketing, and implementing solutions that transform operational data into useful decision-enabling information.
His prior experience as an IT director and systems and programming manager provides him with a thorough understanding of the technical, business, and political issues that must be addressed for any successful implementation. With Bachelor and Master of Science degrees from MIT's Sloan School of Management and as a certified financial planner, Mike can address both the technical and financial aspects of data warehousing and business intelligence.