Modernizing Data Integration and Data Warehousing with Data Hubs
As data and its management continue to evolve, users should consider a variety of modernization strategies, including data hubs.
By Philip Russom, TDWI Research Director for Data Management
This week, I spoke in a webinar run by Informatica Corporation, sharing the stage with Informatica’s Scott Hedrick. Scott and I had an interactive conversation where we discussed modernization trends and options, as faced today by data management professionals and the organizations they serve. Since data hubs are a common strategy for capturing modern data and for modernizing data integration architectures, we included a special focus on hubs in our conversation. We also drilled into how modern hubs can boost various applications in analytics and application data integration operations.
Scott and I organized the webinar around a series of questions. Please allow me to summarize the webinar by posing the questions with brief answers:
What is data management modernization?
It’s the improvement of tools, platforms, and solutions for data integration and other data management disciplines, plus the modernization of both technical and business users’ skills for working with data. Modernization is usually selective, in that it may focus on server upgrades, new datasets, new data types, or how all the aforementioned satisfy new data-driven business requirements for new analytics, complete views, and integrating data across multiple operational applications.
What trends in data management drive modernization?
Just about everything in and around data management is evolving. Data itself is evolving into more massive volumes of greater structural diversity, coming from more sources than ever and generated faster and more frequently than ever. The way we capture and manage data is likewise evolving, with new data platforms (appliances, columnar databases, Hadoop, etc.) and new techniques (data exploration, discovery, prep, lakes, etc.). Businesses are evolving, too, as they seek greater business value and organizational advantage from growing and diversifying data, often through analytics.
What is the business value of modernizing data management?
A survey run by TDWI in late 2015 asked users to identify the top benefits of modernizing data. In priority order, they noted improvements in analytics, decision making (both strategic and operational), real-time reporting and analytics, operational efficiency, agile tech and nimble business, competitive advantage, new business requirements, and complete views of customers and other important business entities.
What are common challenges to modernizing data management?
The TDWI survey mentioned above uncovered the following challenges (in priority order): poor stewardship or governance, poor quality data or metadata, inadequate staffing or skills, funding or sponsorship, and the growing complexity of data management architectures.
What are the best practices for modernizing data management?
First and foremost, everyone must ensure that the modernization of data management aligns with the stated goals of the organization, which in turn helps secure sponsorship and a return on the investment. Replace, update, or redesign one component of data management infrastructure at a time, to avoid a risky big-bang project. Don’t forget to modernize your people by training them in new skills and officially supporting new competencies on your development team. Modernization may lead you to embrace best practices that are new to you. Common ones today include: agile development, lightweight data prep, right-time data movement, multiple ingestion techniques, non-traditional data, and new data platform types.
As a special case, TDWI sees various types of data hubs playing substantial roles in data management modernization, because they can support a wide range of datasets (from landing to complete views to analytics) and do so with better and easier data governance, audit trails, and collaboration. Plus, modernizing your data management infrastructure by adding a data hub is an incremental improvement, instead of a risky, disruptive rip-and-replace project.
What’s driving users toward the use of modern data hubs?
Data integration based on a data hub addresses two of the biggest problems in data management design and development: point-to-point interfaces (which limit reuse and standards, and are impossible to maintain or optimize) and traditional waterfall or similar development methods (which take months to complete and are difficult to keep aligned with business goals).
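To make the point-to-point problem concrete, here is a small back-of-the-envelope sketch. It assumes the worst case, where every pair of systems needs its own interface, and compares that with a hub where each system needs only one connection to the hub. The function names are illustrative, and the webinar did not cite specific figures.

```python
def point_to_point_interfaces(n_systems: int) -> int:
    """Worst case: one interface for every pair of systems."""
    return n_systems * (n_systems - 1) // 2


def hub_interfaces(n_systems: int) -> int:
    """With a hub, each system needs a single connection to the hub."""
    return n_systems


if __name__ == "__main__":
    # Compare how the interface count grows as systems are added.
    for n in (5, 10, 20):
        print(
            f"{n} systems: "
            f"{point_to_point_interfaces(n)} point-to-point interfaces, "
            f"{hub_interfaces(n)} hub connections"
        )
```

Under this assumption, the point-to-point count grows roughly with the square of the number of systems, while the hub count grows linearly, which is one way to see why point-to-point designs become hard to maintain.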
What functions and benefits should users expect from a vendor-built data hub?
Vendor-built data hubs support advanced functions that are impossible for most user organizations to build themselves. These functions include: controlled and governable publish and subscribe methods; the orchestration of workflows and data flows across multiple systems; easy-to-use GUIs and wizards that enable self-service data access; and visibility and collaboration for both technical and business people across a range of data.
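As a rough illustration of what "controlled and governable publish and subscribe" can mean, the following is a minimal in-memory sketch. It is not how any vendor's hub is implemented. The class name DataHub, its methods, and the topic and system names are all hypothetical, chosen only to show the idea of access grants plus an audit trail around publish and subscribe.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Set, Tuple

Handler = Callable[[dict], None]


class DataHub:
    """Hypothetical toy hub: governed subscriptions and an audit trail."""

    def __init__(self) -> None:
        self._allowed: Dict[str, Set[str]] = defaultdict(set)
        self._subscribers: Dict[str, List[Tuple[str, Handler]]] = defaultdict(list)
        # Each audit entry is (action, actor, topic).
        self.audit_log: List[Tuple[str, str, str]] = []

    def grant(self, topic: str, subscriber: str) -> None:
        """Governance step: allow a named system to subscribe to a topic."""
        self._allowed[topic].add(subscriber)
        self.audit_log.append(("grant", subscriber, topic))

    def subscribe(self, topic: str, subscriber: str, handler: Handler) -> None:
        """Register a handler, but only for systems that were granted access."""
        if subscriber not in self._allowed[topic]:
            raise PermissionError(f"{subscriber} may not subscribe to {topic}")
        self._subscribers[topic].append((subscriber, handler))
        self.audit_log.append(("subscribe", subscriber, topic))

    def publish(self, topic: str, publisher: str, record: dict) -> int:
        """Deliver a copy of the record to every subscriber; return the count."""
        self.audit_log.append(("publish", publisher, topic))
        delivered = 0
        for name, handler in self._subscribers[topic]:
            handler(dict(record))  # copy so subscribers cannot alter each other's data
            self.audit_log.append(("deliver", name, topic))
            delivered += 1
        return delivered


if __name__ == "__main__":
    hub = DataHub()
    crm_inbox: List[dict] = []
    hub.grant("customer", "crm")
    hub.subscribe("customer", "crm", crm_inbox.append)
    hub.publish("customer", "billing", {"id": 1, "name": "Acme"})
    print(crm_inbox)
    print(hub.audit_log)
```

The publisher never calls a subscriber directly, so adding or removing a consuming system does not require changes to the publishing system. A real hub would add persistence, scheduling, and error handling, which this sketch leaves out.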
Data hubs are great for analytics. But what about data hubs for operational applications and their data?
Instead of consolidating large operational applications in a multi-month or multi-year project, some users integrate and modernize them quickly at the data level via a shared data hub, perhaps on a cloud. For organizations with multiple customer-facing applications for customer relationship management (CRM) and sales force automation (SFA), a data hub can be a single, trusted version of customer data, which is replicated and synchronized across all these applications. A data hub also gives users of operational applications functions that extend what they can do in their jobs, namely self-service data access and collaboration over operational data.
What does a truly modern data hub offer as storage options?
Almost all home-grown data hubs and most vendor-built hubs are based on one brand of relational database management system, despite the fact that data’s schemas, formats, models, structures, and file types are diversifying aggressively. A modern data hub must support relational databases (because these continue to be vital for data management), but it must also support newer databases, file systems, and, very importantly, Hadoop.
If you’d like to hear more of my discussion with Informatica’s Scott Hedrick, please click here to replay the Informatica Webinar.
Posted on March 29, 2016