Data Decentralized: Empowering Teams with a Data Mesh Approach
Why forward-thinking organizations are turning to a data mesh architecture.
- By Natalie Waller
- April 15, 2024
In the era of AI, advanced analytics, and machine learning, the ability to manage and access data efficiently has become a cornerstone of corporate strategy. As enterprises grow, the demand for timely and accurate data from various departments intensifies, posing a significant challenge for centralized data teams. A scenario where five different departments each make five data requests quickly escalates to 25 action items, leading to bottlenecks and potential privacy breaches as individuals attempt to self-serve data.
To navigate this complex landscape, forward-thinking organizations are turning to a data mesh architecture, offering a scalable, secure solution to data access and management challenges.
The Limitations of Centralized Data Management
Centralized data management, although effective to a degree, struggles under the weight of enterprise-scale demands. There are just too many unique data sources, different types of users, and complex regulatory restrictions to keep in mind for a central data team to manage all incoming requests.
When an organization spans multiple geographies or has evolved through mergers and acquisitions to have a wide variety of data sources, use cases, and systems at each subsidiary, a purely centralized strategy can become unworkable. Diverse tools, systems, and regulatory requirements in use at different business units can create a patchwork that’s tough to manage using home-built or legacy approaches to data management. With each department using its own tools and processes, an organization’s data can end up siloed in SaaS applications, IoT services, and mainframe applications and databases.
The Data Mesh Solution
A data mesh architecture addresses these challenges by decentralizing data ownership to domain-specific teams while maintaining a cohesive governance structure. This approach alleviates the pressure on central data teams to serve users and empowers domain experts to manage and use data more effectively under the oversight of a centralized governance framework.
The transition to a data mesh involves moving from siloed data and a centralized data fabric to domain-oriented decentralization, ultimately establishing a fully functional data mesh where teams autonomously manage their data while the central data team oversees organization-wide governance protocols. Improving visibility, control, and scale are key steps to successfully deploying a data mesh architecture.
By moving to a data mesh strategy, an enterprise can eliminate bottlenecks and rogue data access by extending pipeline ownership to select business domains. Those domains process data under procedures set and monitored by the core data team, which retains complete visibility into the process and the ability to intervene if necessary.
With this approach, the core data team members can shift their focus to policy enforcement as they oversee the management of data across business domains.
Building a Center of Excellence for Data Mesh
The foundation of a successful data mesh strategy lies in establishing a data center of excellence (DCoE). This entity serves as a guiding force, providing resources, technology solutions, and platforms to facilitate the transition to a resilient and scalable data architecture. A DCoE enhances data observability, controllability, and scalability, ensuring that data services can grow without compromising security or compliance.
Key benefits of a data center of excellence include:
Data observability. This involves understanding the life cycle of data, from its origin to its current state, and monitoring its access and use. It answers critical questions about data lineage, usage, and security, ensuring transparency across the organization.
Data controllability. Implementing robust control measures safeguards sensitive information and ensures that data access is appropriately managed, meeting compliance requirements and protecting against data breaches.
Data scalability. Through standardization and automation, a DCoE enables organizations to expand their data services efficiently, ensuring that new users and platforms are integrated seamlessly without sacrificing security. By allowing more self-service options for individual business groups, central data teams can focus on role-specific guidelines around data access for a whole team, without having to focus on as many individual requests. This helps a data team scale to serve more parts of the enterprise. Scaled data access and reduced bottlenecks are some of the main benefits of a data mesh, but companies need to establish full visibility and control of the data before scaling up the volume and scope of data covered by the mesh.
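The lineage questions raised under data observability can be made concrete with a small sketch. The example below assumes a hypothetical lineage graph in which each dataset lists the datasets it is derived from; the dataset names and the helper functions are illustrative only and do not come from any particular tool.

```python
from collections import deque

# Hypothetical lineage graph: each dataset maps to the datasets it is derived from.
LINEAGE = {
    "exec_report": ["marketing_mart"],
    "marketing_mart": ["ads_raw", "social_raw"],
    "ads_raw": [],
    "social_raw": [],
}


def upstream_sources(dataset, lineage):
    """Return every dataset that `dataset` depends on, directly or indirectly."""
    seen = set()
    queue = deque(lineage.get(dataset, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # already visited; also guards against cycles
        seen.add(current)
        queue.extend(lineage.get(current, []))
    return seen


def origins(dataset, lineage):
    """Return the upstream datasets that have no parents of their own."""
    return {d for d in upstream_sources(dataset, lineage) if not lineage.get(d)}
```

Calling `origins("exec_report", LINEAGE)` walks the graph back to the raw sources, which is the kind of answer a central team would need when auditing where a report's data came from.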
Metadata: The Key to Visibility and Governance
At the heart of any data management strategy is the effective use of metadata. Metadata provides the context needed for comprehensive observability and governance, enabling organizations to track data provenance, access, and current location. Visibility, monitoring, and auditability are key components of a data mesh, and metadata helps enable them.
By normalizing and sharing metadata across a centralized metadata catalog, companies can streamline monitoring, auditing, and analysis, ensuring that data governance is maintained even as data is decentralized across the mesh. By monitoring metadata, central data teams can oversee user requests without having to approve every single ask for data.
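As a rough illustration of normalizing metadata into a central catalog, the sketch below assumes each domain reports records with its own field names, and maps them onto one shared schema. The `CatalogEntry` fields, the alias table, and the fail-safe default for missing sensitivity flags are all assumptions made for this example, not features of a specific catalog product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    dataset: str
    domain: str
    owner: str
    contains_pii: bool


# Hypothetical field names used by different domains, mapped to the shared schema.
FIELD_ALIASES = {
    "dataset": ("dataset", "table", "name"),
    "owner": ("owner", "steward", "contact"),
    "contains_pii": ("contains_pii", "pii", "sensitive"),
}


def normalize(domain, raw):
    """Map one domain's metadata record onto the shared catalog schema."""

    def pick(field, default=None):
        for alias in FIELD_ALIASES[field]:
            if alias in raw:
                return raw[alias]
        return default

    dataset = pick("dataset")
    if dataset is None:
        raise ValueError(f"record from {domain} has no dataset name: {raw}")
    return CatalogEntry(
        dataset=str(dataset).lower(),
        domain=domain,
        owner=str(pick("owner", "unknown")),
        # A missing sensitivity flag is treated as sensitive so gaps fail safe.
        contains_pii=bool(pick("contains_pii", True)),
    )


def build_catalog(records_by_domain):
    """Collect normalized entries from every domain, keyed by 'domain.dataset'."""
    catalog = {}
    for domain, records in records_by_domain.items():
        for raw in records:
            entry = normalize(domain, raw)
            catalog[f"{domain}.{entry.dataset}"] = entry
    return catalog
```

With every domain's records in one shape, a central team could query a single structure for questions such as which datasets contain personal data and who owns them.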
Effective metadata management enhances data quality, trust, and discoverability, enabling advanced analytics and machine learning applications. It adds valuable context to data, facilitating easier root cause and impact analysis. Notably, well-managed metadata unlocks easy auditing of compliance and security, reducing the risk of data breaches and privacy-violation fines.
Sophisticated data teams can also leverage granular role-based access control (RBAC) settings to make sure users only have access to the data they need to perform their jobs. For highly regulated data, enterprises should leverage advanced security measures such as private networking to avoid exposure to the public internet, customer managed keys to maintain full control over access to data, and cloud-region selection for data residency requirements.
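A minimal sketch of the role-based access idea follows. It assumes roles are granted (dataset, action) pairs and users are assigned roles; the role names, dataset names, and the `is_allowed` helper are hypothetical, and a production platform would enforce this in the warehouse itself rather than in application code.

```python
# Hypothetical roles, each holding a set of (dataset, action) grants.
ROLE_GRANTS = {
    "marketing_analyst": {("marketing_mart", "read")},
    "marketing_owner": {("marketing_mart", "read"), ("marketing_mart", "write")},
    "core_data_engineer": {("marketing_mart", "read"), ("customer_pii", "read")},
}

# Hypothetical assignment of users to roles.
USER_ROLES = {
    "ana": {"marketing_owner"},
    "ben": {"marketing_analyst"},
}


def is_allowed(user, dataset, action, user_roles=USER_ROLES, role_grants=ROLE_GRANTS):
    """Deny by default; allow only when one of the user's roles holds the grant."""
    return any(
        (dataset, action) in role_grants.get(role, set())
        for role in user_roles.get(user, set())
    )
```

Because the check denies by default, an unknown user or an unlisted dataset yields no access, which matches the goal of limiting users to the data they need for their jobs.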
A Real-World Application: Care.com's Journey to Data Mesh
Care.com's evolution from siloed data management to a fully realized data mesh architecture highlights the practical benefits of this approach. Facing the complexity of connecting a diverse user base across 17 countries with a large network of service providers, Care.com transitioned from centralized data management to a domain-oriented, decentralized structure. The company has a high volume of site visits and transactions every day, but its legacy architecture and homegrown overnight batch processing tool could generate performance reports only once a day.
The company’s teams of data engineers focus on day-to-day business operations and innovation. Database administrators manage various data warehouses and databases, and an analytics engineer leads cross-functional data enablement.
Their first project was to centralize all their advertising, social media, and marketing data to visualize marketing attribution efforts and streamline conversion campaigns. Next, the team took on migrating to the cloud to modernize their data infrastructure and improve reporting capabilities. Care.com leveraged role-based access control, metadata, and “shoppable” Snowflake data marts to eliminate data silos and bottlenecks.
Since kickoff, Care.com has migrated almost all of its raw data out of its legacy databases and into Snowflake, where it created self-service data marts for its analysts. Care.com analysts can “shop” for clean, queryable data in Snowflake instead of waiting for engineering to build new pipelines and prepare the data for them. Care.com is now able to deliver intra-hour, near-real-time reporting, as data is uploaded to the cloud warehouse and then pulled into Tableau to refresh executive performance reports every hour.
By adopting advanced access control settings as part of a data mesh, Care.com has the flexibility to expand access to data by sharing platform access and the pipelining workload with employees outside the core data team. Care.com’s marketing data owners have custom access that allows them to introduce and incorporate new marketing sources and make changes to their existing pipelines without going through the core data team. With this managed access, the marketing team can iterate faster than before without accidentally accessing or sharing something it shouldn’t. The rest of the company’s more sensitive data is still owned by the core data team, but this team’s work is streamlined by automation.
Conclusion: The Strategic Imperative of Data Mesh Architecture
For enterprises aiming to harness the full potential of their data in a scalable, secure manner, adopting a data mesh architecture is not just an option but a necessity. By improving visibility, control, and automation with a data mesh, an organization can safely scale its data infrastructure. With the tangible benefits of a DCoE, a data mesh approach enables an organization to navigate the complexities of modern data ecosystems. By empowering domain-specific teams with the autonomy to manage data within a robust governance framework, enterprises can unlock new levels of efficiency, innovation, and competitive advantage.
As working with data becomes part of everyone’s job, approaches like using a data center of excellence and implementing a data mesh will help reduce roadblocks around data ownership between departments while making it easier for central data teams to serve users in a seamless way that meets their timely needs. A data mesh architecture improves observability, controllability, and scalability while allowing individual departments to innovate and manage their data effectively so they can respond to business demands quickly.