The hub-and-spoke model: an alternative to data mesh



Data mesh is a hot topic in the data and analytics community. Introduced in 2020 by Zhamak Dehghani in her paper “Data Mesh Principles and Logical Architecture,” data mesh is a new distributed model for organizing analytics teams to deliver data products, intended to address the challenges of both centralized and decentralized data. But is this approach really the best one for today’s businesses?

Organizational models for analytics

Over the years, we have seen both centralized and decentralized organizational models for delivering analytics to the business. Although both models have their advantages, each has some serious disadvantages that make it insufficient to meet the needs of today’s data-hungry consumers.

1. Centralized model

The data warehouse enables businesses to store data in a single, consolidated location so that, in theory, everyone can find and query their data with confidence. With central control over the data platform and standards, data can be consistently defined and reliably delivered.

Figure 1: Centralized model for data and analytics management

In practice, however, there are some major problems with this approach. First, the data must be modeled and loaded so carefully that only IT has the necessary skills to build the data warehouse. This makes IT a bottleneck for integrating new data. Second, since the IT team usually does not understand the business, it struggles to translate business requirements into technical requirements, further aggravating the bottleneck and frustrating its customers. Finally, business users struggle to navigate thousands of data warehouse tables, which leaves the centralized data warehouse attractive only to the most sophisticated users.

2. Decentralized model

Driven by end-user frustration and the explosion in popularity of visualization tools like Tableau, business users have taken matters into their own hands with a decentralized approach. Instead of waiting for IT to deliver data, business users have created their own data extracts, data models and reports. By decentralizing data preparation, business users have broken away from IT and avoided the “lost in translation” issue associated with the centralized, IT-led approach.

Figure 2: Decentralized model for data and analytics management

In practice, however, this approach, like the centralized one, has brought some major challenges of its own. First, with no control over business definitions, business users have created their own versions of reality with each dashboard they have built. As a result, competing business definitions and results have eroded management’s trust in analytics outputs. Second, the decentralized approach has resulted in a proliferation of competing and often incompatible platforms and tools, making it difficult or impossible to integrate analytics across business units.

The data mesh

Data mesh is meant to address the challenges of both models. It assumes that today’s data is distributed and allows all users in an organization to obtain and analyze business insights from virtually any data source, without the intervention of expert data teams. It is based more on people and organization than on technology, and that is why it is so compelling. The distributed architecture of a mesh decentralizes ownership to each business domain. This means that each domain has control over the quality, privacy, freshness, accuracy and compliance of its data for analytical and operational use cases.

However, the data mesh approach advocates a fully decentralized organizational model by abolishing the centralized team altogether. I would like to propose an alternative that establishes a center of excellence to make a decentralized model of data management viable for most businesses.

Hub-and-spoke model: an alternative to data mesh

It is clear that no approach, centralized or decentralized, can deliver agility and consistency at the same time. These goals are in conflict. However, there is a model that can deliver the best of both worlds if implemented with proper tools and processes.

The “hub-and-spoke” model is an alternative to the data mesh architecture with some critical differences. Namely, the hub-and-spoke model introduces a central data team, or center of excellence (the “hub”). This team owns the data platform, tools and process standards, while the business domain teams (the “spokes”) own the data products for their domains. This approach solves the “anything goes” phenomenon of the decentralized model, while empowering subject matter experts (SMEs), or data stewards, to create autonomous data products that meet their needs.

Figure 3: Hub-and-spoke model for data and analytics management

Supporting a decentralized, hub-and-spoke model for creating data products requires teams to speak a common data language, and that language is not SQL. What is needed is a logical way to define data relationships and business logic that is separate and distinct from the physical representation of the data. A semantic data model is an ideal candidate to serve as the Rosetta Stone for diverse data domain teams because it can be used to create a digital twin of the business by mapping physical data to business-friendly terms. Domain experts can encode their business knowledge in digital form for others to query, connect to and improve.
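To make the idea concrete, here is a minimal sketch of what a semantic model looks like in spirit: business-friendly terms mapped onto physical tables and columns, so a query can be expressed in domain language and translated to SQL. All class and table names here are hypothetical illustrations, not any vendor’s actual API.

```python
from dataclasses import dataclass, field

@dataclass
class Measure:
    name: str          # business-friendly term, e.g. "Total Revenue"
    column: str        # physical column it maps to
    aggregation: str   # e.g. "sum", "avg"

@dataclass
class SemanticModel:
    name: str
    physical_table: str                      # e.g. "warehouse.fact_orders"
    measures: dict = field(default_factory=dict)

    def add_measure(self, m: Measure) -> None:
        self.measures[m.name] = m

    def to_sql(self, measure_name: str) -> str:
        """Translate a business term into a query against physical storage."""
        m = self.measures[measure_name]
        return f"SELECT {m.aggregation}({m.column}) FROM {self.physical_table}"

# A domain expert encodes business knowledge once; consumers query by name.
orders = SemanticModel("Orders", "warehouse.fact_orders")
orders.add_measure(Measure("Total Revenue", "order_amount", "sum"))
print(orders.to_sql("Total Revenue"))
# prints: SELECT sum(order_amount) FROM warehouse.fact_orders
```

The point of the separation is that the physical table can move or be renamed without breaking consumers: only the mapping inside the model changes, while the business term “Total Revenue” stays stable.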

For this approach to work at scale, it is critical to implement a common semantic layer platform that supports data sharing, conformed dimensions, collaboration and ownership. With a semantic layer, the central data team (hub) can define shared models and conformed dimensions (i.e., time, product, customer) while the domain experts (spokes) own and define their business process models (e.g., “invoicing,” “shipping,” “leads”). With the ability to share model assets, business users can combine their models with models from other domains to create new compositions that answer deeper questions.
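The hub-and-spoke split above can be sketched as follows: the hub publishes conformed dimensions once, and each spoke joins its own fact table to them instead of redefining “customer” or “time” locally. Table and column names are hypothetical, chosen only for illustration.

```python
# Hub-owned conformed dimensions, published for all domain teams to reuse.
hub_dimensions = {
    "time": {"table": "dim_date", "key": "date_key"},
    "customer": {"table": "dim_customer", "key": "customer_key"},
}

def build_domain_query(fact_table: str, measure: str, dimension: str) -> str:
    """Join a spoke-owned fact table to a hub-owned conformed dimension."""
    dim = hub_dimensions[dimension]
    return (
        f"SELECT d.*, sum(f.{measure}) AS total "
        f"FROM {fact_table} f "
        f"JOIN {dim['table']} d ON f.{dim['key']} = d.{dim['key']} "
        f"GROUP BY d.{dim['key']}"
    )

# An "invoicing" spoke and a "shipping" spoke reuse the same customer
# dimension, so their results conform and can be combined downstream.
invoicing = build_domain_query("fact_invoices", "invoice_amount", "customer")
shipping = build_domain_query("fact_shipments", "shipment_count", "customer")
print(invoicing)
print(shipping)
```

Because both queries group by the same `dim_customer` keys, their outputs line up customer-for-customer, which is exactly what makes cross-domain compositions possible without either team understanding the other’s business model.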

Figure 4: Combination of shared models and domain-specific models

The hub-and-spoke model succeeds because it plays to the strengths of both the centralized and the business domain teams: the centralized team owns and operates the technical platform and publishes shared models, while the business teams create domain-specific data products using a consistent set of business definitions and without needing to understand other domains’ business models.

How to get there

Moving to a hub-and-spoke model for delivering data products does not have to be disruptive. There are two paths to success, depending on your existing model for analytics delivery.

If your current analytics organization is centralized, the central team and business teams must jointly identify key data domains, assign data stewardship and embed an analytics engineer in each. The analytics engineer can come from the central team or the business team. Using a semantic layer platform, the embedded analytics engineer can work within the business domain team to create data models and data products for that domain. The embedded analytics engineer also works with the central data team to set standards for tools and processes while identifying shared models.

If your current organization is decentralized, you can create a central data team to set standards for tools and processes. In addition to managing the semantic layer platform and its shared objects and models, the central data team can manage data pipelines and data platforms shared by the domain teams.

Built to scale

The optimal organizational model for analytics will depend on your organization’s size and maturity. However, it is never too early to build for scale. No matter how small your organization, investing in a hub-and-spoke, decentralized model for creating data products will pay dividends now and in the future. By promoting data stewardship and ownership by domain experts, using a common set of tools and semantic definitions, your entire organization will be empowered to create data products at scale.

David P Mariani is CTO and co-founder of AtScale, Inc.
