Data fabrics have gained importance over the last year as more organizations seek to leverage their data and intellectual property in generative AI solutions. Data fabrics help address the challenges of centralizing data across business units and data sets housed in public clouds, data centers, and SaaS solutions. As a result, AI systems, machine learning models, and people can use real-time data faster and more easily.
When I wrote about data fabrics, data meshes, and cloud databases last year, I focused on how data leaders could explain these technologies to business executives without drowning in jargon. I empathized with chief data officers facing pushback from executives who recall similar investments in big data, data lakes, lakehouses, and cloud migrations.
The question for many large enterprises may not be whether they need a data fabric but how to evaluate the options, which ones address their business needs, and how to implement them efficiently. Evaluating new vendor offerings is not a trivial undertaking; the 2024 Forrester Wave on Enterprise Data Fabric reports two dozen new vendors offering fabric capabilities since its 2022 evaluation.
Why large businesses need data fabrics
Let’s consider a couple of business scenarios where data fabrics would provide value.
First, a large global manufacturer running SAP for its financials seeks to build end-to-end genAI-enabled operational workflows with data in its other enterprise systems, SaaS platforms, and cloud databases. The company needs a seamless way to connect these different data sources to perform real-time analytics and enable employees to use genAI prompts to query for information. SAP Datasphere is their solution to the integration challenges.
The second example is a government agency that uses a low-code platform for case management and interdepartmental workflows but now needs to integrate with data stores in its human resources and financial systems. The agency investigates two low-code business process automation platforms, Appian Data Fabric and Pega Process Fabric. Both help integrate data and workflows between their platforms and the wider ecosystem of enterprise solutions.
Other data fabric solutions featured in Forrester’s Wave are Cloudera, Informatica, Denodo, Google, Hewlett Packard Enterprise, IBM, InterSystems, K2view, Microsoft, Oracle, Qlik, Solix Technologies, Teradata, and TIBCO Software.
“A data fabric integrates various structured and unstructured data sources to provide a unified view and access across your enterprise to accelerate business insights,” says Armon Petrossian, CEO and co-founder of Coalesce. “When implementing a data fabric, it’s important to consider scalability to handle large data volumes, flexibility for different data types, and robust security measures.”
Data fabrics take an application- and people-centric approach by centralizing access and providing management services. A data fabric answers the question: How can data engineers simplify standard access patterns for consuming applications without additional data management and excessive engineering work?
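To make that question concrete, here is a minimal sketch, in Python, of the access pattern a data fabric standardizes: consuming applications query one interface instead of each source's native API. Every class, dataset, and field name here is hypothetical and purely illustrative, not any vendor's actual API.

```python
# Hypothetical sketch of a unified access layer. Consuming applications
# call one client; the fabric layer routes requests to registered sources.

class DataFabricClient:
    """Routes dataset requests to registered sources behind one interface."""

    def __init__(self):
        self._sources = {}  # dataset name -> callable that fetches rows

    def register(self, dataset, fetcher):
        """Data engineers register each source once; consumers never see it."""
        self._sources[dataset] = fetcher

    def query(self, dataset, **filters):
        """One standard access pattern, regardless of where the data lives."""
        if dataset not in self._sources:
            raise KeyError(f"Unknown dataset: {dataset}")
        rows = self._sources[dataset]()
        # Apply simple equality filters uniformly across all sources.
        return [r for r in rows if all(r.get(k) == v for k, v in filters.items())]

# Illustrative usage: in practice, the fetcher would wrap a SaaS API,
# a cloud database connection, or a streaming source.
fabric = DataFabricClient()
fabric.register("orders", lambda: [
    {"id": 1, "region": "EMEA"},
    {"id": 2, "region": "APAC"},
])
print(fabric.query("orders", region="EMEA"))  # [{'id': 1, 'region': 'EMEA'}]
```

The point of the design is that adding a new source is one registration call for data engineers, with no change to consuming applications.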
“A data fabric is a solution designed to integrate, manage, and orchestrate data across diverse sources and environments,” says Kaycee Lai, CEO of Promethium. “It delivers a unified and consistent view of the relevant data in an enterprise coupled with capabilities enabling seamless data discovery, virtual data integration, and data product delivery.”
Lai shares three indicators for organizations needing a data fabric:
- The company is experiencing data silos and fragmentation.
- Business users require real-time analytics for immediate decision-making.
- Leadership wants to enable generative AI and empower self-service analytics for business users.
“When you decide to go with a data fabric architecture, you are going with a central data strategy,” says Hema Raghavan, head of engineering and co-founder at Kumo AI. “If your company has organized itself into lines of businesses (LOBs), and if data, insights, and models from one LOB can help another, a data fabric architecture will help you quickly realize value across different parts of your enterprise.”
How data fabrics differ from data integration platforms
Many organizations have already invested in data integration platforms to help move data between databases, data lakes, and other systems. Data pipelines and data streaming technologies introduce automation and real-time data processing capabilities, while integration platforms as a service (iPaaS) help connect data and workflow across systems.
So, how are data fabrics different from these other types of platforms?
“A data fabric is a combination of data architecture and dedicated software solutions that connect, manage, and govern metadata and data across different IT systems and business applications,” says JG Chirapurath, chief marketing and solutions officer at SAP BTP. “Implementing a data fabric strategy empowers an organization’s data users to access data in real-time, maintain a comprehensive source of an organization’s collective knowledge, and automate their data management processes.”
So, while real-time data integration and data transformation are key capabilities of data fabrics, their defining capability is providing centralized, standardized, and governed access to an enterprise’s data sources.
“When evaluating data fabrics, it’s essential to understand that they interconnect with various enterprise data sources, ensuring data is readily and rapidly available while maintaining strict data controls,” says Simon Margolis, associate CTO of AI/ML at SADA. “Unlike other data aggregation solutions, a functional data fabric serves as a ‘one-stop shop’ for data distribution across services, simplifying client access, governance, and expert control processes.”
Data fabrics thus combine features of other data governance and dataops platforms. They typically offer data cataloging functions so end-users can find and discover the organization’s data sets. Many will help data governance leaders centralize access control while providing data engineers with tools to improve data quality and create master data repositories. Other differentiating capabilities include data security, data privacy functions, and data modeling features.
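Two of the capabilities described above, a searchable catalog and centralized access control, can be sketched in a few lines. This is an illustrative toy, assuming invented dataset names, tags, and roles; real platforms implement these ideas with far richer metadata and policy engines.

```python
# Toy data catalog: dataset metadata that end users can search,
# plus a central policy check for governed access. All names invented.

catalog = {
    "customers": {"owner": "sales", "tags": ["pii", "crm"]},
    "shipments": {"owner": "logistics", "tags": ["operations"]},
}

# Explicit grants a governance team would manage centrally.
grants = {("steward", "customers")}

def discover(tag):
    """Let end users find datasets by tag, as a data catalog would."""
    return sorted(name for name, meta in catalog.items() if tag in meta["tags"])

def can_access(role, dataset):
    """Central policy check: sensitive (PII) datasets need an explicit grant."""
    if "pii" in catalog[dataset]["tags"]:
        return (role, dataset) in grants
    return True

print(discover("pii"))                     # ['customers']
print(can_access("analyst", "customers"))  # False
print(can_access("analyst", "shipments"))  # True
```

The design choice worth noting is that discovery and policy live in one place, so every consuming application gets the same answer about what data exists and who may use it.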
Business and technical benefits of data unification
Data unification implies a broad set of capabilities for business end-users and data professionals. Business leaders seek simplified and self-service capabilities, while data professionals need automation and operation capabilities to manage the organization’s disparate data sets and data types in standard ways. For organizations with many data sources and platforms, unification can efficiently connect trustworthy data with greater business capabilities.
“A robust data fabric revolutionizes data exploration by integrating industry best practices, ensuring structured and reliable processes,” says Hillary Ashton, chief product officer at Teradata. “This intelligent approach enhances the trustworthiness of your data, ultimately driving greater business value.”
One way to evaluate and justify data fabric investments is to review the complexities, cost, and time to make data available for data science initiatives. Data scientists and engineers spend 50% to 80% of their time on data wrangling, and data unification efforts can help reduce repeat efforts to join and cleanse data sources.
“The reality is that more than half of AI projects fail to move into production due to the lack of a solid enterprise data foundation,” says Midhat Shahid, VP of product management, data fabric, IBM. “Without a unified view of data across disparate silos and systems, organizations struggle to integrate and manage their data effectively. A data fabric architecture is essential for organizations to unlock the value of data across hybrid cloud IT environments.”
Unification must offer IT and data professionals options for working across different data types, out-of-the-box integrations with common platforms, automation capabilities to standardize datasets, and tools for integrating with application development and data science initiatives.
“Data unification means the ability to collect all structured, unstructured, and semi-structured data in a single data catalog view whether or not the data is physically stored on the platform,” says John Ottman, executive chairman of Solix Technologies. “With this unified data capability, practitioners can establish data governance and ACID transactions with version control throughout the data lifecycle. Data fabrics provide value by enabling data transformations required by downstream applications such as machine learning, advanced analytics, generative AI, and other NoSQL applications seeking to monetize enterprise data.”
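Ottman's point about unifying differently shaped sources can be made concrete with a small sketch: records from two hypothetical systems are normalized into one shared schema that downstream analytics, ML, or genAI pipelines can rely on. The source names and fields are invented for illustration.

```python
# Minimal normalization sketch: map source-specific record shapes onto
# one shared schema. Source names and field names are hypothetical.

def normalize(record, source):
    """Return a record in the shared schema, whatever the source's shape."""
    if source == "crm":          # semi-structured export with nested fields
        return {"customer_id": record["cust_id"],
                "email": record["contact"]["email"]}
    if source == "billing":      # flat relational row
        return {"customer_id": record["customer"],
                "email": record["email_addr"]}
    raise ValueError(f"Unknown source: {source}")

rows = [
    normalize({"cust_id": 7, "contact": {"email": "a@example.com"}}, "crm"),
    normalize({"customer": 9, "email_addr": "b@example.com"}, "billing"),
]
print(rows)  # two records, one schema
```

In a real fabric, this mapping layer is where much of the engineering effort and the domain knowledge lives.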
Challenges of implementing data fabrics
Data fabrics sound too good to be true, so I asked professionals to share some of the implementation challenges.
“Lots of businesses implementing data fabric significantly underestimate the complexity of their existing data architecture and simply dive into data fabric solutions without a comprehensive understanding of their data silos,” says Ashton of Teradata. “The second biggest mistake is overlooking the importance of data governance, trust, and security, which are critical elements for ensuring data quality and compliance.”
To successfully implement a data fabric solution, IT teams must define a vision statement, outline objectives, prioritize business needs, and evaluate platforms’ technical capabilities. Because centralization and unification are the goals, governance and security must be at the forefront of planning for a data fabric implementation.
Defining data requirements and underlying models is one area to dive into. Jay Allardyce, general manager of data and analytics at insightsoftware, says, “Despite offering a standardized approach, most data fabric solutions initially lack domain-specific context.”
For example, enterprise resource planning systems (ERPs) store rich information on the organization’s financials, products, and supply chains, while customer data platforms (CDPs) help centralize customer and prospect information from multiple marketing and sales systems. To what extent can data fabrics represent the rich and interconnected data stored in these domains?
“Ultimately, there isn’t going to be one data fabric that works for everyone’s needs because data is as diverse and unique as the people using it,” says Anais Dotis-Georgiou, lead developer advocate at InfluxData. “Touting that one data fabric could suffice for any organization is like saying that one supply chain could fit every business. Regardless of your domain, you’ll need experts who can understand the idiosyncratic features of the data, the unique challenges associated with data engineering, and how to leverage that data for meaningful data science tasks.”
Another challenge for technology teams is not paying enough attention to change management and end-user adoption.
David Cassel, author and CTO at 4V Services, says, “Data owners may fear that security requirements won’t be respected or that sharing data may threaten their role in the organization. It’s important to convince them they won’t lose control as more people benefit from their data.”
Identifying what data should be centralized and implementing data governance best practices are essential steps in the implementation plan.
“Organizations need to break down data silos by automating the integration of essential data and increase metadata maturity to continuously catalog, profile, and identify the most frequently used data,” says Emily Washington, SVP of product management at Precisely. “It’s also critical to establish robust data governance policies and practices to ensure data quality, security, and compliance, and create user-friendly ways to make that data readily available for confident decision-making across the business.”
One more recommendation comes from Jerod Johnson, senior technology evangelist at CData. “You’ll need to work with your data users and determine when they need real-time data and when historical data is better, and you’ll need to design your system, policies, and processes accordingly,” he recommends.
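Johnson's advice about matching data freshness to user needs can be sketched as a simple routing policy: queries that need fresh data go to a real-time path, while everything else goes to a cheaper historical store. The threshold and backend names here are assumptions for illustration, not a prescribed design.

```python
# Hypothetical freshness-based routing policy. The SLA value and the
# backend labels are assumptions; tune both per use case.

FRESHNESS_SLA_SECONDS = 60  # assumed cutoff for "needs real-time data"

def choose_backend(max_staleness_seconds):
    """Pick a backend based on how stale the caller can tolerate data."""
    if max_staleness_seconds <= FRESHNESS_SLA_SECONDS:
        return "streaming"   # real-time path, e.g. an event stream or CDC feed
    return "warehouse"       # batch-refreshed historical store

print(choose_backend(5))     # streaming
print(choose_backend(3600))  # warehouse
```

Encoding the policy explicitly, rather than leaving it to each team, is one way to implement the "design your system, policies, and processes accordingly" part of the recommendation.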
Who needs a data fabric?
“Data fabrics are expensive, but don’t let that distract you from the role that real-time data with low latencies can play in enhancing your customer experience,” says Khawaja Shams, founder and CEO of Momento.
Any data unification initiative connecting hundreds or more data platforms, applications, SaaS, and other services isn’t easy, fast, or inexpensive. If they were, we would have solved the challenges already with data warehouses, big data platforms, data lakes, lakehouses, and other data management platforms.
But AI is driving up the importance of unifying data, and data fabric platforms are themselves consolidating capabilities by bringing data integration, dataops, automation, self-service business capabilities, and data governance under one umbrella. Organizations feeling the weight of more data and friction in delivering analytics capabilities should investigate data fabrics as a potential solution.