It’s a problem that almost all enterprise AI projects face: accessing and governing data. Today’s AI project teams must overcome data silos, deal with copies and permutations of data, and face other challenges that complicate AI project goals and objectives.
Companies often copy their data to gather it in a single place, but copying is expensive and can create security and compliance problems throughout the life cycle of the data. Valid reasons to consolidate data still exist in some cases. A data fabric is an architectural alternative that companies can pull from their toolbox, allowing them to:
- Access the data in place
- Manage the life cycle of the data
- Use automation to move the data
This article covers three distinct areas:
- The data fabric concept and the approach to deploying it
- How to bring the data fabric concept to life within your enterprise
- How to communicate the value of data infrastructure and a data fabric to leadership
The data fabric is an architectural alternative to data consolidation.
The concept of the data fabric is guided by three ideas: accessing the data in place, managing the entire life cycle of the data in a distributed way, and using automation to offer a convenient means of moving the data.
AI goes beyond just automation.
We often see AI as a source of automation and efficiency, but its value goes beyond that. Working over a data fabric that feeds BI, AI can provide new ways to report on the data and to find connections and similarities in it, helping us unlock new capabilities.
Data fabric is a design concept that serves as an integrated layer (fabric) of data and connecting processes. A data fabric utilizes continuous analytics over existing, discoverable, and inferenced metadata assets to support the design, deployment, and utilization of integrated and reusable data across all environments, including hybrid and multi-cloud platforms. Data fabric leverages both human and machine capabilities to access data in place or support its consolidation in the required cases. It continuously identifies and connects data from disparate applications to discover unique, business-relevant relationships between the available data points. A supply chain leader using a data fabric can add newly encountered data assets to known relationships between supplier delays and production delays more rapidly and improve decisions with the new data (or for new suppliers or new customers).
- Data fabric is not merely a combination of traditional and contemporary technologies but a design concept that changes the focus of human and machine workloads
- New and upcoming technologies such as semantic knowledge graphs, active metadata management, and embedded machine learning (ML) are required to realize the data fabric design
- The design optimizes data management by automating repetitive tasks such as profiling datasets, discovering and aligning schemas for new data sources, and, at its most advanced, healing failed data integration jobs
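Two of the repetitive tasks named above, profiling a dataset and aligning the schema of a new source, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`profile`, `align_schema`), the sample supplier rows, and the name-normalization matching rule are all hypothetical, and a real data fabric would rely on active metadata and ML rather than simple name matching.

```python
from collections import Counter


def profile(rows):
    """Return per-column stats: dominant type, null count, distinct count."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            stats = columns.setdefault(
                name, {"types": Counter(), "nulls": 0, "values": set()}
            )
            if value is None:
                stats["nulls"] += 1
            else:
                stats["types"][type(value).__name__] += 1
                stats["values"].add(value)
    return {
        name: {
            "type": s["types"].most_common(1)[0][0] if s["types"] else "unknown",
            "nulls": s["nulls"],
            "distinct": len(s["values"]),
        }
        for name, s in columns.items()
    }


def normalize(name):
    """Lowercase a column name and drop everything except letters and digits."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def align_schema(known_columns, new_columns):
    """Map each new column to a known column with the same normalized name."""
    lookup = {normalize(c): c for c in known_columns}
    return {c: lookup.get(normalize(c)) for c in new_columns}


# Hypothetical rows from a newly encountered supplier data source.
new_source = [
    {"Supplier_ID": 17, "Delay Days": 3, "region": "EU"},
    {"Supplier_ID": 42, "Delay Days": None, "region": "EU"},
]

report = profile(new_source)
mapping = align_schema(["supplier_id", "delay_days", "plant"], new_source[0].keys())
```

Columns that map to `None` (here `region`) are the ones a data steward would still need to review by hand.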
If you are trying to put data to work, a data fabric plays a key role, whether the goal is training a data science model to make predictions, segmenting customers, analyzing a promotion to make it more attractive, or running business analytics to surface critical insights. A data fabric will help you understand the performance of your business. Today, most of our customers copy data from wherever it originates and consolidate it in a single place. The problem is that this is expensive. It also proliferates data, which causes all sorts of nasty ripple effects, starting with data quality: an issue you remediate in one copy has to be remediated again in every other copy. Governance and compliance problems follow, not to mention security issues, because every single copy needs to be protected. In some cases, especially when the data contains PII, compliance obligations mean it cannot be stored in certain locations.
You need to manage the entire life cycle of the data in a distributed way while also managing its governance and policies. There are still good reasons to copy data and consolidate it into, say, an authoritative source, and a data fabric makes it easy to onboard that data through automation. So the data fabric is an architectural alternative to the consolidation approach that most companies have had in their toolbox for putting data to work. It is guided by those three ideas: accessing data in place, managing the entire life cycle of the data in a distributed way (optionally with central governance), and using automation to offer a convenient means of moving the data if you want to.
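The first idea, accessing data in place, can be illustrated with a small sketch that joins two separate stores at query time instead of copying either into the other. It uses Python's built-in `sqlite3` module and SQLite's `ATTACH DATABASE` statement. The two in-memory databases, the table names, and the sample rows are assumptions made for illustration; in practice the stores would be separate systems reached through a virtualization layer.

```python
import sqlite3

# The main connection stands in for an "orders" system.
conn = sqlite3.connect(":memory:")
# A second, separate in-memory database stands in for a CRM system.
conn.execute("ATTACH DATABASE ':memory:' AS crm")

conn.execute("CREATE TABLE main.orders (customer_id INTEGER, amount REAL)")
conn.execute("CREATE TABLE crm.customers (id INTEGER, segment TEXT)")
conn.executemany(
    "INSERT INTO main.orders VALUES (?, ?)",
    [(1, 120.0), (2, 80.0), (1, 40.0)],
)
conn.executemany(
    "INSERT INTO crm.customers VALUES (?, ?)",
    [(1, "retail"), (2, "wholesale")],
)


def revenue_by_segment(connection):
    """Join across both stores at query time; no rows are copied between them."""
    query = """
        SELECT c.segment, SUM(o.amount)
        FROM main.orders AS o
        JOIN crm.customers AS c ON c.id = o.customer_id
        GROUP BY c.segment
        ORDER BY c.segment
    """
    return connection.execute(query).fetchall()


result = revenue_by_segment(conn)
```

Each table stays in its own database, so remediation, protection, and retention still happen in one place per dataset.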
The non-obvious cost would be a data breach. Most analytics will, in one way or another, touch customer data. Imagine you had hundreds of thousands, maybe millions, of rows of personal information about your customers in an Excel sheet that you run analytics on. If it were disclosed to the world because there was no governance or data protection on it, what would that cost you?
As your data grows, a bunch of manual processes get in the way, often with a complete lack of automation. That could be the core issue, but the data silo problem could also be the root cause, as could the need for multiple steps and consensus across the data stewards of multiple lines of business. In that situation, you are unable to access data in place, use metadata, analyze data quality, or standardize the information through data virtualization. A data fabric with built-in governance addresses this and allows you to control who has access to the data.
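As a rough illustration of what built-in governance means at read time, the sketch below applies a policy before any rows are returned: roles outside the policy are refused, and PII columns are masked unless the role is explicitly cleared to see them. The `AccessPolicy` class, the role names, and the sample customer row are hypothetical and not taken from any particular product.

```python
from dataclasses import dataclass, field


@dataclass
class AccessPolicy:
    allowed_roles: set                             # roles that may read the dataset
    pii_columns: set = field(default_factory=set)  # columns holding personal data
    pii_roles: set = field(default_factory=set)    # roles cleared to see PII


def read_governed(rows, policy, role):
    """Return rows for the role, masking PII columns unless the role is cleared."""
    if role not in policy.allowed_roles:
        raise PermissionError(f"role {role!r} may not read this dataset")
    can_see_pii = role in policy.pii_roles
    return [
        {
            key: (value if can_see_pii or key not in policy.pii_columns else "***")
            for key, value in row.items()
        }
        for row in rows
    ]


# Hypothetical customer data and policy.
customers = [{"id": 1, "email": "a@example.com", "segment": "retail"}]
policy = AccessPolicy(
    allowed_roles={"analyst", "steward"},
    pii_columns={"email"},
    pii_roles={"steward"},
)

analyst_view = read_governed(customers, policy, "analyst")  # email is masked
steward_view = read_governed(customers, policy, "steward")  # email is visible
```

Because the policy travels with the dataset rather than with each copy, there is one place to change who can see what.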
Data fabric should be compatible with various data delivery styles (including, but not limited to, ETL, streaming, replication, messaging, and data virtualization or data microservices). It should support all types of data users including IT users (for complex integration requirements) and business users (for self-service data preparation).
Knowledge graphs enable data and analytics leaders to derive business value by enriching data with semantics. This enrichment adds depth and meaning to the data usage and content graph, allowing AI/ML algorithms to use the information for analytics and other operational use cases.
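To make the idea concrete, here is a minimal sketch of a knowledge graph stored as subject-predicate-object triples, with a breadth-first walk that surfaces relationships a few hops away from a starting entity. The `KnowledgeGraph` class and the supplier, part, and production-line names are hypothetical; production systems would typically use a dedicated graph store and a richer semantic model.

```python
from collections import defaultdict


class KnowledgeGraph:
    """A tiny triple store: subject -> list of (predicate, object)."""

    def __init__(self):
        self._out = defaultdict(list)

    def add(self, subject, predicate, obj):
        self._out[subject].append((predicate, obj))

    def related(self, start, max_hops=2):
        """Breadth-first walk returning (hops, subject, predicate, object) tuples."""
        seen = {start}
        frontier = [start]
        found = []
        for hop in range(1, max_hops + 1):
            next_frontier = []
            for node in frontier:
                # .get avoids creating empty entries for leaf nodes
                for predicate, obj in self._out.get(node, []):
                    found.append((hop, node, predicate, obj))
                    if obj not in seen:
                        seen.add(obj)
                        next_frontier.append(obj)
            frontier = next_frontier
        return found


# Hypothetical supply chain facts.
graph = KnowledgeGraph()
graph.add("supplier_A", "supplies", "part_X")
graph.add("supplier_A", "has_delay", "shipment_991")
graph.add("part_X", "used_in", "line_3")

links = graph.related("supplier_A", max_hops=2)
```

Here the two-hop walk connects a delayed supplier to the production line that depends on its part, which is the kind of supplier-delay-to-production-delay relationship described earlier.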