Generative AI: An opportunity to modernize your data strategy
The popularity of generative AI has led many CEOs to analyze how this innovation might give their company a competitive advantage and improve productivity. Managers will need to identify not only the best use cases for this technology within their companies, but also what it requires in terms of data, platforms, processes and personnel so that it can be employed to full advantage.
For most of these businesses, integrating this technological advancement will involve customizing the use of various generative AI systems (ChatGPT, Gemini, etc.) based not only on their sector of activity but also on their particular business reality.
Competitive advantage will arise from an ability to adapt these new tools to each field of expertise within the organization (marketing, finance, customer service, human resources, etc.) to make them more accurate and relevant.
This challenge can’t be overcome without good quality data. Data are a strategic advantage in the customization of generative AI algorithms and models.
According to a study conducted by Databricks and MIT (CIO Vision 2025: Bridging the Gap Between BI and AI), 72% of chief information officers think that data, and specifically the lack of reliable data, is the main obstacle to achieving their AI-related goals. What they all agree on is that AI development needs to be part of a company’s data strategy for 2025.
The results of this study have started us thinking about what the major guiding principles of a modern data strategy should be.
There are four main ideas to consider when integrating generative AI into your organization’s transformation: data mesh, composable business, vector databases and data governance for the AI era.
Data mesh: Data life cycles based on field of expertise
The strategic advantage of generative AI will lie in the ability to adapt various models (algorithms) to a company’s internal data, and mainly to its fields of activity (marketing, operations, sales, customer service, products, etc.). In other words, generative AI models need to be contextualized to a company’s fields of expertise.
As it happens, a data architecture based on fields of activity has been around for a few years now: data mesh. Data mesh was born out of the observation that fields of expertise struggled to leverage their data quickly because data management was centralized within a single IT or BI team, which created long wait times.
Data mesh is also based on the assumption that only the field itself can leverage data to its full potential (given the field’s detailed understanding of that data). It’s an approach that encourages the decentralization of data management.
The four (simplified) principles of data mesh are:
- Every field owns the life cycle of its data (from collection to activation, from personnel to technology)
- Data need to be handled as the product of each field, with a usage guide, documentation, contracts and service and quality guarantees
- Underlying infrastructure must enable self-service for the rapid design of data products; in other words, engineers and data scientists in that field of activity must be able to deploy their products/applications quickly
- Federated governance between fields is necessary to avoid silos; data governance must be federated in order to facilitate the exchange of reliable data between fields, but also so that these fields will have interoperable components
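To make the "data as a product" principle more concrete, here is a minimal sketch in Python of what a data product published with a contract might look like. Everything in it is an illustrative assumption rather than part of any data mesh standard: the `DataProduct` and `DataContract` classes, their fields, the 10% missing-value threshold and the sample marketing rows are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataContract:
    """Guarantees a field publishes alongside its data product (hypothetical)."""
    required_columns: tuple[str, ...]
    max_null_ratio: float  # share of missing values tolerated per column


@dataclass
class DataProduct:
    """A dataset owned by one field and published with documentation."""
    name: str
    owner_domain: str
    description: str
    contract: DataContract
    rows: list[dict] = field(default_factory=list)

    def violations(self) -> list[str]:
        """Return readable contract violations (an empty list means compliant)."""
        problems = []
        for column in self.contract.required_columns:
            missing = sum(1 for row in self.rows if row.get(column) is None)
            # An empty product is treated as fully missing.
            ratio = missing / len(self.rows) if self.rows else 1.0
            if ratio > self.contract.max_null_ratio:
                problems.append(f"{column}: {ratio:.0%} missing")
        return problems


# Sample product owned by the marketing field (made-up data).
campaigns = DataProduct(
    name="campaign_performance",
    owner_domain="marketing",
    description="Daily spend and conversions per campaign.",
    contract=DataContract(
        required_columns=("campaign_id", "spend"),
        max_null_ratio=0.1,
    ),
    rows=[
        {"campaign_id": "c1", "spend": 120.0},
        {"campaign_id": "c2", "spend": None},
        {"campaign_id": "c3", "spend": 80.0},
    ],
)

print(campaigns.violations())  # ['spend: 33% missing']
```

In this sketch the owning field both publishes the data and states what consumers can rely on, so another field can check the quality guarantee before building on the product.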
The chart below illustrates a data mesh architecture by field.
The goal of data mesh is to enable the accelerated leveraging of data for each of a company’s fields of activity.
Composable business: Seeing your technological ecosystem and data as Lego
In 2024, it’s hard to figure out which generative AI system will be best for your company or field of activity. Many organizations will need to test and prototype their solutions in order to choose one that will be best adapted to their use case. These models must then be integrated into the infrastructure and processes already in place.
To give yourself this flexibility, you need to adopt composable thinking. This approach means building your ecosystem in a modular and iterative way so that you can test and choose the best tools, solutions and platforms to support your company’s various processes.
This composable approach lets businesses adapt to change and integrate innovation more rapidly. The opposite of a composable system is an integrated/monolithic system. Such systems are extremely hard to change, because all the components are tightly interrelated. Changing an integrated system can sometimes involve changing the system in its entirety, which can be a costly, risky and lengthy undertaking.
In 2020, consulting and research firm Gartner published an article promoting the composable business. It pointed out that companies adopting this philosophy adapted to change more quickly. The guiding principles of composable systems are:
- Adopting composable thinking: for example, viewing your data ecosystem as a collection of interchangeable components, from collection through to exploitation
- Adopting a composable business architecture: ensure your organization is built in a flexible and resilient way so you can adapt quickly to change
- Adopting composable technologies: Can your technology ecosystem respond to your needs both today and in the future? Creating your technology ecosystem in a composable way means you can change certain components without having to change the whole system.
The idea of the modern data stack tends to follow the same guiding principles as composable technologies by promoting the creation of data ecosystems in a modular way.
As a result, different vendors can be integrated (and changed) to support each stage of the data life cycle so long as they are easy to integrate and maintain.
Here are some examples of vendors for each stage of the life cycle:
- Data sources (collection): data can come from many sources, such as Google Analytics, Adobe and Salesforce
- Data ingestion: can be supported by a range of vendors, such as Matillion, Fivetran, Supermetrics, Airbyte and Talend
- Data cloud: can be supported by vendors such as Google BigQuery, Snowflake, Databricks, Amazon Redshift and Firebolt
- Data activation:
  - Audience management: Census, Hightouch
  - Visualization: Looker, Power BI, Tableau, Mode
  - Generative AI applications: OpenAI, Google, Amazon, Meta
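The idea of swapping one component without touching the rest can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Source` and `Warehouse` interfaces, the three classes and `run_pipeline` are hypothetical names, and the in-memory classes stand in for real vendors rather than reproducing any vendor's API.

```python
from __future__ import annotations

from typing import Protocol


class Source(Protocol):
    """Any collection or ingestion component that can hand over rows."""
    def extract(self) -> list[dict]: ...


class Warehouse(Protocol):
    """Any storage component that can receive rows for a table."""
    def load(self, table: str, rows: list[dict]) -> int: ...


class StaticSource:
    """Stand-in for a collection or ingestion tool (hypothetical)."""
    def __init__(self, rows: list[dict]) -> None:
        self._rows = rows

    def extract(self) -> list[dict]:
        return list(self._rows)


class InMemoryWarehouse:
    """Stand-in for a data cloud vendor (hypothetical)."""
    def __init__(self) -> None:
        self.tables: dict[str, list[dict]] = {}

    def load(self, table: str, rows: list[dict]) -> int:
        self.tables.setdefault(table, []).extend(rows)
        return len(rows)


class AuditedWarehouse:
    """A second, interchangeable warehouse that also logs each load."""
    def __init__(self) -> None:
        self.tables: dict[str, list[dict]] = {}
        self.log: list[str] = []

    def load(self, table: str, rows: list[dict]) -> int:
        self.tables.setdefault(table, []).extend(rows)
        self.log.append(f"{table}: {len(rows)} rows")
        return len(rows)


def run_pipeline(source: Source, warehouse: Warehouse, table: str) -> int:
    """The pipeline depends only on the interfaces, not on a vendor."""
    return warehouse.load(table, source.extract())


source = StaticSource([{"visitor": "a"}, {"visitor": "b"}])
first = InMemoryWarehouse()
second = AuditedWarehouse()

# The same pipeline code runs against either warehouse.
loaded_first = run_pipeline(source, first, "web_visits")
loaded_second = run_pipeline(source, second, "web_visits")
print(loaded_first, loaded_second, second.log)
```

Because `run_pipeline` only relies on the `extract` and `load` methods, replacing one warehouse with another requires no change to the pipeline itself, which is the property a composable stack aims for.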
The field of marketing is an ideal candidate for this type of architecture given the diversity of the data consumed.
This type of system provides flexibility to organizations by allowing them to incrementally create their ecosystems while mitigating risk.
Gartner believes that by 2026, the marketplaces of most cloud computing platforms will offer composable modules that fulfill most of the needs of organizations.
The popularity of composable thinking led to the rise of composable CDPs (architecture presented below):
The goal of the composable/modular approach is to offer flexibility and speed for integrating solutions and innovation.
Vector databases: The need to support generative AI (text, images, audio, video)
To customize the use of generative AI models (mainly large language models or LLMs, such as ChatGPT and Gemini), it’s often necessary to provide them with external data (for example, data from the organization’s field of expertise) to contextualize their answers.
The framework that aims to provide external data to generative AI models is called retrieval-augmented generation, or RAG. With RAG, a model’s answers can draw on knowledge bases from a specific domain in order to provide up-to-date, relevant, precise and reliable responses. The goal of RAG is to improve the effectiveness of LLMs.
These knowledge bases are often stored in vector databases (Redis Store, MongoDB Atlas Vector Search, BigQuery Vector Search, Meta FAISS, Snowflake Cortex, Chroma DB, Lance DB) for the processing of text, images and video.
This type of architecture supports use cases such as: questions and answers (Q&A), recommendation systems, image searches and document retrieval.
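The retrieval step of RAG can be illustrated with a deliberately simplified Python sketch. It rests on several assumptions: the word-count `embed` function is a toy replacement for a real embedding model, `TinyVectorStore` is a hypothetical in-memory stand-in for a vector database, the two sample documents are made up, and the resulting prompt is only built, not sent to any LLM.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: word counts. A real system would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class TinyVectorStore:
    """Hypothetical in-memory stand-in for a vector database."""

    def __init__(self) -> None:
        self._items: list[tuple[str, Counter]] = []

    def add(self, text: str) -> None:
        self._items.append((text, embed(text)))

    def search(self, query: str, k: int = 1) -> list[str]:
        """Return the k stored texts most similar to the query."""
        q = embed(query)
        ranked = sorted(
            self._items, key=lambda item: cosine(q, item[1]), reverse=True
        )
        return [text for text, _ in ranked[:k]]


def build_prompt(question: str, store: TinyVectorStore, k: int = 1) -> str:
    """Augment the question with retrieved context before calling an LLM."""
    context = "\n".join(store.search(question, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


store = TinyVectorStore()
store.add("Refunds are processed within 14 days of the return request.")
store.add("Our loyalty program awards one point per dollar spent.")

prompt = build_prompt("How long do refunds take?", store)
print(prompt)
```

In a production setup, the embedding function and the store would be replaced by an embedding model and one of the vector databases listed above, but the flow stays the same: embed the question, retrieve the closest documents, and pass them to the model as context.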
If you’re hoping to use generative AI within your organization, your 2024-2025 data strategy shouldn’t ignore the concepts of RAG and vector databases. Model tuning or optimization won’t be covered in this article, but it’s still an important tool in customizing this kind of technology.
Future architectures will combine traditional data warehouses (structured data) with vector databases, as illustrated in the diagram below.
A data lakehouse is a platform that combines a data warehouse with a data lake.
The goal of RAG and vector databases is to augment generative AI models in order to make them more precise and relevant to a company’s field of activity.
Data governance in the era of generative AI
The opportunities created by generative AI also come with ethical as well as security challenges.
In one of its reports (Top Strategic Technology Trends 2024), Gartner mentions five major principles for governing generative AI. It refers to this framework as AI Trust, Risk and Security Management (AI TRiSM). In simple terms, companies will need to refine their data governance to address:
- measures put in place to ensure the reliability of data created by generative AI systems (ChatGPT, Gemini, LLaMA);
- measures established to ensure fair, unbiased and ethical answers supplied to users;
- measures deployed to ensure the robustness and security of applications and platforms;
- measures that aim to ensure transparency in the use of data; and lastly
- measures taken to ensure the security and protection of customer and company data.
I hope this article has provided you with a better understanding of the potential changes you need to test or perform as part of your data strategy, given the arrival of generative AI and its various models available on the market. Don’t hesitate to get in touch with us for more information on any of these topics.