Generative AI has inspired a surge of interest in using data to improve the accuracy of business decisions. Business managers, data analysts, and citizen data scientists can now use prompts instead of SQL queries to ask questions, interact with large language models rather than dashboards, and scan ML-generated recommendations instead of exploring data for insights.
According to the 2024 AI at Wharton report, 72% of respondents were using genAI at least once a week. Over 80% of respondents working in IT, business intelligence, customer service, marketing, operations, and product development said genAI had a medium-to-high impact on their work.
Data teams and specialists—including data scientists, engineers, architects, and data governance specialists—should take the opportunity to provide more data services to departments adopting genAI. These early and mid-adopters are using genAI tools, automation, machine learning capabilities, and data visualization to redefine the future of work.
According to Deloitte’s State of Generative AI in the Enterprise report (Q3 2024), 75% of organizations have increased their technology investments in data lifecycle management to support genAI initiatives. The top actions taken include enhancing data security, improving data quality, updating governance frameworks, and increasing collaboration with cloud service providers or IT integrators.
“Data teams are transforming the future of work within their organizations by democratizing data access and ensuring a solid foundation for data-driven decisions,” says Irfan Khan, president and chief product officer of SAP Data & Analytics. “Through the management, governance, and analysis of data, they do more than automate calculations or create dashboards; they uncover deeper insights and help employees perform their tasks more efficiently while reducing the backlog of demands on resource-strapped IT departments.”
Below are five ways data professionals can support data discovery and transformation for business teams adopting generative AI.
Make data security non-negotiable
Security is a growing challenge for data governance. According to a recent third-party risk management study, 61% of companies reported a third-party data breach or security incident, a 49% increase from the previous year. Data access governance is a critical first step in protecting the organization as business teams aim to become more data-driven while leveraging LLM capabilities.
“Imagine your data environment as a sprawling mansion—everyone wants a key, but you can’t just hand out a master key to every room,” says Amer Deeba, GVP of Proofpoint DSPM Group. “Data access governance is about giving each user the exact key they need; no more, no less.”
Deeba recommends, “Start by discovering and cataloging all your data assets so you have a clear understanding of what’s stored, where, and its sensitivity. With this foundational insight, you can enforce least privilege principles, ensuring users access only what they need, supporting zero trust, and minimizing risks to valuable and sensitive information.”
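The catalog-first, least-privilege sequence Deeba describes can be sketched in a few lines of Python. This is a minimal, hypothetical illustration, not any vendor's product or API: the `DataAsset`, `Sensitivity`, and `AccessPolicy` names, the three sensitivity tiers, and the example asset locations are all assumptions. Access is denied by default, and a user can read an asset only if it was explicitly granted.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Sensitivity(Enum):
    # Hypothetical sensitivity tiers recorded while cataloging assets.
    PUBLIC = 1
    INTERNAL = 2
    CONFIDENTIAL = 3


@dataclass(frozen=True)
class DataAsset:
    name: str
    location: str  # where the data is stored (illustrative URI)
    sensitivity: Sensitivity


@dataclass
class AccessPolicy:
    # Explicit grants only: user -> names of assets that user may read.
    grants: dict[str, set[str]] = field(default_factory=dict)

    def grant(self, user: str, asset: DataAsset) -> None:
        self.grants.setdefault(user, set()).add(asset.name)

    def can_read(self, user: str, asset: DataAsset) -> bool:
        # Least privilege: anything not explicitly granted is denied.
        return asset.name in self.grants.get(user, set())


def sensitive_assets(catalog: dict[str, DataAsset]) -> list[str]:
    # Use the catalog to list the assets that need the tightest controls.
    return sorted(
        name for name, asset in catalog.items()
        if asset.sensitivity is Sensitivity.CONFIDENTIAL
    )


# Step 1: discover and catalog assets (hypothetical examples).
catalog = {
    asset.name: asset
    for asset in [
        DataAsset("web_traffic", "s3://example-analytics/web/", Sensitivity.PUBLIC),
        DataAsset("sales_forecast", "s3://example-finance/forecast/", Sensitivity.INTERNAL),
        DataAsset("customer_pii", "postgres://example-crm/customers", Sensitivity.CONFIDENTIAL),
    ]
}

# Step 2: grant each user only what their job requires.
policy = AccessPolicy()
policy.grant("marketing_analyst", catalog["web_traffic"])

print(sensitive_assets(catalog))  # ['customer_pii']
print(policy.can_read("marketing_analyst", catalog["web_traffic"]))  # True
print(policy.can_read("marketing_analyst", catalog["customer_pii"]))  # False
```

The deny-by-default check is the design choice that matters here: adding a new asset to the catalog never silently widens anyone's access.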
When business demand for new capabilities is high, data teams have more leverage to require non-negotiable data practices, such as improving unstructured data security, performing third-party risk assessments, and defining AI governance policies.
Extend data quality to LLM document processing
Data teams are responsible for ensuring that unstructured data sources go through data cleansing, preparation, and cataloging as more business teams want to use them in retrieval-augmented generation (RAG) applications and LLMs.
“The future of work depends on data-informed decision-making, with prioritization exercises often anchored in the accuracy and timeliness of data,” says Jeremy Kellway, VP of engineering for analytics, data, and AI at EDB. “Data teams must ensure that the data feeding analytics and AI applications truly reflect the organization’s goals, and in RAG AI applications, documentation prep is a critical step in determining what data is appropriate to drive meaningful outputs.”
Steps to create robust data pipelines for unstructured data include entity extraction, sentiment analysis, and bias detection. Before LLM technology, natural language processing for data extraction required a mix of document parsing, keyword searches, and specialized algorithms for sentiment and bias. Generative AI and machine learning offer more advanced capabilities for document processing.
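One way to structure such a pipeline is as an ordered list of stages, each of which enriches a document before it is indexed for a RAG application. The minimal Python sketch below is hypothetical: the stage functions, the regex entity heuristic, and the tiny sentiment lexicon are illustrative stand-ins for the keyword-search and specialized-algorithm approaches mentioned above. Bias detection is not implemented; it would be another stage in the same list, and any stage could be swapped for an ML model or an LLM call without changing the pipeline's shape.

```python
import re
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Document:
    doc_id: str
    text: str
    metadata: dict = field(default_factory=dict)


Stage = Callable[[Document], Document]

# Hypothetical sentiment lexicon; a production pipeline would use a trained model or an LLM.
POSITIVE = {"helpful", "fast", "reliable"}
NEGATIVE = {"terrible", "late", "broken"}


def clean(doc: Document) -> Document:
    # Cleansing: collapse runs of whitespace (including newlines) and trim the ends.
    doc.text = re.sub(r"\s+", " ", doc.text).strip()
    return doc


def extract_entities(doc: Document) -> Document:
    # Crude heuristic: treat runs of two or more capitalized words as named entities.
    doc.metadata["entities"] = re.findall(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)+\b", doc.text)
    return doc


def score_sentiment(doc: Document) -> Document:
    # Lexicon-based score: +1 per positive word, -1 per negative word.
    words = re.findall(r"[a-z']+", doc.text.lower())
    doc.metadata["sentiment"] = (
        sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    )
    return doc


def run_pipeline(doc: Document, stages: list[Stage]) -> Document:
    # Apply each stage in order; later stages see the output of earlier ones.
    for stage in stages:
        doc = stage(doc)
    return doc


doc = Document(
    "ticket-001",
    "  Shipment delays at Acme Industrial were   terrible,\n and the Berlin Plant received broken parts.  ",
)
result = run_pipeline(doc, [clean, extract_entities, score_sentiment])
print(result.metadata)  # {'entities': ['Acme Industrial', 'Berlin Plant'], 'sentiment': -2}
```

Running cleansing first matters: the entity and sentiment stages both assume normalized whitespace.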
“Employing AI at all levels of the data pipeline can jump-start new projects and get them to provide business value faster,” says Colin Dietrich, data scientist at SADA. “AI and ML can act as accelerators throughout the data warehousing, curation, and publishing processes. They can automate the creation of derived data, improve predictive algorithms, and enhance decision-support products with natural language.”
Empower citizen data scientists by centralizing data
Going beyond security non-negotiables and LLM document processing, data teams should consider their data management strategies and how to enable easier and faster access to data sources. Among the data management technologies architects consider are data warehouses, data lakes and lakehouses, and data fabrics. Regardless of the technology, ease of use for citizen data scientists and business teams is key.
“Data fabric, an architectural approach simplifying data access and enabling quality data for real-time analytics, is transforming how teams work by enabling citizen data science—empowering more departments to create, access, and leverage data through user-friendly dashboards,” says Midhat Shahid, VP of product management at IBM. “By fostering a self-service culture, they equip every department to contribute to and act on data-driven decisions, creating a scalable business culture grounded in data.”
Before LLMs, the primary use cases for citizen data scientists were developing dashboards, conducting data discovery on new data sources, and performing ad hoc queries. Today, business teams and data scientists have expanded needs, including building RAG applications, embedding knowledge in SaaS LLMs, and leveraging AI agents. Data teams should expose APIs to primary data sources and knowledge repositories that these and future use cases can build on.
“Integrating LLM knowledge with enterprise data unlocks predictive insights and enables real-time decisions, turning information workers into proactive decision-makers and catalysts for innovation,” says Ariel Katz, CEO of Sisense. “Data teams must evolve from gatekeepers to enablers, offering data API services that abstract complexity and empower every creator—whether pro-code, low-code, or no-code—to embed analytics effortlessly.”
APIs are not just for accessing data sources. When data teams create visualization components, machine learning models, RAG applications, and AI agents, robust and easy-to-use APIs should be the primary way to deliver them as services.
Michael Berthold, CEO of KNIME, says having guardrails around data quality and access is important before putting models in production. “Companies are realizing that models can make bad predictions or leak sensitive information. Effective tools help govern data flow, model use, and add safeguards to reduce these risks.”
Establish data marketplaces to simplify data discovery
Citizen data scientists are one of the end-user personas data teams should serve, but less technical business users must also be able to discover and access data sources. Data catalogs and data dictionaries are an important first step toward broader data access, and establishing a data marketplace gives organizations an opportunity to scale their self-service data and AI programs.
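A data dictionary can start as structured, plain-language metadata that a simple search can query. The Python sketch below is a hypothetical illustration: the `CatalogEntry` and `ColumnDef` fields, the case-insensitive substring search, and the example manufacturing datasets are assumptions, not the schema of any particular catalog or marketplace product. It shows how describing datasets and columns in everyday terms lets less technical users find data by keyword.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class ColumnDef:
    name: str
    dtype: str
    description: str  # plain-language meaning of the column


@dataclass
class CatalogEntry:
    dataset: str
    owner: str
    description: str
    columns: list[ColumnDef]
    tags: set[str] = field(default_factory=set)


def search(catalog: list[CatalogEntry], term: str) -> list[str]:
    """Return names of datasets whose metadata mentions the term (case-insensitive)."""
    term = term.lower()
    hits = []
    for entry in catalog:
        texts = [entry.dataset, entry.description, *entry.tags]
        for col in entry.columns:
            texts += [col.name, col.description]
        if any(term in text.lower() for text in texts):
            hits.append(entry.dataset)
    return hits


# Hypothetical catalog entries for two industrial data sources.
catalog = [
    CatalogEntry(
        dataset="machine_sensor_readings",
        owner="plant-operations",
        description="Vibration and temperature readings from production equipment",
        columns=[
            ColumnDef("asset_id", "string", "Identifier of the monitored machine"),
            ColumnDef("reading_ts", "timestamp", "Time the reading was taken (UTC)"),
            ColumnDef("vibration_mm_s", "float", "Vibration velocity in mm/s"),
        ],
        tags={"manufacturing", "iot"},
    ),
    CatalogEntry(
        dataset="supplier_shipments",
        owner="supply-chain",
        description="Inbound shipment status by supplier",
        columns=[
            ColumnDef("supplier_id", "string", "Identifier of the supplier"),
            ColumnDef("eta", "date", "Expected arrival date"),
            ColumnDef("status", "string", "Current shipment status"),
        ],
        tags={"logistics"},
    ),
]

print(search(catalog, "vibration"))  # ['machine_sensor_readings']
print(search(catalog, "arrival"))  # ['supplier_shipments']
```

Recording an owner for each dataset also gives users someone to contact when the description is not enough.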
“Layers of IT and governance bureaucracy are slowing down data access and making it harder to speed new innovations, improve supply-chain logistics, and deploy innovative AI applications,” says Moritz Plassnig, chief product officer at Immuta. “With the acceleration and adoption of AI, killer apps are no longer the focus; data is the new app, and data teams have the power to enable anyone in the organization to become data consumers by cultivating an internal data marketplace that automates discovery and access, while still providing enterprise-grade governance and security.”
Data marketplaces can accelerate work in industries where many departmental use cases require integrating several high-volume primary data sources. Companies in manufacturing, construction, energy, and other industrial sectors can use data catalogs and marketplaces to aggregate real-time data sources and make them easier to use for decision-making in marketing, field operations, supply chain, finance, and other departments.
“Data teams are essential in industries like manufacturing, where data is abundant but hard to navigate,” says Artem Kroupenev, VP of strategy at Augury. “Their role isn’t just about making data operational; it’s about empowering everyone to become a data scientist by ensuring data is accessible, easy to use, and impactful.”
Develop data products that foster collaboration
Marketplaces aren’t only for discovering, accessing, and integrating data sources. Data teams may now consider their advanced dashboards, machine learning models, LLM capabilities, and AI agents as data products and manage them as product development initiatives. Each product has a defined customer segment, value proposition, and strategic objective, which can be defined in a vision statement and managed through a product roadmap.
Pete DeJoy, SVP of products at Astronomer, adds, “The concept of data products has evolved from a buzzword into a crucial element of modern data-driven organizations. This alignment with physical product and supply chain analogies helps clarify the end-to-end data lifecycle, bridging communication gaps between technical and non-technical teams.”
As more business teams become data-driven and AI becomes an increasingly important business capability, the lines separating data and business teams are blurring. The future of work requires data teams to restate their mission and deliver enhanced data governance, dataops, marketplaces, and data products that serve more departments and use cases.