As enterprise demand for building multi-agent systems continues to grow, data infrastructure provider Databricks is updating its Mosaic AI Agent Evaluation module with a new synthetic data generation API intended to help enterprises evaluate agents faster.
Multi-agent systems, also known as agentic AI, have caught the attention of enterprises because these agents go further than just generating code or content for human review: agentic AI systems can follow instructions, make decisions, and take actions much as a human worker would, without human intervention.
The new API, which is currently in public preview, is designed to speed up the agent development and testing process so that agents can be deployed in production faster.
Synthetic data generation is the process of creating artificial datasets that mimic real-world data and can be used to test or train agents or models.
The new Databricks API leverages an enterprise’s proprietary data to generate evaluation datasets tailored to the agent’s specific use case.
In contrast, building evaluation data manually is time consuming and might not always accurately test the agent’s functionality.
The synthetic data generation API will also reduce development costs, according to the company, as it allows developers to quickly generate evaluation data – skipping the weeks or months typically required to label evaluation data with subject matter experts.
Databricks said its enterprise customers are already seeing the benefits of the new API. One such customer is engineered components manufacturer Lippert, which used synthetic data to improve model response by 60%.
How does it work?
The API essentially works in three steps: calling the API, specifying the number of questions to generate, and setting natural-language guidelines to steer the synthetic generation.
Once this input is provided, the API generates a set of question, synthetic-answer, and source-document groupings based on the enterprise’s data, in the Agent Evaluation schema.
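Based on the three steps described above, a minimal sketch might look like the following, assuming the generate_evals_df helper from the databricks-agents public preview package; the document contents and guideline text are hypothetical, and parameter names may differ from the current documentation.

```python
import pandas as pd
from databricks.agents.evals import generate_evals_df

# 1. Enterprise source documents to ground the evaluation set in
#    (hypothetical content, for illustration only).
docs = pd.DataFrame([
    {"doc_uri": "kb/returns-policy.md",
     "content": "Customers may return items within 30 days of delivery..."},
    {"doc_uri": "kb/warranty.md",
     "content": "The standard warranty covers manufacturing defects for 12 months..."},
])

# 2. Ask for a fixed number of synthetic questions, and
# 3. provide natural-language guidelines to steer generation.
evals = generate_evals_df(
    docs,
    num_evals=25,
    agent_description="A support agent that answers customer policy questions.",
    question_guidelines="Ask questions a customer-support rep would ask, in plain English.",
)

# Each row pairs a generated question with a synthetic answer and its source document.
print(evals.head())
```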
“Enterprises can then pass this generated evaluation set to mlflow.evaluate(…), which runs Agent Evaluation’s proprietary LLM judges to assess the agent’s quality and identify the root cause of any quality issues,” Databricks explained in a blog post.
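Assuming the synthetic evaluation set from the sketch above and an agent exposed as a Python callable (my_agent is a placeholder), that step might look roughly like this; the model_type value follows the Mosaic AI Agent Evaluation documentation.

```python
import mlflow

# Run Agent Evaluation's LLM judges over the synthetic evaluation set.
# `my_agent` is a placeholder for the agent under test (a callable or a
# logged MLflow model URI); `evals` is the dataset generated above.
with mlflow.start_run(run_name="agent-baseline"):
    results = mlflow.evaluate(
        model=my_agent,
        data=evals,
        model_type="databricks-agent",  # selects Agent Evaluation's judges
    )

# Aggregate judge metrics; per-row results are available via results.tables.
print(results.metrics)
```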
To review the results of the quality analysis, enterprises can use the MLflow Evaluation UI and, guided by those findings, make changes to the agent to improve quality, the company added.
The improved agent can then be tested again by re-running mlflow.evaluate(…).
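For instance, after revising the agent based on what the UI surfaces, the same evaluation set can be reused so the two runs are directly comparable (my_agent_v2 is again a placeholder).

```python
# Re-evaluate the revised agent against the same synthetic evaluation set
# so its quality metrics are directly comparable with the baseline run.
with mlflow.start_run(run_name="agent-v2"):
    results_v2 = mlflow.evaluate(
        model=my_agent_v2,  # placeholder for the improved agent
        data=evals,
        model_type="databricks-agent",
    )
```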
Databricks also gives enterprises the option of having the generated synthetic data reviewed by subject matter experts.
“The subject matter expert review UI is a new feature that enables your subject matter experts to quickly review the synthetically generated evaluation data for accuracy and optionally add additional questions,” Databricks explained, adding that these UIs are designed to make the review process efficient for business experts, so they spend minimal time away from their day jobs.