The Microsoft Azure Incubations team is one of the more interesting components of Microsoft’s hyperscale cloud. It is something between a traditional software development team and a research organization working to build solutions to the problems of distributed systems at massive scale.
Those solutions might be extensions to Kubernetes, such as the KEDA (Kubernetes event-driven autoscaling) system, or developer tools such as Dapr and Radius. Their latest public release is a hybrid between systems management tools and a new application platform.
What is Drasi?
Announced in a recent blog post from Azure CTO Mark Russinovich, Drasi is a tool for detecting critical events and responding to them immediately. That response might be to reconfigure a platform architecture if there’s a hardware or software failure, or it might be to trigger a critical alert in an industrial IoT system, for example, starting overpressure or even fire responses when a sensor detects an issue in a chemical process.
Most of the Microsoft Azure Incubations projects are open source, and Drasi is no exception. It’s already been submitted to the Cloud Native Computing Foundation (CNCF), with an Apache 2.0 license and a GitHub repository. You can find more details on its documentation site.
Event-driven architectures like this are a relatively common design pattern in distributed systems. Like other distributed development models, they have their own problems, especially at scale. When you’re getting tens or hundreds of events a minute, it’s easy to detect and respond to the messages you’re looking for. But when your application or service grows to several hundred thousand or even millions of messages across a global platform, what worked for a smaller system is likely to collapse under this new load.
At scale, event-driven systems become complex. Messages and events are delivered in many different forms and stored in independent silos, making them hard to extract and process and often requiring complex query mechanisms. At the same time, message queuing systems become slow and congested, adding latency or even letting messages time out. When you need to respond to events quickly, this fragile state of affairs becomes hard to use and manage.
That’s where Drasi comes in. It provides a better way to automate the process of detecting and responding to relevant events, an approach Microsoft describes as “the automation of intelligent reactions.” It is intended to be a lightweight tool that doesn’t need a complex, centralized store for event data, instead taking advantage of decentralization to look for events close to where they’re sourced, in log files and change feeds.
How does Drasi process changes?
Even though data is decentralized and stored in many different formats, Drasi lets you use familiar development techniques to build queries and set up triggers that respond to changes in the results of those queries. At the heart of this process are three concepts: Sources, Continuous Queries, and Reactions.
The Sources in a Drasi application are all the places where data is collated and where changes can be observed. These can be anything from a log file or a database update to events passed through a publish-and-subscribe tool like Azure Event Grid, or even the output of an Azure Function.
Continuous Queries, written in Cypher Query Language (CQL), monitor Sources for changes in data, acting as switches that are triggered by a change. Once a query is triggered, the system sends a Reaction.
A Reaction can be as simple as an alert, or it could be an input that triggers a set of preconfigured processes. Those processes depend on what you’re using Drasi for. If it’s in an industrial IoT system, the Reaction might act on a set of hardware controls to shut down an out-of-control industrial process. In a systems administration support scenario, a Reaction might start a failover process to a disaster recovery site or a database replica. Drasi Reactions can be as simple or as complex as you need.
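The relationship between the three concepts can be sketched in a few lines of Python. This is a conceptual model only, not Drasi’s API: the `ContinuousQuery` class, its `on_source_change` method, and the `pressure_kpa` field are all hypothetical names chosen for illustration. The point it tries to show is that a Reaction is notified only when a change in a Source alters the query’s result set, not on every incoming event.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Row = Dict[str, object]
Reaction = Callable[[str, str, Row], None]


@dataclass
class ContinuousQuery:
    """Hypothetical stand-in for a Continuous Query: a predicate over source rows."""

    predicate: Callable[[Row], bool]
    reactions: List[Reaction] = field(default_factory=list)
    _results: Dict[str, Row] = field(default_factory=dict, init=False)

    def on_source_change(self, key: str, row: Optional[Row]) -> None:
        """Re-evaluate one changed row; notify Reactions only if the result set changed."""
        was_in = key in self._results
        is_in = row is not None and self.predicate(row)
        if is_in and not was_in:
            self._results[key] = row
            self._notify("added", key, row)
        elif is_in and was_in and self._results[key] != row:
            self._results[key] = row
            self._notify("updated", key, row)
        elif was_in and not is_in:
            self._notify("deleted", key, self._results.pop(key))

    def _notify(self, kind: str, key: str, row: Row) -> None:
        for reaction in self.reactions:
            reaction(kind, key, row)


# Usage: an illustrative overpressure query with a Reaction that records changes.
events = []
query = ContinuousQuery(predicate=lambda r: r["pressure_kpa"] > 900)
query.reactions.append(lambda kind, key, row: events.append((kind, key)))

query.on_source_change("valve-1", {"pressure_kpa": 850})  # not in results: no Reaction
query.on_source_change("valve-1", {"pressure_kpa": 950})  # enters the result set
query.on_source_change("valve-1", {"pressure_kpa": 870})  # leaves the result set
```

After these three changes, `events` holds one `added` and one `deleted` entry for `valve-1`; the first reading never matched the query, so it produced nothing.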
Bring all your events together
What’s perhaps most interesting about Drasi’s approach to event-driven computing is its support for what, in the past, would have been many different event management tools. A single Drasi instance can work with manually updated data alongside live telemetry. For example, Drasi might be able to read the maintenance logs for a set of machine tools along with live telemetry from the same devices. A query could monitor both a scheduled maintenance window and known telemetry that indicates possible issues (which itself might be an event raised by a machine learning application that uses sound to detect problems).
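As a rough illustration of that machine-tool scenario, the Python sketch below combines two sources in one check: a manually maintained maintenance schedule and a live telemetry feed. Everything here is assumed for the example, including the function name, the `anomaly` flag, the machine IDs, and the seven-day horizon. In Drasi this logic would live in a Continuous Query rather than in application code.

```python
from datetime import datetime, timedelta


def machines_needing_attention(maintenance, telemetry, now, horizon=timedelta(days=7)):
    """Return IDs of machines whose telemetry flags an anomaly and that have
    no maintenance window scheduled between `now` and `now + horizon`."""
    flagged = []
    for machine_id, reading in telemetry.items():
        if not reading.get("anomaly", False):
            continue
        window = maintenance.get(machine_id)
        covered = window is not None and now <= window <= now + horizon
        if not covered:
            flagged.append(machine_id)
    return sorted(flagged)


now = datetime(2024, 10, 1)
maintenance = {                     # manually updated source
    "lathe-1": datetime(2024, 10, 3),
    "mill-2": datetime(2024, 11, 15),
}
telemetry = {                       # live source
    "lathe-1": {"anomaly": True},
    "mill-2": {"anomaly": True},
    "press-3": {"anomaly": False},
}
needs_attention = machines_needing_attention(maintenance, telemetry, now)
```

Here only `mill-2` is returned: `lathe-1` has an anomaly but is already booked for maintenance within the week, and `press-3` reports no anomaly.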
Instead of having separate alerts, Drasi is the glue that brings all these different systems together. Like many Azure tools, it’s very scalable, able to deliver results from single sites or across global organizations. It comes with a command-line tool that wraps its various APIs, giving you one place to manage Drasi resources. As all management is through APIs, there’s the opportunity to build your own management tool.
The heart of a Drasi application is a set of Continuous Queries. This is a very different way of working with data than traditional queries. By running queries continuously, Drasi can build a map of changes to its underlying data sources as they happen, with the ability to get point-in-time results as well as a dynamic feed that’s conceptually like SQL Server’s change feed, which delivers data to Azure Synapse Analytics without needing complex ETL.
Build change queries with CQL
Working with multiple sources and attempting to get change data from them is much like working with a graph database, so it’s not surprising that Drasi uses a version of Neo4j’s Cypher graph database query language (CQL) to build its Continuous Queries. If you’re familiar with SQL, CQL supports many similar constructs to build its queries. You can use MATCH to find paths, WITH and WHERE clauses, along with common data types and properties. Much of what you need to build queries is here, as you’re working with a relatively constrained set of data sources.
The aim is to use CQL queries to describe the changes you’re looking for in your data. As it supports building logic into a query, you can build a single query that encapsulates both the data you’re looking at and its relationships with the rest of your data across all your sources. As CQL treats all your sources as a single interconnected graph, there’s no need to write complex joins to bring sources together. They’re all part of the same n-dimensional event space, which can be as richly populated or sparse as needed.
Microsoft has added its own Drasi-specific extensions to CQL. These include some interesting features that make you look at data in a different way—something that’s quite important when thinking about Continuous Queries. One feature is what Microsoft calls Future functions, which go beyond the existing temporal features to set future boundaries on data. They include the ability to set a time and check if a specific Boolean is true at that point, or if it’s true up to that point.
These are relatively simple functions, but they let you add new boundaries to your events. You can build a set of query expressions that evaluate to a Boolean and then use that result to trigger future events based on the way your system changes over time. You can use these functions along with the temporal functions, which let you see values at specific points in the past. With Drasi you’re now able to look at how key data changes over time, without having to write complex code to do it, building it all into functions inside your CQL queries.
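The difference between “true at that point” and “true up to that point” is easier to see with a small model. The Python sketch below is not Drasi’s implementation and does not use its function names; `true_at` and `true_until` are hypothetical helpers that evaluate a recorded history of Boolean change points. It also assumes integer timestamps and treats the value as False before the first observation.

```python
from bisect import bisect_right


def value_at(history, t):
    """history: list of (timestamp, bool) change points sorted by timestamp.
    Each value holds until the next change point."""
    times = [ts for ts, _ in history]
    i = bisect_right(times, t)
    if i == 0:
        return False  # assumption: nothing observed yet counts as False
    return history[i - 1][1]


def true_at(history, deadline):
    """Is the condition true at the deadline, whatever happened before it?"""
    return value_at(history, deadline)


def true_until(history, start, deadline):
    """Has the condition stayed true for the whole span from start to deadline?"""
    if not value_at(history, start):
        return False
    return all(value for ts, value in history if start < ts <= deadline)


# Condition becomes true at t=0, drops out at t=5, and recovers at t=8.
history = [(0, True), (5, False), (8, True)]

at_deadline = true_at(history, 10)             # True: it has recovered by t=10
held_throughout = true_until(history, 0, 10)   # False: it lapsed at t=5
held_early = true_until(history, 0, 4)         # True: no lapse before t=5
```

The same history gives different answers depending on which question is asked, which is why the two kinds of check are useful as separate boundaries on an event.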
Microsoft has provided a set of sample data in the shape of a PostgreSQL data set to help you get started with CQL, but we’re still missing development tools. A Visual Studio Code extension would be helpful to build and test queries while reducing the risk of errors. For now, it’s best to work with sample data from across your various event sources and with a set of expected outputs. Using lists and time-based operations will take some getting used to, especially if you’re planning on using Future functions.
Deliver Reactions to the world
Drasi outputs Reactions, which act based on the results of a Continuous Query. A Reaction can get data from more than one query, allowing you to deliver complex behaviors from relatively simple combinations of queries. Currently there are a limited number of Reaction types, but they should cover most scenarios. One key option is Azure Event Grid, which gives you many more onward actions. Others use the web-based SignalR protocol or work with Microsoft’s Dataverse line-of-business data platform.
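One way to picture a Reaction that draws on more than one query is the Python sketch below. It is a conceptual model with hypothetical names (`CombinedReaction`, `on_query_update`, and the two query IDs), not a Drasi Reaction type. It assumes each query reports how many results it currently has, and it runs its action once when every subscribed query has at least one result.

```python
from typing import Callable, Dict, Iterable


class CombinedReaction:
    """Hypothetical Reaction that fires when every subscribed query has results."""

    def __init__(self, query_ids: Iterable[str], action: Callable[[Dict[str, int]], None]):
        self.counts: Dict[str, int] = {qid: 0 for qid in query_ids}
        self.action = action
        self._armed = True  # fire once per transition, not on every update

    def on_query_update(self, query_id: str, result_count: int) -> None:
        if query_id not in self.counts:
            raise KeyError(f"not subscribed to {query_id!r}")
        self.counts[query_id] = result_count
        all_active = all(count > 0 for count in self.counts.values())
        if all_active and self._armed:
            self._armed = False
            self.action(dict(self.counts))
        elif not all_active:
            self._armed = True  # re-arm once any query empties again


fired = []
reaction = CombinedReaction(["overheating", "no-maintenance-booked"], fired.append)
reaction.on_query_update("overheating", 2)            # only one query has results
reaction.on_query_update("no-maintenance-booked", 1)  # both do: the action runs
reaction.on_query_update("overheating", 3)            # still active: no repeat
```

Each query stays simple on its own; the combined behavior comes from the Reaction watching both. The action runs again only after one of the queries has emptied and the pair becomes active once more.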
One final type of Reaction, Debug, provides a continuously updated table of results from a Continuous Query, allowing you to explore how a query works against your data sources. This isn’t a tool for production; instead, it’s a way of helping developers understand how Continuous Queries work and how to structure event handling around their outputs.
Things that change are interesting for many different reasons, and Drasi has been designed to capture details of those changes and deliver them so that the information can guide actions. In some cases, Drasi might help us fix problems quickly, using preventative maintenance on hardware or software. In other cases, it might give us early warning of intrusions and other security breaches. What you do with it is up to you.
A lightweight framework for processing change data is one of those things that you didn’t know you needed until it arrived. That’s a good thing, as Drasi offers what looks to be a truly new and innovative way of working with systems that generate a constant flow of events so that you can identify those that really matter.
If you’re building cloud infrastructures and applications at scale, then Drasi should be on your list of tools to investigate—especially as it crosses the boundaries between application development, platform engineering, and system management.