Microsoft uses its events to lift the hood on Azure’s hardware and the infrastructure it uses to manage and run its services. Ignite 2024 was no different. Azure CTO Mark Russinovich’s regular Inside Azure Innovations presentation went into some detail about how Microsoft is working to make its data centers more efficient.

Data center efficiency is becoming critically important, as hyperscalers like Azure now account for a significant share of the load on the electricity grid, especially given the power requirements of large generative AI models such as ChatGPT. Much of that load comes from training AI models, but inference has its own costs. Microsoft has set ambitious climate goals, and increasing data center efficiency is key to meeting them.

Under the hood with Azure Boost

Filling a data center with racks of servers isn’t the best way to run them. Azure and other cloud platforms have a different view of their hardware than we do, treating servers as discrete elements of compute, networking, and storage. These elements are assembled into the service’s basic building blocks: virtual machines. No one gets direct access to the hardware, not even Microsoft’s own services; everything runs on top of VMs hosted by a custom Windows-based Azure host OS.

But virtual machines are software, and that makes it hard to optimize the stack. Microsoft and its cloud competitors have been working to reduce these software dependencies for years, using the Open Compute Project to share hardware solutions to data center problems. Some of Microsoft’s contributions have included external controllers for NVMe storage and hardware-based network compression tools.

Inside Azure, Microsoft has been developing hardware-based tools to offload functionality from its Azure Hypervisor, a range of improvements it’s calling Azure Boost. This adds a new card to its servers that hosts networking and storage functions and provides improved I/O capabilities. Azure Boost sits outside the tenant boundary, so its functions can be shared securely by everyone using the same server to host VMs.

More FPGAs in the cloud

Russinovich showed one of the first production cards, built around an Intel Agilex FPGA. Using FPGAs for hardware like this lets Microsoft develop new versions of Azure Boost and deploy them to existing servers without new cards or extended downtime. The card itself runs Azure Linux (the new name for CBL-Mariner) on a set of Arm cores.

Azure Boost hardware has several distinct roles focused on improving VM performance. One is accelerating remote storage, delivering it through hardware NVMe interfaces rather than hypervisor-managed SCSI. The result is a significant speed improvement: 15% more IOPS and 12% higher bandwidth. Those gains may sound modest, but across a data center’s worth of VMs and remote storage they allow Microsoft to run more virtual machines on the same hardware.

Many Azure VMs use local storage, and Azure Boost has even more of an effect here, especially when hosting Kubernetes or similar containerized workloads. Performance jumps from 3.8 million IOPS to 6.6 million, and storage throughput from 17.2GBps to 36GBps. It’s not just storage that gets a boost: overall network performance through Azure Boost’s dual top-of-rack links now reaches 200Gbps, a nine-times improvement.
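For a sense of scale, those figures work out to roughly a 1.7x gain in local IOPS and a 2.1x gain in local throughput; the ~22Gbps network baseline in this quick check is inferred from the stated nine-times improvement rather than a published number.

```python
# Quick arithmetic on the Azure Boost figures quoted above.
iops_gain = 6.6 / 3.8            # ~1.7x local storage IOPS
throughput_gain = 36 / 17.2      # ~2.1x local storage throughput
implied_baseline = 200 / 9       # ~22Gbps implied pre-Boost network ceiling (inferred)

print(f"IOPS: {iops_gain:.2f}x, throughput: {throughput_gain:.2f}x, "
      f"previous network ceiling: ~{implied_baseline:.0f}Gbps")
```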

One of the key requirements for a cloud data center is making sure that all hardware is used as much as possible, with minimum downtime for infrastructure updates. Azure Boost helps here too, avoiding the need to move workloads between servers for a simple update to server network hardware.

Russinovich demonstrated updating the network stack on Azure Boost, which can be done in under 250ms, with only a minimal pause in network traffic and no effect on existing connections. At the same time, Azure Boost can host complex software-defined networking rules, speeding up the evaluation of complex policies. The aim is to be able to scale out the networking stack on demand.

Improving Azure’s networking hardware

Dynamic scaling of the Azure networking stack starts with custom smart switch hardware based on Microsoft’s SONiC software-defined networking (SDN) stack. Azure is now deploying its own SDN hardware, using new smart switch software called DASH (Disaggregated APIs for SONiC Hosts) along with custom data processing units (DPUs) to offload processing. This allows an SDN appliance to manage more than 1.5 million connections per second. To increase performance, all Azure needs to do is add more DPUs to its SDN appliances, ready to support demand.

Hardware innovations like these underpin software innovations and shape how Microsoft runs its platforms. As Russinovich noted, “We believe the future of cloud is serverless,” and hardware features like these allow Azure to quickly add capacity and instances to support serverless operations. I recently wrote about one of these new features, Hyperlight, which Russinovich described in his Ignite session. Other tools he touched on included Dapr, Drasi, and Radius.

Supporting secure cloud workloads at scale

One area where these technologies are being used is Azure Container Instances (ACI), Microsoft’s serverless container platform. It now uses a new version of Azure’s virtual node technology to create standby node pools that support bursty workloads, adding capacity as needed.

ACI’s virtual nodes can be connected to Azure Kubernetes Service to scale out Kubernetes workloads without the expense of keeping extra scaling nodes ready to run. The new standby node pools allow Kubernetes to launch new containers quickly: Russinovich showed a demo of ACI launching 10,000 pods in around 90 seconds.
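Developers see this scaling from the Kubernetes side. As a minimal sketch, assuming an AKS cluster with virtual nodes enabled and using the standard virtual kubelet node selector and tolerations (the image name and replica count here are placeholders), a deployment can be steered onto ACI-backed capacity like this:

```python
# Sketch: schedule a bursty workload onto AKS virtual nodes (backed by ACI)
# using the Kubernetes Python client. Check your cluster's exact node labels.
from kubernetes import client, config

config.load_kube_config()  # assumes a local kubeconfig for the AKS cluster

pod_spec = client.V1PodSpec(
    containers=[client.V1Container(name="burst-worker",
                                   image="myregistry.example/worker:latest")],
    node_selector={"type": "virtual-kubelet"},  # steer pods onto the virtual node
    tolerations=[
        client.V1Toleration(key="virtual-kubelet.io/provider", operator="Exists"),
        client.V1Toleration(key="azure.com/aci", effect="NoSchedule"),
    ],
)

deployment = client.V1Deployment(
    metadata=client.V1ObjectMeta(name="burst-workers"),
    spec=client.V1DeploymentSpec(
        replicas=50,  # scale out onto ACI-backed capacity rather than cluster VMs
        selector=client.V1LabelSelector(match_labels={"app": "burst-worker"}),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels={"app": "burst-worker"}),
            spec=pod_spec,
        ),
    ),
)

client.AppsV1Api().create_namespaced_deployment(namespace="default", body=deployment)
```

Because the pods land on the virtual node, raising the replica count draws on ACI capacity rather than waiting for new cluster VMs to be provisioned.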

One interesting feature of ACI is that its pods are designed to be what Russinovich calls “hostile multitenant safe.” Workloads running in pods on the same infrastructure are isolated from one another, so you can use this technique to support many different users. The implication, of course, is that this is how Microsoft runs many of its own Azure services as well as its serverless Azure App Service application platform, and it’s likely how other non-Microsoft services take advantage of Azure’s scale. You can see this tool being used by big customers like OpenAI to host inferencing instances for ChatGPT and other services.

Another ACI feature Russinovich detailed was NGroups, which lets you group sets of containers and manage them as a single unit. For example, you can use NGroups to deploy an application across several availability zones; if an instance fails, it’s automatically restarted, reducing the amount of management code you need to write for an ACI application. Interestingly, ACI and NGroups are going to be a target for the Radius application definition and deployment framework, taking it beyond its Kubernetes roots.
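Russinovich didn’t walk through the NGroups API itself, but the bookkeeping it replaces is easy to picture. Here’s a conceptual sketch, with hypothetical names, of the zone-spreading and restart logic that NGroups moves out of your management code and into Azure:

```python
# Conceptual sketch of what NGroups automates: spreading a set of ACI container
# groups across availability zones and restarting any that fail. Names and the
# restart step are hypothetical; with NGroups, Azure does this bookkeeping for you.
from dataclasses import dataclass
from itertools import cycle

ZONES = ["1", "2", "3"]  # availability zones in the target region

@dataclass
class Member:
    name: str
    zone: str
    state: str = "Running"

def create_group(app: str, count: int) -> list[Member]:
    """Spread `count` container groups round-robin across the zones."""
    zone_iter = cycle(ZONES)
    return [Member(name=f"{app}-{i}", zone=next(zone_iter)) for i in range(count)]

def reconcile(members: list[Member]) -> None:
    """Restart any member that has failed, keeping the group at full strength."""
    for m in members:
        if m.state != "Running":
            m.state = "Running"  # in reality: re-create the container group in its zone

group = create_group("web", 6)
group[2].state = "Failed"
reconcile(group)
print([f"{m.name}@zone{m.zone}:{m.state}" for m in group])
```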

Keeping computing confidential

Russinovich described a set of new confidential computing features, starting with a new addition to Azure’s server hardware. Until recently, Microsoft relied on third-party hardware security modules (HSMs) to manage keys. It has now introduced its own integrated HSM, which has a local interface for VM guest OSes. This ensures keys are never exposed as they cross the hypervisor boundary, and never left in virtual machine memory where they could be recovered after a VM has been shut down.
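The guest interface for the integrated HSM hasn’t been published, but the design pattern is the familiar one from Azure Key Vault: applications work with a key reference and ask the HSM to perform operations, so private key material never sits in VM memory. A minimal sketch using the existing Key Vault SDK (the vault URL and key name are placeholders):

```python
# Sketch of HSM-backed signing via a key reference: the private key never
# leaves the HSM, only the operation request and its result cross the boundary.
import hashlib

from azure.identity import DefaultAzureCredential
from azure.keyvault.keys import KeyClient
from azure.keyvault.keys.crypto import CryptographyClient, SignatureAlgorithm

credential = DefaultAzureCredential()
key_client = KeyClient(vault_url="https://example-hsm.vault.azure.net",
                       credential=credential)

key = key_client.get_key("tenant-signing-key")          # a reference, not key material
crypto = CryptographyClient(key, credential=credential)

digest = hashlib.sha256(b"message to sign").digest()
result = crypto.sign(SignatureAlgorithm.rs256, digest)   # signing happens inside the HSM
print(result.signature.hex())
```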

At the same time, Microsoft is extending its confidential computing trusted execution environments (TEEs) to GPUs. Here GPU code runs in its own TEE alongside a trusted VM, with data exchanged over encrypted messaging channels. This approach is being used to secure Azure OpenAI inferencing, starting with OpenAI’s Whisper model: the entire inference process is encrypted, from your prompt to the GPU and back again.
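Azure’s confidential GPU stack sets up that channel as part of TEE attestation, so none of this is application code, but conceptually the exchange looks like the following sketch, where a pre-shared session key stands in for the attestation-derived key:

```python
# Conceptual sketch of the encrypted prompt/response exchange between a trusted
# VM and a GPU TEE. The session key here is a stand-in for one derived after
# verifying the TEE's attestation report.
import os

from cryptography.hazmat.primitives.ciphers.aead import AESGCM

key = AESGCM.generate_key(bit_length=256)   # in reality, tied to attestation
channel = AESGCM(key)

# Client side: the prompt is encrypted before it leaves the trusted VM.
nonce = os.urandom(12)
sealed_prompt = channel.encrypt(nonce, b"Summarize this contract...", None)

# GPU TEE side: decrypt, run inference, encrypt the result for the return trip.
prompt = channel.decrypt(nonce, sealed_prompt, None)
reply_nonce = os.urandom(12)
sealed_reply = channel.encrypt(reply_nonce, b"(model output for: %s)" % prompt, None)

# Client side: only the holder of the session key can read the response.
print(channel.decrypt(reply_nonce, sealed_reply, None))
```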

Using Azure to share data confidentially

The same basic architecture hosts Azure Confidential Clean Rooms, where organizations can secure both code and data, allowing them to share functionality without exposing data to each other.

So, if I am a company with an AI model and a customer wants to fine-tune that model with their own confidential data, I can set up a clean room with explicit policies for what can be done inside its encryption boundary. My customer uploads their data, encrypted with the clean room’s keys, and runs an operation that uses both my data (the model) and theirs.

If the operation is approved by the clean room policies, it’s run, delivering the results to where the policies require. If it’s not, it’s blocked. The idea is that data can be shared without being exposed, and results are delivered only to the party that runs a trusted operation on that shared data. The resulting fine-tuned model can then be evaluated by the AI company before being delivered to their customer.
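To make that flow concrete, here’s a conceptual sketch of the policy gate. The operation names and parties are hypothetical, and in Azure Confidential Clean Rooms the policy is enforced inside the TEE rather than in application code like this:

```python
# Conceptual sketch of a clean room policy gate: only approved operations run,
# and results are released only to the party the policy names.
ALLOWED_OPERATIONS = {
    # (operation, inputs) -> who receives the results
    ("fine_tune", ("model_owner_weights", "customer_training_data")): "model_owner",
}

def run_in_clean_room(operation: str, inputs: tuple) -> str:
    recipient = ALLOWED_OPERATIONS.get((operation, inputs))
    if recipient is None:
        return f"blocked: '{operation}' on {inputs} is not permitted by policy"
    # Inputs stay encrypted with the clean room's keys; only the result leaves.
    result = f"fine-tuned model built from {inputs[0]} and {inputs[1]}"
    return f"released to {recipient}: {result}"

print(run_in_clean_room("fine_tune", ("model_owner_weights", "customer_training_data")))
print(run_in_clean_room("export_raw_data", ("customer_training_data",)))
```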

There’s a lot to unpack around Azure Confidential Clean Rooms, but at first glance they appear to be an intriguing answer to questions about sharing data in highly regulated environments: for example, allowing two sides in a legal dispute to work on the same set of e-discovery data without either side knowing how the other is using it. Similarly, two companies involved in a merger or acquisition could use a Confidential Clean Room to share sensitive business information without exposing customer data or other commercially sensitive material.

Russinovich’s Ignite sessions are one of the highlights of the conference. It’s always interesting to learn about the infrastructure behind the Azure portal’s web pages. Beyond that, lifting the hood on Azure also allows us to see what’s possible beyond simply lifting and shifting existing data center workloads to the cloud. We’re getting ready for a serverless future and what we can build with it.