Generative AI model and repository provider Hugging Face this week launched an alternative to Nvidia’s NIM (Nvidia Inference Microservices).
Hugging Face Generative AI Services, or HUGS, is presently the only available alternative to NIM.
NIM, which was first introduced in March and later rolled out in June as part of Nvidia’s AI Enterprise suite, was a first-of-its-kind tool to help enterprises deploy generative AI foundational models across any cloud or data center. It does so by packaging optimized inference engines, APIs, and support for custom or generic AI models into containers as microservices.
NIM caught the attention of developers as it was itself an alternative to the likes of vLLM, TensorRT-LLM, and LMDeploy, all of which are frameworks and packages that help deploy foundational models for inference but are arguably time-consuming to configure and run.
In contrast, NIM gives developers quick access to a preconfigured setup for a foundational model as a container image that can be deployed via Docker or Kubernetes and connected to using APIs.
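For instance, once a NIM container is running, it exposes an OpenAI-compatible HTTP API that applications can call directly. The following is a minimal sketch, not a definitive recipe: it assumes a NIM container serving Llama 3.1 8B Instruct is already running locally on port 8000, the port Nvidia’s examples typically use.

```python
# Minimal sketch: querying a locally running NIM container through its
# OpenAI-compatible endpoint. The host, port, and model name are
# assumptions based on Nvidia's published examples.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="meta/llama-3.1-8b-instruct",
    messages=[{"role": "user", "content": "Summarize what a microservice is."}],
    max_tokens=100,
)
print(response.choices[0].message.content)
```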
HUGS, too, consists of optimized, zero-configuration inference microservices aimed at easing and accelerating the development of AI applications.
Hugging Face said the inference microservices offered via HUGS are built using open-source libraries and frameworks such as Text Generation Inference (TGI) and Transformers and can run models on GPU accelerators from Nvidia or AMD.
Support for AWS Inferentia and Google TPUs (tensor processing units) will be added soon, the company added. However, its blog post on the offering made no mention of support for Intel hardware.
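Because HUGS endpoints are built on TGI, applications can query them with Hugging Face’s own client libraries. Below is a minimal sketch, assuming a HUGS container is already running and reachable at a local address; the URL is an assumption, as the actual endpoint depends on where the container is deployed.

```python
# Minimal sketch: querying a running HUGS container with the
# huggingface_hub client. The endpoint URL is an assumption; the actual
# address depends on where the container was deployed.
from huggingface_hub import InferenceClient

client = InferenceClient(base_url="http://localhost:8080")

response = client.chat_completion(
    messages=[{"role": "user", "content": "What is zero-configuration inference?"}],
    max_tokens=100,
)
print(response.choices[0].message.content)
```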
No free HUGS, but there are cost advantages
A key difference between NIM and HUGS is pricing, which suggests enterprises could save money by choosing the new contender.
On Google Cloud and AWS, HUGS costs $1 per hour per container, whereas NIM costs $1 per hour per GPU, on top of the license fee for the Nvidia AI Enterprise suite.
According to Docker’s documentation, by default “a container has no resource constraints and can use as much of a given resource as the host’s kernel scheduler allows.” That suggests a single HUGS container could draw on multiple GPUs while still being billed at the flat per-container rate, making it cheaper to operate than NIM for multi-GPU deployments.
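To make the difference concrete, here is a back-of-the-envelope comparison under stated assumptions: a hypothetical deployment spanning four GPUs, running around the clock for a 30-day month, and ignoring the Nvidia AI Enterprise license fee, which would add further to NIM’s side of the ledger.

```python
# Back-of-the-envelope cost comparison. The GPU count and hours are
# hypothetical; only the $1/hour rates come from the published pricing.
gpus = 4          # assumed size of the deployment
hours = 24 * 30   # one 30-day month of continuous operation

hugs_cost = 1.00 * hours         # HUGS: $1/hour per container, regardless of GPU count
nim_cost = 1.00 * gpus * hours   # NIM: $1/hour per GPU, before license fees

print(f"HUGS: ${hugs_cost:,.0f}  NIM: ${nim_cost:,.0f} (plus license)")
# HUGS: $720  NIM: $2,880 (plus license)
```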
Other availability options for HUGS
Besides AWS and Google Cloud, where HUGS can be deployed via the AWS Marketplace and the Google Cloud Marketplace, Hugging Face is offering access to HUGS via its Enterprise Hub, a platform for accessing models and building AI-based applications that can be subscribed to for $20 per user per month.
Separately, on AWS, the company is offering a five-day free trial so developers can test HUGS before paying.
HUGS is also available for free via DigitalOcean, though compute costs still apply, the company said.
Only for open models
As of now, HUGS appears to be limited to models with open weights (or open models, as the industry calls them), which isn’t the case with NIM.
HUGS supports 13 models: Llama-3.1-8B-Instruct, Llama-3.1-70B-Instruct, Llama-3.1-405B-Instruct-FP8, Hermes-3-Llama-3.1-8B, Hermes-3-Llama-3.1-70B, Hermes-3-Llama-3.1-405B-FP8, Nous-Hermes-2-Mixtral-8x7B-DPO, Mixtral-8x7B-Instruct-v0.1, Mistral-7B-Instruct-v0.3, Mixtral-8x22B-Instruct-v0.1, Gemma-2-27b-it, Gemma-2-9b-it, and Alibaba’s Qwen2.5-7B-Instruct.
The documentation page on HUGS shows that Hugging Face is expected to add support for models such as Deepseek, T5, Phi, and Command R soon. Other multimodal and embeddings models expected to be added include Idefics, Llava, BGE, GTE, Mixedbread, Arctic, Jina, and Nomic.
Nvidia says NIM supports more, though, including its proprietary Nemotron models; models from Cohere, AI21, Adept, Getty Images, and Shutterstock; and open models from Google, Hugging Face, Meta, Microsoft, Mistral AI, and Stability AI.
However, Nvidia’s NIM documentation shows that NIM is presently available for models such as Code Llama 13B Instruct, Code Llama 34B Instruct, Code Llama 70B Instruct, Llama 2 7B Chat, Llama 2 13B Chat, Llama 2 70B Chat, Llama 3 Swallow 70B Instruct V0.1, Llama 3 Taiwan 70B Instruct, Llama 3.1 8B Base, Llama 3.1 8B Instruct, Llama 3.1 70B Instruct, Llama 3.1 405B Instruct, Meta Llama 3 8B Instruct, Meta Llama 3 70B Instruct, Mistral 7B Instruct v0.3, Mistral NeMo 12B Instruct, Mistral NeMo Minitron 8B 8K Instruct, Mixtral 8x7B Instruct v0.1, Mixtral 8x22B Instruct v0.1, Nemotron 4 340B Instruct, Nemotron 4 340B Reward, and Phi 3 Mini 4K Instruct.