Microsoft has released an updated set of its Phi small language models on Hugging Face, which it claims outperform similar offerings from rival model providers, including Meta and Google.

In an update to the Phi-3 family released in April, the cloud services provider unveiled three Phi-3.5 models under the open MIT License: Phi-3.5-MoE-instruct, Phi-3.5-mini-instruct, and Phi-3.5-vision-instruct.
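
Because all three checkpoints are published on Hugging Face, they can be loaded with the standard transformers library. The sketch below is illustrative rather than official documentation: the repository ID, dtype, and generation settings are assumptions that should be checked against the model card.

```python
# Minimal sketch: loading Phi-3.5-mini-instruct from Hugging Face with transformers.
# Repository ID and loading options are assumptions; verify against the model card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-3.5-mini-instruct"  # assumed repo ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # small enough to fit on a single modern GPU
    device_map="auto",
    trust_remote_code=True,       # may be needed depending on transformers version
)

# Chat-style prompt using the tokenizer's built-in chat template.
messages = [{"role": "user", "content": "Explain mixture-of-experts in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```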

The Phi-3.5-MoE-instruct model, according to the company, is a lightweight model built upon the datasets used for Phi-3 (synthetic data and filtered publicly available documents), with a focus on very high-quality, reasoning-dense data.

The model offers multilingual support and comes with a 128K-token context length, the company said, adding that it is intended for commercial and research use in multiple languages.

“The model provides uses for general purpose AI systems and applications which require: memory/compute constrained environments, latency bound scenarios, and strong reasoning (especially code, math and logic),” the model’s description on Hugging Face read.

“Our model is designed to accelerate research on language and multimodal models, for use as a building block for generative AI-powered features,” the description further read.

Across benchmarks rating models on reasoning and multilingual skills, such as BigBench, MMLU, and ARC Challenge, the MoE-instruct model, despite having fewer active parameters (6.6 billion) than its rivals, performed better than Llama-3.1-8B-instruct, Gemma-2-9b-It, and Gemini-1.5-Flash. It could not, however, match the performance of OpenAI’s GPT-4o-mini-2024-07-18 (chat).

However, the company pointed out that the model is still fundamentally limited by its size for certain tasks.

“The model simply does not have the capacity to store too much factual knowledge, therefore, users may experience factual incorrectness,” it said, adding that this weakness can be resolved by augmenting Phi-3.5 with a search engine, particularly when using the model under retrieval-augmented generation (RAG) settings.
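
For illustration, the sketch below shows one common way to apply that advice: retrieved passages are prepended to the prompt so the model answers from supplied context rather than from memorized facts. The retrieve() helper is a hypothetical stand-in for a search engine or vector store, and the repository ID is an assumption; this is not Microsoft's reference pipeline.

```python
# Hedged RAG-style sketch for Phi-3.5: retrieved passages are injected into the
# prompt so the model can quote facts instead of recalling them from its weights.
from transformers import pipeline

def retrieve(query: str) -> list[str]:
    # Placeholder: swap in a real search engine or vector store here.
    return ["Phi-3.5 models were released by Microsoft under the MIT license."]

generator = pipeline(
    "text-generation",
    model="microsoft/Phi-3.5-mini-instruct",  # assumed repo ID
    device_map="auto",
)

question = "Under which license were the Phi-3.5 models released?"
context = "\n".join(retrieve(question))
messages = [
    {"role": "system", "content": f"Answer using only this context:\n{context}"},
    {"role": "user", "content": question},
]

# With chat-format input, recent transformers versions return the full
# conversation; the last message is the assistant's answer.
result = generator(messages, max_new_tokens=64)
print(result[0]["generated_text"][-1]["content"])
```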

Microsoft used 512 Nvidia H100-80G GPUs to train the model over a period of 23 days on 4.9 trillion tokens of training data.

Similarly, the mini-instruct model, which also supports a 128K-token context length, performed better than most rivals but fell behind OpenAI’s latest 4o-mini chat model.

The mini-instruct model is an update to the June 2024 instruction-tuned Phi-3 Mini release based on user feedback, the company said, adding that it used additional post-training data, leading to substantial gains in multilingual performance, multi-turn conversation quality, and reasoning capability.

The mini, which has 3.8 billion parameters and is a dense decoder-only transformer model using the same tokenizer as Phi-3 Mini, was trained on 512 Nvidia H100-80G GPUs over a period of 10 days on 3.4 trillion tokens.

Additionally, the company said that the third new model, Phi-3.5-vision-instruct, also outperformed rival offerings, including Claude-3.5-Sonnet and GPT-4o-mini, despite having fewer parameters.

The model, which has 4.2 billion parameters and combines an image encoder, connector, projector, and the Phi-3 Mini language model, supports a 128K-token context length and was trained on 256 Nvidia A100-80G GPUs over six days on 500 billion vision and text tokens.
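
As an illustration of how that multimodal stack is exposed to developers, the sketch below queries the vision model about a local image. The repository ID, the <|image_1|> placeholder convention, and the loading flags follow the published Phi vision model cards but should be verified before use; the image path is a placeholder.

```python
# Hedged sketch of querying Phi-3.5-vision-instruct about a single image.
# Repo ID, prompt format, and loading flags are assumptions based on the
# published Phi vision model cards; confirm them before use.
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "microsoft/Phi-3.5-vision-instruct"  # assumed repo ID
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype="auto",
    trust_remote_code=True,
    _attn_implementation="eager",  # use "flash_attention_2" if flash-attn is installed
)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

image = Image.open("chart.png")  # placeholder path: any local image
prompt = "<|user|>\n<|image_1|>\nSummarize this image.<|end|>\n<|assistant|>\n"

inputs = processor(prompt, [image], return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=128)

# Decode only the tokens generated after the prompt.
answer = processor.batch_decode(
    output_ids[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0]
print(answer)
```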