TL;DR: Microsoft announced the MAI model series, featuring highly efficient Mixture-of-Experts architectures integrated directly into VS Code and partner APIs.
Summary: Microsoft has introduced the MAI model series, including the 1T-parameter MAI-Thinking-1 reasoning model and the developer-focused MAI-Code-1-Flash model. Both use sparse Mixture-of-Experts (MoE) architectures with 256K context windows, keeping active parameter counts to 35B and 5B respectively. These proprietary models are being integrated directly into Visual Studio Code and GitHub Copilot to deliver higher performance at lower cost.
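To put the sparsity figure in perspective, here is a minimal sketch of the arithmetic, assuming the 1T total and 35B active counts quoted above for MAI-Thinking-1 are exact. The function and constant names are illustrative only. No total size is given for MAI-Code-1-Flash, so its ratio is not computed.

```python
# Parameter counts as quoted in the summary; treated as exact for illustration.
TOTAL_PARAMS_THINKING = 1_000_000_000_000  # 1T total parameters (MAI-Thinking-1)
ACTIVE_PARAMS_THINKING = 35_000_000_000    # 35B active parameters


def active_fraction(active: int, total: int) -> float:
    """Return the share of a sparse MoE model's parameters that are active."""
    if total <= 0 or not 0 <= active <= total:
        raise ValueError("expected 0 <= active <= total and total > 0")
    return active / total


fraction = active_fraction(ACTIVE_PARAMS_THINKING, TOTAL_PARAMS_THINKING)
print(f"MAI-Thinking-1 active share: {fraction:.1%}")
```

On these numbers, roughly one parameter in 29 is active, which is where the latency and cost argument for sparse MoE comes from.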
Why it matters: This signals Microsoft's focus on deploying very large but highly sparse MoE models directly into developer workflows to cut latency and API costs. Watch how these integrations perform in VS Code against smaller open-weight alternatives such as Qwen and Gemma.
Source: r/LocalLLaMA