Lynn LLM Inference Engine for NVIDIA Blackwell

Tags: Tools, Open Source, Architecture

TL;DR: A native LLM inference engine that uses custom kernels to target NVIDIA's Blackwell GPU architecture.

Summary: Lynn is a native LLM inference engine built specifically for NVIDIA Blackwell GPUs. It supports W4A8 and NVFP4 quantization, uses custom CUDA and Triton kernels, and can run Mixture-of-Experts (MoE) models. It also uses speculative decoding to speed up token generation.
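For readers unfamiliar with NVFP4: it is NVIDIA's 4-bit floating-point format, in which each element is an FP4 (E2M1) value and every block of 16 elements shares one scale factor. In the real format that block scale is stored as FP8 (E4M3), and a second per-tensor FP32 scale is applied on top. The sketch below illustrates only the block-scaling idea in plain Python. It is not Lynn's code. The function names are hypothetical, the block scale is kept as a Python float instead of being rounded to E4M3, ties round toward the smaller magnitude rather than round-to-nearest-even, and the per-tensor scale is omitted.

```python
# Simplified NVFP4-style block quantizer (illustrative sketch, not Lynn's code).
# Helper names are hypothetical; block scales stay as Python floats (real NVFP4
# stores them as FP8 E4M3) and the per-tensor FP32 scale is omitted.

# Non-negative magnitudes representable by an FP4 E2M1 element.
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
E2M1_MAX = 6.0
BLOCK_SIZE = 16  # number of elements that share one scale factor in NVFP4


def round_to_e2m1(x: float) -> float:
    """Round x to the nearest signed E2M1 value (ties go to the smaller magnitude)."""
    mag = min(E2M1_GRID, key=lambda g: abs(abs(x) - g))
    return -mag if x < 0 else mag


def quantize_block(values: list[float]) -> tuple[float, list[float]]:
    """Choose a scale so the block's largest |value| maps to 6.0, then round each element."""
    amax = max(abs(v) for v in values)
    scale = amax / E2M1_MAX if amax > 0 else 1.0  # all-zero block: any scale works
    return scale, [round_to_e2m1(v / scale) for v in values]


def quantize(tensor: list[float]) -> list[tuple[float, list[float]]]:
    """Split a flat tensor into 16-element blocks (the last may be shorter) and quantize each."""
    return [
        quantize_block(tensor[i:i + BLOCK_SIZE])
        for i in range(0, len(tensor), BLOCK_SIZE)
    ]


def dequantize(blocks: list[tuple[float, list[float]]]) -> list[float]:
    """Multiply each FP4 code by its block scale to recover approximate values."""
    return [scale * code for scale, codes in blocks for code in codes]


if __name__ == "__main__":
    weights = [0.03 * i - 0.4 for i in range(20)]  # one full block plus a partial block
    restored = dequantize(quantize(weights))
    for w, r in zip(weights, restored):
        print(f"{w:+.3f} -> {r:+.3f}")
```

Because 6.0 is the largest E2M1 magnitude, dividing by `amax / 6` maps each block's largest element onto the top of the grid. The widest gap in the grid (between 4 and 6) then bounds the rounding error at one scale unit per element. Sharing a scale across only 16 elements keeps that error local to each small block.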

Why it matters: It gives builders a hardware-specific engine aimed at getting the most performance out of NVIDIA's latest Blackwell GPUs. Developers targeting next-generation hardware may want to study its custom CUDA and Triton kernels for low-latency inference.

Source: github.com