TL;DR: A native LLM inference engine built on custom kernels and optimized for NVIDIA's Blackwell GPU architecture.
Summary: Lynn is a native LLM inference engine designed specifically for NVIDIA Blackwell GPUs. It supports W4A8 (4-bit weights, 8-bit activations) and NVFP4 quantization, runs on custom-written CUDA and Triton kernels, and handles Mixture-of-Experts (MoE) models. It also uses speculative decoding to speed up token generation.
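For readers new to NVFP4, the sketch below shows the general idea of the format in plain Python: values are split into blocks of 16, each block shares one scale, and each scaled value is rounded to the nearest FP4 E2M1 number. This is a generic illustration, not Lynn's kernel code; the function names are hypothetical, and it simplifies real NVFP4 by keeping the block scale as a Python float (the format stores it in FP8 E4M3) and by omitting the additional per-tensor FP32 scale.

```python
# Magnitudes representable by FP4 E2M1 (the sign is stored separately).
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
BLOCK = 16  # NVFP4 shares one scale across 16 consecutive values


def round_to_e2m1(x):
    """Round a float to the nearest signed E2M1 value (ties go to the smaller magnitude)."""
    mag = min(E2M1_GRID, key=lambda g: abs(abs(x) - g))
    return -mag if x < 0 else mag


def quantize_block(values):
    """Return (scale, codes) for one block; every code lies on the E2M1 grid."""
    amax = max(abs(v) for v in values)
    # Map the block's largest magnitude onto 6.0, the largest E2M1 value.
    # Simplification: real NVFP4 stores this scale in FP8 E4M3, not as a float.
    scale = amax / 6.0 if amax > 0 else 1.0
    codes = [round_to_e2m1(v / scale) for v in values]
    return scale, codes


def quantize(values):
    """Hypothetical helper: quantize a flat list block by block."""
    return [quantize_block(values[i:i + BLOCK]) for i in range(0, len(values), BLOCK)]


def dequantize(blocks):
    """Reconstruct approximate values by multiplying each code by its block scale."""
    out = []
    for scale, codes in blocks:
        out.extend(c * scale for c in codes)
    return out


if __name__ == "__main__":
    weights = [0.1 * i - 1.7 for i in range(32)]
    restored = dequantize(quantize(weights))
    print(max(abs(a - b) for a, b in zip(weights, restored)))
```

Sharing one scale per small block is what lets a 4-bit format track local value ranges; the largest rounding error within a block is bounded by half of the widest grid gap (between 4.0 and 6.0) times that block's scale.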
Why it matters: It gives builders a hardware-specific engine tuned to get the most out of NVIDIA's Blackwell GPUs. Developers targeting Blackwell should study its custom CUDA and Triton kernels as a reference for low-latency inference.
Source: github.com