Top Open-Source AI Models
A curated guide to the best open-source models across every major AI category — from text generation and image synthesis to audio and video. Cut through the noise and find what actually works.
Text Generation Models
Frontier open-source LLMs for coding, reasoning, and agentic workflows.
-
01 Qwen · Latest
Alibaba's latest Qwen release, offering state-of-the-art performance across multimodal and reasoning tasks.
-
02 DeepSeek · Latest
DeepSeek's next-generation model, featuring an advanced architecture and strong benchmark results.
-
03 GLM · 744B MoE
Zhipu AI's flagship model, leading in agentic engineering and long-horizon coding tasks. It activates 40B of its 744B total parameters.
-
04 Gemma 4 · 2B–31B
Google's hybrid-attention family excelling in reasoning, coding, and on-device multimodal use, spanning from lightweight to full-size variants.
-
05 Kimi-K2.5 · 1T MoE · 32B active
Moonshot's multimodal MoE built for visual coding and orchestrating agent swarms of up to 100 parallel sub-agents.
-
06 MiniMax-M2.7 · Self-improving
A self-improving agentic LLM that tops SWE-Pro benchmarks, designed for real-world software engineering and autonomous task completion.
-
07 MiMo-V2-Flash · 309B MoE · 150 t/s
Xiaomi's efficient 309B MoE (15B active) delivering 150 tokens/sec throughput, ideal for high-volume coding agents in production environments.
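The MoE figures quoted above are easier to compare with a little arithmetic. The Python sketch below computes the share of parameters active for the three models that list both a total and an active count, and estimates how long a response takes at the quoted 150 tokens/sec. It reads "1T" as 1000B, the 2,000-token response length is an arbitrary illustrative value, and the helper names are invented for this example.

```python
# Total and active parameter counts (in billions) as quoted in the list above.
# Assumption: "1T" for Kimi-K2.5 is read as 1000B.
MOE_MODELS = {
    "GLM": (744, 40),
    "Kimi-K2.5": (1000, 32),
    "MiMo-V2-Flash": (309, 15),
}


def active_fraction(total_b: float, active_b: float) -> float:
    """Share of a mixture-of-experts model's parameters that are active."""
    return active_b / total_b


def generation_seconds(num_tokens: int, tokens_per_sec: float) -> float:
    """Time to emit num_tokens at a steady decode throughput."""
    return num_tokens / tokens_per_sec


if __name__ == "__main__":
    for name, (total_b, active_b) in MOE_MODELS.items():
        print(f"{name}: {active_fraction(total_b, active_b):.1%} active")
    # Hypothetical response length, chosen only for illustration.
    print(f"2000 tokens at 150 t/s: {generation_seconds(2000, 150):.1f} s")
```

All three models activate only a few percent of their weights at a time (roughly 3% to 5%), although the full set of weights still has to be stored and loaded.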
Image Generation Models
Open-source diffusion models for photorealism, fine-tuning, and real-time generation.
-
01 FLUX.1 [schnell] · Speed · Consumer GPU
The fastest open-source model balancing quality and speed for consumer GPUs — ideal for quick iteration and batch generation.
-
02 FLUX.1 [dev] · Top Benchmarks
Black Forest Labs' flagship model and current benchmark leader for high-fidelity generation of complex scenes and fine details.
-
03 Stable Diffusion 3.5 Large · Fine-tuning King
The ecosystem favorite for fine-tuning, editing workflows, and community LoRA support: versatile and widely adopted.
-
04 GLM-Image · Typography · Apache 2.0
Specialist model for bilingual infographic typography and text-in-image generation, released under the permissive Apache 2.0 license.
-
05 HiDream-I1-Full · Photorealism
The raw photorealism expert for premium high-resolution outputs, delivering exceptional detail and image coherence.
-
06 SANA-Sprint 1.6B · Low-VRAM · Fast
Ultra-efficient 1.6B parameter model designed for low-VRAM setups and rapid experimentation without sacrificing quality.
-
07 Z-Image-Turbo · 6B · Edge & Batch
A lightweight 6B real-time generator optimized for edge deployments and high-throughput batch processing workflows.
-
08 HunyuanImage-3.0 · Research-grade
Tencent's research-grade model for advanced coherence and diversity, pushing the limits of compositional image generation.
Image-to-Video Models
Open-source models that transform still images into high-quality video sequences.
-
01 LTX-2.3 · 4K · 50fps · Audio
The leading open-source image-to-video model with native 4K at 50fps and synchronized audio support for production-ready results.
-
02 LTX-2.3-GGUF · 21B · Quantized
A quantized 21B-parameter variant of LTX-2.3 optimized for efficient inference on consumer hardware with minimal quality loss.
-
03 WAN2.2-14B-Rapid · 14B MoE · Local
Rapid all-in-one 14B image-to-video model using a MoE architecture for fast local runs without requiring cloud infrastructure.
-
04 Wan2.2-I2V-A14B-GGUF · 480p/720p · Mid-GPU
A 14B quantized Wan2.2 variant tailored for 480p and 720p video generation on mid-range GPUs at an accessible cost.
-
05 HY-OmniWeaving · Multi-style · Tencent
Tencent's omni-modal image-to-video model with multi-style weaving capabilities for creative and cinematic output.
-
06 LTX-2.3-Transition-LORA · LoRA · Scene Transitions
A LoRA fine-tune specifically trained for smooth, cinematic scene transitions within LTX-2.3 video generation pipelines.
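Several entries above are quantized variants aimed at consumer or mid-range GPUs. As a rough guide to why quantization matters there, the Python sketch below estimates weight-only memory from a parameter count and a bits-per-weight figure, using the 21B count quoted for LTX-2.3-GGUF. The bit widths (16, 8, 4) are illustrative assumptions rather than the formats these releases actually ship in, and the estimate ignores activations, auxiliary components, and runtime overhead.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes), weights only."""
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9


if __name__ == "__main__":
    # 21B is the parameter count quoted for LTX-2.3-GGUF in the list above.
    # The bit widths are assumed values chosen for illustration.
    for bits in (16, 8, 4):
        print(f"21B @ {bits}-bit: ~{weight_memory_gb(21, bits):.1f} GB")
```

Under these assumptions, going from 16-bit to 4-bit weights cuts the estimate from about 42 GB to about 10.5 GB, which is the kind of gap that separates datacenter cards from consumer ones.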
Image-to-Text Models
OCR, captioning, and vision-language models for extracting meaning from images.
-
01 GLM-OCR · Best OCR 2026
The top open-source OCR model of 2026 for speed and accuracy on complex documents, tables, and structured layouts.
-
02 nemotron-ocr-v2 · NVIDIA · Multilingual
NVIDIA's high-precision OCR excelling at scene text extraction and multilingual recognition across diverse real-world conditions.
-
03 Falcon-OCR · TII UAE · Efficient
An efficient OCR model from TII UAE built for reliable text extraction in varied real-world image conditions.
-
04 BLIP Large (Captioning) · Salesforce · Captions
Salesforce's BLIP large model for detailed, high-quality image captioning — widely used as a strong captioning baseline.
-
05 trocr-base-handwritten · Microsoft · Handwriting
Microsoft's TrOCR base model specialized for accurate transcription of handwritten text across different writing styles.
-
06 manga-ocr-base · Japanese Manga OCR
A specialized OCR model fine-tuned for extracting Japanese text from manga panels and comic book images.
Audio Generation Models
Open-source models for TTS, voice cloning, music generation, and speech recognition.
-
01 Qwen3-TTS · TTS · Best Overall
The best overall TTS model balancing quality and speed — ideal for production deployments that need both natural voice and fast inference.
-
02 Fish Speech · Voice Cloning
A highly capable voice cloning model delivering realistic, high-fidelity synthetic speech from short audio references.
-
03 CosyVoice 3.0 · Multilingual · Streaming
A solid multilingual TTS model with streaming support, well suited to real-time applications.
-
04 Kimi-Audio · Multimodal Audio
A strong multimodal audio model supporting expressive voices, useful for scenarios that combine speech generation with other modalities.
-
05 ACE-Step 1.5 · Music Generation
Currently the best open-source music generation model, capable of producing coherent, full-track compositions from text prompts.
-
06 FunASR · ASR · Multilingual
A strong multilingual speech recognition model with streaming capabilities, widely used for transcription pipelines.
-
07 AudioX · Multimodal → Audio
The most complete multimodal-to-audio stack, supporting text, image, and video inputs to synthesize matching audio output.
-
08 NVIDIA A2SB · Audio Restoration
NVIDIA's model for audio restoration and inpainting, best in class for cleaning up degraded speech and removing background noise.
Coding Models
Open-source LLMs specialized in code generation, debugging, and software engineering.
-
01 Qwen · State-of-the-art
A leading open-source coding model with top-tier performance in code generation, refactoring, and technical reasoning.
-
02 DeepSeek · Advanced MoE
A highly efficient Mixture-of-Experts model that excels in complex algorithmic tasks and repository-level code understanding.
-
03 StarCoder · BigCode
The latest in the StarCoder family, trained on a large corpus of code spanning many programming languages, with permissive licensing for enterprise use.