🤖 Open-Source AI

Top Open-Source AI Models

A curated guide to the best open-source models across every major AI category — from text generation and image synthesis to audio and video. Cut through the noise and find what actually works.

🧠 Text 💻 Coding 🎨 Images 🎬 Video 📄 OCR 🎙️ Audio

🧠

Text Generation Models

Frontier open-source LLMs for coding, reasoning, and agentic workflows.

01

Qwen Latest

Providing state-of-the-art performance across multimodal and reasoning tasks.
02

DeepSeek Latest

Next-generation model from DeepSeek featuring advanced architecture and groundbreaking benchmark results.
03

GLM 744B MoE

Zhipu AI's flagship model leading in agentic engineering and long-horizon coding tasks. Features a 40B active parameter count from a massive 744B total.
04

Gemma 4 2B–31B

Google's hybrid-attention family excelling in reasoning, coding, and on-device multimodal use, spanning from lightweight to full-size variants.
05

Kimi-K2.5 1T MoE · 32B active

Moonshot's multimodal MoE built for visual coding and orchestrating agent swarms of up to 100 parallel sub-agents.
06

MiniMax-M2.7 Self-improving

A self-improving agentic LLM that tops SWE-Pro benchmarks, designed for real-world software engineering and autonomous task completion.
07

MiMo-V2-Flash 309B MoE · 150 t/s

Xiaomi's efficient 309B MoE (15B active) delivering 150 tokens/sec throughput, ideal for high-volume coding agents in production environments.

🎨

Image Generation Models

Open-source diffusion models for photorealism, fine-tuning, and real-time generation.

01

FLUX.1 [schnell] Speed · Consumer GPU

The fastest open-source model balancing quality and speed for consumer GPUs — ideal for quick iteration and batch generation.
02

FLUX.1 [dev] Top Benchmarks

Black Forest Labs' flagship model and current benchmark leader for high-fidelity generation of complex scenes and fine details.
03

Stable Diffusion 3.5 Large Fine-tuning King

The reigning ecosystem king for fine-tuning, editing workflows, and community LoRA support — versatile and widely adopted.
04

GLM-Image Typography · Apache 2.0

Specialist model for bilingual infographic typography and text-in-image generation, released under the permissive Apache 2.0 license.
05

HiDream-I1-Full Photorealism

The raw photorealism expert for premium high-resolution outputs, delivering exceptional detail and image coherence.
06

SANA-Sprint 1.6B Low-VRAM · Fast

Ultra-efficient 1.6B parameter model designed for low-VRAM setups and rapid experimentation without sacrificing quality.
07

Z-Image-Turbo 6B · Edge & Batch

A lightweight 6B real-time generator optimized for edge deployments and high-throughput batch processing workflows.
08

HunyuanImage-3.0 Research-grade

Tencent's research-grade model for advanced coherence and diversity, pushing the limits of compositional image generation.

🎬

Image-to-Video Models

Open-source models that transform still images into high-quality video sequences.

01

LTX-2.3 4K · 50fps · Audio

The leading open-source image-to-video model with native 4K at 50fps and synchronized audio support for production-ready results.
02

LTX-2.3-GGUF 21B · Quantized

A quantized 21B-parameter variant of LTX-2.3 optimized for efficient inference on consumer hardware with minimal quality loss.
03

WAN2.2-14B-Rapid 14B MoE · Local

Rapid all-in-one 14B image-to-video model using a MoE architecture for fast local runs without requiring cloud infrastructure.
04

Wan2.2-I2V-A14B-GGUF 480p/720p · Mid-GPU

A 14B quantized Wan2.2 variant tailored for 480p and 720p video generation on mid-range GPUs at an accessible cost.
05

HY-OmniWeaving Multi-style · Tencent

Tencent's omni-modal image-to-video model with multi-style weaving capabilities for creative and cinematic output.
06

LTX-2.3-Transition-LORA LoRA · Scene Transitions

A LoRA fine-tune specifically trained for smooth, cinematic scene transitions within LTX-2.3 video generation pipelines.

📄

Best Image-to-Text Models

OCR, captioning, and vision-language models for extracting meaning from images.

01

GLM-OCR Best OCR 2026

The top open-source OCR model of 2026 for speed and accuracy on complex documents, tables, and structured layouts.
02

nemotron-ocr-v2 NVIDIA · Multilingual

NVIDIA's high-precision OCR excelling at scene text extraction and multilingual recognition across diverse real-world conditions.
03

Falcon-OCR TII UAE · Efficient

An efficient OCR model from TII UAE built for reliable text extraction in varied real-world image conditions.
04

BLIP Large (Captioning) Salesforce · Captions

Salesforce's BLIP large model for detailed, high-quality image captioning — widely used as a strong captioning baseline.
05

trocr-base-handwritten Microsoft · Handwriting

Microsoft's TrOCR base model specialized for accurate transcription of handwritten text across different writing styles.
06

manga-ocr-base Japanese Manga OCR

A specialized OCR model fine-tuned for extracting Japanese text from manga panels and comic book images.

🎙️

Best Audio Generation Models

TTS, voice cloning, music generation, and speech recognition open-source models.

01

Qwen3-TTS TTS · Best Overall

The best overall TTS model balancing quality and speed — ideal for production deployments that need both natural voice and fast inference.
02

Fish Speech Voice Cloning

Highly capable voice cloning model delivering realistic, high-fidelity synthetic speech from short audio references.
03

CosyVoice 3.0 Multilingual · Streaming

A very solid multilingual TTS model with streaming support, making it great for real-time applications across multiple languages.
04

Kimi-Audio Multimodal Audio

A strong multimodal audio model supporting expressive voices, useful for scenarios that combine speech generation with other modalities.
05

ACE-Step 1.5 Music Generation

Currently the best open-source music generation model, capable of producing coherent, full-track compositions from text prompts.
06

FunASR ASR · Multilingual

A strong multilingual speech recognition model with streaming capabilities, widely used for transcription pipelines.
07

AudioX Multimodal → Audio

The most complete multimodal-to-audio stack, supporting text, image, and video inputs to synthesize matching audio output.
08

NVIDIA A2SB Audio Restoration

NVIDIA's model for audio restoration and inpainting — best in class for cleaning up degraded speech and background noise removal.

💻

Coding Models

Open-source LLMs specialized in code generation, debugging, and software engineering.

01

Qwen State-of-the-art

Leading open-source coding model offering unparalleled performance in code generation, refactoring, and technical reasoning.
02

DeepSeek Advanced MoE

A highly efficient Mixture-of-Experts model that excels in complex algorithmic tasks and repository-level code understanding.
03

StarCoder BigCode

The latest in the StarCoder family, trained on a massive multilingual codebase with permissive licensing for enterprise use.