🤖 Open-Source AI

Top Open-Source AI Models

A curated guide to the best open-source models across every major AI category — from text generation and image synthesis to audio and video. Cut through the noise and find what actually works.

🧠

Text Generation Models

Frontier open-source LLMs for coding, reasoning, and agentic workflows.

  • 01
    Qwen Latest

    Delivers state-of-the-art performance across multimodal and reasoning tasks.

  • 02
    DeepSeek Latest

    Next-generation model from DeepSeek featuring advanced architecture and groundbreaking benchmark results.

  • 03
    GLM 744B MoE

    Zhipu AI's flagship model, leading in agentic engineering and long-horizon coding tasks. It is a Mixture-of-Experts model with 744B total parameters, of which 40B are active per token.

  • 04
    Gemma 4 2B–31B

    Google's hybrid-attention family excelling in reasoning, coding, and on-device multimodal use, spanning from lightweight to full-size variants.

  • 05
    Kimi-K2.5 1T MoE · 32B active

    Moonshot's multimodal MoE built for visual coding and orchestrating agent swarms of up to 100 parallel sub-agents.

  • 06
    MiniMax-M2.7 Self-improving

    A self-improving agentic LLM that tops SWE-Pro benchmarks, designed for real-world software engineering and autonomous task completion.

  • 07
    MiMo-V2-Flash 309B MoE · 150 t/s

    Xiaomi's efficient 309B MoE (15B active) delivering 150 tokens/sec throughput, ideal for high-volume coding agents in production environments.
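Several entries above quote both a total and an active parameter count. In a Mixture-of-Experts model each token is routed through only a subset of experts, so per-token compute scales roughly with the active count, while the memory needed to hold the weights scales with the total. The short Python sketch below simply does that arithmetic on the figures listed here (treating Kimi-K2.5's "1T" as 1,000B), plus the decode time implied by MiMo-V2-Flash's quoted 150 tokens/sec. The 2,000-token response length is a hypothetical example, and real throughput depends on hardware, batch size, and serving stack.

```python
# Parameter counts (in billions) as quoted in the list above.
# Kimi-K2.5's "1T" is approximated as 1,000B for this sketch.
MOE_MODELS = {
    "GLM": (40, 744),
    "Kimi-K2.5": (32, 1000),
    "MiMo-V2-Flash": (15, 309),
}


def active_fraction(active_b: float, total_b: float) -> float:
    """Share of an MoE model's total parameters used for each token."""
    if total_b <= 0:
        raise ValueError("total_b must be positive")
    return active_b / total_b


def decode_seconds(num_tokens: int, tokens_per_sec: float) -> float:
    """Wall-clock time to emit num_tokens at a steady decode rate."""
    if tokens_per_sec <= 0:
        raise ValueError("tokens_per_sec must be positive")
    return num_tokens / tokens_per_sec


if __name__ == "__main__":
    for name, (active, total) in MOE_MODELS.items():
        print(f"{name}: {active_fraction(active, total):.1%} of weights active per token")
    # Hypothetical 2,000-token response at the quoted 150 t/s.
    print(f"2,000 tokens at 150 t/s: {decode_seconds(2000, 150):.1f} s")
```

With these figures each model touches only about 3 to 6 percent of its weights per token, which helps explain how a 309B model can still be pitched at high-throughput serving.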

🎨

Image Generation Models

Open-source diffusion models for photorealism, fine-tuning, and real-time generation.

  • 01
    FLUX.1 [schnell] Speed · Consumer GPU

    The fastest open-source model balancing quality and speed for consumer GPUs — ideal for quick iteration and batch generation.

  • 02
    FLUX.1 [dev] Top Benchmarks

    Black Forest Labs' flagship model and current benchmark leader for high-fidelity generation of complex scenes and fine details.

  • 03

    The reigning ecosystem king for fine-tuning, editing workflows, and community LoRA support — versatile and widely adopted.

  • 04
    GLM-Image Typography · Apache 2.0

    Specialist model for bilingual infographic typography and text-in-image generation, released under the permissive Apache 2.0 license.

  • 05
    HiDream-I1-Full Photorealism

    The raw photorealism expert for premium high-resolution outputs, delivering exceptional detail and image coherence.

  • 06
    SANA-Sprint 1.6B Low-VRAM · Fast

    Ultra-efficient 1.6B parameter model designed for low-VRAM setups and rapid experimentation without sacrificing quality.

  • 07
    Z-Image-Turbo 6B · Edge & Batch

    A lightweight 6B real-time generator optimized for edge deployments and high-throughput batch processing workflows.

  • 08
    HunyuanImage-3.0 Research-grade

    Tencent's research-grade model for advanced coherence and diversity, pushing the limits of compositional image generation.
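Fine-tuning and community LoRA support come up repeatedly in this guide. In the usual LoRA formulation, a pretrained weight matrix W stays frozen and two small matrices are trained instead, B (d_out × r) and A (r × d_in), so the effective weight becomes W + (alpha / r) · B A. The dependency-free Python sketch below shows that merge and the resulting parameter savings. It follows the common alpha/r scaling convention; the 3072 × 3072 layer size and rank 16 are hypothetical values chosen for illustration, not taken from any model listed here.

```python
def lora_param_count(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters in a LoRA adapter: B is d_out x rank, A is rank x d_in."""
    return rank * (d_out + d_in)


def full_param_count(d_out: int, d_in: int) -> int:
    """Parameters updated by full fine-tuning of the same d_out x d_in layer."""
    return d_out * d_in


def matmul(x, y):
    """Plain-list matrix product of x (n x k) and y (k x m)."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def merge_lora(w, b, a, alpha: float, rank: int):
    """Return W + (alpha / rank) * (B @ A), the merged weight used at inference."""
    delta = matmul(b, a)
    scale = alpha / rank
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


if __name__ == "__main__":
    # Hypothetical 3072 x 3072 projection layer with a rank-16 adapter.
    lora = lora_param_count(3072, 3072, 16)
    full = full_param_count(3072, 3072)
    print(f"LoRA trains {lora:,} of {full:,} parameters ({lora / full:.2%})")

    # Tiny worked merge: 2x2 identity plus a rank-1 update.
    W = [[1.0, 0.0], [0.0, 1.0]]
    B = [[1.0], [2.0]]  # d_out x r = 2 x 1
    A = [[3.0, 4.0]]    # r x d_in = 1 x 2
    print(merge_lora(W, B, A, alpha=1.0, rank=1))  # [[4.0, 4.0], [6.0, 9.0]]
```

Because only B and A are trained, an adapter is small enough to share and swap independently of the base model, which is what makes a large community LoRA ecosystem practical.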

🎬

Image-to-Video Models

Open-source models that transform still images into high-quality video sequences.

  • 01
    LTX-2.3 4K · 50fps · Audio

    The leading open-source image-to-video model with native 4K at 50fps and synchronized audio support for production-ready results.

  • 02
    LTX-2.3-GGUF 21B · Quantized

    A quantized 21B-parameter variant of LTX-2.3 optimized for efficient inference on consumer hardware with minimal quality loss.

  • 03
    WAN2.2-14B-Rapid 14B MoE · Local

    Rapid all-in-one 14B image-to-video model using a MoE architecture for fast local runs without requiring cloud infrastructure.

  • 04
    Wan2.2-I2V-A14B-GGUF 480p/720p · Mid-GPU

    A 14B quantized Wan2.2 variant tailored for 480p and 720p video generation on mid-range GPUs at an accessible cost.

  • 05
    HY-OmniWeaving Multi-style · Tencent

    Tencent's omni-modal image-to-video model with multi-style weaving capabilities for creative and cinematic output.

  • 06
    LTX-2.3-Transition-LORA LoRA · Scene Transitions

    A LoRA fine-tune specifically trained for smooth, cinematic scene transitions within LTX-2.3 video generation pipelines.
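Quantized builds such as the GGUF variants above trade numeric precision for memory. A rough first estimate of the weight footprint is parameters × bits per weight ÷ 8 bytes. The Python sketch below applies that to the 21B size listed for LTX-2.3-GGUF at nominal 16-, 8-, and 4-bit widths. Treat it as an assumption-laden lower bound: real GGUF quantization types store extra per-block scale data, and inference also needs memory for activations, intermediate video latents, and other pipeline components such as text encoders and the VAE, none of which are counted here.

```python
# Nominal bit widths only; real GGUF quant types add per-block scale overhead,
# so actual files come out somewhat larger than these estimates.
BIT_WIDTHS = {"16-bit": 16, "8-bit": 8, "4-bit": 4}

# Parameter count (billions) as quoted in the list above.
PARAMS_BILLIONS = 21


def weight_gib(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GiB."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


if __name__ == "__main__":
    for label, bits in BIT_WIDTHS.items():
        print(f"{PARAMS_BILLIONS}B at {label}: {weight_gib(PARAMS_BILLIONS, bits):.1f} GiB")
```

For 21B parameters this works out to roughly 39 GiB at 16-bit, 20 GiB at 8-bit, and 10 GiB at 4-bit, which illustrates why quantization matters for the consumer-hardware use case described above.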

📄

Image-to-Text Models

OCR, captioning, and vision-language models for extracting meaning from images.

  • 01
    GLM-OCR Best OCR 2026

    The top open-source OCR model of 2026 for speed and accuracy on complex documents, tables, and structured layouts.

  • 02
    nemotron-ocr-v2 NVIDIA · Multilingual

    NVIDIA's high-precision OCR excelling at scene text extraction and multilingual recognition across diverse real-world conditions.

  • 03
    Falcon-OCR TII UAE · Efficient

    An efficient OCR model from the UAE's Technology Innovation Institute (TII), built for reliable text extraction under varied real-world image conditions.

  • 04
    BLIP Large (Captioning) Salesforce · Captions

    Salesforce's BLIP large model for detailed, high-quality image captioning — widely used as a strong captioning baseline.

  • 05
    trocr-base-handwritten Microsoft · Handwriting

    Microsoft's TrOCR base model specialized for accurate transcription of handwritten text across different writing styles.

  • 06
    manga-ocr-base Japanese Manga OCR

    A specialized OCR model fine-tuned for extracting Japanese text from manga panels and comic book images.

🎙️

Audio Generation Models

Open-source models for TTS, voice cloning, music generation, and speech recognition.

  • 01
    Qwen3-TTS TTS · Best Overall

    The best overall TTS model balancing quality and speed — ideal for production deployments that need both natural voice and fast inference.

  • 02
    Fish Speech Voice Cloning

    Highly capable voice cloning model delivering realistic, high-fidelity synthetic speech from short audio references.

  • 03
    CosyVoice 3.0 Multilingual · Streaming

    A solid multilingual TTS model whose streaming support makes it well suited to real-time applications across many languages.

  • 04
    Kimi-Audio Multimodal Audio

    A strong multimodal audio model supporting expressive voices, useful for scenarios that combine speech generation with other modalities.

  • 05
    ACE-Step 1.5 Music Generation

    Currently the best open-source music generation model, capable of producing coherent, full-track compositions from text prompts.

  • 06
    FunASR ASR · Multilingual

    A strong multilingual speech recognition model with streaming capabilities, widely used for transcription pipelines.

  • 07
    AudioX Multimodal → Audio

    The most complete multimodal-to-audio stack, supporting text, image, and video inputs to synthesize matching audio output.

  • 08
    NVIDIA A2SB Audio Restoration

    NVIDIA's model for audio restoration and inpainting, best in class for cleaning up degraded speech and removing background noise.
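Several audio entries above emphasize streaming or real-time use. A common way to check whether a pipeline keeps up is the real-time factor (RTF): processing time divided by the duration of the audio produced or consumed, where values below 1.0 mean faster than real time. Streaming pipelines also typically process audio in small chunks so output can begin before the whole clip is handled. The Python sketch below illustrates both ideas with plain lists; the 16 kHz sample rate, 0.5 s chunk size, and timing are hypothetical example values, not measurements of any model listed here.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = compute time / audio duration; below 1.0 is faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def chunk_samples(samples, chunk_size: int):
    """Yield consecutive fixed-size chunks; the last chunk may be shorter."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(samples), chunk_size):
        yield samples[start:start + chunk_size]


if __name__ == "__main__":
    sample_rate = 16_000              # hypothetical 16 kHz mono audio
    clip = [0.0] * (sample_rate * 3)  # 3 s of silence as stand-in input
    chunk = sample_rate // 2          # hypothetical 0.5 s streaming chunks

    chunks = list(chunk_samples(clip, chunk))
    print(f"{len(chunks)} chunks of {chunk} samples")

    # Hypothetical timing: 0.6 s of compute for the 3 s clip.
    rtf = real_time_factor(0.6, len(clip) / sample_rate)
    print(f"RTF = {rtf:.2f} ({'faster' if rtf < 1 else 'slower'} than real time)")
```

Smaller chunks lower the delay before the first audio arrives but add per-chunk overhead, so chunk size is usually a tuning trade-off rather than a fixed value.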

💻

Coding Models

Open-source LLMs specialized in code generation, debugging, and software engineering.

  • 01
    Qwen State-of-the-art

    Leading open-source coding model offering unparalleled performance in code generation, refactoring, and technical reasoning.

  • 02
    DeepSeek Advanced MoE

    A highly efficient Mixture-of-Experts model that excels in complex algorithmic tasks and repository-level code understanding.

  • 03
    StarCoder BigCode

    The latest in the StarCoder family, trained on a massive code corpus spanning many programming languages, with permissive licensing for enterprise use.
