Image Model
FLUX
FLUX.1 dev
FLUX.1 dev is Black Forest Labs' 12 billion parameter open-weight text-to-image model, released in 2024 for non-commercial applications. Trained using guidance distillation from FLUX.1 Pro, it delivers output quality and prompt following competitive with closed-source alternatives. It supports resolutions between 0.1 and 2.0 megapixels and typically generates in 25-50 steps. The weights are available on HuggingFace under a non-commercial license, making FLUX.1 dev a practical base for research, development, and advanced creative workflows.
Official Site: https://bfl.ai/
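As a rough illustration of the 0.1 to 2.0 megapixel range and the 25-50 step range quoted for FLUX.1 dev, the sketch below checks a requested image size and step count before a generation call. The helper name `check_flux1_dev_request` and the convention that one megapixel equals 1,000,000 pixels are assumptions made for illustration. They are not part of any Black Forest Labs API, and the ranges are treated as soft guidelines rather than hard limits.

```python
# Ranges quoted in the FLUX.1 dev description; treated here as soft guidelines.
MIN_MEGAPIXELS = 0.1
MAX_MEGAPIXELS = 2.0
MIN_STEPS = 25
MAX_STEPS = 50


def megapixels(width: int, height: int) -> float:
    """Return the pixel count in megapixels (assumes 1 MP = 1,000,000 pixels)."""
    return width * height / 1_000_000


def check_flux1_dev_request(width: int, height: int, steps: int) -> list:
    """Hypothetical helper: list any settings outside the quoted ranges."""
    warnings = []
    mp = megapixels(width, height)
    if not MIN_MEGAPIXELS <= mp <= MAX_MEGAPIXELS:
        warnings.append(
            f"{width}x{height} is {mp:.2f} MP, outside "
            f"{MIN_MEGAPIXELS}-{MAX_MEGAPIXELS} MP"
        )
    if not MIN_STEPS <= steps <= MAX_STEPS:
        warnings.append(f"{steps} steps is outside {MIN_STEPS}-{MAX_STEPS}")
    return warnings


if __name__ == "__main__":
    print(check_flux1_dev_request(1024, 1024, 28))  # inside both ranges
    print(check_flux1_dev_request(2048, 2048, 10))  # too large, too few steps
```

A 1024x1024 request is about 1.05 MP and falls inside the range, while 2048x2048 is about 4.19 MP and would be flagged.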
FLUX Pro
FLUX 1.1 Pro is Black Forest Labs' flagship proprietary text-to-image model, offering strong prompt following, visual quality, and output diversity. Released in October 2024 as the successor to FLUX.1 Pro, it generates images 6x faster than its predecessor while supporting native 2K resolution. With exceptional photorealism, intricate detail rendering, and the highest Elo score on the Artificial Analysis image arena, it is well suited to commercial applications from product photography to brand campaigns. It is available exclusively through the BFL API and partners (Replicate, fal.ai, Together.ai), and offers an Ultra mode (4MP resolution) and a Raw mode (hyper-realistic candid photography style) for production-grade workflows.
Official Site: https://bfl.ai/models/flux-pro
FLUX.1 Krea
FLUX.1 Krea [dev] is Black Forest Labs' "opinionated" 12 billion parameter text-to-image model developed in collaboration with Krea AI, released in July 2025. Specifically trained to overcome the oversaturated "AI look," it achieves exceptional photorealism with distinctive aesthetics and diverse outputs. Outperforming previous open models and matching FLUX 1.1 Pro in human preference assessments, it excels at generating realistic images without oversmoothed textures. Released under FLUX.1 dev non-commercial license with open weights on HuggingFace, FLUX.1 Krea serves as a flexible base model for downstream fine-tuning and demonstrates successful collaborative development between foundation model labs and application teams.
Official Site: https://bfl.ai/blog/flux-1-krea-dev
FLUX.2 dev
FLUX.2 dev is Black Forest Labs' 32 billion parameter open-weight model released in November 2025, combining image generation and editing in a single architecture. Built on a latent flow matching architecture paired with the Mistral-3 24B vision-language model, it delivers frontier performance with multi-reference support (up to 10 images), 4MP resolution output, enhanced typography, and superior prompt adherence. With significant improvements in photorealism, world knowledge, and spatial logic, it enables character-consistent campaigns and complex text rendering. It is released under the FLUX.2 dev non-commercial license with weights on HuggingFace. The full model requires substantial VRAM, but quantized versions for consumer hardware are available through Hugging Face Diffusers.
Official Site: https://bfl.ai/blog/flux-2
FLUX.2 Pro
FLUX.2 Pro is Black Forest Labs' production-grade proprietary model released in November 2025, delivering state-of-the-art quality at maximum speed. Built on 32B parameter latent flow matching architecture, it offers exceptional photorealism, multi-reference support (up to 10 images), 4MP resolution output, and reliable typography generation. With enhanced world knowledge, accurate object positioning, and coherent lighting across complex scenes, it excels at character-consistent campaigns, product placement, and brand-accurate rendering. Available through BFL API and partners (Replicate, fal.ai, Cloudflare), FLUX.2 Pro provides optimal balance of quality and affordability for commercial production workflows requiring zero compromise between speed and visual fidelity.
Official Site: https://bfl.ai/models/flux-2
FLUX.2 Flex
FLUX.2 Flex is Black Forest Labs' specialized proprietary model released in November 2025, offering maximum precision and fine-grained control over generation parameters. With adjustable step counts (6-50 steps) and guidance scale, it provides developers complete control to balance typography accuracy, image detail, quality, and latency for specific use cases. Excelling at complex text rendering, UI mockups, infographics, and maintaining small details, FLUX.2 Flex trades off between speed and precision based on parameter settings. Built on the same 32B parameter architecture as FLUX.2 Pro, it delivers production-ready results with unprecedented flexibility for applications demanding exact control over visual output quality and generation characteristics.
Official Site: https://bfl.ai/models/flux-2
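To make the speed versus precision trade-off described for FLUX.2 Flex concrete, here is a minimal sketch that maps a single "precision" knob onto the 6-50 step range quoted above. The names `FlexSettings` and `steps_for_precision` are hypothetical, the linear mapping is an arbitrary choice for illustration, and the guidance value is passed through unchecked because the description gives no range for it. None of this reflects the actual BFL API.

```python
from dataclasses import dataclass
from typing import Optional

# Step range quoted in the FLUX.2 Flex description.
FLEX_MIN_STEPS = 6
FLEX_MAX_STEPS = 50


@dataclass(frozen=True)
class FlexSettings:
    """Hypothetical container for FLUX.2 Flex generation parameters."""

    steps: int
    guidance: Optional[float] = None  # no range is given in the source text

    def __post_init__(self) -> None:
        # Reject step counts outside the quoted 6-50 range.
        if not FLEX_MIN_STEPS <= self.steps <= FLEX_MAX_STEPS:
            raise ValueError(
                f"steps must be between {FLEX_MIN_STEPS} and {FLEX_MAX_STEPS}"
            )


def steps_for_precision(precision: float) -> int:
    """Map precision in [0, 1] linearly onto the step range (0 = fastest)."""
    precision = min(1.0, max(0.0, precision))  # clamp out-of-range input
    span = FLEX_MAX_STEPS - FLEX_MIN_STEPS
    return round(FLEX_MIN_STEPS + precision * span)


if __name__ == "__main__":
    draft = FlexSettings(steps=steps_for_precision(0.0))
    final = FlexSettings(steps=steps_for_precision(1.0))
    print(draft, final)
```

A low precision value favors latency for quick drafts, while a high value spends more steps on small details such as typography.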
Seedream
Seedream 4.5
Seedream 4.5 is ByteDance's latest AI image generation model achieving comprehensive improvements through model scaling. Released in 2025, it excels at multi-image editing with accurate subject identification, strict reference detail preservation, and enhanced typography rendering for professional visual creatives. Supporting up to 4K resolution output with 14 reference image fusion, it delivers cinematic realism, character consistency, and improved spatial logic. With designer-level composition capabilities and clear small text rendering, it ranks #10 on LM Arena leaderboard. Available through BytePlus API and partners, Seedream 4.5 offers production-ready results for e-commerce, marketing, and brand visuals.
Official Site: https://seed.bytedance.com/en/seedream4_5
Seedream 4.0
Seedream 4.0 is ByteDance's new-generation image creation model integrating generation and editing in unified architecture, released in 2025. With 12 billion parameters, it flexibly handles complex multimodal tasks including knowledge-based generation, complex reasoning, and reference consistency. Supporting batch processing with multiple reference images and up to 4K resolution output, it delivers faster inference speed than predecessors. Excelling at prompt adherence, alignment, and aesthetics on benchmarks like Artificial Analysis, Seedream 4.0 enables high-quality image creation and precise editing with single-sentence commands. Ideal for professional applications in film, advertising, and design workflows.
Official Site: https://seed.bytedance.com/en/seedream4_0
Seedream 3.0
Seedream 3.0 is ByteDance's earlier-generation AI image model providing foundational capabilities in text-to-image generation and basic editing functions. While less advanced than Seedream 4.x series, it established core technologies including multi-modal understanding, prompt following, and artistic style rendering. Serving as the predecessor to Seedream 4.0, it demonstrated ByteDance's capabilities in AI visual generation and helped develop the architectural foundation for subsequent model improvements. Though now superseded by newer versions, Seedream 3.0 contributed to the evolution of ByteDance's image generation technology and commercial applications.
Official Site: https://seed.bytedance.com/
Dreamina
Dreamina 3.1
Dreamina 3.1 is ByteDance's advanced 4MP text-to-image model released in 2025, focused on enhanced visual presentation with significant improvements in aesthetic quality, precise style diversity, and image detail richness. Developed by CapCut's creative team, it excels at professional cinematic quality with nuanced lighting, atmospheric depth, and diverse photographic styles including underwater, portrait, and pet photography. Supporting commercial applications with optimized handling of graphic design and poster scenarios, it maintains strong text rendering capabilities and offers accurate style control across artistic movements like Fauvism and Baroque. Available through CapCut's Dreamina platform with 225 free daily tokens, it enables high-resolution image generation for social media, marketing, and creative projects.
Official Site: https://dreamina.capcut.com/
Grok
Grok 2
Grok 2 is xAI's flagship AI model released in August 2024, featuring advanced reasoning, coding capabilities, and image generation through Aurora, an autoregressive mixture-of-experts network. Trained on billions of examples from the internet, Aurora excels at photorealistic rendering, precise text instruction following, and multimodal input support for editing user-provided images. Available on the X platform (formerly Twitter) for Premium and Premium+ users, Grok 2 demonstrates state-of-the-art performance in entity generation, artistic text, meme creation, realistic portraits, and celebrity rendering. With improved accuracy, instruction following, and multilingual capabilities over Grok-1.5, it offers web search integration, citations, real-time information access, and API access for developers through xAI's enterprise platform.
Official Site: https://x.ai/news/grok-image-generation-release
Reve
Reve Create (Reve Image 1.0)
Reve Image 1.0 (code-named "Halfmoon") is an advanced text-to-image model from a Palo Alto-based startup, released in March 2025 and excelling at prompt adherence, aesthetics, and typography. Built on a hybrid diffusion architecture with a context-aware prompt interpreter and a proprietary typography engine trained on 50 million font samples, it delivers superior text rendering and multi-character consistency. It supports native 2048x2048 resolution with optional 4K upscaling and natural language image editing, and Reve offers unlimited free generations without registration. With 92% detail accuracy in upscaling tests and 89% consistency in multi-character prompts, it combines a drag-and-drop editing interface with AI-powered creation tools for professional-grade results.
Official Site: https://app.reve.com/
Ideogram
Ideogram 3.0 Turbo
Ideogram 3.0 Turbo is the fastest and most cost-efficient variant of Ideogram's March 2025 flagship model, optimized for rapid iterations and high-volume tasks. Part of the three-tier Ideogram 3.0 family, Turbo delivers stunning realism, creative designs, and consistent styles with significant advancements in image-prompt alignment, photorealism, and text rendering quality. Supporting Style Reference feature with up to 3 reference images, Random style exploration from 4.3 billion presets, and various aspect ratios, it excels at professional-quality logos, promotional posters, product photography, and graphic design. Consistently outperforming other text-to-image models in human evaluations with highest ELO ratings, Turbo enables professional creators to ideate quickly and customize graphics at scale.
Official Site: https://ideogram.ai/features/3.0
Ideogram 3.0 Quality
Ideogram 3.0 Quality (also called Ideogram V3 Quality) is the highest-fidelity variant of Ideogram's March 2025 flagship model, delivering maximum precision and detail for professional creative work. Building on the same revolutionary foundation as the Turbo variant, Quality mode provides superior photorealism with enhanced clarity for intricate elements like fabrics, water droplets, and animal fur, while excelling in both photorealistic and abstract styles. Supporting up to 2K resolution with flexible aspect ratios and game-changing typography capabilities for complex text compositions, it enables professional design work including greeting cards, posters, comics, landing page concepts, and marketing materials. The model's sophisticated spatial compositions feature nuanced lighting, precise colors, and lifelike environmental detail that blur the line between generated and real imagery.
Official Site: https://ideogram.ai/features/3.0
Imagen
Imagen 4
Imagen 4 is Google DeepMind's most advanced text-to-image model released at Google I/O 2025 (May 20), featuring significantly improved text rendering, up to 2K resolution, and enhanced prompt adherence over Imagen 3. Built on latent diffusion architecture with Gemini-generated synthetic captions, it offers three variants: standard Imagen 4 for general use, Imagen 4 Fast with 10x faster generation for rapid ideation, and Imagen 4 Ultra for highest precision and detail. Excelling in diverse art styles from photorealism to impressionism, it delivers superior lighting accuracy, fine detail rendering, clean typography, and spatial logic for complex multi-character scenes. Available through Gemini API, Google AI Studio, and Vertex AI, all outputs include imperceptible SynthID watermarking for AI-generated content identification.
Official Site: https://deepmind.google/models/imagen/
Imagen 4 Ultra
Imagen 4 Ultra is Google DeepMind's highest-precision variant of the Imagen 4 family released in May 2025, designed for maximum detail and strict adherence to complex text prompts. Built for professional creative workflows demanding highest fidelity, Ultra delivers superior results in photorealistic rendering, nuanced lighting, fine detail accuracy, and sophisticated text rendering for greeting cards, posters, and comics. Supporting up to 2K resolution with enhanced prompt alignment capabilities, it excels at complex multi-character compositions, intricate spatial logic, and maintaining clean, artifact-free outputs. Available through Gemini API, Google AI Studio, and Vertex AI, all outputs include imperceptible SynthID watermarking for responsible AI transparency and traceability.
Official Site: https://deepmind.google/models/imagen/
Luma
Luma Photon
Luma Photon is Luma Labs' revolutionary text-to-image model released in December 2024, featuring breakthrough Universal Transformer architecture delivering ultra-high-quality 1080p/2MP images with exceptional efficiency. Designed to eliminate the generic "AI look" through specially crafted aesthetics, Photon excels at photorealistic rendering, advanced natural language understanding with large context windows, and multi-turn iterative editing workflows. Supporting character consistency from single reference images, multi-image reference system for style transfer, and outperforming competitors in blind evaluations for creativity and prompt fidelity, it enables designers, filmmakers, and architects to explore vast idea spaces. Available through Luma API and Dream Machine service alongside faster Photon Flash variant.
Official Site: https://lumalabs.ai/photon
Recraft
Recraft 3.0
Recraft V3 (code-named "Red Panda"), released in 2024, reached the #1 rank among text-to-image models on Hugging Face's industry-leading benchmark with an ELO rating of 1172, outperforming Midjourney, OpenAI, and other major competitors. Aimed at professional designers, it supports both raster and vector (SVG) image generation, with exceptional text rendering for text of any size and length. It offers precise style control for brand consistency without retraining, advanced inpainting and outpainting, drag-and-drop text positioning, and superior anatomical accuracy, producing photorealistic images of professional-grade quality. It is available through a desktop app, mobile apps (iOS/Android), and an API for integration into design workflows.
Official Site: https://www.recraft.ai/
Qwen
Qwen Image
Qwen Image (Qwen-Image-2512) is Alibaba Cloud's fully open-source 20B parameter MMDiT image generation model, released in December 2025, achieving first-place rankings across 9 public benchmarks including GenEval, DPG, and OneIG-Bench. Licensed under Apache 2.0 for free commercial use, it excels at commercial-grade Chinese and English text rendering with support for complex multi-line layouts, paragraph-level semantics, and fine-grained visual details. It can be deployed on a single RTX 3090 GPU through DFloat11 quantization and CPU offloading, and it delivers superior performance in precise image editing while maintaining semantic integrity and visual realism. It is available through Qwen Chat, Hugging Face, ModelScope, and Alibaba Cloud Model Studio for text-heavy structured visual generation including infographics, posters, and multilingual enterprise documentation.
Official Site: https://qwen.ai/
SeedEdit
SeedEdit 3.0
SeedEdit 3.0 is ByteDance's state-of-the-art generative image editing model released in June 2025, achieving a 56.1% usability rate and significantly outperforming SeedEdit 1.6 (38.4%), GPT-4o (37.1%), and Gemini 2.0 (30.3%) in real and synthetic image editing tasks. Built with an enhanced meta-information embedding pipeline and joint diffusion-reward learning, it excels at context-aware editing with superior instruction following and image content preservation, particularly for identity and intellectual property retention. Processing high-definition images above 1K resolution, it delivers professional-grade edits in 10-15 seconds, with strong capabilities in background swaps, object removals, lighting shifts, text edits, and character consistency. It supports prompts in both Chinese and English, achieves a 4.07/5 image consistency score, and is optimized for photographers, product teams, and creative professionals requiring precise control.
Official Site: https://seed.bytedance.com/
Nano Banana
Nano Banana
Nano Banana (Gemini 2.5 Flash Image) is Google DeepMind's fast, conversational image generation and editing model released in August 2025, ranking as the top-rated image editing model globally with unmatched character consistency across multiple prompts. Designed for rapid, multi-turn creative workflows, it excels at maintaining perfect character appearance, enabling targeted transformations through natural language commands including background blurring, object removal, pose alterations, and photo colorization. Supporting seamless multi-image composition and visual template adherence for consistent brand assets, it leverages Gemini's deep semantic understanding for complex visual reasoning beyond simple photorealism. Available in Gemini app with visible and invisible SynthID watermarks, it enables casual creators to transform ideas into professional visuals through simple text prompts.
Official Site: https://gemini.google/overview/image-generation/
Nano Banana Pro
Nano Banana Pro (Gemini 3 Pro Image) is Google DeepMind's state-of-the-art professional image generation and editing model released in November 2025, built on Gemini 3 Pro with enhanced reasoning and real-world knowledge for studio-quality results. Designed for enterprise-grade production workflows, it excels at advanced text rendering with legible, multi-language text generation, supporting up to 14 input reference images for complex compositions and advanced creative controls with 1K/2K/4K resolution outputs. Featuring "thinking mode" for complex prompt reasoning, Google Search grounding for factual accuracy, and superior character consistency for up to 5 people, it delivers professional visuals for mockups, posters, infographics, and marketing assets. Available on Vertex AI, Google Workspace (Slides, Vids), Gemini Enterprise, and integrated into Adobe Firefly, Photoshop, Canva, and Figma.
Official Site: https://blog.google/technology/ai/nano-banana-pro/
GPT Image
GPT Image 1.5
GPT Image 1.5 is OpenAI's latest production-grade image generation and editing model released in December 2025, featuring native multimodal architecture that processes text and images in unified neural network for superior editing precision. Built with internal codename "Hazel," it delivers up to 4x faster generation than DALL-E 3 with enhanced instruction following, robust facial and identity preservation across multi-turn edits, and reliable text rendering with crisp lettering and consistent layouts. Supporting both text-to-image generation and targeted image editing workflows, it excels at complex structured visuals including infographics, UI mockups, comic strips, and marketing materials while maintaining composition, lighting, and character consistency. Available through ChatGPT, OpenAI API, and Microsoft Foundry with flexible quality-latency tradeoffs and built-in world knowledge for contextually accurate content.
Official Site: https://openai.com/index/new-chatgpt-images-is-here/
Reve
Reve Edit
Reve Edit is Reve AI's specialized image editing model, ranked in the top 3 on LMArena for image editing tasks. It features spatial intelligence that understands depth, perspective, and three-dimensional object relationships for seamless edits. Designed for professional workflows requiring composition preservation, it combines natural language editing with a drag-and-drop interface for targeted transformations that leave unedited regions untouched. It excels at product photography variations, photo restoration, landscape editing with realistic weather and lighting adjustments, and creative iteration from a single source image, maintaining proper texture, material rendering, and visual coherence throughout. Built by a 10-person research team, it offers strong prompt adherence and aesthetic quality, and supports multi-image composition and style references for consistent brand asset creation.
Official Site: https://app.reve.com/
Flux Kontext
Flux Kontext Max
FLUX.1 Kontext [max] is Black Forest Labs' premium in-context image generation and editing model released in May 2025, delivering maximum performance with exceptional prompt adherence and advanced typography handling. Part of the revolutionary Kontext suite using generative flow matching architecture, it unifies text-to-image generation with instant text-based editing for character consistency, local editing, and style reference capabilities. Supporting both text and image inputs with 3-5 second inference speeds at 1MP resolution, it enables iterative creative workflows through multi-turn refinements while preserving unique visual elements across scenes and environments. Achieving top rankings in text editing and character preservation on KontextBench benchmarks while operating 8x faster than competing models like GPT-Image, it's accessible through BFL Playground and API partners for professional creative production.
Official Site: https://bfl.ai/models/flux-kontext
Flux Kontext Pro
FLUX.1 Kontext [pro] is Black Forest Labs' flagship iterative editing model released in May 2025, built for fast, multi-turn workflows combining generation and refinement. Designed for professional creative production, it handles text and reference images as inputs, enabling targeted local edits and complex scene transformations while maintaining character and stylistic consistency across iterations. Achieving top performance in text editing and character preservation benchmarks with 3-5 second inference speeds, it operates 8x faster than competing models. Supporting character consistency, local editing, and style reference capabilities in unified architecture, it's accessible through partners including KreaAI, OpenArt, Replicate, and BFL Playground for production workflows.
Official Site: https://bfl.ai/models/flux-kontext
Flux Kontext Dev
FLUX.1 Kontext [dev] is Black Forest Labs' open-weight variant released in May 2025 under a non-commercial license for research and safety testing. Built on the same generative flow matching architecture as the Pro and Max variants, it gives developers a customizable foundation for experimentation and integration into node-based pipelines like ComfyUI. It supports in-context image generation with both text and visual inputs for character consistency and editing, and is available through Hugging Face, GitHub, and infrastructure partners including Replicate, FAL, and TogetherAI. Commercial use is available through separate licensing, with usage tracking integrated for compliance.
Official Site: https://bfl.ai/models/flux-kontext
Qwen-Image-Edit
Qwen Image Edit
Qwen-Image-Edit (Qwen-Image-Edit-2511) is Alibaba's open-source 20B parameter image editing model released in November 2025, built on Qwen-Image foundation with dual-pipeline architecture. Combining Qwen2.5-VL for semantic control and VAE for visual appearance, it supports precise bilingual (Chinese/English) text editing and dual semantic/appearance editing modes. Excelling at multi-person consistency in group photos, identity-preserving portrait edits, style transfer, object rotation, and text modification while maintaining font and layout, it's available under Apache 2.0 license. Accessible through Qwen Chat, Hugging Face, ModelScope, and Alibaba Cloud Model Studio for professional design and creative workflows.
Official Site: https://qwen.ai/
Upscaler
Crisp Upscaler
Recraft's Crisp Upscale is a fast, precision-focused AI upscaler designed for professional print and web use. It increases image resolution up to 4096x4096 pixels while maintaining sharpness and clarity without modifying original content. Built for designers, marketers, and sellers needing quick turnaround, it processes images in seconds with minimal computational cost. Ideal for preparing illustrations, logos, product photography, and digital assets for high-quality output. Available free through Recraft's web platform and as API integration via partners including Replicate and Kie.ai for automated workflows.
Official Site: https://www.recraft.ai/image-upscaler
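As a worked example of the "up to 4096x4096 pixels" figure, the sketch below computes the largest output size an image could reach under that cap. It assumes the limit applies to each side and that the aspect ratio is preserved. Both are assumptions for illustration, and `fit_within_limit` is a hypothetical helper, not a Recraft API call.

```python
def fit_within_limit(width: int, height: int, limit: int = 4096) -> tuple:
    """Scale (width, height) up so the longest side equals `limit`.

    Integer arithmetic keeps the aspect ratio without float rounding drift.
    Images whose longest side already meets or exceeds the limit are
    returned unchanged, since an upscaler would not shrink them.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    longest = max(width, height)
    if longest >= limit:
        return width, height
    return width * limit // longest, height * limit // longest


if __name__ == "__main__":
    for size in [(1024, 768), (1920, 1080), (5000, 3000)]:
        print(size, "->", fit_within_limit(*size))
```

Under these assumptions a 1024x768 source can grow by a factor of 4 to 4096x3072, while a 1920x1080 source tops out at 4096x2304.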
Creative Upscaler
Recraft's Creative Upscale is an advanced AI upscaler that enhances resolution while refining complex details, textures, and facial features. Unlike standard upscaling, which preserves the original pixels, it adds depth by improving fine details and reconstructing detail that was lost in the source. Processing images from 256px to 16MP, it excels at portrait enhancement, product photography refinement, and artistic image improvement. Results take longer than with Crisp Upscale but deliver higher quality for professional creative work. It is available through the Recraft platform and API partners including Replicate and fal.ai for integration into production pipelines.
Official Site: https://www.recraft.ai/image-upscaler
Topaz Image Upscaler
Topaz Gigapixel AI is the industry-leading professional image upscaler from Topaz Labs, commercially available since 2019. Using deep learning models trained by PhD researchers, it upscales images by up to 600% (6x), with nine specialized AI models for different image types. Supporting portraits, landscapes, architecture, and compressed images, it preserves detail while reducing noise and artifacts. It is available as a standalone desktop application (Windows/Mac), an iOS mobile app, and a plugin for major editing software including Photoshop and Lightroom. It is widely adopted by professional photographers, fine artists, commercial studios, and creative teams worldwide for printing, restoration, and cropping workflows.
Official Site: https://www.topazlabs.com/topaz-gigapixel