Alibaba has launched its unified image generation and editing model, Wan2.7-Image. The model aims to address common challenges in AI-generated imagery, such as aesthetic fatigue from "standardized faces" and unpredictable color outputs, by enabling highly personalized and realistic human depictions with precise color control.
Wan2.7-Image offers end-to-end capabilities, including text-to-image generation, image-to-image set creation, instruction-based image editing, and interactive editing. In blind human-preference tests, its text-to-image performance surpassed that of GPT-Image1.5 and leading Chinese models, with text rendering, photorealistic imaging, and world-knowledge metrics approaching the level of Nano Banana Pro.
To move away from repetitive "AI faces," Wan2.7-Image enhances its virtual avatar customization feature, giving users fine-grained control over facial structure, from bone structure and eye shape to finer facial details. Users can modify face shapes (such as oval, round, square, or oblong) and eye characteristics (such as almond-shaped, deep-set, round, or upturned eyes) to achieve unique, individualized appearances.
Artists and designers often require precise color control, particularly for commercial posters where color schemes are strictly defined. To address the unpredictability of color in AI image generation, Wan2.7-Image introduces a new "color palette" function. Users can extract colors from a reference or enter them directly as hex codes, enabling generation in specific color schemes, from Matisse's vibrant reds and Van Gogh's bright yellows to Picasso's cool blues. Users can also adjust the proportion and quantity of colors to create custom palettes.
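Alibaba has not published the model's API, but a palette of the kind described, hex color codes plus adjustable proportions, could be represented along these lines (a minimal sketch; all names and the sample palette are hypothetical, not Wan2.7-Image's actual interface):

```python
from dataclasses import dataclass

@dataclass
class PaletteColor:
    hex_code: str   # "#RRGGBB" color reference
    weight: float   # relative proportion of this color in the image

def hex_to_rgb(hex_code: str) -> tuple[int, int, int]:
    """Convert a '#RRGGBB' hex string to an (R, G, B) tuple of 0-255 ints."""
    h = hex_code.lstrip("#")
    return tuple(int(h[i:i + 2], 16) for i in range(0, 6, 2))

def normalize_palette(colors: list[PaletteColor]) -> list[tuple[tuple[int, int, int], float]]:
    """Return (rgb, proportion) pairs with proportions summing to 1."""
    total = sum(c.weight for c in colors)
    return [(hex_to_rgb(c.hex_code), c.weight / total) for c in colors]

# Hypothetical Van Gogh-style palette: dominant yellows with a blue accent
palette = [
    PaletteColor("#F4C430", 3.0),
    PaletteColor("#FFD700", 2.0),
    PaletteColor("#1E3A8A", 1.0),
]
print(normalize_palette(palette))
```

Such a structure would let a user both tune the quantity of colors (the length of the list) and their proportions (the weights) before handing the palette to the generator.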
Another common issue in AI image generation is the poor rendering of long passages of text, which often come out blurry, disorganized, or incomplete. Wan2.7-Image tackles this with a long-context text encoder that processes ultra-long sequences and delivers print-quality rendering of lengthy texts, tables, and complex formulas. The model supports 12 languages and can handle inputs of up to 3K tokens, roughly a full A4 page of academic text.