Alibaba Unveils Wan2.7-Image Model: Enables Ultra-Long Text Rendering and Creates Lifelike Human Figures

Deep News | 04-01 16:31

Alibaba has launched Wan2.7-Image, a unified model for image generation and editing. The model targets two common weaknesses of AI-generated imagery: the aesthetic fatigue caused by "standardized faces" and unpredictable color output. It aims to solve them with highly personalized, realistic human figures and precise color control.

Wan2.7-Image covers the full workflow, including text-to-image generation, image-to-image generation of image sets, instruction-based image editing, and interactive editing. In human-preference blind tests, its text-to-image performance surpassed that of GPT-Image1.5 and leading domestic models, while its scores for text rendering, photorealism, and world knowledge approached those of Nano Banana Pro.

To move away from repetitive "AI faces," Wan2.7-Image upgrades its virtual avatar customization feature, giving users fine-grained control over the face, from bone structure and eye shape down to individual features. Users can choose face shapes such as oval, round, square, or oblong, and eye types such as almond-shaped, deep-set, round, or upturned, to create distinctive, individual appearances.

Artists and designers often need precise color control, particularly for commercial posters with strictly defined color schemes. To address unpredictable color output in AI image generation, Wan2.7-Image introduces a new "color palette" function. Users can extract color references or enter them as hex codes, then generate images in a specific color scheme, from Matisse's vibrant reds and Van Gogh's bright yellows to Picasso's cool blues. Users can also adjust the number of colors and their proportions to build custom palettes.

Another common weakness of AI image generation is poor rendering of long text, which often comes out blurry, jumbled, or incomplete. Wan2.7-Image tackles this with a long-context text encoder that processes ultra-long input sequences and renders lengthy text, tables, and complex formulas at print quality. The model supports 12 languages and accepts inputs of up to 3K tokens, roughly a full A4 page of academic text.

Disclaimer: Investing carries risk. This is not financial advice. The above content should not be regarded as an offer, recommendation, or solicitation to acquire or dispose of any financial products, and any associated discussions, comments, or posts by the author or other users should not be regarded as such either. It is solely for general information purposes and does not consider your own investment objectives, financial situation, or needs. TTM assumes no responsibility or warranty for the accuracy and completeness of the information. Investors should do their own research and may seek professional advice before investing.
