Alibaba Cloud’s “AI + Cloud Strategy” and Initiatives

Alibaba Cloud’s “AI + Cloud Strategy” and Initiatives Speaker: Yuichi
Fujikawa May 26th, 2026

About Me Alibaba Cloud AISolutions Architect Yuichi Fujikawa Hobby: Travel

3 Alibaba Group Founded in 1999. Provides a wide range
of life-integrated services centered on One of the world's leading internet company groups Alibaba Cloud Serves as the information infrastructure for Alibaba Group As a cloud computing business division Established in 2009. Services launched in Japan in 2016 Alibaba Group and Alibaba Cloud

Cross- Border EC Domestic EC Intellige nt Office Solution s
Search Engine Model as a Service (Maas) Lifestyl e Servic es Healthcare Entertainm ent Trav el Map Services AIDC International Sites OKKI Alimama Marketing Platform Text-to-Image Generation Product Management Production Origin Assistant Data Insights Marketing Assistant General Search Healthcare Tongyi AI Model Family Smart Park & Devices Location & Operations Analytics Pre-Sales Consulting Customer Service QA Hyper-Realistic Digital Humans Virtual Production Real-Time Data Insights Precision Marketing AI Health Assistant Structured Medical Records Precision Medicine Medication Guidance Assistant Operation Assistant Taobao & Tmall DingTalk Quark Ele.me Alibaba Health Damai Entertainment Youku Fliggy Amap HR Assistant Alibaba Group 1688 Live Multilingual Translation Business Assistant User Understanding Logistics Assistant Ask Taobao Merchant Assistant Search Optimization Customization Meeting Assistant Q & A Schedule Coordination Document Assistant Office Work Education & Learning Recipe Analysis Food Safety Management High-Precision User Targeting Monitoring Real-Time Effects AI-Powered Short Video Creation AI Portraits Semi-Managed Internal Services Multilingual Translation Traffic Doctor Traffic Health Assessment Spatiotemporal Smart City ALL IN AI Logistics Cainiao WMS Assistant Payment Assistant Customer Service Assistant Pangu Assistant Local Community Assistant Station Manager Assistant

Diverse Generative AI Models • Diverse in-house models offered as
open weights and API versions • Optimal balance of accuracy and cost Models available for use Cost-Optimized and Scalable Infrastructure • Environment optimized for AI model inference • Optimized costs for AI model training and fine-tuning environments • High cost-performance and scalable cloud platform Industry-Specific AI Solutions • AI Agent adoption in enterprise • Compliant with industry-specific regulations and requirements • Future next-gen physical AI initiatives (including Robotics) 5 Alibaba Cloud's Generative AI Strategy

Alibaba's Generative AI Model Family Fun Qwen-Flash Qwen-Max Qwen-Plus LLMs
Multimodal Qwen-Audio Qwen-Image Qwen-Omni Qwen-VL Qwen-TTS Specialized Models Qwen-Coder Qwen-Embedding Qwen-Math Qwen-MT Multimodal Qwen-Audio Video Generation Wan- FLF2V Wan- VACE Wan-T2V Wan-TI2V Scenario Models Sketch-to-Image Inpainting Digital Human Image-to-Video Music Generation ASR Fun-ASR TTS CosyVoice Diverse models including multimodal 6

Qwen: The World's Most Widely Used Open-Weight Model Family No.1
Number of Derived Models 200,000+ World's Largest Open-Weight Model Open-Weight Model Downloads 1 Billion Number of Open- Weight Models 400+ 7 12 1 State-of-the-Art Architecture 397B parameters, 17B activated. 2 Multilingual Multimodal Model Supports 201 languages. Capable of processing 2-hour videos. 3 Agent-Centric Strong agent understanding capabilities with various tool integrations.

High Japanese Accuracy, Fully Viable for Real Business Use ※https://huggingface.co/spaces/llm-jp/open-japanese-llm-leaderboard
8

Qwen3.6-Plus: Towards Real-World Agents Long Context Understanding Next-Gen Autonomous Development
Capabilities Enhanced Cross-Modal Recognition and Reasoning 1M Token Context Agentic Coding Multimodal Intelligence 9

Qwen 3.6 Model Comparison International Region (Singapore) Official Pricing As
of April 2026 MAX Performance-First Qwen3.6-Max (Preview) Input $1.3/1M Output $7.8/1M Context: 262,144 tokens Strengths • #1 in 6 benchmarks worldwide • SWE-Bench Pro 58.4% • Highest coding accuracy • Supports preserve_thinking Best for: When you need ultimate coding quality PLUS Balance-Focused Qwen3.6-Plus Input $0.5/1M Output $3.0/1M Context: 1,000,000 tokens Strengths • Maintains 1M context • Image & video multimodal support • #1 in MCPMark Agent benchmark • Supports 201 languages (including Japanese) Best for: When balancing cost, performance, and multilingual needs FLASH Cost-First Qwen3.6-Flash Input $0.25/1M Output $1.5/1M Context: 1,000,000 tokens Strengths • Maintains 1M context • Supports context caching • Optimal for large batch processing • Generally available (04-16) Best for: Processing large volumes of classification, translation, and extraction requests Source: alibabacloud.com/help/en/model-studio/models (Flagship models / International tab

Use Cases: Cost Optimization for Vibe Coding Traditional Challenges •
Using expensive models for all tasks • High costs even for simple tasks • API costs straining budgets → Solution through Routing • Routine tasks cost-reduced with Qwen • Third-party models used only for complex reasoning • Long context handled with Qwen's 1M tokens Routing to multiple LLMs based on task difficulty Task Type Example Model Used Estimated Cost Routine Coding Qwen3.6-Plus Approx. 1/8 (vs. competitors) Background Processing Qwen2.5-Coder（Ollama） Free Complex Reasoning & Planning Other Models Only when needed Long Context (60K+) Qwen3.6-Plus (1M tokens) Long-Text Specialized Web Search Integration Qwen2.5-VL Lightweight & Fast Significant Cost Reduction

Text Generation Model - Tongyi Xingchen Roleplay-Specialized LLM Japanese-specialized conversational
roleplay LLM - Xingchen. Model ID: qwen-plus-character-ja ▪ Features １）32k context window 2) Strictly adheres to character settings, preventing character breakage and meta-comments 3) Smooth conversational experience with distinct personality and style Example System Prompt: Her name is "Suzuho". Suzuho loves sweets and has a gentle but slightly clumsy personality. When she takes her fox demon form, she awakens the power to control fire and unconsciously projects an intimidating presence. She usually lives in the human world, but sometimes her true identity is about to be revealed. Please answer using Suzuho's identity. Responses must be within 50 characters. 12

WAN) Multimodal Image / Video Generation Model Family 390M Number
of Generated Images by Sep,2025 Total Downloads 30M Number of Generated Videos by Sep,2025 70M Video Generation Image Generation World Model

Video Generation – WAN 2.7 Video Reference: wan2.7-r2v ① 1
Multi-Subject Reference + Audio Customization (Multi-Character Dialogue) 「Img2 holds Img4, sitting on Img5's chair, playing a smoothing folk song with guitar: "The sunshine is so nice today." Img1 holds Img3, passes by Img2, places Img3 on the table: “That sounds lovely, could you sing it again?” 」 Img1 Img2 Img3 Img4 Img5 Output Video ② 2 Multi-Panel Storyboard (Romantic Proposal Scene) 「Reference the image, cinematic romantic style, dusk seaside scene, consistent character appearance, no text. Atmosphere: romantic, tender, surprise, sunset.」 Multi-Panel Input Output Video 14

Video Generation – WAN 2.7 Video Editing: wan2.7-videoedit ① Change
element (replace film reel with plate) Input Video Reference Image Output Video ② Remove element (remove train) Remove the train from the video, keep everything else unchanged Input Video Output Video ③ Change environment (overcast to sunny) Keep character actions unchanged, change the scene from overcast to sunny, keep everything else the same Input Video Output Video ④ Replicate motion (reproduce hand gestures) Make the person mimic the hand gestures from the video, keeping both hands coordinated and transitions visible Input Video Reference Image Output Video ⑤ Change style (convert to felt wool style) Convert the entire scene to felt wool style Input Video Output Video ⑥ Change camera (change to steady forward dolly shot) Change camera to a steady forward dolly, focusing on the flower in the woman's hand for a close -up Input Video Output Video 15

HappyHorse 1.0 World's #1 AI Video Generation Model Instantly generates
cinematic videos from text and images. Next-generation AI revolutionizing advertising, e- commerce, and social media marketing, available on Alibaba Cloud. #1 Global Ranking 1080p HD Output Max 15 sec Max Generation Duration Powered by Alibaba Cloud Model Studio

What is HappyHorse? A next-generation AI video generation model developed
by Alibaba ATH (Alibaba Token Hub) AI Innovation Unit. Global Ranking #1 Achieved top-tier performance across all 4 categories in Artificial Analysis Video Arena (April 2026) Cinematic Quality Delivers shallow depth of field, multi-shot consistency, and high-quality texture representation Instantly Available on Alibaba Cloud Easily integrates into enterprise systems via Model Studio API. Enterprise SLA supported Artificial Analysis Video Arena T2V (No Audio) 1st Place 1389 Elo T2V (With Audio) 1st Place — I2V (No Audio) 1st Place 1416 Elo I2V (With Audio) 1st Place — * As of April 2026

HappyHorse-1.0 HappyHorse 1.0 supports two core capabilities: multimodal video generation
and video editing. Generation includes T2V, I2V, and R2V, enabling video creation from scratch and creative expansion of existing assets. Model Feature Input / Output Price happyhorse-1.0-i2v Generates realistic and smooth video from the first frame image according to text Image + Text -> Video 720P: ¥0.9/secon d 1080P: ¥1.6/secon d Free trial allowance: 10 seconds happyhorse-1.0-t2v Generates physically realistic and smooth video from text prompts Text -> Video happyhorse-1.0-r2v Uses reference images and text prompts to integrate reference subjects into smooth video Reference Images + Text -> Video happyhorse-1.0-video-edit Performs editing such as style transformation and local replacement using text and optional reference images Video + Text (+ Reference Images) -> Video Note: HappyHorse supports Mandarin, Cantonese, English, Japanese, Korean, German, and French by default.

Image Generation Text-to-Image and Image-to-Image Generation Ultra-Fast Generation! Z-image Photo-level
photorealistic quality, detailed image rendering, natural and realistic light and shadow WAN2.7-image Professional infographics, refined photorealism Qwen-image-2.0

Image Generation - WAN 2.7 A Unified New Paradigm for
Image Generation and Editing Model Positioning Resolution Core Features Wan2.7-image Integrated Image Generation & Editing Model 1K/2K Text-to-Image, Text-to-Multi-Image, Image-to-Multi-Image, Instruction Editing, Interactive Editing Wan2.7-image-pro Flagship Version 1K/2K/4K All Features + 4K HD Output 21

Image Generation and Editing Achieve consistent visual storytelling with up to 12 images. Batch generation with text prompts produces a unified image series. 1 Logical Storyboard (Input Image + Prompt -> Sequential Frames) Use Cases: Poster Series / Comics / PPT / Photo Series / E-commerce / Picture Books / Ad Storyboards Input Image Output 1 Output 2 Output 3 Output 4 “A four-step handcraft process of making a mushroom-shaped figure using the materials shown in Image 1.” Up to 9 reference images. Enables composite editing while maintaining high consistency of ID, IP, and object elements. 2 Cyber Girl Group Fusion Poster Use Cases: Conferences / Movie Posters / Family / Team Photos / Marketing / IP Group Photos Input Images Output 22

Image Generation and Editing Simply click where you want to change. Pixel-level precision with bounding box editing. 1 Multi-Image Bounding Box Replacement ”Replace ice cubes from Img1 with fruits from Img2” Input 1 Input 2 Output 3 Image Fission (Model + Outfit -> Multi-Scene Coordination) Input 1 Input 2 Output 1 Output 2 Output 3 "The girl in Img1 wearing the outfit from Img2, generate 3 images in different scenes." 2 OOTD Coordination Showcase (4 Items -> Styled Look) Input Images Output “Full body shot of a young Asian woman(Img5) modeling a streetwear outfit. She is wearing Img1, Img2, Img3, Img4. Next to her are flat lay images of the clothes: the orange jacket(Img1), the yellow top(Img2), the pants(Img3), and sneakers(Img4). Arrows point to the items with text labels like "JACKET", "SPORTY BRA". Clean white background, high quality, fashion photography.” 4 Extract Reference Colors -> Cross-Category Color Transfer Color styles: Cinematic / Cool / Macaron / Vintage Film / Warm / Morandi / Monochrome / Dopamine / Cyber Neon / Dunhuang Sample prompt: Under a massive lush tree canopy filling the frame, rich foliage, cinematic film texture Input (Reference + Palette) Output 23

Voice Models - Speech Recognition (ASR) High-Accuracy Multi-Language Speech Recognition
Model Voice Messages Short Audio Files (Under 3 Minutes) • qwen3-asr-flash Meeting/Call Recording Live Audio / Voice Calls Real-Time Audio Stream Processing: • qwen3-asr-flash-realtime ✓Emotion Recognition ✓Multi-Language Recognition Support Long Audio Files: • fun-asr ✓ Supports Up to 12 Hours ✓ Speaker Diarization ✓ Word-Level Timestamp Output ✓ Multi-Language Recognition Support 24

Voice Models - Text-to-Speech (TTS) Designed to deliver high-quality, natural-sounding
voice synthesis. Clone Voice Default Voice Custom Voice Sample Audio File qwen-voice-enrollment qwen3-tts-vc A composed middle-aged male announcer with a deep, rich and magnetic voice. Voice Prompt qwen-voice-design Inference Stream Output / File Output Inference Stream Output / File Output qwen3-tts-instruct-flash qwen3-tts-flash qwen3-tts-vd Inference Stream Output / File Output General Scenarios - Call Centers - Text-to-Speech Large amount of original voice needed (e.g., RPG apps) Clone a specific voice (e.g., dubbing scenes) 25

26 We are planning a major campaign for Japanese users
soon! Qoder: AI Agent Product Qoder(IDE) Smart IDE with AI Agent. QoderWork (Desktop App) Security-focused, desktop- native AI Agent.

Token Plan Team Edition Provided by Alibaba Cloud Model Studio
Team-Oriented AI Large Model Subscription Supports text generation and image generation models with Credits as a unified billing unit. Integrates with mainstream AI coding and agent tools for centralized management of team-wide usage. "One Subscription, Unified AI Experience Management for the Entire Team" Flexible Model Switching Switch multiple models on demand Multi-Tool Support Supports 10+ AI tools No Data Training Conversation data is not used for training Stable Operation at Peak Times Multi-Tenant Isolation Architecture * Currently available only in the Singapore region

Advanced Global Case Studies 28 World-Class Luxury Vehicle & Motorcycle
Manufacturer AI Agent for Intelligent Vehicles Powered by Qwen LLM Digital Ecosystem Integration Two AI Agents: Car Genius and Travel Companion Multi-Agent Coordination Human-Like Response Capability National Program Launched by NRF (Singapore) Advanced Multilingual AI for Southeast Asia SEA-HELM On the Leaderboard Ranked #1 Ease of Deployment and Cost Efficiency No.1 119 Multilingual Support

Alibaba Cloud’s “AI + Cloud Strategy” and Initi...

Alibaba Cloud’s “AI + Cloud Strategy” and Initiatives

Open Data Circle

More Decks by Open Data Circle

Other Decks in Technology

Featured

Transcript

Alibaba Cloud’s “AI + Cloud Strategy” and Initiatives Speaker: Yuichi

About Me Alibaba Cloud AISolutions Architect Yuichi Fujikawa Hobby: Travel

3 Alibaba Group Founded in 1999. Provides a wide range

Cross- Border EC Domestic EC Intellige nt Office Solution s

Diverse Generative AI Models • Diverse in-house models offered as

Alibaba's Generative AI Model Family Fun Qwen-Flash Qwen-Max Qwen-Plus LLMs

Qwen: The World's Most Widely Used Open-Weight Model Family No.1

High Japanese Accuracy, Fully Viable for Real Business Use ※https://huggingface.co/spaces/llm-jp/open-japanese-llm-leaderboard

Qwen3.6-Plus: Towards Real-World Agents Long Context Understanding Next-Gen Autonomous Development

Qwen 3.6 Model Comparison International Region (Singapore) Official Pricing As

Use Cases: Cost Optimization for Vibe Coding Traditional Challenges •

Text Generation Model - Tongyi Xingchen Roleplay-Specialized LLM Japanese-specialized conversational

WAN) Multimodal Image / Video Generation Model Family 390M Number

Video Generation – WAN 2.7 Video Reference: wan2.7-r2v ① 1

Video Generation – WAN 2.7 Video Editing: wan2.7-videoedit ① Change

HappyHorse 1.0 World's #1 AI Video Generation Model Instantly generates

What is HappyHorse? A next-generation AI video generation model developed

HappyHorse-1.0 HappyHorse 1.0 supports two core capabilities: multimodal video generation

Image Generation Text-to-Image and Image-to-Image Generation Ultra-Fast Generation! Z-image Photo-level

Image Generation - WAN 2.7 A Unified New Paradigm for

Image Generation - WAN 2.7 A Unified New Paradigm for

Image Generation - WAN 2.7 A Unified New Paradigm for

Voice Models - Speech Recognition (ASR) High-Accuracy Multi-Language Speech Recognition

Voice Models - Text-to-Speech (TTS) Designed to deliver high-quality, natural-sounding

26 We are planning a major campaign for Japanese users

Token Plan Team Edition Provided by Alibaba Cloud Model Studio

Advanced Global Case Studies 28 World-Class Luxury Vehicle & Motorcycle

29