Wan 2.1 is Alibaba’s latest open-source AI model for text-to-video generation, offering advanced capabilities in creating high-quality, realistic videos from textual descriptions. It builds on previous versions with enhanced multimodal support, allowing users to generate and edit videos using text and image references. The model ranks highly on VBench, a benchmark for video generative models, demonstrating its ability to handle complex interactions and dynamic scenes.
The key features of Wan 2.1 include:
Multilingual Support: You can generate videos from English and Chinese prompts, with accurate visual text effects in both languages.
Diverse Input Modalities: Wan 2.1 accepts text and images for flexible content creation.
High-Quality Output: Wan 2.1 produces cinematic-grade videos with detailed textures and smooth transitions.
Advanced Editing: Wan 2.1 allows fine-tuning with image and video references for better control.
Open-Source Access: Wan 2.1 is available on GitHub and Hugging Face for developers (see the example after this list).
Benchmark Performance: Wan 2.1 ranks high on VBench for video realism and multi-object interactions.
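If you prefer to work with the model directly rather than through a deployed UI, the open-source release can be pulled from Hugging Face and run with the scripts in the official repository. The commands below are a minimal sketch based on the project's published instructions at the time of writing (the repository URL, model ID and generate.py flags may change), so check the Wan 2.1 README for current usage:

# Clone the official Wan 2.1 repository and install its dependencies
git clone https://github.com/Wan-Video/Wan2.1.git
cd Wan2.1
pip install -r requirements.txt

# Download the T2V-14B weights from Hugging Face (a large download, so ensure sufficient disk space)
pip install "huggingface_hub[cli]"
huggingface-cli download Wan-AI/Wan2.1-T2V-14B --local-dir ./Wan2.1-T2V-14B

# Generate a 720p clip from a text prompt
python generate.py --task t2v-14B --size 1280*720 --ckpt_dir ./Wan2.1-T2V-14B --prompt "A white cat walking through a neon-lit GPU data centre"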
Let's walk through the step-by-step process to generate videos with Wan 2.1 on Hyperstack:
Initiate Deployment
Select Hardware Configuration
Choose the Operating System
Select a Keypair
Network Configuration
Enable SSH Access
Add Firewall Rules
Please note: This will open your port to the public internet, allowing anyone with the public IP address and port number to access the web UI.
Review and Deploy the Script
Refer to the blog's video generation durations section below for more details.
Once the initialisation is complete, you can access your VM:
Connect via SSH
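For example, assuming the example keypair path and the default ubuntu username used later in this guide (substitute your own key path and the VM's public IP):

ssh -i /users/username/downloads/keypair_hyperstack ubuntu@[vm_ip_address]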
Once the above steps are completed, follow the steps below to generate a video with Wan 2.1:
Once the deployment is complete, access the Wan 2.1 Gradio web UI by navigating to [public-ip]:7860
in your web browser to generate videos. For example, open: http://000.00.00.221:7860/
Please note: This link will be accessible to anyone with the link. To restrict access, you can disable the Public IP and use SSH port forwarding instead.
ssh -i [path_to_ssh_key] -L 7860:localhost:7860 [os_username]@[vm_ip_address] # e.g: ssh -i /users/username/downloads/keypair_hyperstack -L 7860:localhost:7860 ubuntu@0.0.0.0
After running the above command, go to localhost:7860 in your browser to access the demo.
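As a quick sanity check (assuming curl is available on your local machine), you can confirm the UI is responding before opening the browser; an HTTP 200 status code means the Gradio app is up:

curl -s -o /dev/null -w "%{http_code}\n" http://localhost:7860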
Example prompt:
"A white cat walking through a neon-lit pink and blue GPU data centre, exploring curiously with a playful expression. The cat moves gracefully, sniffing and pawing at the colorful cables and equipment. Medium shot, dynamic camera following the cat's movement."
Choose English (EN) for the prompt enhancement.
The Qwen/Qwen2.5-7B-Instruct model will be used to improve your prompt.
Click on ‘Generate Video’ to start the process. Check out the video generation durations below:
Video Generation Durations (based on default Gradio settings with prompt extender):
Please note: All of these durations use the default Gradio settings and include the prompt extender, which is why they may differ from the durations reported by Wan 2.1 here.
And voila, your generated video is ready! See our generated video below.
If you run into any issues, check the cloud-init output log on the VM:
cat /var/log/cloud-init-output.log
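If the setup script is still running, you can follow the same log live and watch for any errors as it completes:

tail -f /var/log/cloud-init-output.log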
When you're finished with your current workload, you can hibernate your VM to avoid incurring unnecessary costs:
To continue your work without repeating the setup process, simply resume the VM from hibernation when you are ready.
Wan 2.1 is Alibaba’s open-source AI model for text-to-video generation, capable of creating realistic, high-quality videos from text, images, and video references.
Yes, Wan 2.1 allows fine-tuning with image references for greater control over output quality.
You can hibernate your VM when not in use to stop compute billing while keeping your setup intact.
Wan 2.1 is available on GitHub and Hugging Face for developers to explore and integrate into their workflows.