Updated: 7 Mar 2025
In our latest tutorial, we take you through the step-by-step process of deploying QwQ 32B on Hyperstack. QwQ 32B, Alibaba’s powerful 32.5 billion-parameter model, excels in math, coding, and logical problem-solving with an impressive 131,072-token context length. Whether you're tackling complex calculations, generating optimised code, or building AI-powered applications, this guide will help you get started seamlessly.
With Hyperstack’s high-performance GPUs and streamlined deployment process, you can launch QwQ 32B easily. Let’s get started!
What is QwQ 32B?
QwQ 32B, a 32.5 billion parameter model from the Qwen series by Alibaba, excels in reasoning tasks like math and coding. With a 131,072-token context length, it scores 65.2% on GPQA and 90.6% on MATH-500, rivalling the latest DeepSeek-R1.
- Transformer Architecture: Incorporates RoPE, SwiGLU, RMSNorm, and attention QKV bias for efficient processing.
- Large Context Length: Supports up to 131,072 tokens for handling extended text coherently.
- Reasoning Focused: Optimised for math, coding, and logical tasks through reinforcement learning.
- Parameter Efficiency: 32.5 billion parameters (31 billion non-embedding) with 64 layers and grouped-query attention.
- Open Access: Available on Hugging Face, ModelScope and Qwen Chat with an Apache 2.0 license.
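If you want to pull the open weights yourself for local experimentation, rather than rely on the deployment script below, a minimal sketch with the Hugging Face CLI looks like this. It assumes the huggingface_hub package is installed and is not required for the Hyperstack deployment:
# Optional sketch: download the open QwQ-32B weights from Hugging Face (assumes huggingface_hub is installed)
pip install -U "huggingface_hub[cli]"
huggingface-cli download Qwen/QwQ-32B --local-dir ./qwq-32b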
Steps to Deploy QwQ 32B
Now, let's walk through the step-by-step process of deploying QwQ 32B on Hyperstack.
Step 1: Accessing Hyperstack
- Go to the Hyperstack website and log in to your account.
- If you're new to Hyperstack, you'll need to create an account and set up your billing information. Check our documentation to get started with Hyperstack.
- Once logged in, you'll be greeted by the Hyperstack dashboard, which provides an overview of your resources and deployments.
Step 2: Deploying a New Virtual Machine
Initiate Deployment
- Look for the "Deploy New Virtual Machine" button on the dashboard.
- Click it to start the deployment process.
Select Hardware Configuration
- For QwQ 32B GPU requirements, go to the hardware options and choose "1xA100-80G-PCIe" or "1xH100-80G-PCIe".
Choose the Operating System
- Select the "Ubuntu Server 22.04 LTS R535 CUDA 12.4 with Docker".
Select a keypair
- Select one of the keypairs in your account. Don't have a keypair yet? See our Getting Started tutorial for creating one.
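If you still need to create a keypair locally before adding it to your account, a minimal sketch with OpenSSH follows; the file path and comment are illustrative:
# Sketch: generate an SSH keypair locally; the .pub file is what you import into Hyperstack
ssh-keygen -t ed25519 -f ~/.ssh/keypair_hyperstack -C "hyperstack-qwq-32b"
# Keep the private key (~/.ssh/keypair_hyperstack) safe; never upload it anywhere.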
Network Configuration
- Ensure you assign a Public IP to your virtual machine.
- This allows you to access your VM from the internet, which is crucial for remote management and API access.
Enable SSH Access
- Make sure to enable an SSH connection.
- You'll need this to securely connect and manage your VM.
Configure Additional Settings
- Look for an "Additional Settings" or "Advanced Options" section.
- Here, you'll find a field for cloud-init scripts. This is where you'll paste the initialisation script. Click here to get the cloud-init script!
Please note: this cloud-init script only enables the API for demo purposes. For production environments, consider using secure connections, secret management and monitoring for your API.
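For context, the sketch below shows the kind of command such a script typically ends up running, assuming vLLM's OpenAI-compatible Docker image serving on port 8000. It is illustrative only, not the exact Hyperstack cloud-init script:
# Illustrative sketch only, not the official Hyperstack cloud-init script
# Serve QwQ 32B behind an OpenAI-compatible API on port 8000 (assumes the vllm/vllm-openai image)
MODEL_NAME="Qwen/QwQ-32B"
MAX_MODEL_LEN=12288   # roughly a 12K-token context to fit a single 80GB GPU
docker run --gpus all -p 8000:8000 \
  vllm/vllm-openai:latest \
  --model "$MODEL_NAME" \
  --max-model-len "$MAX_MODEL_LEN"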
Review and Deploy
- Double-check all your settings.
- Click the "Deploy" button to launch your virtual machine.
Please Note: This deploys QwQ 32B with a maximum context size of 12K tokens, optimised to run on a single GPU. For a larger context size, adjust the 'MAX_MODEL_LEN' variable in the cloud-init file and consider deploying it across multiple GPUs (e.g. 2x A100).
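As a hedged sketch of what that adjustment could look like with a vLLM-style launch command (the flag names assume vLLM; your cloud-init file may differ):
# Illustrative sketch: larger context window sharded across two GPUs (vLLM-style flags)
docker run --gpus all -p 8000:8000 \
  vllm/vllm-openai:latest \
  --model "Qwen/QwQ-32B" \
  --max-model-len 32768 \
  --tensor-parallel-size 2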
Step 3: Initialisation and Setup
After deploying your VM, the cloud-init script will begin its work. This process typically takes about 5-10 minutes. During this time, the script performs several crucial tasks:
- Dependencies Installation: Installs all necessary libraries and tools required to run QwQ 32B.
- Model Download: Fetches the QwQ 32B model files from the specified repository.
While waiting, you can prepare your local environment for SSH access and familiarise yourself with the Hyperstack dashboard.
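Once you can connect in the next step, you can also follow the initialisation progress directly on the VM; for example:
# Follow the cloud-init output as it installs dependencies and downloads the model
tail -f /var/log/cloud-init-output.log
# Check whether cloud-init has finished
cloud-init status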
Step 4: Accessing Your VM
Once the initialisation is complete, you can access your VM:
Locate SSH Details
- In the Hyperstack dashboard, find your VM's details.
- Look for the public IP address, which you will need to connect to your VM with SSH.
Connect via SSH
- Open a terminal on your local machine.
- Use the command ssh -i [path_to_ssh_key] [os_username]@[vm_ip_address] (e.g. ssh -i /users/username/downloads/keypair_hyperstack ubuntu@0.0.0.0)
- Replace the key path, username and IP address with the details provided by Hyperstack.
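Once connected, it is worth confirming that the GPU and the Docker runtime bundled with the image are available; for example:
# Confirm the GPU is visible to the VM
nvidia-smi
# List running containers (if the cloud-init script runs the model server in Docker, it will appear here)
docker ps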
Interacting with QwQ 32B
To access and experiment with Alibaba's latest model, SSH into your machine after completing the setup. If you have trouble connecting with SSH, watch our recent platform tour video (at 4:08) for a demo. Once connected, use this API call on your machine to start using QwQ 32B:
# Test the API (wait ~7 minutes for model download and start-up)
MODEL_NAME="Qwen/QwQ-32B"
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "'$MODEL_NAME'",
    "messages": [
      {
        "role": "user",
        "content": "Hi, how to write a Python function that prints \"Hyperstack is the greatest GPU Cloud platform\""
      }
    ]
  }'
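If the request hangs or returns an error, you can first check whether the server is up and the model has finished loading by listing the models the endpoint exposes (a standard route on OpenAI-compatible servers):
# List the models served by the endpoint; Qwen/QwQ-32B should appear once start-up is complete
curl http://localhost:8000/v1/models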
If the API is not working after ~10 minutes, please refer to our 'Troubleshooting QwQ 32B' section below.
Troubleshooting QwQ 32B
If you are having any issues, please follow these steps:
- SSH into your VM.
- Check the cloud-init logs with the following command: cat /var/log/cloud-init-output.log
- Use the logs to debug any issues.
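If the log looks clean but the API still does not respond, a few generic checks are worth trying; the Docker commands assume the model server runs in a container, which may differ from your setup:
# Search the cloud-init output for obvious errors
grep -iE "error|fail" /var/log/cloud-init-output.log
# If the server runs in Docker, check that its container is up and inspect its logs
docker ps
docker logs <container_id>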
Step 5: Hibernating Your VM
When you're finished with your current workload, you can hibernate your VM to avoid incurring unnecessary costs:
- In the Hyperstack dashboard, locate your virtual machine.
- Look for a "Hibernate" option.
- Click to hibernate the VM, which will stop billing for compute resources while preserving your setup.
Why Deploy QwQ 32B on Hyperstack?
Hyperstack is a cloud platform designed to accelerate AI and machine learning workloads. Here's why it's an excellent choice for deploying QwQ 32B:
- Availability: Hyperstack provides access to the latest and most powerful GPUs such as the NVIDIA H100 on-demand, specifically designed to handle large language models.
- Ease of Deployment: With pre-configured environments and one-click deployments, setting up complex AI models becomes significantly simpler on our platform.
- Scalability: You can easily scale your resources up or down based on your computational needs.
- Cost-Effectiveness: You pay only for the resources you use with our cost-effective cloud GPU pricing.
- Integration Capabilities: Hyperstack provides easy integration with popular AI frameworks and tools.
FAQs
What is QwQ 32B?
QwQ 32B, Alibaba's latest reasoning model, is designed to tackle advanced reasoning challenges, especially in math and coding. It provides step-by-step solutions for complex problems and generates reliable, optimised code.
How many parameters does QwQ 32B have?
QwQ 32B features 32.5 billion parameters, making it a medium-sized yet powerful LLM. Of these, 31 billion are non-embedding parameters, driving its reasoning and task performance. It competes effectively with larger models due to its efficient design and training.
What is the maximum context length of QwQ 32B?
QwQ 32B has a maximum context length of 131,072 tokens. This enables it to maintain coherence across lengthy documents or conversations. It’s ideal for tasks requiring deep contextual understanding over extended inputs.
What is the performance of QwQ 32B?
QwQ 32B scores 65.2% on GPQA and 90.6% on MATH-500, rivalling the latest DeepSeek-R1.
What are the GPU requirements for QwQ 32B?
We recommend using 1xNVIDIA A100-80G-PCIe or 1xNVIDIA H100-80G-PCIe to deploy the QwQ 32B model on Hyperstack. Sign up here to access the GPUs.