What is QwQ 32B?
QwQ 32B, a 32.5 billion parameter model from the Qwen series by Alibaba, excels at reasoning tasks like math and coding. With a 131,072-token context window, it scores 65.2% on GPQA and 90.6% on MATH-500, rivalling the latest DeepSeek-R1.
Now, let's walk through the step-by-step process of deploying QwQ 32B on Hyperstack.
Initiate Deployment
Select Hardware Configuration
Choose the Operating System
Select a keypair
Network Configuration
Enable SSH Access
Configure Additional Settings
Please note: this cloud-init script exposes the API for demo purposes only. For production environments, consider using secure connections, secret management and monitoring for your API.
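As one example of hardening, vLLM's OpenAI-compatible server can require a bearer token via its `--api-key` flag. The sketch below is illustrative only; the actual launch line in the cloud-init script may differ:

```shell
# Generate a random key and require it on every request (assumes vLLM's
# OpenAI-compatible server; adapt to the launch line in your cloud-init).
API_KEY="$(openssl rand -hex 32)"
python -m vllm.entrypoints.openai.api_server \
  --model Qwen/QwQ-32B \
  --host 127.0.0.1 --port 8000 \
  --api-key "$API_KEY"
```

Clients then pass the key as a standard `Authorization: Bearer` header.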
Review and Deploy
Please Note: This deploys QwQ 32B with a maximum context size of 12K tokens, optimised to run on a single GPU. For a larger context size, adjust the 'MAX_MODEL_LEN' variable in the cloud-init file and consider deploying across multiple GPUs (e.g., 2x A100).
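For example, raising the context window and sharding the model across two GPUs might look like this. This is a sketch: the exact variable name and launch line in the cloud-init script may differ from what is shown here.

```shell
# Hypothetical excerpt: a larger context window served across two GPUs.
MAX_MODEL_LEN=32768
python -m vllm.entrypoints.openai.api_server \
  --model Qwen/QwQ-32B \
  --max-model-len "$MAX_MODEL_LEN" \
  --tensor-parallel-size 2
```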
After deploying your VM, the cloud-init script will begin its work. This process typically takes about 5-10 minutes. During this time, the script performs its setup tasks, including downloading the model weights and starting the API server.
While waiting, you can prepare your local environment for SSH access and familiarise yourself with the Hyperstack dashboard.
Once the initialisation is complete, you can access your VM:
Locate SSH Details
Connect via SSH
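A typical connection command looks like this (substitute your own keypair path and the public IP shown on the Hyperstack dashboard; the default username depends on the image):

```shell
ssh -i ~/.ssh/my_keypair.pem ubuntu@<PUBLIC_IP>
```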
To access and experiment with Alibaba's latest model, SSH into your machine after completing the setup. If you have trouble connecting with SSH, watch our recent platform tour video (at 4:08) for a demo. Once connected, use this API call on your machine to start using QwQ 32B:
# Test the API (allow roughly 7 minutes for model download and start-up)
MODEL_NAME="Qwen/QwQ-32B"
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "'$MODEL_NAME'",
    "messages": [
      {
        "role": "user",
        "content": "Hi, how to write a Python function that prints \"Hyperstack is the greatest GPU Cloud platform\""
      }
    ]
  }'
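Because the server speaks the OpenAI-compatible chat completions protocol, you can also call it from Python. Below is a minimal sketch using only the standard library; the endpoint and model name match the curl example above, and `build_chat_request`/`query` are helper names introduced here for illustration:

```python
import json
import urllib.request

def build_chat_request(model: str, prompt: str) -> bytes:
    """Serialise the JSON body for the /v1/chat/completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return json.dumps(payload).encode("utf-8")

def query(prompt: str, base_url: str = "http://localhost:8000") -> str:
    """Send one chat request to the local server and return the reply text."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=build_chat_request("Qwen/QwQ-32B", prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.loads(resp.read())
    return body["choices"][0]["message"]["content"]
```

Call `query("your prompt here")` from the VM once the server is up.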
If the API is not working after ~10 minutes, please refer to the 'Troubleshooting QwQ 32B' section below.
If you are having any issues, follow these steps:
SSH into your VM.
Check the cloud-init logs with the following command: cat /var/log/cloud-init-output.log
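Beyond reading the log once, a few extra checks can help narrow the problem down (these commands assume a standard Ubuntu image; adapt as needed):

```shell
# Follow the cloud-init log live while the model downloads:
tail -f /var/log/cloud-init-output.log

# Confirm the GPU is visible to the driver:
nvidia-smi

# Check whether anything is listening on the API port yet:
ss -ltnp | grep 8000
```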
When you're finished with your current workload, you can hibernate your VM to avoid incurring unnecessary costs:
Hyperstack is a cloud platform designed to accelerate AI and machine learning workloads, making it an excellent choice for deploying QwQ 32B.
QwQ 32B, the latest Alibaba model, is designed to tackle advanced reasoning challenges, especially in math and coding domains. It provides step-by-step solutions for complex problems and generates reliable, optimised code.
QwQ 32B features 32.5 billion parameters, making it a medium-sized yet powerful LLM.
Out of these, 31 billion are non-embedding, driving its reasoning and task performance.
It competes effectively with larger models due to its efficient design and training.
QwQ 32B has a maximum context length of 131,072 tokens. This enables it to maintain coherence across lengthy documents or conversations. It’s ideal for tasks requiring deep contextual understanding over extended inputs.
QwQ 32B scores 65.2% on GPQA and 90.6% on MATH-500, rivalling the latest DeepSeek-R1.
We recommend using 1x NVIDIA A100-80G-PCIe or 1x NVIDIA H100-80G-PCIe to deploy the QwQ 32B model on Hyperstack. Sign up here to access the GPUs.
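A rough back-of-the-envelope check explains the 80 GB recommendation: at 16-bit precision each parameter takes 2 bytes, so the weights alone need about 65 GB, leaving roughly 15 GB of an 80 GB card for the KV cache and activations (which is why the single-GPU deployment defaults to a reduced 12K context). A quick sketch of that arithmetic:

```python
params_billions = 32.5    # QwQ 32B parameter count
bytes_per_param = 2       # FP16/BF16 weights

weights_gb = params_billions * bytes_per_param  # ~65 GB for weights alone
headroom_gb = 80 - weights_gb                   # left for KV cache on an 80 GB GPU

print(f"Weights: ~{weights_gb:.0f} GB, headroom on 80 GB card: ~{headroom_gb:.0f} GB")
```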