What is QwQ 32B?
QwQ 32B, a 32.5 billion parameter model from the Qwen series by Alibaba, excels at reasoning tasks like math and coding. With a 131,072-token context window, it scores 65.2% on GPQA and 90.6% on MATH-500, rivalling the latest DeepSeek-R1.
Now, let's walk through the step-by-step process of deploying QwQ 32B on Hyperstack.
Initiate Deployment
Select Hardware Configuration
Choose the Operating System
Select a keypair
Network Configuration
Enable SSH Access
Configure Additional Settings
Please note: this cloud-init script exposes the API for demo purposes only. For production environments, consider using secure connections, secret management and monitoring for your API.
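As one example of hardening, vLLM's OpenAI-compatible server can require a bearer token via its `--api-key` flag. The sketch below is illustrative only; the actual launch line in the cloud-init script may differ:

```shell
# Generate a random key and require it on every request (assumes vLLM's
# OpenAI-compatible server; adapt to the launch line in your cloud-init).
API_KEY="$(openssl rand -hex 32)"
python -m vllm.entrypoints.openai.api_server \
  --model Qwen/QwQ-32B \
  --host 127.0.0.1 --port 8000 \
  --api-key "$API_KEY"
```

Clients then pass the key as a standard `Authorization: Bearer` header.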
Review and Deploy
Please Note: This deploys QwQ 32B with a maximum context size of 12K tokens, optimised to run on a single GPU. For a larger context size, adjust the 'MAX_MODEL_LEN' variable in the cloud-init file and consider deploying across multiple GPUs (e.g., 2x A100).
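For example, raising the context window and sharding the model across two GPUs might look like this. This is a sketch: the exact variable name and launch line in the cloud-init script may differ from what is shown here.

```shell
# Hypothetical excerpt: a larger context window served across two GPUs.
MAX_MODEL_LEN=32768
python -m vllm.entrypoints.openai.api_server \
  --model Qwen/QwQ-32B \
  --max-model-len "$MAX_MODEL_LEN" \
  --tensor-parallel-size 2
```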
After deploying your VM, the cloud-init script will begin its work. This process typically takes about 5-10 minutes. During this time, the script performs its setup tasks, including downloading the model weights and starting the API server.
While waiting, you can prepare your local environment for SSH access and familiarise yourself with the Hyperstack dashboard.
Once the initialisation is complete, you can access your VM:
Locate SSH Details
Connect via SSH
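A typical connection command looks like this (substitute your own keypair path and the public IP shown on the Hyperstack dashboard; the default username depends on the image):

```shell
ssh -i ~/.ssh/my_keypair.pem ubuntu@<PUBLIC_IP>
```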
To access and experiment with Alibaba's latest model, SSH into your machine after completing the setup. If you have trouble connecting with SSH, watch our recent platform tour video (at 4:08) for a demo. Once connected, use this API call on your machine to start using QwQ 32B:
# Test the API (allow roughly 7 minutes for model download and start-up)
MODEL_NAME="Qwen/QwQ-32B"
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "'$MODEL_NAME'",
    "messages": [
      {
        "role": "user",
        "content": "Hi, how to write a Python function that prints \"Hyperstack is the greatest GPU Cloud platform\""
      }
    ]
  }'
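Because the server speaks the OpenAI-compatible chat completions protocol, you can also call it from Python. Below is a minimal sketch using only the standard library; the endpoint and model name match the curl example above, and `build_chat_request`/`query` are helper names introduced here for illustration:

```python
import json
import urllib.request

def build_chat_request(model: str, prompt: str) -> bytes:
    """Serialise the JSON body for the /v1/chat/completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return json.dumps(payload).encode("utf-8")

def query(prompt: str, base_url: str = "http://localhost:8000") -> str:
    """Send one chat request to the local server and return the reply text."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=build_chat_request("Qwen/QwQ-32B", prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.loads(resp.read())
    return body["choices"][0]["message"]["content"]
```

Call `query("your prompt here")` from the VM once the server is up.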
If the API is not working after ~10 minutes, please refer to the 'Troubleshooting QwQ 32B' section below.
If you are having any issues, follow these steps:
SSH into your VM.
Check the cloud-init logs with the following command: cat /var/log/cloud-init-output.log
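Beyond reading the log once, a few extra checks can help narrow the problem down (these commands assume a standard Ubuntu image; adapt as needed):

```shell
# Follow the cloud-init log live while the model downloads:
tail -f /var/log/cloud-init-output.log

# Confirm the GPU is visible to the driver:
nvidia-smi

# Check whether anything is listening on the API port yet:
ss -ltnp | grep 8000
```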
When you're finished with your current workload, you can hibernate your VM to avoid incurring unnecessary costs:
Hyperstack is a cloud platform designed to accelerate AI and machine learning workloads, making it an excellent choice for deploying QwQ 32B.
QwQ 32B, the latest Alibaba model, is designed to tackle advanced reasoning challenges, especially in math and coding domains. It provides step-by-step solutions for complex problems and generates reliable, optimised code.
QwQ 32B features 32.5 billion parameters, making it a medium-sized yet powerful LLM.
Out of these, 31 billion are non-embedding, driving its reasoning and task performance.
It competes effectively with larger models due to its efficient design and training.
QwQ 32B has a maximum context length of 131,072 tokens. This enables it to maintain coherence across lengthy documents or conversations. It’s ideal for tasks requiring deep contextual understanding over extended inputs.
QwQ 32B scores 65.2% on GPQA and 90.6% on MATH-500, rivalling the latest DeepSeek-R1.
We recommend using 1x NVIDIA A100-80G-PCIe or 1x NVIDIA H100-80G-PCIe to deploy the QwQ 32B model on Hyperstack. Sign up here to access the GPUs.
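A rough back-of-the-envelope check explains the 80 GB recommendation: at 16-bit precision each parameter takes 2 bytes, so the weights alone need about 65 GB, leaving roughly 15 GB of an 80 GB card for the KV cache and activations (which is why the single-GPU deployment defaults to a reduced 12K context). A quick sketch of that arithmetic:

```python
params_billions = 32.5    # QwQ 32B parameter count
bytes_per_param = 2       # FP16/BF16 weights

weights_gb = params_billions * bytes_per_param  # ~65 GB for weights alone
headroom_gb = 80 - weights_gb                   # left for KV cache on an 80 GB GPU

print(f"Weights: ~{weights_gb:.0f} GB, headroom on 80 GB card: ~{headroom_gb:.0f} GB")
```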