Deploying and Using Qwen 2.5 Coder 32B Instruct on Hyperstack: A Quick Start Guide

Written by Sebastian Panman de Wit | Nov 13, 2024 9:10:02 AM

The latest Qwen 2.5 Coder series is a family of models for code generation, code repair and code reasoning, available in sizes ranging from 0.5B to 32B parameters. The 32B model achieves state-of-the-art performance across multiple benchmarks, matching and even surpassing some open-source models in tasks like code generation (EvalPlus, LiveCodeBench), multi-language repair (MdEval) and user preference alignment (Code Arena). This makes it well suited to complex coding tasks across more than 40 programming languages.

Read on to learn how to deploy Qwen 2.5 Coder on Hyperstack. We will also show you how to integrate this LLM as your private coding assistant!

Why Deploy on Hyperstack?

Hyperstack is a cloud platform designed to accelerate AI and machine learning workloads. Here's why it's an excellent choice for deploying Qwen 2.5 Coder 32B Instruct:

  • Availability: Hyperstack provides access to the latest and most powerful GPUs such as the NVIDIA A100 and the NVIDIA H100 SXM on-demand, specifically designed to handle large language models. 
  • Ease of Deployment: With pre-configured environments and one-click deployments, setting up complex AI models becomes significantly simpler on our platform. 
  • Scalability: You can easily scale your resources up or down based on your computational needs.
  • Cost-Effectiveness: You pay only for the resources you use with our cost-effective cloud GPU pricing.
  • Integration Capabilities: Hyperstack provides easy integration with popular AI frameworks and tools.

Deployment Process

Now, let's walk through the step-by-step process of deploying Qwen 2.5 Coder 32B Instruct on Hyperstack.

Step 1: Accessing Hyperstack

  1. Go to the Hyperstack website and log in to your account.
  2. If you're new to Hyperstack, you'll need to create an account and set up your billing information. Check our documentation to get started with Hyperstack.
  3. Once logged in, you'll be greeted by the Hyperstack dashboard, which provides an overview of your resources and deployments.

Step 2: Deploying a New Virtual Machine

Initiate Deployment

  1. Look for the "Deploy New Virtual Machine" button on the dashboard.
  2. Click it to start the deployment process.

Select Hardware Configuration

  1. In the hardware options, choose the "2xA100-80G-PCIe" flavour.

Choose the Operating System

  1. Select the "Server 22.04 LTS R535 CUDA 12.2 with Docker" image.
  2. This image comes with Ubuntu 22.04 LTS, NVIDIA drivers (R535), CUDA 12.2 and Docker pre-installed, providing an optimised environment for AI workloads.

Select a keypair

  1. Select one of the keypairs in your account. Don't have a keypair yet? See our Getting Started tutorial for creating one.

Network Configuration

  1. Ensure you assign a Public IP to your virtual machine.
  2. This allows you to access your VM from the internet, which is crucial for remote management and API access.

Enable SSH Access

  1. Make sure to enable an SSH connection.
  2. You'll need this to securely connect and manage your VM.

Configure Additional Settings

  1. Look for an "Additional Settings" or "Advanced Options" section.
  2. Here, you'll find a field for cloud-init scripts. This is where you'll paste the initialisation script. Click here to get the cloud-init script! 
  3. Ensure the script is in bash syntax. This script will automate the setup of your Qwen 2.5 Coder 32B Instruct environment.

DISCLAIMER: This tutorial deploys Qwen 2.5 Coder once, for demonstration purposes. For production environments, consider using production-grade deployments with API keys, secret management, monitoring etc.

Review and Deploy

  1. Double-check all your settings.
  2. Click the "Deploy" button to launch your virtual machine.

Step 3: Initialisation and Setup

After deploying your VM, the cloud-init script will begin its work. This process typically takes about 7 minutes. During this time, the script performs several crucial tasks:

  1. Dependencies Installation: Installs all necessary libraries and tools required to run Qwen 2.5 Coder 32B Instruct.
  2. Model Download: Fetches the Qwen 2.5 Coder 32B Instruct model files from the specified repository.
  3. API Setup: Configures the vLLM engine and sets up an OpenAI-compatible API endpoint on port 8000.
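
For orientation, the three tasks above could be reproduced by hand roughly as follows. This is only a sketch and not the cloud-init script linked in this tutorial: the `vllm/vllm-openai` Docker image, the cache directory, the function name `start_vllm_server` and the `--tensor-parallel-size 2` setting (one shard per GPU of the 2xA100 flavour) are all assumptions made for illustration.

```shell
#!/bin/bash
# Hypothetical sketch; the cloud-init script linked in the tutorial is authoritative.
MODEL_NAME="Qwen/Qwen2.5-Coder-32B-Instruct"
# Assumed location for cached model weights on the VM.
CACHE_DIR="${CACHE_DIR:-$HOME/.cache/huggingface}"

start_vllm_server() {
    # Keep downloaded weights on the host so a container restart does not re-download them.
    mkdir -p "$CACHE_DIR"
    # Pulling the image covers the dependencies; vLLM fetches the model on first start
    # and then serves an OpenAI-compatible API on port 8000.
    docker run -d --gpus all --ipc=host \
        -v "$CACHE_DIR:/root/.cache/huggingface" \
        -p 8000:8000 \
        vllm/vllm-openai:latest \
        --model "$MODEL_NAME" \
        --tensor-parallel-size 2
}

# A real setup script would call the function on its last line:
# start_vllm_server
```

The function is left uncalled here so that the snippet can be read or sourced without starting a container.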

While waiting, you can prepare your local environment for SSH access and familiarise yourself with the Hyperstack dashboard.

Step 4: Accessing Your VM

Once the initialisation is complete, you can access your VM:

Locate SSH Details

  1. In the Hyperstack dashboard, find your VM's details.
  2. Look for the public IP address, which you will need to connect to your VM with SSH.

Connect via SSH

  1. Open a terminal on your local machine.
  2. Use the command ssh -i [path_to_ssh_key] [os_username]@[vm_ip_address] (e.g. ssh -i /users/username/downloads/keypair_hyperstack ubuntu@0.0.0.0)
  3. Replace [os_username] and [vm_ip_address] with the details provided by Hyperstack, and [path_to_ssh_key] with the location of your private key file.
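
The connection steps above can be wrapped in a small helper. This is a minimal sketch: the function name `connect_vm` is hypothetical, the default username `ubuntu` is taken from the example above, and the `chmod 600` step is an assumption that addresses a common cause of failed logins, since OpenSSH rejects private keys that other users can read.

```shell
#!/bin/bash
# Hypothetical helper: connect_vm <path_to_ssh_key> <vm_ip_address> [os_username]
connect_vm() {
    local key_path="$1"
    local vm_ip="$2"
    local os_user="${3:-ubuntu}"
    # Restrict the key to the current user; ssh refuses keys with open permissions.
    chmod 600 "$key_path" || return 1
    ssh -i "$key_path" "$os_user@$vm_ip"
}

# Example (placeholders as in the step above):
# connect_vm [path_to_ssh_key] [vm_ip_address]
```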

Interacting with Qwen 2.5 Coder 32B Instruct

To access and experiment with the model, SSH into your machine after completing the setup. If you are having trouble connecting with SSH, watch our recent platform tour video (at 4:08) for a demo. Once connected, run this API call on your machine to start using Qwen 2.5 Coder 32B Instruct.

# Model identifier served by the vLLM endpoint
MODEL_NAME="Qwen/Qwen2.5-Coder-32B-Instruct"
curl -X POST http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "'"$MODEL_NAME"'",
        "messages": [
            {
                "role": "user",
                "content": "Hi, how to write a Python function that prints \"Hyperstack is the greatest GPU Cloud platform\""
            }
        ]
    }'
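
The call above returns the full JSON response. To use the model as a quick command-line assistant, it may be convenient to print only the reply text. The following is a minimal sketch that assumes `python3` is available on the VM and that the response follows the usual OpenAI chat completion layout (`choices[0].message.content`); the function names `build_payload`, `extract_reply` and `ask_qwen` are hypothetical.

```shell
#!/bin/bash
MODEL_NAME="Qwen/Qwen2.5-Coder-32B-Instruct"
API_URL="http://localhost:8000/v1/chat/completions"

# Build the JSON request body; python3 takes care of escaping quotes in the prompt.
build_payload() {
    python3 -c 'import json, sys
print(json.dumps({"model": sys.argv[1], "messages": [{"role": "user", "content": sys.argv[2]}]}))' "$MODEL_NAME" "$1"
}

# Read a chat completion response on stdin and print only the assistant's reply.
extract_reply() {
    python3 -c 'import json, sys
print(json.load(sys.stdin)["choices"][0]["message"]["content"])'
}

# Send one prompt to the endpoint and print the reply text.
ask_qwen() {
    build_payload "$1" \
        | curl -s -X POST "$API_URL" -H "Content-Type: application/json" -d @- \
        | extract_reply
}

# Example:
# ask_qwen "Write a Python function that reverses a string"
```

Building the payload with a JSON library rather than by string concatenation avoids broken requests when a prompt contains quotes or newlines.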

If the API is not working after ~10 minutes, please refer to our 'Troubleshooting Qwen 2.5 Coder 32B Instruct' section below.
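
Rather than retrying the call by hand, you could poll the endpoint until it responds. This is a sketch under the assumption that the server exposes the model listing route `/v1/models` on port 8000, as OpenAI-compatible vLLM servers usually do; the function name `wait_for_api` and the default of 60 attempts spaced 10 seconds apart (about 10 minutes in total) are illustrative choices.

```shell
#!/bin/bash
# Hypothetical helper: wait_for_api [url] [attempts] [delay_seconds]
wait_for_api() {
    local url="${1:-http://localhost:8000/v1/models}"
    local attempts="${2:-60}"
    local delay="${3:-10}"
    local i=1
    while [ "$i" -le "$attempts" ]; do
        # -f makes curl return a non-zero status on HTTP errors, -s keeps it quiet.
        if curl -sf "$url" > /dev/null; then
            echo "API is ready after $i attempt(s)"
            return 0
        fi
        sleep "$delay"
        i=$((i + 1))
    done
    echo "API did not respond after $attempts attempts" >&2
    return 1
}

# Example:
# wait_for_api && echo "You can now send requests"
```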

Troubleshooting Qwen 2.5 Coder 32B Instruct