The latest Qwen 2.5 Coder series is a groundbreaking family of models for code generation, repair and reasoning, available in sizes ranging from 0.5B to a massive 32B parameters. The 32B model achieves state-of-the-art performance across multiple benchmarks, matching and even surpassing leading open-source models in tasks like code generation (EvalPlus, LiveCodeBench), multi-language repair (MdEval) and user preference alignment (Code Arena). With support for over 40 programming languages, it is ideal for complex coding tasks.
Read on to learn how you can deploy Qwen 2.5 Coder on Hyperstack, and how to integrate this LLM as your private coding assistant.
Hyperstack is a cloud platform designed to accelerate AI and machine learning workloads, which makes it an excellent choice for deploying Qwen 2.5 Coder 32B Instruct.
Now, let's walk through the step-by-step process of deploying Qwen 2.5 Coder 32B Instruct on Hyperstack.
Initiate Deployment
Select Hardware Configuration
Choose the Operating System
Select a keypair
Network Configuration
Enable SSH Access
Configure Additional Settings
DISCLAIMER: This tutorial deploys Qwen 2.5 Coder once for demo purposes. For production environments, consider a production-grade deployment with API keys, secret management, monitoring and so on. A minimal example of serving the model behind an API key is sketched below.
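As one illustration, vLLM's OpenAI-compatible server can require an API key at startup. This is a minimal sketch, assuming vLLM is installed directly on the VM; the cloud-init deployment in this tutorial may start the server differently (e.g. inside Docker), so adapt accordingly:

# Hypothetical hardening sketch: require an API key at startup.
# In practice, generate the key and keep it in a secret manager,
# rather than hard-coding it in a script.
export VLLM_API_KEY="replace-with-a-generated-secret"
vllm serve Qwen/Qwen2.5-Coder-32B-Instruct \
  --port 8000 \
  --api-key "$VLLM_API_KEY"

Clients then authenticate by sending the key as a Bearer token in the Authorization header.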
Review and Deploy
After deploying your VM, the cloud-init script will begin its work. This process typically takes about 7 minutes. During this time, the script performs several crucial tasks, such as pulling the container image, downloading the model weights and starting the OpenAI-compatible API server.
While waiting, you can prepare your local environment for SSH access and familiarise yourself with the Hyperstack dashboard.
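Once you are able to SSH in (covered in the next section), you can also check from inside the VM whether the API has finished booting, rather than guessing at timings. Here is a minimal polling sketch, assuming the server listens on port 8000 as in the API call further below:

# Poll the OpenAI-compatible endpoint until the model server responds
until curl -sf http://localhost:8000/v1/models > /dev/null; do
  echo "API not ready yet, retrying in 30 seconds..."
  sleep 30
done
echo "Qwen 2.5 Coder API is ready."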
Once the initialisation is complete, you can access your VM:
Locate SSH Details
Connect via SSH
To access and experiment with the model, SSH into your machine after completing the setup. If you are having trouble connecting with SSH, watch our recent platform tour video (at 4:08) for a demo.
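A typical connection command looks like the sketch below; the username and key path are assumptions (Hyperstack's Ubuntu images commonly use the ubuntu user), so substitute your own keypair and the public IP shown in the dashboard:

# Replace the key path, username and IP with your own details
ssh -i ~/.ssh/your_keypair ubuntu@[public-ip]

Once connected, use this API call on your machine to start using Qwen 2.5 Coder 32B Instruct: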
MODEL_NAME="Qwen/Qwen2.5-Coder-32B-Instruct"
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "'$MODEL_NAME'",
        "messages": [
          {
            "role": "user",
            "content": "Hi, how do I write a Python function that prints \"Hyperstack is the greatest GPU Cloud platform\"?"
          }
        ]
      }'
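The response comes back as an OpenAI-style JSON object, with the generated text under choices[0].message.content. If you have jq installed (an assumption; it is not part of the setup above), you can print just the assistant's reply:

# Same request, piped through jq to extract only the generated text
# (reuses the MODEL_NAME variable defined above)
curl -s -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "'$MODEL_NAME'",
        "messages": [{"role": "user", "content": "Write a hello world function in Python"}]
      }' | jq -r '.choices[0].message.content'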
If the API is not working after ~10 minutes, please refer to our 'Troubleshooting Qwen 2.5 Coder 32B Instruct' section below.
If you are having any issues, you might need to restart your machine before calling the API:
1. Run sudo reboot inside your VM
2. Wait 5-10 minutes for the VM to reboot
3. SSH into your VM
4. Wait ~3 minutes for the LLM API to boot up
5. Run the above API call again
If you are still having issues, try the following (a combined example is shown after this list):
1. Run docker ps and find the container_id of your API container
2. Run docker logs [container_id] to see the logs of your container
3. Use the logs to debug any issues
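For example, the two commands can be combined like this; the --format and --tail options are optional conveniences:

# List running containers with their IDs, images and status
docker ps --format "table {{.ID}}\t{{.Image}}\t{{.Status}}"
# Follow the last 100 log lines of the API container (replace [container_id])
docker logs --tail 100 -f [container_id]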
Using a self-hosted LLM as a coding assistant can ensure full data privacy and control, keeping sensitive code on your infrastructure without third-party exposure. If you'd like to integrate this self-hosted LLM with VSCode for code completions and code assistant chats, follow the instructions below.
1. Open port 8000 on your machine: Follow the instructions [here] to open port 8000. Be aware that this exposes port 8000 to the public internet, allowing anyone to reach the API via your VM's public IP address and port number. If you prefer to limit access, you can configure your VM to restrict which IP addresses are permitted on port 8000. You can verify that the port is reachable with the check below.
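A quick way to confirm external access is to list the served models from your local machine; /v1/models is a standard endpoint of the OpenAI-compatible vLLM server deployed above:

# Run from your local machine; replace [public-ip] with your VM's public IP
curl http://[public-ip]:8000/v1/models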
2. Launch VSCode: Open your Visual Studio Code editor to proceed with the integration.
3. Install the 'Continue' extension: Go to the Extensions tab in VSCode and search for the 'Continue' extension. Install it to proceed with the setup.
4. Add the Chat model in the 'Continue' extension: On the left sidebar, click on the 'Continue' extension. At the top-left of the window, select 'Add Chat model' to start the configuration.
5. Modify the config file: In the configuration dialog, click the 'config file' button at the bottom, which says, 'This will update your config file.' This will allow you to edit the configuration file.
6. Update the config.json file: In the config.json file, input your model information, replacing [public-ip] with the public IP address of your Hyperstack VM. Don't forget to save the file by pressing CMD + S (CTRL + S on Windows/Linux).
"models": [
{
"provider": "openai",
"title": "vLLM hosted on Hyperstack",
"apiBase": "http://[public-ip]:8000/v1",
"model": "Qwen/Qwen2.5-Coder-32B-Instruct"
}
],
"tabAutocompleteModel": {
"provider": "openai",
"title": "vLLM hosted on Hyperstack",
"apiBase": "http://[public-ip]:8000/v1",
"model": "Qwen/Qwen2.5-Coder-32B-Instruct",
"apiKey": "None",
"completionOptions": {
"stop": ["<|endoftext|>", "\n"]
}
},
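Both entries sit inside the top-level object of config.json, alongside any keys already there. The "apiKey": "None" value is a placeholder: by default, the vLLM server deployed above does not require authentication, but if you later start it with an API key (as suggested in the production disclaimer), put that key here instead.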
7. Interact with the self-hosted LLM: On the left sidebar, you'll now be able to chat with your self-hosted LLM. Refer to the attached image for a sample chat interface.
8. Accept code suggestions: When you receive code suggestions in the chat, click the '>' icon at the top of the suggestion. This will apply the changes to your file, highlighting them in green. An example of this is shown in the attached image.
9. Accept code insertions: If you want to accept a code insertion, simply click the 'Accept' label above the inserted code.
10. Enable auto-completion in VSCode settings: To use the auto-complete functionality, add the necessary configuration lines to your VSCode settings file. Open the command palette by pressing CTRL + SHIFT + P, select 'Preferences: Open User Settings (JSON)' to open settings.json, then add the following lines:
"github.copilot.editor.enableAutoCompletions": false,
"editor.inlineSuggest.enabled": true,
"continue.enableTabAutocomplete": true
With these steps, you're all set to enjoy a fully integrated private coding assistant within VSCode, running on your own infrastructure on Hyperstack. This setup gives you full control over your data while providing powerful AI-driven code suggestions and completions. We wish you all the best as you boost your development workflow with Qwen 2.5 Coder on Hyperstack.
When you're finished with your current workload, you can hibernate your VM to avoid incurring unnecessary costs. When you want to continue your work, simply resume the hibernated VM; there is no need to repeat the setup process.
What is Qwen 2.5 Coder 32B Instruct?
It's a large language model optimised for complex code generation, repair, and multi-language support.

How do I deploy Qwen 2.5 Coder 32B Instruct?
Simply follow the deployment steps in our Hyperstack guide above for a quick setup.

Which GPUs should I use to run this model?
NVIDIA A100 and H100 GPUs are ideal for handling the demands of this model.

Can I use Qwen 2.5 Coder 32B Instruct as a coding assistant in VSCode?
Yes, you can set up the model for code assistance in VSCode using the Hyperstack integration steps above.

Is this setup suitable for production?
Yes, but we recommend adding production-grade features like API keys and monitoring for optimal performance.