The latest Qwen 2.5 Coder series is a groundbreaking family of models for code generation, repair and reasoning, available in sizes ranging from 0.5B to a massive 32B parameters. The 32B model achieves state-of-the-art performance across multiple benchmarks, matching and even surpassing leading open-source models in tasks like code generation (EvalPlus, LiveCodeBench), multi-language repair (MdEval) and user preference alignment (Code Arena). With support for over 40 programming languages, it is ideal for complex coding tasks.
Read on to learn how you can deploy Qwen 2.5 Coder on Hyperstack, and how to integrate this LLM as your private coding assistant.
Hyperstack is a cloud platform designed to accelerate AI and machine learning workloads, which makes it an excellent choice for deploying Qwen 2.5 Coder 32B Instruct.
Now, let's walk through the step-by-step process of deploying Qwen 2.5 Coder 32B Instruct on Hyperstack.
Initiate Deployment
Select Hardware Configuration
Choose the Operating System
Select a keypair
Network Configuration
Enable SSH Access
Configure Additional Settings
DISCLAIMER: This tutorial deploys Qwen 2.5 Coder once for demo purposes. For production environments, consider a production-grade deployment with API keys, secret management, monitoring and so on. A minimal example of serving the model behind an API key is sketched below.
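As one illustration, vLLM's OpenAI-compatible server can require an API key at startup. This is a minimal sketch, assuming vLLM is installed directly on the VM; the cloud-init deployment in this tutorial may start the server differently (e.g. inside Docker), so adapt accordingly:

# Hypothetical hardening sketch: require an API key at startup.
# In practice, generate the key and keep it in a secret manager,
# rather than hard-coding it in a script.
export VLLM_API_KEY="replace-with-a-generated-secret"
vllm serve Qwen/Qwen2.5-Coder-32B-Instruct \
  --port 8000 \
  --api-key "$VLLM_API_KEY"

Clients then authenticate by sending the key as a Bearer token in the Authorization header.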
Review and Deploy
After deploying your VM, the cloud-init script will begin its work. This process typically takes about 7 minutes. During this time, the script performs several crucial tasks, such as pulling the container image, downloading the model weights and starting the OpenAI-compatible API server.
While waiting, you can prepare your local environment for SSH access and familiarise yourself with the Hyperstack dashboard.
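Once you are able to SSH in (covered in the next section), you can also check from inside the VM whether the API has finished booting, rather than guessing at timings. Here is a minimal polling sketch, assuming the server listens on port 8000 as in the API call further below:

# Poll the OpenAI-compatible endpoint until the model server responds
until curl -sf http://localhost:8000/v1/models > /dev/null; do
  echo "API not ready yet, retrying in 30 seconds..."
  sleep 30
done
echo "Qwen 2.5 Coder API is ready."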
Once the initialisation is complete, you can access your VM:
Locate SSH Details
Connect via SSH
To access and experiment with the model, SSH into your machine after completing the setup. If you are having trouble connecting with SSH, watch our recent platform tour video (at 4:08) for a demo.
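A typical connection command looks like the sketch below; the username and key path are assumptions (Hyperstack's Ubuntu images commonly use the ubuntu user), so substitute your own keypair and the public IP shown in the dashboard:

# Replace the key path, username and IP with your own details
ssh -i ~/.ssh/your_keypair ubuntu@[public-ip]

Once connected, use this API call on your machine to start using Qwen 2.5 Coder 32B Instruct: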
MODEL_NAME="Qwen/Qwen2.5-Coder-32B-Instruct"
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "'$MODEL_NAME'",
        "messages": [
          {
            "role": "user",
            "content": "Hi, how do I write a Python function that prints \"Hyperstack is the greatest GPU Cloud platform\"?"
          }
        ]
      }'
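The response comes back as an OpenAI-style JSON object, with the generated text under choices[0].message.content. If you have jq installed (an assumption; it is not part of the setup above), you can print just the assistant's reply:

# Same request, piped through jq to extract only the generated text
# (reuses the MODEL_NAME variable defined above)
curl -s -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "'$MODEL_NAME'",
        "messages": [{"role": "user", "content": "Write a hello world function in Python"}]
      }' | jq -r '.choices[0].message.content'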
If the API is not working after ~10 minutes, please refer to our 'Troubleshooting Qwen 2.5 Coder 32B Instruct' section below.
If you are having any issues, you might need to restart your machine before calling the API:
1. Run sudo reboot inside your VM
2. Wait 5-10 minutes for the VM to reboot
3. SSH into your VM
4. Wait ~3 minutes for the LLM API to boot up
5. Run the above API call again
If you are still having issues, try the following (a combined example is shown after this list):
1. Run docker ps and find the container_id of your API container
2. Run docker logs [container_id] to see the logs of your container
3. Use the logs to debug any issues
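For example, the two commands can be combined like this; the --format and --tail options are optional conveniences:

# List running containers with their IDs, images and status
docker ps --format "table {{.ID}}\t{{.Image}}\t{{.Status}}"
# Follow the last 100 log lines of the API container (replace [container_id])
docker logs --tail 100 -f [container_id]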
Using a self-hosted LLM as a coding assistant can ensure full data privacy and control, keeping sensitive code on your infrastructure without third-party exposure. If you'd like to integrate this self-hosted LLM with VSCode for code completions and code assistant chats, follow the instructions below.
1. Open port 8000 on your machine: Follow the instructions [here] to open port 8000. Be aware that this exposes port 8000 to the public internet, allowing anyone to reach the API via your VM's public IP address and port number. If you prefer to limit access, you can configure your VM to restrict which IP addresses are permitted on port 8000. You can verify that the port is reachable with the check below.
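A quick way to confirm external access is to list the served models from your local machine; /v1/models is a standard endpoint of the OpenAI-compatible vLLM server deployed above:

# Run from your local machine; replace [public-ip] with your VM's public IP
curl http://[public-ip]:8000/v1/models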
2. Launch VSCode: Open your Visual Studio Code editor to proceed with the integration.
3. Install the 'Continue' extension: Go to the Extensions tab in VSCode and search for the 'Continue' extension. Install it to proceed with the setup.
4. Add the Chat model in the 'Continue' extension: On the left sidebar, click on the 'Continue' extension. At the top-left of the window, select 'Add Chat model' to start the configuration.
5. Modify the config file: In the configuration dialog, click the 'config file' button at the bottom, which says, 'This will update your config file.' This will allow you to edit the configuration file.
6. Update the config.json file: In the config.json file, input your model information, replacing [public-ip] with the public IP address of your Hyperstack VM. Don't forget to save the file by pressing CMD + S (CTRL + S on Windows/Linux).
"models": [
{
"provider": "openai",
"title": "vLLM hosted on Hyperstack",
"apiBase": "http://[public-ip]:8000/v1",
"model": "Qwen/Qwen2.5-Coder-32B-Instruct"
}
],
"tabAutocompleteModel": {
"provider": "openai",
"title": "vLLM hosted on Hyperstack",
"apiBase": "http://[public-ip]:8000/v1",
"model": "Qwen/Qwen2.5-Coder-32B-Instruct",
"apiKey": "None",
"completionOptions": {
"stop": ["<|endoftext|>", "\n"]
}
},
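Both entries sit inside the top-level object of config.json, alongside any keys already there. The "apiKey": "None" value is a placeholder: by default, the vLLM server deployed above does not require authentication, but if you later start it with an API key (as suggested in the production disclaimer), put that key here instead.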
7. Interact with the self-hosted LLM: On the left sidebar, you'll now be able to chat with your self-hosted LLM. Refer to the attached image for a sample chat interface.
8. Accept code suggestions: When you receive code suggestions in the chat, click the '>' icon at the top of the suggestion. This will apply the changes to your file, highlighting them in green. An example of this is shown in the attached image.
9. Accept code insertions: If you want to accept a code insertion, simply click the 'Accept' label above the inserted code.
10. Enable auto-completion in VSCode settings: To use the auto-complete functionality, add the necessary configuration lines to your VSCode settings file. Open the command palette by pressing CTRL + SHIFT + P, select 'Preferences: Open User Settings (JSON)' to open settings.json, then add the following lines:
"github.copilot.editor.enableAutoCompletions": false,
"editor.inlineSuggest.enabled": true,
"continue.enableTabAutocomplete": true
With these steps, you're all set to enjoy a fully integrated private coding assistant within VSCode, running on your own infrastructure on Hyperstack. This setup gives you full control over your data while providing powerful AI-driven code suggestions and completions. We wish you all the best as you boost your development workflow with Qwen 2.5 Coder on Hyperstack.
When you're finished with your current workload, you can hibernate your VM to avoid incurring unnecessary costs. When you want to continue your work, simply resume the hibernated VM; there is no need to repeat the setup process.
What is Qwen 2.5 Coder 32B Instruct?
It's a large language model optimised for complex code generation, repair, and multi-language support.

How do I deploy Qwen 2.5 Coder 32B Instruct?
Simply follow the deployment steps in our Hyperstack guide above for a quick setup.

Which GPUs should I use to run this model?
NVIDIA A100 and H100 GPUs are ideal for handling the demands of this model.

Can I use Qwen 2.5 Coder 32B Instruct as a coding assistant in VSCode?
Yes, you can set up the model for code assistance in VSCode using the Hyperstack integration steps above.

Is this setup suitable for production?
Yes, but we recommend adding production-grade features like API keys and monitoring for optimal performance.