Meta has released its latest Llama 3.2 model, designed to push the boundaries of generative AI. With improved inference capabilities and better scaling, it is well suited to AI-driven applications across diverse platforms. This launch also marks the first official release of the Llama Stack, which simplifies how developers use Llama models across environments (single-node, on-prem, cloud and on-device) and makes it easy to deploy RAG and tool-enabled applications with built-in safety.
We are excited to share this guide in which we'll walk you through how to deploy the Llama 3.2 11B model using the Llama Stack on Hyperstack!
The following steps cover deploying Llama 3.2 11B on Hyperstack.
Initiate Deployment
Select Hardware Configuration
Choose the Operating System
Select a keypair
Network Configuration
Enable SSH Access
Configure Additional Settings
Please note: this cloud-init script only exposes the API for demo purposes. For production environments, consider containerisation (e.g. Docker), secure connections (TLS), secret management and monitoring for your API.
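As one illustration of the production notes above, you might run the server in a container with secrets injected at runtime and the port bound to localhost behind a TLS-terminating proxy. This is a minimal, hypothetical Compose sketch, not the cloud-init setup from this tutorial; the service name, image and paths are all placeholders:

```yaml
# Hypothetical sketch only -- image name, paths and GPU wiring are placeholders.
services:
  llama-stack:
    image: llama-stack-server:latest        # placeholder image
    env_file: /etc/llama-stack/secrets.env  # keep tokens out of the image
    ports:
      - "127.0.0.1:8000:8000"               # bind to localhost; put TLS in front
    restart: unless-stopped
```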
Review and Deploy
After deploying your VM, the cloud-init script will begin its work. This process typically takes about 5-10 minutes while the script completes the setup needed to serve the model.
While waiting, you can prepare your local environment for SSH access and familiarise yourself with the Hyperstack dashboard.
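Rather than watching the clock, you can tell when the stack is ready by polling the endpoint from the VM once you can SSH in. A minimal sketch, assuming the Llama Stack server listens on port 8000 (the retry count and delay here are arbitrary):

```shell
# Poll a URL until it responds, or give up after `tries` attempts.
# Usage: wait_for_api http://localhost:8000 40 15
wait_for_api() {
  local url=$1 tries=${2:-40} delay=${3:-15}
  local i
  for i in $(seq "$tries"); do
    if curl -sf -o /dev/null "$url"; then
      echo "API is reachable"
      return 0
    fi
    sleep "$delay"
  done
  echo "API did not respond after $tries attempts" >&2
  return 1
}
```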
Once the initialisation is complete, you can access your VM:
Locate SSH Details
Connect via SSH
To access and experiment with Meta's latest model, SSH into your machine after completing the setup. If you have trouble connecting via SSH, watch our recent platform tour video (at 4:08) for a demo. Once connected, use this API call on your machine to start using Llama 3.2:
IMAGE_URL="https://www.hyperstack.cloud/hs-fs/hubfs/deploy-vm-11-ecd8c53003182041d3a2881d0010f6c6-1.png?width=3352&height=1852&name=deploy-vm-11-ecd8c53003182041d3a2881d0010f6c6-1.png"
IMAGE_EXTENSION=$(echo "$IMAGE_URL" | awk -F. '{print $NF}' | cut -d'?' -f1)
FILE_NAME="/home/ubuntu/downloaded_image.$IMAGE_EXTENSION"
curl -o "$FILE_NAME" "$IMAGE_URL"
# Write the JSON payload to payload.json file
cat <<EOF > payload.json
{
"model": "Llama3.2-11B-Vision-Instruct",
"messages": [
{
"role": "user",
"content": [
{
"image": {
"uri": "file://$FILE_NAME"
}
},
"Describe this image in two sentences"
]
}
]
}
EOF
# Use the JSON payload file in the curl command
curl -X POST http://localhost:8000/inference/chat_completion \
-H "Content-Type: application/json" \
-d @payload.json
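The endpoint returns JSON, so it helps to extract just the generated text. The exact response shape depends on your Llama Stack version; this sketch assumes a `completion_message.content` field, so inspect your actual response first to confirm:

```shell
# Pull the assistant's text out of a response (the field names here are
# assumptions -- check your own response to confirm the shape).
RESPONSE='{"completion_message": {"role": "assistant", "content": "A screenshot of a cloud dashboard."}}'
echo "$RESPONSE" | python3 -c 'import json, sys; print(json.load(sys.stdin)["completion_message"]["content"])'
```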
If the API is not working after ~10 minutes, please refer to our 'Troubleshooting Llama 3.2' section below.
If you run into any issues, follow these steps:
SSH into your VM.
Check the cloud-init logs with the following command: cat /var/log/cloud-init-output.log
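Scanning the whole log by eye is slow, so a small helper that filters for common failure markers can narrow things down. This is a sketch only; the pattern list is a guess, not exhaustive:

```shell
# Show the last few lines of a log that look like failures.
# Usage: check_log /var/log/cloud-init-output.log
check_log() {
  grep -iE 'error|fail|traceback' "$1" | tail -n 20
}
```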
When you're finished with your current workload, you can hibernate your VM to avoid incurring unnecessary costs:
Hyperstack is a cloud platform designed to accelerate AI and machine learning workloads. Here's why it's an excellent choice for deploying Llama 3.2:
Explore our tutorials on Deploying and Using Granite 3.0 and Notebook Llama on Hyperstack.