Meta has released Llama 3.3 in a surprise launch, marking a major leap in open-source AI. Llama 3.3 is a 70-billion-parameter instruction-tuned model optimised for text-only tasks. It performs exceptionally well in instruction following, coding and multilingual processing. What's most exciting is that Llama 3.3 offers 405B-level performance without breaking the bank. Want to get started? Check out the tutorial below to deploy the Llama 3.3 70B model on Hyperstack.
What is Llama 3.3?
Llama 3.3 is a 70-billion parameter model optimised for instruction-following and text-based tasks. It outperforms Llama 3.1 70B and Llama 3.2 90B and even competes with the larger Llama 3.1 405B in some tasks. Unlike earlier models, Llama 3.3 70B is only available in an instruction-optimised form and does not come in a pre-trained version.
What are the Features of Llama 3.3?
The latest Llama 3.3 comes with new capabilities, including:
- Instruction Following: Excels in understanding and executing natural language instructions for task-based applications.
- Multilingual Support: Handles multiple languages, performing well in multilingual reasoning tasks.
- Improved Coding Proficiency: Enhanced code generation and debugging capabilities for developers.
- Expanded Context: Processes up to 128k tokens, supporting larger datasets and longer documents.
- Cost-Effective Performance: Provides 405B-level performance at a lower cost, ideal for budget-conscious developers.
- Synthetic Data Generation: Generates synthetic data to address privacy and data scarcity challenges.
Steps to Deploy Llama 3.3 70B
Now, let's walk through the step-by-step process of deploying Llama 3.3 70B on Hyperstack.
Step 1: Accessing Hyperstack
- Go to the Hyperstack website and log in to your account.
- If you're new to Hyperstack, you'll need to create an account and set up your billing information. Check our documentation to get started with Hyperstack.
- Once logged in, you'll be greeted by the Hyperstack dashboard, which provides an overview of your resources and deployments.
Step 2: Deploying a New Virtual Machine
Initiate Deployment
- Look for the "Deploy New Virtual Machine" button on the dashboard.
- Click it to start the deployment process.
Select Hardware Configuration
- In the hardware options, choose the "2xA100-80G-PCIe" flavour. This configuration provides two NVIDIA A100 GPUs, each with 80GB of GPU memory, connected via PCIe, offering exceptional performance for running Llama 3.3 70B.
Choose the Operating System
- Select the "Ubuntu Server 22.04 LTS R535 CUDA 12.4 with Docker".
Select a keypair
- Select one of the keypairs in your account. Don't have a keypair yet? See our Getting Started tutorial for creating one.
Network Configuration
- Ensure you assign a Public IP to your virtual machine.
- This allows you to access your VM from the internet, which is crucial for remote management and API access.
Enable SSH Access
- Make sure to enable an SSH connection.
- You'll need this to securely connect and manage your VM.
Configure Additional Settings
- Look for an "Additional Settings" or "Advanced Options" section.
- Here, you'll find a field for cloud-init scripts. This is where you'll paste the initialisation script. Click here to get the cloud-init script!
- To use the Llama 3.3 70B model, you need to:
  - Request access to the gated model here: https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct
  - Create a HuggingFace token to access the gated model (see more info here).
  - Replace line 12 of the script with your own HuggingFace token (see more details here).
Please note: this cloud-init script exposes the API for demo purposes only. For production environments, consider using secure connections, secret management and monitoring for your API.
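The linked script is the one to use; purely as an illustration of the shape such a script can take, here is a rough cloud-init sketch (the file paths, Docker image and flags below are assumptions, not the contents of the actual script):

```yaml
#cloud-config
# Illustrative sketch only -- use the linked cloud-init script for the real setup.
write_files:
  - path: /etc/vllm.env
    content: |
      # In the real script, line 12 is where your HuggingFace token goes
      HUGGING_FACE_HUB_TOKEN=hf_your_token_here
runcmd:
  # Serve the model across both A100s with an OpenAI-compatible API on port 8000
  # (vLLM's Docker image is one common way to do this; an assumption here)
  - >-
    docker run --gpus all --env-file /etc/vllm.env -p 8000:8000
    vllm/vllm-openai --model meta-llama/Llama-3.3-70B-Instruct
    --tensor-parallel-size 2
```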
Review and Deploy
- Double-check all your settings.
- Click the "Deploy" button to launch your virtual machine.
Step 3: Initialisation and Setup
After deploying your VM, the cloud-init script will begin its work. This process typically takes about 5-10 minutes. During this time, the script performs several crucial tasks:
- Dependencies Installation: Installs all necessary libraries and tools required to run Llama 3.3 70B.
- Model Download: Fetches the Llama 3.3 70B model files from the specified repository.
While waiting, you can prepare your local environment for SSH access and familiarise yourself with the Hyperstack dashboard.
Step 4: Accessing Your VM
Once the initialisation is complete, you can access your VM:
Locate SSH Details
- In the Hyperstack dashboard, find your VM's details.
- Look for the public IP address, which you will need to connect to your VM with SSH.
Connect via SSH
- Open a terminal on your local machine.
- Use the command ssh -i [path_to_ssh_key] [os_username]@[vm_ip_address] (e.g. ssh -i /users/username/downloads/keypair_hyperstack ubuntu@203.0.113.10)
- Replace [path_to_ssh_key], [os_username] and [vm_ip_address] with the details provided by Hyperstack.
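Optionally, you can also forward the API port over SSH so you can call the model from your local machine. The keypair path and IP below are placeholder assumptions; the command is wrapped in echo so this sketch is safe to run as-is (drop the echo to actually connect):

```shell
# Placeholder values -- substitute your own keypair path and the VM's public IP
KEY="$HOME/.ssh/keypair_hyperstack"
VM_IP="203.0.113.10"

# -L forwards local port 8000 to port 8000 on the VM, where the API listens
echo ssh -i "$KEY" -L 8000:localhost:8000 "ubuntu@$VM_IP"
```

With the tunnel open, the curl example in the next section works unchanged from your local machine, since the API still appears at http://localhost:8000.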
Interacting with Llama 3.3 70B
To access and experiment with Meta's latest model, SSH into your machine after completing the setup. If you are having trouble connecting with SSH, watch our recent platform tour video (at 4:08) for a demo. Once connected, use this API call on your machine to start using the Llama 3.3 70B:
```shell
MODEL_NAME="meta-llama/Llama-3.3-70B-Instruct"
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "'"$MODEL_NAME"'",
    "messages": [
      {
        "role": "user",
        "content": "Hello, how are you?"
      }
    ]
  }'
```
If the API is not working after ~10 minutes, please refer to the "Troubleshooting Llama 3.3 70B" section below.
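You can make the same request from Python instead of curl. This is a minimal sketch using only the standard library, assuming the VM exposes the OpenAI-compatible /v1/chat/completions endpoint on port 8000 shown above:

```python
import json
import urllib.request

API_URL = "http://localhost:8000/v1/chat/completions"  # same endpoint as the curl example
MODEL_NAME = "meta-llama/Llama-3.3-70B-Instruct"

def build_request(prompt: str) -> urllib.request.Request:
    """Build the POST request for the chat completions endpoint."""
    payload = {
        "model": MODEL_NAME,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

if __name__ == "__main__":
    # Send the request and print the model's reply
    with urllib.request.urlopen(build_request("Hello, how are you?")) as resp:
        reply = json.load(resp)
        print(reply["choices"][0]["message"]["content"])
```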
Troubleshooting Llama 3.3 70B
If you run into any issues, follow these steps:
- SSH into your VM.
- Check the cloud-init logs with the following command: cat /var/log/cloud-init-output.log
- Use the logs to debug any issues.
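A quick diagnostic pass over SSH might look like the following. The log path comes from the step above; the port-8000 health check assumes the API from the earlier curl example:

```shell
# Show the tail of the cloud-init log if it exists (path from the step above)
LOG=/var/log/cloud-init-output.log
if [ -f "$LOG" ]; then
  tail -n 50 "$LOG"
else
  echo "cloud-init log not found at $LOG"
fi

# Check whether anything is answering on the API port yet (assumption: port 8000)
curl -s --max-time 5 http://localhost:8000/v1/models || echo "API not up yet"
```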
Step 5: Hibernating Your VM
When you're finished with your current workload, you can hibernate your VM to avoid incurring unnecessary costs:
- In the Hyperstack dashboard, locate your Virtual machine.
- Look for a "Hibernate" option.
- Click to hibernate the VM, which will stop billing for compute resources while preserving your setup.
Why Deploy Llama 3.3 70B on Hyperstack?
Hyperstack is a cloud platform designed to accelerate AI and machine learning workloads. Here's why it's an excellent choice for deploying Llama 3.3 70B:
- Availability: Hyperstack provides access to the latest and most powerful GPUs such as the NVIDIA H100 on-demand, specifically designed to handle large language models.
- Ease of Deployment: With pre-configured environments and one-click deployments, setting up complex AI models becomes significantly simpler on our platform.
- Scalability: You can easily scale your resources up or down based on your computational needs.
- Cost-Effectiveness: You pay only for the resources you use with our cost-effective cloud GPU pricing.
- Integration Capabilities: Hyperstack provides easy integration with popular AI frameworks and tools.
FAQs
What is Llama 3.3?
Llama 3.3 is a 70-billion-parameter instruction-tuned model optimised for text-based tasks such as instruction-following, multilingual support and coding.
How is Llama 3.3 different from Llama 3.1 and Llama 3.2?
Llama 3.3 outperforms Llama 3.1 70B and Llama 3.2 90B in several tasks and provides performance comparable to Llama 3.1 405B but at a lower cost.
Can Llama 3.3 process long texts?
Yes, Llama 3.3 supports an expanded context of up to 128k tokens, making it capable of handling larger datasets and documents.
How do I deploy Llama 3.3 on Hyperstack?
You can deploy Llama 3.3 by launching a virtual machine with an NVIDIA A100 GPU, configuring the environment, and using cloud-init scripts for setup.
Why should I deploy Llama 3.3 on Hyperstack?
Hyperstack provides access to powerful GPUs like the NVIDIA A100, easy deployment, scalability, and cost-effective GPU pricing, making it ideal for running Llama 3.3.
Explore our tutorials on Deploying and Using Llama 3.2 and Llama 3.1 on Hyperstack.