Deploying and Using Granite 3.0 8B on Hyperstack: A Quick Start Guide

Written by Sebastian Panman de Wit | Oct 23, 2024 8:38:30 AM

IBM has just released Granite 3.0 8B, the latest in its series of LLMs designed for enterprise use. Granite 3.0 8B Instruct is a dense, decoder-only model trained on over 12 trillion tokens. It rivals advanced models from Meta and Mistral AI on Hugging Face’s OpenLLM Leaderboard and performs strongly in benchmarks for enterprise tasks, speed and safety. With an open-source evaluation methodology and a focus on practical application, Granite 3.0 8B Instruct is well suited to sophisticated workflows and tool-based solutions.

Want to get started with Granite 3.0? Check out our quick guide below to deploying and using Granite 3.0 on Hyperstack.

Why Deploy on Hyperstack?

Hyperstack is a cloud platform designed to accelerate AI and machine learning workloads. Here's why it's an excellent choice for deploying Granite 3.0:

  • Availability: Hyperstack provides on-demand access to powerful GPUs such as the NVIDIA A100 and the NVIDIA H100 SXM, which are well suited to running large language models.
  • Ease of Deployment: With pre-configured environments and one-click deployments, setting up complex AI models becomes significantly simpler on our platform. 
  • Scalability: You can easily scale your resources up or down based on your computational needs.
  • Cost-Effectiveness: You pay only for the resources you use with our cost-effective cloud GPU pricing.
  • Integration Capabilities: Hyperstack provides easy integration with popular AI frameworks and tools.

Deployment Process

Now, let's walk through the step-by-step process of deploying Granite 3.0 on Hyperstack.

Step 1: Accessing Hyperstack

  1. Go to the Hyperstack website and log in to your account.
  2. If you're new to Hyperstack, you'll need to create an account and set up your billing information. Check our documentation to get started with Hyperstack.
  3. Once logged in, you'll be greeted by the Hyperstack dashboard, which provides an overview of your resources and deployments.

Step 2: Deploying a New Virtual Machine

Initiate Deployment

  1. Look for the "Deploy New Virtual Machine" button on the dashboard.
  2. Click it to start the deployment process.

Select Hardware Configuration

  1. In the hardware options, choose the "1xL40" flavour.
  2. This configuration provides 1 NVIDIA L40 GPU with 48GB of memory, offering strong performance for running Granite 3.0.
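
As a rough sanity check on why a single 48GB GPU is enough, the arithmetic below estimates the memory needed for the model weights alone. It is a back-of-the-envelope sketch, not a measurement: it assumes roughly 8 billion parameters and 2 bytes per parameter (FP16, matching the model tag pulled later in this guide), and it ignores the KV cache and runtime overhead.

```shell
# Rough estimate of weight memory only; all figures are approximations.
params_billion=8     # assumed parameter count, in billions
bytes_per_param=2    # FP16 stores each weight in 2 bytes
gpu_memory_gb=48     # NVIDIA L40 memory, as stated in this step

# 1 billion parameters at 2 bytes each is about 2 GB.
weights_gb=$(( params_billion * bytes_per_param ))
headroom_gb=$(( gpu_memory_gb - weights_gb ))

echo "Approximate weight memory: ${weights_gb} GB"
echo "Approximate headroom for KV cache and runtime: ${headroom_gb} GB"
```

Under these assumptions the weights take up about a third of the GPU's memory, leaving the remainder for context and runtime overhead.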

Choose the Operating System

  1. Select the "Server 22.04 LTS R535 CUDA 12.2" image.
  2. This image comes pre-installed with Ubuntu 22.04 LTS and NVIDIA drivers (R535) along with CUDA 12.2, providing an optimised environment for AI workloads.

Select a keypair

  1. Select one of the keypairs in your account. Don't have a keypair yet? See our Getting Started tutorial for creating one.

Network Configuration

  1. Ensure you assign a public IP to your virtual machine.
  2. This allows you to access your VM from the internet, which is crucial for remote management and API access.

Enable SSH Access

  1. Make sure to enable an SSH connection.
  2. You'll need this to securely connect and manage your VM.

Review and Deploy

  1. Double-check all your settings.
  2. Click the "Deploy" button to launch your virtual machine.

Step 3: Accessing Your VM

Once the initialisation is complete, you can access your VM:

Locate SSH Details

  1. In the Hyperstack dashboard, find your VM's details.
  2. Look for the public IP address, which you will need to connect to your VM with SSH.

Connect via SSH

  1. Open a terminal on your local machine.
  2. Use the command ssh -i [path_to_ssh_key] [os_username]@[vm_ip_address] (e.g. ssh -i /users/username/downloads/keypair_hyperstack ubuntu@0.0.0.0)
  3. Replace path_to_ssh_key, os_username and vm_ip_address with the details provided by Hyperstack.
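
The connection step above can be wrapped in a small helper. This is a minimal sketch: connect_vm is a hypothetical function name, and the key path, username and IP address in the usage comment are placeholders. It also restricts the key file's permissions first, since OpenSSH refuses private keys that other users can read.

```shell
# Hypothetical helper (not part of Hyperstack): check the key, tighten its
# permissions, then open the SSH session.
connect_vm() {
  key_path="$1"   # path to your private key file
  os_user="$2"    # OS username for the image, e.g. ubuntu
  vm_ip="$3"      # public IP shown in the Hyperstack dashboard

  if [ ! -f "$key_path" ]; then
    echo "Key file not found: $key_path" >&2
    return 1
  fi

  # Make the key readable and writable by the owner only.
  chmod 600 "$key_path"
  ssh -i "$key_path" "${os_user}@${vm_ip}"
}

# Usage (placeholder values):
#   connect_vm /users/username/downloads/keypair_hyperstack ubuntu <vm_ip_address>
```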

Step 4: Running Granite 3.0

  1. After connecting with SSH, we will set up the LLM.
  2. Run the following commands inside the VM:

# Download and install ollama
curl -fsSL https://ollama.com/install.sh | sh

# Pull the model
ollama pull granite3-dense:8b-instruct-fp16
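
Before moving on, it can help to confirm that the tools the next steps rely on are actually present. The sketch below uses a hypothetical check_cmd helper; the commented commands are a suggested sanity check to run on the VM, not part of the original setup.

```shell
# Hypothetical helper: report whether a command is available on the PATH.
check_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $1"
  else
    echo "missing: $1" >&2
    return 1
  fi
}

# Suggested checks on the VM after the install and pull steps:
#   check_cmd nvidia-smi && nvidia-smi   # driver loaded and GPU visible
#   check_cmd ollama && ollama list      # the pulled model should be listed
```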
 
Please note: This tutorial enables the API for demo purposes only. For production environments, consider using a production inference service (e.g. vLLM), secure connections, secret management and monitoring for your API.
 
 

Interacting with Granite 3.0

To access and experiment with the LLM, SSH into your machine after completing the setup. If you are having trouble connecting with SSH, watch our recent platform tour video (at 4:08) for a demo. Once connected, use the commands below on your machine to start using Granite 3.0.


# Talk to the model interactively
ollama run granite3-dense:8b-instruct-fp16


# Talk to the model via API
curl http://localhost:11434/api/chat -d '{
  "model": "granite3-dense:8b-instruct-fp16",
  "stream": false,
  "messages": [
    {
      "role": "user",
      "content": "Tell me about GPUs"
    }
  ]
}'
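
If you plan to send several prompts, the API call above can be wrapped in a couple of shell functions. This is a minimal sketch: chat_payload and ask are hypothetical helper names, and the payload builder does no JSON escaping, so it assumes the prompt contains no double quotes or backslashes.

```shell
OLLAMA_URL="http://localhost:11434/api/chat"
MODEL="granite3-dense:8b-instruct-fp16"

# Build the same request body as the curl example above.
# $1 = model name, $2 = prompt (assumed free of quotes and backslashes)
chat_payload() {
  printf '{"model": "%s", "stream": false, "messages": [{"role": "user", "content": "%s"}]}' "$1" "$2"
}

# Send a single user message and print the raw JSON response.
ask() {
  curl -s "$OLLAMA_URL" -d "$(chat_payload "$MODEL" "$1")"
}

# Usage:
#   ask "Tell me about GPUs"
```

With "stream" set to false, the reply text is returned in the message.content field of the JSON response. If jq is installed on the VM (an assumption, it is not part of this setup), piping the output through jq -r '.message.content' prints only the reply.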

Step 5: Hibernating Your VM

When you're finished with your current workload, you can hibernate your VM to avoid incurring unnecessary costs:

  1. In the Hyperstack dashboard, locate your virtual machine.
  2. Look for a "Hibernate" option.
  3. Click to hibernate the VM, which will stop billing for compute resources while preserving your setup.

To continue your work without repeating the setup process:

  1. Return to the Hyperstack dashboard and find your hibernated VM.
  2. Select the "Resume" or "Start" option.
  3. Wait a few moments for the VM to become active.
  4. Reconnect via SSH using the same credentials as before.

Want to get started with Llama 3.2? Check out our tutorial below!

Deploying and Using Llama 3.2 11B on Hyperstack: A Quick Start Guide