Updated on 30 Sep 2025

Deploy Llama-3.1 Nemotron-70B on Hyperstack: A Quick Start Guide


In our latest article, we explore how to deploy and use NVIDIA’s fine-tuned Llama 3.1 Nemotron-70B-Instruct model on Hyperstack. This model outperforms GPT-4o and Claude 3.5 Sonnet across key benchmarks and even solves the strawberry problem without additional reasoning tokens. We walk you through setting up a 4xNVIDIA A100-80G-PCIe VM, installing Ollama, and running the model via API. Plus, we cover cost-saving hibernation features to optimise your deployment. Ready to get started? Read the full guide and deploy Llama 3.1 today!

The buzz around Llama 3.1 is not over yet: NVIDIA recently released its fine-tuned Llama 3.1 Nemotron-70B-Instruct model, which beat GPT-4o and Claude 3.5 Sonnet across key benchmarks. More interesting still, the fine-tuned model can solve the strawberry problem without specialised prompting or additional reasoning tokens. If your own models struggle with coherence and reasoning, fine-tuning them on Hyperstack can improve their performance.

Want to get started with the latest NVIDIA model? Check out our quick guide below to deploying and using the Llama 3.1 Nemotron-70B-Instruct model on Hyperstack.

Why Deploy on Hyperstack?

Hyperstack is a cloud platform designed to accelerate AI and machine learning workloads. Here's why it's an excellent choice for deploying Llama-3.1-Nemotron-70B-Instruct:

Availability: Hyperstack provides access to the latest and most powerful GPUs such as the NVIDIA A100 and the NVIDIA H100 SXM on-demand, specifically designed to handle large language models.
Ease of Deployment: With pre-configured environments and one-click deployments, setting up complex AI models becomes significantly simpler on our platform.
Scalability: You can easily scale your resources up or down based on your computational needs.
Cost-Effectiveness: You pay only for the resources you use with our cost-effective cloud GPU pricing.
Integration Capabilities: Hyperstack provides easy integration with popular AI frameworks and tools.

Deployment Process

Now, let's walk through the step-by-step process of deploying Llama-3.1-Nemotron-70B-Instruct on Hyperstack.

Step 1: Accessing Hyperstack

Go to the Hyperstack website and log in to your account.
If you're new to Hyperstack, you'll need to create an account and set up your billing information. Check our documentation to get started with Hyperstack.
Once logged in, you'll be greeted by the Hyperstack dashboard, which provides an overview of your resources and deployments.

Step 2: Deploying a New Virtual Machine

Initiate Deployment

Look for the "Deploy New Virtual Machine" button on the dashboard.
Click it to start the deployment process.

Select Hardware Configuration

In the hardware options, choose the "4xA100-80G-PCIe" flavour.
This configuration provides four NVIDIA A100 GPUs with 80 GB of memory each (320 GB in total), connected via PCIe, which leaves ample room for the fp16 weights of Llama-3.1-Nemotron-70B.
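
As a rough sanity check on that sizing, the memory needed for the weights can be estimated with simple arithmetic. The sketch below assumes about 70 billion parameters stored at 2 bytes each (fp16); it ignores the KV cache and runtime overhead, so treat the result as a lower bound rather than a precise figure.

```shell
# Back-of-the-envelope check that the fp16 weights fit in GPU memory.
# Assumptions: ~70 billion parameters, 2 bytes per parameter (fp16).
params_billion=70
bytes_per_param=2
gpu_count=4
gpu_mem_gb=80

weights_gb=$((params_billion * bytes_per_param))   # 70 * 2 = 140 GB of weights
total_gb=$((gpu_count * gpu_mem_gb))               # 4 * 80 = 320 GB across the VM
headroom_gb=$((total_gb - weights_gb))             # left for KV cache and overhead

echo "weights=${weights_gb}GB total=${total_gb}GB headroom=${headroom_gb}GB"
```

The same arithmetic suggests why a smaller GPU count would be tight for the fp16 variant: the weights alone exceed the memory of a single 80 GB card.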

Choose the Operating System

Select the "Server 22.04 LTS R535 CUDA 12.2" image.
This image comes pre-installed with Ubuntu 22.04 LTS and NVIDIA drivers (R535) along with CUDA 12.2, providing an optimised environment for AI workloads.
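
Once the VM is running and you are connected to it (see Step 3), you may want to confirm that the driver and all GPUs are visible. The helper below is a minimal sketch: `check_gpus` is a hypothetical function name, and it relies only on the standard `nvidia-smi` tool that ships with the NVIDIA driver.

```shell
# check_gpus EXPECTED_COUNT
# Prints each GPU's name, total memory and driver version, then compares
# the number of GPUs found with the number expected for the chosen flavour.
check_gpus() {
  expected="$1"
  nvidia-smi --query-gpu=name,memory.total,driver_version --format=csv,noheader
  # "nvidia-smi -L" prints one line per GPU, each starting with "GPU <index>:"
  found=$(nvidia-smi -L | grep -c '^GPU ')
  if [ "$found" -eq "$expected" ]; then
    echo "OK: found $found GPUs"
  else
    echo "WARNING: expected $expected GPUs, found $found" >&2
    return 1
  fi
}
```

On the flavour used in this guide you would call it as `check_gpus 4`.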

Select a keypair

Select one of the keypairs in your account. Don't have a keypair yet? See our Getting Started tutorial for creating one.

Network Configuration

Ensure you assign a Public IP to your virtual machine.
This allows you to access your VM from the internet, which is crucial for remote management and API access.

Enable SSH Access

Make sure to enable an SSH connection.
You'll need this to securely connect and manage your VM.

Review and Deploy

Double-check all your settings.
Click the "Deploy" button to launch your virtual machine.

Step 3: Accessing Your VM

Once the initialisation is complete, you can access your VM:

Locate SSH Details

In the Hyperstack dashboard, find your VM's details.
Look for the public IP address, which you will need to connect to your VM with SSH.

Connect via SSH

Open a terminal on your local machine.
Use the command ssh -i [path_to_ssh_key] [os_username]@[vm_ip_address] (e.g. ssh -i /users/username/downloads/keypair_hyperstack ubuntu@0.0.0.0)
Replace the bracketed placeholders with the path to your private key, the OS username and the public IP address shown in the Hyperstack dashboard.
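
A common stumbling block at this step is OpenSSH refusing a private key whose file permissions are too open. The sketch below wraps the permission fix and the connection in one helper; `connect_vm` is a hypothetical name, and the key path and IP address in the example call are placeholders, not values from your account.

```shell
# connect_vm KEY_PATH USER IP
# Restricts the private key's permissions (OpenSSH rejects keys that other
# users can read), then opens the SSH session.
connect_vm() {
  chmod 600 "$1" || return 1
  ssh -i "$1" "$2@$3"
}

# Example call with placeholder values (203.0.113.10 is a documentation-only address):
# connect_vm "$HOME/.ssh/keypair_hyperstack" ubuntu 203.0.113.10
```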

Step 4: Running Llama-3.1-Nemotron-70B-Instruct

After connecting with SSH, we will set up the LLM.
Run the following commands inside the VM:

# Create models directory to ensure enough space
mkdir -p /ephemeral/models

# Download and install ollama
curl -fsSL https://ollama.com/install.sh | sh

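# Set ollama host and models path via a systemd drop-in file
sudo mkdir -p /etc/systemd/system/ollama.service.d
sudo tee /etc/systemd/system/ollama.service.d/override.conf > /dev/null <<EOF
[Service]
Environment="OLLAMA_HOST=0.0.0.0"
Environment="OLLAMA_MODELS=/ephemeral/models"
EOF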

# Enable access for ollama
sudo chown -R ollama:ollama /ephemeral/models

# Reload ollama to apply changes
sudo systemctl daemon-reload
sudo systemctl restart ollama.service

# Pull the model
ollama pull nemotron:70b-instruct-fp16
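
Before moving on, it can help to confirm that the service restarted cleanly and that the model was stored. The following is a minimal sketch using two read-only Ollama HTTP endpoints, `/api/version` and `/api/tags`; the function name `check_ollama` is hypothetical, and the default URL assumes Ollama's standard port 11434 on the VM itself.

```shell
# check_ollama [BASE_URL]
# Confirms the Ollama HTTP API answers and lists the models stored locally.
check_ollama() {
  base_url="${1:-http://localhost:11434}"
  # /api/version returns a small JSON document with the server version
  curl -fsS "$base_url/api/version" || return 1
  echo
  # /api/tags lists the models that have been pulled
  curl -fsS "$base_url/api/tags" || return 1
  echo
}
```

Run `check_ollama` on the VM; if the first request fails, `sudo systemctl status ollama.service` is a reasonable place to look next.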

Please note: This tutorial enables the API for demo purposes only. For production environments, consider using a production inference service (e.g. vLLM), secure connections, secret management and monitoring for your API.
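
For a demo, one way to reach the API from your own machine without opening port 11434 to the internet is an SSH tunnel. This is a sketch under assumptions: `open_tunnel` is a hypothetical helper, the arguments are the same key path, username and IP address you used in Step 3, and whether port 11434 is otherwise reachable depends on your VM's firewall rules.

```shell
# open_tunnel KEY_PATH USER IP
# Forwards local port 11434 to the Ollama port on the VM over SSH.
# -N: do not run a remote command; -L: local port forwarding.
open_tunnel() {
  ssh -i "$1" -N -L 11434:localhost:11434 "$2@$3"
}
```

While the tunnel is open, requests sent to http://localhost:11434 on your local machine are forwarded to the VM over the encrypted SSH connection.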

Interacting with Llama-3.1-Nemotron-70B-Instruct

To access and experiment with the LLM, SSH into your machine after completing the setup. If you are having trouble connecting with SSH, watch our recent platform tour video (at 4:08) for a demo. Once connected, use the commands below on your machine to start using Llama-3.1-Nemotron-70B-Instruct:


# Talk to the model interactively
ollama run nemotron:70b-instruct-fp16

# Talk to the model via API
curl http://localhost:11434/api/chat -d '{
  "model": "nemotron:70b-instruct-fp16",
  "stream": false,
  "messages": [
    {
      "role": "user",
      "content": "How many letters R are there in strawberry?"
    }
  ]
}'
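
If you plan to send several prompts, the request above can be wrapped in small helpers. This is a minimal sketch: `chat_payload` and `ask` are hypothetical names, and the prompt is inserted into the JSON verbatim, so it must not contain double quotes or backslashes.

```shell
# chat_payload MODEL PROMPT
# Builds a non-streaming /api/chat request body for a single user message.
# Caveat: no JSON escaping is done, so keep quotes and backslashes out of PROMPT.
chat_payload() {
  printf '{"model":"%s","stream":false,"messages":[{"role":"user","content":"%s"}]}' "$1" "$2"
}

# ask PROMPT
# Sends the prompt to the local Ollama API and prints the raw JSON response.
ask() {
  curl -s http://localhost:11434/api/chat \
    -d "$(chat_payload "nemotron:70b-instruct-fp16" "$1")"
}
```

For example, `ask "How many letters R are there in strawberry?"` sends the same request as the curl call above.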

Step 5: Hibernating Your VM

When you're finished with your current workload, you can hibernate your VM to avoid incurring unnecessary costs:

In the Hyperstack dashboard, locate your virtual machine.
Look for a "Hibernate" option.
Click to hibernate the VM, which will stop billing for compute resources while preserving your setup.

To continue your work without repeating the setup process:

Return to the Hyperstack dashboard and find your hibernated VM.
Select the "Resume" or "Start" option.
Wait a few moments for the VM to become active.
Reconnect via SSH using the same credentials as before.
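
After reconnecting, a quick check can tell you whether the service and the model are ready. The sketch below is an assumption-laden example: `ensure_model` is a hypothetical helper, and because this guide does not state whether the contents of /ephemeral survive hibernation, it checks for the model and pulls it again only if it is missing.

```shell
# ensure_model MODEL
# Starts the Ollama service if it is not active, then pulls the model
# only when it does not appear in the local model list.
ensure_model() {
  model="$1"
  if ! systemctl is-active --quiet ollama.service; then
    sudo systemctl start ollama.service || return 1
  fi
  if ollama list | grep -qF "$model"; then
    echo "$model is already available"
  else
    ollama pull "$model"
  fi
}
```

For this guide the call would be `ensure_model nemotron:70b-instruct-fp16`. If the service was only just started, it may need a few seconds before `ollama list` responds.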

Explore our tutorial on Deploying and Using Notebook Llama on Hyperstack.
