

Published on 2 Oct 2024

Improving LLM Fine-Tuning and Inference with High-Speed Networking


Updated: 16 Oct 2024


As advanced LLMs like Llama 3.1-70B and Qwen2-72B scale in size and complexity, network efficiency becomes a bottleneck. Hyperstack's recently released high-speed networking with SR-IOV (Single Root I/O Virtualisation) addresses these challenges. SR-IOV allows multiple virtual machines (VMs) to share the same physical NIC (network interface card) while maintaining high-speed, low-latency communication. Continue reading this blog as we explore how SR-IOV improves fine-tuning and inference for LLM workloads.

Challenges in Multi-Node LLM Fine-Tuning

When fine-tuning LLMs across multiple nodes, one of the primary challenges is the efficiency of data transfer between virtual machines. Traditional networking setups such as VirtIO, which delivers speeds of around 10 Gbps, are not sufficient for the high-volume, low-latency data exchanges that large language models require. In a distributed fine-tuning setup, model weights, gradients and datasets need to be transferred between nodes continuously. This often results in bottlenecks, with nodes waiting on data from others, which increases total training time. AI inference at scale suffers from the same limitations, especially when models are deployed across multiple VMs to handle large-scale traffic.
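
To put the bandwidth requirement in perspective, here is a rough back-of-the-envelope estimate, not a measured benchmark: assuming a 70B-parameter model with 16-bit gradients and a standard ring all-reduce across a handful of VMs, the per-step gradient exchange alone runs into terabits, so the link speed directly bounds how often nodes can synchronise.

```python
# Rough estimate of per-step gradient synchronisation time for a 70B model.
# Assumptions (illustrative, not measured figures): bf16 gradients
# (2 bytes/param), no gradient sharding or compression, and a ring
# all-reduce, which moves roughly 2 * (N - 1) / N times the gradient
# volume per node.

params = 70e9                # model parameters (e.g. a Llama 3.1-70B-class model)
bytes_per_grad = 2           # bf16 gradient
nodes = 4                    # number of VMs in the fine-tuning cluster

grad_bytes = params * bytes_per_grad
ring_factor = 2 * (nodes - 1) / nodes
traffic_bits = grad_bytes * ring_factor * 8   # bits moved per node per step

for label, gbps in [("VirtIO (~10 Gbps)", 10), ("SR-IOV (~350 Gbps)", 350)]:
    seconds = traffic_bits / (gbps * 1e9)
    print(f"{label:>20}: ~{seconds:.1f} s of pure network time per optimiser step")
```

In practice, gradient sharding, overlap with computation and compression reduce these numbers, but the ratio between the two link speeds carries over directly.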

How SR-IOV Enhances Inter-VM Network Speeds

SR-IOV resolves many of these networking challenges by enabling direct hardware access for VMs, bypassing the software layer that slows down traditional network virtualisation such as VirtIO. This direct access boosts network throughput, allowing data transfers between VMs at speeds of up to 350 Gbps, compared with roughly 10 Gbps for VirtIO, as seen in the iperf tests conducted within Hyperstack's environment below.

SR-IOV Benchmarks

According to the benchmarking figures:

  • VirtIO (VM with virtio-net vNIC): peaks at 10.5 Gbps in single-thread iperf tests.
  • SR-IOV (VM with SR-IOV VF-LAG NIC): starts at 37.1 Gbps with a single thread and ramps up to 349 Gbps with 24-thread tests.
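
If you want to repeat this comparison on your own VMs, a minimal sketch is shown below. The peer address and stream counts are placeholders, not Hyperstack-specific values; run `iperf3 -s` on the second VM first.

```python
# Minimal sketch for reproducing the single- vs multi-stream throughput
# comparison between two VMs with iperf3.
import json
import subprocess

PEER = "10.0.0.2"  # hypothetical private IP of the VM running `iperf3 -s`

for streams in (1, 8, 24):
    result = subprocess.run(
        ["iperf3", "-c", PEER, "-P", str(streams), "-t", "10", "-J"],
        capture_output=True, text=True, check=True,
    )
    report = json.loads(result.stdout)
    gbps = report["end"]["sum_received"]["bits_per_second"] / 1e9
    print(f"{streams:>2} parallel stream(s): {gbps:.1f} Gbps")
```

Single-stream results show the per-connection ceiling, while the multi-stream runs are what saturate the SR-IOV link, which is why the 24-thread figure is the one that matters for collective operations such as all-reduce.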

This increase in network speed makes SR-IOV a major advantage for multi-node LLM fine-tuning tasks, where VMs constantly exchange large amounts of data.

SR-IOV Benefits for Multi-Node Fine-Tuning of LLMs

The key benefits of SR-IOV for multi-node fine-tuning of LLMs are:

Faster Data Transfer Across VMs

The multi-thread performance of SR-IOV significantly reduces inter-node communication delays. During LLM fine-tuning, where each node in the cluster trains on a subset of data, the ability to share updates, gradients and model checkpoints faster between VMs cuts down training time and allows more efficient scaling across nodes.

For instance, in scenarios where hyperparameters are continuously adjusted or where model updates need to be synchronised across multiple GPUs, the quick data movement offered by SR-IOV reduces waiting periods between nodes, accelerating the overall fine-tuning process.
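In a standard data-parallel setup, this synchronisation happens in the gradient all-reduce that backends such as NCCL run over the inter-VM network. The sketch below is a minimal, generic PyTorch DDP example (the launch command, node count and toy model are illustrative assumptions, not a Hyperstack-specific recipe) showing where that traffic occurs.

```python
# Minimal multi-node DDP sketch (PyTorch). Launch with torchrun on each VM, e.g.
#   torchrun --nnodes=2 --nproc_per_node=8 --rdzv_backend=c10d \
#            --rdzv_endpoint=<head-node-ip>:29500 train.py
# The gradient all-reduce that NCCL performs during backward() travels over
# the inter-VM network, which is where SR-IOV bandwidth pays off.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group(backend="nccl")        # NCCL for GPU collectives
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(4096, 4096).cuda()     # stand-in for a real LLM
    model = DDP(model, device_ids=[local_rank])
    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for _ in range(10):                            # toy training loop
        x = torch.randn(8, 4096, device="cuda")
        loss = model(x).pow(2).mean()
        loss.backward()                            # gradients all-reduced across VMs here
        opt.step()
        opt.zero_grad()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```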

Reduced Bottlenecks in Multi-Node Training

By eliminating the bottlenecks that arise from slow network communication, SR-IOV enhances data flow between nodes, even in complex LLM architectures. Traditional VirtIO setups cause significant delays when training large models like GPT or Llama across multiple VMs, as data transmission between these nodes is a critical component of large AI model training efficiency.

Efficient Resource Utilisation for Large-Scale Models

One key advantage of SR-IOV is its ability to more efficiently allocate network and compute resources across VMs. SR-IOV allows multiple VMs to share the same physical NIC without compromising on bandwidth or latency. This ensures that network speeds remain optimal even when multiple models or datasets are being processed concurrently. If you are working with distributed LLMs, this results in better utilisation of GPU resources and faster model convergence.

Impact on Inference

While network speed is often associated with multi-node training, SR-IOV also holds potential benefits for LLM inference workloads, especially in distributed setups. Although inference latency is largely determined by the distance and connection between the data centre and the user, SR-IOV's ability to cut data transfer times between VMs can still indirectly improve response times by speeding up the computation behind each request.

In distributed inference, where model partitions or ensemble models run on different VMs, faster data sharing between VMs means the model can respond to a user query more quickly. While this doesn't directly reduce network latency to the user, it ensures that the request is processed faster within the data centre, which can lead to quicker responses.
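As one illustration of this pattern, the sketch below uses the open-source vLLM library, which partitions a model across GPUs with tensor parallelism; when the same engine is spread over multiple VMs (for example via vLLM's pipeline parallelism on a Ray cluster), the activations crossing VM boundaries travel over this network. The model name and parallelism degree are illustrative assumptions, not a Hyperstack-specific configuration.

```python
# Minimal sketch of sharded LLM inference with vLLM. Model name and
# parallelism settings are placeholders for illustration only.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.1-70B-Instruct",  # example model
    tensor_parallel_size=8,                     # shard the model across 8 GPUs
)

params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Explain SR-IOV in one paragraph."], params)
print(outputs[0].outputs[0].text)
```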

Conclusion

The release of our on-demand high-speed networking with SR-IOV is a significant upgrade for our users, offering high-performance networking to handle even the most demanding workloads like fine-tuning LLMs. With speeds up to 350 Gbps, SR-IOV can accelerate multi-node training and inference for faster data transfer and reduced bottlenecks. As we continue to innovate, we remain committed to introducing new features that help optimise AI workloads and drive more efficient and scalable solutions for our users.


FAQs

How does SR-IOV improve LLM fine-tuning?

SR-IOV reduces data transfer times between nodes, accelerating multi-node fine-tuning processes.

Can SR-IOV reduce inference latency?

While it doesn't directly reduce user-to-data centre latency, it speeds up inference calculations by improving inter-VM data flow.

Which GPUs support SR-IOV on Hyperstack?

SR-IOV is available on Hyperstack GPUs like NVIDIA H100 PCIe, NVIDIA H100 PCIe with NVLink and NVIDIA A100 with NVLink.
