Updated: 16 Oct 2024
As advanced LLMs like Llama 3.1-70B and Qwen2-72B scale in size and complexity, network efficiency becomes a bottleneck. Hyperstack’s recently released high-speed networking with SR-IOV (Single Root I/O Virtualisation) brings a new dimension to addressing these challenges. SR-IOV allows multiple virtual machines (VMs) to share the same physical NIC (network interface card) while maintaining high-speed, low-latency communication. Continue reading this blog as we explore how SR-IOV improves fine-tuning and inference for LLM workloads.
Challenges in Multi-Node LLM Fine-Tuning
When fine-tuning LLMs across multiple nodes, one of the primary challenges is data transfer efficiency between virtual machines. Traditional networking setups such as VirtIO, which delivers speeds of around 10 Gbps, are not sufficient for the high-volume, low-latency data exchanges that large language models require. In a distributed fine-tuning setup, model weights, gradients and datasets need to be transferred between nodes continuously. This often results in bottlenecks, with nodes waiting on data from others, which increases total training time. Similarly, AI inference at scale suffers from the same limitations, especially when models are deployed across multiple VMs to handle large-scale traffic.
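To put that bandwidth gap in perspective, here is a rough, purely illustrative back-of-the-envelope sketch (our own arithmetic, not a Hyperstack measurement) of how long a single full gradient exchange for a 70B-parameter model would take at each link speed:

```python
# Illustrative estimate only: real frameworks overlap communication with
# compute and use ring all-reduce, so treat these numbers as an upper bound.
PARAMS = 70e9          # parameter count, e.g. a Llama 3.1-70B-class model
BYTES_PER_GRAD = 2     # fp16/bf16 gradients

payload_gb = PARAMS * BYTES_PER_GRAD / 1e9   # ~140 GB of gradient data

for name, gbps in [("VirtIO (~10 Gbps)", 10), ("SR-IOV (~350 Gbps)", 350)]:
    seconds = payload_gb * 8 / gbps          # GB -> Gb, then divide by link speed
    print(f"{name}: ~{seconds:.0f} s to move {payload_gb:.0f} GB of gradients")
```

At 10 Gbps that single exchange takes on the order of two minutes; at 350 Gbps it drops to a few seconds, which is why the interconnect matters as much as the GPUs themselves.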
How SR-IOV Enhances Inter-VM Network Speeds
SR-IOV resolves many of these networking challenges by giving VMs direct hardware access, bypassing the software layer that slows down traditional network virtualisation such as VirtIO. This direct access boosts network throughput, allowing data transfers between VMs at speeds of up to 350 Gbps, a stark contrast to VirtIO’s 10 Gbps, as seen in the iperf tests conducted within Hyperstack’s environment below.
According to the benchmarking figures:
- VirtIO (VM with virtio-net vNIC): Peaks at 10.5 Gbps in 1-thread iperf tests.
- SR-IOV (VM with SR-IOV VF LAG NIC): Starts at 37.1 Gbps with 1 thread, ramping up to 349 Gbps in 24-thread tests.
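If you want to sanity-check these figures on your own VMs, a minimal sketch along these lines works (the peer IP below is hypothetical, and it assumes iperf3 is installed on both VMs with `iperf3 -s` already running on the peer):

```python
import subprocess

# Run iperf3 against a peer VM with 1 and then 24 parallel streams,
# mirroring the single-thread and 24-thread tests above.
PEER_IP = "10.0.0.2"  # hypothetical private IP of the second VM

for streams in (1, 24):
    print(f"--- {streams} parallel stream(s) ---")
    subprocess.run(
        ["iperf3", "-c", PEER_IP, "-P", str(streams), "-t", "30"],
        check=True,
    )
```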
This increase in network speeds makes SR-IOV a huge turning point for multi-node LLM fine-tuning tasks where VMs constantly exchange large amounts of data.
SR-IOV Benefits for Multi-Node Fine-Tuning of LLMs
The key benefits of SR-IOV for multi-node fine-tuning of LLMs are:
Faster Data Transfer Across VMs
The multi-thread performance of SR-IOV significantly reduces inter-node communication delays. During LLM fine-tuning, where each node in the cluster trains on a subset of data, the ability to share updates, gradients and model checkpoints faster between VMs cuts down training time and allows more efficient scaling across nodes.
For instance, in scenarios where hyperparameters are continuously adjusted or where model updates need to be synchronised across multiple GPUs, the quick data movement offered by SR-IOV reduces waiting periods between nodes, accelerating the overall fine-tuning process.
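As a concrete illustration, a bare-bones multi-node PyTorch DDP skeleton (the model, data and rendezvous endpoint below are placeholders rather than a real fine-tuning recipe) relies on exactly this inter-node traffic: every `loss.backward()` call triggers a gradient all-reduce that crosses the VM network.

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# Launch on each VM with torchrun, e.g.:
#   torchrun --nnodes=2 --nproc_per_node=8 \
#            --rdzv_backend=c10d --rdzv_endpoint=<head-node-ip>:29500 train.py

def main():
    dist.init_process_group(backend="nccl")     # inter-node NCCL traffic rides on the VM network
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = DDP(torch.nn.Linear(4096, 4096).cuda(), device_ids=[local_rank])  # stand-in for an LLM
    optim = torch.optim.AdamW(model.parameters(), lr=1e-5)

    for step in range(100):
        x = torch.randn(8, 4096, device="cuda")    # placeholder batch
        loss = model(x).pow(2).mean()               # placeholder objective
        loss.backward()                             # gradient all-reduce crosses the SR-IOV link here
        optim.step()
        optim.zero_grad()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

The faster that all-reduce completes, the less time each GPU spends idle between optimisation steps.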
Reduced Bottlenecks in Multi-Node Training
By eliminating the bottlenecks that arise from slow network communication, SR-IOV enhances data flow between nodes, even in complex LLM architectures. Traditional VirtIO setups cause significant delays when training large models like GPT or Llama across multiple VMs, because data transmission between nodes is a critical component of training efficiency for large AI models.
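A quick way to check whether the network, rather than the GPUs, is the limiting factor is to time a bare all-reduce between nodes. The sketch below (launched with torchrun across the VMs, using an arbitrary 1 GiB payload) reports a rough effective throughput; it deliberately ignores the roughly 2x traffic factor of ring all-reduce, so read it as an indicative number only.

```python
import os
import time
import torch
import torch.distributed as dist

# Time a single all-reduce of a 1 GiB tensor across the whole cluster.
dist.init_process_group(backend="nccl")
torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))

payload = torch.ones(256 * 1024 * 1024, device="cuda")  # 256M fp32 values = 1 GiB

dist.barrier()                      # line all ranks up before timing
start = time.time()
dist.all_reduce(payload)
torch.cuda.synchronize()            # wait for the collective to actually finish
elapsed = time.time() - start

if dist.get_rank() == 0:
    print(f"all_reduce of 1 GiB took {elapsed:.3f} s "
          f"(~{8 / elapsed:.1f} Gb/s effective)")

dist.destroy_process_group()
```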
Efficient Resource Utilisation for Large-Scale Models
One key advantage of SR-IOV is its ability to more efficiently allocate network and compute resources across VMs. SR-IOV allows multiple VMs to share the same physical NIC without compromising on bandwidth or latency. This ensures that network speeds remain optimal even when multiple models or datasets are being processed concurrently. If you are working with distributed LLMs, this results in better utilisation of GPU resources and faster model convergence.
Impact on Inference
While network speed is most often associated with multi-node training, SR-IOV also holds potential benefits for LLM inference workloads, especially in distributed setups. Although inference latency is largely determined by the distance and connection quality between the data centre and the user, SR-IOV’s ability to cut data transfer times between VMs can indirectly improve response times.
In distributed inference, where model partitions or ensemble models run on different VMs, faster data sharing between VMs means the model can respond to a user query more quickly. While this doesn’t directly reduce network latency to the user, it ensures that the processing of the request within the data centre happens faster, which can lead to quicker responses.
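For example, with an inference engine such as vLLM (the model name and parallelism values below are illustrative, and multi-node pipeline parallelism additionally requires a Ray cluster spanning the VMs), sharding a 70B model means every generated token depends on inter-GPU and potentially inter-VM transfers:

```python
from vllm import LLM, SamplingParams

# Illustrative sharded-inference setup, not a tested configuration:
# tensor_parallel_size splits each layer across GPUs within a node, while
# pipeline_parallel_size spreads groups of layers across nodes/VMs.
llm = LLM(
    model="meta-llama/Llama-3.1-70B-Instruct",
    tensor_parallel_size=4,
    pipeline_parallel_size=2,
)

outputs = llm.generate(
    ["Explain why inter-VM bandwidth matters for sharded inference."],
    SamplingParams(max_tokens=128),
)
print(outputs[0].outputs[0].text)
```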
Conclusion
The release of our on-demand high-speed networking with SR-IOV is a significant upgrade for our users, offering high-performance networking to handle even the most demanding workloads like fine-tuning LLMs. With speeds of up to 350 Gbps, SR-IOV accelerates multi-node training and inference through faster data transfer and reduced bottlenecks. As we continue to innovate, we remain committed to introducing new features that help optimise AI workloads and drive more efficient and scalable solutions for our users.
Did you miss our previous parts? Give them a read today👇
- Getting Started with SR-IOV for High-Speed Networking on Hyperstack
- High-Performing Cloud Applications with SR-IOV: Learn About Hyperstack’s High-Speed Networking
FAQs
How does SR-IOV improve LLM fine-tuning?
SR-IOV reduces data transfer times between nodes, accelerating multi-node fine-tuning processes.
Can SR-IOV reduce inference latency?
While it doesn't directly reduce user-to-data centre latency, it speeds up inference calculations by improving inter-VM data flow.
Which GPUs support SR-IOV on Hyperstack?
SR-IOV is available on Hyperstack GPUs like NVIDIA H100 PCIe, NVIDIA H100 PCIe with NVLink and NVIDIA A100 with NVLink.