
Published on 25 Sep 2024

Leveraging high-speed networking for high-performance cloud applications

Updated: 16 Oct 2024

With the public cloud market expected to reach over $1 trillion by 2026, optimising cloud infrastructure for performance, scalability and low latency is imperative. This is where SR-IOV (Single Root I/O Virtualisation) comes in as a game-changing technology that enhances cloud networking to deliver exceptional throughput and reduced latency so your cloud applications can handle even the most demanding tasks.

Learn how our high-speed networking with SR-IOV offers a lightning-fast solution for cloud applications that require optimised inter-VM bandwidth and superior network performance.

Hyperstack’s High-Speed Networking with SR-IOV

At Hyperstack, we’ve recently released our on-demand high-speed networking with SR-IOV, available in select network-optimised environments such as our CANADA-1 region. We use SR-IOV VF LAG (virtual function link aggregation), which enables active-active bonding of dual physical network ports, allowing them to be shared as a single virtual NIC across multiple VMs on the same host. The network traffic is offloaded to hardware, ensuring high performance and reduced CPU overhead. This setup provides customers with a simplified, single virtual NIC configuration that delivers high performance and resiliency through the bonded physical network ports.

Using SR-IOV, our GPU VMs can achieve inter-VM bandwidth of up to 350 Gbps, making it ideal for the most demanding network-intensive applications.

Performance Benchmarks

Let’s look at the actual performance gains SR-IOV offers over traditional networking setups. In a benchmark test using iPerf to measure network throughput, the difference between the legacy virtio-net and SR-IOV configurations is striking:

| Threads | Virtio-Net vNIC Throughput | SR-IOV VF NIC Throughput |
|---------|----------------------------|--------------------------|
| 1       | 10.5 Gbps                  | 37.1 Gbps                |
| 8       | 8.5 Gbps                   | 199 Gbps                 |
| 16      | 8.4 Gbps                   | 290 Gbps                 |
| 24      | 8.2 Gbps                   | 349 Gbps                 |

From the numbers, you can see that SR-IOV offers a massive leap in performance, particularly when scaled across multiple threads. At 24 threads, SR-IOV achieves 349 Gbps out of a 400 Gbps theoretical maximum, compared to just 8.2 Gbps with the virtio-net configuration. This kind of bandwidth is critical for businesses that rely on inter-VM traffic for cloud-based AI training or high-performance simulations.
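To put the table in perspective, here is a small sketch (our own arithmetic based on the iPerf figures above, not part of the benchmark itself) that computes the speedup and link utilisation each thread count implies, assuming the 400 Gbps theoretical maximum of the bonded ports:

```python
# Speedup and link utilisation derived from the iPerf results above.
LINK_MAX_GBPS = 400  # theoretical maximum of the bonded physical ports

results = {  # threads: (virtio-net Gbps, SR-IOV VF Gbps)
    1: (10.5, 37.1),
    8: (8.5, 199),
    16: (8.4, 290),
    24: (8.2, 349),
}

for threads, (virtio, sriov) in results.items():
    speedup = sriov / virtio
    utilisation = 100 * sriov / LINK_MAX_GBPS
    print(f"{threads:>2} threads: {speedup:5.1f}x speedup, "
          f"{utilisation:5.1f}% of link maximum")
```

At 24 threads the SR-IOV path runs roughly 40x faster than virtio-net and uses close to 90% of the theoretical link capacity.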

Supported GPUs 

Hyperstack offers a range of GPUs that support SR-IOV-powered networking, including the NVIDIA A100 PCIe with NVLink, NVIDIA H100 PCIe, NVIDIA H100 PCIe with NVLink and the NVIDIA H100 SXM. These GPUs leverage SR-IOV via PCIe, enabling direct access to the underlying network hardware and minimising latency. For even faster inter-GPU communication, NVLink boosts performance, making these GPUs an excellent choice for workloads that require low-latency data transfers between multiple GPUs.

Benefits of SR-IOV for High-Performance Cloud Applications

The real business value of a cloud application is not just about performance; it's also about scalability, efficiency and cost-effectiveness. SR-IOV delivers on all these fronts, providing an ideal solution for businesses that need to scale while maintaining top performance.

Check out the benefits of using high-speed networking with SR-IOV for your cloud applications:

  • Scalability and Efficiency: In cloud environments, networking is often the limiting factor when scaling workloads. SR-IOV enables businesses to seamlessly scale their applications by offering high bandwidth and low latency between virtual machines without the need for expensive and complex hardware configurations. This allows for greater flexibility and better use of cloud resources, helping you stay agile as your business grows.
  • Cost Efficiency: With SR-IOV, fewer resources are needed to handle the same workload. By reducing the networking overhead and increasing throughput, businesses can run the same tasks with fewer VMs or less powerful hardware, leading to reduced operational costs. This is particularly beneficial for organisations managing large-scale AI, ML or HPC workloads where both performance and cost are key considerations.
  • Increased Reliability: Reliability is non-negotiable for mission-critical applications, whether in healthcare, finance or manufacturing. SR-IOV ensures consistent and high-speed performance to reduce the likelihood of downtime or performance degradation. This makes it ideal for applications where network latency and bandwidth directly impact end-user experiences or business outcomes.

Use Cases of SR-IOV in Cloud Applications

Here are the use cases of SR-IOV in cloud applications:

AI/ML Training and Inference

SR-IOV is particularly beneficial for AI and ML workloads where data needs to move quickly and reliably between compute nodes. Training models like GPT-4 or Llama 3.1, which require massive datasets and substantial computational resources, becomes far more efficient with the high throughput and low latency provided by SR-IOV. Inference, where models are deployed in real-time applications, also benefits from the enhanced performance SR-IOV offers.
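To see why inter-VM bandwidth matters so much for distributed training, here is a rough back-of-the-envelope sketch (our own illustration, not a Hyperstack benchmark) estimating how long a gradient synchronisation step takes at the two throughputs measured above, assuming a standard ring all-reduce and fp16 gradients:

```python
def allreduce_seconds(model_params, bytes_per_param, bandwidth_gbps, num_nodes):
    """Approximate ring all-reduce time: each node sends and receives
    about 2*(n-1)/n of the gradient payload over its network link."""
    payload_bits = model_params * bytes_per_param * 8
    traffic_bits = 2 * (num_nodes - 1) / num_nodes * payload_bits
    return traffic_bits / (bandwidth_gbps * 1e9)

# 7B-parameter model, fp16 gradients (2 bytes each), 8 nodes
for gbps in (8.2, 349):
    t = allreduce_seconds(7e9, 2, gbps, 8)
    print(f"{gbps:6.1f} Gbps -> {t:6.2f} s per gradient all-reduce")
```

With the virtio-net throughput measured above, each synchronisation step would take tens of seconds, while the SR-IOV figure brings it under a second, which is the difference between the network dominating training time and the GPUs staying busy.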

High-Performance Computing (HPC)

Industries that rely on High-Performance Computing such as scientific research, oil and gas simulations and financial modelling require fast data movement across multiple compute nodes. SR-IOV’s ability to deliver up to 350 Gbps of inter-VM bandwidth makes it the ideal solution for businesses that need to run complex simulations or analyse large datasets in real-time.

Data Analytics and Simulations

With the rise of big data, businesses need to process massive amounts of information quickly and accurately. SR-IOV enables fast data transfer across VMs, ensuring that large-scale simulations and real-time data analytics run without interruption. For example, eCommerce platforms that rely on real-time analytics to personalise customer experiences can use SR-IOV to deliver these insights at a moment’s notice.

Conclusion

At Hyperstack, we always aim to bring the best solutions to help your projects thrive. With our SR-IOV release, we hope your cloud applications experience unparalleled network performance, from reduced latency to enhanced scalability. Stay tuned for the final part of our SR-IOV series, where we’ll explore optimising SR-IOV for AI and LLM applications, coming next week!

Missed our first part? Give it a read today:

Getting Started with SR-IOV for High-Speed Networking on Hyperstack

FAQs

What is SR-IOV?

SR-IOV (Single Root I/O Virtualisation) enables multiple VMs to share a single physical NIC for faster network performance.

Which GPUs support SR-IOV on Hyperstack?

Hyperstack supports SR-IOV on GPUs like the NVIDIA A100 PCIe with NVLink, NVIDIA H100 PCIe, NVIDIA H100 PCIe with NVLink and NVIDIA H100 SXM for high-speed networking.

What is the maximum inter-VM bandwidth with SR-IOV on Hyperstack?

With SR-IOV, Hyperstack VMs can achieve inter-VM bandwidth of up to 350 Gbps.

How does SR-IOV improve AI/ML workloads?

SR-IOV reduces latency and boosts throughput, accelerating AI/ML training and inference tasks.

Is SR-IOV available in all Hyperstack regions?

SR-IOV is available in select Hyperstack environments, such as the CANADA-1 region.
