Published on 29 Jul 2024

NexGen Cloud's Billing System: Our Latest Update Behind the Scenes

Updated: 29 Jul 2024

At NexGen Cloud, we are committed to transparency and to growing together with our customers. In that spirit, this product update takes you behind the scenes of our billing system. This was more than an upgrade: it was a comprehensive effort to replace the existing billing system with a robust, scalable and transparent billing infrastructure. The journey began when we realised that our existing system was no longer keeping pace with our rapid growth and customer needs. Let us guide you through that journey, including the challenges, decisions and solutions, with our customers' needs front and centre throughout.

Motivation Behind the Change

Our cloud platform has been experiencing substantial and ongoing growth, including a rising number of customers and transactions. While this growth is exciting, it has also exposed critical limitations affecting our customers. 

  • Accuracy Issues: Billing processes would stop during outages, leading to data loss and inaccuracies.
  • Scalability Concerns: The system wasn’t equipped to handle parallel workloads, complicating efforts to scale.
  • Limited Visibility: Both customers and administrators had restricted access to billing information, hindering transparency.
  • Manual Processes: Invoicing was a time-consuming and manual process prone to errors.

These challenges went beyond operational headaches, affecting our ability to deliver the high level of service our customers expect. As soon as we identified this, we knew immediate action was necessary. 

Our Decision-Making Process

To tackle these challenges, we carried out a comprehensive review of our billing system and set clear objectives:

  1. Enhance billing accuracy to prevent revenue loss.
  2. Increase efficiency and reduce manual efforts.
  3. Provide greater visibility to customers and administrators.
  4. Optimise scalability for future growth.
  5. Strengthen system resilience.
  6. Improve the overall user experience.

Understanding the New Architecture

With these objectives in mind, we assessed various technologies and architectures to address our needs. After careful analysis and discussion, we chose a multi-faceted approach that leverages modern technologies while ensuring a seamless transition from our legacy system.

Time-Series Database Integration

A major upgrade was the integration of Gnocchi, a time-series database designed for efficient storage and querying of time-series data. Here’s why we chose Gnocchi:

  • Efficient Data Storage: Gnocchi allows us to store historical billing records at various granularities, significantly reducing storage requirements compared to our previous per-minute approach.
  • Fast Querying: Its optimised query capabilities enable quick retrieval of historical data, enhancing our reporting and analysis capabilities.
  • Scalability: Gnocchi is designed to handle large volumes of time-series data and can scale horizontally, aligning with our growth projections.
  • OpenStack Integration: As part of the OpenStack ecosystem, Gnocchi integrates seamlessly with other services like Ceilometer and Aodh, providing a comprehensive monitoring solution.
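To illustrate why storing records at coarser granularities saves space, here is a minimal sketch that rolls per-minute cost points up into hourly sums. It is plain Python rather than Gnocchi itself, and the `downsample` function, the sample rate of 0.05 per minute and the one-hour bucket are assumptions made for the example, not details of our production setup.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def downsample(points, granularity_seconds):
    """Sum (timestamp, value) points into buckets of the given width."""
    buckets = defaultdict(float)
    for ts, value in points:
        epoch = int(ts.timestamp())
        # Align each point to the start of the bucket it falls into.
        bucket_start = epoch - (epoch % granularity_seconds)
        buckets[bucket_start] += value
    return [
        (datetime.fromtimestamp(start, tz=timezone.utc), total)
        for start, total in sorted(buckets.items())
    ]


# Two hours of hypothetical per-minute cost records.
start = datetime(2024, 7, 29, 0, 0, tzinfo=timezone.utc)
per_minute = [(start + timedelta(minutes=i), 0.05) for i in range(120)]

hourly = downsample(per_minute, 3600)
print(len(per_minute), "points stored per minute ->", len(hourly), "points stored per hour")
```

In this sketch, 120 per-minute points collapse into 2 hourly points while the total cost is preserved, which is the kind of trade-off a coarser granularity offers for older data.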

Enhanced Billing Engine Flow

We revamped our billing calculation process to increase resilience and scalability:

  • Fetch Data Process: This runs every minute, collecting data from our Pricebook API, Core API, and billing database. It partitions organisation data into smaller chunks for distributed processing.
  • Calculate Task: Implemented as a RabbitMQ listener, this process calculates consumption costs based on current pricing schemas. It's designed for parallel execution, improving scalability.
  • Credit Sync: Another RabbitMQ-driven process that handles credit deductions and balance updates. It interacts with our Core API to manage resource hibernation when balances are exhausted.
  • Actions Triggering: This scheduled task manages credit balance warnings and auto top-up features, integrating with our Auth API for notifications.
  • History Export: A dedicated process exports billing metrics to Gnocchi for long-term storage and analysis.
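The first three steps of this flow can be sketched in a few lines. This is a simplified illustration, not our production code: the function names, the chunk size, the price table and the usage figures are all hypothetical, and the real processes read their inputs from the Pricebook API, Core API and billing database rather than from in-memory dictionaries.

```python
from decimal import Decimal


def partition(items, chunk_size):
    """Split a list into consecutive chunks of at most chunk_size items."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]


def calculate_chunk(chunk, usage_by_org, price_per_minute):
    """Return the cost of one minute of usage for every organisation in a chunk."""
    costs = {}
    for org_id in chunk:
        total = Decimal("0")
        for resource_type, quantity in usage_by_org.get(org_id, {}).items():
            total += price_per_minute[resource_type] * quantity
        costs[org_id] = total
    return costs


def apply_credit(balance, cost):
    """Deduct a cost from a balance and flag hibernation when credit runs out."""
    new_balance = balance - cost
    return new_balance, new_balance <= 0


# Hypothetical prices and usage for a single one-minute billing cycle.
price_per_minute = {"gpu": Decimal("0.05"), "storage_gb": Decimal("0.0001")}
usage_by_org = {
    "org-1": {"gpu": 8},
    "org-2": {"gpu": 1, "storage_gb": 100},
    "org-3": {},
}

chunks = partition(sorted(usage_by_org), chunk_size=2)
costs = {}
for chunk in chunks:  # each chunk could be handled by a separate worker
    costs.update(calculate_chunk(chunk, usage_by_org, price_per_minute))

balance, should_hibernate = apply_credit(Decimal("0.30"), costs["org-1"])
```

Because each chunk is calculated independently of the others, the chunks can be handed to separate workers, which is what makes the calculation step suitable for parallel execution.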

Improved APIs and User Interfaces

New RESTful APIs developed with Flask offer a standardised interface for billing data stored in MySQL and Gnocchi. This enables: 

  • Automated customer invoicing
  • Enhanced traceability with new transaction logs
  • Improved accuracy and transparency

Database Architecture

We adopted a dual-database approach:

  • MySQL (BillingDB): For core billing data including customer details and payment history.
  • Gnocchi TSDB: For time-series data, enabling historical analysis of billing metrics and system performance.

Message Queue Implementation

We used RabbitMQ as our message broker to facilitate asynchronous communication between different components of our billing system. This allows for:

  • Decoupled Architecture: Enhances scalability.
  • Reliable Delivery: Ensures message integrity.
  • High-Volume Handling: Manages large billing events efficiently.
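The decoupling idea can be shown with a small producer and consumer sketch. To keep it runnable without a broker, it uses Python's standard-library `queue.Queue` as a stand-in for RabbitMQ, so it does not show durability or network delivery. The `run_workers` function, the event format and the amounts in cents are assumptions made for the example.

```python
import queue
import threading


def run_workers(events, worker_count=2):
    """Process billing events on several workers and return per-organisation totals."""
    tasks = queue.Queue()
    totals = {}
    lock = threading.Lock()

    def worker():
        while True:
            event = tasks.get()
            if event is None:  # sentinel: no more work for this worker
                tasks.task_done()
                return
            org_id, amount_cents = event
            with lock:
                totals[org_id] = totals.get(org_id, 0) + amount_cents
            tasks.task_done()  # plays the role of a message acknowledgement

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for thread in threads:
        thread.start()
    for event in events:  # the producer only knows about the queue
        tasks.put(event)
    for _ in threads:
        tasks.put(None)
    tasks.join()  # wait until every event has been acknowledged
    for thread in threads:
        thread.join()
    return totals


totals = run_workers([("org-1", 40), ("org-2", 6), ("org-1", 40)])
```

The producer never calls a worker directly, so the number of workers can change without touching the code that publishes events. With a real broker, the same shape applies across separate processes and machines.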

The Delivery: From Vision to Reality

Implementing a change of this size required careful execution. Our phased approach involved:

  • Detailed requirements gathering and scope definition
  • System architecture design with a focus on scalability
  • Data migration and integration planning
  • Implementation of new billing calculation procedures
  • Development and integration of TSDB and new APIs
  • Rigorous testing, including unit, integration, and end-to-end tests
  • User acceptance testing with selected customers
  • Refinement based on feedback
  • Soft launch with full feature enablement for all users

Throughout the process, we kept our customers informed and incorporated their feedback into our final implementation.

Technical Challenges and Solutions

Transitioning to a time-series database and implementing asynchronous processing posed significant challenges. Our solutions included:

  1. Data Migration: We developed a custom ETL process to migrate historical data from our relational database to Gnocchi, ensuring data integrity and consistency.
  2. Asynchronous Processing: We redesigned our billing calculation logic to work with RabbitMQ, allowing for distributed and parallel processing of billing events.
  3. API Redesign: Our new Flask-based API was built with scalability in mind, using connection pooling and caching mechanisms to handle increased load.
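As a rough illustration of the data migration step, the sketch below transforms relational billing rows into per-metric, time-ordered series of timestamp and value pairs, then checks that nothing was lost on the way. The column names (`resource_id`, `metric`, `recorded_at`, `amount`), the sample rows and the `verify` check are hypothetical, and the load step into Gnocchi is left out.

```python
from collections import defaultdict


def rows_to_measures(rows):
    """Group relational billing rows into per-metric, time-ordered measures."""
    series = defaultdict(list)
    for row in rows:
        key = (row["resource_id"], row["metric"])
        series[key].append({"timestamp": row["recorded_at"], "value": row["amount"]})
    for measures in series.values():
        # ISO 8601 strings in one fixed format sort chronologically.
        measures.sort(key=lambda m: m["timestamp"])
    return dict(series)


def verify(rows, series):
    """Check that no row was lost and that totals match after the transform."""
    migrated = [m for measures in series.values() for m in measures]
    same_count = len(migrated) == len(rows)
    same_total = sum(m["value"] for m in migrated) == sum(r["amount"] for r in rows)
    return same_count and same_total


# Hypothetical rows as they might come out of a relational billing table
# (amounts in cents, deliberately out of order).
rows = [
    {"resource_id": "vm-1", "metric": "cost_cents", "recorded_at": "2024-07-01T00:01:00+00:00", "amount": 5},
    {"resource_id": "vm-1", "metric": "cost_cents", "recorded_at": "2024-07-01T00:00:00+00:00", "amount": 5},
    {"resource_id": "vm-2", "metric": "cost_cents", "recorded_at": "2024-07-01T00:00:00+00:00", "amount": 40},
]

series = rows_to_measures(rows)
is_consistent = verify(rows, series)
```

Comparing row counts and totals before and after the transform is one simple way to check integrity. A real migration would likely add further checks, such as per-customer reconciliation.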

The Results

Our new billing system has achieved impressive results:

  • Accuracy: Billing accuracy has reached 99.9%.
  • Scalability: The system now handles our growing customer base and transaction volume effectively.
  • Transparency: Enhanced visibility into billing information for customers and administrators.
  • Automated Processes: Manual invoicing has been automated, reducing errors and saving resources.
  • Increased Efficiency: The use of a time-series database has significantly reduced data retrieval times and improved overall system performance.

What’s Next?

Our goal is to grow together with our customers. This revised billing system is more than a technical upgrade: it reflects our commitment to transparency and customer satisfaction. By sharing this journey with you, we aim to demonstrate our dedication to continuous improvement.

As we look to the future, we are excited about the new possibilities this system opens up. We are already planning further improvements, including:

  • Enhanced business intelligence capabilities
  • More advanced revenue attribution features

We encourage our valued customers to keep providing feedback as we continue to upgrade and expand our billing capabilities. Together, we can ensure our billing system not only meets your present needs but also anticipates future challenges. Thank you for being part of this journey.

FAQs

What was the main reason for the billing system overhaul?

We wanted to address limitations in accuracy, scalability, visibility and efficiency as our customer base grew.

What technology did you integrate for efficient data storage?

We integrated Gnocchi, a time-series database, for efficient data storage and querying.

How does the new system improve billing accuracy?

The new system has reduced data loss and inaccuracies, achieving 99.9% billing accuracy.

What benefits do the new RESTful APIs provide?

They enable automated invoicing, enhanced traceability, and improved billing transparency.

How does RabbitMQ enhance the billing system's performance?

RabbitMQ facilitates asynchronous communication, ensuring scalable, reliable, and efficient handling of billing events.


