Hyperstack - Thought Leadership

OpenAI Releases Latest AI Flagship Model GPT-4o: Free for All

Written by Damanpreet Kaur Vohra | May 16, 2024 9:17:18 AM

After months of anticipation for GPT-5, OpenAI has instead released GPT-4o - a flagship AI model that can process audio, visuals and text in real time. This development follows the massive success of ChatGPT, which fascinated users with its uncanny ability to understand and generate human-like text. While regular users can access the groundbreaking GPT-4o for free, ChatGPT Plus subscribers gain priority access to increased message limits and the latest multimodal features. Continue reading to learn more about GPT-4o.

About ChatGPT-4o

GPT-4o enables far more natural human-computer interaction by accepting inputs spanning text, audio and images while generating outputs across these same modalities. The latest AI model can respond to audio prompts in as little as 232 milliseconds, with an average response time of 320 milliseconds, nearly matching human conversational response time. It matches GPT-4 Turbo's performance on English text and coding, with a massive improvement in non-English language performance, and it is also much faster and 50% cheaper to use via the API. However, its true strength lies in superior audio and visual comprehension compared to existing models.
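For developers, that cheaper and faster access looks like an ordinary Chat Completions call. Below is a minimal sketch using the OpenAI Python SDK (v1.x); it assumes the openai package is installed, an OPENAI_API_KEY environment variable is set, and the prompt itself is purely illustrative:

```python
# Minimal sketch: a text request to GPT-4o via the OpenAI Python SDK.
# Assumes openai>=1.0 is installed and OPENAI_API_KEY is set.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarise GPT-4o in one sentence."},
    ],
)
print(response.choices[0].message.content)
```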

Capabilities of ChatGPT-4o

Before GPT-4o, users could talk to ChatGPT through Voice Mode with average latencies of 2.8 seconds (GPT-3.5) and 5.4 seconds (GPT-4). To achieve this, previous versions of ChatGPT used a disconnected "Voice Mode" comprising three separate models: one transcribing audio to text, another (GPT-3.5 or GPT-4) processing that text, and a third converting the output text back to audio. This multi-step process lost information before it reached the core AI model, which could not directly perceive nuances like tone, multiple speakers, background noise, laughter, singing or emotional expression. GPT-4o instead integrates these modalities through an end-to-end training approach using a unified neural network architecture. With text, vision and audio data flowing through a single model, ChatGPT-4o is a genuinely capable multimodal AI.
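For context, the older three-model pipeline roughly corresponds to the sketch below, which chains OpenAI's speech-to-text, chat and text-to-speech endpoints. The file names, voice and the "whisper-1"/"tts-1" model choices here are illustrative assumptions, not the exact internals of Voice Mode; the point is that GPT-4o collapses all three hops into a single model call.

```python
# Illustrative sketch of the legacy "Voice Mode" pipeline:
# speech-to-text -> text-only reasoning -> text-to-speech.
from openai import OpenAI

client = OpenAI()

# Step 1: transcribe speech to plain text. Tone, laughter and
# background noise are lost at this point.
with open("question.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1", file=audio_file
    )

# Step 2: a text-only model reasons over the transcript alone.
reply = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": transcript.text}],
)

# Step 3: synthesise the text answer back into audio.
speech = client.audio.speech.create(
    model="tts-1",
    voice="alloy",
    input=reply.choices[0].message.content,
)
speech.write_to_file("answer.mp3")
```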

Also Read: Tips and Tricks for Developers of AI Applications in the Cloud

Language Tokenization of ChatGPT-4o

To improve GPT-4o's linguistic capabilities, a new tokenizer was developed to process 20 diverse languages: Arabic, Bengali, Chinese, English, French, German, Gujarati, Hindi, Italian, Japanese, Kannada, Korean, Malayalam, Marathi, Oriya, Punjabi, Russian, Spanish, Tamil, and Telugu.
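In practice, the new tokenizer compresses many non-English scripts into far fewer tokens, which directly lowers cost and latency for those languages. The sketch below, assuming a recent version of the tiktoken library (which exposes GPT-4o's o200k_base encoding alongside the cl100k_base encoding used by GPT-4), compares token counts on a few illustrative sentences:

```python
# Compare token counts between GPT-4's and GPT-4o's tokenizers.
# Requires a recent tiktoken release that includes o200k_base.
import tiktoken

gpt4_enc = tiktoken.get_encoding("cl100k_base")   # GPT-4 / GPT-4 Turbo
gpt4o_enc = tiktoken.get_encoding("o200k_base")   # GPT-4o

samples = {
    "English": "Hello, how are you?",
    "Hindi": "नमस्ते, आप कैसे हैं?",
    "Tamil": "வணக்கம், எப்படி இருக்கிறீர்கள்?",
}

for language, text in samples.items():
    print(
        f"{language}: GPT-4 tokens = {len(gpt4_enc.encode(text))}, "
        f"GPT-4o tokens = {len(gpt4o_enc.encode(text))}"
    )
```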

Also Read: Running a Chatbot: How-to Guide

Model Evaluation of ChatGPT-4o

On standard tests that measure text understanding, reasoning, and coding abilities, GPT-4o performs just as well as GPT-4 Turbo. However, GPT-4o sets new higher standards for understanding multiple languages, audio data like speech, and visual information like images. Check out the evaluation details below:

  • Text Evaluation: GPT-4o sets a new high score of 88.7% on the MMLU general-knowledge benchmark (0-shot, chain-of-thought), showing improved reasoning abilities.
  • Audio ASR: GPT-4o dramatically improves speech recognition performance across all languages compared to Whisper-v3, especially for lower-resourced languages.
  • Audio Translation: GPT-4o establishes a new state-of-the-art on the MLS speech translation benchmark, outperforming Whisper-v3.
  • M3Exam Zero-Shot: On this multilingual and vision-based exam questions benchmark, GPT-4o outperforms GPT-4 across all languages.
  • Vision Understanding: GPT-4o achieves state-of-the-art performance on visual perception benchmarks like MMMU, MathVista, and ChartQA in zero-shot settings.

Also Read: How to Use Oobabooga Web UI to Run LLMs on Hyperstack

Safety Approach of ChatGPT-4o

OpenAI has built robust safeguards into GPT-4o's core architecture across all modalities, following a responsible and safe AI approach similar to that taken with Meta's Llama 3 and Microsoft's Phi-3. Techniques such as filtering training data and refining the model's behaviour post-training give it built-in safeguards, complemented by novel systems that regulate voice outputs. Evaluations based on OpenAI's Preparedness Framework and its voluntary commitments indicate that GPT-4o poses no higher than a "Medium" risk across categories like cybersecurity, CBRN (chemical, biological, radiological and nuclear), persuasion and model autonomy. This multi-stage assessment involved comprehensive automated and human evaluations throughout training, testing pre- and post-mitigation versions of the model using custom prompts and fine-tuning to probe its capabilities.

Over 70 external experts spanning social psychology, bias and fairness, and misinformation were engaged in extensive "red teaming" exercises to identify potential risks amplified by GPT-4o's multimodal nature. While OpenAI acknowledges the novel risks presented by GPT-4o's audio capabilities, the current launch gives the public access to text and image inputs with text outputs.

Model Availability of ChatGPT-4o

Thanks to two years of OpenAI research aimed at improving efficiency across the AI stack, ChatGPT users can now leverage GPT-4o's text and image capabilities. The model is offered on the free tier, with up to 5x higher message limits for Plus subscribers. An alpha version of the revamped Voice Mode powered by GPT-4o will follow for ChatGPT Plus in the coming weeks. Developers can also now integrate GPT-4o for text and vision tasks via OpenAI's API, with a 2x performance boost, 50% cost reduction and 5x higher rate limits over GPT-4 Turbo. Support for GPT-4o's audio and video functions, however, will initially roll out to a limited cohort of trusted partners in the coming weeks.
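As a quick sketch of the vision support mentioned above, the snippet below sends an image URL alongside a text question in a single Chat Completions request. The URL is a placeholder, and the setup assumptions are the same as in the earlier text example:

```python
# Sketch: a vision request to GPT-4o via the Chat Completions API.
# The image URL below is a placeholder for illustration.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What does this chart show?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/chart.png"},
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)
```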

Conclusion 

As OpenAI continues to refine and expand the capabilities of ChatGPT-4o, we are eagerly waiting to experience further breakthroughs in AI. At Hyperstack, we offer access to powerful NVIDIA resources such as the NVIDIA A100 and NVIDIA H100 PCIe, which boast exceptional performance and advanced features designed for accelerating powerful LLMs like GPT. These GPUs are built with specialised Tensor Cores that perform matrix operations faster, enabling efficient training and inference for large language models like GPT-4o. Multi-GPU systems and cutting-edge interconnect technologies like NVLink further boost computational power for faster training times and real-time inference.

Get started today to experience the power of high-end NVIDIA GPUs to lead innovation. 

FAQs

What is ChatGPT-4o?

ChatGPT-4o is OpenAI's latest flagship model that can reason across audio, vision and text in real time. The model enables much more natural human-computer interaction.

What are the capabilities of ChatGPT-4o?

GPT-4o matches GPT-4 Turbo's performance on text in English and code, while significantly improving on non-English languages. It excels at vision and audio understanding and can respond to audio inputs with near-human response times.

How can users access ChatGPT-4o? 

ChatGPT-4o's text/image capabilities are currently rolling out on ChatGPT across free and paid tiers, with audio support coming to ChatGPT Plus soon. Developers can also access the model through OpenAI's API.

Is ChatGPT-4o safe?

OpenAI has designed GPT-4o with safety in mind, implementing measures such as filtering training data, post-training refinement and novel safety systems for voice outputs. The model has undergone extensive evaluations and red teaming to identify and mitigate potential risks.

How is ChatGPT-4o different from its predecessors?

Unlike previous models that relied on separate pipelines for different modalities, ChatGPT-4o is a single end-to-end model trained across text, vision, and audio, allowing for seamless multimodal integration.

Is ChatGPT-4o free?

Yes, regular users can access GPT-4o for free, but ChatGPT Plus subscribers get priority access to increased message limits and the latest multimodal features.