ChatGPT: the groundbreaking language model powered by HPC


By: Thomas van Osch

Since OpenAI's publicity stunt of openly hosting their new language model, ChatGPT has received abundant attention in the media and in research. The model generates artificial yet realistic text and has thereby impacted education, marketing, communication and even the way we use the internet. But how has ChatGPT evolved up to now, and how does HPC relate?


Timeline of ChatGPT

In 2018, OpenAI applied the then-new AI architecture called the Transformer to a large set of digital data. This model is now known as GPT-1 (Generative Pre-trained Transformer) and set the foundation for the GPT series. Only a year later, they improved upon the first model by introducing its ten times larger successor, GPT-2. Then in 2020, the enormous GPT-3 large language model was published, trained mainly on a vast amount of scraped internet data. On top of that, GPT-3 has roughly 100 times as many parameters as GPT-2. The next two years passed under the theme of extending the existing trained GPT-3 model into an interactive, user-friendly language model, as reflected by the development of InstructGPT (January 2022) and ChatGPT (November 2022), bringing us to today's flabbergasting status quo.

What is ChatGPT?

In short yet technical terms: ChatGPT is an interactive large language model (LLM) that uses natural language processing and deep learning to generate pieces of text. Let's break this down.

The interactive nature stems from the chatbot-like interface and from how the model was trained. By employing humans to generate and evaluate text themselves, the model receives human feedback that encourages a more natural, humanlike conversation. The term "large language model" reflects the explosive growth in model size; during training, these models are asked to predict the next plausible and fitting word given a preceding group of words or sentence. While this objective has been around for a while, deep learning and novel AI architectures have accelerated language modeling to the model landscape of today.
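The next-word objective can be illustrated with a deliberately tiny sketch: a bigram counter, the simplest possible "language model". The toy corpus and code below are purely illustrative and are not how GPT itself is implemented; real models learn this objective with neural networks over billions of words.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus; a real model trains on billions of words.
corpus = "the cat sat on the mat . the cat ate the fish .".split()

# Count how often each word follows each preceding word (a bigram model,
# the simplest instance of the next-word-prediction objective).
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict_next(word):
    """Return the word most often seen after `word` in training."""
    return following[word].most_common(1)[0][0]

print(predict_next("the"))  # "cat" follows "the" most often in this corpus
```

Modern LLMs do the same thing in spirit, but predict a probability distribution over the whole vocabulary using billions of learned parameters rather than raw counts.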

In parallel, other natural language processing (NLP) tasks like speech recognition and natural language understanding have also benefited from the increasingly powerful GPUs and technical advancements. 

ChatGPT is somewhat unique here, as it was trained to simulate a conversation with an artificial agent rather than only serving as a large database of knowledge like GPT-3. The input from the user and the output from the model form a coherent, continuous and plausible exchange, allowing for a wide variety of user applications such as text summarization, question answering, essay writing and even code generation.

What is the role of HPC in ChatGPT?

The massive growth of language models into "large" language models cannot be attributed to smart Artificial Intelligence (AI) techniques alone. As the GPT series grew by four orders of magnitude from GPT-1 to GPT-3 over the span of just a few years, computing facilities have also played a crucial role.

Let's investigate the training process of the underlying "database" of knowledge behind ChatGPT: GPT-3. Coming in at 175 billion parameters, GPT-3 was estimated to cost 5 million USD to train using optimized data center GPUs (Tesla V100). On a single GPU, training the entire model would take multiple centuries to finish. Instead, such models can be trained in a few weeks or months, with appropriate optimization, on an entire cluster consisting of hundreds of GPUs: the similarly sized BLOOM took 3.5 months on 384 GPUs to finish training on the French Jean Zay supercomputer.
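Those time scales can be sanity-checked with a back-of-envelope estimate. The sketch below uses the commonly cited approximation of about 6 FLOPs per parameter per training token; the token count and sustained GPU throughput are assumptions based on publicly reported figures, not official numbers:

```python
# Back-of-envelope estimate of GPT-3 training time (assumed figures).
params = 175e9                       # GPT-3 parameter count
tokens = 300e9                       # assumed training tokens, per public reports
flops_total = 6 * params * tokens    # common ~6*N*D estimate of training FLOPs

v100_sustained = 28e12               # assumed sustained FLOP/s of one Tesla V100

seconds = flops_total / v100_sustained
years_single_gpu = seconds / (3600 * 24 * 365)
print(f"Single V100: ~{years_single_gpu:.0f} years")  # a few centuries

# On a cluster of, say, 1000 GPUs with ideal (linear) scaling:
days_cluster = seconds / 1000 / (3600 * 24)
print(f"1000 V100s, ideal scaling: ~{days_cluster:.0f} days")
```

Real clusters scale less than linearly because of communication overhead, which is exactly why the software and interconnect optimization done on HPC systems matters.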

To put that in perspective, BLOOM's training consumed around 433 MWh of electricity, a carbon footprint comparable to driving an average car for 300,000 km, just for the GPU usage.
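The energy figure is plausible from first principles. Here is a rough sketch, where the per-GPU power draw is an assumed average that includes some system overhead, not a measured value:

```python
# Rough reconstruction of BLOOM's GPU energy use (assumed figures).
gpus = 384               # GPUs used, per the BLOOM training run
power_per_gpu_kw = 0.45  # assumed average draw per GPU incl. overhead (kW)
hours = 3.5 * 30 * 24    # ~3.5 months of near-continuous training

energy_mwh = gpus * power_per_gpu_kw * hours / 1000
print(f"~{energy_mwh:.0f} MWh")  # same order as the ~433 MWh cited above
```

Note that this counts only the GPUs; cooling, storage and CPU nodes add further energy on top.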

OpenAI, the creator of ChatGPT, is getting exclusive access to Microsoft data centers purely to train its models. Other parties like Google and Meta are also actively developing gigantic AI models and are in dire need of computing resources. Building and managing HPC facilities is therefore essential not only for advancing the next generation of language models but also for other applications of artificial intelligence, including astronomy, medicine and geology.
