If you’re in or around the world of LLMs, you hear the term FLOPs all the time. But what does it mean? It’s not as complicated as it sounds, and it’s the key to understanding the entire AI hardware race.
In this blog post, we will explore what a FLOP is and what FLOPS means for a GPU. Then we will look at the math of training LLMs in terms of FLOPs: how long it would take to train an LLM on a single GPU, and how many GPUs you would need to train one.
Table of Contents:
What is a FLOP?
What are FLOPS in a GPU?
How Do FLOPS Relate to Training LLMs?
How long does it take to train GPT-3 on one GPU?
1. What is a FLOP?
First, let’s define the base unit.
FLOP (Floating-Point Operation): This is a single mathematical calculation involving numbers with a decimal point (“floating-point numbers”). Examples: 3.141 * 2.718 or 1.0 + 0.5.
Why Floating-Point? In AI and especially in LLMs, the model’s “knowledge” is stored in its parameters (weights and biases), which are almost always numbers with decimal points. The calculations to process language and learn from data require this precision.
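To make the unit concrete, here is a minimal Python sketch (not tied to any framework) showing what individual FLOPs look like, and how a simple operation like a dot product adds up to many of them. The `dot_product_flops` helper is purely illustrative.

```python
# A single FLOP is one floating-point multiply or add.
a = 3.141 * 2.718   # 1 FLOP (one multiplication)
b = 1.0 + 0.5       # 1 FLOP (one addition)

# A dot product of two length-n vectors costs roughly 2n FLOPs:
# n multiplications plus (n - 1) additions, which is ~2n for large n.
def dot_product_flops(n: int) -> int:
    return 2 * n

print(a, b)
print(dot_product_flops(4096))  # ~8,192 FLOPs for one 4,096-dimensional dot product
```

Even this tiny example hints at the scale problem: a single forward pass of an LLM performs billions of these dot-product-style operations.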
2. What are FLOPS in a GPU?
This is where it gets interesting. FLOPS (note the ‘S’ at the end) stands for Floating-point Operations Per Second. Think of it as the horsepower of a computer for AI tasks. It’s a measure of a GPU’s raw computational speed.
So, FLOPS is the maximum number of floating-point calculations a GPU can perform in one second.
Modern GPUs perform a mind-boggling number of these. We use prefixes to describe the scale (a short sketch after this list puts the numbers side by side):
GigaFLOPS (GFLOPS): Billions of operations per second.
TeraFLOPS (TFLOPS): Trillions of operations per second. (This is the standard for modern gaming and AI GPUs.)
PetaFLOPS (PFLOPS): Quadrillions of operations per second. (This is the scale of supercomputers and large GPU clusters.)
ExaFLOPS (EFLOPS): Quintillions of operations per second. (The frontier of high-performance computing.)
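The prefixes are just powers of ten stacked on top of “operations per second.” The short Python sketch below makes that explicit; the 312 TFLOPS figure is only an illustrative example of a modern data-center GPU spec, not a claim about any particular card.

```python
# Each prefix is a power of ten on top of "floating-point operations per second".
PREFIXES = {
    "GFLOPS": 1e9,   # billions per second
    "TFLOPS": 1e12,  # trillions per second
    "PFLOPS": 1e15,  # quadrillions per second
    "EFLOPS": 1e18,  # quintillions per second
}

# Illustrative only: a GPU advertised at 312 TFLOPS, expressed in raw FLOPS.
peak_tflops = 312
peak_flops = peak_tflops * PREFIXES["TFLOPS"]
print(f"{peak_flops:.2e} floating-point operations per second")  # 3.12e+14
```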
Why are GPUs so good at this?
A CPU (Central Processing Unit) is like a master chef. It has a few very powerful, very smart cores that can do complex tasks one after another very quickly.
A GPU (Graphics Processing Unit) is like an army of line cooks. It has thousands of simpler, smaller cores. They can’t handle complex tasks the way a CPU core can, but they can all perform the same simple operation (like a multiply or an add) on thousands of different pieces of data at the exact same time.
This “massively parallel” architecture is perfect for the math behind AI, which primarily consists of giant matrix multiplications — essentially, performing millions of simple multiplications and additions simultaneously.
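You can see both ideas, counting the FLOPs in a matrix multiplication and measuring how fast they actually run, in the rough Python sketch below. It uses NumPy on the CPU (so it benchmarks your CPU’s math library rather than a GPU), and the single-run timing is only a ballpark; the numbers will vary widely across machines.

```python
import time
import numpy as np

# Multiplying an (m x k) matrix by a (k x n) matrix takes roughly
# 2 * m * k * n FLOPs: one multiply and one add for every inner-loop step.
m = k = n = 2048
flops_needed = 2 * m * k * n

a = np.random.rand(m, k).astype(np.float32)
b = np.random.rand(k, n).astype(np.float32)

start = time.perf_counter()
c = a @ b
elapsed = time.perf_counter() - start

# Achieved throughput = work done / time taken (a rough, single-run estimate).
achieved_gflops = flops_needed / elapsed / 1e9
print(f"{flops_needed:.2e} FLOPs in {elapsed:.4f} s -> ~{achieved_gflops:.0f} GFLOPS")
```

A GPU runs this same kind of matrix multiplication across thousands of cores at once, which is exactly why its FLOPS numbers are orders of magnitude higher.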