About 100 Days of AI:
Hey everyone! I’m Nataraj, and just like you, I’ve been fascinated with the recent progress of artificial intelligence. Realizing that I needed to stay abreast of all the developments happening, I decided to embark on a personal journey of learning, and thus 100 Days of AI was born! With this series, I will be learning about LLMs and sharing ideas, experiments, opinions, trends & learnings through my blog posts. You can follow along on HackerNoon here or on my personal website here. In today’s article, we’ll be talking about takeaways from Nvidia’s annual AI developer conference, GTC.
If AI is a gold rush, Nvidia is the ultimate picks-and-shovels company. All things AI go through Nvidia because of its dominance as the sole company that creates the chips (GPUs) needed to deploy computationally expensive large language models. As the models scale, the need for more powerful GPUs grows, and the entire world is relying on Nvidia to deliver them. It’s not an exaggeration to say Nvidia is at the core of all things AI, and the company recently held its annual developer conference, GTC (March 18-21). In this post I am going to summarize my takeaways from this very special GTC and what it tells us about what’s coming in the gen AI space.
Takeaway 1: AI is not limited to the tech sector
The power of what gen AI can do and deliver has not been lost on non-tech sectors. Companies across all sectors of the economy believe they need to adopt AI and find ways to partner, execute, and stay ahead. The proof is in how many non-tech companies have partnered with Nvidia: companies from retail, automotive, construction, design, and everything in between have announced partnerships with Nvidia.
Takeaway 2: We need much bigger GPUs
Ever since transformers were invented, the scale of LLMs has been doubling roughly every 6 months. For example, look at the parameter counts of OpenAI models below.
In the GPT-3 series alone, the parameter count ranged from 125M to 175B. The latest models expected to come out of OpenAI are rumored to have parameter counts above a trillion, and if the rate of scaling continues, we could hit multiple trillions in less than two years. To support the increasing scale of LLMs, we need increased computational power. To deliver it, Nvidia is launching a new series of GPUs called Blackwell, named after the statistician & mathematician David Blackwell. Blackwell is not just a new GPU (the GB200) but also a new supercomputer platform.
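To get a feel for that doubling rate, here’s a minimal back-of-the-envelope sketch in Python. The 175B starting point is GPT-3’s parameter count from above; the doubling-every-6-months assumption is the trend described in this post, so treat the output as an extrapolation, not a prediction.

```python
# Back-of-the-envelope projection of LLM parameter counts, assuming the
# "doubling every 6 months" trend holds. This is an extrapolation of the
# trend described in the post, not a forecast of any specific model.

def projected_params(start_params: float, months: int, doubling_months: int = 6) -> float:
    """Parameter count after `months`, doubling every `doubling_months`."""
    return start_params * 2 ** (months / doubling_months)

start = 175e9  # GPT-3's 175B parameters
for months in (0, 6, 12, 18, 24):
    trillions = projected_params(start, months) / 1e12
    print(f"+{months:2d} months: ~{trillions:.2f}T parameters")
```

Starting from 175B, the doubling trend crosses 1 trillion parameters at around 18 months and reaches roughly 2.8 trillion at the two-year mark, which is where the “trillions in less than two years” estimate comes from.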
Here’s a performance comparison of the new Blackwell GPU with its previous generation, Hopper.
Takeaway 3: Moving from retrieval to generation
In the current generation of computing, everything is about retrieving a piece of data stored in different formats and presenting it in a way that is useful to the user. But in the coming generation of computing and apps, we are about to see more of things being generated at the user’s request and given back to the user. In the case of ChatGPT, for example, the answer you get is not stored in some database beforehand; it is generated in real time to answer your question. We are about to see the generation of text, images, videos, chemicals, proteins, and more.
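To make the contrast concrete, here’s a toy Python sketch. It is entirely illustrative: the “generator” is a fake stand-in (no real model is called), and the point is the difference in control flow, not the answer quality.

```python
# Toy contrast between retrieval and generation. Retrieval can only
# return what already exists in a store; generation constructs a fresh
# response for each request.

FAQ = {"what is gtc?": "Nvidia's AI developer conference."}

def retrieve(question: str) -> str | None:
    """Retrieval: look up a pre-stored answer (or fail)."""
    return FAQ.get(question.strip().lower())

def generate(question: str) -> str:
    """Generation: synthesize the answer on the fly, per request."""
    # A real system would run an LLM forward pass here; this stub just
    # shows that nothing is looked up -- the response is built fresh.
    return f"(generated on the fly) An answer to: {question!r}"

for q in ["What is GTC?", "What is Blackwell?"]:
    print(retrieve(q) or generate(q))
```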
Takeaway 4: Inference is getting better
Inference is what happens when you ask ChatGPT a question: the model figures out the answer by generating tokens and returns the answer as tokens. If inference is not fast enough, consumers will not be able to use AI applications at all. Inference is especially tricky because LLMs keep getting larger and no longer fit on a single GPU, so inference has to be parallelized across GPUs, a difficult task that involves programming Nvidia GPUs with multiple optimization techniques. With Blackwell GPUs, Nvidia is able to achieve 30x faster inference speeds.
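Part of why inference is hard to speed up is that token generation is sequential: each new token depends on everything generated so far, so the work cannot simply be split up front. Here’s a minimal sketch of that loop in Python; `next_token` is a hypothetical stand-in for a real model’s forward pass, not an actual API.

```python
# Minimal sketch of autoregressive inference: the model emits one token
# at a time, and each new token is appended to the input for the next step.

def next_token(tokens: list[str]) -> str:
    # Hypothetical stand-in: a real LLM would run a forward pass over
    # `tokens` and sample the next token from a probability distribution.
    canned = ["GPUs", "power", "modern", "AI", "<eos>"]
    return canned[min(len(tokens) - 1, len(canned) - 1)]

def generate(prompt: str, max_new_tokens: int = 10) -> str:
    tokens = prompt.split()
    for _ in range(max_new_tokens):
        tok = next_token(tokens)   # depends on ALL tokens so far
        if tok == "<eos>":         # model signals it is done
            break
        tokens.append(tok)         # feed the new token back in
    return " ".join(tokens)

print(generate("Why"))  # -> "Why GPUs power modern AI"
```

Because of this step-by-step dependency, speeding up inference means making each step faster (bigger, faster GPUs; better parallelism within a step) rather than skipping steps, which is why the Blackwell speedup matters so much for serving these models.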
Takeaway 5: The real metaverse is Nvidia’s Omniverse
The more I learn about Nvidia’s Omniverse, the more bullish I get on both the product and the idea. To really leverage AI and unlock all the possibilities in automation and robotics, we simply do not have all the data needed. For example, say you want to create a robot that cooks great food of all kinds. There is no dataset of first-person views of chefs cooking various kinds of dishes. We have general cooking data in the form of YouTube videos, but to get robots to learn the physical movements involved in cooking, first-person data is needed. A 3D simulation of the world in Omniverse can help bridge that data gap and unlock these use cases.
That’s it for Day 22 of 100 Days of AI.
I write a newsletter called Above Average where I talk about the second-order insights behind everything that is happening in big tech. If you are in tech and don’t want to be average, subscribe to it.
Follow me on Twitter or LinkedIn for the latest updates on 100 Days of AI. If you are in tech, you might be interested in joining my community of tech professionals here.