Alphabet Focuses on Efficiency with New AI Chip

Although Nvidia's data center GPUs are the workhorses of AI today, Alphabet's Google has long developed its own AI chips. Google introduced its first Tensor Processing Unit (TPU) in 2016, and its fourth-generation TPU (TPU v4) has been available to customers on Google Cloud since last year.

The advantage of an application-specific integrated circuit (ASIC) like the TPU is that it's designed at the hardware level to perform specific tasks. A general-purpose graphics processing unit (GPU), by contrast, can handle a wide variety of workloads beyond AI, but that flexibility can come at the cost of efficiency. Google claims the TPU v4 beats Nvidia's previous-generation A100 data center GPU across a variety of AI workloads.
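To make the distinction concrete, here is a minimal sketch in Python using JAX, assuming a machine with JAX installed and a TPU or GPU attached. The same dense matrix multiply, the core operation both chip families accelerate, runs unchanged on either device class; what differs is how much of the silicon is dedicated to exactly this kind of math.

```python
import jax
import jax.numpy as jnp

# Shows which accelerator backend JAX found, e.g. TPU or GPU devices.
print(jax.devices())

key = jax.random.PRNGKey(0)
k1, k2 = jax.random.split(key)
a = jax.random.normal(k1, (4096, 4096))
b = jax.random.normal(k2, (4096, 4096))

# jit compiles the multiply for whatever accelerator is attached. A TPU's
# matrix units exist specifically for this kind of dense linear algebra,
# which is why an ASIC can win on efficiency for AI workloads.
matmul = jax.jit(jnp.matmul)
result = matmul(a, b).block_until_ready()
print(result.shape)  # (4096, 4096)
```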

Nvidia launched its powerful H100 data center GPU in 2022, and it has since become the gold standard for AI workloads. Demand for the H100 is so intense that Nvidia is selling every unit it can make, and the company is reportedly set to triple production of its data center GPUs in 2024. Right now, nothing beats the H100 for AI performance.

Training a large language model like the one powering OpenAI's ChatGPT is an enormous undertaking. Finishing the training process in a reasonable amount of time requires linking together a vast number of high-performance AI chips. The upfront cost of those chips and the supporting hardware, which together amount to an AI supercomputer, is substantial, and the ongoing cost of operating that hardware is steep as well.

Raw power is important, but so is efficiency. On Tuesday, Google unveiled a new version of its TPU aimed at striking the right balance between the two. The TPU v5e is built for both AI training and inference, and it's now available in preview to Google Cloud customers using the Google Kubernetes Engine platform.

Google says the TPU v5e delivers twice the training performance per dollar and up to 2.5 times the inference performance per dollar of the TPU v4. With the cost of training state-of-the-art AI models soaring, a budget-friendly alternative could be a big win for Google.
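As a back-of-the-envelope illustration of what those multipliers mean for a customer's bill, the sketch below applies Google's stated factors to hypothetical dollar figures. The 2x and 2.5x ratios come from Google's announcement; the starting costs are invented purely for illustration, not published prices.

```python
# Hypothetical starting costs on TPU v4 (illustrative, not real prices).
v4_training_cost = 1_000_000   # assumed one-off training run, in dollars
v4_inference_cost = 100_000    # assumed monthly serving bill, in dollars

# Google's claim: 2x training performance per dollar and up to 2.5x
# inference performance per dollar versus TPU v4, so the same work
# should cost roughly this much on TPU v5e.
v5e_training_cost = v4_training_cost / 2.0
v5e_inference_cost = v4_inference_cost / 2.5

print(f"Training:  ${v4_training_cost:,.0f} -> ${v5e_training_cost:,.0f}")
print(f"Inference: ${v4_inference_cost:,.0f} -> ${v5e_inference_cost:,.0f}")
```

Under those assumptions, a $1 million training run drops to $500,000, and a $100,000 monthly inference bill drops to $40,000.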

Notably, the TPU v5e can also scale to larger, more demanding AI training jobs. While a single workload on Google Cloud could previously span only about 3,000 TPU v4 chips, TPU v5e customers will be able to harness tens of thousands of chips at once.
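Why the chip count per workload matters is easiest to see in code. The sketch below is a generic data-parallel pattern in Python with JAX, not Google's internal setup: the batch is split across every chip the process can see, so doubling the chips roughly doubles the work done per training step.

```python
import jax
import jax.numpy as jnp

n = jax.device_count()  # 4 or 8 on one host; far more across a TPU pod slice
print(f"chips visible to this process: {n}")

# pmap runs the same function on every chip, one shard of the batch each.
@jax.pmap
def per_chip_step(x):
    # Stand-in for one training step's math on a single chip.
    return (x ** 2).sum()

# Leading axis = number of chips; each chip gets a 1,024-element shard.
batch = jnp.arange(n * 1024, dtype=jnp.float32).reshape(n, 1024)
print(per_chip_step(batch))  # one partial result per chip
```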

While Google's TPUs give cloud customers an economical option for running AI workloads, the company isn't ignoring the enormous demand for Nvidia's H100 GPUs. Alongside the TPU v5e, Google Cloud launched its A3 virtual machines, which pair Intel's latest Xeon CPUs with eight H100 GPUs apiece to handle the most demanding AI workloads. Here, too, a single workload can tap tens of thousands of H100 chips, providing ample power for resource-intensive AI tasks.
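To put that scale in perspective, a quick bit of arithmetic: the eight-GPU-per-VM figure is from the announcement, while the 20,000-chip target below is an illustrative stand-in for "tens of thousands."

```python
gpus_per_a3_vm = 8       # eight H100 GPUs per A3 virtual machine
target_gpus = 20_000     # illustrative "tens of thousands" figure

vms_needed = -(-target_gpus // gpus_per_a3_vm)  # ceiling division
print(f"{vms_needed:,} A3 VMs to reach {target_gpus:,} H100 GPUs")  # 2,500
```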

Supercharging the Cloud Business

Google Cloud isn't the first major cloud provider to offer virtual machines backed by Nvidia's H100: Amazon Web Services announced a similar offering in July, and Microsoft Azure followed suit earlier this month. Still, the company is betting that cloud customers want options, and its efficient, cost-effective TPU-based services could give it an edge as cloud providers battle for AI workloads.

Google Cloud has become a significant business for Alphabet, generating $8 billion in revenue in the second quarter, and, just as important, it's now profitable on an operating basis. Standing out from its competitors will matter all the more as the AI industry matures and the focus shifts toward earning a reasonable return on investment and keeping the cost of training sophisticated AI models under control.

Suzanne Frey, an executive at Alphabet, is a member of The Motley Fool's board of directors. John Mackey, former CEO of Whole Foods Market, an Amazon subsidiary, is a member of The Motley Fool's board of directors. Timothy Green has positions in Intel. The Motley Fool has positions in and recommends Alphabet, Amazon.com, Microsoft, and Nvidia. The Motley Fool recommends Intel and recommends the following options: long January 2023 $57.50 calls on Intel and long January 2025 $45 calls on Intel. The Motley Fool has a disclosure policy.
