As part of its continued push to bring power-hungry machine learning models into production, Google published details on its blog Wednesday of one of its advanced artificial intelligence supercomputers, which it claims is faster and more efficient than rival Nvidia systems.
Nvidia currently dominates the market for training and deploying AI models, with a market share of over 90%, and Google has been designing and deploying its own custom chips, called TPUs, or Tensor Processing Units, to compete for that business.
Over the last decade, Google has been a major player in artificial intelligence, and its employees have developed some of the most significant advances in the field. Some believe, however, that the company has fallen behind in commercializing its inventions, and it has been racing to release products to show it hasn't squandered its lead, a situation reportedly described as a "code red" within the company.
AI-powered models and products, such as Google's Bard or OpenAI's ChatGPT, which is powered by Nvidia's A100 chips, require hundreds or thousands of computers working together to train the models, and those computers have to run around the clock for weeks or even months at a time.
Google announced on Tuesday that it had strung together more than 4,000 TPUs, joined by custom components, into a system designed to run and train AI models. The system has been running since 2020 and was used to train Google's PaLM model, which competes with OpenAI's GPT models, over a period of more than 50 days.
According to their paper, Google's researchers found that the TPU-based supercomputer, called TPU v4, is 1.2x to 1.7x faster and uses 1.3x to 1.9x less power than a comparable system built on Nvidia's A100 chips.
“TPU v4 supercomputers are the workhorses of large language models due to their performance, scalability, and availability,” the researchers concluded.
The researchers did not, however, compare Google's TPU results with those of Nvidia's latest AI chip, the H100, because the H100 is more recent and was built with more advanced manufacturing technology, according to Google.
Nvidia declined to comment. The company is expected on Wednesday to release results and rankings from MLPerf, an industry-wide AI chip benchmark.
Because the amount of computing power required for AI is so substantial, many companies in the industry are focused on developing new chips, components such as optical interconnects, or software techniques that reduce the power needed to run AI.
Cloud providers like Google, Microsoft, and Amazon also stand to benefit from AI's power requirements, because they rent out computer processing by the hour and provide credits or computing time to startups as a way of building relationships with them. In addition to selling time on Nvidia chips, Google's cloud sells time on Google's own TPU chips. For example, Google said that Midjourney, the AI image generator, was trained on its TPUs.