Two Big Reasons Why Google's AI Chips Will Have A Tough Time Competing With Nvidia


Credit: Nvidia

Not even Google is likely to slow down Nvidia's artificial intelligence gravy train anytime soon.

Graphics chipmaker Nvidia, whose stock has tripled in the past year, has emerged as the dominant platform for deep learning, a popular form of AI computing. The company's graphics processing units, originally designed to render graphics for computer games and simulations, now handle the computationally intensive work of training large deep learning networks on millions of examples.

But at Google's annual developer conference last week, the tech giant announced the second-generation Tensor Processing Unit, the Cloud TPU, a custom chip tuned to accelerate the kind of deep learning mathematics Google uses. The first-generation TPU, announced last year and deployed in Google's server infrastructure two years ago, could only handle deep learning inference, that is, running already-trained models. Google said the Cloud TPU can tackle training better than commercially available Nvidia GPUs.
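
For readers unfamiliar with the distinction, here is a minimal sketch of the two phases using TensorFlow's Keras API; the model, data and hyperparameters are illustrative placeholders, not anything Google described:

```python
import numpy as np
import tensorflow as tf

# A tiny placeholder classifier; architecture and shapes are arbitrary.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(784,)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

x = np.random.rand(1024, 784).astype("float32")  # stand-in training data
y = np.random.randint(0, 10, size=(1024,))

# Training: the compute-heavy phase (forward pass, backpropagation, weight
# updates) that the first-generation TPU could not accelerate.
model.fit(x, y, epochs=1, batch_size=32)

# Inference: running the already-trained model on new inputs, the only
# workload the first-generation TPU handled.
predictions = model.predict(x[:5])
```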

More importantly, Google plans to make the AI hardware available to outside companies through the Google Cloud. It also plans to make 1,000 Cloud TPUs available to researchers conducting open AI research. The real purpose of the Cloud TPU appears to be making the Google Cloud more competitive with its rivals Amazon and Microsoft. Those two tech giants are far ahead of Google in the cloud market, and Google needs to do everything it can to set itself apart.

The impact for Nvidia is somewhat mixed. The Santa Clara, California-based company dominates the still-young market, but Cloud TPUs offer AI developers a competing architecture to tinker with. In announcing the Cloud TPU last week, though, Google was careful to note that it is still a heavy user of Nvidia GPUs internally and that it will offer Nvidia's next-generation Volta GPUs in its cloud.

Here are two major reasons why Google will have a hard time competing in the AI hardware world.

Google Cloud lock-in

First, Cloud TPUs keep users locked into Google's AI framework, called TensorFlow, and locked into the Google Cloud.

"If you're using TensorFlow, that's great, but what if you don't want to be locked into Google?" said Stacy Rasgon, a senior analyst at Bernstein Research. "Part of Nvidia's advantage is that you're not locked in. That's important."

Nvidia GPUs are accessible through every major cloud service -- Google, Amazon, Microsoft, IBM. That lets AI developers pick whichever cloud vendor they like, and pick up and leave when one isn't working out. Nvidia also does its best to optimize its hardware for alternative deep learning frameworks such as Caffe, Torch and PaddlePaddle. The AI market is so young that no one knows which algorithms or frameworks will end up winning out, so getting stuck with Google's TensorFlow could be a risk for any company.
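
To illustrate that portability, the sketch below runs a matrix multiply on an Nvidia GPU from PyTorch; the same silicon is equally reachable from TensorFlow or Caffe, nothing in the code ties it to one cloud vendor, and the sizes are arbitrary:

```python
import torch

# The same Nvidia GPU serves whichever framework a team prefers, because
# frameworks talk to the hardware through CUDA, not through a cloud API.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

a = torch.randn(1024, 1024, device=device)
b = torch.randn(1024, 1024, device=device)
c = a @ b  # dispatched to the GPU via CUDA/cuBLAS when one is present
print(c.device)
```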

“Cloud TPUs are clearly part of Google’s strategy to get you onto its cloud and lock you in forever,” said Matt Zeiler, founder and CEO of AI startup Clarifai. “At our company, we want to be as agnostic as possible.”

Even in academic AI circles, where the use of Cloud TPUs will be free, the limitation to one AI framework makes the hardware less enticing. "Practically, for us, the main current limitation would be that it only runs TensorFlow," said Alexei Efros, an associate professor at the University of California, Berkeley, in an email. "Since in my lab we use a lot of different packages (PyTorch seems like the current favorite), for now Nvidia Titan X is still our main workhorse. But let's see what happens."

Nevertheless, many startups have already adopted Google's TensorFlow wholesale, and for them access to Cloud TPUs is potentially a big deal. "We're built on top of TensorFlow, so Google Cloud TPU is certainly something we're exploring," said Mark Hammond, cofounder and CEO of AI startup Bonsai. "Anything that makes AI infrastructure better, faster, cheaper is a good thing for both Bonsai and our customers."

Not a chip business

Then there's the problem that Google doesn't sell its chips directly to customers the way Nvidia does. For training, many deep learning startups prefer owning their own hardware, because storing huge training datasets in the cloud can be immensely expensive, said Zeiler. Clarifai, for example, buys Nvidia GeForce gaming graphics cards to train its neural networks in its own data centers in New Jersey. "We run our own hardware with GPUs for training because it's much more cost effective, especially when you consider the massive datasets we have and the amount of experimentation we do," he said.
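
A rough back-of-the-envelope comparison shows the shape of that tradeoff. Every figure below is an assumption chosen for illustration, not a price quoted by Zeiler or anyone else in this story:

```python
# Hypothetical cloud-versus-owned-hardware comparison; all numbers assumed.
dataset_gb = 100_000               # assumed: a 100 TB training corpus
storage_price_gb_month = 0.02      # assumed cloud object-storage rate, USD
months = 24                        # assumed project lifetime

cloud_storage = dataset_gb * storage_price_gb_month * months
owned_gpus = 8 * 700               # assumed: eight GeForce-class cards

print(f"Cloud storage alone: ${cloud_storage:,.0f} over {months} months")
print(f"Owned GPUs, one-time: ${owned_gpus:,.0f}")
```

Under these assumptions, storage alone dwarfs the one-time hardware cost, which is the intuition behind Clarifai's choice.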

The real competition for Nvidia will more likely come from the companies you'd expect: actual chip companies. Dozens of AI chip startups are emerging, and Intel paid more than $400 million for one of the leading ones, Nervana.

"TPUs may impact the GPU market over time, but it won’t just be from Google," said Zeiler. "I’m sure the other chipmakers aren’t just sitting around."

Andrew Feldman, cofounder and CEO of stealthy AI chip startup Cerebras Systems, sees Nvidia's fundamental problem as hardware that was originally built to generate graphics, not to process AI algorithms. "I don't think the GPU is very good for machine learning," said Feldman. "It's just better than Intel's CPU, but the GPU represents 25 years of optimization for a different problem."

Nvidia is aware of this criticism. That's why its latest chip architecture, Volta, includes specialized computing cores, called Tensor Cores, optimized for the mixed-precision matrix multiplication at the heart of deep learning. With Tensor Cores, Nvidia is increasingly specializing its processors for AI and less for generating graphics. "You're going to see us add more AI capabilities and specialization into our GPUs," said Ian Buck, vice president of accelerated computing at Nvidia.
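
As an illustrative sketch (it requires an Nvidia GPU, and the matrix sizes are arbitrary), the kind of half-precision multiply Tensor Cores are built to accelerate looks like this in PyTorch:

```python
import torch

# Half-precision (FP16) inputs: the data format Volta's Tensor Cores target.
a = torch.randn(4096, 4096, dtype=torch.float16, device="cuda")
b = torch.randn(4096, 4096, dtype=torch.float16, device="cuda")

# On Volta-class GPUs, cuBLAS can route this multiply through Tensor Cores,
# which accumulate partial products at higher precision for accuracy.
c = a @ b
```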

Nvidia's response to the growing competition in AI hardware is that the market is simply growing too fast for any of this to matter right now. “I think it’s an explosive growth market," said Buck. "There’s room for everyone.”
