
There Is No AI Revolution Without Platform Engineering

Forbes Technology Council

Steve Rodda is the CEO of Ambassador, a cloud-native development company helping enterprises manage microservices on Kubernetes.

Last November, KubeCon + CloudNativeCon and OpenAI DevDay opened on the exact same day. Yet the two conferences seemed worlds apart. For all the talk of using artificial intelligence (AI), machine learning (ML) and large language models (LLMs) within the DevOps and platform engineering ecosystem, the industry has yet to fully recognize the role AI will play. As Andrew Fong, CEO of Prodvana, put it, “AI essentially does not exist at Kubecon. Again, Platform Engineering Teams have no idea what is coming.”

But some platform engineering teams do understand what’s coming: the teams building the AI tools that are changing the world. The truth is that AI and platform engineering are already deeply intertwined.

AI might not be changing platform engineering just yet, but platform engineering is changing AI. Let me explain.

AI Is Built On The Infrastructure Of The Future

“We believe that increasing compute is a huge lever to AI progress.”

So begins an early 2024 job posting for the OpenAI Supercomputing team. They're entirely right: AI requires scale. This is, in fact, the main reason we're seeing such huge performance leaps in AI models. It's not primarily that the underlying algorithms are getting better; it's that more teams can scale their training data and compute to produce better models.

How do you scale? One way is with clusters. That same job description goes on to say that "you'll find this work really exciting if you...have built large clusters but have motivation to scale beyond." This will be familiar to anyone involved in DevOps or platform engineering, where Kubernetes is commonplace. The posting asks for knowledge of Terraform, cloud platforms and running high-availability distributed clusters. There are a couple of quirks (have you spun up any GPU clusters?), but it's a run-of-the-mill platform engineering job description for an absolutely not-run-of-the-mill role.

Of course, OpenAI isn’t the only team running their models using Kubernetes. Anthropic, the company behind the Claude models, runs on Kubernetes. Google AI runs on Kubernetes. Cohere, an AI platform for enterprises, also runs on Kubernetes. Saurabh Baji, Cohere's senior vice president of engineering, explains: "Kubernetes has significantly simplified the deployment of large language models across platforms."

As AI continues to push the boundaries of what's possible, it's clear that the AI future is inextricably linked to the expertise of platform engineering teams quietly building the robust, scalable infrastructure (more often than not with Kubernetes tools) that will power the next generation of AI breakthroughs.

AI And Platform Engineering Are Built On The Same Foundation

Why does platform engineering fit so well into this new field? Everything that platform engineering offers—scalability, management, collaboration, monitoring and security—is precisely what these new AI companies are looking for.

• Scalability: Modern AI systems, especially LLMs, require massive computational resources to train and deploy. Platform engineering is able to design and build the distributed computing infrastructure that can handle this scale. Training cutting-edge AI models wouldn't be feasible without the ability to parallelize workloads across large clusters of GPUs and TPUs.

• Data Management: AI relies on massive datasets for training. Platform engineering practices around data pipeline automation, data versioning and efficient storage and retrieval can help manage the petabyte-scale datasets used in AI development. Tools like Kubeflow help streamline this.

• Continuous Integration And Deployment: Platform engineering can automate the training, testing and deployment of new AI model versions. This CI/CD applied to ML enables rapid iterations and rollouts of new capabilities. Platforms like Amazon SageMaker help manage this ML life cycle.

• Monitoring And Feedback: As AI systems are deployed in the real world, monitoring their performance and gathering feedback data is critical for identifying issues and improving models over time. Platform engineering practices enable instrumenting, logging and updating live ML systems.

• Security: Millions of users log in to these systems every day. API gateways, especially Kubernetes-native options, let platform engineering teams authenticate requests and shape traffic so the system stays secure and can cope with the load.
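To make one of these practices concrete, here is a toy sketch of content-addressed data versioning, the kind of bookkeeping a training data pipeline might do. This is a hypothetical illustration of the general technique, not code from any of the companies or tools mentioned above; the function name and record format are my own.

```python
import hashlib
import json

def dataset_version(records):
    """Compute a deterministic content hash for a dataset snapshot.

    Hashing the serialized records gives every snapshot a stable
    version ID, so a training run can record exactly which data it
    saw, and any change to the data produces a new version.
    """
    digest = hashlib.sha256()
    for record in records:
        # Canonical JSON (sorted keys) so logically equal records
        # always hash the same way regardless of key order.
        digest.update(json.dumps(record, sort_keys=True).encode("utf-8"))
    return digest.hexdigest()[:12]

snapshot_a = [{"text": "hello", "label": 1}, {"text": "world", "label": 0}]
snapshot_b = [{"label": 1, "text": "hello"}, {"text": "world", "label": 0}]
snapshot_c = [{"text": "hello!", "label": 1}, {"text": "world", "label": 0}]

# Same content (only key order differs) -> same version ID.
print(dataset_version(snapshot_a) == dataset_version(snapshot_b))  # True
# Changed content -> new version ID.
print(dataset_version(snapshot_a) == dataset_version(snapshot_c))  # False
```

In practice, tools like Kubeflow and DVC layer far richer lineage tracking on top of this basic idea, but the core is the same: tie every model artifact back to an immutable identifier for its training data.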

AI research and algorithms get a lot of attention. Still, the unglamorous infrastructure and engineering work make the current AI boom possible by allowing models and datasets to scale by orders of magnitude.

Testing The Limits

What's the most exciting part of this for platform engineers? The scale of AI is testing the limits of our infrastructure and pushing us to build new tools, processes and platforms to increase efficiency.

An example of this is Karpenter, a Kubernetes cluster autoscaler that can "improve the efficiency and cost of running workloads on that cluster." Karpenter provisions right-sized nodes for pending pods and can remove the need for manual node-group configuration. AWS designed it in partnership with Anthropic because this is exactly the type of problem AI companies face: huge scale and a need for new optimizations.
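As a back-of-the-envelope illustration of what a cluster autoscaler decides, the core question can be sketched as bin-packing pending pod resource requests onto the fewest nodes. This is a simplified toy model, not Karpenter's actual algorithm; real autoscalers also weigh memory, GPUs, instance pricing, and scheduling constraints, and the function below is my own invention.

```python
def nodes_needed(pod_cpu_requests, node_cpu_capacity):
    """First-fit-decreasing bin packing: how many nodes of a given
    CPU capacity are needed to schedule all pending pods?
    """
    nodes = []  # remaining free CPU on each provisioned node
    for req in sorted(pod_cpu_requests, reverse=True):
        for i, free in enumerate(nodes):
            if free >= req:
                nodes[i] = free - req  # pod fits on an existing node
                break
        else:
            # No existing node has room: provision a new one.
            nodes.append(node_cpu_capacity - req)
    return len(nodes)

# Ten pods requesting 3 vCPUs each: two fit per 8-vCPU node.
print(nodes_needed([3] * 10, 8))           # 5
# Mixed requests pack more tightly than uniform ones.
print(nodes_needed([4, 4, 2, 2, 2, 2], 8))  # 2
```

The interesting engineering in a real autoscaler is everything this sketch leaves out: choosing among hundreds of instance types, consolidating underused nodes, and doing it all continuously as workloads churn.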

Yes, platform engineers should be excited about how AI will help them in their roles in the near (and distant) future. But, more importantly, they should also be excited about how the platforms, processes and tools they've already built have ushered in a completely new field (AI) that would be impossible without them.

Platform engineers are the unsung heroes underpinning advances in AI, and their expertise will be pivotal in overcoming the challenges and realizing the opportunities presented by this and any other groundbreaking technology.


Forbes Technology Council is an invitation-only community for world-class CIOs, CTOs and technology executives.

