How can you rapidly deploy an AI model around the world for fast inference? That’s a challenge Cloudflare is looking to solve with a series of AI platform updates announced today.
Cloudflare is a globally distributed platform founded in 2009 that has been steadily building out capabilities to enable organizations to safely deploy and secure applications at scale. With the increasing demand from organizations of all sizes to deploy AI models, there is now a clear need for platforms to support that demand.
Cloudflare’s new Workers AI service provides a serverless capability for delivering AI inference models around the world. While model deployment is critical, so too is the need for governance and observability, which is where Cloudflare’s new AI Gateway fits in. Many AI applications also rely on vector databases, a need Cloudflare is addressing with Vectorize, its new globally distributed vector database.
Beyond expanding its own platform capabilities, Cloudflare is also growing its AI services through partnerships. Hugging Face is now partnering with Cloudflare to enable models to be easily deployed onto the new Workers AI platform. A partnership with Microsoft helps power AI inference, as Cloudflare will now be using Microsoft’s ONNX Runtime.
“One of Cloudflare’s ‘secret sauces’ is that we run a massive global network and one of things we’re really good at is moving data, code and traffic, so the right thing is in the right place,” John Graham-Cumming, CTO of Cloudflare, told VentureBeat.
How Workers AI enables serverless inference
Cloudflare has been rolling out different services under the Workers product brand for several years. The basic idea is to enable application code to run at the edge of a network, without the need for users to have an always-on server. It’s an approach known as serverless, where users pay for services only when code runs.
Cloudflare first revealed its attempts to run AI as a serverless offering under the code name Constellation in May of this year. With the Workers AI launch, Graham-Cumming explained that Cloudflare is rolling out GPUs and AI-optimized CPUs at large scale across its globally distributed network.
He noted that based on the specific AI workload’s requirements, Cloudflare will deploy AI models to its network nodes that have the appropriate hardware. Graham-Cumming said that Cloudflare will be able to automatically determine the ideal hardware, whether it’s a CPU or a specific type of GPU, to optimize AI inference tasks.
“People can use our network pretty much wherever they are in the world to do AI tasks using our platform,” he said.
Cloudflare expects Workers AI to be useful for a broad range of AI tasks. Graham-Cumming said it could be nearly anything, with tasks like image recognition and predictive analytics being top of mind. In fact, one of the reasons that Cloudflare is partnering with Hugging Face is to enable a wide spectrum of use cases.
“We’re going to be their first serverless GPU partner, so you can go into Hugging Face, pick a model you want to deploy onto our network without writing any code,” he said.
AI Gateway brings observability to AI deployments
For organizations looking to scale AI deployments, having the ability to easily deploy globally is great, but it’s also critical to have insight into what is being deployed.
The new Cloudflare AI Gateway sits in front of AI applications and provides tools for managing, monitoring, and controlling how those applications are used. Graham-Cumming said that it can also be used by developers as a forward proxy to connect to and manage how an AI application is used, while providing visibility into usage patterns and API tokens. The gateway handles capabilities like observability, caching and rate limiting to help scale AI applications.
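To make the gateway idea concrete, here is a minimal sketch of a forward proxy that caches responses, rate-limits calls to a backend and keeps simple usage counters. It is purely illustrative and is not Cloudflare's implementation or API: the `ToyGateway` class, its parameters and the `backend` callable are all hypothetical names chosen for this example.

```python
import time
from typing import Callable, Dict, List, Optional


class ToyGateway:
    """Hypothetical forward proxy: caching, rate limiting and basic usage stats."""

    def __init__(
        self,
        backend: Callable[[str], str],
        max_requests: int,
        window_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.backend = backend                # stands in for a call to an AI model
        self.max_requests = max_requests      # backend calls allowed per window
        self.window_seconds = window_seconds
        self.clock = clock                    # injectable so the window can be tested
        self.cache: Dict[str, str] = {}
        self.request_times: List[float] = []
        self.stats = {"requests": 0, "cache_hits": 0, "rate_limited": 0}

    def handle(self, prompt: str) -> Optional[str]:
        """Return a response, or None if the request was rate limited."""
        self.stats["requests"] += 1

        # Caching: identical prompts are answered without touching the backend.
        if prompt in self.cache:
            self.stats["cache_hits"] += 1
            return self.cache[prompt]

        # Rate limiting: count only backend calls made inside the sliding window.
        now = self.clock()
        self.request_times = [
            t for t in self.request_times if now - t < self.window_seconds
        ]
        if len(self.request_times) >= self.max_requests:
            self.stats["rate_limited"] += 1
            return None

        self.request_times.append(now)
        response = self.backend(prompt)
        self.cache[prompt] = response
        return response
```

In this sketch, cached answers do not count against the rate limit, and the `stats` dictionary is a stand-in for the kind of usage visibility a gateway could expose. A production gateway would also need cache expiry, per-client limits and persistent logging.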
When it comes to scaling applications, a constraint for AI can often be the data on which the application relies, which more often than not sits in a vector database. Vectorize is Cloudflare’s new vector database for storing embeddings and other vectorized data. The goal with Vectorize is to have a distributed deployment with the data closer to where the inference needs to occur with Workers AI.
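For readers unfamiliar with what a vector database does, the sketch below stores embeddings in memory and returns the entries most similar to a query vector, ranked by cosine similarity. It is a toy illustration and not the Vectorize API: `ToyVectorIndex`, `upsert` and `query` are hypothetical names, and real vector databases typically use approximate indexes rather than the brute-force scan shown here.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class ToyVectorIndex:
    """Hypothetical in-memory index that scans every stored vector on each query."""

    def __init__(self) -> None:
        self.vectors: Dict[str, List[float]] = {}

    def upsert(self, item_id: str, vector: Sequence[float]) -> None:
        # Insert a new embedding or overwrite the one stored under item_id.
        self.vectors[item_id] = list(vector)

    def query(self, vector: Sequence[float], top_k: int = 3) -> List[Tuple[str, float]]:
        # Score every stored embedding, then keep the top_k most similar.
        scored = [
            (item_id, cosine_similarity(vector, stored))
            for item_id, stored in self.vectors.items()
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]


# Usage with made-up two-dimensional embeddings.
index = ToyVectorIndex()
index.upsert("cat", [1.0, 0.0])
index.upsert("dog", [0.9, 0.1])
index.upsert("car", [0.0, 1.0])
nearest = index.query([1.0, 0.0], top_k=2)
```

Keeping this lookup close to where inference runs matters because each query is a round trip that sits in front of the model call.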
The overall effort to fully enable the Cloudflare platform for AI is still a work in progress. Graham-Cumming noted that AI demand today is quite large. Among the challenges for Cloudflare in particular is getting the right GPU hardware deployed across its large global footprint of over 300 cities.
“We’re not there today with 300 cities, but you know, we’re going to be rolling out hardware all over the world for this,” Graham-Cumming said. “That has been a logistic effort to get that right and we know we’re going to be in a lot of places very, very soon.”