Nvidia partners with Run:ai and Weights & Biases for MLops stack

Running a full lifecycle of machine learning workflows can often be a complicated operation, involving multiple disconnected components.

Users must have hardware optimized for machine learning, the ability to orchestrate workloads across that hardware, and some form of machine learning operations (MLops) technology to manage the models. In an effort to make things easier for data scientists, artificial intelligence (AI) compute orchestration supplier Run:ai, which raised $75 million in March, and MLops platform vendor Weights & Biases (W&B) are partnering with Nvidia.

“This three-way partnership allows data scientists to use Weights & Biases to train and execute their models,” Omri Geller, CEO and co-founder of Run:ai, told VentureBeat. “In addition, Run:ai efficiently orchestrates all workloads on Nvidia’s GPU resources, so you get the complete solution from the hardware to the data scientist.”

Run:ai is designed to help organizations use Nvidia hardware for machine learning workloads in cloud-native environments – a deployment approach built on containers and microservices managed by the Kubernetes container orchestration platform.

One of the most common ways for organizations to run machine learning on Kubernetes is with the open source project Kubeflow. Run:ai has an integration with Kubeflow that can help users optimize Nvidia GPU usage for machine learning, Geller explained.
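As an illustration of what that looks like in practice, the sketch below defines a single-step Kubeflow pipeline whose training container requests one Nvidia GPU; with Run:ai installed on the cluster, its scheduler can then manage how such requests are placed on the hardware. The container image and training script are illustrative placeholders, not anything taken from either vendor's documentation.

```python
# A minimal Kubeflow Pipelines (v1 SDK) sketch: one pipeline step that requests
# an Nvidia GPU. The image and train.py script are illustrative placeholders.
import kfp
from kfp import dsl


@dsl.pipeline(name="gpu-training", description="Single training step on one GPU")
def gpu_pipeline():
    train = dsl.ContainerOp(
        name="train",
        image="nvcr.io/nvidia/pytorch:22.04-py3",  # example NGC container
        command=["python", "train.py"],
    )
    # Ask Kubernetes for one Nvidia GPU for this step.
    train.container.set_gpu_limit("1")


# Compile to a YAML spec that can be uploaded to a Kubeflow cluster.
if __name__ == "__main__":
    kfp.compiler.Compiler().compile(gpu_pipeline, "gpu_pipeline.yaml")
```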

Geller added that Run:ai is designed as a plug-in for Kubernetes that enables the virtualization of Nvidia GPU resources. Virtualizing the GPU allows the resources to be fractioned so that multiple containers can access the same GPU. Run:ai also provides quota management for virtual GPU instances to ensure that workloads always get the resources they require.
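To make the fractional-GPU idea concrete, here is a minimal sketch using the official Kubernetes Python client to submit a pod that asks Run:ai's scheduler for half a GPU. The annotation key gpu-fraction and the scheduler name runai-scheduler follow Run:ai's conventions, but treat those exact names, along with the image, as assumptions to verify against your installed Run:ai version.

```python
# Minimal sketch: a pod that asks the Run:ai scheduler for half a GPU so that
# multiple containers can share one physical device. The "gpu-fraction"
# annotation and "runai-scheduler" name follow Run:ai's conventions but should
# be checked against your cluster's Run:ai version; the image is illustrative.
from kubernetes import client, config

config.load_kube_config()  # authenticate using the local kubeconfig

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(
        name="fractional-gpu-job",
        annotations={"gpu-fraction": "0.5"},  # request half of one GPU
    ),
    spec=client.V1PodSpec(
        scheduler_name="runai-scheduler",  # hand placement over to Run:ai
        restart_policy="Never",
        containers=[
            client.V1Container(
                name="trainer",
                image="nvcr.io/nvidia/pytorch:22.04-py3",
                command=["python", "train.py"],
            )
        ],
    ),
)

client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
```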

Geller said the goal of the partnership is to make a complete machine learning workflow more consumable for business users. To that end, Run:ai and Weights & Biases are building an integration that makes it easier to use the two technologies together. Before the collaboration, Geller said, organizations that wanted to use Run:ai and Weights & Biases together had to go through a manual process to make the two technologies work with each other.

Seann Gardiner, vice president of business development at Weights & Biases, noted that the partnership will allow users to combine the training automation offered by Weights & Biases with the GPU resources orchestrated by Run:ai.
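On the data-science side, that workflow looks like ordinary Weights & Biases experiment tracking, while Run:ai decides where the job runs. A minimal sketch, with an illustrative project name and a stand-in training loop:

```python
# Minimal Weights & Biases tracking sketch. The same script can be packaged in
# a container and scheduled onto GPUs by Run:ai without code changes. The
# project name, config values, and fake loss curve are illustrative only.
import random

import wandb

run = wandb.init(project="runai-demo", config={"lr": 1e-3, "epochs": 5})

for epoch in range(run.config.epochs):
    loss = 1.0 / (epoch + 1) + 0.01 * random.random()  # stand-in metric
    wandb.log({"epoch": epoch, "loss": loss})  # streamed to the W&B dashboard

run.finish()
```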

Nvidia is not monogamous and works with everyone

Nvidia is partnering with both Run:ai and Weights & Biases as part of the company’s larger strategy to collaborate across the machine learning ecosystem of vendors and technologies.

“Our strategy is to work fairly and evenly with the overarching goal of making AI ubiquitous,” Scott McClellan, senior director of product management at Nvidia, told VentureBeat.

McClellan said the partnership with Run:ai and Weights & Biases is particularly interesting because he believes the two vendors offer complementary technologies. Both vendors can now also connect to the Nvidia AI Enterprise platform, which provides software and tools to make AI useful for enterprises.

With the three vendors working together, McClellan said, a data scientist using Nvidia’s AI Enterprise containers doesn’t have to figure out how to deploy their own orchestration frameworks or do their own scheduling.

“These two partners complement our stack — or we complete theirs and we complement each other’s — so the whole is greater than the sum of its parts,” he said.

Avoiding MLops’ “Bermuda Triangle”

For Nvidia, partnering with vendors such as Run:ai and Weights & Biases is about helping solve a key challenge many companies face when embarking on an AI project for the first time.

“The moment when a data science or AI project tries to go from experiment to production is sometimes a bit like the Bermuda Triangle where a lot of projects die,” McClellan said. “I mean, they just disappear into the Bermuda Triangle of — how do I get this thing into production?”

McClellan hopes that Kubernetes and cloud-native technologies, which are widely used by enterprises today, will make it easier than in the past to develop and operationalize machine learning workflows.

“MLops is devops for ML — it’s literally how these things don’t die when they go into production, and live full and healthy lives,” McClellan said.