Report: 37% of ML leaders say they don’t have the data they need to improve model performance

A new report from Scale AI reveals what works and what doesn’t in AI deployment, along with best practices for ML teams moving models from testing into real-world production. The report examines each stage of the ML lifecycle, from data collection and annotation to model development, implementation and monitoring, to understand where AI innovation is hindered, where failures occur, and which approaches help businesses find success.

The report aims to shed light on what it takes to unlock the full potential of AI for any business, and to help organizations and ML practitioners overcome their current hurdles, adopt best practices, deploy, and ultimately use AI as a strategic advantage.

For ML practitioners, data quality is one of the most important factors in their success, and according to respondents, it is also the most difficult challenge to overcome. More than a third (37%) of all respondents said they don’t have the variety of data they need to improve model performance. Quality is an issue as well: only 9% of respondents said their training data is free of noise, bias and gaps.

The majority of respondents have problems with their training data. The top three issues are data noise (67%), data bias (47%) and domain gaps (47%).
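To make these three failure modes concrete, here is a minimal sketch (ours, not from the Scale AI report) of the kind of quick audit an ML team might run on a labeled dataset; the data, column names and proxies are all hypothetical.

```python
import pandas as pd

# Hypothetical labeled dataset; "text", "label" and "domain" are illustrative names.
df = pd.DataFrame({
    "text":   ["good", "good", "bad", "bad", "okay", "okay", "okay"],
    "label":  ["pos",  "neg",  "neg", "neg", "pos",  "pos",  "pos"],
    "domain": ["news", "news", "news", "news", "news", "news", "chat"],
})

# Noise proxy: identical inputs annotated with conflicting labels.
conflicts = df.groupby("text")["label"].nunique()
print("inputs with conflicting labels:", int((conflicts > 1).sum()))

# Bias proxy: a heavily skewed class distribution.
print("class balance:\n", df["label"].value_counts(normalize=True))

# Domain-gap proxy: data sources that are barely represented.
print("domain coverage:\n", df["domain"].value_counts(normalize=True))
```

In practice, teams would swap these toy proxies for annotator-agreement scores and distribution checks against production data, but the categories map directly onto the noise, bias and domain-gap issues the survey respondents reported.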

Most teams, regardless of industry or level of AI advancement, face similar challenges around data quality and variety. Scale’s data suggests that working closely with annotation partners can help ML teams overcome data management and annotation quality challenges, accelerating model deployment. ML teams that don’t work with an annotation partner at all are likely to wait more than three months for annotated data.

This survey was conducted online in the United States by Scale AI from March 31 to April 12, 2022. More than 1,300 ML practitioners, including practitioners at Meta, Amazon and Spotify, were surveyed for the report.

Read the full Scale AI report.
