SAN FRANCISCO, April 1, 2026 (GLOBE NEWSWIRE) – Today, MLCommons® announced new results for its industry-standard MLPerf® Inference v6.0 benchmark suite. This release includes several key enhancements that ensure the benchmark suite tests current, real-world scenarios for AI deployments and provides a comprehensive view of AI system performance.
Five of the eleven data center tests in MLPerf Inference v6.0 are new or updated, and the release also includes a new object detection test for edge systems. Key changes include:
● A new large language model benchmark based on the open GPT-OSS 120B model, covering mathematics, scientific reasoning, and coding;
● An expanded DeepSeek-R1 advanced reasoning benchmark, now including an interactive scenario that enables speculative decoding;
● DLRMv3, the third generation of the suite’s recommendation benchmark and its first sequential recommendation test, thoroughly modernized with generous technical contributions from Meta, a global leader in recommendation systems;
● The suite’s first text-to-video generation benchmark;
● A new vision language model (VLM) benchmark that converts unstructured multimodal data from Shopify’s extensive product catalog into structured metadata;
● An improved object detection benchmark for edge scenarios, based on Ultralytics’ YOLOv11 Large model (a brief usage sketch follows this list).
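For illustration only, here is a minimal sketch of the kind of single-image inference the new edge detection benchmark exercises, using Ultralytics’ publicly documented Python API. This is not the MLPerf harness; the weight file name (yolo11l.pt) follows Ultralytics’ published naming for YOLO11 Large, and the input image path is a placeholder.

# Minimal sketch of the edge object detection workload, using the
# public Ultralytics Python API (pip install ultralytics). This is
# NOT the MLPerf Inference harness, just an illustration of the
# underlying model. "image.jpg" is a placeholder input.
from ultralytics import YOLO

model = YOLO("yolo11l.pt")      # YOLO11 Large pretrained weights
results = model("image.jpg")    # run single-image inference

for result in results:
    for box in result.boxes:
        cls_name = result.names[int(box.cls)]
        confidence = float(box.conf)
        print(f"{cls_name}: {confidence:.2f} at {box.xyxy.tolist()}")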
“This is the most significant revision of the Inference benchmark suite we have ever done,” said Frank Han, technical staff, Systems Development Engineering at Dell Technologies and co-chair of the MLPerf Inference Working Group. “The decision to update so many benchmarks in this round was driven by the extraordinary enthusiasm and collaboration of our members, who have contributed an unprecedented amount of engineering effort and IP to building new inference benchmarks. Adding these new tests will help MLPerf Inference keep pace with the rapid evolution of AI models and techniques, so that our benchmarks remain relevant and representative of real-world deployments.”
The open-source MLPerf Inference benchmark suite measures system performance in an architecture-neutral, representative, and reproducible manner. The aim is to create a level playing field for competition that drives innovation, performance and energy efficiency for the entire industry. The published results provide critical technical information for customers purchasing and tuning AI systems.
“We thank Meta, Shopify and Ultralytics for their substantial collaboration with us in implementing these changes to the MLPerf Inference benchmark suite and for contributing their datasets, task definitions and expertise,” said Miro Hodak, senior member of the technical staff at AMD and co-chair of the MLPerf Inference Working Group. “These partnerships were essential to ensure that testing included scenarios and workloads that represented the current state of the industry.”
“MLPerf Inference benchmarks play a critical role in driving transparency and accountability in the AI industry,” said Glenn Jocher, CEO and founder of Ultralytics. “At Ultralytics, rigorous, reproducible benchmarking is central to how we develop and validate our Ultralytics YOLO models, so that developers and organizations can make informed decisions about real-world performance. We are proud to be part of an ecosystem that holds the entire field to a higher standard.”
“Commerce is one of the most complex domains in AI, but researchers rarely have data that reflects this complexity,” said Kshetrajna Raghavan, principal engineer, Applied ML at Shopify. “Shopify is uniquely positioned to address this because it sits at the intersection of millions of sellers and billions of products. Sharing this taxonomy will help the entire field evolve.”
New tools for submitters and consumers of results
With Inference 6.0, submitters have the option to use a newly available harness to run the benchmark tests. The new system, LoadGen++, allows LLM benchmarks to run against a server-style software stack, mirroring how typical deployments are implemented today. “LoadGen++ is a major upgrade over its predecessor and represents a significant investment by MLCommons that will allow us to remain agile as we continue to produce benchmark tests that follow the state of the art,” said Han.
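To illustrate what server-style serving means in practice (and not LoadGen++ itself, whose interface is not described in this announcement), the sketch below posts a completion request to a hypothetical OpenAI-compatible HTTP endpoint of the kind modern LLM serving stacks expose, and times the response. The URL and model name are placeholders.

# Hedged illustration of "server-style" LLM serving: the system under
# test exposes an HTTP inference endpoint (here an OpenAI-compatible
# /v1/completions route, as served by common LLM serving stacks), and
# a load generator posts requests and measures latencies. This is NOT
# the LoadGen++ API; the endpoint URL and model name are placeholders.
import time
import requests

ENDPOINT = "http://localhost:8000/v1/completions"  # placeholder URL

def send_query(prompt: str) -> float:
    """POST one completion request and return its latency in seconds."""
    start = time.perf_counter()
    resp = requests.post(ENDPOINT, json={
        "model": "example-model",   # placeholder model name
        "prompt": prompt,
        "max_tokens": 64,
    })
    resp.raise_for_status()
    return time.perf_counter() - start

latencies = [send_query(p) for p in ["What is 2 + 2?", "Name a prime."]]
print(f"mean latency: {sum(latencies) / len(latencies):.3f}s")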
In addition, Inference 6.0 results can be viewed in a new online dashboard on the MLCommons site: https://mlcommons.org/visualizer. The dashboard brings a new level of interactivity to viewing results, including advanced filtering and custom performance graphs.
Large-scale multi-node systems in the spotlight
The submissions for Inference 6.0 show that technology vendors are looking to demonstrate the performance of scaled-up, multi-node systems running real-world inference workloads. This round recorded a new high for multi-node system submissions, up 30% from the Inference 5.1 benchmark six months ago. Additionally, 10% of all submitted systems in Inference 6.0 had more than ten nodes, compared to just 2% in the previous round. The largest system submitted in Inference 6.0 contained 72 nodes and 288 accelerators, quadrupling the number of nodes in the largest system in the previous round.
“As more AI applications have gone into production and become widely available, the demand for large-scale, high-performance systems to run them has increased,” said Hodak. “At the same time, multi-node systems present a unique set of technical challenges beyond those of single-node systems, requiring configuration and optimization of system architectures, network connections, data storage, and software layers. Stakeholders are eager to address these challenges and run inference workloads at scale.”
The AI community continues to embrace and invest in MLPerf Inference
The MLPerf Inference 6.0 benchmark received entries from a total of 24 participating organizations: AMD, ASUSTeK, Cisco, CoreWeave, Dell, GATEOverflow, GigaComputing, Google, Hewlett Packard Enterprise, Intel, Inventec Corporation, KRAI, Lambda, Lenovo, MangoBoost, MiTAC, Nebius, Netweb Technologies India Limited, NVIDIA, Oracle, Quanta Cloud Technology, Red Hat, Stevens Institute of Technology and Supermicro.
“I would like to welcome our first-time submitters, Inventec Corporation, Netweb Technologies India Limited and Stevens Institute of Technology,” said Han. “The AI ecosystem is large and diverse, and continues to grow and evolve rapidly. On behalf of MLCommons, I would also like to thank our members, our contributors, and our partners, including Meta, Shopify, and Ultralytics, for working with us to build and advance the most comprehensive and relevant performance benchmark suite for AI inference. Together, we ensure stakeholders in our community have valuable, actionable information to help them make better decisions.”
View the results
To view the results for MLPerf Inference v6.0, visit the benchmark results dashboard at https://mlcommons.org/visualizer.
About MLCommons
MLCommons is the global leader in AI benchmarking. MLCommons is an open engineering consortium supported by more than 130 members and affiliates, with a proven track record of bringing together academia, industry and civil society to measure and improve AI. The foundation for MLCommons began with the MLPerf benchmarks in 2018, which quickly grew into a set of industry-standard metrics for measuring machine learning performance and promoting transparency in machine learning techniques. Since then, MLCommons has continued to use collective engineering to develop the benchmarks and metrics needed for better AI – ultimately helping to evaluate and improve the accuracy, safety, speed and efficiency of AI technologies.
For more information about MLCommons and details on how to become a member, visit MLCommons.org or email participation@mlcommons.org.
Press inquiries: contact press@mlcommons.org

