SAN FRANCISCO, April 1, 2026 (GLOBE NEWSWIRE) – Today, MLCommons® announced new results for its industry-standard MLPerf® Inference v6.0 benchmark suite. This release includes several key enhancements that ensure the benchmark suite tests current, real-world scenarios for AI deployments and provides a comprehensive view of AI system performance.
Five of the eleven data center tests in MLPerf Inference v6.0 are new or updated, and the release also includes a new object detection test for edge systems. Key changes include:
● A new large language model benchmark based on the open GPT-OSS 120B model, covering mathematics, scientific reasoning, and coding;
● An expanded DeepSeek-R1 advanced reasoning benchmark, now including an interactive scenario that enables speculative decoding;
● DLRMv3, the third generation of the suite’s recommendation benchmark and its first sequential recommendation test, thoroughly modernized with generous technical contributions from Meta, a global leader in recommendation systems;
● The suite’s first text-to-video generation benchmark;
● A new vision language model (VLM) benchmark that converts unstructured multimodal data from Shopify’s extensive product catalog into structured metadata;
● An improved object detection benchmark for edge scenarios, based on Ultralytics’ YOLOv11 Large model (a brief usage sketch follows this list).
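For illustration only, here is a minimal sketch of the kind of single-image inference the new edge detection benchmark exercises, using Ultralytics’ publicly documented Python API. This is not the MLPerf harness; the weight file name (yolo11l.pt) follows Ultralytics’ published naming for YOLO11 Large, and the input image path is a placeholder.

# Minimal sketch of the edge object detection workload, using the
# public Ultralytics Python API (pip install ultralytics). This is
# NOT the MLPerf Inference harness, just an illustration of the
# underlying model. "image.jpg" is a placeholder input.
from ultralytics import YOLO

model = YOLO("yolo11l.pt")      # YOLO11 Large pretrained weights
results = model("image.jpg")    # run single-image inference

for result in results:
    for box in result.boxes:
        cls_name = result.names[int(box.cls)]
        confidence = float(box.conf)
        print(f"{cls_name}: {confidence:.2f} at {box.xyxy.tolist()}")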
“This is the most significant revision of the Inference benchmark suite we have ever done,” said Frank Han, technical staff, Systems Development Engineering at Dell Technologies and co-chair of the MLPerf Inference Working Group. “The decision to update so many benchmarks in this round was driven by the extraordinary enthusiasm and collaboration of our members, who have contributed an unprecedented amount of engineering effort and IP to building new inference benchmarks. Adding these new tests will help MLPerf Inference keep pace with the rapid evolution of AI models and techniques, so that our benchmarks remain relevant and representative of real-world deployments.”
The open-source MLPerf Inference benchmark suite measures system performance in an architecture-neutral, representative, and reproducible manner. The aim is to create a level playing field for competition that drives innovation, performance and energy efficiency for the entire industry. The published results provide critical technical information for customers purchasing and tuning AI systems.
“We thank Meta, Shopify and Ultralytics for their substantial collaboration with us in implementing these changes to the MLPerf Inference benchmark suite and for contributing their datasets, task definitions and expertise,” said Miro Hodak, senior member of the technical staff at AMD and co-chair of the MLPerf Inference Working Group. “These partnerships were essential to ensure that testing included scenarios and workloads that represented the current state of the industry.”
“MLPerf Inference benchmarks play a critical role in driving transparency and accountability in the AI industry,” said Glenn Jocher, CEO and founder of Ultralytics. “At Ultralytics, rigorous, reproducible benchmarking is central to how we develop and validate our Ultralytics YOLO models, so that developers and organizations can make informed decisions about real-world performance. We are proud to be part of an ecosystem that holds the entire field to a higher standard.”
“Commerce is one of the most complex domains in AI, but researchers rarely have data that reflects this complexity,” said Kshetrajna Raghavan, principal engineer, Applied ML at Shopify. “Shopify is uniquely positioned to address this because it sits at the intersection of millions of sellers and billions of products. Sharing this taxonomy will help the entire field evolve.”
New tools for submitters and consumers of results
With Inference 6.0, submitters have the option to use a newly available harness to run the benchmark tests. The new system, LoadGen++, allows LLM benchmarks to run against a server-style software stack, mirroring how typical deployments are implemented today. “LoadGen++ is a major upgrade over its predecessor and represents a significant investment by MLCommons that will allow us to remain agile as we continue to produce benchmark tests that follow the state of the art,” said Han.
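To illustrate what server-style serving means in practice (and not LoadGen++ itself, whose interface is not described in this announcement), the sketch below posts a completion request to a hypothetical OpenAI-compatible HTTP endpoint of the kind modern LLM serving stacks expose, and times the response. The URL and model name are placeholders.

# Hedged illustration of "server-style" LLM serving: the system under
# test exposes an HTTP inference endpoint (here an OpenAI-compatible
# /v1/completions route, as served by common LLM serving stacks), and
# a load generator posts requests and measures latencies. This is NOT
# the LoadGen++ API; the endpoint URL and model name are placeholders.
import time
import requests

ENDPOINT = "http://localhost:8000/v1/completions"  # placeholder URL

def send_query(prompt: str) -> float:
    """POST one completion request and return its latency in seconds."""
    start = time.perf_counter()
    resp = requests.post(ENDPOINT, json={
        "model": "example-model",   # placeholder model name
        "prompt": prompt,
        "max_tokens": 64,
    })
    resp.raise_for_status()
    return time.perf_counter() - start

latencies = [send_query(p) for p in ["What is 2 + 2?", "Name a prime."]]
print(f"mean latency: {sum(latencies) / len(latencies):.3f}s")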
In addition, Inference 6.0 results can be viewed in a new online dashboard on the MLCommons site: https://mlcommons.org/visualizer. The dashboard brings a new level of interactivity to viewing results, including advanced filtering and custom performance graphs.
Large-scale multi-node systems in the spotlight
The submissions for Inference 6.0 show that technology vendors are looking to demonstrate the performance of scaled-up, multi-node systems running real-world inference workloads. This round recorded a new high for multi-node system submissions, up 30% from the Inference 5.1 benchmark six months ago. Additionally, 10% of all submitted systems in Inference 6.0 had more than ten nodes, compared to just 2% in the previous round. The largest system submitted in Inference 6.0 contained 72 nodes and 288 accelerators, quadrupling the number of nodes in the largest system in the previous round.
“As more AI applications have gone into production and become widely available, the demand for large-scale, high-performance systems to run them has increased,” said Hodak. “At the same time, multi-node systems present a unique set of technical challenges beyond those of single-node systems, requiring configuration and optimization of system architectures, network connections, data storage, and software layers. Stakeholders are eager to address these challenges and run inference workloads at scale.”
The AI community continues to embrace and invest in MLPerf Inference
The MLPerf Inference 6.0 benchmark received entries from a total of 24 participating organizations: AMD, ASUSTeK, Cisco, CoreWeave, Dell, GATEOverflow, GigaComputing, Google, Hewlett Packard Enterprise, Intel, Inventec Corporation, KRAI, Lambda, Lenovo, MangoBoost, MiTAC, Nebius, Netweb Technologies India Limited, NVIDIA, Oracle, Quanta Cloud Technology, Red Hat, Stevens Institute of Technology and Supermicro.
“I would like to welcome our first-time submitters, Inventec Corporation, Netweb Technologies India Limited and Stevens Institute of Technology,” said Han. “The AI ecosystem is large and diverse, and continues to grow and evolve rapidly. On behalf of MLCommons, I would also like to thank our members, our contributors, and our partners, including Meta, Shopify, and Ultralytics, for working with us to build and advance the most comprehensive and relevant performance benchmark suite for AI inference. Together, we ensure stakeholders in our community have valuable, actionable information to help them make better decisions.”
View the results
To view the results for MLPerf Inference v6.0, visit the benchmark results dashboard at https://mlcommons.org/visualizer.
About MLCommons
MLCommons is the global leader in AI benchmarking. MLCommons is an open engineering consortium supported by more than 130 members and affiliates, with a proven track record of bringing together academia, industry and civil society to measure and improve AI. The foundation for MLCommons began with the MLPerf benchmarks in 2018, which quickly grew into a set of industry-standard metrics for measuring machine learning performance and promoting transparency in machine learning techniques. Since then, MLCommons has continued to use collective engineering to develop the benchmarks and metrics needed for better AI – ultimately helping to evaluate and improve the accuracy, safety, speed and efficiency of AI technologies.
For more information about MLCommons and details on how to become a member, visit MLCommons.org or email participation@mlcommons.org.
Press inquiries: contact press@mlcommons.org

