SolidityBench by IQ has been launched as the first leaderboard for evaluating LLMs on Solidity code generation. Available on Hugging Face, it introduces two benchmarks, NaiveJudge and HumanEval for Solidity, designed to assess and rank how well AI models generate smart contract code.
Developed by IQ's BrainDAO as part of the upcoming IQ Code suite, SolidityBench is intended to refine and compare the team's proprietary EVMind LLMs against generalist and community-created models. IQ Code aims to provide AI models tailored to generating and auditing smart contract code, meeting the growing need for secure and efficient blockchain applications.
As IQ told CryptoSlate, NaiveJudge takes a new approach by tasking LLMs with implementing smart contracts from detailed specifications derived from audited OpenZeppelin contracts, which serve as a gold standard for correctness and efficiency. The generated code is evaluated against a reference implementation using criteria such as functional completeness, compliance with Solidity best practices and security standards, and optimization efficiency.
The review process uses state-of-the-art LLMs, including several versions of OpenAI's GPT-4 and Claude 3.5 Sonnet, as unbiased code reviewers. They grade the code against strict criteria, including implementation of all major functionality, handling of edge cases, error management, correct syntax, and overall code structure and maintainability.
Optimization considerations such as gas efficiency and storage management are also evaluated. Scores range from 0 to 100 and provide a comprehensive assessment of functionality, security, and efficiency, reflecting the complexity of professional smart contract development.
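To make the judging flow concrete, here is a minimal sketch of the LLM-as-judge pattern described above, written against the OpenAI Node SDK. The rubric wording, the gradeContract helper, and the single-integer scoring format are illustrative assumptions; IQ has not published NaiveJudge's exact prompts or weighting here.

```typescript
import OpenAI from "openai";

const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

// Hypothetical rubric mirroring the criteria described above; NaiveJudge's
// actual prompt and score weighting may differ.
const RUBRIC = `You are an impartial Solidity code reviewer.
Score the candidate implementation from 0 to 100 against the reference, considering:
- functional completeness (all functionality, edge cases, error handling)
- Solidity best practices and security standards
- optimization (gas efficiency, storage management)
Reply with a single integer.`;

async function gradeContract(reference: string, candidate: string): Promise<number> {
  const completion = await client.chat.completions.create({
    model: "gpt-4o", // any of the judge models could be substituted here
    temperature: 0,
    messages: [
      { role: "system", content: RUBRIC },
      {
        role: "user",
        content: `Reference implementation:\n${reference}\n\nCandidate implementation:\n${candidate}`,
      },
    ],
  });
  const raw = completion.choices[0].message.content ?? "0";
  const score = parseInt(raw, 10);
  return Number.isNaN(score) ? 0 : Math.min(100, Math.max(0, score)); // clamp to 0-100
}
```

In practice, a harness like this would average scores from several judge models to reduce single-reviewer bias.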
Which AI models are best for developing Solidity smart contracts?
Benchmark results showed that OpenAI’s GPT-4o model achieved the highest overall score of 80.05, with a NaiveJudge score of 72.18 and HumanEval for Solidity pass rates of 80% at pass@1 and 92% at pass@3.
Interestingly, OpenAI’s newer reasoning models, o1-preview and o1-mini, were beaten into second and third place, with scores of 77.61 and 75.08 respectively. Models from Anthropic and xAI, including Claude 3.5 Sonnet and Grok-2, showed competitive performance, with overall scores hovering around 74. Nvidia’s Llama-3.1-Nemotron-70B scored lowest in the top 10, at 52.54.
Per IQ, HumanEval for Solidity adapts OpenAI’s original HumanEval benchmark from Python to Solidity, and includes 25 tasks of varying difficulty. Each task includes corresponding tests compatible with Hardhat, a popular Ethereum development environment, which allows accurate compilation and testing of the generated code. The evaluation metrics, pass@1 and pass@3, measure the model’s success on first attempts and over multiple attempts, providing insight into both accuracy and problem-solving ability.
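For readers unfamiliar with the metric, the standard unbiased pass@k estimator from the original HumanEval paper can be computed as in the sketch below. Assuming SolidityBench uses the same estimator (the article only names the metrics), a model's leaderboard number would be this value averaged over the 25 tasks.

```typescript
// Unbiased pass@k estimator (Chen et al., 2021): pass@k = 1 - C(n - c, k) / C(n, k),
// where n samples were generated for a task and c of them passed the Hardhat tests.
// Computed as a numerically stable product instead of raw binomial coefficients.
function passAtK(n: number, c: number, k: number): number {
  if (n - c < k) return 1; // every size-k draw from the n attempts contains a passing one
  let allFail = 1;
  for (let i = n - c + 1; i <= n; i++) {
    allFail *= 1 - k / i;
  }
  return 1 - allFail;
}

// Example: 3 attempts on a task, 1 of them passing -> pass@1 ≈ 0.33, pass@3 = 1.00
console.log(passAtK(3, 1, 1).toFixed(2), passAtK(3, 1, 3).toFixed(2));
```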
Objectives of using AI models in smart contract development
By introducing these benchmarks, SolidityBench aims to promote the AI-enabled development of smart contracts. It encourages the creation of more advanced and reliable AI models and provides developers and researchers with valuable insights into the current capabilities and limitations of AI in Solidity development.
The benchmarking toolkit aims to advance IQ Code’s EVMind LLMs and to set new standards for AI-enabled smart contract development in the blockchain ecosystem. The initiative hopes to address a critical need in the industry, where demand for secure and efficient smart contracts continues to grow.
Developers, researchers, and AI enthusiasts are invited to explore and contribute to SolidityBench, which aims to drive the continued refinement of AI models, advance best practices, and improve decentralized applications.
Visit the SolidityBench leaderboard on Hugging Face for more information and to start benchmarking Solidity generation models.