Benchmarking AI News & Research

🤗 AI Labs Hugging Face Blog 6 min read

Adding Benchmaxxer Repellant to the Open ASR Leaderboard

#asr #benchmarking #datasets

🕐 7 days ago

📥 Newsletters Import AI 13 min read

Import AI 453: Breaking AI agents; MirrorCode; and ten views on gradual disempowerment

Welcome to Import AI, a newsletter about AI research. Import AI runs on arXiv and feedback from readers. If you’d like to support this, please subscribe. A shorter issue than…

#AI agents #code generation #software engineering

🕐 30 days ago

📥 Newsletters Import AI 16 min read

ImportAI 449: LLMs training other LLMs; 72B distributed training run; computer vision is harder than generative text

Welcome to Import AI, a newsletter about AI research. Import AI runs on arXiv and feedback from readers. If you’d like to support this, please subscribe. Can LLMs…

#LLMs #post-training #AI autonomy

🕐 a month ago

🐍 Newsletters AI Snake Oil 10 min read

New Paper: Towards a science of AI agent reliability

Quantifying the capability-reliability gap

#AI agents #reliability #measurement

🕐 2 months ago

📥 Newsletters Import AI 12 min read

Import AI 446: Nuclear LLMs; China’s big AI benchmark; measurement and AI policy

Welcome to Import AI, a newsletter about AI research. Import AI runs on arXiv and feedback from readers. If you’d like to support this, please subscribe. Want to…

#AI Policy #AI Measurement #Governance

🕐 2 months ago
