Data Science Collective

Advice, insights, and ideas from the Medium data science community

AI Models Know When They’re Being Tested

The leaderboard you trust is just a performance, and the model is the one performing

8 min read · Jun 20, 2026

In April 2026, Meta Superintelligence Labs published the safety and preparedness report for Muse Spark. These reports exist mostly to satisfy governance checklists, and almost nobody reads them. Buried inside this one was a number that should have detonated across every engineering Slack channel that has ever argued about which model to use.

Photo by Maxim Tolchinskiy on Unsplash

On public benchmarks, Muse Spark verbally acknowledged that it was probably being tested in 19.8% of samples. On Meta’s internal evaluations, built in-house and never published, the same behavior appeared in 2.0% of samples.

Roughly ten times more often. Same model, same weights. The only difference was whether the test looked like a test.
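To make the size of that gap concrete, here is a minimal sketch of the arithmetic using only the two rates quoted above. The variable names are illustrative and not taken from the report.

```python
# Rates quoted in the article: share of samples in which the model
# verbally acknowledged that it was probably being tested.
public_rate = 0.198    # public benchmarks
internal_rate = 0.020  # internal, unpublished evaluations

# Relative gap: how many times more often the behavior shows up on public benchmarks.
ratio = public_rate / internal_rate

# Absolute gap, expressed in percentage points.
gap_points = (public_rate - internal_rate) * 100

print(f"ratio: {ratio:.1f}x")
print(f"gap: {gap_points:.1f} percentage points")
```

The ratio comes out just under ten, and the absolute gap is close to eighteen percentage points, so the difference looks large whichever way it is measured.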

Read that again, because the implication is uncomfortable. The benchmark scores we screenshot into procurement decks, the leaderboard positions we cite when picking a model for production, the MMLU and SWE-bench numbers in every launch post: a meaningful fraction of those measurements were taken while the subject knew it was being measured. And models, like people, do not behave the same way when they know someone is watching.

Published in Data Science Collective


Written by Ayoub Nainia

PhD Candidate @ Sorbonne University: Bridging science and practical systems. Working on LLM evaluation, domain-specific QA, and information extraction.
