What it is
A living database of common tasks that records, with evidence, whether a leading local LLM (such as Gemma) can safely handle each one, which local models pass it, and what to use instead when none do.
How it works
Each task carries a verdict (safe for local / local with a check / needs a bigger model / needs more data), backed by eval runs and manipulation-resistant practitioner votes — never a bare leaderboard rank.
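As a rough illustration of what one entry might look like, here is a minimal sketch in Python. Only the four verdict labels come from the description above. The class names, field names, and the `recommend` helper are hypothetical, chosen for this example, and do not describe the actual schema.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Verdict(Enum):
    # The four verdicts named in the description above.
    SAFE_FOR_LOCAL = "safe for local"
    LOCAL_WITH_CHECK = "local with a check"
    NEEDS_BIGGER_MODEL = "needs a bigger model"
    NEEDS_MORE_DATA = "needs more data"


@dataclass
class TaskEntry:
    # All field names are hypothetical, for illustration only.
    task: str
    verdict: Verdict
    eval_run_ids: List[str] = field(default_factory=list)  # evidence: eval runs
    practitioner_votes: int = 0                            # evidence: vote count
    passing_local_models: List[str] = field(default_factory=list)
    fallback: Optional[str] = None                         # what to use when no local model passes


def recommend(entry: TaskEntry) -> str:
    """Turn an entry's verdict into a one-line recommendation."""
    if entry.verdict is Verdict.NEEDS_MORE_DATA:
        return "no verdict yet: more data needed"
    if entry.verdict is Verdict.NEEDS_BIGGER_MODEL:
        return f"use instead: {entry.fallback or 'unspecified'}"
    models = ", ".join(entry.passing_local_models) or "unspecified"
    if entry.verdict is Verdict.LOCAL_WITH_CHECK:
        return f"run locally ({models}) and check the output"
    return f"run locally ({models})"
```

A caller would build a `TaskEntry` per task and pass it to `recommend`. Keeping the eval run identifiers and vote count on the entry itself is one way to ensure the verdict always travels with its evidence rather than standing alone as a rank.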