CC-Canary is what should come with every agent product
Anthropic's April 23 postmortem said Claude Code had a regression and nobody caught it cleanly from the outside. CC-Canary, shipped two days later, is exactly that missing piece. It is a pair of installable Claude Code skills that walk through your session logs at ~/.claude/projects/, deduplicate assistant turns using the same scheme as ccusage, and compute daily health metrics. Read-to-edit ratio. Write share of mutations. Reasoning loops per 1k tool calls. Thinking redaction rates. Mean thinking length.
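A minimal sketch of the metric-computation idea, to make it concrete. The tool names, the dedup key, and the record shape here are all assumptions for illustration, not CC-Canary's actual schema:

```python
from collections import Counter

# Hypothetical tool-call names; the real session-log schema is an assumption.
READ_TOOLS = {"Read", "Grep", "Glob"}
EDIT_TOOLS = {"Edit", "Write"}

def daily_metrics(turns):
    """Compute a read-to-edit ratio from deduplicated assistant turns.

    `turns` is a list of dicts like {"id": ..., "tool": ...}; turns sharing
    an id are counted once (a crude stand-in for ccusage-style dedup).
    """
    seen = set()
    counts = Counter()
    for t in turns:
        key = t.get("id")
        if key in seen:
            continue
        seen.add(key)
        tool = t.get("tool")
        if tool in READ_TOOLS:
            counts["reads"] += 1
        elif tool in EDIT_TOOLS:
            counts["edits"] += 1
    edits = counts["edits"] or 1  # avoid division by zero on edit-free days
    return {
        "reads": counts["reads"],
        "edits": counts["edits"],
        "read_to_edit": counts["reads"] / edits,
    }

turns = [
    {"id": "a1", "tool": "Read"},
    {"id": "a1", "tool": "Read"},   # duplicate turn, counted once
    {"id": "a2", "tool": "Grep"},
    {"id": "a3", "tool": "Edit"},
]
print(daily_metrics(turns))  # {'reads': 2, 'edits': 1, 'read_to_edit': 2.0}
```

The same pattern extends to the other metrics: one pass over deduplicated turns, one counter per signal, one ratio per day.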
The output is a composite score per day with an inflection detector that labels each day as HOLDING, SUSPECTED REGRESSION, CONFIRMED REGRESSION, or INCONCLUSIVE. Pre-rendered as markdown or interactive HTML. Runs locally. Zero network calls. Zero telemetry. No dashboards to log into.
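One way such an inflection detector could work, sketched under stated assumptions: compare today's composite score to a trailing baseline, and escalate from SUSPECTED to CONFIRMED only when the drop persists. The thresholds (15% drop, two consecutive bad days, five-day baseline) are illustrative, not CC-Canary's actual logic:

```python
from statistics import mean

def label_day(history, today, min_history=5, drop=0.15, confirm=2):
    """Label today's composite score against a trailing baseline.

    `history` is the list of prior daily scores, `today` is the current one.
    All thresholds here are illustrative assumptions.
    """
    if len(history) < min_history:
        return "INCONCLUSIVE"          # not enough data to judge
    baseline = mean(history[-min_history:])
    threshold = baseline * (1 - drop)
    if today >= threshold:
        return "HOLDING"
    # Count how many recent days (including today) fall below the threshold.
    recent_bad = sum(s < threshold for s in history[-(confirm - 1):]) + 1
    return "CONFIRMED REGRESSION" if recent_bad >= confirm else "SUSPECTED REGRESSION"

scores = [0.82, 0.80, 0.81, 0.79, 0.83]
print(label_day(scores, 0.81))  # HOLDING
print(label_day(scores, 0.60))  # SUSPECTED REGRESSION
```

A one-day dip stays SUSPECTED; only a sustained drop gets promoted to CONFIRMED, which is the property you want from any drift alarm that people will actually trust.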
The repo is tiny (11 stars, 0.x pre-alpha, pure Python), but the pattern is the point. Third parties building drift detection against frontier agent products is a new job description. Anthropic has its own eval suite. Users don't. Until now the best you could do was scream on Twitter or cancel your plan, which is exactly what Nicky Reinert's 693-point HN post just did to Anthropic this morning.
This is the shape of the agent observability layer forming up. Not dashboards for AI teams to ship internally. Shipping tools for AI users to catch model behavior shifts on their own machines, in their own workflows, with their own numbers. Every agent product should have something like CC-Canary built against it within 60 days of launch.
https://github.com/delta-hq/cc-canary