The quality layer for AI skills, agents, and workflows.

Audit any skill, agent, or tool setup for quality, efficiency, and safety. One artifact or the whole repo. Same input, same score.

audit / sample-skill
91 · EXCELLENT
Overall score
8 dimensions · 99 checks · repeatable
100 − 4 (1 High) − 5 (2 Medium) = 91
1 High · 2 Medium
1/11 skills missing allowed-tools field
Token bloat in references/ (~400 tokens)
Outdated tool definition in 2 skills

Registries list what exists. Stars rank popularity. Scanners check for malware.
Nobody checks whether the thing is actually well built.

We're the layer that does.

01 · For teams & internal builders

Make sure your setup performs.

Treat your AI stack like the rest of your code — reviewed, measured, repeatable.

findings · your-repo/
broken ref: scripts/foo.py
4/11 skills missing allowed-tools
−1,540 tokens redundant
Broken paths · Token waste · Private by default
02 · For creators sharing skills and agents

Ship something you're proud of.

Pre-publish review, a verified badge, and a public profile once you're ready.

91
Ready to ship
agentquality.io/b/your-skill
![Agent Quality](…/b/your-skill.svg)
Verified badge · Re-audit on release · No repo? Upload a zip
03 · For people installing what they didn't write

Check before you trust.

You can't verify every shared skill by hand. We do.

Buyer's guide
Good for
solo devs · prototyping
Not for
team infra · prod
Where it's rough
3 broken refs · token bloat
Risk surface · Buyer's Guide · Browse the directory

Four questions. Every audit answers all of them.

99 checks. 8 dimensions. Repeatable.

VALID

Is it valid?

We check: Structure, metadata, spec compliance — name ≤ 64 chars, description ≤ 1024 chars, required fields present.

Why it matters

Progressive disclosure means only frontmatter loads initially [4]. Get it wrong and Claude Code may never discover or load the skill. Hours of work that never fire.
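The frontmatter constraints above can be sketched as a check. A minimal sketch, assuming YAML-style frontmatter at the top of a SKILL.md and a required-field set of `name` and `description` (the real audit's rule set may differ, and a real checker would use a proper YAML parser):

```python
import re

REQUIRED_FIELDS = {"name", "description"}  # assumed minimum
LIMITS = {"name": 64, "description": 1024}  # spec limits named above

def validate_frontmatter(skill_md: str) -> list[str]:
    """Return a list of findings for the frontmatter block of a skill file."""
    findings = []
    match = re.match(r"^---\n(.*?)\n---", skill_md, re.DOTALL)
    if not match:
        return ["no frontmatter block found"]
    # naive key: value parsing, enough for flat frontmatter
    fields = {}
    for line in match.group(1).splitlines():
        if ":" in line:
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
    for field in sorted(REQUIRED_FIELDS - fields.keys()):
        findings.append(f"missing required field: {field}")
    for field, limit in LIMITS.items():
        if field in fields and len(fields[field]) > limit:
            findings.append(f"{field} exceeds {limit} chars")
    return findings
```

An empty findings list means the skill is at least discoverable; every entry maps to a reason Claude Code could skip or truncate it.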

EFFICIENT

Is it efficient?

We check: Token cost, redundancy, bloat, reference hygiene.

Why it matters

Every byte loaded burns context and money. Bloated skills crowd out the real work.

SAFE

Is it safe?

We check: Permissions declared, tools justified, risk surface visible, no unexpected access.

Why it matters

Snyk found prompt injection in 36% of audited skills, and supply-chain studies have surfaced 1,467 malicious payloads [3].

SOLID

Is it solid?

We check: Broken file paths, missing dependencies, dead references.

Why it matters

The #1 silent failure: a skill references scripts/foo.py that doesn't exist. Looks fine until it runs.

How it works

Standardized. Repeatable. Explained.

01

Submit

Paste a GitHub URL, a single skill file, a repo path, or upload a zip.

02

Review

Checks run per-element across every skill, agent, and script — not a single blanket pass.

03

Score

0–100 with a breakdown, findings pinned to where they live, and a Buyer's Guide verdict.

Paid audits include ready-to-paste fix prompts you can hand straight to Claude Code.

Your whole repo, mapped

Every skill, every tool, every external link — one frame.

See risk, redundancy, and broken paths at a glance.

architecture · sample-repo
tool use · external call
Why re-audit

Even a great skill built today gets stale fast.

Claude Code ships new features every few weeks [5]. Skills that don't adapt become the slow, expensive, deprecated way.

New capabilities
Features and patterns your skill should start using.
Deprecations
Outdated tool definitions, legacy fields, old patterns.
Claim drift
What your skill says vs. what the platform now supports.

Every audit ends with a verdict you can act on.

No user reviews. No upvotes. A Buyer's Guide derived from the audit itself.

buyer's guide / sample-skill
Good for you if
rapid prototyping · solo devs · Python codebases
Not for you if
production pipelines · multi-language repos · team-shared infra
Where it's rough
broken file refs (3) · undeclared tools on 4/11 skills · token bloat in 2 files
And one more thing

Honesty.

Five-star READMEs. “Miracle” tools with no substance. Every claim gets checked against the code.

Verified
Claim supported by code evidence
Outdated
Was true, no longer accurate
Aspirational
Partially true, overstated
Misleading
Contradicted by evidence
Marketing
No evidence either way

Make sure your AI tools are actually ready.

One audit, under a minute. Specific issues, pointed fixes.

Free during closed beta · GitHub login