How to Build a Skill Your Agent Will Actually Use

Skills are now the open format every major coding agent reads. The same SKILL.md works in Claude Code, Codex, Cursor, Copilot, Gemini CLI, and roughly thirty others. The spec is tiny. The real work is writing one the model actually picks up and follows.

A skill is a folder. One Markdown file with a few lines of YAML at the top. That is the entire spec.

The hard part is not the spec.

I have been spending a lot of time inside Claude Code and Codex lately. I keep landing on the same realisation. The agent is only as good as the context you hand it, and skills are how you hand it that context.

If you have not run into the format yet, the structure is small enough to copy from memory.

my-skill/
├── SKILL.md          # required: metadata + instructions
├── scripts/          # optional: executable code
├── references/       # optional: docs the skill loads on demand
└── assets/           # optional: templates, lookup data, images

SKILL.md is the only required file. It needs YAML frontmatter with two fields, name and description. The body is plain Markdown. Everything else is optional. The full spec lives at agentskills.io and you can read it over a coffee.
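
A minimal SKILL.md, with the skill name and description invented for illustration:

---
name: changelog-writer
description: Drafts changelog entries from git history. Use when the user asks for release notes or a changelog.
---

Plain Markdown instructions for the agent go below the frontmatter.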

What makes the format interesting is what happens at runtime. The agent does not load every skill into context. It loads only the names and descriptions, then pulls the full SKILL.md only when it decides a skill is relevant. Anthropic call this progressive disclosure. It means you can have two hundred skills installed without paying a context tax for the one hundred and ninety-nine you are not using right now.
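
What the agent holds in context before any skill fires looks roughly like this. This is my mental model, not the literal wire format:

changelog-writer: Drafts changelog entries from git history. Use when the user asks for release notes or a changelog.
pdf-tools: Extracts text and tables from PDF files, fills PDF forms, and merges multiple PDFs.
(198 more name and description pairs; the full SKILL.md bodies stay on disk until one is judged relevant)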

The format started at Anthropic. It is now an open standard adopted by Claude Code, OpenAI Codex, Gemini CLI, GitHub Copilot, Cursor, VS Code, OpenHands, Goose, and roughly thirty other agents. A SKILL.md you write today works in all of them. That cross-tool portability is the part most people sleep on.

So the spec is small. The format is portable. Where does it go wrong?

Almost always in the writing. Five things I have picked up, partly the hard way and partly from reading Anthropic's own skill-creator skill more times than I would like to admit.

One. The description is the trigger. Write it pushy.

The agent decides whether to even consider your skill from the description alone. If you write "Helps with PDFs", you will get nothing. If you write "Extracts text and tables from PDF files, fills PDF forms, and merges multiple PDFs. Use when working with PDF documents or when the user mentions PDFs, forms, or document extraction", the trigger fires. The skill-creator team's own phrasing is to make descriptions "a little bit pushy". They are not joking.
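
In frontmatter terms, here is the difference, using a hypothetical pdf-tools skill:

description: Helps with PDFs.

versus:

description: Extracts text and tables from PDF files, fills PDF forms, and merges multiple PDFs. Use when working with PDF documents or when the user mentions PDFs, forms, or document extraction.

The first never fires. The second tells the agent what the skill does and when to reach for it.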

Two. Imperative beats heavy-handed MUSTs.

If you find yourself typing ALWAYS or NEVER in all caps, you are probably overcorrecting. Today's models have decent theory of mind. Tell them why something matters and they will generalise. Tell them MUST and they will follow the letter without the spirit. The official guide flags all-caps absolutes as a yellow flag for exactly this reason.
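
A before and after in that spirit, wording mine:

Before: NEVER push to main. ALWAYS open a pull request first. You MUST run the tests.

After: Work on a branch and open a pull request; main deploys on every push, so anything committed there ships immediately. Run the tests first, since a broken main blocks everyone else.

The second version carries the reason, so the model can apply it to the cases you did not enumerate.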

Three. Do not overfit to your test cases.

I have written skills that worked perfectly on the three examples I had open. The moment a fourth example showed up, the skill made things worse. The whole point is generality. If your test cases become constraints, the skill is broken.

Four. Bundle scripts that get re-invented.

If you watch a model doing the same thing in the same way three times in a row, that is a script. Move it to scripts/ and reference it from SKILL.md. The agent will run it instead of rewriting it from scratch every invocation.
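
A sketch of what that can look like. The script and its check are invented for illustration; the pattern is the point. SKILL.md gets one line pointing at the script:

To audit posts, run scripts/check_frontmatter.py <posts-dir> rather than re-implementing the check inline.

And scripts/check_frontmatter.py carries the repeated logic, in plain standard-library Python:

#!/usr/bin/env python3
"""Flag Markdown posts whose YAML frontmatter lacks a description field."""
import sys
from pathlib import Path

def has_description(text: str) -> bool:
    # Frontmatter is the block between the opening --- and the next --- line.
    if not text.startswith("---"):
        return False
    end = text.find("\n---", 3)
    if end == -1:
        return False
    front = text[3:end]
    return any(line.strip().startswith("description:") for line in front.splitlines())

def main() -> None:
    if len(sys.argv) != 2:
        sys.exit("usage: check_frontmatter.py <posts-dir>")
    missing = [p for p in sorted(Path(sys.argv[1]).glob("**/*.md"))
               if not has_description(p.read_text(encoding="utf-8"))]
    for p in missing:
        print(f"missing description: {p}")
    sys.exit(1 if missing else 0)

if __name__ == "__main__":
    main()

The agent runs the script and gets the same deterministic answer every time, instead of improvising a slightly different version of it each invocation.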

Five. Read transcripts, not just outputs.

The output told me everything was fine. The transcript showed the model burning four turns chasing a dead end my SKILL.md had planted. Half my edits since have been deletions, not additions. Remove anything that is not pulling its weight.

I tested all of this last week. I installed claude-blog, an open-source skill ecosystem for blog content, and pointed it at the seven posts on this site. The audit it ran caught four real issues I had not flagged in my own pass: a missing meta description on one post, zero internal links across the entire corpus, two clusters of stale content, no answer-first leads. It took me an hour to fix what the audit found. The skill itself is twenty-eight folders and one orchestrating SKILL.md.

That was the part that surprised me. The skill did not need to be smart. It needed to be specific. The format gave it permission to be both.

If you are going to build one, start small. One folder, one SKILL.md, one description that is unambiguous about when to trigger. Run it on a real task. Read the transcript. Cut what is not earning its place. Add a script the third time the model writes the same helper inline. (It is the same loop I followed rebuilding this site without being a developer. Build, watch, prune.)

The format is open. The toolchain is broad. The hardest part is taste.

If you are looking for a starting point, Anthropic's skill-creator is open source and worth reading end to end before you write your first one. The agentskills.io specification is the other tab to keep open. And if you write Markdown for the agent, not for yourself, you will get further than you think.

What is the best skill you have installed? I am filling out my own ~/.claude/skills/ this month, and a good recommendation beats anything you would find on a list.