Agentic Spec Writing: How I Use Claude and GitHub CLI to Turn Fuzzy Ideas into Phased Issues

December 03, 2025 • tech

ai-agents · claude · cursor · github-cli · spec-writing · process · tutorpro

Agentic Spec Writing with Claude and GitHub CLI

Just a couple of months ago, I was still doing a lot of “vibe coding.” I’d get a fuzzy feature idea in my head, vomit words straight into the editor, and hope I’d still remember what I was thinking when I first prompted.

Today, that’s completely flipped.

Every meaningful feature for TutorPro now starts the same way: with an AI agent that interviews me, crawls my codebase and docs, cross‑checks my standards, and then hands me a stack of small, well‑formed GitHub issues—each sized to be implemented in roughly 3–8 hours (human time).

The star of that show is my custom spec-writer command, built on top of Claude Code's slash command system and a set of "skills" that know how to interact with my repo, docs, and GitHub CLI.

This post is the deep dive on that spec-writing phase: how it works, what it feels like to use, and why it’s the thing that lets me move with both precision and velocity instead of just shipping faster chaos. Also: fair warning, there will be dad jokes. I have teenagers and a codebase; the puns are compulsory – the more embarrassing, the better.

Why Spec Writing Became the Bottleneck

When I began approaching AI as an entire distributed dev team—cloud agents, local agents, CI bots, and more (see this deep dive on agentic dev workflows)—I hit a classic scaling snag: my specs just weren’t cutting it.

  • I’d lump multiple concerns into one “feature.”
  • I’d under‑spec error handling, logging, and edge cases.
  • I’d forget to think through data model changes, security rules, or test strategy up front.

That was survivable when I was the only “developer” in the loop. But as soon as I started delegating work to multiple agents and tools, poorly defined specs translated directly into:

  • Agents timing out on giant, ambiguous tasks.
  • PRs that technically “worked” but ignored standards.
  • A growing sense that I was the bottleneck—not the tools.

The fix was simple but non‑negotiable: every feature gets decomposed into small, spec‑driven phases that respect my standards and constraints. That’s exactly what my spec-writer command enforces.

Instead of trying to hold everything in my head, I sit down with an AI spec writer that refuses to let me hand‑wave the hard parts. It drags the full picture out of me—architecture, UI, data, tests, logging, email rules, Firestore collections—before anyone (human or agent) writes a line of code.

I used to “ship and pray.” Now I “spec and pray less.”

Tooling & Prerequisites (But Vendor‑Agnostic)

Here’s the concrete stack I’m using today for this workflow:

  • Claude Code in Cursor with custom slash commands defined under .claude/commands/tutorpro/.
  • Claude skills under .claude/skills/ that encapsulate GitHub operations, refactoring discipline, database design, and UI testing.
  • A GitHub account and the GitHub CLI (gh) for creating issues and wiring everything into my repo's workflow.
  • A big set of project standards under docs/standards/ and meta‑guidance in AGENTS.md that define how everything should work.
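On disk, the Claude-specific pieces end up in a layout roughly like this (only the files mentioned in this post are shown; the real tree has more):

.claude/
├── commands/
│   └── tutorpro/
│       └── spec-writer.md
└── skills/
    ├── database-design.md
    ├── github.md
    ├── refactoring.md
    └── testing-user-interface.md
AGENTS.md
docs/
└── standards/
    ├── global/
    ├── frontend/
    ├── backend/
    └── testing/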

But while the implementation is very "TutorPro + Claude Code + GitHub," the pattern itself is portable:

  • Any LLM with custom tools/macros can play the "spec-writer" role (ChatGPT with custom GPTs or actions, Gemini with Tools, Sourcegraph Cody, etc.).
  • Any issue tracker with an API or CLI can stand in for GitHub (Jira, Linear, ClickUp, YouTrack).
  • Any well‑organized standards corpus (docs repo, Notion wiki, Confluence) can provide the constraints.

So think of this as:

LLM-powered spec assistant + standards corpus + CLI automation → phased issues

The specifics are mine. The pattern can be yours.

Minimal Prerequisites Checklist

To replicate the spirit of this setup, you need:

  • An LLM that can run custom commands/tools
  • A source of truth for standards
    • For me: AGENTS.md plus docs/standards/global|frontend|backend|testing/*.
  • A programmable issue tracker
    • I use GitHub + gh issue create (guided by .claude/skills/github.md).
  • A willingness to let the AI interrogate you
    • This is a collaboration, not a magic trick. You bring context; the agent brings structure.
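If you’re going the GitHub route, a thirty-second sanity check before your first run looks something like this (standard gh commands, nothing specific to my setup):

# confirm the GitHub CLI is installed and authenticated
gh --version
gh auth status

# confirm you're pointed at the repo where issues should land
gh repo view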

Inside the spec-writer Command

The spec-writer command lives in .claude/commands/tutorpro/spec-writer.md. It’s not a single prompt—it’s a full workflow.

At a high level, it runs through seven stages:

  1. Initial context gathering
  2. Feature decomposition into phases
  3. Interactive question round per phase
  4. Specification synthesis
  5. GitHub issue template creation
  6. GitHub issue creation via gh
  7. Summary of phases, dependencies, and estimates

Let me unpack those the way the command does.

Step 1: Initial Context Gathering

When I trigger /tutorpro:spec-writer with Claude or /tutorpro/spec-writer in Cursor chat, the command doesn’t immediately start writing specs. It first:

  • Asks me to classify the work:
    • New feature, enhancement, bug fix, refactor with behavior changes, etc.
  • Uses the Explore agent to find relevant files:
    • Components in src/, Cloud Functions in functions/, or tests in tests/ or functions/test/.
  • Inspects the live UI using the Chrome DevTools MCP:
    • Navigates to http://localhost:5173 (once I have npm run dev running).
    • Asks which role to log in as (tutor, parent, student, admin).
    • Takes snapshots and screenshots of the relevant parts of the app.
  • Reviews project context:
    • AGENTS.md for global rules.
    • docs/standards/global/tech-stack.md for platform constraints.
    • docs/standards/* for the right coding, API, validation, and testing rules.

The command file literally instructs the agent to inspect the UI, note existing patterns, and look for inconsistencies to avoid. It’s not guessing from vibes; it’s reading the same source of truth I would.
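One practical note: the live-UI inspection above assumes the dev server is running and the Chrome DevTools MCP server is registered with Claude Code. My setup is roughly the following; check the MCP server’s own README for the current invocation, since the exact arguments may change:

# start the local app the spec-writer will inspect
npm run dev

# register the Chrome DevTools MCP server with Claude Code
# (syntax per the server's README at the time I set this up)
claude mcp add chrome-devtools npx chrome-devtools-mcp@latest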

Step 2: Feature Decomposition into 3–8 Hour Phases

Next, the command forces a decomposition step.

It has built‑in guidance around what makes a good phase:

  • Roughly 3–8 hours of focused work.
  • Independently testable.
  • Provides incremental value on its own.
  • Has clear boundaries and no circular dependencies.

It also provides decomposition patterns right in the spec:

  • Data → UI → Features (schema → basic CRUD → advanced flows).
  • Foundation → Enhancements → Polish (MVP → UX improvements → edge cases/perf).
  • Role‑based Functional Spec (admin first, then tutors, then parents/students).

The spec-writer uses those patterns to propose phases, then asks me to confirm or adjust them before proceeding.

Step 3: Interactive Question Phase (Per Phase)

Once phases are sketched, the command goes phase by phase and asks only the questions that matter for the phase at hand:

  • What does this phase accomplish, specifically?
  • Who can access it (roles, permissions, authz rules)?
  • Where does it live in the UI (routes, components, navigation)?
  • What data is involved (inputs/outputs, validation, Firestore collections)?
  • What happens when it fails (error surfaces, logging, retries)?
  • How do we test it (critical scenarios, edge cases, roles)?

The command also knows when to pull in specialized skills:

  • Database changes? Use .claude/skills/database-design.md.
  • Refactor? Use .claude/skills/refactoring.md.
  • UI? Use .claude/skills/testing-user-interface.md.
  • GitHub operations? Use .claude/skills/github.md. (I’m exploring the GitHub MCP server as an alternative.)
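I won’t reproduce the real skill files here, but each one is just a focused markdown playbook the agent loads when it’s relevant. A hypothetical, heavily trimmed github.md might open something like this:

# GitHub Operations

## When to use this skill
Creating or updating issues, labels, and milestones during a spec-writing session.

## Issue creation rules
- Every phase issue includes Summary, Requirements, Testing, and Standards Compliance sections.
- Create issues with `gh issue create`; never paste bodies into the browser.
- Label each issue with `phase-N` plus a `feature/<area>` label and a priority.

The point is that the spec-writer never has to re-derive these conventions; it just loads the playbook.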

This is where the agent forces me to think about the whole picture:

“Okay Jeremy, if we add this Firestore collection, did you remember security rules, indexes, seed scripts, and schema docs?”

Step 4: Specification Synthesis

For each phase, the command synthesizes a concise spec with:

  • A short description of what this phase delivers.
  • A focused user story.
  • 3–6 concrete requirements.
  • Explicit notes on permissions, error handling, and UI states (if applicable).
  • Initial acceptance criteria and test scenarios.

It intentionally keeps each spec small and sharp. If a phase is bloated or ambiguous, the command tells me to break it down again.

Step 5–6: GitHub Issue Templates and gh issue create

Once the specs look right, the command moves into GitHub mode, using patterns from .claude/skills/github.md to:

  • Turn each phase spec into a GitHub issue body.
  • Include:
    • Summary
    • Requirements
    • Testing checklist
    • Standards compliance checklist
    • Related issues / dependencies
  • Generate CLI commands to create those issues via gh.

For example, a phase might become something like this under the hood:

gh issue create \
  --title "Phase 1: Implement basic session scheduling data model" \
  --label "phase-1,feature/scheduling,priority/medium" \
  --body "$(cat <<'EOF'
## What This Phase Delivers
Implements the core Firestore collections, indexes, and security rules needed for basic session scheduling, without any UI.

## Requirements
1. Define Firestore collections and document shapes for sessions.
2. Add required entries to functions/functions/data-operations.js COLLECTIONS.
3. Add or update Firestore security rules and composite indexes.
4. Update docs/reference/database/schema.md with the new structures.

## Testing
- [ ] Emulator tests pass for read/write scenarios.
- [ ] Security rules verified for tutor/parent/student roles.
- [ ] No existing tests broken.

## Standards Compliance
- [ ] Follows docs/standards/backend/models.md
- [ ] Follows docs/standards/backend/queries.md
- [ ] Follows docs/standards/global/validation.md

> Generated with Claude Code spec-writer
EOF
)"

The spec-writer doesn’t just hand me text; it generates and runs GitHub CLI commands like this automatically via Claude’s tools, so issues get created for me while I watch.

Step 7: Summary & Dependency Graph

Finally, the command presents a dependency‑ordered list of phases:

  • Phase 1 (Data & Security) → URL
  • Phase 2 (Basic UI) → URL
  • Phase 3 (Advanced Flows) → URL

It also highlights dependencies between them (e.g., “Phase 2 requires Phase 1”) and gives a rough total estimate window based on 3–8 hour chunks.
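When the run finishes, I’ll often double-check what actually landed before closing the loop. Something like this works, reusing the labels from the earlier example:

# list the freshly created phase issues for this feature
gh issue list --label "feature/scheduling" --state open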

At that point, I’ve gone from “I have a vague idea” to “I have a mini roadmap in GitHub with clear, testable units of work.” That’s where the magic really starts.

A Quick Conversation Snippet

Here’s a (lightly anonymized) example of how this feels in practice:

Me: /tutorpro/spec-writer — I want to add a new workflow that lets tutors propose session times and parents confirm them.
Spec Writer: Got it. Is this a new feature, an enhancement, or a refactor with behavior changes?
Me: New feature.
Spec Writer: Great. I’ll scan the existing scheduling and messaging code, plus your standards. Is the dev server running at http://localhost:5173 so I can inspect the UI?
Me: Yep.
Spec Writer: Which role should I log in as to see the relevant flows? (tutor, parent, student, admin)
Me: Tutor.
Spec Writer: Perfect. I see existing calendar patterns in the tutor dashboard and invitation flows I can mirror. I propose 3 phases: data & security, basic proposal/confirmation UI, and reminders & notifications. Does that match how you’re thinking about it?
Me: Yes, but let’s keep reminders out of scope for now.
Spec Writer: Cool, we’ll move reminders to a future phase. Also, I promise not to overbook your calendar—unlike your kids’ weekend activities.

It’s not “ask a single prompt, get perfect tickets.” It’s a guided interview that bakes your standards and context into the spec.

From Fuzzy Idea to Phased GitHub Issues

Let me walk through a typical session, end‑to‑end.

1. Start with the Fuzz

I sit down with a vague thought, like:

“Parents should be able to do X more easily.”

Historically, that’s where I’d jump straight into the editor. Now I:

  • Open Cursor in the TutorPro repo.
  • Trigger /tutorpro/spec-writer.
  • Paste in my high‑level idea and any relevant constraints I’m already aware of (roles, performance, rollout risk).

2. Let the Agent Gather Reality

The spec-writer:

  • Reads AGENTS.md to remember the “ALWAYS (NO EXCEPTIONS)” rules.
  • Skims docs/standards/global/* and the relevant frontend/backend/testing standards.
  • Uses the Explore / browser skills to inspect the current UI and code paths that touch the area I’m talking about.

It then reflects back what exists today, often catching things I forgot:

  • “You already have a similar pattern in component X.”
  • “Your email standards require support@tutorpro.kids as the sender.”
  • “All test data must use personally controlled domains.”

3. Co‑Design 2–4 Phases

We iterate on a set of phases, usually:

  • Phase 1: Data model + security rules + indexes.
  • Phase 2: Basic UI flows and validation (read/write).
  • Phase 3: Advanced workflows, notifications, or edge cases.

Each phase is trimmed down until it’s:

  • Independently shippable.
  • Independently testable.
  • Small enough that a human or AI agent can own it without drowning.

4. Lock in Specs, Requirements, and Tests

For each phase, the agent proposes:

  • User stories.
  • Requirements.
  • Error handling rules.
  • UI states (loading, empty, error, success).
  • Acceptance criteria and a few “Given/When/Then”‑style validation scenarios.
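For the scheduling example from earlier, one of those validation scenarios might read something like this (illustrative, not copied from a real issue):

Given a tutor has proposed a session time
When the parent confirms it from their dashboard
Then the session shows as confirmed for both the tutor and the parent
And the confirmation is persisted to the sessions collection in Firestore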

We tweak until it feels right, and then move on.

5. Materialize Issues via GitHub CLI

Finally, the spec-writer uses the GitHub skill to turn those specs into concrete gh commands and then runs them automatically for me.

Sometimes that means issuing them one at a time (roughly like this behind the scenes):

gh issue create \
  --title "Phase 2: Tutor proposal & parent confirmation UI" \
  --label "phase-2,feature/scheduling" \
  --assignee "@me" \
  --body "<full phase spec goes here>"

Sometimes it effectively chains multiple gh calls together in a mini shell session, conceptually like this:

PHASE1_URL=$(gh issue create --title "Phase 1: Scheduling data model" --body "<spec>" --label "phase-1,feature/scheduling,priority/medium")
PHASE2_URL=$(gh issue create --title "Phase 2: Tutor proposal UI" --body "<spec>" --label "phase-2,feature/scheduling,priority/medium")
PHASE3_URL=$(gh issue create --title "Phase 3: Parent confirmations & edge cases" --body "<spec>" --label "phase-3,feature/scheduling,priority/low")

echo "Created:"
echo "$PHASE1_URL"
echo "$PHASE2_URL"
echo "$PHASE3_URL"

Either way, the pipeline is the same:

Fuzzy idea → Structured conversation → Phases → Specs → GitHub issues → Implementation (by me or other agents).

The qualitative difference? I no longer wake up wondering “what was this ticket actually supposed to do?” I wrote it with an AI partner that refused to let me skip the thinking.

Precision and Velocity Through Constraints

The irony is that the more constraints I’ve added to this process, the faster I’ve gotten.

The spec-writer doesn’t just ask “what do you want built?” It continually pulls in:

  • Global standards from docs/standards/global/*
    • Coding style, validation, error handling, conventions.
  • Frontend guidelines
    • Component architecture, Tailwind usage, accessibility, responsiveness, logging.
  • Backend guidelines
    • API design, date/time handling, models, queries, logging.
  • Testing standards
    • How to write tests, manage resources, and run the right suites.
  • Email and data standards
    • Sender address rules, controlled domains for testing, seed/test script requirements.

All of that gets baked into the spec, so when an agent picks up a phase, it already knows:

  • Which files are likely to change.
  • What logging and error handling look like.
  • How tests should be structured.
  • What “done” actually means.

Concrete Example: Spec-Written Issue with Standards

Here’s a simplified sketch of the kind of issue body the spec-writer will output for a phase:

## What This Phase Delivers
Adds a read-only tutor dashboard view showing upcoming confirmed sessions, using existing layout and responsive patterns.

## Requirements
1. Fetch confirmed sessions for the logged-in tutor from Firestore using existing hooks.
2. Display sessions in a responsive list, following components in src/components/Dashboard.
3. Show loading, empty, and error states consistent with docs/standards/frontend/components.md.
4. Log errors using the shared logger utility (no direct console calls).

## Testing
- [ ] Unit tests for data fetching hook.
- [ ] UI tests covering loading/empty/error/happy paths.
- [ ] Verified on desktop and mobile breakpoints.

## Standards Compliance
- [ ] docs/standards/global/coding-style.md
- [ ] docs/standards/frontend/components.md
- [ ] docs/standards/frontend/css.md
- [ ] docs/standards/testing/test-writing.md

Multiply that by a few phases, and you see why this feels more like intentional architecture than “ticket stuffing.”

Alternatives and How to Adapt This Pattern

You don’t need my exact stack to steal this idea.

The core pattern is:

LLM-powered spec assistant + standards corpus + CLI/API → phased work items

There are multiple ways to get there:

  • Different LLMs / IDEs
  • Different standards homes
    • A docs/ folder like mine.
    • A Notion/Confluence space that the LLM can read.
    • A database or wiki of architecture decisions.
  • Different work trackers
    • Jira (via jira CLI or REST API).
    • Linear (via its GraphQL API or community CLIs).
    • ClickUp, Asana, or whatever your team uses.
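The GitHub-specific part of my setup is genuinely thin. If your tracker only exposes a REST API, the “materialize work items” step is one HTTP call per phase. Here’s a purely hypothetical sketch against a made-up endpoint (swap in your tracker’s real URL, auth, and payload shape):

# hypothetical endpoint and payload; consult your tracker's API docs
curl -X POST "https://tracker.example.com/api/v1/issues" \
  -H "Authorization: Bearer $TRACKER_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
        "title": "Phase 1: Scheduling data model",
        "description": "<full phase spec goes here>",
        "labels": ["phase-1", "feature/scheduling"]
      }'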

If you don’t want to roll your own from scratch, there are also prebuilt ecosystems that aim in a similar direction:

  • AgentOS – An open-source framework designed to make AI coding agents work like skilled developers instead of "interns in training." It provides configurable workflows that encode your team's standards, tech stack, and project-specific patterns, so agents follow your established practices instead of improvising.
  • GitHub Spec Kit – GitHub's own experiment in Spec‑Driven Development, where specs are first‑class and can drive implementations. Great inspiration if you want to put specs at the center of your GitHub workflow.
  • Kiro (Amazon) – An emerging, spec‑centric developer experience from Amazon that’s focused on guiding teams from prototype to production with an agentic, spec‑driven flow. At the time of writing, public links and tooling are still evolving, but it’s worth keeping an eye on as it matures.

The point isn’t that you must use my spec-writer. It’s that your future development velocity is directly tied to how you spec work—and how well your tools can understand and act on those specs.

The Architect’s Perspective

The biggest change for me isn’t just faster shipping. It’s the mindset shift.

I no longer think of specs as a chore I rush through so I can “get to the real work.” Specs are the work. They’re where I:

  • Make the hard tradeoffs explicit.
  • Encode my standards and architecture decisions.
  • Decide what’s in or out of scope for this phase.

AI then amplifies that effort:

  • It interviews me so I don’t forget critical details.
  • It cross‑references the code and docs so I stay consistent.
  • It emits clean, CLI‑ready commands so my issues are predictable.

I still produce plenty of code—but now I’m acting more like a senior architect orchestrating agents than a solo dev panic‑coding features.

And because this is the internet and I am, in fact, a dad:

  • Writing specs without this kind of structure is like assembling IKEA furniture without the manual—you might get a chair, but it’s probably a wobbly bookshelf.
  • A good spec is like a good bedtime story: everyone knows how it ends, and nobody wakes up confused in the middle.

If you’re curious where to start, don’t try to automate everything on day one. Pick one feature, one LLM, one set of standards, and one way to auto‑create issues. Let your own spec-writer interview you, and see how it feels to ship with both precision and velocity.

–Jeremy


Thanks for reading! I'd love to hear your thoughts.

Have questions, feedback, or just want to say hello? I always enjoy connecting with readers.
