by CaddyStack

AI agents don't learn from their mistakes.

SLOPE changes that.

A sprint scoring framework — inspired by golf — that turns every AI agent session into measurable, improvable data.

175 sprints and counting

AI Agents Are Powerful. But Chaotic.

Without structure, agent sprints become unpredictable. Work gets lost. Patterns repeat. Nobody knows if the team is improving.

Without SLOPE

ERROR: Agent crashed mid-sprint, no recovery point

$ git log --oneline | wc -l → 47 commits, no pattern

WARN: Same bug as 3 sprints ago

$ how many tests? → "not sure, maybe 200?"

ERROR: Context lost on compaction

$ what improved? → "hard to say"

... (scrolls for 500 more lines)

With SLOPE

Sprint 175 — Par 3 — Score: 3 (Par)

Fairways: 3/3 (100%) | GIR: 3/3 (100%)

Hazards: 1 (ordering bug) — logged & recoverable

Handicap trending: 2.1 → 1.2 (last 5)

Miss pattern: long (43%) — training focus set

Next: guided by yardage book + hazard map

175 Sprints | 3,333 Tests | 172 Endpoints | 50 Screens | 21 Rules | 48 Migrations

The Framework

Five dimensions. One scoring system.

Each letter maps to a measurable aspect of sprint execution. Together they create a complete picture of agent performance.

S

Sprint Lifecycle

Every sprint is a hole with a par. Ticket count determines expected difficulty. Score against par, not against perfection.

1-2 tickets = Par 3 | 3-4 tickets = Par 4 | 5+ tickets = Par 5
L

Leverage

Club selection maps to approach complexity. A driver for risky new infrastructure. A putter for trivial fixes. Declare your club before the shot.

Driver → Long Iron → Short Iron → Wedge → Putter
O

Operational Performance

Track hazards hit, miss directions, and penalties. Every gotcha gets logged. Patterns emerge from data, not memory.

Fairway % | GIR % | Putts | Penalties | Miss Direction
P

Predictive Engine

Rolling handicap averages predict future performance. Last-5 and last-10 windows show trajectory. Getting better or worse?

Handicap = avg(score - par) over rolling window
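
As a rough illustration of the formula above, here is a minimal Python sketch of a rolling handicap. The function name, the `(score, par)` tuple layout, and the sample rounds are assumptions made for this example, not SLOPE's actual implementation.

```python
from statistics import mean

def handicap(rounds, window):
    """Average of (score - par) over the most recent `window` sprints.

    `rounds` is a list of (score, par) tuples, oldest first.
    This layout is hypothetical and chosen only for illustration.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    recent = rounds[-window:]
    if not recent:
        raise ValueError("need at least one scored sprint")
    return mean(score - par for score, par in recent)

# Made-up sample data: six sprints as (score, par).
rounds = [(4, 3), (3, 3), (5, 4), (4, 4), (3, 3), (4, 3)]
last_5 = handicap(rounds, 5)    # differences 0, 1, 0, 0, 1 -> 0.4
last_10 = handicap(rounds, 10)  # only six rounds exist, so all are used -> 0.5
```

Comparing the short window to the long one answers the trajectory question: a last-5 value below the last-10 value suggests recent sprints are landing closer to par.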
E

Execution Routines

Pre-shot, post-shot, and 19th hole routines ensure nothing is skipped. Every sprint starts with the yardage book. Every ticket ends with scoring.

Pre-Round → Pre-Shot → Execute → Post-Shot → 19th Hole

Tee Box: Sprint Start | Fairway: Clean Path | Bunker: Known Gotcha | Water: Breaking Change | Green: Sprint Complete

The Metaphor

Why Golf?

In golf, you compete mainly against yourself and the course, not directly against an opponent. Software development is the same.

SLOPE borrows golf's vocabulary because it maps perfectly: known difficulty (par), personal benchmarking (handicap), hazard tracking (bunkers), and structured routines (pre-shot, post-shot) that elite players use to improve consistently.

Round = Sprint
Hole = Ticket
Par = Expected complexity
Club selection = Approach complexity
Fairway = Clean execution path
Bunker = Known gotcha
Water hazard = Breaking change
Handicap = Rolling performance average
Scorecard = Sprint retrospective
Yardage book = Codebase map
19th Hole = Post-sprint reflection

The Par System

Know your difficulty before you start. Par is determined by ticket count. Slope factors add terrain difficulty on top.

Tickets: 3 (range 1 to 8) | Par 4

Slope Factors (each adds +1 difficulty)

+1 Cross-package: Changes span multiple packages
+1 Schema migration: Database migration required
+1 New area: First time touching this code
+1 External dependency: Depends on external service
+1 Concurrent agents: Multiple agents working

Course Terrain: Moderate terrain, 3-4 tickets with slope factors
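
The par bands and slope factors above can be sketched in a few lines of Python. The function names and factor identifiers are hypothetical. Because the page does not say how slope combines with par, this sketch reports the two values side by side instead of adding them.

```python
# Slope factors listed on this page; the identifier spellings are assumptions.
SLOPE_FACTORS = {
    "cross_package",
    "schema_migration",
    "new_area",
    "external_dependency",
    "concurrent_agents",
}

def base_par(ticket_count):
    """Par from ticket count: 1-2 tickets = 3, 3-4 tickets = 4, 5+ tickets = 5."""
    if ticket_count < 1:
        raise ValueError("a sprint needs at least one ticket")
    if ticket_count <= 2:
        return 3
    if ticket_count <= 4:
        return 4
    return 5

def slope(factors):
    """Each distinct slope factor that applies adds +1 difficulty."""
    unknown = set(factors) - SLOPE_FACTORS
    if unknown:
        raise ValueError(f"unknown slope factors: {sorted(unknown)}")
    return len(set(factors))

# A 3-ticket sprint that needs a migration in unfamiliar code.
par = base_par(3)                                # 4
terrain = slope(["schema_migration", "new_area"])  # 2
```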

The Shot Cycle

Every ticket follows a routine. Preparation, execution, and review — so nothing falls through the cracks.

Pre-Shot

Before writing code

1. Read yardage book: Check codebase map for target files
2. Select club: Declare approach: driver → putter
3. Scan hazards: Check bunker locations from past sprints

Execute

Writing code

1. Write code: Implement the ticket on a feature branch
2. Run tests: Pre-PR checklist: typecheck + unit + e2e
3. Commit & push: Recovery point every 30 minutes

Post-Shot

After completing a ticket

1. Score the shot: Fairway? Green? In the hole? Miss?
2. Record hazards: Log gotchas for future yardage books
3. Log miss direction: Long? Short? Left? Right?
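
One way to picture what the post-shot routine captures is a small per-ticket record. The Python sketch below is an assumption about shape only: the class and field names are invented for illustration, and the rule that a miss must carry a direction is a guess at sensible validation, not documented SLOPE behavior.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional

class ShotResult(Enum):
    FAIRWAY = "fairway"
    GREEN = "green"
    IN_THE_HOLE = "in_the_hole"
    MISS = "miss"

class MissDirection(Enum):
    LONG = "long"
    SHORT = "short"
    LEFT = "left"
    RIGHT = "right"

@dataclass
class ShotRecord:
    """Hypothetical record of one ticket's post-shot scoring."""
    ticket: str
    result: ShotResult
    hazards: List[str] = field(default_factory=list)  # gotchas for future yardage books
    miss_direction: Optional[MissDirection] = None

    def __post_init__(self):
        # Assumed rule: a miss without a direction cannot feed the miss pattern.
        if self.result is ShotResult.MISS and self.miss_direction is None:
            raise ValueError("a miss needs a direction")

# Example: a ticket that reached the green but hit one known hazard.
shot = ShotRecord("T-1", ShotResult.GREEN, hazards=["ordering bug"])
```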

The Scorecard

Real data from real sprints. Every scorecard is computed automatically from ticket outcomes — no manual counting.

Live Scorecards (from caddystack.fly.dev)

Sprint | Theme | Par | Score | Result
#175 | demo exploration | 3 | 3 | Par
#174 | guided first-project | 3 | 4 | Bogey

Last 5 Rounds: 1.2 handicap | 85% FW | 78% GIR
Last 10 Rounds: 1.5 handicap | 82% FW | 75% GIR
All Time: 2.1 handicap | 76% FW | 70% GIR
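
The Result column follows from score minus par. The Python sketch below shows one possible mapping. Only "Par" and "Bogey" appear on this page. The other labels are standard golf terms, and assuming SLOPE uses them is a guess.

```python
# Standard golf names by strokes relative to par.
# Only "Par" and "Bogey" are confirmed by the scorecard on this page.
LABELS = {
    -2: "Eagle",
    -1: "Birdie",
    0: "Par",
    1: "Bogey",
    2: "Double Bogey",
}

def score_label(score, par):
    """Name a sprint result from its score relative to par."""
    diff = score - par
    # Fall back to a signed number for results outside the named range.
    return LABELS.get(diff, f"{diff:+d}")

sprint_175 = score_label(3, 3)  # "Par"
sprint_174 = score_label(4, 3)  # "Bogey"
```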

After 175 sprints

All-time handicap: 2.1 | Last 5 sprints: 1.2

Every hazard logged. Every miss direction tracked. The framework doesn't just measure — it teaches.

The Hazard Map

Track what trips you up. Every hazard is categorized, logged, and mapped so future sprints can avoid the same traps.

Bunker / Known Gotcha

A known issue logged in previous sprints. Sand trap — recoverable but costly.

Water / Breaking Change

Drop a stroke. Breaking changes force rework and downstream fixes.

OB / Scope Creep

Out of bounds. Work that extends beyond the ticket boundary.

Rough / Tech Debt

Playable but slower. Existing debt that drags down execution speed.

Trees / Blocking Deps

Can't take a direct shot. Dependencies that force detours.

Miss Direction Pattern

Long: 12 | Short: 8 | Left: 5 | Right: 3 (live data)
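
To show how a dominant miss falls out of these counts, here is a small Python sketch using the numbers above. The function name and dictionary layout are assumptions for illustration.

```python
def dominant_miss(counts):
    """Return (direction, share in percent) for the most frequent miss."""
    total = sum(counts.values())
    if total == 0:
        return None  # no misses logged yet
    direction = max(counts, key=counts.get)
    return direction, round(100 * counts[direction] / total)

# Miss counts as shown on this page.
misses = {"long": 12, "short": 8, "left": 5, "right": 3}
pattern = dominant_miss(misses)  # ("long", 43): 12 of 28 misses went long
```

With 28 misses in total, "long" at 43% is the direction to set as the training focus.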

The 19th Hole

In golf, the 19th hole is where you sit down after the round and talk about what happened. In SLOPE, it's where improvement actually happens.

Post-Round Reflection

1. Score the hole: Audit commits, compute final score vs par, assign a score label
2. Build the scorecard: Auto-computed stats: fairways, GIR, putts, penalties, miss directions
3. Distill learnings: New gotchas become bunker locations. Patterns become training drills.
4. Update the handicap: Rolling averages recalculate. The trajectory becomes visible.

What the Data Teaches

Dominant miss (live): Long. Overscoping is the #1 pattern.

When most misses go long, sprints are being underestimated. The fix: tighter scoping, more wedge shots, fewer drivers.

Handicap trajectory: 2.1 → 1.2

Consistent routines compound. The yardage book gets richer every sprint. Bunker locations prevent repeat mistakes.

Training program
Driving range = research
Chipping = feedback
Putting = tests
Lessons = specs

Start Scoring Your Sprints

SLOPE is an open methodology. Read the framework, adopt what fits, and start turning your agent sessions into measurable progress.

Development Progress (live)

Phase 1: Foundation | COMPLETE
Phase 2: First Conversation | COMPLETE
Phase 3: First Live Agent | COMPLETE
Phase 4: First Managed Sprint | COMPLETE
Phase 5: Production Ready | ~60%
Phase 6: Scale | NOT STARTED

SLOPE is the development methodology behind CaddyStack — a conversational AI platform for managing coding agents.

Built with Astro, Tailwind, and GSAP. Stats from the live CaddyStack API.