by CaddyStack

AI agents don't learn from their mistakes.

SLOPE changes that.

A sprint scoring framework — inspired by golf — that turns every AI agent session into measurable, improvable data.

175 sprints and counting

AI Agents Are Powerful. But Chaotic.

Without structure, agent sprints become unpredictable. Work gets lost. Patterns repeat. Nobody knows if the team is improving.

Without SLOPE

ERROR: Agent crashed mid-sprint, no recovery point

$ git log --oneline | wc -l → 47 commits, no pattern

WARN: Same bug as 3 sprints ago

$ how many tests? → "not sure, maybe 200?"

ERROR: Context lost on compaction

$ what improved? → "hard to say"

... (scrolls for 500 more lines)

With SLOPE

Sprint 175 — Par 3 — Score: 3 (Par)

Fairways: 3/3 (100%) | GIR: 3/3 (100%)

Hazards: 1 (ordering bug) — logged & recoverable

Handicap trending: 2.1 → 1.2 (last 5)

Miss pattern: long (43%) — training focus set

Next: guided by yardage book + hazard map

175 Sprints | 3,333 Tests | 172 Endpoints | Screens | Rules | Migrations

The Framework

Five dimensions. One scoring system.

Each letter maps to a measurable aspect of sprint execution. Together they create a complete picture of agent performance.

Sprint Lifecycle

Every sprint is a hole with a par. Ticket count determines expected difficulty. Score against par, not against perfection.

1-2 tickets = Par 3 | 3-4 tickets = Par 4 | 5+ tickets = Par 5
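The ticket thresholds above can be sketched as a small lookup. This is an illustrative sketch only: the function name `par_for_ticket_count` is hypothetical and not part of any published SLOPE tooling.

```python
def par_for_ticket_count(tickets: int) -> int:
    """Map a sprint's ticket count to its par, using the thresholds listed above."""
    if tickets < 1:
        raise ValueError("a sprint needs at least one ticket")
    if tickets <= 2:
        return 3  # 1-2 tickets = Par 3
    if tickets <= 4:
        return 4  # 3-4 tickets = Par 4
    return 5      # 5+ tickets = Par 5
```

For example, `par_for_ticket_count(3)` gives 4, so a three-ticket sprint is scored against Par 4.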

Leverage

Club selection maps to approach complexity. A driver for risky new infrastructure. A putter for trivial fixes. Declare your club before the shot.

Driver → Long Iron → Short Iron → Wedge → Putter

Operational Performance

Track hazards hit, miss directions, and penalties. Every gotcha gets logged. Patterns emerge from data, not memory.

Fairway % | GIR % | Putts | Penalties | Miss Direction

Predictive Engine

Rolling handicap averages predict future performance. Last-5 and last-10 windows show trajectory. Getting better or worse?

Handicap = avg(score - par) over rolling window
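A minimal sketch of the rolling handicap formula above, assuming each sprint is stored as a `(score, par)` pair in chronological order. The function name and the data layout are assumptions made for illustration.

```python
from typing import Sequence, Tuple


def rolling_handicap(rounds: Sequence[Tuple[int, int]], window: int) -> float:
    """Average of (score - par) over the most recent `window` sprints."""
    if window < 1:
        raise ValueError("window must be at least 1")
    recent = rounds[-window:]  # fewer than `window` sprints is fine
    if not recent:
        raise ValueError("no sprints to average")
    return sum(score - par for score, par in recent) / len(recent)


# The two scorecards shown on this page, oldest first:
# sprint 174 scored 4 on a par 3, sprint 175 scored 3 on a par 3.
sample = [(4, 3), (3, 3)]
last_two = rolling_handicap(sample, window=2)  # (1 + 0) / 2 = 0.5
```

Calling it with `window=5` and `window=10` on the same history gives the two trajectories to compare.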

Execution Routines

Pre-shot, post-shot, and 19th hole routines ensure nothing is skipped. Every sprint starts with the yardage book. Every ticket ends with scoring.

Pre-Round → Pre-Shot → Execute → Post-Shot → 19th Hole

The Metaphor

Why Golf?

Golf is one of the few sports where you compete against yourself and the course, not an opponent. Software development is the same.

SLOPE borrows golf's vocabulary because it maps perfectly: known difficulty (par), personal benchmarking (handicap), hazard tracking (bunkers), and structured routines (pre-shot, post-shot) that elite players use to improve consistently.

Golf term	SLOPE meaning
Round	Sprint
Hole	Ticket
Par	Expected complexity
Club selection	Approach complexity
Fairway	Clean execution path
Bunker	Known gotcha
Water hazard	Breaking change
Handicap	Rolling performance average
Scorecard	Sprint retrospective
Yardage book	Codebase map
19th Hole	Post-sprint reflection

The Par System

Know your difficulty before you start. Par is determined by ticket count. Slope factors add terrain difficulty on top.


Slope Factors (each adds +1 difficulty)

Cross-package: Changes span multiple packages

Schema migration: Database migration required

New area: First time touching this code

External dependency: Depends on external service

Concurrent agents: Multiple agents working

Course Terrain

Moderate terrain — 3-4 tickets with slope factors
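Counting slope factors could look like the sketch below. The identifiers are hypothetical, and the slope is reported separately from par because the page does not say whether it is added to par or tracked alongside it.

```python
from typing import Iterable

# The five slope factors listed above; the snake_case names are assumptions.
SLOPE_FACTORS = frozenset({
    "cross_package",
    "schema_migration",
    "new_area",
    "external_dependency",
    "concurrent_agents",
})


def slope_for(factors: Iterable[str]) -> int:
    """Return the slope: +1 for each distinct factor present in the sprint."""
    present = set(factors)
    unknown = present - SLOPE_FACTORS
    if unknown:
        raise ValueError(f"unknown slope factors: {sorted(unknown)}")
    return len(present)
```

A sprint that needs a schema migration in unfamiliar code would have `slope_for(["schema_migration", "new_area"])`, which is 2.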

The Shot Cycle

Every ticket follows a routine. Preparation, execution, and review — so nothing falls through the cracks.

Pre-Shot

Before writing code

Read yardage book

Check codebase map for target files

Select club

Declare approach: driver → putter

Scan hazards

Check bunker locations from past sprints

Execute

Writing code

Write code

Implement the ticket on a feature branch

Run tests

Pre-PR checklist: typecheck + unit + e2e

Commit & push

Recovery point every 30 minutes

Post-Shot

After completing a ticket

Score the shot

Fairway? Green? In the hole? Miss?

Record hazards

Log gotchas for future yardage books

Log miss direction

Long? Short? Left? Right?

The Scorecard

Real data from real sprints. Every scorecard is computed automatically from ticket outcomes — no manual counting.

Live Scorecards

from caddystack.fly.dev

Sprint	Theme	Par	Score	Result
#175	demo exploration	3	3	Par
#174	guided first-project	3	4	Bogey
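The Result column follows from score minus par. The sketch below uses standard golf names for the labels. Only Par and Bogey appear on this page, so the rest of the mapping is an assumption about how SLOPE names results.

```python
# Standard golf names for common differences from par.
LABELS = {
    -2: "Eagle",
    -1: "Birdie",
    0: "Par",
    1: "Bogey",
    2: "Double Bogey",
}


def score_label(score: int, par: int) -> str:
    """Name a sprint result from its score relative to par."""
    diff = score - par
    # Fall back to a signed number for results outside the named range.
    return LABELS.get(diff, f"{diff:+d}")
```

With the rows above, `score_label(3, 3)` gives "Par" for sprint 175 and `score_label(4, 3)` gives "Bogey" for sprint 174.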

Window	Handicap	Fairways (FW)	GIR
Last 5 rounds	1.2	85%	78%
Last 10 rounds	1.5	82%	75%
All time	2.1	76%	70%

After 175 sprints: 2.1 all-time handicap | 1.2 over the last 5 sprints

Every hazard logged. Every miss direction tracked. The framework doesn't just measure — it teaches.

The Hazard Map

Track what trips you up. Every hazard is categorized, logged, and mapped so future sprints can avoid the same traps.

Bunker / Known Gotcha

A known issue logged in previous sprints. Sand trap — recoverable but costly.

Water / Breaking Change

Drop a stroke. Breaking changes force rework and downstream fixes.

OB / Scope Creep

Out of bounds. Work that extends beyond the ticket boundary.

Rough / Tech Debt

Playable but slower. Existing debt that drags down execution speed.

Trees / Blocking Deps

Can't take a direct shot. Dependencies that force detours.
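One way to keep a hazard log is sketched below, with the five categories above as an enum. The record fields and names are hypothetical, and the page does not say which category the sprint 175 ordering bug was logged under.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Iterable


class Hazard(Enum):
    BUNKER = "known gotcha"
    WATER = "breaking change"
    OB = "scope creep"
    ROUGH = "tech debt"
    TREES = "blocking deps"


@dataclass(frozen=True)
class HazardEntry:
    sprint: int   # sprint number in which the hazard was hit
    kind: Hazard  # one of the five categories above
    note: str     # short free-text description


def hazards_by_kind(log: Iterable[HazardEntry]) -> Counter:
    """Count logged hazards per category so repeat offenders stand out."""
    return Counter(entry.kind for entry in log)


# Example entry; filing the ordering bug as a bunker is an assumption.
log = [HazardEntry(sprint=175, kind=Hazard.BUNKER, note="ordering bug")]
counts = hazards_by_kind(log)
```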

Miss Direction Pattern (live data)

Long: 12 Short: 8 Left: 5 Right: 3
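The counts above can be turned into percentage shares and a dominant miss with a few lines. This is a sketch with hypothetical function names, using the four counts shown on this page.

```python
from typing import Dict


def miss_shares(counts: Dict[str, int]) -> Dict[str, int]:
    """Convert raw miss counts into whole-number percentage shares."""
    total = sum(counts.values())
    if total == 0:
        return {direction: 0 for direction in counts}
    return {direction: round(100 * n / total) for direction, n in counts.items()}


def dominant_miss(counts: Dict[str, int]) -> str:
    """Return the direction with the most misses."""
    return max(counts, key=lambda direction: counts[direction])


misses = {"long": 12, "short": 8, "left": 5, "right": 3}
shares = miss_shares(misses)     # long is 12 of 28 misses, about 43%
dominant = dominant_miss(misses)  # "long"
```

Because each share is rounded independently, the percentages may not sum to exactly 100.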

The 19th Hole

In golf, the 19th hole is where you sit down after the round and talk about what happened. In SLOPE, it's where improvement actually happens.

Post-Round Reflection

Score the hole

Audit commits, compute final score vs par, assign a score label

Build the scorecard

Auto-computed stats: fairways, GIR, putts, penalties, miss directions

Distill learnings

New gotchas become bunker locations. Patterns become training drills.

Update the handicap

Rolling averages recalculate. The trajectory becomes visible.

What the Data Teaches

Dominant miss (live)

Long — overscoping is the #1 pattern

When most misses go long, sprints are being underestimated. The fix: tighter scoping, more wedge shots, fewer drivers.

Handicap trajectory

2.1 → 1.2

Consistent routines compound. The yardage book gets richer every sprint. Bunker locations prevent repeat mistakes.

Training program

Driving range: research

Chipping: feedback

Putting: tests

Lessons: specs

Start Scoring Your Sprints

SLOPE is an open methodology. Read the framework, adopt what fits, and start turning your agent sessions into measurable progress.

Read the SLOPE Framework

Development Progress (live)

Phase 1: Foundation COMPLETE

Phase 2: First Conversation COMPLETE

Phase 3: First Live Agent COMPLETE

Phase 4: First Managed Sprint COMPLETE

Phase 5: Production Ready ~60%

Phase 6: Scale NOT STARTED

SLOPE is the development methodology behind CaddyStack — a conversational AI platform for managing coding agents.

Built with Astro, Tailwind, and GSAP. Stats from the live CaddyStack API.