AI Development Fundamentals
Essential concepts for building reliable systems with LLMs.
Working with AI tools like Claude Code requires a different mental model than traditional programming. This guide covers the fundamental differences and how to build guardrails around non-deterministic systems.
Deterministic vs Non-Deterministic
Traditional Programming: Deterministic
In traditional software development, calling a pure function with the same input produces the same result every time:
```typescript
// Deterministic: Input A ALWAYS produces Output B
function calculateTotal(items: Item[]): number {
  return items.reduce((sum, item) => sum + item.price * item.quantity, 0);
}

calculateTotal([{ price: 10, quantity: 2 }]); // Always returns 20
calculateTotal([{ price: 10, quantity: 2 }]); // Always returns 20
calculateTotal([{ price: 10, quantity: 2 }]); // Always returns 20
```
Characteristics:
- Predictable: Same input = same output, every time
- Testable: Unit tests pass or fail consistently
- Debuggable: Step through code, inspect variables
- Reproducible: A bug that happens once can be reproduced on demand
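That predictability is what makes exact-value assertions possible. A minimal sketch, assuming `calculateTotal` from the example above lives in a hypothetical `./cart` module and a vitest-style runner (any test framework works the same way):

```typescript
import { describe, expect, it } from "vitest";
import { calculateTotal } from "./cart"; // hypothetical module exporting the function above

describe("calculateTotal", () => {
  it("returns the same total for the same input, every run", () => {
    const items = [{ price: 10, quantity: 2 }];
    expect(calculateTotal(items)).toBe(20); // exact equality, never flaky
  });
});
```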
LLM-Based Development: Non-Deterministic
When you work with Claude or any LLM, responses can vary even with identical inputs:
```typescript
// Non-deterministic: Same prompt may produce different outputs
const response1 = await claude.complete("Implement user authentication");
const response2 = await claude.complete("Implement user authentication");

// response1 !== response2 (different code, same intent)
```
Characteristics:
- Probabilistic: ~90% confident, not 100%
- Variable outputs: Temperature, context length, message order affect results
- Creative: Generates novel solutions, not just retrieves answers
- Context-sensitive: Previous messages influence responses
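The testing consequence: you can't assert exact equality on LLM output, only properties of it. A sketch, assuming a vitest/jest-style runner and the illustrative `claude` client from the snippet above:

```typescript
import { expect, test } from "vitest";

// Illustrative client from the snippet above
declare const claude: { complete: (prompt: string) => Promise<string> };

test("generated auth code has the required properties", async () => {
  const code = await claude.complete("Implement user authentication");

  // ❌ Brittle: a rerun produces different (but equally valid) text
  // expect(code).toBe(previousOutput);

  // ✅ Robust: assert properties the output must satisfy, whatever its shape
  expect(code).toMatch(/bcrypt|argon2/i); // passwords are hashed
  expect(code).toMatch(/jwt|session/i);   // some auth mechanism is present
});
```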
Why This Matters
The Confidence Gap
| Approach | Confidence | Example |
|---|---|---|
| Deterministic | 100% | `Math.max(a, b)` returns the larger number |
| LLM Response | ~90% | "Implement login form" - probably correct, needs review |
| LLM + Guardrails | ~99% | Specs + validation + tests catch drift |
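One way to read the last row: the layers compound. If roughly 10% of raw outputs drift and deterministic checks catch about 90% of that drift, the residual error is on the order of 10% × 10% = 1%, which is where the ~99% figure comes from. The exact numbers are illustrative; the compounding effect is the point.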
Real-World Implications
Without guardrails:
You: "Add user authentication"
Claude: Implements JWT auth
You: "Add user authentication" (same prompt, new session)
Claude: Implements session-based auth
// Different approaches, both valid, but inconsistent
With SpecWeave guardrails:
```text
spec.md:
  AC-US1-01: JWT-based authentication with refresh tokens
  AC-US1-02: Session expires after 24 hours

You:    "Add user authentication"
Claude: Checks spec → Implements JWT with refresh tokens

// Consistent because specs define success criteria
```
Strategies for Non-Deterministic Systems
1. Define Success Criteria Upfront
Don't rely on implicit understanding. Make acceptance criteria explicit:
```markdown
### US-001: User Authentication

**As a** user, I want to log in securely...

#### Acceptance Criteria

- [x] **AC-US1-01**: JWT tokens with 24h expiry
- [x] **AC-US1-02**: Refresh token rotation
- [x] **AC-US1-03**: Password hashing with bcrypt (12 rounds)
- [x] **AC-US1-04**: Rate limiting: 5 attempts per 15 minutes
```
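Explicit criteria like these can be turned directly into executable checks. A minimal sketch for AC-US1-03, assuming the `bcrypt` npm package and a hypothetical `hashPassword` function under test:

```typescript
import bcrypt from "bcrypt";
import { expect, test } from "vitest";
import { hashPassword } from "./auth"; // hypothetical unit under test

test("AC-US1-03: passwords are hashed with bcrypt (12 rounds)", async () => {
  const hash = await hashPassword("hunter2");

  // bcrypt embeds its cost factor in the hash: $2b$12$...
  expect(hash).toMatch(/^\$2[aby]\$12\$/);
  expect(await bcrypt.compare("hunter2", hash)).toBe(true);
});
```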
2. Break Work into Verifiable Chunks
Large tasks have more variability. Small tasks are easier to validate:
```text
# ❌ Too broad - high variability
T-001: Implement authentication system

# ✅ Verifiable chunks - lower variability
T-001: Create User model with password hash field
T-002: Implement /auth/register endpoint
T-003: Implement /auth/login endpoint with JWT
T-004: Add refresh token rotation logic
T-005: Implement rate limiting middleware
```
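Each small task then maps to a concrete check. A sketch for T-002, assuming an Express app exported from a hypothetical `./app` module and the `supertest` package:

```typescript
import request from "supertest";
import { expect, test } from "vitest";
import { app } from "./app"; // hypothetical module exporting the Express app

test("T-002: POST /auth/register creates a user", async () => {
  const res = await request(app)
    .post("/auth/register")
    .send({ email: "dev@example.com", password: "correct horse battery staple" });

  expect(res.status).toBe(201);
  expect(res.body).toHaveProperty("id");
});
```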
3. Validate Outputs Automatically
Use hooks and tests to catch drift:
```yaml
# Hook: Validate every generated file
hooks:
  post-task-completion:
    - npm test
    - npm run lint
    - npm run typecheck
```
4. Use Quality Gates
Don't trust "it works" - verify against specs:
```bash
# SpecWeave validates before closing
/sw:done 0023

# Checks:
# ✓ All tasks completed?
# ✓ All ACs marked done?
# ✓ Tests passing?
# ✓ No uncommitted changes?
```
Temperature and Variability
LLMs have a temperature parameter that controls randomness:
| Temperature | Behavior | Use Case |
|---|---|---|
| 0.0 | Most deterministic | Code generation, factual queries |
| 0.3-0.5 | Balanced | General development tasks |
| 0.7-1.0 | More creative | Brainstorming, creative writing |
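In practice, temperature is set per request. A sketch using the Anthropic TypeScript SDK; the model name is a placeholder, so substitute whatever model you actually use:

```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

const message = await client.messages.create({
  model: "claude-sonnet-4-20250514", // placeholder model id
  max_tokens: 1024,
  temperature: 0, // favor the most likely tokens; raise toward 1.0 for variety
  messages: [{ role: "user", content: "Implement user authentication" }],
});
```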
Note: Even at temperature 0, LLMs aren't perfectly deterministic, due to:
- Tie-breaking in token sampling strategies
- Context window variations
- Model updates and version changes
- Floating-point and batching effects during inference
The SpecWeave Approach
SpecWeave wraps non-deterministic AI with deterministic processes:
```text
┌─────────────────────────────────────────────────────┐
│                 SPECWEAVE STRUCTURE                 │
├─────────────────────────────────────────────────────┤
│                                                     │
│  📋 spec.md (Deterministic)                         │
│     └── Acceptance criteria define success          │
│                                                     │
│  🤖 Claude (Non-Deterministic)                      │
│     └── Generates implementation                    │
│                                                     │
│  ✅ Validation (Deterministic)                      │
│     └── Tests, hooks, quality gates verify output   │
│                                                     │
│  📦 Result (Deterministic)                          │
│     └── Either passes all checks or fails           │
│                                                     │
└─────────────────────────────────────────────────────┘
```
Key insight: You can't make the AI deterministic, but you can make the outcome deterministic by validating against explicit criteria.
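One way to express that insight in code: wrap the non-deterministic step in a deterministic gate that either passes or retries. A minimal sketch; `generate` and `validate` are hypothetical stand-ins for your AI call and your checks:

```typescript
type Generate = (prompt: string) => Promise<string>;
type Validate = (output: string) => Promise<boolean>; // e.g., run tests, lint, typecheck

async function generateUntilValid(
  generate: Generate,
  validate: Validate,
  prompt: string,
  maxAttempts = 3,
): Promise<string> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const output = await generate(prompt);     // non-deterministic step
    if (await validate(output)) return output; // deterministic gate
  }
  throw new Error(`Output failed validation after ${maxAttempts} attempts`);
}
```

The generator stays probabilistic, but the function's contract is binary: a validated result or a hard failure.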
Practical Tips
When Writing Prompts
```text
# ❌ Vague - high variability
"Make the code better"

# ✅ Specific - lower variability
"Refactor the UserService to:
 1. Extract database queries to a repository layer
 2. Add input validation using zod schemas
 3. Return typed errors instead of throwing"
```
When Reviewing AI Output
Always verify:
- Does it match the acceptance criteria?
- Do tests pass?
- Is the approach consistent with existing code?
- Are there security implications?
When Things Go Wrong
Non-deterministic doesn't mean random. If outputs are consistently wrong:
- Prompt may be ambiguous - add specificity
- Context may be missing - load relevant files
- Specs may conflict - resolve contradictions
Summary
| Concept | Traditional Code | LLM-Based Development |
|---|---|---|
| Predictability | 100% same output | ~90% similar output |
| Testing | Unit tests are reliable | Need spec-based validation |
| Debugging | Step through code | Review prompts and context |
| Success criteria | Implicit in code | Must be explicit in specs |
| Quality assurance | Tests catch bugs | Tests + specs catch drift |
Remember: You're not writing scripts that execute the same way every time. You're orchestrating an AI that needs guidance, validation, and clear acceptance criteria to stay on track.
How This Relates to SpecWeave
| SpecWeave Feature | How It Adds Determinism |
|---|---|
| spec.md | Explicit acceptance criteria |
| tasks.md | Verifiable work chunks |
| Hooks | Automatic validation after changes |
| Quality Gates | Block completion without verification |
| Test Integration | Automated pass/fail checks |
Next: Testing Fundamentals - how to build reliable test suites for AI-assisted development.