Skip to main content

ML/AI & Machine Learning

Understanding machine learning, artificial intelligence, and LLM-powered applications.


Overview

ML/AI terms cover machine learning models, training, inference, and AI-powered application development. These concepts enable teams to integrate intelligent features, from simple predictions to complex LLM-powered assistants. SpecWeave itself is an AI-native framework designed to work seamlessly with Claude Code and other AI tools.

Core Concepts

Machine Learning Fundamentals

Machine Learning

  • Systems that learn from data without explicit programming
  • Types: Supervised, Unsupervised, Reinforcement Learning
  • Use cases: recommendations, fraud detection, image recognition
  • When to use: pattern recognition, large datasets, predictive tasks

Model Training

  • Process of teaching a model using data
  • Steps: data collection, preprocessing, training, validation
  • Metrics: accuracy, precision, recall, F1 score
  • Tools: TensorFlow, PyTorch, scikit-learn

Inference

  • Using a trained model to make predictions
  • Real-time: API endpoint (fast, expensive)
  • Batch: Process large datasets (slow, cheap)
  • Edge: On-device inference (privacy, no network)

Large Language Models (LLMs)

LLM (Large Language Model)

  • AI models trained on massive text datasets
  • Examples: GPT-4, Claude, Gemini, Llama
  • Capabilities: text generation, code, reasoning, translation
  • SpecWeave is LLM-native (designed for Claude Code)

Prompt Engineering

  • Crafting effective prompts for LLMs
  • Techniques: few-shot learning, chain-of-thought, system prompts
  • SpecWeave uses: structured prompts in skills, agents, commands
  • Best practices: clear instructions, examples, constraints

Context Window

  • Maximum tokens an LLM can process
  • Claude Sonnet 4.5: 200K tokens
  • SpecWeave optimization: 75%+ reduction via plugins
  • Trade-off: context size vs cost/speed

Token

  • Basic unit of LLM processing
  • Roughly: 1 token ≈ 4 characters or 0.75 words
  • Cost: charged per input/output token
  • SpecWeave tracks: token usage per increment

AI Frameworks

TensorFlow

  • Open-source ML framework by Google
  • Best for: production ML, complex models
  • Deployment: TensorFlow Serving, TFLite (mobile)
  • When to use: enterprise ML, need for scale

PyTorch

  • Open-source ML framework by Meta
  • Best for: research, experimentation, flexibility
  • More Pythonic, dynamic computation graphs
  • When to use: research, prototyping, NLP

Scikit-learn

  • Python ML library (classical algorithms)
  • Algorithms: regression, classification, clustering
  • Best for: traditional ML (not deep learning)
  • When to use: simple ML tasks, need interpretability

AI Development Patterns

RAG (Retrieval-Augmented Generation)

  • Combine LLM with external knowledge retrieval
  • Steps: Retrieve relevant docs → Generate response
  • Use cases: chatbots, documentation search, Q&A
  • SpecWeave uses: context loading from living docs

Fine-Tuning

  • Adapt pre-trained model to specific task
  • Requires: labeled data, training infrastructure
  • Benefits: better performance on domain tasks
  • Trade-offs: cost, maintenance vs prompt engineering

Vector Database

  • Database optimized for similarity search
  • Use cases: RAG, recommendations, semantic search
  • Tools: Pinecone, Weaviate, Chroma, pgvector
  • SpecWeave plugin: specweave-vector-search (planned)

When to Use These Terms

TermUse WhenDon't Use When
Machine LearningPattern recognition, large datasetsSimple rules, small data, need explainability
LLMText generation, code, reasoningReal-time (<10ms), strict determinism
Prompt EngineeringWorking with LLMs, need better outputsTraditional programming sufficient
RAGLLM needs external knowledgeAll knowledge fits in context window
Fine-TuningDomain-specific tasks, have labeled dataGeneric tasks, no training data
TensorFlowProduction ML, need scaleResearch, prototyping
PyTorchResearch, experimentationProduction (though improving)
Vector DatabaseSemantic search, RAG, recommendationsTraditional keyword search sufficient

Real-World Examples

Building an AI-Powered Customer Support Bot

Phase 1: Simple Keyword Matching (No ML)

# ❌ Rule-based (not ML, brittle)
def get_response(message):
message_lower = message.lower()

if "refund" in message_lower:
return "To request a refund, visit: example.com/refunds"
elif "shipping" in message_lower:
return "Shipping takes 3-5 business days."
else:
return "Please contact support@example.com"

# Problems:
# - Doesn't understand intent ("I want my money back" → no match)
# - No context (can't reference previous messages)
# - Hard to maintain (add rules forever)

Phase 2: LLM-Powered (Basic)

# ✅ LLM-powered (understands intent)
import anthropic

client = anthropic.Anthropic(api_key="...")

def get_response(message):
response = client.messages.create(
model="claude-3-5-sonnet-20250929",
max_tokens=1024,
system="You are a helpful customer support agent for an e-commerce company. Be polite and concise.",
messages=[
{"role": "user", "content": message}
]
)

return response.content[0].text

# Benefits:
# ✅ Understands intent ("I want my money back" → refund help)
# ✅ Natural language (no keyword matching)
# ✅ Flexible (handles unexpected questions)

# Problems:
# ❌ No company knowledge (can't answer "What's your return policy?")
# ❌ No context (can't reference previous messages)

Phase 3: RAG (Knowledge Retrieval)

# ✅ RAG: LLM + Knowledge Base
import anthropic
from embeddings import get_embedding, search_similar

# Step 1: Index company knowledge
knowledge_base = [
"Our return policy allows returns within 30 days of purchase.",
"Shipping is free for orders over $50.",
"We offer refunds to the original payment method.",
# ... 1000+ more documents
]

# Create embeddings for all documents (one-time)
embeddings = [get_embedding(doc) for doc in knowledge_base]

# Step 2: Retrieve relevant knowledge
def get_response(message):
# Find relevant documents (semantic search)
query_embedding = get_embedding(message)
relevant_docs = search_similar(query_embedding, embeddings, top_k=3)

# Build context from relevant docs
context = "\n".join(relevant_docs)

# Generate response with context
response = client.messages.create(
model="claude-3-5-sonnet-20250929",
max_tokens=1024,
system=f"""You are a customer support agent. Use the following knowledge to answer:

{context}

Be polite and accurate. If you don't know, say so.""",
messages=[
{"role": "user", "content": message}
]
)

return response.content[0].text

# Example:
# User: "What's your return policy?"
# Relevant docs: "Our return policy allows returns within 30 days..."
# Response: "You can return items within 30 days of purchase. We offer..."

# Benefits:
# ✅ Access to company knowledge
# ✅ Accurate answers (grounded in docs)
# ✅ Scales (add more docs without retraining)

Phase 4: Multi-Turn Conversations

# ✅ Conversation history (stateful)
conversation_history = {}

def get_response(user_id, message):
# Retrieve conversation history
if user_id not in conversation_history:
conversation_history[user_id] = []

# Retrieve relevant knowledge (RAG)
query_embedding = get_embedding(message)
relevant_docs = search_similar(query_embedding, embeddings, top_k=3)
context = "\n".join(relevant_docs)

# Add user message to history
conversation_history[user_id].append({
"role": "user",
"content": message
})

# Generate response with history + context
response = client.messages.create(
model="claude-3-5-sonnet-20250929",
max_tokens=1024,
system=f"""Customer support agent. Use this knowledge:

{context}""",
messages=conversation_history[user_id] # Full history!
)

# Add assistant response to history
assistant_message = response.content[0].text
conversation_history[user_id].append({
"role": "assistant",
"content": assistant_message
})

return assistant_message

# Example conversation:
# User: "I want to return my order"
# Bot: "I can help with that. What's your order number?"
# User: "12345" ← Bot remembers previous context!
# Bot: "Order 12345 is eligible for return. I'll send you a return label."

# Benefits:
# ✅ Multi-turn conversations
# ✅ Context retention
# ✅ Natural dialogue

SpecWeave ML/AI Example

# Increment 0060: AI-Powered Code Review Assistant

## Acceptance Criteria
- **AC-US1-01**: Analyze pull requests automatically (P1)
- **AC-US1-02**: Suggest code improvements (P1)
- **AC-US1-03**: Detect security issues (P2)
- **AC-US1-04**: Generate review comments (P2)

## Architecture Decisions

**ADR-060**: Use Claude API for code analysis
- **Rationale**: Code understanding, natural language output, no training needed
- **Alternatives**: Fine-tune model (expensive), rule-based (brittle)
- **Trade-offs**: API cost vs flexibility

**ADR-061**: RAG with codebase context
- **Rationale**: LLM needs project-specific patterns, conventions
- **Alternatives**: Fine-tuning (expensive), no context (generic)
- **Trade-offs**: Indexing overhead vs quality

## Implementation Plan

**T-001**: Set up GitHub webhook
- Listen for pull request events
- Fetch diff (changed files)
- Queue for AI analysis
- Test webhook delivery

**T-002**: Implement code analysis pipeline
- Extract changed files from diff
- Retrieve relevant context (RAG):
- Similar code patterns
- Project conventions
- Security policies
- Send to Claude API
- Parse response (suggestions, issues)

**T-003**: Generate review comments
- Format AI suggestions as GitHub comments
- Add inline comments on specific lines
- Include rationale + examples
- Test comment posting

**T-004**: Security issue detection
- Check OWASP patterns (SQL injection, XSS)
- Verify authentication/authorization
- Detect secrets in code
- Alert on high-severity issues

## Prompt Engineering

### System Prompt

```markdown
You are an expert code reviewer. Analyze pull requests and provide:
1. Code quality issues (complexity, duplication, naming)
2. Security vulnerabilities (OWASP Top 10)
3. Performance concerns (N+1 queries, inefficient algorithms)
4. Best practice violations (project conventions)

Project context:
- Language: TypeScript + Node.js
- Framework: Express + Prisma
- Style: ESLint + Prettier
- Patterns: See examples below

{RAG_CONTEXT}

Format:
- Issue: Brief description
- Severity: High/Medium/Low
- Location: File:Line
- Suggestion: How to fix
- Example: Code snippet

Be concise. Only report significant issues.

Example AI Response

## Code Review Summary

### High Severity Issues (2)

**1. SQL Injection Vulnerability**
- Location: `src/api/users.ts:45`
- Issue: User input directly in query
- Suggestion: Use parameterized query
- Example:
```typescript
// ❌ Before (vulnerable)
const users = await db.query(`SELECT * FROM users WHERE email = '${req.body.email}'`);

// ✅ After (safe)
const users = await db.users.findUnique({ where: { email: req.body.email } });

2. Authentication Missing

  • Location: src/api/orders.ts:78
  • Issue: No auth check before accessing user data
  • Suggestion: Add authentication middleware
  • Example:
    // Add: app.get('/api/orders', authenticateToken, async (req, res) => {

Medium Severity Issues (3)

3. N+1 Query Problem

  • Location: src/api/posts.ts:23
  • Issue: Fetching comments in loop (101 queries for 100 posts)
  • Suggestion: Use Prisma include
  • Example:
    // ✅ Single query with JOIN
    const posts = await db.post.findMany({ include: { comments: true } });

... (more issues)


## Test Plan

**Given** PR with code changes → **When** AI analysis → **Then** suggestions posted

**Test Cases**:
- Unit: Prompt construction, response parsing (85% coverage)
- Integration: GitHub webhook, Claude API (80% coverage)
- E2E: Full PR review flow (100% critical path)

## Cost Analysis

**Token usage per PR**:
- Diff: ~2K tokens (input)
- RAG context: ~5K tokens (input)
- Response: ~3K tokens (output)
- Total: ~10K tokens/PR

**Pricing** (Claude Sonnet 4.5):
- Input: $3 / 1M tokens
- Output: $15 / 1M tokens
- Cost per PR: $0.06 (6 cents)

**Monthly cost** (100 PRs/month):
- $6/month (very affordable!)

How SpecWeave Uses ML/AI Terms

1. LLM-Native Framework

SpecWeave is designed for LLM-powered development:

Architecture:
┌─────────────────────────────────────────────┐
│ Claude Code (LLM) │
│ - Skills (auto-activate based on context) │
│ - Agents (isolated expert personas) │
│ - Commands (structured interactions) │
└─────────────────┬───────────────────────────┘

┌─────────────────▼───────────────────────────┐
│ SpecWeave Framework │
│ - Plugin system (modular capabilities) │
│ - Context optimization (75% reduction) │
│ - Living docs (always up-to-date) │
└─────────────────────────────────────────────┘

2. Context Optimization

SpecWeave reduces LLM context usage:

Before (monolithic):
- All 44 skills + 20 agents = 50K tokens
- Simple React app: 50K tokens consumed

After (plugin architecture):
- Core plugin: 12K tokens
- React plugin: 4K tokens
- Total: 16K tokens (68% reduction!)

Cost savings:
- Before: $0.15 per conversation
- After: $0.048 per conversation
- Savings: 68% cheaper!

3. Prompt Engineering in Skills

Skills use structured prompts:

---
name: increment-planner
description: Plans development increments with spec generation. Activates when user says: "create increment", "plan feature", "/inc", "new increment".
---

# Increment Planner Skill

## Prompt Structure

When activated, follow this prompt:

1. Clarify requirements (ask questions)
2. Generate user stories (As a..., I want..., So that...)
3. Define acceptance criteria (AC-US1-01, AC-US1-02)
4. Estimate complexity (time, dependencies)
5. Invoke agents (PM → Architect → Tech Lead)

## Example Prompt

User: "/inc Add user authentication"

Response:
"I'll help plan the user authentication increment. Let me ask a few questions:

1. Authentication methods needed? (email/password, OAuth, SSO)
2. Security requirements? (2FA, rate limiting, session management)
3. User types? (admin, regular user, guest)

Once I understand, I'll:
- Generate spec.md (user stories + acceptance criteria)
- Create plan.md (architecture + implementation)
- Generate tasks.md (implementation tasks with tests)"

4. Agent Persona Design

Agents use role-based prompts:

# PM Agent

You are an expert Product Manager with 10+ years of experience.

Your responsibilities:
- Clarify user requirements
- Generate user stories
- Define acceptance criteria
- Prioritize features (P1, P2, P3)

Your communication style:
- Ask clarifying questions
- Think from user perspective
- Focus on business value
- Be concise

When planning increments:
1. Understand user needs (interview user)
2. Generate user stories (BDD format)
3. Define acceptance criteria (testable)
4. Estimate value + complexity

5. ML/AI Plugin (Planned)

specweave-ml-ops plugin (planned):

  • ML model training workflows
  • Experiment tracking (MLflow)
  • Model versioning
  • Inference deployment
  • Cost tracking (compute + tokens)


Learn More

Guides

  • Building with Claude Code (coming soon)
  • Prompt Engineering Best Practices (coming soon)
  • RAG Applications (coming soon)

Books

  • "Designing Machine Learning Systems" by Chip Huyen
  • "Deep Learning" by Ian Goodfellow
  • "Hands-On Machine Learning" by Aurélien Géron
  • "The Hundred-Page Machine Learning Book" by Andriy Burkov

External Resources


Navigation: