Software 2.0: It's Not Vibe Coding, But It's Not Business as Usual Either

NHDNUG / The Woodlands, TX / February 19, 2026

Software 1.0 is software you specify.
Software 2.0 is software you verify.

— Aaron Stannard

Three Core Assertions

  1. Some developers are already obsolete
    Those who refuse to learn LLM-assisted techniques - "coasters"
  2. Those who adapt will be more productive than ever
    LLMs are force multipliers for skilled developers
  3. LLMs have fundamental, irresolvable limitations
    Mathematical constraints, not engineering problems to be solved

Let's Start Something Running

A RALPH loop in the wild

TurboMqtt

  • High-performance MQTT client for .NET
  • Open source: github.com/petabridge/TurboMqtt
  • Goal: implement MQTT 5.0 protocol features autonomously
  • Uses Claude Code + RALPH loop infrastructure

We're going to kick this off right now and check back on it throughout the talk.

LIVE DEMO

Kicking Off the RALPH Loop

$ cat IMPLEMENTATION_PLAN.md
$ cat ralph.sh
$ ./ralph.sh

"We'll check back on this later."

How LLMs Actually Work

The short version

What Influences Output Quality?

  • Model size - more parameters = richer representations
  • Mixture of Experts (MoE) - specialized sub-networks activated per task
  • Quantization - precision trade-offs for speed
  • Training data quality - garbage in, garbage out still applies
  • Active parameters - not all weights fire for every token

LLMs Are Stateless

Advantages

  • Easy to parallelize and clone
  • Good single-shot responses
  • No accumulated state bugs
  • Reproducible given same context

Disadvantages

  • No persistent memory between sessions
  • Must rebuild context every time
  • Knowledge frozen at training cutoff
  • Can't learn from mistakes across sessions

Key Takeaway

Prompts activate different regions of the model's learned weights.

More specific prompts = better activation = better results.

This is why context matters more than anything else.

Writing Prompts That Don't Suck

From single prompts to prompting systems

Bad Prompt vs. Good Prompt

Bad

"Write me a marketing page."

Vague task, no context, no process, no output criteria

Good

"First analyze target audience pain points from @docs/personas.md.

Then define positioning against competitors in @docs/competitive.md.

Then write the page. Show reasoning at each step.

Output: HTML with Tailwind, mobile-first."

Context, process, verification, output format

Beyond Single Prompts

A single good prompt is fine for one-off tasks.

But a prompting system is what you need for serious work:

  • System prompts that establish identity + context
  • Skills that encode reusable processes
  • Implementation plans that break work into context-window-sized tasks

This leads us to agentic operating systems...

Agentic Operating Systems

They're just markdown files

The Core Files

CLAUDE.md / AGENTS.md

The "constitution" - loaded into every agent session automatically

  • Build, test, deploy instructions
  • Skill routing table - which /command handles which task
  • Documentation pointers for deeper context
  • Project conventions and coding standards
  • Tool configuration (CLI tools, MCP servers)
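
An illustrative skeleton of a CLAUDE.md - the commands and entries below are hypothetical, not TurboMqtt's actual file:

# CLAUDE.md

## Build & Test
- Build: dotnet build -warnaserror
- Test: dotnet test
- Never commit if the build emits warnings.

## Skill Routing
- Committing work: /commit
- Opening a pull request: /pr
- Performance work: /create-benchmark

## Deeper Context
- Project purpose, architecture, adjacent repos: PROJECT_CONTEXT.md
- Tech stack, CLI tools, CI/CD: TOOLING.md

## Conventions
- Match the style of neighboring files; prefer small, reviewable changes.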

Context Files

PROJECT_CONTEXT.md
  • Project purpose and mission
  • Adjacent repositories
  • SDLC phase (active dev? maintenance?)
  • Key architectural decisions
TOOLING.md
  • Tech stack details
  • CLI tools and their usage
  • Deployment procedures
  • CI/CD pipeline structure

These answer "What am I working on?" and "What tools do I have?"

IMPLEMENTATION_PLAN.md

Breaking work into context-window-sized tasks

## Phase 2: MQTT 5.0 Auth
### Task 2.1: Implement AUTH packet
- Parse AUTH reason codes per MQTT 5.0 spec s3.15
- Add property parsing for auth-method, auth-data
- Verify: AuthPacketSpecs.cs must pass
- Verify: dotnet build succeeds with zero warnings

### Task 2.2: Implement SASL challenge flow
- Wire AUTH into connection state machine
- Support multi-step SASL exchanges
- Verify: SaslAuthIntegrationSpecs.cs must pass

The spec litmus test: "Can the agent execute without clarifying questions?"

Skills: Reusable Prompt Templates

Invoked with /commands

Productivity Multipliers

  • /commit - GPG-signed commits with proper messages
  • /pr - Create PR with summary and test plan
  • /review-pr - Comprehensive code review

Domain Skills

  • /close-deal - CRM workflow for sales
  • /draft-customer-email - Email with style guide
  • /create-benchmark - BenchmarkDotNet setup

Skills encode tribal knowledge as executable prompts
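
A skill is just a markdown prompt the harness loads when you type the command. A hypothetical .claude/skills/commit.md, sketching the /commit skill above, might read:

# /commit

1. Run git status and git diff --staged to see exactly what is being committed.
2. Run the build and test commands from CLAUDE.md; stop and report if either fails.
3. Write the message: one imperative summary line under 72 characters, then a
   short body explaining why the change was made.
4. Commit with GPG signing: git commit -S -m "<message>".
5. Never push unless explicitly asked.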

LIVE DEMO

TurboMqtt's Agentic OS

$ cat CLAUDE.md
$ cat IMPLEMENTATION_PLAN.md
$ ls .claude/skills/
$ cat ralph.sh

The actual file structure powering our running demo

Choosing Your AI Harness

The tool that connects LLMs to your code

What is a Harness?

A tool that gives LLMs access to files, shell, web, and context.

  • Claude Code - CLI-based, terminal-native
  • OpenCode - open source, multi-model
  • Cursor / Windsurf - IDE-integrated
  • GitHub Copilot Workspace - cloud-based, PR-oriented

What to Look For in a Harness

  1. Model Selection
    Match the model to the task. Not everything needs Opus.
  2. Intelligent Context Gathering
    Load skills, files, and relevant context automatically when needed.
  3. Local Environment Feedback
    e.g., OpenCode + C# LSP = real-time compiler feedback without running dotnet build

Model Routing

Not all tasks need the most expensive model.

Task Type             | Model  | Why
----------------------|--------|-----------------------------------
Complex architecture  | Opus   | Needs deep reasoning
Standard coding tasks | Sonnet | Good balance of speed and quality
Formatting, renaming  | Haiku  | Fast, cheap, good enough
Browser automation    | Haiku  | Token-heavy but simple logic

Example: playwright-gopher agent uses Haiku for browser tasks that would burn through Opus budget
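
How you express the routing depends on the harness. One hedged example: Claude Code lets a sub-agent pin its own model in frontmatter (assuming the .claude/agents/ convention and its model field; details vary by harness and version), so a browser-automation agent might be defined like this:

---
name: playwright-gopher
description: Drives the browser for UI verification - screenshots, click-throughs, form fills.
model: haiku
tools: Bash, Read, Write
---

You automate browser tasks with Playwright. Keep reasoning brief: these tasks
are token-heavy but logically simple, so favor cheap, fast iterations.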

Context Windows

The hidden bottleneck

Attention Degrades at Scale

Maximum effective context << advertised context

"Context Is What You Need" (arXiv:2509.21361)

200K tokens - advertised context window
~30-50K tokens - effective attention range

Context Compaction

What happens when the window gets too large

Full conversation (50K tokens) → compacted summary (10K tokens) → information LOST

Compaction is lossy. Critical details get summarized away.

Context Management is a Skill

Treat each agent session like a well-designed function:

One job. Executed well. Completed decisively.

  • Don't have multi-hour conversations with agents
  • Break large tasks into discrete, completable units
  • Start fresh sessions for fresh tasks

This leads us directly to RALPH loops...

RALPH Loops

Autonomous coding done right

Two Steep Requirements

1. Really Good Planning

Developers become project managers whether they want to or not

2. Really Good Verification

Non-deterministic output demands deterministic checks

"Historically, software developers are beyond terrible at both of these.
It is time to git gud."

LIVE DEMO

TurboMqtt RALPH Progress Check

What has the autonomous loop accomplished so far?

$ git log --oneline -10
$ dotnet test

Where LLMs Go Wrong

Understanding the failure modes

Hallucinations Are Mathematical

LLMs predict plausible outputs, not verified truths.

  • This is inherent to transformer architecture
  • It will NEVER be fully "fixed"
  • Better models hallucinate less, but never zero
  • Even a 0.1% hallucination rate = real bugs in production

LLMs Don't Reason - They Predict

They predict what text humans previously used to describe the world, one token at a time.

Not understanding - pattern matching on training data
Can produce alien-looking decisions - optimizing for prediction, not logic
But emergent behaviors ARE real - autonomous coding, image generation, translation

Three Irresolvable Limitations

1. Hallucinations - Mathematical constraint of transformer architecture
2. Finite Effective Context - Orders of magnitude smaller than advertised
3. Misalignment - Model biases can diverge from your actual goals

These aren't getting "fixed." Design your workflow around them.

Guardrail 1: Become an Effective Planner

Most Software 2.0 time is spent here

Planning Modes

Read-only research, no writes

Claude Code: /plan command or shift+tab to toggle plan mode
OpenCode: Built-in plan mode with LSP integration
General principle: Research first, write second. Always.

Three Things to Provide

  1. Detailed description of goals and desired output
    What does "done" look like? Be explicit.
  2. Constraints
    Visible from source (API contracts, types) + invisible ones ("40MB App Store limit", "must run on .NET 8")
  3. How the LLM can verify it did its job
    Specific test files, build commands, acceptance criteria
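
Putting all three together, a single plan entry might look like this (the task, file names, and constraints are illustrative):

### Task: Add connection keep-alive watchdog
- Goal: disconnect idle sessions once the negotiated keep-alive interval expires, per the MQTT 5.0 spec.
- Constraints: no new third-party dependencies; must run on .NET 8; no breaking changes to the existing public API.
- Verify: KeepAliveSpecs.cs must pass; dotnet build succeeds with zero warnings.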

The Spec Litmus Test

"Can the agent execute your specification
without clarifying questions?"

If not, you haven't been specific enough.

The Fundamental Shift

Software 1.0: 20% planning, 80% coding

Software 2.0: 80% planning + review, 20% coding

You're becoming a specification author and project manager.

Guardrail 2: Verification at Every Level

Non-determinism demands deterministic checks

Same Prompt, Different Results

Non-determinism demands deterministic checks

Run 1 → Result A
Run 2 → Result B
Run 3 → Result C

All three must pass the same deterministic verification gates.

Beyond Code Correctness

  • UI Design Verification
    Screenshot comparison against approved mockups
  • Documentation Fact-Checking
    Link validators, API reference accuracy, code sample testing
  • Adversarial Review
    LLM-reviews-LLM with sufficient context and fresh eyes
  • Behavioral Verification
    Does the feature actually work the way users expect?

Minimize Slack

More verification = less room for error = more trust in autonomous operation.

Every unchecked dimension is a dimension where hallucinations can hide.
Close the gaps. Tighten the checks. Trust the process, not the output.
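
In practice these gates can live in a single script the loop (or CI) runs after every agent iteration; a minimal sketch for a .NET project (the script name is illustrative):

#!/usr/bin/env bash
# verify.sh - deterministic gates every agent change must pass before it counts
set -euo pipefail

# Gate 1: compiles with zero warnings
dotnet build -warnaserror

# Gate 2: formatting matches the repo's conventions
dotnet format --verify-no-changes

# Gate 3: the full test suite passes
dotnet test

echo "All verification gates passed."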

Observability

Trust but verify

Why Observability Matters

Understanding HOW agents make decisions

  • Audit trail - what files were read, what was modified, in what order?
  • Decision tracing - why did the agent choose approach A over B?
  • Failure analysis - when something goes wrong, can you trace back to the root cause?
  • Trust building - over time, patterns emerge that build or erode confidence

LIVE DEMO

RALPH Output Log Review

Tracing agent reasoning through the log

$ tail -100 ralph-output.log
$ git log --oneline --stat -5
$ git diff HEAD~3..HEAD --stat

What decisions did the agent make? How can we trace its reasoning?

The Voyage

Nothing to be afraid of

Start Tomorrow

Three concrete steps to begin your Software 2.0 journey

  1. Write a CLAUDE.md
    Describe how to build, test, and deploy your project. Takes 30 minutes.
  2. Close one verification gap
    Add a linter rule. Write a test for that untested function. Enable a formatter.
  3. Apply LLM assistance to that tech debt you've been avoiding
    Perfect first project. Low stakes, high learning, immediate value.

Thank You!

Questions?

Blog Post: aaronstannard.com - "Software 2.0: Code is Cheap, Good Taste is Not"

dotnet-skills: github.com/petabridge/dotnet-skills

Twitter/X: @Aaronontheweb