NHDNUG / The Woodlands, TX / February 19, 2026
Software 1.0 is software you specify.
Software 2.0 is software you verify.
— Aaron Stannard
Three Core Assertions
1. Some developers are already obsolete
- Those who refuse to learn LLM-assisted techniques - "coasters"
2. Those who adapt will be more productive than ever
- LLMs are force multipliers for skilled developers
3. LLMs have fundamental, irresolvable limitations
- Mathematical constraints, not engineering problems to be solved
TurboMqtt
- High-performance MQTT client for .NET
- Open source: github.com/petabridge/TurboMqtt
- Goal: implement MQTT 5.0 protocol features autonomously
- Uses Claude Code + RALPH loop infrastructure
We're going to kick this off right now and check back on it throughout the talk.
LIVE DEMO
Kicking Off the RALPH Loop
$ cat IMPLEMENTATION_PLAN.md
$ cat ralph.sh
$ ./ralph.sh
"We'll check back on this later."
What Influences Output Quality?
- Model size - more parameters = richer representations
- Mixture of Experts (MoE) - specialized sub-networks activated per task
- Quantization - precision trade-offs for speed
- Training data quality - garbage in, garbage out still applies
- Active parameters - not all weights fire for every token
LLMs Are Stateless
Advantages
- Easy to parallelize and clone
- Good single-shot responses
- No accumulated state bugs
- Reproducible given same context
Disadvantages
- No persistent memory between sessions
- Must rebuild context every time
- Knowledge frozen at training cutoff
- Can't learn from mistakes across sessions
Key Takeaway
Prompts activate different regions of the model's learned weights.
More specific prompts = better activation = better results.
This is why context matters more than anything else.
Bad Prompt vs. Good Prompt
Bad
"Write me a marketing page."
Vague task, no context, no process, no output criteria
Good
"First analyze target audience pain points from @docs/personas.md.
Then define positioning against competitors in @docs/competitive.md.
Then write the page. Show reasoning at each step.
Output: HTML with Tailwind, mobile-first."
Context, process, verification, output format
Beyond Single Prompts
A single good prompt is fine for one-off tasks.
But a prompting system is what you need for serious work:
- System prompts that establish identity + context
- Skills that encode reusable processes
- Implementation plans that break work into context-window-sized tasks
This leads us to agentic operating systems...
The Core Files
CLAUDE.md / AGENTS.md
The "constitution" - loaded into every agent session automatically
- Build, test, deploy instructions
- Skill routing table - which /command handles which task
- Documentation pointers for deeper context
- Project conventions and coding standards
- Tool configuration (CLI tools, MCP servers)
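If you have never written one, start from a skeleton. The contents below are hypothetical (not TurboMqtt's actual file); they only exist to make the bullets above concrete:

```bash
# Bootstrap a minimal CLAUDE.md - hypothetical skeleton for illustration
cat > CLAUDE.md <<'EOF'
# MyProject - agent constitution

## Build, test, deploy
- Build: dotnet build -c Release
- Test:  dotnet test
- Never push to main directly; open a PR.

## Skill routing
- Committing work        -> /commit
- Opening a pull request -> /pr
- Reviewing a PR         -> /review-pr

## Deeper context
- Purpose, architecture, SDLC phase: PROJECT_CONTEXT.md
- Tech stack, CLI tools, CI/CD:      TOOLING.md

## Conventions
- Nullable reference types on; warnings are treated as errors.
EOF
```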
Context Files
PROJECT_CONTEXT.md
- Project purpose and mission
- Adjacent repositories
- SDLC phase (active dev? maintenance?)
- Key architectural decisions
TOOLING.md
- Tech stack details
- CLI tools and their usage
- Deployment procedures
- CI/CD pipeline structure
These establish "What am I working on?" and "What tools do I have?"
IMPLEMENTATION_PLAN.md
Breaking work into context-window-sized tasks
## Phase 2: MQTT 5.0 Auth
### Task 2.1: Implement AUTH packet
- Parse AUTH reason codes per MQTT 5.0 spec §3.15
- Add property parsing for auth-method, auth-data
- Verify: AuthPacketSpecs.cs must pass
- Verify: dotnet build succeeds with zero warnings
### Task 2.2: Implement SASL challenge flow
- Wire AUTH into connection state machine
- Support multi-step SASL exchanges
- Verify: SaslAuthIntegrationSpecs.cs must pass
The spec litmus test: "Can the agent execute without clarifying questions?"
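Those Verify lines work because they map to commands the agent can run deterministically; a sketch (the test filter is assumed from the spec file name):

```bash
# How Task 2.1's verification steps become runnable checks
dotnet build /p:TreatWarningsAsErrors=true                  # "zero warnings" is now a hard failure
dotnet test --filter "FullyQualifiedName~AuthPacketSpecs"   # run only the specs the task names
```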
Skills: Reusable Prompt Templates
Invoked with /commands
Productivity Multipliers
/commit - GPG-signed commits with proper messages
/pr - Create PR with summary and test plan
/review-pr - Comprehensive code review
Domain Skills
/close-deal - CRM workflow for sales
/draft-customer-email - Email with style guide
/create-benchmark - BenchmarkDotNet setup
Skills encode tribal knowledge as executable prompts
LIVE DEMO
TurboMqtt's Agentic OS
$ cat CLAUDE.md
$ cat IMPLEMENTATION_PLAN.md
$ ls .claude/skills/
$ cat ralph.sh
The actual file structure powering our running demo
What is a Harness?
A tool that gives LLMs access to files, shell, web, and context.
- Claude Code - CLI-based, terminal-native
- OpenCode - open source, multi-model
- Cursor / Windsurf - IDE-integrated
- GitHub Copilot Workspace - cloud-based, PR-oriented
What to Look For in a Harness
1. Model Selection
- Match the model to the task. Not everything needs Opus.
2. Intelligent Context Gathering
- Load skills, files, and relevant context automatically when needed.
3. Local Environment Feedback
- e.g., OpenCode + C# LSP = real-time compiler feedback without running dotnet build
Model Routing
Not all tasks need the most expensive model.
| Task Type | Model | Why |
|---|---|---|
| Complex architecture | Opus | Needs deep reasoning |
| Standard coding tasks | Sonnet | Good balance of speed and quality |
| Formatting, renaming | Haiku | Fast, cheap, good enough |
| Browser automation | Haiku | Token-heavy but simple logic |
Example: playwright-gopher agent uses Haiku for browser tasks that would burn through Opus budget
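At the CLI level, routing can be as simple as choosing a model per invocation. A sketch assuming Claude Code's --model flag and its short model aliases (prompts and file names are illustrative):

```bash
# Mechanical work -> small model; design work -> big model (illustrative, not petabridge's actual setup)
claude -p --model haiku "Apply the renames listed in REFACTOR_NOTES.md across src/, then run dotnet build."
claude -p --model opus  "Propose a design for MQTT 5.0 flow control and write it to docs/flow-control-plan.md. Do not modify code."
```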
Attention Degrades at Scale
Maximum effective context << advertised context
"Context Is What You Need" (arXiv:2509.21361)
200K - advertised context window
~30-50K - effective attention range
Context Compaction
What happens when the window gets too large
📝 Full conversation (50K tokens) → 🗜️ Compacted summary (10K tokens)
Compaction is lossy. Critical details get summarized away.
Context Management is a Skill
Treat each agent session like a well-designed function:
One job. Executed well. Completed decisively.
- Don't have multi-hour conversations with agents
- Break large tasks into discrete, completable units
- Start fresh sessions for fresh tasks
This leads us directly to RALPH loops...
Two Steep Requirements
1. Really Good Planning
Developers become project managers whether they want to or not
2. Really Good Verification
Non-deterministic output demands deterministic checks
"Historically, software developers are beyond terrible at both of these.
It is time to git gud."
LIVE DEMO
TurboMqtt RALPH Progress Check
What has the autonomous loop accomplished so far?
$ git log --oneline -10
$ dotnet test
Hallucinations Are Mathematical
LLMs predict plausible outputs, not verified truths.
- This is inherent to transformer architecture
- It will NEVER be fully "fixed"
- Better models hallucinate less, but never zero
- Even a 0.1% hallucination rate = real bugs in production
LLMs Don't Reason - They Predict
They predict what text humans previously used to describe the world,
one token at a time.
- Not understanding - pattern matching on training data
- Can produce alien-looking decisions - optimizing for prediction, not logic
- But emergent behaviors ARE real - autonomous coding, image generation, translation
Three Irresolvable Limitations
1. Hallucinations
- Mathematical constraint of transformer architecture
2. Finite Effective Context
- Orders of magnitude smaller than advertised
3. Misalignment
- Model biases can diverge from your actual goals
These aren't getting "fixed." Design your workflow around them.
Planning Modes
Read-only research, no writes
Claude Code: /plan command or shift+tab to toggle plan mode
OpenCode: built-in plan mode with LSP integration
General principle: research first, write second. Always.
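A concrete sketch, assuming Claude Code's permission-mode flag (OpenCode's built-in plan mode is the equivalent):

```bash
# Read-only research pass before anything gets written
claude --permission-mode plan \
  "Map how TurboMqtt handles CONNECT/CONNACK today and where an AUTH exchange would hook in. Summarize; do not edit files."
```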
Three Things to Provide
1. Detailed description of goals and desired output
- What does "done" look like? Be explicit.
2. Constraints
- Visible from source (API contracts, types) + invisible ones ("40MB App Store limit", "must run on .NET 8")
3. How the LLM can verify it did its job
- Specific test files, build commands, acceptance criteria
The Spec Litmus Test
"Can the agent execute your specification
without clarifying questions?"
If not, you haven't been specific enough.
The Fundamental Shift
Software 2.0
80% planning + review
20% coding
You're becoming a specification author and project manager.
Same Prompt, Different Results
Non-determinism demands deterministic checks
↓
Deterministic verification gates
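In practice a gate is just a script that must pass before the agent's work counts; a sketch for a .NET project using standard dotnet CLI commands (the script name is hypothetical):

```bash
#!/usr/bin/env bash
# verify.sh - deterministic gate: same checks every run, no matter what the model produced
set -euo pipefail

dotnet build -c Release /p:TreatWarningsAsErrors=true   # must compile with zero warnings
dotnet test -c Release                                  # full test suite must pass
echo "VERIFICATION PASSED"
```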
Beyond Code Correctness
- UI Design Verification - screenshot comparison against approved mockups
- Documentation Fact-Checking - link validators, API reference accuracy, code sample testing
- Adversarial Review - LLM-reviews-LLM with sufficient context and fresh eyes
- Behavioral Verification - does the feature actually work the way users expect?
Minimize Slack
More verification = less room for error
= more trust in autonomous operation
Every unchecked dimension is a dimension where hallucinations can hide.
Close the gaps. Tighten the checks. Trust the process, not the output.
Why Observability Matters
Understanding HOW agents make decisions
- Audit trail - what files were read, what was modified, in what order?
- Decision tracing - why did the agent choose approach A over B?
- Failure analysis - when something goes wrong, can you trace back to the root cause?
- Trust building - over time, patterns emerge that build or erode confidence
LIVE DEMO
RALPH Output Log Review
Tracing agent reasoning through the log
$ tail -100 ralph-output.log
$ git log --oneline --stat -5
$ git diff HEAD~3..HEAD --stat
What decisions did the agent make? How can we trace its reasoning?
Start Tomorrow
Three concrete steps to begin your Software 2.0 journey
1. Write a CLAUDE.md
- Describe how to build, test, and deploy your project. Takes 30 minutes.
2. Close one verification gap
- Add a linter rule. Write a test for that untested function. Enable a formatter (see the sketch below).
3. Apply LLM assistance to that tech debt you've been avoiding
- Perfect first project. Low stakes, high learning, immediate value.
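For step 2, the smallest gate you can close today is often a formatter check (standard dotnet CLI; wire it into CI or your loop script afterwards):

```bash
# Formatting drift now fails loudly instead of slipping through review
dotnet format --verify-no-changes
```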
Thank You!
Questions?