Customer Development Plan

Validate the "execution layer for AI-core organizations" premise.

20 to 30 discovery conversations and 5 design-partner agreements. Sequenced to kill the ICP hypothesis fast if it's wrong, and to lock the positioning candidate (B or C) before more engineering investment.


The shape of the work

Customer discovery for an infrastructure premise has three failure modes. Every step below is designed to avoid them.

Failure mode #1

Selling instead of learning

Pitching "execution layer for AI-core orgs" and watching people nod politely.

Failure mode #2

Anchoring on architecture

Describing shell mediation, PTY approvals, and TEE attestation when the buyer doesn't know what those are.

Failure mode #3

Talking to the wrong layer

Practitioners feel the pain but don't buy. Buyers don't feel the pain. Both conversations matter, separately.

Nine steps, in order

Step 1

Convert the premise into falsifiable claims

The v4 docs pitch is a thesis. Discovery needs a list of claims that can be killed by an interview. Before you talk to anyone, write these down. Each interview should leave you with support / contradict / neutral against each one.

#	Claim	What kills it
C1	Companies running — or wanting to run — agents on consequential systems have hit operational friction the secrets/IAM/observability stack doesn't solve.	"Vault + Datadog + GitHub Actions covers it."
C2	The friction has been named and owned by a specific person, platform-eng lead or VP Eng.	Pain is real but diffuse; nobody owns it; no budget line.
C3	A security/compliance review, an incident, or a preemptive policy decision triggered concrete action.	No event, no policy, no proposal — nobody's had to think about it.
C4	A runtime-layer abstraction beats per-agent SDKs or per-LLM-provider managed agents.	"We'll just use Anthropic Managed Agents / OpenAI Agent Builder."
C5	Customer-operated deployment is acceptable, or even preferred, vs. hosted.	Buyers want SaaS only; on-prem is a non-starter; v4 ordering inverts.
C6	Connector curation by Aileron is valuable; teams won't roll their own.	Teams already have an internal connector layer they like.
C7	$25K to $100K/yr is allocatable without procurement for this category.	Every conversation drags into a 6-month enterprise cycle.

Without this rubric, 30 conversations become 30 anecdotes.

Step 2

Don't ask about the future. Ask about the past.

The Mom Test in one sentence: people are bad at predicting their behavior, fine at recounting it. Replace every "would you" with a "did you."

Don't ask

"Would you pay for a runtime that mediates credentials for your agents?"
"How important is audit for agent actions?"

Ask

"Walk me through the last time someone on your team tried to put an agent in front of a production system. What happened?"
"When SOC 2 came around last year, what did you say about the agent in your support pipeline?"

Past behavior is the only signal. If they've never tried to do the thing, the premise hasn't reached them yet, which is itself information.

Step 3

Recruit three layers, separately

The ICP hypothesis names buyers. That's correct but insufficient. You need three populations, around 10 each.

Layer 1

Practitioners

Platform engineers, AI infra engineers, and security engineers who have touched the problem.

You learn

Whether the pain is real and what they actually tried.

Layer 2

Buyers

Platform-eng leads, VPs of Engineering, and heads of AI infra at companies with 50 to 500 engineers.

You learn

Whether the pain has an owner, a budget, a trigger event.

Layer 3, highest signal

Lost prospects

Teams who chose Anthropic Managed Agents, OpenAI Frontier, Clawvisor, or built their own substrate (Klarna, Block/Goose, Cloudflare iMARS, EY).

You learn

Why the v4 story would not have reached them. Sharpest signal in the set.

Step 4

Sourcing — where the 30 come from

The ICP hypothesis is "50 to 500 engineers, SaaS, US, already on Docker/K8s, already shipped an agent-powered internal tool." Channels in priority order:

1. Warm intros from your founder network. YC alumni, investor partner network, previous-company colleagues at target-shaped cos. Highest reply rate, lowest selection bias.
2. AI-ops / platform-eng communities. MLOps Community Slack, r/devops, Rands Leadership Slack, AI-engineering Discords. Post a specific ask. Don't pitch.
3. Conference networks. AI Engineer Summit, KubeCon AI track, AWS re:Invent platform-eng track. Recent attendees are pre-filtered.
4. LinkedIn cold outreach to target titles at ICP-shaped companies. Build a list of 100 companies matching the ICP shape, identify the platform-eng lead at each, and send a specific opener referencing something concrete about their stack.
5. Inbound from docs / GitHub / community. People showing up are self-selected. Those who file issues are gold. They tried it.

Realistic numbers. Warm intros convert at around 50%, community posts at around 5%, cold outreach at 2 to 4%. To get 30 conversations you need around 150 outreach attempts.
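The funnel arithmetic above can be sanity-checked with a few lines of Python. This is a rough sketch: the reply rates come from the text (cold outreach uses 3%, the midpoint of "2 to 4%"), while the split of 150 attempts across channels is a hypothetical mix chosen for illustration, since the plan does not specify one.

```python
# Reply rates from the text; the cold rate is the midpoint of "2 to 4%".
REPLY_RATE = {"warm": 0.50, "community": 0.05, "cold": 0.03}


def expected_conversations(attempts: dict[str, int]) -> float:
    """Expected number of booked conversations for a given outreach mix."""
    return sum(REPLY_RATE[channel] * n for channel, n in attempts.items())


# Hypothetical split of 150 attempts (an assumption, not from the plan).
mix = {"warm": 50, "community": 40, "cold": 60}
total_attempts = sum(mix.values())
expected = expected_conversations(mix)
print(f"{total_attempts} attempts -> about {expected:.1f} conversations")
```

Under these assumptions the mix lands just short of 30 conversations, and almost all of them come from warm intros. The 150 figure only holds if roughly a third of the attempts are warm; 150 purely cold attempts would yield about 4 to 5 conversations.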

Step 5

The interview scripts (30 min default, 60 min full)

Five blocks, each timed. Two variants share the same shape and C1–C7 anchoring: a 30-minute default for first conversations and cold-outreach slots, and a 60-minute version for warm contacts who'll go deep. Variability in answers is signal; variability in questions is noise.

Both fillable scripts — with specific question prompts, probes, and per-block don'ts — live at the interview script hub. Pick the script that matches the slot length. The block breakdown below reflects the 60-minute timings; the 30-minute variant compresses to roughly half.
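As a quick check on the timings, the five blocks in this step (10, 20, 15, 10, and 5 minutes) add up to 60. The sketch below scales them proportionally to a shorter slot. Proportional scaling is an assumption made for illustration; the actual 30-minute script at the hub may allocate its time differently.

```python
# 60-minute block timings as listed in this step.
BLOCKS_60 = {
    "Context": 10,
    "The trigger event": 20,
    "What they tried": 15,
    "The pitch test": 10,
    "Budget & authority": 5,
}


def scale_blocks(blocks: dict[str, int], slot_minutes: int) -> dict[str, float]:
    """Scale each block proportionally so the total fits slot_minutes."""
    total = sum(blocks.values())
    return {name: minutes * slot_minutes / total for name, minutes in blocks.items()}


blocks_30 = scale_blocks(BLOCKS_60, 30)
for name, minutes in blocks_30.items():
    print(f"{name}: {minutes:g} min")
```

Even at half length, the trigger-event block keeps the largest share of the slot, which matches its load-bearing role.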

1. Context

10 min

Role, team, AI footprint, and the consequential systems they rely on. Calibrate "agent" and "consequential" on their terms. No leading.

2. The trigger event

20 min — load-bearing

Has the team put an agent in front of a consequential system — shipped, tried, wanted, or never proposed? The branch matters as much as the answer. C1, C2, C3 live or die here.

3. What they tried

15 min

Tools, scripts, internal wrappers, hosted offerings they evaluated. What was rejected and why. C4, C5, C6 live or die here.

4. The pitch test (end only)

10 min

One sentence: "If there were a substrate that let your teams share the same vetted skills, gated approvals where they mattered, kept the irreversible actions deterministic instead of LLM-decided, and gave you an audit trail per action — would that have helped here?" Watch the reaction shape. Don't sell. Read.

5. Budget & authority

5 min

"Who would sign off on something like this?" "How does that kind of purchase happen?" C2 and C7 live or die here.

Step 6

Capture in a single coding sheet, not in notes

Each interview becomes a meeting page at /meetings/ with structured C1–C7 codes in its frontmatter. Those rows aggregate live at /discovery/matrix/ — the coding sheet, the per-claim tally, and the decision-gate counter all on one page. After every 5 conversations, re-read it:

Are claims clustering toward support or contradict?
Is one of the positioning candidates (A fleet / B compliance / C vendor-neutral) emerging from buyer language?
Are the competing tools the ones you expected, or new ones you hadn't mapped?

Open the C1–C7 matrix →
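For readers who want to reproduce the per-claim tally offline, here is a minimal sketch. It assumes each interview's codes are available as a plain mapping from claim ID to one of support / contradict / neutral. The real frontmatter schema behind /meetings/ is not described here, so the field names and the sample rows below are hypothetical.

```python
from collections import Counter

CLAIMS = ["C1", "C2", "C3", "C4", "C5", "C6", "C7"]
CODES = ("support", "contradict", "neutral")


def tally(interviews: list[dict[str, str]]) -> dict[str, Counter]:
    """Count support / contradict / neutral per claim across interviews."""
    result = {claim: Counter() for claim in CLAIMS}
    for codes in interviews:
        for claim in CLAIMS:
            # Assumption: a claim left uncoded in an interview counts as neutral.
            code = codes.get(claim, "neutral")
            if code not in CODES:
                raise ValueError(f"unknown code {code!r} for {claim}")
            result[claim][code] += 1
    return result


# Made-up rows for illustration only, not real interview data.
interviews = [
    {"C1": "support", "C2": "contradict", "C3": "support"},
    {"C1": "support", "C2": "neutral", "C4": "contradict"},
    {"C1": "contradict", "C3": "support", "C7": "support"},
]
counts = tally(interviews)
for claim in CLAIMS:
    c = counts[claim]
    print(claim, c["support"], c["contradict"], c["neutral"])
```

Treating uncoded claims as neutral keeps every row the same width, but it also inflates the neutral count, so a stricter version might reject incomplete rows instead.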

Step 7

Pre-commit to decision points

Don't run 30 conversations and then decide. Pre-commit at three milestones.

After 10 conversations

Is the trigger event showing up?

If fewer than 2 out of 10 had one, the premise is wrong-shaped. Revise the script, possibly the ICP, before continuing.

After 20 conversations

Lock the positioning candidate

Which of A / B / C is buyers' language gravitating toward? Lock the pitch for the final 10.

After 30 conversations

Go / pivot / kill

If at least 5 conversations end with "yes, I'd deploy this if you'd talk to me again," you have your design partners. If none do, the runtime-first thesis needs surgery before more engineering investment.
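Writing the gates down as code is one way to make the pre-commitment hard to fudge later. The sketch below encodes the 10-conversation and 30-conversation gates from this step. The function names and return strings are hypothetical, the 5-yes threshold is read as a minimum, and the outcome for 1 to 4 yeses is an assumption because the plan does not define it. The 20-conversation gate is a judgment about buyer language and is left out.

```python
def gate_after_10(trigger_events: int) -> str:
    """Gate 1: fewer than 2 of 10 conversations showing a trigger event."""
    if trigger_events < 2:
        return "revise script or ICP before continuing"
    return "continue"


def gate_after_30(deploy_yes: int) -> str:
    """Gate 3: count of conversations ending in a conditional 'yes, I'd deploy this'."""
    if deploy_yes >= 5:
        return "go: convert to design partners"
    if deploy_yes == 0:
        return "thesis needs surgery"
    # Assumption: the plan does not say what 1 to 4 yeses means.
    return "undecided: between the stated thresholds"


print(gate_after_10(1))
print(gate_after_30(5))
print(gate_after_30(3))
```

Deciding in advance what the middle band means is worth doing before the first interview, since that is the most likely place for a result to land.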

Step 8

Convert pattern matches to design partners

Design partners ≠ interviewees who said nice things. They sign something, even a one-page LOI, that commits to:

A concrete deployment timeline (not "someday").
Willingness to pay at the eventual price point, even if v4 is free.
A weekly feedback cadence.
Permission to be referenced (logos help the next 30 conversations).

If you can't get a one-pager signed, the verbal interest wasn't real. Push the conversation until you find out which it is.

Step 9

What not to do

× Don't demo the v4 architecture. Shell-layer interception and PTY approvals are how you deliver the value; they are not the value the buyer hears.
× Don't lead with security. B-positioning is one of three candidate stories, not the validated one. Let the buyer surface security on their own.
× Don't talk about Clawvisor / Infisical unprompted. When buyers raise them, listen carefully. That's your competitive landscape.
× Don't promise features. Every "we could build that" is debt. The right answer is "tell me more about that need" and then write it down.

Start-here checklist

The hardest part is starting. Once 3 conversations are on the calendar, the muscle memory takes over.

Write the falsifiable claim list (Step 1): 30 minutes
Pick a script at the interview hub (/interview/) and copy it into your notes surface for each conversation: 5 minutes per interview
Identify the first 10 conversations (3 warm + 3 community + 4 lost-prospects): around 1 hour
Send the first 10 outreach messages: 1 hour
Set up the coding sheet (Step 6): 30 minutes

Next moves

Interview scripts as fillable docs

Two scripts — 30-minute default and 60-minute full — both expanded into specific question prompts with timing markers. Browse the hub.

Outreach message templates

Four variants: warm / community / cold / lost-prospect, each tuned to its channel. Open the templates.

Target recruiting list

~150 named candidates across YC W26, recent funding, AI-mature buyers, and competitor customer pools, with LinkedIn search hints.

Sharper falsifiable-claim list

C1 to C7 rewritten with explicit numeric kill criteria. Tracked as issue #6.