Validate the "execution layer for AI-core organizations" premise.
20 to 30 discovery conversations and 5 design-partner agreements. Sequenced to kill the ICP hypothesis fast if it's wrong, and to lock the positioning candidate (B or C) before more engineering investment.
Customer discovery for an infrastructure premise has three failure modes. Every step below is designed to avoid them.
Nine steps, in order
Step 1
Convert the premise into falsifiable claims
The v4 docs pitch is a thesis. Discovery needs a list of claims that can be killed by an interview. Before you talk to anyone, write these down. Each interview should leave you with a verdict of support, contradict, or neutral against each claim.
| # | Claim | What kills it |
|---|---|---|
| C1 | Companies running, or wanting to run, agents on consequential systems have hit operational friction the secrets/IAM/observability stack doesn't solve. | "Vault + Datadog + GitHub Actions covers it." |
| C2 | The friction has been named and owned by a specific person (platform-eng lead or VP Eng). | Pain is real but diffuse; nobody owns it; no budget line. |
| C3 | A security/compliance review, an incident, or a preemptive policy decision triggered concrete action. | No event, no policy, no proposal: nobody's had to think about it. |
| C4 | A runtime-layer abstraction beats per-agent SDKs or per-LLM-provider managed agents. | "We'll just use Anthropic Managed Agents / OpenAI Agent Builder." |
| C5 | Customer-operated deployment is acceptable, or even preferred, vs. hosted. | Buyers want SaaS only; on-prem is a non-starter; v4 ordering inverts. |
| C6 | Connector curation by Aileron is valuable; teams won't roll their own. | Teams already have an internal connector layer they like. |
| C7 | $25K to $100K/yr is allocatable without procurement for this category. | Every conversation drags into a 6-month enterprise cycle. |
Without this rubric, 30 conversations become 30 anecdotes.
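One way to make the rubric concrete is a small coding record per interview. The sketch below is illustrative only: the names `Verdict`, `InterviewCoding`, and `evidence` are assumptions rather than part of any existing tooling, and the claim labels are shortened paraphrases of C1 to C7.

```python
from dataclasses import dataclass, field
from enum import Enum


class Verdict(Enum):
    SUPPORT = "support"
    CONTRADICT = "contradict"
    NEUTRAL = "neutral"


# Claim IDs come from the rubric; the labels are shortened paraphrases.
CLAIMS = {
    "C1": "Friction the secrets/IAM/observability stack doesn't solve",
    "C2": "Friction is named and owned by a specific person",
    "C3": "A review, incident, or policy decision triggered action",
    "C4": "Runtime-layer abstraction beats per-agent or per-provider options",
    "C5": "Customer-operated deployment is acceptable or preferred",
    "C6": "Curated connectors are valuable",
    "C7": "$25K to $100K/yr is allocatable without procurement",
}


@dataclass
class InterviewCoding:
    """One interview's verdicts, keyed by claim ID (hypothetical structure)."""

    interviewee: str
    verdicts: dict = field(default_factory=dict)

    def code(self, claim_id: str, verdict: Verdict, evidence: str) -> None:
        """Record a verdict plus the past-behavior evidence behind it."""
        if claim_id not in CLAIMS:
            raise KeyError(f"unknown claim: {claim_id}")
        self.verdicts[claim_id] = (verdict, evidence)

    def uncoded(self) -> list:
        """Claims this interview has not yet been coded against."""
        return [c for c in CLAIMS if c not in self.verdicts]


# Placeholder example, not real interview data.
coding = InterviewCoding(interviewee="placeholder-platform-lead")
coding.code("C1", Verdict.SUPPORT, "Described a past workaround in their own words")
remaining = coding.uncoded()  # the six claims still to be coded
```

Requiring an `evidence` string alongside each verdict is a design choice: it forces the coder to point at something the interviewee actually said or did, which keeps the sheet from drifting back into anecdotes.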
Step 2
Don't ask about the future. Ask about the past.
The Mom Test in one sentence: people are bad at predicting their behavior, fine at recounting it. Replace every "would you" with a "did you."
Past behavior is the only signal. If they've never tried to do the thing, the premise hasn't reached them yet, which is itself information.
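A crude way to enforce this on a draft script is to scan it for "would you" phrasing before the interview. The snippet below is a minimal sketch under that assumption: the pattern and the sample questions are illustrative, and a regex will miss hypotheticals phrased in other ways.

```python
import re

# Flags only the literal "would you" phrasing named in the text (case-insensitive).
WOULD_YOU = re.compile(r"\bwould\s+you\b", re.IGNORECASE)


def flag_hypotheticals(questions):
    """Return the questions that ask about predicted rather than past behavior."""
    return [q for q in questions if WOULD_YOU.search(q)]


# Illustrative draft questions, not taken from the actual interview scripts.
draft = [
    "Would you use a tool that gates agent actions?",
    "When did you last put an agent in front of a production system?",
    "What did you do after the security review?",
]
flagged = flag_hypotheticals(draft)  # only the first question is flagged
```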
Step 3
Recruit three layers, separately
The ICP hypothesis names buyers. That's correct but insufficient. You need three populations, around 10 each.
Step 4
Sourcing — where the 30 come from
The ICP hypothesis is "50 to 500 engineers, SaaS, US, already on Docker/K8s, already shipped an agent-powered internal tool." Channels in priority order:
1. Warm intros from your founder network. YC alumni, investor partner network, previous-company colleagues at target-shaped companies. Highest reply rate, lowest selection bias.
2. AI-ops / platform-eng communities. MLOps Community Slack, r/devops, Rands Leadership Slack, AI-engineering Discords. Post a specific ask. Don't pitch.
3. Conference networks. AI Engineer Summit, KubeCon AI track, AWS re:Invent platform-eng track. Recent attendees are pre-filtered.
4. LinkedIn cold outreach to target titles at ICP-shaped companies. Build a list of 100 companies matching ICP shape, identify the platform-eng lead, send a specific opener referencing something concrete about their stack.
5. Inbound from docs / GitHub / community. People showing up are self-selected. Those who file issues are gold. They tried it.
Realistic numbers.
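Warm intros convert at around 50%, community posts at around 5%, cold outreach at 2 to 4%. To get 30 conversations you need around 150 outreach attempts. That implies a blended conversion of about 20%, which only holds if the mix leans heavily on warm intros.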
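The arithmetic behind the 150 figure depends on the channel mix. Here is a minimal sketch using the conversion rates above (cold outreach at the 3% midpoint). The mix itself is a hypothetical assumption chosen to total 150 attempts, not a number from this plan.

```python
# Conversion rates from the text; cold outreach uses the midpoint of 2 to 4%.
RATES = {"warm_intro": 0.50, "community_post": 0.05, "cold_outreach": 0.03}

# Hypothetical channel mix (an assumption) totalling 150 attempts.
MIX = {"warm_intro": 55, "community_post": 45, "cold_outreach": 50}


def expected_conversations(mix, rates):
    """Expected number of booked conversations for a given outreach mix."""
    return sum(attempts * rates[channel] for channel, attempts in mix.items())


total_attempts = sum(MIX.values())
expected = expected_conversations(MIX, RATES)
blended_rate = expected / total_attempts

print(f"{total_attempts} attempts, about {expected:.1f} conversations, "
      f"blended rate {blended_rate:.0%}")
```

Under this assumed mix, warm intros account for 27.5 of the roughly 31 expected conversations. If the warm-intro pool is smaller than that, the attempt count has to rise sharply, because each cold attempt yields only a few hundredths of a conversation.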
Step 5
The interview scripts (30 min default, 60 min full)
Five blocks, each timed. Two variants share the same shape and C1–C7 anchoring: a 30-minute default for first conversations and cold-outreach slots, and a 60-minute version for warm contacts who'll go deep. Variability in answers is signal; variability in questions is noise.
Both fillable scripts — with specific question prompts, probes, and per-block don'ts — live at the interview script hub. Pick the script that matches the slot length. The block breakdown below reflects the 60-minute timings; the 30-minute variant compresses to roughly half.
1. Context (10 min). Role, team, AI footprint, and the consequential systems they rely on. Calibrate "agent" and "consequential" on their terms. No leading.
2. The trigger event (20 min, load-bearing). Has the team put an agent in front of a consequential system: shipped, tried, wanted, or never proposed? The branch matters as much as the answer. C1, C2, C3 live or die here.
3. What they tried (15 min). Tools, scripts, internal wrappers, hosted offerings they evaluated. What was rejected and why. C4, C5, C6 live or die here.
4. The pitch test, end only (10 min). One sentence: "If there were a substrate that let your teams share the same vetted skills, gated approvals where they mattered, kept the irreversible actions deterministic instead of LLM-decided, and gave you an audit trail per action, would that have helped here?" Watch the reaction shape. Don't sell. Read.
5. Budget & authority (5 min). "Who would sign off on something like this?" "How does that kind of purchase happen?" C2 and C7 live or die here.
Step 6
Capture in a single coding sheet, not in notes
Each interview becomes a meeting page at /meetings/ with structured C1–C7 codes in its frontmatter. Those rows aggregate live at /discovery/matrix/ — the coding sheet, the per-claim tally, and the decision-gate counter all on one page. After every 5 conversations, re-read it:
Are claims clustering toward support or contradict?
Is one of the positioning candidates (A fleet / B compliance / C vendor-neutral) emerging from buyer language?
Are the competing tools the ones you expected, or new ones you hadn't mapped?
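The per-claim tally and the five-conversation review cadence can be sketched in a few lines. This is an illustration only: the `codes` field, the verdict strings, and the `uncoded` bucket are assumptions about how the frontmatter might look once parsed, and the sample rows are placeholders, not real interview data.

```python
from collections import Counter

CLAIM_IDS = [f"C{i}" for i in range(1, 8)]  # C1 through C7


def tally(meetings):
    """Count verdicts per claim across parsed meeting-page frontmatter."""
    counts = {claim: Counter() for claim in CLAIM_IDS}
    for meeting in meetings:
        codes = meeting.get("codes", {})
        for claim in CLAIM_IDS:
            # A claim missing from a page is tracked as "uncoded", not neutral.
            counts[claim][codes.get(claim, "uncoded")] += 1
    return counts


def review_due(meetings, every=5):
    """True when the number of coded conversations hits a multiple of `every`."""
    return len(meetings) > 0 and len(meetings) % every == 0


# Placeholder rows standing in for already-parsed frontmatter.
meetings = [
    {"codes": {"C1": "support", "C2": "support", "C4": "contradict"}},
    {"codes": {"C1": "support", "C2": "neutral"}},
    {"codes": {"C1": "contradict", "C5": "support"}},
    {"codes": {"C1": "support", "C7": "contradict"}},
    {"codes": {"C1": "neutral", "C2": "support"}},
]

counts = tally(meetings)
if review_due(meetings):
    for claim in CLAIM_IDS:
        print(claim, dict(counts[claim]))
```

Keeping "uncoded" separate from "neutral" is deliberate: a claim the interview never touched is a gap in the script, while a neutral verdict is a finding.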