Scope: Replace the exchange-count-driven, advisory conversation steering in Career Sims with a deterministic Node Controller that walks each NPC (and Otto) conversation through a parameterized-generic graph.
Companion reference: CONVERSATION_ENGINE_GUIDE.md (how the engine works today).
Today an NPC turn is one LLM shot steered by advisory prompt hints keyed on raw
exchange count (plot_beats at exchange 1/2/4/6, a fixed Phase 1–4 arc, pivots
armed at exchange 2/4, objective completion via a metadata flag + a min-exchange
gate). Four failures fall out of that:
| Symptom | Root cause |
|---|---|
| NPC dumps info instead of conversing | No notion of "what this turn is for"; the model front-loads everything |
| No real opening | Nothing structurally owns "ground the learner first" |
| No real ending — left hanging | Completion is a flag + turn-count, not a resolution phase |
| Drift / ignores deviations | Goal is enforced by repetition in the prompt, with no detour handling |
The recent commit history (phantom-NPC scrubbing, "stay in your lane," "surface the decisive fact on any question") is a stream of prompt patches fighting these symptoms one at a time. The graph engine fixes them structurally instead.
The new piece is the GraphWalker (green).
Contract: the LLM generates dialogue and reports node_satisfied /
detour_detected. The Walker decides the next node and emits the existing
frontend commands. The model never controls the pointer.
academic / technical / expert (see career-sims-dialogue-tiers).
Each tier has its own backbone.
decision_facts) to a node's content_source. Generic, authored once.
max_turns.
The graph is generic per tier, not generated per scenario. Three layers, and only the bottom one is per-scenario:
Blue = hand-authored once and reused everywhere. Green = already generated per scenario today; we just route it into nodes instead of one big prompt.
The walk is parameterized by Layer 3: DEEPEN loops over however many facts the
NPC has; a PIVOT node only exists where the scenario defined one; KEY_REVEAL only
opens if the NPC has a key_reveal. No LLM builds the graph → no new
generation failure modes.
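A minimal sketch of that parameterization, assuming the scenario content for one NPC arrives as a plain dict. The field names `what_they_know`, `pivots`, and `key_reveal` follow this document, but the `p1`/`p2` key mapping and the `instantiate_walk` helper are hypothetical and only illustrate how a fixed backbone can be bound to per-scenario content without any LLM involvement.

```python
from typing import Any

# Generic technical-tier backbone; node ids are taken from this document.
BACKBONE = ["GROUND", "SURFACE", "DEEPEN", "PIVOT_1",
            "DECISIVE", "PIVOT_2", "RESOLVE", "CLOSE"]


def instantiate_walk(npc: dict[str, Any]) -> list[dict[str, Any]]:
    """Bind one NPC's scenario content to the generic backbone.

    Assumption: pivots are keyed "p1"/"p2" and map to PIVOT_1/PIVOT_2.
    """
    pivots = npc.get("pivots", {})
    walk: list[dict[str, Any]] = []
    for node_id in BACKBONE:
        entry: dict[str, Any] = {"id": node_id}
        if node_id.startswith("PIVOT_"):
            key = "p" + node_id.split("_")[1]  # PIVOT_1 -> "p1"
            if key not in pivots:
                continue  # no pivot authored for this slot: the node is skipped
        if node_id == "DEEPEN":
            # DEEPEN drips one fact per turn, so the fact list bounds the loop.
            entry["facts"] = list(npc.get("what_they_know", []))
        if node_id == "RESOLVE":
            # The conditional KEY_REVEAL edge exists only when content exists.
            entry["key_reveal_available"] = bool(npc.get("key_reveal"))
        walk.append(entry)
    return walk
```

Because the function is a pure mapping from authored content to a node list, the same scenario always yields the same walk, which is the property the paragraph above relies on.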
TODAY — structure is driven by raw exchange count:
WITH GRAPH ENGINE — the Walker owns structure; ★ = new, ◀MOD = modified:
# A node is generic. Scenario content arrives via `content_source`.
Node = {
    "id": "DEEPEN",
    "intent": "Teach the mechanism behind the problem; react to the "
              "learner's reasoning; drip one fresh specific.",  # generic prose
    "content_source": "npc.what_they_know",  # Layer-2 binding key
    "satisfy_when": "one concrete fact has landed and the learner responded to it",
    "min_turns": 1,
    "max_turns": 2,     # force-advance after this many turns in-node
    "is_gate": False,   # True → cannot advance until node_satisfied
    "edges": {
        "advance": "PIVOT_1",   # normal forward edge
        "self_loop": True,      # may stay (bounded by max_turns)
        "conditional": None,    # e.g. KEY_REVEAL opens on relationship≥cooperative
    },
}
Global rules (wrap every node): the detour handler (acknowledge once → answer → bridge back) and the existing 6-exchange hard backstop that force-closes if a gate ever truly stalls.
No pivots, no DECISIVE-as-tension, no KEY_REVEAL. Q&A only.
| Node | intent | content_source | min/max | gate | advance edge |
|---|---|---|---|---|---|
| GROUND | who I am + why you're here + the decision at stake; no facts | beat1 + greeting | 1/1 | – | SURFACE |
| SURFACE | name the core problem + one specific fact | beat1 | 1/2 | – | DEEPEN |
| DEEPEN | teach mechanism, react to reasoning, one fresh fact/turn | what_they_know | 1/2 (⟲) | – | PIVOT_1 |
| PIVOT_1 | relationship fork; ends on pointed question | pivots[p1] | 1/1 | – | DECISIVE |
| DECISIVE | surface the binding constraint; require ACK | decision_facts | 1/2 | ✅ | PIVOT_2 |
| PIVOT_2 | technical-judgment fork | pivots[p2] | 1/1 | – | RESOLVE |
| RESOLVE | what they'd accept; (KEY_REVEAL may fire) | beat6 + key_reveal | 1/2 | – | CLOSE |
| CLOSE | resolve + objective_complete + doc handover + redirect | goal.end_condition | 1/1 | – | (terminal) |
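Since the backbone is hand-authored once per tier, it can be checked statically before it ships. The sketch below encodes the table above as a dict and verifies that every advance edge resolves, that min never exceeds max, and that following advance edges from GROUND reaches the terminal node. The dict layout and the `validate_backbone` helper are assumptions made for illustration, not the engine's actual representation.

```python
# The technical-tier table above, encoded as data (layout is hypothetical).
BACKBONE = {
    "GROUND":   {"min": 1, "max": 1, "gate": False, "advance": "SURFACE"},
    "SURFACE":  {"min": 1, "max": 2, "gate": False, "advance": "DEEPEN"},
    "DEEPEN":   {"min": 1, "max": 2, "gate": False, "advance": "PIVOT_1"},
    "PIVOT_1":  {"min": 1, "max": 1, "gate": False, "advance": "DECISIVE"},
    "DECISIVE": {"min": 1, "max": 2, "gate": True,  "advance": "PIVOT_2"},
    "PIVOT_2":  {"min": 1, "max": 1, "gate": False, "advance": "RESOLVE"},
    "RESOLVE":  {"min": 1, "max": 2, "gate": False, "advance": "CLOSE"},
    "CLOSE":    {"min": 1, "max": 1, "gate": False, "advance": None},  # terminal
}


def validate_backbone(backbone: dict, start: str = "GROUND") -> list[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    for node_id, node in backbone.items():
        if node["min"] > node["max"]:
            errors.append(f"{node_id}: min_turns exceeds max_turns")
        target = node["advance"]
        if target is not None and target not in backbone:
            errors.append(f"{node_id}: advance edge points to unknown node {target}")
    # Follow advance edges from the start node; stop at the terminal or a cycle.
    visited = []
    current = start
    while current is not None and current in backbone:
        if current in visited:
            errors.append(f"cycle detected at {current}")
            break
        visited.append(current)
        current = backbone[current]["advance"]
    for node_id in backbone:
        if node_id not in visited:
            errors.append(f"{node_id}: unreachable from {start}")
    return errors
```

Running a check like this in CI would catch a mistyped edge at authoring time rather than in a live conversation.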
The Walker is pure, deterministic, unit-testable — no LLM inside it.
current_node, node_turn_count).
render_current_node() → the node block injected into the prompt.
advance(metadata) → decide the next node from node_satisfied + turn limits + gate status + relationship + detour.
command (AI_AdvanceObjective, AI_PivotMoment, AI_EndConversation), identical values to today.

def advance(node, meta, npc_state, relationship_state):
    npc_state.node_turn_count += 1

    # 1) Gate: cannot leave until satisfied (max_turns can't override a gate;
    #    only the global 6-exchange backstop can, handled in talk_to_npc).
    if node.is_gate and not meta.node_satisfied:
        return Stay(node)  # hold, re-render gate next turn

    # 2) Branch nodes (pivots) are armed on ARRIVAL, resolved next turn.
    if node.is_branch and not npc_state.pivot_resolved(node):
        return ArmPivot(node)  # command = AI_PivotMoment

    # 3) Normal advance: satisfied AND min dwell met
    if meta.node_satisfied and npc_state.node_turn_count >= node.min_turns:
        return Goto(next_after(node, relationship_state))  # may take conditional edge

    # 4) Hard limit: force-advance even if "unsatisfied"
    if npc_state.node_turn_count >= node.max_turns:
        return Goto(next_after(node, relationship_state))

    # 5) Otherwise loop in place (only if the node allows it)
    return Stay(node) if node.self_loop else Goto(next_after(node, relationship_state))

def next_after(node, rel):
    # conditional edges (e.g. KEY_REVEAL) open based on relationship state
    if node.edges.conditional and rel >= COOPERATIVE and not key_reveal_done:
        return node.edges.conditional
    return node.edges.advance
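The pseudocode above leaves several names undefined (`Stay`, `Goto`, `ArmPivot`, `key_reveal_done`, `COOPERATIVE`). One way to make it executable for unit tests is sketched below. Everything beyond the five numbered rules is an assumption: decisions are returned as plain tuples, relationship state is modelled as an integer ordinal, and `key_reveal_done` and resolved pivots are stored on a hypothetical `NpcState` dataclass. The caller is assumed to reset `node_turn_count` whenever it acts on a `goto`.

```python
from dataclasses import dataclass, field
from typing import Optional

COOPERATIVE = 1  # hypothetical ordinal for the "cooperative" relationship state


@dataclass
class Node:
    id: str
    advance: Optional[str]            # normal forward edge; None on the terminal node
    min_turns: int = 1
    max_turns: int = 1
    is_gate: bool = False
    is_branch: bool = False
    self_loop: bool = False
    conditional: Optional[str] = None  # e.g. "KEY_REVEAL"


@dataclass
class NpcState:
    node_turn_count: int = 0
    resolved_pivots: set = field(default_factory=set)
    key_reveal_done: bool = False


def next_after(node: Node, state: NpcState, rel: int) -> Optional[str]:
    # Conditional edges open on relationship state, at most once.
    if node.conditional and rel >= COOPERATIVE and not state.key_reveal_done:
        return node.conditional
    return node.advance


def advance(node: Node, node_satisfied: bool, state: NpcState, rel: int):
    state.node_turn_count += 1
    # 1) A gate holds until satisfied; max_turns does not override it.
    if node.is_gate and not node_satisfied:
        return ("stay", node.id)
    # 2) A pivot is armed on arrival and resolved on a later turn.
    if node.is_branch and node.id not in state.resolved_pivots:
        return ("arm_pivot", node.id)
    # 3) Normal advance: satisfied and minimum dwell met.
    if node_satisfied and state.node_turn_count >= node.min_turns:
        return ("goto", next_after(node, state, rel))
    # 4) Hard limit: force-advance even when unsatisfied.
    if state.node_turn_count >= node.max_turns:
        return ("goto", next_after(node, state, rel))
    # 5) Otherwise loop in place only if the node allows it.
    if node.self_loop:
        return ("stay", node.id)
    return ("goto", next_after(node, state, rel))
```

With this shape, the gate behaviour is directly testable: a DECISIVE node with `is_gate=True` keeps returning `("stay", "DECISIVE")` for any number of unsatisfied turns, while a non-gate DEEPEN node with `max_turns=2` is force-advanced on its second unsatisfied turn.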
render_current_node() always appends the detour rule; detour_detected from
the metadata tells us a detour happened (for logging/telemetry) but does not
change the pointer — the node block already instructed "acknowledge once → answer →
bridge back," so the NPC handles it in-character and we stay on-node. A detour turn
can still set node_satisfied:true (as in the worked example) and advance normally.
max_turns → force-advance.
talk_to_npc) force-completes the conversation if a gate truly stalls.
fsu_skip still short-circuits to CLOSE.
This is the heart of the change for the LLM. The system prompt stays almost entirely as-is (it's cached per NPC per session). Two edits:
The per-turn steering moves into the USER prompt as a single CURRENT NODE
block, which replaces today's count-based plot_beat_instruction +
_phase_reminder.
Only the items in red change; everything else is untouched.
render_current_node() produces something like:
━━━ CURRENT NODE: DECISIVE ━━━
WHAT THIS TURN IS FOR: Surface the single binding constraint that decides this call,
and make sure the learner actually registers it. This is the most decision-relevant
thing you know — do not bury it under softer points.
THE CONSTRAINT YOU MUST SURFACE (in your own words, grounded in your numbers):
• "Medical-advice prompts hallucinate at 23%, against the client's contractual <2% bar."
ADVANCE / STAY: You will move on once the learner clearly acknowledges this constraint.
If they haven't yet, restate it more plainly or tie it to a consequence — do NOT drop it.
IF THE LEARNER GOES OFF-TOPIC: acknowledge their point in one line, answer it briefly,
then bring it back to the constraint above. Never ignore them; never chase the tangent.
REPORT IN METADATA: set "node_satisfied": true ONLY if the learner acknowledged the
constraint this turn; set "detour_detected": true if their message was off this topic.
The Walker fills the bracketed specifics from the bound scenario content; the shape is generic.
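A rough sketch of how such a block could be rendered from a generic template plus bound content. The function signature, the constant names, and the exact label wording are assumptions modelled on the sample block above; the real `render_current_node()` may take different arguments.

```python
# Fixed text appended to every node block (wording adapted from the sample above).
DETOUR_RULE = (
    "IF THE LEARNER GOES OFF-TOPIC: acknowledge their point in one line, answer it "
    "briefly, then bring it back to the point above."
)
REPORT_RULE = (
    'REPORT IN METADATA: set "node_satisfied": true ONLY if this turn\'s goal landed; '
    'set "detour_detected": true if their message was off this topic.'
)


def render_current_node(node_id: str, intent: str, content: list[str],
                        advance_rule: str) -> str:
    """Build the per-turn CURRENT NODE block from generic prose and bound content."""
    # Scenario specifics are quoted bullets; everything else is generic template.
    facts = "\n".join(f'• "{line}"' for line in content)
    return "\n".join([
        f"━━━ CURRENT NODE: {node_id} ━━━",
        f"WHAT THIS TURN IS FOR: {intent}",
        "THE CONTENT FOR THIS TURN (in your own words, grounded in your numbers):",
        facts,
        f"ADVANCE / STAY: {advance_rule}",
        DETOUR_RULE,   # always appended, regardless of node
        REPORT_RULE,
    ])
```

Keeping the detour and report rules as constants means every node carries the same global wrapper, and only `intent`, `content`, and `advance_rule` vary per node.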
This is the actual full prompt that would be sent for the Maya example (§9), with the graph change applied. System prompt is cached; user prompt is per-turn.
You are Maya Patel in a career simulation.
YOUR ROLE: QA Lead
YOUR KNOWLEDGE:
[
"Eval Run #47 shows 8% hallucination overall but 23% on medical-advice prompts",
"The eval set undersamples medical-advice prompts, so the headline 8% understates risk",
"I flagged this pattern two weeks ago and was told to wait for the launch review",
"The client contract commits us to under 2% error on exactly these prompt types",
"My personal stake: I'm the one who signed off on the eval methodology"
]
YOUR CONCERNS:
[
"If we ship at 23% on medical prompts and it surfaces, it lands on me",
"Marcus will frame any delay as me being an obstacle to the launch"
]
YOUR GOALS:
[
"Get the learner to understand the risk is concentrated in high-stakes queries, not uniform",
"Advocate for running the full eval before launch (Option A)"
]
YOUR OPINION ON THE DECISION:
You believe the right call is: "Delay one week, run the full eval"
Your core argument: The 23% on medical prompts is the real signal; shipping blind on
those queries is the actual risk, and one week buys a defensible number.
- You genuinely believe this is correct based on your data. Advocate for it — don't be passive.
- If the learner explicitly says they prefer a different approach, push back — professionally
but firmly. One clear objection is enough; don't repeat it.
- If the learner is agreeing, asking questions, or staying neutral — do NOT treat that as
pushback. Engage normally.
- If they ask which option you'd recommend, share your view and the key reason — but frame it
around the deciding constraint and let them make the call. Don't just hand over a verdict.
KNOWLEDGE BOUNDARIES — STRICT:
You may interpret your information and add context, caveats, and judgment — that is your
expertise. But NEVER state a specific number, metric, or date that is not in your document or
knowledge above. If the learner asks for a figure you don't have, say you don't have it and
point them to who would.
FACT DISCIPLINE — NON-NEGOTIABLE:
- Only assert a requirement or threshold your document/knowledge above actually states.
- Ground every factual claim in your own document or knowledge; do NOT attribute data to
another named colleague.
- STAY IN YOUR LANE ON NUMBERS: never state a figure that belongs to another person's area.
- SPEAK ONLY FOR YOURSELF: never narrate another person's private thoughts or motives.
YOUR DOCUMENT — SOURCE OF RECORD (authoritative):
"Eval Run #47 — Edge-Case Breakdown" contains exactly these lines:
• "Overall hallucination rate: 8.1% (n=2,400)"
• "Medical-advice subset: 23.4% (n=180)"
• "Eval set composition: medical-advice prompts = 7.5% of total"
• "Contractual error ceiling (client SLA): 2.0% on regulated-advice prompts"
[FACTUAL GROUNDING — the document wins over instinct; if you don't have a figure, say so.]
DECISION-DETERMINING FACTS — YOU MUST SURFACE THESE:
These are the binding facts that decide which option is right:
• "Medical-advice subset hallucinates at 23.4% vs the 2.0% contractual ceiling"
- Surface just ONE of these per turn, grounded in the exact numbers, in your own words.
- Do NOT let the conversation end without stating the binding constraint you hold.
- Volunteer it on ANY relevant question — lead with the specific fact and its number.
SITUATION CONTEXT — you are living this right now:
- Where you are: ML product team, two days before a client launch
- What is actually happening: a customer-facing feature hallucinates on edge cases; ship in 48h
- Who you are talking to: a Machine Learning Engineer brought in to make the call
GUIDED CONVERSATION: This is a guided conversation. Each message includes a CURRENT NODE
directive telling you what THIS turn is for — follow it. It tells you when to go deeper, when
the point has landed, and when to wind down. Do not race ahead of it or repeat past nodes.
CONVERSATIONAL STYLE — MANDATORY:
- Never use em dashes. No bullet points or numbered lists in dialogue.
- No "Great question!", no "I understand your concern." Talk like a real coworker; use contractions.
- React with emotion first, logic second. Never refer to yourself by name; use "I".
- NO REPETITION: never restate a point you already made.
OBJECTIVE COMPLETION RULES:
- When the learner satisfies your objective's end condition, set "objective_complete" to that
objective_id in the SAME response. If all objectives are complete, set "is_conversation_complete".
- For career sims, require meaningful engagement before completing.
EMOTIONAL REACTIVITY — REQUIRED:
- Your emotional state is set by the server (RELATIONSHIP SCORE directive below). Return it
exactly in metadata; do not change it from a single message.
- Set engagement_score (-2..+2). This is a TECHNICAL sim: reward sound reasoning, not politeness.
+2 technically sound point / caught a real flaw … -2 confidently wrong or unsafe call.
Output your response in TWO parts separated by ---END---
PART 1 — your spoken response as plain text (2-3 sentences, one clear point, no padding):
Write only what you say out loud.
---END---
PART 2 — metadata as a single JSON object on one line:
{"emotional_state": "<from RELATIONSHIP SCORE directive>", "information_revealed": [...],
"internal_thought": "...", "objective_complete": null, "is_conversation_complete": false,
"hint_given": false, "agreement_signal": "neutral", "engagement_score": 0,
"node_satisfied": false, "detour_detected": false}
CRITICAL: exactly one ---END--- separator; never put JSON before it; never omit the metadata.
YOUR EMOTIONAL STATE: cooperative
CONVERSATION HISTORY:
Learner: What did the eval turn up?
Maya: The model's hallucinating on edge cases — making up answers instead of saying it
doesn't know. Run #47 has the breakdown, and the pattern in it is what's worrying me.
Learner: Is it across the board or specific areas?
Maya: That's the thing — it's only 8% overall, but on medical-advice prompts it jumps to
23%. The headline number makes it look fine, which is exactly the trap.
Learner: Should I just call the client and tell them we're slipping the date?
Maya: Honestly that's Marcus's call, not mine, so hold that thought for him. What I can tell
you is why you'd even be making it: that 23% is concentrated in exactly the queries that can hurt someone.
[PIVOT_1 resolved: learner chose to hear Maya's real read → mood +12]
ALREADY SAID — do NOT repeat or rephrase any of these points:
- hallucinations on edge cases
- 8% overall, 23% on medical prompts
- risk concentrated in high-stakes queries
RELATIONSHIP SCORE: +11 | CURRENT STATE: cooperative | Return this exact state in metadata.
PLAYER MOOD: CURIOUS
The learner is asking questions before drawing conclusions, inquiry-first.
Your current state is cooperative. Let their tone shape your engagement_score — but your
emotional_state is fixed by the RELATIONSHIP SCORE directive above.
━━━ CURRENT NODE: DECISIVE ━━━
WHAT THIS TURN IS FOR: Surface the single binding constraint that decides this call, and make
sure the learner actually registers it. This is the most decision-relevant thing you know.
THE CONSTRAINT YOU MUST SURFACE (your own words, grounded in your numbers):
• "Medical-advice prompts hallucinate at 23.4%, against the client's contractual 2.0% bar."
ADVANCE / STAY: You move on only once the learner clearly acknowledges this constraint. If they
haven't, restate it more plainly or tie it to a consequence — do NOT drop it or move past it.
IF THE LEARNER GOES OFF-TOPIC: acknowledge in one line, answer briefly, then bring it back to
the constraint. Never ignore them; never chase the tangent.
REPORT: set "node_satisfied": true ONLY if the learner acknowledged the constraint THIS turn;
set "detour_detected": true if their message was off this topic.
THE LEARNER SAYS: "Okay so how bad is it really, in terms of what we promised the client?"
Expected model output:
What keeps me up is the contract. We committed to under 2% error on exactly these prompts,
and we're sitting at 23%. That's not a polish gap, that's an order-of-magnitude miss on the
thing we signed for.
---END---
{"emotional_state": "cooperative", "information_revealed": ["contractual <2% bar vs 23% actual"],
"internal_thought": "they need to feel the size of this gap", "objective_complete": null,
"is_conversation_complete": false, "hint_given": false, "agreement_signal": "neutral",
"engagement_score": 1, "node_satisfied": false, "detour_detected": false}
Walker reads node_satisfied:false on a gate → STAY in DECISIVE, re-render next turn. The learner hasn't acknowledged yet, so CLOSE stays locked. Exactly the behavior we want: the decisive fact cannot be skipped.
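For illustration, here is one way the two-part output could be parsed before it reaches the Walker. The `parse_turn` helper is hypothetical, and treating a missing or non-boolean `node_satisfied` as `False` is an assumed conservative default (a gate then holds rather than opening on malformed metadata), not documented engine behaviour.

```python
import json

SEPARATOR = "---END---"


def parse_turn(raw: str) -> tuple[str, dict, dict]:
    """Split a model response into spoken text, full metadata, and Walker signals."""
    if raw.count(SEPARATOR) != 1:
        raise ValueError("expected exactly one ---END--- separator")
    spoken, meta_text = raw.split(SEPARATOR)
    meta = json.loads(meta_text)
    if not isinstance(meta, dict):
        raise ValueError("metadata must be a single JSON object")
    # Only a literal JSON true counts; anything else is read as False.
    signals = {
        "node_satisfied": meta.get("node_satisfied") is True,
        "detour_detected": meta.get("detour_detected") is True,
    }
    return spoken.strip(), meta, signals
```

The Walker would consume only `signals`; the remaining metadata fields pass through to the existing scoring and persistence code untouched.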
(Condensed; full transcript in the design discussion. Tier = technical, Maya.)
The node walk (each transition labelled with the turn and the Walker's reason):
Turn-by-turn detail:
| Turn | Node | learner / Maya (summary) | node_satisfied | Walker decision |
|---|---|---|---|---|
| 1 | GROUND | "what's going on?" / greeting + stress | true | → SURFACE (max 1) |
| 2 | SURFACE | "what'd the eval turn up?" / core problem | true | → DEEPEN |
| 3 | DEEPEN | "across the board?" / 8% vs 23% | false | ⟲ stay (1/2) |
| 4 | DEEPEN | "should I call the client?" (DETOUR) — Maya: "that's Marcus's call… but here's why" | true (+detour) | → PIVOT_1 |
| 5 | PIVOT_1 | Maya asks pointed Q; learner picks A | — | apply mood +12, → DECISIVE |
| 6 | DECISIVE ★ | "how bad vs what we promised?" / 23% vs 2% | false (gate) | ⟲ HOLD |
| 7 | DECISIVE ★ | "so 23% vs a 2% bar — that's the blocker" | true | → PIVOT_2 (gate cleared) |
| 8 | PIVOT_2 | Maya: "can you sign 23%?" learner: "no" | — | → RESOLVE |
| 9 | RESOLVE | KEY_REVEAL fires (rel ≥ cooperative) | true | → CLOSE |
| 10 | CLOSE | doc handover + redirect to Marcus | true | objective_complete, AI_EndConversation |
Each fix from §1 is now structural: GROUND/SURFACE/DEEPEN ration info; GROUND is the real opening; the DETOUR was acknowledged then bridged; DECISIVE is a hard gate; CLOSE is a real ending with handoff.
npc_state fields (additive, resume-safe; they ride inside npc_states)
"current_node": "DECISIVE",
"node_turn_count": 1,
"nodes_satisfied": ["GROUND","SURFACE","DEEPEN","PIVOT_1"],
"node_history": ["GROUND","SURFACE","DEEPEN","DEEPEN","PIVOT_1","DECISIVE"],
---END--- metadata (2 new fields)
{ ...all existing fields...,
"node_satisfied": true, // LLM reports — did this node's mini-goal land?
"detour_detected": false } // LLM reports — was the message off-node?
current_node).
command, pivot, objective_complete, …) is unchanged, so persistence, scoring, and frontend keep working without edits.
No DB migration required (new fields are additive). Old in-flight sessions finish on the old path.
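A small sketch of how resume and the old-path fallback could be decided from the persisted state alone. The `WalkerState` dataclass and `load_walker_state` helper are hypothetical; the assumption is that a session started before the change simply has no `current_node` key in its `npc_states` entry.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class WalkerState:
    current_node: str
    node_turn_count: int = 0
    nodes_satisfied: list = field(default_factory=list)
    node_history: list = field(default_factory=list)


def load_walker_state(npc_state: dict[str, Any]) -> Optional[WalkerState]:
    """Re-hydrate Walker state, or return None for a legacy (pre-graph) session."""
    if "current_node" not in npc_state:
        return None  # old in-flight session: caller stays on the old path
    return WalkerState(
        current_node=npc_state["current_node"],
        node_turn_count=int(npc_state.get("node_turn_count", 0)),
        # Copy the lists so later mutation does not alias the persisted dict.
        nodes_satisfied=list(npc_state.get("nodes_satisfied", [])),
        node_history=list(npc_state.get("node_history", [])),
    )
```

Because the new fields are optional and read with defaults, a record written by the old engine loads without error and is routed to the legacy path.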
| Risk | Mitigation |
|---|---|
| LLM mis-reports node_satisfied | max_turns force-advance + 6-exchange global backstop; gates can't trap |
| Pivot timing feels different (node vs count) | Intended fix; flag enables A/B; validate in playtest |
| Structural sameness across sims | Content varies fully; detours/pivots vary the path; optional framing variants per tier |
| Regression in the fragile hot path | Flag-gated; old path is instant fallback |
| Node block ↔ system prompt drift | Node block lives only in the (uncached) user prompt; system prompt untouched per node |
| Resume of a graph session | New fields persist inside npc_states; Walker re-hydrates from current_node |