Conversation Graph Engine — Design Doc (v1)

Scope: Replace the exchange-count-driven, advisory conversation steering in Career Sims with a deterministic Node Controller that walks each NPC (and Otto) conversation through a parameterized-generic graph. Companion reference: CONVERSATION_ENGINE_GUIDE.md (how the engine works today).


Table of contents

  1. Why · 2. The idea in one picture · 3. Core concepts · 4. The three layers · 5. Architecture · 6. The node graph · 7. The GraphWalker · 8. How the prompt changes (+ full example) · 9. End-to-end worked flow · 10. State & data contract · 11. Implementation plan · 12. Migration & rollout · 13. Test plan · 14. Risks · 15. Future

1. Why

Today an NPC turn is one LLM shot steered by advisory prompt hints keyed on raw exchange count (plot_beats at exchange 1/2/4/6, a fixed Phase 1–4 arc, pivots armed at exchange 2/4, objective completion via a metadata flag + a min-exchange gate). Four failures fall out of that:

Symptom | Root cause
--- | ---
NPC dumps info instead of conversing | No notion of "what this turn is for"; the model front-loads everything
No real opening | Nothing structurally owns "ground the learner first"
No real ending — left hanging | Completion is a flag + turn-count, not a resolution phase
Drift / ignores deviations | Goal is enforced by repetition in the prompt, with no detour handling

The recent commit history (phantom-NPC scrubbing, "stay in your lane," "surface the decisive fact on any question") is a stream of prompt patches fighting these symptoms one at a time. The graph engine fixes them structurally instead.


2. The idea in one picture

[Diagram]

The new piece is the GraphWalker (green).

Contract: the LLM generates dialogue and reports node_satisfied / detour_detected. The Walker decides the next node and emits the existing frontend commands. The model never controls the pointer.


3. Core concepts


4. The three layers (parameterized-generic)

The graph is generic per tier, not generated per scenario. Three layers, and only the bottom one is per-scenario:

[Diagram]

Blue = hand-authored once and reused everywhere. Green = already generated per scenario today; we just route it into nodes instead of one big prompt.

The walk is parameterized by Layer 3: DEEPEN loops over however many facts the NPC has; a PIVOT node only exists where the scenario defined one; KEY_REVEAL only opens if the NPC has a key_reveal. No LLM builds the graph → no new generation failure modes.
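
A minimal sketch of how Layer 3 content could parameterize the generic walk. The helper name `plan_walk`, the `PIVOT_KEYS` mapping, and the shape of the `npc` dict are assumptions made for illustration; only the node ids and the `what_they_know` / `pivots` / `key_reveal` field names come from this doc.

```python
from typing import Any

# Generic technical backbone, hand-authored once (order taken from the node spec).
BACKBONE = ["GROUND", "SURFACE", "DEEPEN", "PIVOT_1", "DECISIVE",
            "PIVOT_2", "RESOLVE", "CLOSE"]

# Assumed mapping from pivot node to the scenario's pivot key.
PIVOT_KEYS = {"PIVOT_1": "p1", "PIVOT_2": "p2"}


def plan_walk(npc: dict[str, Any]) -> dict[str, Any]:
    """Select which generic nodes exist for this NPC; no LLM is involved."""
    pivots = npc.get("pivots") or {}
    nodes = [
        node_id for node_id in BACKBONE
        # A pivot node only exists where the scenario defined that pivot.
        if node_id not in PIVOT_KEYS or PIVOT_KEYS[node_id] in pivots
    ]
    return {
        "nodes": nodes,
        # DEEPEN has this many facts available to drip.
        "deepen_facts": len(npc.get("what_they_know") or []),
        # KEY_REVEAL can only open if the NPC has a key_reveal at all.
        "key_reveal_possible": bool(npc.get("key_reveal")),
    }
```

Because the plan is a pure function of scenario content, the same scenario always yields the same walk.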


5. Architecture

5.1 Component map

[Diagram]

5.2 Turn flow — before vs after

TODAY — structure is driven by raw exchange count:

[Diagram]

WITH GRAPH ENGINE — the Walker owns structure; ★ = new, ◀MOD = modified:

[Diagram]

6. The node graph

6.1 Generic node anatomy

# A node is generic. Scenario content arrives via `content_source`.
Node = {
  "id":            "DEEPEN",
  "intent":        "Teach the mechanism behind the problem; react to the "
                   "learner's reasoning; drip one fresh specific.",   # generic prose
  "content_source": "npc.what_they_know",        # Layer-2 binding key
  "satisfy_when":  "one concrete fact has landed and the learner responded to it",
  "min_turns":     1,
  "max_turns":     2,            # force-advance after this many turns in-node
  "is_gate":       False,        # True → cannot advance until node_satisfied
  "edges": {
     "advance":    "PIVOT_1",    # normal forward edge
     "self_loop":  True,         # may stay (bounded by max_turns)
     "conditional": None,        # e.g. KEY_REVEAL opens on relationship≥cooperative
  },
}
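
A typed equivalent of the node dict above, plus a small sanity check, might look like the following. This is a sketch only: the `Edges` / `Node` dataclasses and `validate_graph` are hypothetical names, and treating `advance=None` as the terminal marker is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Edges:
    advance: Optional[str]             # assumed: None marks a terminal node
    self_loop: bool = False
    conditional: Optional[str] = None


@dataclass(frozen=True)
class Node:
    id: str
    intent: str
    content_source: str
    satisfy_when: str
    min_turns: int
    max_turns: int
    is_gate: bool
    edges: Edges


def validate_graph(nodes: dict[str, Node]) -> list[str]:
    """Return human-readable problems; an empty list means the graph is sane."""
    problems = []
    for node in nodes.values():
        if node.min_turns > node.max_turns:
            problems.append(f"{node.id}: min_turns exceeds max_turns")
        for target in (node.edges.advance, node.edges.conditional):
            if target is not None and target not in nodes:
                problems.append(f"{node.id}: edge to unknown node {target}")
    return problems
```

Running a check like this at load time would catch a mistyped edge before any conversation starts.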

6.2 TECHNICAL backbone (the default — full v1)

[Diagram]

Global rules (wrap every node): the detour handler (acknowledge once → answer → bridge back) and the existing 6-exchange hard backstop that force-closes if a gate ever truly stalls.

6.3 ACADEMIC backbone (collapses to ~3 nodes)

[Diagram]

No pivots, no DECISIVE-as-tension, no KEY_REVEAL. Q&A only.

6.4 EXPERT backbone (technical + depth)

[Diagram]

6.5 Node spec — TECHNICAL tier (reference table)

Node | Intent | content_source | min/max | Gate | Advance edge
--- | --- | --- | --- | --- | ---
GROUND | who I am + why you're here + the decision at stake; no facts | beat1 + greeting | 1/1 | | SURFACE
SURFACE | name the core problem + one specific fact | beat1 | 1/2 | | DEEPEN
DEEPEN | teach mechanism, react to reasoning, one fresh fact/turn | what_they_know | 1/2 (⟲) | | PIVOT_1
PIVOT_1 | relationship fork; ends on pointed question | pivots[p1] | 1/1 | | DECISIVE
DECISIVE | surface the binding constraint; require ACK | decision_facts | 1/2 | yes | PIVOT_2
PIVOT_2 | technical-judgment fork | pivots[p2] | 1/1 | | RESOLVE
RESOLVE | what they'd accept; (KEY_REVEAL may fire) | beat6 + key_reveal | 1/2 | | CLOSE
CLOSE | resolve + objective_complete + doc handover + redirect | goal.end_condition | 1/1 | | (terminal)
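
As a quick arithmetic check on the spec above, the min/max columns bound the length of a straight walk. The sketch below encodes only the min/max and advance-edge columns; `SPEC` and `walk_bounds` are illustrative names, and the bound deliberately ignores gate holds and the optional KEY_REVEAL edge.

```python
# (min_turns, max_turns, advance_edge) per node, copied from the spec table.
SPEC = {
    "GROUND":   (1, 1, "SURFACE"),
    "SURFACE":  (1, 2, "DEEPEN"),
    "DEEPEN":   (1, 2, "PIVOT_1"),
    "PIVOT_1":  (1, 1, "DECISIVE"),
    "DECISIVE": (1, 2, "PIVOT_2"),
    "PIVOT_2":  (1, 1, "RESOLVE"),
    "RESOLVE":  (1, 2, "CLOSE"),
    "CLOSE":    (1, 1, None),      # terminal
}


def walk_bounds(spec, start="GROUND"):
    """Follow advance edges from start; return (path, min_total, max_total)."""
    path, lo, hi = [], 0, 0
    node = start
    while node is not None:
        if node in path:
            raise ValueError(f"cycle detected at {node}")
        min_turns, max_turns, nxt = spec[node]
        path.append(node)
        lo += min_turns
        hi += max_turns
        node = nxt
    return path, lo, hi


path, lo, hi = walk_bounds(SPEC)
print(len(path), lo, hi)  # 8 nodes, between 8 and 12 turns
```

A gated node can hold beyond its max_turns, so real conversations may run longer than the upper figure.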

7. The GraphWalker (the Controller)

The Walker is pure, deterministic, unit-testable — no LLM inside it.

7.1 Responsibilities

  1. Hold the pointer (current_node, node_turn_count).
  2. render_current_node() → the node block injected into the prompt.
  3. advance(metadata) → decide the next node from node_satisfied + turn limits + gate status + relationship + detour.
  4. Emit the frontend command (AI_AdvanceObjective, AI_PivotMoment, AI_EndConversation) — identical values to today.

7.2 Advance algorithm (pseudocode)

def advance(node, meta, npc_state, relationship_state):
    npc_state.node_turn_count += 1

    # 1) Gate: cannot leave until satisfied (max_turns can't override a gate;
    #    only the global 6-exchange backstop can, handled in talk_to_npc).
    if node.is_gate and not meta.node_satisfied:
        return Stay(node)                      # hold, re-render gate next turn

    # 2) Branch nodes (pivots) are armed on ARRIVAL, resolved next turn.
    if node.is_branch and not npc_state.pivot_resolved(node):
        return ArmPivot(node)                  # command = AI_PivotMoment

    # 3) Normal advance: satisfied AND min dwell met
    if meta.node_satisfied and npc_state.node_turn_count >= node.min_turns:
        return Goto(next_after(node, npc_state, relationship_state))   # may take conditional edge

    # 4) Hard limit: force-advance even if "unsatisfied"
    if npc_state.node_turn_count >= node.max_turns:
        return Goto(next_after(node, npc_state, relationship_state))

    # 5) Otherwise loop in place (only if the node allows it; see edges in 6.1)
    if node.edges.self_loop:
        return Stay(node)
    return Goto(next_after(node, npc_state, relationship_state))


def next_after(node, npc_state, rel):
    # conditional edges (e.g. KEY_REVEAL) open based on relationship state;
    # key_reveal_done is assumed to be tracked on npc_state
    if node.edges.conditional and rel >= COOPERATIVE and not npc_state.key_reveal_done:
        return node.edges.conditional
    return node.edges.advance
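
The advance rules above can be exercised as a runnable sketch. Everything here is a simplification under stated assumptions: edges are flattened onto the node, decisions are plain `(kind, target)` tuples, the relationship state is an integer with `COOPERATIVE = 2`, and resetting `node_turn_count` after a `goto` is left to the caller.

```python
from dataclasses import dataclass, field
from typing import Optional

COOPERATIVE = 2  # assumed ordinal for the "cooperative" relationship state


@dataclass
class Node:
    id: str
    min_turns: int = 1
    max_turns: int = 1
    is_gate: bool = False
    is_branch: bool = False
    advance: Optional[str] = None
    self_loop: bool = False
    conditional: Optional[str] = None


@dataclass
class NpcState:
    node_turn_count: int = 0
    resolved_pivots: set = field(default_factory=set)
    key_reveal_done: bool = False


def next_after(node, state, rel):
    # The conditional edge opens only on a good relationship, and only once.
    if node.conditional and rel >= COOPERATIVE and not state.key_reveal_done:
        return node.conditional
    return node.advance


def advance(node, node_satisfied, state, rel):
    """Return ("stay" | "arm_pivot" | "goto", target_node_id)."""
    state.node_turn_count += 1
    if node.is_gate and not node_satisfied:
        return ("stay", node.id)            # a gate beats max_turns
    if node.is_branch and node.id not in state.resolved_pivots:
        return ("arm_pivot", node.id)
    if node_satisfied and state.node_turn_count >= node.min_turns:
        return ("goto", next_after(node, state, rel))
    if state.node_turn_count >= node.max_turns:
        return ("goto", next_after(node, state, rel))   # forced advance
    if node.self_loop:
        return ("stay", node.id)
    return ("goto", next_after(node, state, rel))


decisive = Node("DECISIVE", 1, 2, is_gate=True, advance="PIVOT_2")
state = NpcState()
print(advance(decisive, False, state, COOPERATIVE))  # ('stay', 'DECISIVE')
print(advance(decisive, True, state, COOPERATIVE))   # ('goto', 'PIVOT_2')
```

The two calls mirror a gate that holds on an unacknowledged turn and clears on the next; because the function touches no LLM, each rule can be unit-tested in isolation.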

7.3 Detour handling (global, every node)

render_current_node() always appends the detour rule; detour_detected from the metadata tells us a detour happened (for logging/telemetry) but does not change the pointer — the node block already instructed "acknowledge once → answer → bridge back," so the NPC handles it in-character and we stay on-node. A detour turn can still set node_satisfied:true (as in the worked example) and advance normally.
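
A small sketch of the telemetry side of this rule, assuming the parsed metadata is a dict and that detours are appended to an in-memory list; `note_detour` and the log entry shape are hypothetical. Note that the function only records the event and never returns or changes a node.

```python
def note_detour(meta: dict, current_node: str, log: list) -> None:
    """Record a detour for telemetry; the pointer is decided elsewhere."""
    if meta.get("detour_detected") is True:
        log.append({
            "node": current_node,
            # A detour turn may still satisfy the node and advance normally.
            "also_satisfied": meta.get("node_satisfied") is True,
        })


log = []
note_detour({"node_satisfied": True, "detour_detected": True}, "DEEPEN", log)
note_detour({"node_satisfied": False, "detour_detected": False}, "DECISIVE", log)
print(log)  # [{'node': 'DEEPEN', 'also_satisfied': True}]
```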

7.4 Why nobody gets stuck


8. How the prompt changes

This is the heart of the change for the LLM. The system prompt stays almost entirely as-is (it's cached per NPC per session). Two edits:

  1. The fixed "CONVERSATION ARC — Phase 1–4" block (and the teaching-arc variant) is removed and replaced by a short, static line: "This is a guided conversation. Follow the CURRENT NODE directive in each message — it tells you what this turn is for."
  2. The output contract gains the two reported fields.

The per-turn steering moves into the USER prompt as a single CURRENT NODE block, which replaces today's count-based plot_beat_instruction + _phase_reminder.

8.1 Prompt assembly: before → after

Only the items in red change; everything else is untouched.

[Diagram]

8.2 The CURRENT NODE block (the new injected piece)

render_current_node() produces something like:

━━━ CURRENT NODE: DECISIVE ━━━
WHAT THIS TURN IS FOR: Surface the single binding constraint that decides this call,
and make sure the learner actually registers it. This is the most decision-relevant
thing you know — do not bury it under softer points.
THE CONSTRAINT YOU MUST SURFACE (in your own words, grounded in your numbers):
  • "Medical-advice prompts hallucinate at 23%, against the client's contractual <2% bar."
ADVANCE / STAY: You will move on once the learner clearly acknowledges this constraint.
If they haven't yet, restate it more plainly or tie it to a consequence — do NOT drop it.
IF THE LEARNER GOES OFF-TOPIC: acknowledge their point in one line, answer it briefly,
then bring it back to the constraint above. Never ignore them; never chase the tangent.
REPORT IN METADATA: set "node_satisfied": true ONLY if the learner acknowledged the
constraint this turn; set "detour_detected": true if their message was off this topic.

The Walker fills in the scenario specifics (here, the quoted constraint) from the bound scenario content; the shape of the block is generic.
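
One way render_current_node() could assemble such a block is sketched below. The signature, the two rule constants, and their exact wording are assumptions for illustration; the real function would pull the intent and content from the node and its bound scenario content.

```python
# Appended to every node block (wording here is illustrative).
DETOUR_RULE = (
    "IF THE LEARNER GOES OFF-TOPIC: acknowledge their point in one line, "
    "answer it briefly, then bring it back to the point above."
)
REPORT_RULE = (
    'REPORT IN METADATA: set "node_satisfied": true ONLY if this node\'s goal '
    'landed this turn; set "detour_detected": true if their message was off-topic.'
)


def render_current_node(node_id, intent, content_label, content_lines, advance_rule):
    """Build the CURRENT NODE block from generic prose plus scenario content."""
    lines = [
        f"━━━ CURRENT NODE: {node_id} ━━━",
        f"WHAT THIS TURN IS FOR: {intent}",
        f"{content_label}:",
    ]
    lines.extend(f'  • "{line}"' for line in content_lines)
    lines.append(f"ADVANCE / STAY: {advance_rule}")
    lines.append(DETOUR_RULE)   # global rule, present on every node
    lines.append(REPORT_RULE)
    return "\n".join(lines)


block = render_current_node(
    "DECISIVE",
    "Surface the single binding constraint that decides this call.",
    "THE CONSTRAINT YOU MUST SURFACE",
    ["Medical-advice prompts hallucinate at 23%, against the client's contractual <2% bar."],
    "You will move on once the learner clearly acknowledges this constraint.",
)
print(block)
```

Keeping the generic prose in constants and passing scenario content as arguments keeps the block's shape identical across scenarios.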

8.3 A complete example prompt (Maya @ DECISIVE node)

This is the actual full prompt that would be sent for the Maya example (§9), with the graph change applied. System prompt is cached; user prompt is per-turn.

SYSTEM PROMPT (static, cached for the whole session)

You are Maya Patel in a career simulation.

YOUR ROLE: QA Lead

YOUR KNOWLEDGE:
[
  "Eval Run #47 shows 8% hallucination overall but 23% on medical-advice prompts",
  "The eval set undersamples medical-advice prompts, so the headline 8% understates risk",
  "I flagged this pattern two weeks ago and was told to wait for the launch review",
  "The client contract commits us to under 2% error on exactly these prompt types",
  "My personal stake: I'm the one who signed off on the eval methodology"
]

YOUR CONCERNS:
[
  "If we ship at 23% on medical prompts and it surfaces, it lands on me",
  "Marcus will frame any delay as me being an obstacle to the launch"
]

YOUR GOALS:
[
  "Get the learner to understand the risk is concentrated in high-stakes queries, not uniform",
  "Advocate for running the full eval before launch (Option A)"
]

YOUR OPINION ON THE DECISION:
You believe the right call is: "Delay one week, run the full eval"
Your core argument: The 23% on medical prompts is the real signal; shipping blind on
those queries is the actual risk, and one week buys a defensible number.
- You genuinely believe this is correct based on your data. Advocate for it — don't be passive.
- If the learner explicitly says they prefer a different approach, push back — professionally
  but firmly. One clear objection is enough; don't repeat it.
- If the learner is agreeing, asking questions, or staying neutral — do NOT treat that as
  pushback. Engage normally.
- If they ask which option you'd recommend, share your view and the key reason — but frame it
  around the deciding constraint and let them make the call. Don't just hand over a verdict.

KNOWLEDGE BOUNDARIES — STRICT:
You may interpret your information and add context, caveats, and judgment — that is your
expertise. But NEVER state a specific number, metric, or date that is not in your document or
knowledge above. If the learner asks for a figure you don't have, say you don't have it and
point them to who would.

FACT DISCIPLINE — NON-NEGOTIABLE:
- Only assert a requirement or threshold your document/knowledge above actually states.
- Ground every factual claim in your own document or knowledge; do NOT attribute data to
  another named colleague.
- STAY IN YOUR LANE ON NUMBERS: never state a figure that belongs to another person's area.
- SPEAK ONLY FOR YOURSELF: never narrate another person's private thoughts or motives.

YOUR DOCUMENT — SOURCE OF RECORD (authoritative):
"Eval Run #47 — Edge-Case Breakdown" contains exactly these lines:
  • "Overall hallucination rate: 8.1% (n=2,400)"
  • "Medical-advice subset: 23.4% (n=180)"
  • "Eval set composition: medical-advice prompts = 7.5% of total"
  • "Contractual error ceiling (client SLA): 2.0% on regulated-advice prompts"
[FACTUAL GROUNDING — the document wins over instinct; if you don't have a figure, say so.]

DECISION-DETERMINING FACTS — YOU MUST SURFACE THESE:
These are the binding facts that decide which option is right:
  • "Medical-advice subset hallucinates at 23.4% vs the 2.0% contractual ceiling"
- Surface just ONE of these per turn, grounded in the exact numbers, in your own words.
- Do NOT let the conversation end without stating the binding constraint you hold.
- Volunteer it on ANY relevant question — lead with the specific fact and its number.

SITUATION CONTEXT — you are living this right now:
- Where you are: ML product team, two days before a client launch
- What is actually happening: a customer-facing feature hallucinates on edge cases; ship in 48h
- Who you are talking to: a Machine Learning Engineer brought in to make the call

GUIDED CONVERSATION: This is a guided conversation. Each message includes a CURRENT NODE
directive telling you what THIS turn is for — follow it. It tells you when to go deeper, when
the point has landed, and when to wind down. Do not race ahead of it or repeat past nodes.

CONVERSATIONAL STYLE — MANDATORY:
- Never use em dashes. No bullet points or numbered lists in dialogue.
- No "Great question!", no "I understand your concern." Talk like a real coworker; use contractions.
- React with emotion first, logic second. Never refer to yourself by name; use "I".
- NO REPETITION: never restate a point you already made.

OBJECTIVE COMPLETION RULES:
- When the learner satisfies your objective's end condition, set "objective_complete" to that
  objective_id in the SAME response. If all objectives are complete, set "is_conversation_complete".
- For career sims, require meaningful engagement before completing.

EMOTIONAL REACTIVITY — REQUIRED:
- Your emotional state is set by the server (RELATIONSHIP SCORE directive below). Return it
  exactly in metadata; do not change it from a single message.
- Set engagement_score (-2..+2). This is a TECHNICAL sim: reward sound reasoning, not politeness.
  +2 technically sound point / caught a real flaw … -2 confidently wrong or unsafe call.

Output your response in TWO parts separated by ---END---

PART 1 — your spoken response as plain text (2-3 sentences, one clear point, no padding):
Write only what you say out loud.

---END---

PART 2 — metadata as a single JSON object on one line:
{"emotional_state": "<from RELATIONSHIP SCORE directive>", "information_revealed": [...],
 "internal_thought": "...", "objective_complete": null, "is_conversation_complete": false,
 "hint_given": false, "agreement_signal": "neutral", "engagement_score": 0,
 "node_satisfied": false, "detour_detected": false}

CRITICAL: exactly one ---END--- separator; never put JSON before it; never omit the metadata.

USER PROMPT (dynamic, this turn)

YOUR EMOTIONAL STATE: cooperative

CONVERSATION HISTORY:
Learner: What did the eval turn up?
Maya: The model's hallucinating on edge cases — making up answers instead of saying it
doesn't know. Run #47 has the breakdown, and the pattern in it is what's worrying me.
Learner: Is it across the board or specific areas?
Maya: That's the thing — it's only 8% overall, but on medical-advice prompts it jumps to
23%. The headline number makes it look fine, which is exactly the trap.
Learner: Should I just call the client and tell them we're slipping the date?
Maya: Honestly that's Marcus's call, not mine, so hold that thought for him. What I can tell
you is why you'd even be making it: that 23% is concentrated in exactly the queries that can hurt someone.
[PIVOT_1 resolved: learner chose to hear Maya's real read → mood +12]

ALREADY SAID — do NOT repeat or rephrase any of these points:
- hallucinations on edge cases
- 8% overall, 23% on medical prompts
- risk concentrated in high-stakes queries

RELATIONSHIP SCORE: +11 | CURRENT STATE: cooperative | Return this exact state in metadata.

PLAYER MOOD: CURIOUS
The learner is asking questions before drawing conclusions, inquiry-first.
Your current state is cooperative. Let their tone shape your engagement_score — but your
emotional_state is fixed by the RELATIONSHIP SCORE directive above.

━━━ CURRENT NODE: DECISIVE ━━━
WHAT THIS TURN IS FOR: Surface the single binding constraint that decides this call, and make
sure the learner actually registers it. This is the most decision-relevant thing you know.
THE CONSTRAINT YOU MUST SURFACE (your own words, grounded in your numbers):
  • "Medical-advice prompts hallucinate at 23.4%, against the client's contractual 2.0% bar."
ADVANCE / STAY: You move on only once the learner clearly acknowledges this constraint. If they
haven't, restate it more plainly or tie it to a consequence — do NOT drop it or move past it.
IF THE LEARNER GOES OFF-TOPIC: acknowledge in one line, answer briefly, then bring it back to
the constraint. Never ignore them; never chase the tangent.
REPORT: set "node_satisfied": true ONLY if the learner acknowledged the constraint THIS turn;
set "detour_detected": true if their message was off this topic.

THE LEARNER SAYS: "Okay so how bad is it really, in terms of what we promised the client?"

Expected model output:

What keeps me up is the contract. We committed to under 2% error on exactly these prompts,
and we're sitting at 23%. That's not a polish gap, that's an order-of-magnitude miss on the
thing we signed for.
---END---
{"emotional_state": "cooperative", "information_revealed": ["contractual <2% bar vs 23% actual"],
 "internal_thought": "they need to feel the size of this gap", "objective_complete": null,
 "is_conversation_complete": false, "hint_given": false, "agreement_signal": "neutral",
 "engagement_score": 1, "node_satisfied": false, "detour_detected": false}

Walker reads node_satisfied:false on a gate → STAY in DECISIVE, re-render next turn. The learner hasn't acknowledged yet, so CLOSE stays locked. Exactly the behavior we want: the decisive fact cannot be skipped.


9. End-to-end worked flow

(Condensed; full transcript in the design discussion. Tier = technical, Maya.)

The node walk (each transition labelled with the turn and the Walker's reason):

[Diagram]

Turn-by-turn detail:

Turn | Node | Learner / Maya (summary) | node_satisfied | Walker decision
--- | --- | --- | --- | ---
1 | GROUND | "what's going on?" / greeting + stress | true | → SURFACE (max 1)
2 | SURFACE | "what'd the eval turn up?" / core problem | true | → DEEPEN
3 | DEEPEN | "across the board?" / 8% vs 23% | false | ⟲ stay (1/2)
4 | DEEPEN | "should I call the client?" (DETOUR) — Maya: "that's Marcus's call… but here's why" | true (+detour) | → PIVOT_1
5 | PIVOT_1 | Maya asks pointed Q; learner picks A | | apply mood +12, → DECISIVE
6 | DECISIVE ★ | "how bad vs what we promised?" / 23% vs 2% | false (gate) | HOLD
7 | DECISIVE ★ | "so 23% vs a 2% bar — that's the blocker" | true | → PIVOT_2 (gate cleared)
8 | PIVOT_2 | Maya: "can you sign 23%?" learner: "no" | | → RESOLVE
9 | RESOLVE | KEY_REVEAL fires (rel ≥ cooperative) | true | → CLOSE
10 | CLOSE | doc handover + redirect to Marcus | true | objective_complete, AI_EndConversation

Each fix from §1 is now structural: GROUND/SURFACE/DEEPEN ration info; GROUND is the real opening; the DETOUR was acknowledged then bridged; DECISIVE is a hard gate; CLOSE is a real ending with handoff.


10. State & data contract

10.1 New npc_state fields (additive, resume-safe — ride inside npc_states)

"current_node":    "DECISIVE",
"node_turn_count": 1,
"nodes_satisfied": ["GROUND","SURFACE","DEEPEN","PIVOT_1"],
"node_history":    ["GROUND","SURFACE","DEEPEN","DEEPEN","PIVOT_1","DECISIVE"],

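A minimal sketch of how these fields could be maintained turn by turn. `new_state` and `record_turn` are hypothetical helpers, and two behaviors are assumptions: a node is added to nodes_satisfied whenever the Walker leaves it, and node_turn_count resets to 0 on arrival at a new node. Replaying the first six turns of the worked flow reproduces the snapshot above.

```python
def new_state(start="GROUND"):
    return {"current_node": start, "node_turn_count": 0,
            "nodes_satisfied": [], "node_history": []}


def record_turn(state, next_node):
    """Apply one turn spent in the current node.

    next_node is the Walker's target, or None when it decided to stay.
    """
    here = state["current_node"]
    state["node_history"].append(here)
    state["node_turn_count"] += 1
    if next_node is not None and next_node != here:
        if here not in state["nodes_satisfied"]:
            state["nodes_satisfied"].append(here)
        state["current_node"] = next_node
        state["node_turn_count"] = 0       # fresh dwell counter for the new node
    return state


state = new_state()
# Walker decisions for turns 1-6 of the worked flow (None = stay).
for decision in ["SURFACE", "DEEPEN", None, "PIVOT_1", "DECISIVE", None]:
    record_turn(state, decision)
print(state["current_node"], state["node_turn_count"])  # DECISIVE 1
```

Since the dict contains only strings, ints, and lists, it serializes as-is alongside the rest of npc_states.
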
10.2 Extended ---END--- metadata (2 new fields)

{ ...all existing fields...,
  "node_satisfied":  true,    // LLM reports — did this node's mini-goal land?
  "detour_detected": false }  // LLM reports — was the message off-node?

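Reading the two new fields out of a model response could look like the sketch below. `parse_npc_output` is a hypothetical name, and defaulting both fields to False when the metadata is missing, malformed, or predates the new fields is an assumption (it is the conservative choice: a gate stays shut unless the model explicitly reports true).

```python
import json

SEPARATOR = "---END---"


def parse_npc_output(raw: str):
    """Split spoken text from metadata and normalize the two reported fields."""
    spoken, sep, meta_text = raw.partition(SEPARATOR)
    meta = {}
    if sep:
        try:
            parsed = json.loads(meta_text)
            if isinstance(parsed, dict):
                meta = parsed
        except json.JSONDecodeError:
            meta = {}                      # malformed metadata: fall back to defaults
    # Only a literal JSON true counts; anything else is treated as False.
    meta["node_satisfied"] = meta.get("node_satisfied") is True
    meta["detour_detected"] = meta.get("detour_detected") is True
    return spoken.strip(), meta


raw = (
    "What keeps me up is the contract.\n---END---\n"
    '{"emotional_state": "cooperative", "node_satisfied": false, "detour_detected": false}'
)
spoken, meta = parse_npc_output(raw)
print(spoken, meta["node_satisfied"])  # What keeps me up is the contract. False
```
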
10.3 Backward compatibility


11. Migration & rollout

[Diagram]

No DB migration required (new fields are additive). Old in-flight sessions finish on the old path.
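
The routing rule implied here can be sketched as a single predicate. The function name and its three inputs are assumptions; the idea is that the presence of current_node in npc_state marks a session that already started on the graph path.

```python
def use_graph_path(flag_enabled: bool, npc_state: dict, is_new_session: bool) -> bool:
    """Decide which engine handles this turn (hypothetical routing helper)."""
    if not flag_enabled:
        return False                      # old path is the instant fallback
    if is_new_session:
        return True                       # new sessions start on the graph
    # In-flight session: stay on whichever path it started on.
    return "current_node" in npc_state


print(use_graph_path(True, {}, is_new_session=False))                            # False
print(use_graph_path(True, {"current_node": "DECISIVE"}, is_new_session=False))  # True
```

Under this rule a session that began on the old path never switches engines mid-conversation, while a resumed graph session picks up from its stored node.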


12. Risks & mitigations

Risk | Mitigation
--- | ---
LLM mis-reports node_satisfied | max_turns force-advance + 6-exchange global backstop; gates can't trap
Pivot timing feels different (node vs count) | Intended fix; flag enables A/B; validate in playtest
Structural sameness across sims | Content varies fully; detours/pivots vary the path; optional framing variants per tier
Regression in the fragile hot path | Flag-gated; old path is instant fallback
Node block ↔ system prompt drift | Node block lives only in the (uncached) user prompt; system prompt untouched per node
Resume of a graph session | New fields persist inside npc_states; Walker re-hydrates from current_node