Onboarding Flow — Agentic Architecture Deep Dive

A visual tour of the Dolly onboarding agent (codename Atlas), told as the story of one creator, Vlad, building his AI twin by voice. Every claim is grounded in the code across five codebases: legends/packages/onboarding-service (the agent), legends/packages/dolly (the Atlas UI), legends-platform (the LiveKit SDK), and catalyst-server/legends + wix-vmr-repo/legends-platform (the Scala backend: voice-service, vendor-gateway-service, persona-chat-service).

Legend for the diagrams: 🟣 = agentic (LLM-driven / non-deterministic) · 🟦 = deterministic scaffolding · 🟡 = an illustrative value I filled in (field names + structure are from code) · 🔶 = a link whose mechanism is proven but whose exact runtime wiring lives in deploy config, not in any repo (see §12).

Meet Vlad — and how to read this doc

Vlad is a startup coach. He opens Dolly, and a friendly avatar named Atlas ("Mike") voice-interviews him to build his AI twin. We follow Vlad's words through the system, turn by turn.

There are two layers to keep separate, because conflating them causes endless confusion:

Transport — how Vlad's voice gets in and the reply gets out. This is LiveKit: Vlad's browser joins a LiveKit room; a backend Cloud voice agent does speech-to-text / turn detection / text-to-speech; in video mode a Tavus avatar renders the talking head.
Brain — the agent logic that decides what to say and which tools to call. This is the onboarding-service LangGraph (a deterministic state machine wrapping a stochastic LLM).

The brain's logic is transport-agnostic: it only ever sees an OpenAI-compatible text request and emits text. Whether Tavus or a LiveKit Cloud agent delivers that request doesn't change the agent — which is exactly why this doc spends most of its time on the brain.

0. The one-paragraph mental model

This is a structured agent: a deterministic state machine (a LangGraph StateGraph) wraps a stochastic LLM. The state machine decides when and in what role the agent runs; the LLM decides what to say and which tools to call; hard-coded guardrails decide whether a given tool call is actually allowed. The agent is stateless — its memory lives outside it, reloaded every turn. This "deterministic harness + stochastic core" split is the single most important agentic pattern in the whole design.

[Diagram]

Vlad's lens: when Vlad says "Hi, I'm Vlad," the state machine says "we're in the name/intent step," the LLM decides to call dollyNameUpdater(name:"Vlad"), and a gate confirms that's allowed right now. Three different parts, three different jobs.

1. Where the agent sits (the perception/action boundary)

Vlad never talks to the brain directly. He talks into a LiveKit room; a Cloud voice agent turns his speech into text and the brain's tokens back into speech; in video mode a Tavus avatar renders the face. The brain (onboarding-service) only ever sees an OpenAI-compatible text request and emits text tokens. Clean perception → cognition → action.

[Diagram]

Agentic concept — embodied agent boundary: the model never touches the world directly. Inputs are normalized into a text "observation"; outputs are text "actions" that scaffolding turns into side effects. This codebase has already made that swap, from Tavus-direct to a LiveKit Cloud agent, and the agent itself did not change.
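A minimal sketch of that boundary, assuming a simplified OpenAI-style request: only `messages` and `metadata.{conversationId,userId}` reflect shapes this doc describes (§11a), while `Brain`, `toObservation`, and `echoBrain` are hypothetical. Whatever transport sends the request, the brain sees only text plus ids.

```typescript
// Simplified OpenAI-compatible chat request (a subset; real requests carry more fields).
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface CompletionRequest {
  messages: ChatMessage[];
  // Ids minted Wix-side and echoed back on every turn (see §11a).
  metadata: { conversationId: string; userId: string };
}

// Hypothetical brain contract: text request in, text tokens out.
type Brain = (req: CompletionRequest) => Iterable<string>;

// Hypothetical normalizer: the "observation" is the latest user utterance plus its conversation id.
function toObservation(req: CompletionRequest): { conversationId: string; text: string } {
  const lastUser = [...req.messages].reverse().find((m) => m.role === "user");
  return { conversationId: req.metadata.conversationId, text: lastUser?.content ?? "" };
}

// A toy brain: nothing in it knows whether Tavus or a LiveKit Cloud agent sent the request.
const echoBrain: Brain = function* (req) {
  const obs = toObservation(req);
  yield "You said: ";
  yield obs.text;
};
```

Swapping the transport means writing a new adapter that produces a `CompletionRequest`; the `Brain` signature stays put.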

2. The mode graph = a router over specialized sub-agents

Onboarding isn't one prompt — it's one specialized agent per step, selected by a router. Each mode has its own system prompt, its own tool set, and its own job. A routerNode reads state and dispatches to the right mode agent (the "v11" split).

[Diagram]

Mode (sub-agent)	Has tools?	Vlad's experience
`GetUserNameAndIntent`	✅	tells Atlas his name + what the twin is for
`ImageUpload`	✅	uploads a photo; the twin's first message is generated
`ImageSelectionLoading`	❌	waits while styled images render
`ImageSelection`	✅	picks a style
`VoiceRecording`	❌	records a voice sample
`Preview`	❌	hears Atlas narrate the finished twin
`MikeChat`	(getUiReference only)	experiment-endpoint terminal chat

Agentic concept — router / supervisor + specialist agents: focused agents with small tool sets. Smaller surface = fewer wrong tool calls, cheaper prompts, easier evals. The router is deterministic code, not an LLM — control flow you can trust.
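A minimal sketch of the router-plus-specialists shape, assuming each mode is just a prompt key plus a tool list. The mode names and the tools `dollyNameUpdater`, `dollyIntentUpdater`, and `getUiReference` come from the code; the `promptKey` values, the `ModeAgent` type, and the two placeholder tool names are hypothetical, and this `routerNode` body is an illustration rather than the real node.

```typescript
type Mode =
  | "GetUserNameAndIntent"
  | "ImageUpload"
  | "ImageSelectionLoading"
  | "ImageSelection"
  | "VoiceRecording"
  | "Preview"
  | "MikeChat";

interface ModeAgent {
  promptKey: string; // hypothetical: which system prompt to load for this mode
  tools: string[]; // the only tools the LLM is shown while in this mode
}

// Deterministic registry: one mode = one prompt + one small tool set.
const MODE_AGENTS: Record<Mode, ModeAgent> = {
  GetUserNameAndIntent: { promptKey: "name-intent", tools: ["dollyNameUpdater", "dollyIntentUpdater"] },
  ImageUpload: { promptKey: "image-upload", tools: ["imageUploadTool"] }, // placeholder tool name
  ImageSelectionLoading: { promptKey: "image-loading", tools: [] },
  ImageSelection: { promptKey: "image-selection", tools: ["imageSelectionTool"] }, // placeholder tool name
  VoiceRecording: { promptKey: "voice-recording", tools: [] },
  Preview: { promptKey: "preview", tools: [] },
  MikeChat: { promptKey: "mike-chat", tools: ["getUiReference"] },
};

// The router is plain code: the same state always yields the same mode agent.
function routerNode(state: { currentMode: Mode }): ModeAgent {
  return MODE_AGENTS[state.currentMode];
}
```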

3. The single-turn agent loop (ReAct, in code)

Every request from the Cloud agent runs one pass of a perceive → reason → act → observe → respond loop. Steps 1–2 and 5–7 are deterministic; the agent (🟣) is only steps 3–4.

[Diagram]

Vlad's lens: "Hi, I'm Vlad" is plain speech → it skips the signal branches, routes to GetUserNameAndIntent, reasons, calls a gated tool, and streams a reply. A button-click like "photo uploaded" instead enters as a hidden signal (§7).

Agentic concept — ReAct loop with a thin model slice: "agentic" ≠ "the model does everything." The model reasons and chooses actions inside a controlled loop.
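One pass of the loop can be sketched synchronously, assuming an in-memory `Map` in place of CloudStore and a stubbed model. `runTurn`, `TurnState`, and `fakeLlm` are hypothetical, and gating is omitted (see §5); the real loop runs inside the LangGraph graph and streams its reply. Only the `llm` call is non-deterministic; everything around it is ordinary code.

```typescript
interface TurnState {
  currentMode: string;
  userName?: string;
}

interface ToolCall {
  name: string;
  args: Record<string, string>;
}

interface LlmOutput {
  text: string;
  toolCalls: ToolCall[];
}

// Hypothetical LLM contract: the only non-deterministic piece of the loop.
type Llm = (mode: string, userText: string) => LlmOutput;

function runTurn(store: Map<string, TurnState>, conversationId: string, userText: string, llm: Llm): string {
  // Perceive: reload external memory; nothing survives in-process between turns.
  const state: TurnState = { ...(store.get(conversationId) ?? { currentMode: "GetUserNameAndIntent" }) };
  // Reason: the model proposes a reply and tool calls for the current mode.
  const out = llm(state.currentMode, userText);
  // Act + observe: scaffolding applies each tool call to state (gates omitted in this sketch).
  for (const call of out.toolCalls) {
    if (call.name === "dollyNameUpdater") state.userName = call.args.name;
  }
  // Persist the updated state, then respond.
  store.set(conversationId, state);
  return out.text;
}

// Stub standing in for the real model: extracts a name from "I'm <Name>" if present.
const fakeLlm: Llm = (_mode, userText) => {
  const match = /I'm (\w+)/.exec(userText);
  return match
    ? { text: `It's great to meet you, ${match[1]}.`, toolCalls: [{ name: "dollyNameUpdater", args: { name: match[1] } }] }
    : { text: "What should I call you?", toolCalls: [] };
};
```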

4. Tool execution — three strategies + self-correcting guardrails

This is the richest agentic logic (graph/modes/get-name-intent.ts). When the LLM emits tool calls, each one is gated, then executed by one of three strategies depending on whether the agent needs to see the result before speaking.

[Diagram]

Vlad's lens: Vlad's name fires the optimistic path — Atlas says "It's great to meet you, Vlad" instantly while the backend write settles in the background (§6). If Vlad had stated his intent before his name, the gate would reject dollyIntentUpdater and Atlas would self-correct: "I'd love to — first, what should I call you?"

Two safety rules worth calling out (both real, both from PR-review comments in the code):

Confirmations only for executed tools — a gated tool's optimistic "Got it!" must never reach Vlad, or you get a false success.
Voice-first always responds — if every tool was gated, force a follow-up LLM call so Atlas still says something (silence is a broken turn).
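The two safety rules reduce to a small post-processing step over tool outcomes. This is a hypothetical sketch, assuming each call yields a `ToolOutcome`; the `tool_gated` error value matches the rejection quoted in §11b, while `planSpeech` and its return shape are illustrative names.

```typescript
type ToolOutcome =
  | { tool: string; executed: true; spokenResponse: string }
  | { tool: string; executed: false; error: "tool_gated"; reason: string };

interface SpeechPlan {
  confirmations: string[]; // optimistic lines that are safe to speak
  needsFollowUp: boolean; // true when tools ran but nothing is safe to say
}

function planSpeech(outcomes: ToolOutcome[]): SpeechPlan {
  // Rule 1: only executed tools contribute a confirmation; a gated tool's
  // optimistic line is dropped so the user never hears a false success.
  const confirmations = outcomes
    .filter((o): o is Extract<ToolOutcome, { executed: true }> => o.executed)
    .map((o) => o.spokenResponse);
  // Rule 2: voice-first always responds; if every tool was gated, force a follow-up
  // LLM call (which can see the rejection reasons) instead of returning silence.
  const needsFollowUp = outcomes.length > 0 && confirmations.length === 0;
  return { confirmations, needsFollowUp };
}
```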

5. Three-axis tool control (defense in depth)

A tool firing requires passing three independent gates, each owned by a different party. This is how you keep a non-deterministic model safe and PM-controllable at once.

[Diagram]

Vlad's lens: dollyIntentUpdater is invisible outside its mode (axis 1), its mode only runs after name+intent flags (axis 2), and it's hard-blocked unless userName is set (axis 3) — so even if the LLM tries to record Vlad's intent before his name, the gate stops it. Prompts are soft; gates are hard.
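A minimal sketch of the checks, assuming a flat state object. The rejection shape mirrors the `tool_gated` result quoted in §11b; `VISIBLE_TOOLS`, `PRECONDITIONS`, and `gateToolCall` are hypothetical. Because the sketch keys visibility by the current mode, axes 1 and 2 collapse into one lookup here, whereas in the real system they are owned separately by Genie's Builder and the state machine.

```typescript
interface GateState {
  currentMode: string;
  userName?: string;
}

type GateResult =
  | { success: true }
  | { success: false; error: "tool_gated"; reason: string; tool: string };

// Axes 1 + 2 (visibility, timing): tools each mode exposes. Hypothetical table.
const VISIBLE_TOOLS: Record<string, string[]> = {
  GetUserNameAndIntent: ["dollyNameUpdater", "dollyIntentUpdater"],
};

// Axis 3 (precondition): hard-coded checks that return a reason when blocked.
const PRECONDITIONS: Record<string, (s: GateState) => string | null> = {
  // Prefix of the real message; the full text is elided in this doc.
  dollyIntentUpdater: (s) => (s.userName ? null : "User's name has not been captured yet"),
};

function gateToolCall(tool: string, state: GateState): GateResult {
  const reject = (reason: string): GateResult => ({ success: false, error: "tool_gated", reason, tool });
  const visible = VISIBLE_TOOLS[state.currentMode] ?? [];
  if (!visible.includes(tool)) return reject(`${tool} is not available in mode ${state.currentMode}`);
  // Enforced in code, regardless of what the prompt says or the model attempts.
  const why = PRECONDITIONS[tool]?.(state);
  if (why) return reject(why);
  return { success: true };
}
```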

6. Optimistic execution — the voice-latency trick

Voice agents can't afford a second round-trip before speaking. So the agent speaks its confirmation optimistically and runs the real backend write in the background.

[Diagram]

Agentic concept — speculative action & optimistic UI: when an action almost always succeeds and the response doesn't depend on its result, act and respond in parallel. Reserve the slower observe-then-speak loop for when the answer genuinely depends on tool output.
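Both timing strategies can be sketched with plain promises, assuming a hypothetical `write` callback standing in for an RPC such as `updateDollyAssistantFlow`. The optimistic path speaks before the write settles and, as an assumption of this sketch, logs a late failure instead of surfacing it; the awaited path blocks on the result because the reply depends on it. `runOptimistic` and `runAwaited` are illustrative names.

```typescript
type Speak = (line: string) => void;

// Optimistic: start the write, speak immediately, let the write settle in the background.
function runOptimistic(
  spokenResponse: string,
  write: () => Promise<void>,
  speak: Speak,
  log: (msg: string) => void,
): Promise<void> {
  // Assumption: a failed background write is logged, not retracted from speech.
  const pending = write().catch((err: unknown) => log(`background write failed: ${String(err)}`));
  speak(spokenResponse); // the user hears this before the write completes
  return pending; // returned only so callers (or tests) can wait on it
}

// Awaited: the reply depends on the tool's result, so observe first, then speak.
async function runAwaited<T>(write: () => Promise<T>, toLine: (result: T) => string, speak: Speak): Promise<void> {
  const result = await write();
  speak(toLine(result));
}
```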

7. The environment talks to the agent: hidden signals

Vlad doesn't only speak — he also clicks ("upload photo," "pick this style"). Those UI events can't reach the brain directly, so they're smuggled in as <HIDDEN>{signal}</HIDDEN> envelopes inside the user message. The harness decides how each signal enters perception — as an observation, a silent state change, or not at all.

[Diagram]

Why this matters: the LLM would happily parrot back any raw marker it sees, so signals are rewritten into natural language (or hidden entirely), and a two-sided sanitizer strips any <HIDDEN> block the model hallucinates on the way out.
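A minimal sketch of both sides of the protocol, assuming signals arrive as `<HIDDEN>{signal}</HIDDEN>` inside the user message. The signal names (`photo_uploaded`, `style_selected`), the `SIGNAL_REWRITES` table, and both function names are hypothetical; the real handling is in `hidden-protocol.ts` (ADR-006).

```typescript
const HIDDEN_BLOCK = /<HIDDEN>([\s\S]*?)<\/HIDDEN>/g;

// Hypothetical table: how each UI signal enters perception.
// A string is a natural-language observation; null means a silent state change only.
const SIGNAL_REWRITES: Record<string, string | null> = {
  photo_uploaded: "(The user just uploaded a photo.)",
  style_selected: null,
};

interface Ingress {
  text: string; // what the LLM actually sees
  signals: string[]; // raw signals, for the harness to act on
}

// Inbound: replace each marker with its rewrite; silent and unknown signals become "".
function parseIngress(userMessage: string): Ingress {
  const signals: string[] = [];
  const text = userMessage
    .replace(HIDDEN_BLOCK, (_whole: string, signal: string) => {
      const name = signal.trim();
      signals.push(name);
      return SIGNAL_REWRITES[name] ?? "";
    })
    .trim();
  return { text, signals };
}

// Outbound: strip any <HIDDEN> block the model hallucinates before it is spoken.
function sanitizeOutput(modelText: string): string {
  return modelText.replace(HIDDEN_BLOCK, "").replace(/\s{2,}/g, " ").trim();
}
```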

🔶 Open ingress detail: onboarding-service definitely still parses <HIDDEN> markers (hidden-protocol.ts, ADR-006). In the original Tavus-direct model the UI's sendAppMessage went through Tavus into the message stream. In the LiveKit world the UI publishes over the data channel and the Cloud agent forwards a transcript + context to the LLM — but the exact place UI clicks become <HIDDEN> text now lives in the external Cloud agent, not in any repo I can read. The brain's handling is unchanged either way.

8. Memory: a stateless agent with an external brain

The graph keeps no in-process memory. Every turn, state is reloaded from CloudStore and the conversation transcript is replayed. State is split into a server half (agent-owned) and a client half (UI-owned), sharing one record with optimistic-concurrency writes.

[Diagram]

Vlad's lens: the CloudStore key is Vlad's conversationId — a stable Wix id minted when the conversation is created and echoed back on every turn (never invented per-request). Because the agent is stateless, any pod can serve Vlad's next turn and restarts never lose his progress.

Agentic concept — externalized memory + idempotent turns: memory is a store the harness manages, not hidden state inside the model. The CAS write means a slow UI update can't clobber a fresh agent write.
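A minimal in-memory sketch of optimistic concurrency, assuming each record carries a version number that every write must match. `CasStore`, `updateWithRetry`, and the `selectedStyle` field are hypothetical; CloudStore's actual API is not shown in this doc. The `server`/`client` split mirrors the two halves described above.

```typescript
interface OnboardingRecord {
  server: { currentMode: string; userName?: string }; // agent-owned half
  client: { selectedStyle?: string }; // UI-owned half (hypothetical field)
}

interface Versioned<T> {
  value: T;
  version: number;
}

class CasStore<T> {
  private readonly data = new Map<string, Versioned<T>>();

  get(key: string): Versioned<T> | undefined {
    return this.data.get(key);
  }

  // Succeeds only if nobody has written since `expectedVersion` was read (0 = absent).
  compareAndSet(key: string, expectedVersion: number, value: T): boolean {
    const current = this.data.get(key)?.version ?? 0;
    if (current !== expectedVersion) return false;
    this.data.set(key, { value, version: current + 1 });
    return true;
  }
}

// Read-modify-write with retry: on a version conflict, reload and reapply the change.
function updateWithRetry<T>(store: CasStore<T>, key: string, initial: T, change: (v: T) => T, maxAttempts = 3): boolean {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const snapshot = store.get(key);
    const base = snapshot?.value ?? initial;
    if (store.compareAndSet(key, snapshot?.version ?? 0, change(base))) return true;
  }
  return false;
}
```

A UI write based on a stale version is rejected instead of overwriting the agent's newer `userName`; the UI then reloads and reapplies only its own change.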

9. Context engineering: how each turn's prompt is built

The agent's behavior per turn is assembled from slots — most of them PM-editable in the Genie AI Assistant Builder (assistant_hub) and hot-reloaded within 5 minutes, no deploy.

[Diagram]

Two deliberate choices:

currentState includes only what's captured (e.g. name: Vlad, not intent: not set) — telling the model what's missing tempts it to prematurely "fill the gap." Omission is a prompt-design guardrail.
Prompts live in Genie's Builder, not code — PMs tune Atlas's voice/guidelines live; code owns control flow, tools, and gates. ADR-008 notes that onboarding-service re-implements a Genie AI Assistant Builder skill as a standalone LangGraph service, so the config is Genie's and the control flow is the service's.
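A minimal sketch of slot assembly with the omission rule, assuming two PM-editable text slots plus the rendered state. `assembleSystemPrompt` is named in the code, but this body, `PromptSlots`, and `renderCurrentState` are hypothetical.

```typescript
interface CapturedState {
  name?: string;
  intent?: string;
}

interface PromptSlots {
  persona: string; // e.g. Atlas's voice, editable in Genie's Builder
  modeGuidelines: string; // per-mode instructions, also PM-editable
}

// Only captured fields are rendered; missing ones are omitted, never listed as "not set".
function renderCurrentState(state: CapturedState): string {
  const lines: string[] = [];
  if (state.name) lines.push(`name: ${state.name}`);
  if (state.intent) lines.push(`intent: ${state.intent}`);
  return lines.join("\n");
}

function assembleSystemPrompt(slots: PromptSlots, state: CapturedState): string {
  const current = renderCurrentState(state);
  return [slots.persona, slots.modeGuidelines, current ? `currentState:\n${current}` : ""]
    .filter((part) => part.length > 0)
    .join("\n\n");
}
```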

10. The transport, end to end (the unified pipeline)

Now the layer under §1. The Atlas UI uses the platform LiveKit SDK (@wix/legends-platform/livekit/sdk) and supports both audio and video modes. Both modes ride the same machinery — only the rendered face differs:

[Diagram]

Proven (Scala backend): voice-service is a thin orchestrator → vendor-gateway-service mints a LiveKit token + session payload and dispatches a LiveKit Cloud agent (STT/turn/TTS). The agent's LLM URL is always {aiAssistantUrl}/<vendor>/chat/completions (HourOneImplementor.scala, HttpClientUtils.scala); llmModel/llmProvider ride as metadata, and CustomLlm.baseUrl is text-chat only. So the brain is reached via the Genie ai-assistant endpoint, with onboarding-service as the standalone LangGraph implementing that onboarding skill (the 🔶 in §12).

11. One full turn, narrated (with real payloads)

The two turns that matter most, with payload shapes pulled from the code. 🟡 = illustrative value.

11a. Turn 1 — Vlad says his name (optimistic fast path)

[Diagram]

Real payload shapes behind this turn (verbatim field names from code):

Request: body.metadata.{conversationId,userId} (completion-request.ts) — ids minted Wix-side at conversation creation, echoed back; conversationId is the CloudStore key.
Tool call: function.arguments is a JSON string → {name:"Vlad", spokenResponse:"…"}. The verbatim baseline spokenResponse is "It's great to meet you, Vlad. Now we're going to shape your legend's goal…" (full-happy-path-v11.baseline.json).
Optimistic result: {success:true, extractedName:"Vlad"} (dolly-tools.ts).
RPC: updateDollyAssistantFlow({conversationId, dolly:{name:"Vlad"}, fieldMask:["name"]}).
No mode transition yet — intent is still missing, so currentMode stays.
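The Turn-1 data flow can be sketched as follows, assuming the field names listed above. The detail worth coding defensively is that `function.arguments` is a JSON string, so it is parsed and validated before use. `OpenAiToolCall`, `parseNameArgs`, and `handleNameToolCall` are hypothetical; the result and RPC objects copy the field names from `dolly-tools.ts` and `updateDollyAssistantFlow`.

```typescript
// Subset of an OpenAI-style tool call: `arguments` arrives as a JSON string.
interface OpenAiToolCall {
  id: string;
  type: "function";
  function: { name: string; arguments: string };
}

interface NameArgs {
  name: string;
  spokenResponse: string;
}

// Request shape for updateDollyAssistantFlow (field names from the list above).
interface UpdateDollyRequest {
  conversationId: string;
  dolly: { name: string };
  fieldMask: string[];
}

function parseNameArgs(raw: string): NameArgs {
  const parsed: unknown = JSON.parse(raw);
  if (
    typeof parsed !== "object" ||
    parsed === null ||
    typeof (parsed as Record<string, unknown>).name !== "string" ||
    typeof (parsed as Record<string, unknown>).spokenResponse !== "string"
  ) {
    throw new Error("dollyNameUpdater: malformed arguments");
  }
  return parsed as NameArgs;
}

function handleNameToolCall(call: OpenAiToolCall, conversationId: string) {
  const args = parseNameArgs(call.function.arguments);
  // The caller sends `rpc` in the background; the optimistic result goes back to the graph at once.
  const rpc: UpdateDollyRequest = { conversationId, dolly: { name: args.name }, fieldMask: ["name"] };
  const optimisticResult = { success: true, extractedName: args.name };
  return { spoken: args.spokenResponse, optimisticResult, rpc };
}
```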

11b. Turn 2 — Vlad says what he does (awaited + gate + knowledge + transition)

[Diagram]

Why this turn is different:

Awaited path — dollyIntentUpdater is awaited because ordering matters; its result is parsed before responding.
Gate ✅ because userName is set. If it weren't, the brain returns {success:false, error:"tool_gated", reason:"User's name has not been captured yet…", tool:"dollyIntentUpdater"} (tool-gates.ts) and self-corrects without speaking the optimistic line.
Knowledge extraction fires fire-and-forget: bulkUpsertKnowledgeSnippetAssistantFlow({chatId, namespace:"DOLLY", generatedSnippet:[…]}).
Mode transition — with userName ✅ + userIntent ✅, the state machine moves Vlad to ImageUpload and emits a mode-changed Duplexer event.
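A minimal sketch of the transition rule, assuming a hypothetical `emit` callback in place of the Duplexer push. Only the mode names, `userName`/`userIntent`, and the `mode-changed` event name come from the text; `maybeTransition` and the event's `from`/`to` payload are hypothetical.

```typescript
type Mode = "GetUserNameAndIntent" | "ImageUpload";

interface FlowState {
  currentMode: Mode;
  userName?: string;
  userIntent?: string;
}

interface ModeChangedEvent {
  type: "mode-changed";
  from: Mode;
  to: Mode;
}

// Deterministic transition: the LLM never chooses the next mode; this check does.
function maybeTransition(state: FlowState, emit: (e: ModeChangedEvent) => void): FlowState {
  if (state.currentMode === "GetUserNameAndIntent" && state.userName && state.userIntent) {
    const next: FlowState = { ...state, currentMode: "ImageUpload" };
    emit({ type: "mode-changed", from: state.currentMode, to: next.currentMode });
    return next;
  }
  return state; // e.g. after Turn 1: name captured, intent missing, so no transition
}
```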

12. Two brains and the one open question

The platform has two LLM brains, and onboarding's relationship to them is the last 🔶:

	Onboarding (Atlas)	Published persona chat
Brain	`onboarding-service` — standalone LangGraph, exposes `/v1/chat/completions`, pulls prompt/tool config from Genie `assistant_hub`, runs its own LLM (OpenRouter/Cerebras)	the shared Genie ai-assistant at `{aiAssistantUrl}/<vendor>/chat/completions`
Reached how	🔶 see below	LiveKit Cloud agent → `{aiAssistantUrl}/<vendor>` (PROVEN)
Vendor / face	Tavus avatar in video mode	HourOne/ElevenLabs/Hume (voice) + Tavus (video)

What's proven:

Onboarding's transport is the same LiveKit + Cloud-agent pipeline as published chat (the Atlas UI calls createInteractiveConversation with a namespace + configurationId).
The Cloud agent's LLM URL is always {aiAssistantUrl}/<vendor>/chat/completions — not a caller-supplied base_url.
onboarding-service exposes only /v1/chat/completions (labeled "for Tavus custom LLM") and is a standalone re-implementation of a Genie assistant skill (ADR-008).

The single open question (🔶): for a live Atlas voice turn, is onboarding-service the endpoint that {aiAssistantUrl} resolves to (or that Genie invokes) for the onboarding configurationId? That binding lives in two runtime/deploy values not in any repo: the aiAssistantUrl databag, and the onboarding persona's AssistantConfiguration. Everything in code is consistent with onboarding-service being the brain (its internal logic in §3–§9, §11 is exactly the onboarding behavior); confirming how it's invoked is a runtime lookup.

13. The full journey (Vlad, end to end)

Zooming all the way out — Vlad from first hello to a published, chat-ready twin. (The middle stages are detailed in LEGENDS_ARCHITECTURE.md.)

[Diagram]

Vlad's payoff: his audience talks to the published twin through persona chat (the other brain in §12) — RAG over everything Vlad taught it, wrapped in his persona prompt, streamed back as text, voice, or a Tavus video avatar.

TL;DR — the agentic patterns to take away

Pattern	Where it shows up
Deterministic harness wraps stochastic LLM	the whole `StateGraph` design
Router + specialized sub-agents	one mode = one prompt + tool set
ReAct loop (reason → act → observe → respond)	`get-name-intent.ts` tool handling
Tool/function calling with structured args	`dolly-tools.ts` (data + `spokenResponse`)
Guardrails as code + self-correction	`tool-gates.ts` → `tool_gated` rejection
Three-axis tool control (visibility/timing/precondition)	Genie Builder × state machine × gates
Speculative / optimistic execution	fire-and-forget tools + `spokenResponse` fast path
Environment events as observations	hidden-signal system
Externalized memory + stateless idempotent turns	CloudStore load/save per turn
Context engineering + hot-swappable prompts	`assembleSystemPrompt` + Genie `assistant_hub`
Pluggable transport in front of a fixed brain	LiveKit Cloud agent → `onboarding-service`

Non-agentic backbone (the unsung 🟦): LiveKit (WebRTC rooms + Cloud agent for STT/turn/TTS), Tavus (video avatar only), voice-service + vendor-gateway, the Genie ai-assistant endpoint, SSE streaming, Duplexer WebSocket push, CloudStore (CAS KV), ambassador RPC, auth/identity injection, and per-pod dedup. None of it is "AI" — but the agent is only reliable because this scaffolding is.

Provenance note: transport facts are from legends-platform (LiveKit SDK), catalyst-server/legends/{voice-service,vendor-gateway-service}, and wix-vmr-repo/legends-platform/persona-chat-service. Brain facts are from legends/packages/onboarding-service (code + ADRs 002/006/008). 🔶 markers denote the one binding that lives in runtime/deploy config, not code.