Onboarding Flow — Agentic Architecture Deep Dive

A visual tour of the Dolly onboarding agent (codename Atlas), told as the story of one creator — Vlad — building his AI twin by voice. Every claim is grounded in the code across five repos: legends/packages/onboarding-service (the agent), legends/packages/dolly (the Atlas UI), legends-platform (the LiveKit SDK), and catalyst-server/legends + wix-vmr-repo/legends-platform (the Scala backend: voice-service, vendor-gateway-service, persona-chat-service).

Legend for the diagrams: 🟣 = agentic (LLM-driven / non-deterministic) · 🟦 = deterministic scaffolding · 🟡 = an illustrative value I filled in (field names + structure are from code) · 🔶 = a link whose mechanism is proven but whose exact runtime wiring lives in deploy config, not in any repo (see §12).


Meet Vlad — and how to read this doc

Vlad is a startup coach. He opens Dolly, and a friendly avatar named Atlas ("Mike") voice-interviews him to build his AI twin. We follow Vlad's words through the system, turn by turn.

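There are two layers to keep separate, because conflating them causes endless confusion: the brain (onboarding-service, the agent) and the transport that carries Vlad's voice to it and back (LiveKit, the Cloud voice agent, and Tavus).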

The brain's logic is transport-agnostic: it only ever sees an OpenAI-compatible text request and emits text. Whether Tavus or a LiveKit Cloud agent delivers that request doesn't change the agent — which is exactly why this doc spends most of its time on the brain.


0. The one-paragraph mental model

This is a structured agent: a deterministic state machine (a LangGraph StateGraph) wraps a stochastic LLM. The state machine decides when and in what role the agent runs; the LLM decides what to say and which tools to call; hard-coded guardrails decide whether a given tool call is actually allowed. The agent is stateless — its memory lives outside it, reloaded every turn. This "deterministic harness + stochastic core" split is the single most important agentic pattern in the whole design.

[Diagram]

Vlad's lens: when Vlad says "Hi, I'm Vlad," the state machine says "we're in the name/intent step," the LLM decides to call dollyNameUpdater(name:"Vlad"), and a gate confirms that's allowed right now. Three different parts, three different jobs.


1. Where the agent sits (the perception/action boundary)

Vlad never talks to the brain directly. He talks into a LiveKit room; a Cloud voice agent turns his speech into text and the brain's tokens back into speech; in video mode a Tavus avatar renders the face. The brain (onboarding-service) only ever sees an OpenAI-compatible text request and emits text tokens. Clean perception → cognition → action.

[Diagram]

Agentic concept — embodied agent boundary: the model never touches the world directly. Inputs are normalized into a text "observation"; outputs are text "actions" that scaffolding turns into side effects. Swapping Tavus-direct for a LiveKit Cloud agent didn't change the agent — which is precisely what happened in this codebase.
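A minimal sketch of this boundary, assuming only the standard OpenAI chat-completions message shape. The function names `toObservation` and `toAction` and the model string are hypothetical and are not taken from onboarding-service.

```typescript
// Hypothetical sketch of the perception/action boundary.
// Only the OpenAI-style message shape is standard; every name is illustrative.
type Role = "system" | "user" | "assistant";

interface ChatMessage {
  role: Role;
  content: string;
}

interface ChatCompletionRequest {
  model: string;
  messages: ChatMessage[];
  stream?: boolean;
}

// Perception: whatever the transport is, the brain reads the latest user text.
function toObservation(req: ChatCompletionRequest): string {
  for (let i = req.messages.length - 1; i >= 0; i--) {
    const message = req.messages[i];
    if (message && message.role === "user") return message.content.trim();
  }
  return "";
}

// Action: the brain emits text; the transport turns it into speech or video.
function toAction(reply: string): ChatMessage {
  return { role: "assistant", content: reply };
}

const fromAnyTransport: ChatCompletionRequest = {
  model: "illustrative-model",
  messages: [
    { role: "system", content: "You are Atlas." },
    { role: "user", content: "  Hi, I'm Vlad  " },
  ],
};

const observation = toObservation(fromAnyTransport);
const action = toAction("It's great to meet you, Vlad");
```

Nothing in either function knows whether the request came from Tavus or from a LiveKit Cloud agent, which is the property the section describes.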


2. The mode graph = a router over specialized sub-agents

Onboarding isn't one prompt — it's one specialized agent per step, selected by a router. Each mode has its own system prompt, its own tool set, and its own job. A routerNode reads state and dispatches to the right mode agent (the "v11" split).

[Diagram]
Mode (sub-agent) | Has tools? | Vlad's experience
GetUserNameAndIntent | | tells Atlas his name + what the twin is for
ImageUpload | | uploads a photo; the twin's first message is generated
ImageSelectionLoading | | waits while styled images render
ImageSelection | | picks a style
VoiceRecording | | records a voice sample
Preview | | hears Atlas narrate the finished twin
MikeChat | (getUiReference only) | experiment-endpoint terminal chat

Agentic concept — router / supervisor + specialist agents: focused agents with small tool sets. Smaller surface = fewer wrong tool calls, cheaper prompts, easier evals. The router is deterministic code, not an LLM — control flow you can trust.
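The routing idea can be sketched as a pure function over state. The mode names and `routerNode` come from this document; the state flags and the order in which they are checked are assumptions made for illustration, not the real routing rules.

```typescript
// Hypothetical sketch of a deterministic router.
// Mode names are from the document; the flags and their order are assumed.
type Mode =
  | "GetUserNameAndIntent"
  | "ImageUpload"
  | "ImageSelectionLoading"
  | "ImageSelection"
  | "VoiceRecording"
  | "Preview";

interface OnboardingState {
  userName?: string;
  intent?: string;
  imageUploaded: boolean;
  stylesReady: boolean;
  styleChosen: boolean;
  voiceRecorded: boolean;
}

// Plain code, no LLM: the same state always routes to the same mode.
function routerNode(state: OnboardingState): Mode {
  if (!state.userName || !state.intent) return "GetUserNameAndIntent";
  if (!state.imageUploaded) return "ImageUpload";
  if (!state.stylesReady) return "ImageSelectionLoading";
  if (!state.styleChosen) return "ImageSelection";
  if (!state.voiceRecorded) return "VoiceRecording";
  return "Preview";
}

const fresh: OnboardingState = {
  imageUploaded: false,
  stylesReady: false,
  styleChosen: false,
  voiceRecorded: false,
};

const firstMode = routerNode(fresh);
const afterIntro = routerNode({ ...fresh, userName: "Vlad", intent: "startup coaching" });
```

Because the router is a pure function, it can be unit-tested exhaustively without calling a model.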


3. The single-turn agent loop (ReAct, in code)

Every request from the Cloud agent runs one pass of a perceive → reason → act → observe → respond loop. Steps 1–2 and 5–7 are deterministic; the agent (🟣) is only steps 3–4.

[Diagram]

Vlad's lens: "Hi, I'm Vlad" is plain speech → it skips the signal branches, routes to GetUserNameAndIntent, reasons, calls a gated tool, and streams a reply. A button-click like "photo uploaded" instead enters as a hidden signal (§7).

Agentic concept — ReAct loop with a thin model slice: "agentic" ≠ "the model does everything." The model reasons and chooses actions inside a controlled loop.
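A rough sketch of one pass through such a loop, with the model injected as a function so that everything else stays deterministic. The step breakdown, the types, and the scripted model are assumptions; only the tool name `dollyNameUpdater` comes from this document.

```typescript
// Hypothetical single-turn loop. The step breakdown is assumed; the point is
// that only `model` is non-deterministic, and it is injected from outside.
interface ToolCall {
  name: string;
  args: Record<string, string>;
}

interface ModelOutput {
  text: string;
  toolCalls: ToolCall[];
}

interface TurnState {
  mode: string;
  userName?: string;
}

type Model = (mode: string, observation: string) => ModelOutput;

function runTurn(state: TurnState, observation: string, model: Model) {
  // Perceive and route (deterministic).
  const mode = state.mode;
  // Reason and choose actions (the only agentic slice).
  const out = model(mode, observation);
  // Act and observe (deterministic): apply the tool calls the harness knows.
  const next: TurnState = { ...state };
  const applied: string[] = [];
  for (const call of out.toolCalls) {
    if (call.name === "dollyNameUpdater" && call.args.name) {
      next.userName = call.args.name;
      applied.push(call.name);
    }
  }
  // Respond (deterministic): return the text plus the state to persist.
  return { reply: out.text, state: next, applied };
}

// A scripted stand-in for the LLM, so the loop runs without a network call.
const scriptedModel: Model = (_mode, observation) => {
  const match = /I'm (\w+)/.exec(observation);
  if (!match) return { text: "What should I call you?", toolCalls: [] };
  return {
    text: `It's great to meet you, ${match[1]}`,
    toolCalls: [{ name: "dollyNameUpdater", args: { name: match[1] } }],
  };
};

const turn = runTurn({ mode: "GetUserNameAndIntent" }, "Hi, I'm Vlad", scriptedModel);
```

Swapping `scriptedModel` for a real LLM client changes nothing in `runTurn`, which is what keeps the model slice thin.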


4. Tool execution — three strategies + self-correcting guardrails

This is the richest agentic logic (graph/modes/get-name-intent.ts). When the LLM emits tool calls, each one is gated, then executed by one of three strategies depending on whether the agent needs to see the result before speaking.

[Diagram]

Vlad's lens: Vlad's name fires the optimistic path — Atlas says "It's great to meet you, Vlad" instantly while the backend write settles in the background (§6). If Vlad had stated his intent before his name, the gate would reject dollyIntentUpdater and Atlas would self-correct: "I'd love to — first, what should I call you?"
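The gate-then-self-correct behaviour can be sketched as follows. The tool names and the `tool_gated` label appear in this document; the result shape, the reason text, and the function names are assumptions, and the real logic in tool-gates.ts may differ.

```typescript
// Hypothetical gate sketch. Tool names and the `tool_gated` label are from the
// document; the result shape and the wording of the reason are assumed.
interface GateState {
  userName?: string;
}

type GateResult =
  | { status: "allowed" }
  | { status: "tool_gated"; tool: string; reason: string };

function gateToolCall(tool: string, state: GateState): GateResult {
  if (tool === "dollyIntentUpdater" && !state.userName) {
    return {
      status: "tool_gated",
      tool,
      reason: "userName must be set before intent is recorded",
    };
  }
  return { status: "allowed" };
}

// A rejection is turned into an observation the model reads on its next
// reasoning step, which lets it correct course instead of failing silently.
function rejectionToObservation(result: GateResult): string | null {
  if (result.status !== "tool_gated") return null;
  return `Tool ${result.tool} was rejected: ${result.reason}. Ask for the missing information first.`;
}

const early = gateToolCall("dollyIntentUpdater", {});
const later = gateToolCall("dollyIntentUpdater", { userName: "Vlad" });
const feedback = rejectionToObservation(early);
```

Returning a structured rejection rather than throwing keeps the turn alive, so the model can respond with something like the "first, what should I call you?" line quoted above.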

Two safety rules worth calling out (both real, both from PR-review comments in the code):


5. Three-axis tool control (defense in depth)

A tool firing requires passing three independent gates, each owned by a different party. This is how you keep a non-deterministic model safe and PM-controllable at once.

[Diagram]

Vlad's lens: dollyIntentUpdater is invisible outside its mode (axis 1), its mode only runs after name+intent flags (axis 2), and it's hard-blocked unless userName is set (axis 3) — so even if the LLM tries to record Vlad's intent before his name, the gate stops it. Prompts are soft; gates are hard.
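A compact sketch of the three axes as three separate data sources that must all agree. The axis names follow this section; the data shapes, the per-mode tool lists, and the simplified two-mode state machine are assumptions for illustration only.

```typescript
// Hypothetical sketch of three-axis tool control. Axis names follow the
// document; the shapes and the contents of each table are assumed.
type State = { userName?: string; intent?: string };

// Axis 1, visibility: which tools each mode exposes (PM-editable in the real
// system). The empty ImageUpload list is an assumption.
const toolsByMode: Record<string, string[]> = {
  GetUserNameAndIntent: ["dollyNameUpdater", "dollyIntentUpdater"],
  ImageUpload: [],
};

// Axis 2, timing: the state machine decides which mode is active right now.
function activeMode(state: State): string {
  return state.userName && state.intent ? "ImageUpload" : "GetUserNameAndIntent";
}

// Axis 3, precondition: hard-coded gates that no prompt can override.
const gates: Record<string, (s: State) => boolean> = {
  dollyIntentUpdater: (s) => Boolean(s.userName),
};

type Verdict =
  | "allowed"
  | "blocked:visibility"
  | "blocked:timing"
  | "blocked:precondition";

function canFire(tool: string, state: State): Verdict {
  const exposedSomewhere = Object.keys(toolsByMode).some(
    (mode) => toolsByMode[mode].indexOf(tool) !== -1,
  );
  if (!exposedSomewhere) return "blocked:visibility";
  if (toolsByMode[activeMode(state)].indexOf(tool) === -1) return "blocked:timing";
  const gate = gates[tool];
  if (gate && !gate(state)) return "blocked:precondition";
  return "allowed";
}

const intentBeforeName = canFire("dollyIntentUpdater", {});
const intentAfterName = canFire("dollyIntentUpdater", { userName: "Vlad" });
```

Each table can be owned and changed by a different party, and a tool call passes only when all three agree.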


6. Optimistic execution — the voice-latency trick

Voice agents can't afford a second round-trip before speaking. So the agent speaks its confirmation optimistically and runs the real backend write in the background.

[Diagram]

Agentic concept — speculative action & optimistic UI: when an action almost always succeeds and the response doesn't depend on its result, act and respond in parallel. Reserve the slower observe-then-speak loop for when the answer genuinely depends on tool output.
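The speak-first, write-in-background ordering might look like the sketch below. `spokenResponse` is a field name from this document; `speak`, `persist`, `onFailure`, and the simulated 10 ms write are assumptions, not the service's real interfaces.

```typescript
// Hypothetical optimistic-execution sketch. `spokenResponse` is from the
// document; the callbacks and the simulated backend write are assumed.
interface OptimisticToolCall {
  name: string;
  data: Record<string, string>;
  spokenResponse: string;
}

function executeOptimistically(
  call: OptimisticToolCall,
  speak: (text: string) => void,
  persist: (call: OptimisticToolCall) => Promise<void>,
  onFailure: (err: unknown) => void,
): Promise<void> {
  // Speak first: the confirmation does not depend on the write's result.
  speak(call.spokenResponse);
  // Then write in the background; failures are reported, not thrown.
  return persist(call).catch(onFailure);
}

const events: string[] = [];
const spoken: string[] = [];

const settled = executeOptimistically(
  {
    name: "dollyNameUpdater",
    data: { name: "Vlad" },
    spokenResponse: "It's great to meet you, Vlad",
  },
  (text) => {
    spoken.push(text);
    events.push("spoke");
  },
  () =>
    new Promise<void>((resolve) => {
      // Simulated slow backend write.
      setTimeout(() => {
        events.push("persisted");
        resolve();
      }, 10);
    }),
  () => {
    events.push("failed");
  },
);
```

The returned promise exists only so that callers and tests can observe completion; the voice path never awaits it. A tool whose result changes what the agent should say would need the slower awaited path instead.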


7. The environment talks to the agent: hidden signals

Vlad doesn't only speak — he also clicks ("upload photo," "pick this style"). Those UI events can't reach the brain directly, so they're smuggled in as <HIDDEN>{signal}</HIDDEN> envelopes inside the user message. The harness decides how each signal enters perception — as an observation, a silent state change, or not at all.

[Diagram]

Why this matters: the LLM would happily parrot back any raw marker it sees, so signals are rewritten into natural language (or hidden entirely), and a two-sided sanitizer strips any <HIDDEN> block the model hallucinates on the way out.
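A minimal sketch of both sides of that handling. The `<HIDDEN>` tag is from this document; treating the envelope body as JSON, the `photo_uploaded` signal name, and the function names are assumptions, and the real parser in hidden-protocol.ts may work differently.

```typescript
// Hypothetical sketch of <HIDDEN> envelope handling. The tag is from the
// document; the JSON payload shape and all function names are assumed.
const HIDDEN_BLOCK = /<HIDDEN>([\s\S]*?)<\/HIDDEN>/g;

interface ParsedMessage {
  speech: string; // what the user actually said
  signals: unknown[]; // UI events carried alongside the speech
}

// Inbound: split the user message into speech and signals.
function parseInbound(message: string): ParsedMessage {
  const signals: unknown[] = [];
  const speech = message.replace(HIDDEN_BLOCK, (_whole: string, body: string) => {
    try {
      signals.push(JSON.parse(body));
    } catch {
      // Malformed envelope: drop it rather than show raw markup to the model.
    }
    return "";
  });
  return { speech: speech.trim(), signals };
}

// Outbound: strip any envelope the model produced by itself.
function sanitizeOutbound(reply: string): string {
  return reply.replace(HIDDEN_BLOCK, "").replace(/\s{2,}/g, " ").trim();
}

const click = parseInbound('<HIDDEN>{"type":"photo_uploaded"}</HIDDEN>');
const plain = parseInbound("Hi, I'm Vlad");
const cleaned = sanitizeOutbound('Great choice! <HIDDEN>{"x":1}</HIDDEN> Next step.');
```

In this sketch the harness would then decide, per signal type, whether to rewrite the signal as natural language for the model, apply it silently to state, or discard it.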

🔶 Open ingress detail: onboarding-service definitely still parses <HIDDEN> markers (hidden-protocol.ts, ADR-006). In the original Tavus-direct model the UI's sendAppMessage went through Tavus into the message stream. In the LiveKit world the UI publishes over the data channel and the Cloud agent forwards a transcript + context to the LLM — but the exact place UI clicks become <HIDDEN> text now lives in the external Cloud agent, not in any repo I can read. The brain's handling is unchanged either way.


8. Memory: a stateless agent with an external brain

The graph keeps no in-process memory. Every turn, state is reloaded from CloudStore and the conversation transcript is replayed. State is split into a server half (agent-owned) and a client half (UI-owned), sharing one record with optimistic-concurrency writes.

[Diagram]

Vlad's lens: the CloudStore key is Vlad's conversationId — a stable Wix id minted when the conversation is created and echoed back on every turn (never invented per-request). Because the agent is stateless, any pod can serve Vlad's next turn and restarts never lose his progress.

Agentic concept — externalized memory + idempotent turns: memory is a store the harness manages, not hidden state inside the model. The CAS write means a slow UI update can't clobber a fresh agent write.
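The compare-and-set behaviour can be illustrated with an in-memory stand-in. CloudStore's real API is not shown in this document, so the class, method names, record fields, and the conversation id below are all hypothetical.

```typescript
// Hypothetical in-memory stand-in for a compare-and-set key-value store.
// CloudStore's real API is not shown in the document; names are illustrative.
interface Versioned<T> {
  value: T;
  version: number;
}

class CasStore<T> {
  private records: { [key: string]: Versioned<T> } = {};

  load(key: string): Versioned<T> | undefined {
    return this.records[key];
  }

  // A write succeeds only if the caller saw the latest version.
  save(key: string, value: T, expectedVersion: number): boolean {
    const current = this.records[key];
    const currentVersion = current ? current.version : 0;
    if (currentVersion !== expectedVersion) return false;
    this.records[key] = { value, version: currentVersion + 1 };
    return true;
  }
}

interface OnboardingRecord {
  server: { userName?: string }; // agent-owned half
  client: { activePanel?: string }; // UI-owned half
}

const store = new CasStore<OnboardingRecord>();
const conversationId = "conv-illustrative";
store.save(conversationId, { server: {}, client: {} }, 0);

// The UI and the agent both load version 1.
const seenByUi = store.load(conversationId)!;
const seenByAgent = store.load(conversationId)!;

// The agent writes first and bumps the record to version 2.
const agentWrite = store.save(
  conversationId,
  { ...seenByAgent.value, server: { userName: "Vlad" } },
  seenByAgent.version,
);

// The slow UI write carries a stale version and is rejected.
const staleUiWrite = store.save(
  conversationId,
  { ...seenByUi.value, client: { activePanel: "upload" } },
  seenByUi.version,
);
```

A rejected writer would reload the record, reapply its own half, and retry, so neither side overwrites the other's fields.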


9. Context engineering: how each turn's prompt is built

The agent's behavior per turn is assembled from slots — most of them PM-editable in the Genie AI Assistant Builder (assistant_hub) and hot-reloaded within 5 minutes, no deploy.

[Diagram]
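Slot-based assembly with a time-bounded cache could be sketched like this. Only the five-minute refresh window comes from this document; the slot names, the cache class, the injected clock, and `assemblePrompt` (a stand-in, not the real assembleSystemPrompt) are assumptions.

```typescript
// Hypothetical sketch: slot names, the fetch function, and the cache are
// assumed. Only the five-minute refresh window comes from the document.
const FIVE_MINUTES_MS = 5 * 60 * 1000;

interface PromptSlots {
  persona: string;
  modeInstructions: string;
}

class SlotCache {
  private cached?: { slots: PromptSlots; fetchedAt: number };

  constructor(
    private fetchSlots: () => PromptSlots,
    private now: () => number,
  ) {}

  get(): PromptSlots {
    let entry = this.cached;
    if (!entry || this.now() - entry.fetchedAt >= FIVE_MINUTES_MS) {
      entry = { slots: this.fetchSlots(), fetchedAt: this.now() };
      this.cached = entry;
    }
    return entry.slots;
  }
}

// Combine editable slots with per-turn state into one system prompt.
function assemblePrompt(slots: PromptSlots, state: { userName?: string }): string {
  const known = state.userName
    ? `The user's name is ${state.userName}.`
    : "The user's name is not known yet.";
  return [slots.persona, slots.modeInstructions, known].join("\n\n");
}

// A fake clock and a counting fetcher make the refresh behaviour visible.
let clock = 0;
let fetches = 0;
const cache = new SlotCache(
  () => {
    fetches += 1;
    return {
      persona: "You are Atlas, a friendly interviewer.",
      modeInstructions: `Config revision ${fetches}.`,
    };
  },
  () => clock,
);

const firstPrompt = assemblePrompt(cache.get(), { userName: "Vlad" });
clock += FIVE_MINUTES_MS - 1;
cache.get(); // still inside the window: served from cache
const fetchesInsideWindow = fetches;
clock += 1;
const refreshed = cache.get(); // window elapsed: slots are fetched again
const fetchesAfterWindow = fetches;
```

Injecting the clock keeps the cache testable without waiting; in a real service the fetcher would call the configuration source rather than return a literal.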

Two deliberate choices:


10. The transport, end to end (the unified pipeline)

Now the layer under §1. The Atlas UI uses the platform LiveKit SDK (@wix/legends-platform/livekit/sdk) and supports both audio and video modes. Both modes ride the same machinery — only the rendered face differs:

[Diagram]

Proven (Scala backend): voice-service is a thin orchestrator → vendor-gateway-service mints a LiveKit token + session payload and dispatches a LiveKit Cloud agent (STT/turn/TTS). The agent's LLM URL is always {aiAssistantUrl}/<vendor>/chat/completions (HourOneImplementor.scala, HttpClientUtils.scala); llmModel/llmProvider ride as metadata, and CustomLlm.baseUrl is text-chat only. So the brain is reached via the Genie ai-assistant endpoint, with onboarding-service as the standalone LangGraph implementing that onboarding skill (the 🔶 in §12).
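The URL shape quoted above can be mirrored in a few lines. The real construction lives in the Scala backend, so this TypeScript helper, its name, the dispatch shape, and the example host, vendor, model, and provider strings are purely illustrative assumptions.

```typescript
// Hypothetical helper mirroring the quoted URL shape. The real code is Scala;
// the names and the example values below are placeholders.
interface AgentDispatch {
  llmUrl: string;
  // Per the document, model and provider travel as metadata, not in the URL.
  metadata: { llmModel: string; llmProvider: string };
}

function buildLlmUrl(aiAssistantUrl: string, vendor: string): string {
  const base = aiAssistantUrl.replace(/\/+$/, ""); // tolerate a trailing slash
  return `${base}/${encodeURIComponent(vendor)}/chat/completions`;
}

function buildDispatch(
  aiAssistantUrl: string,
  vendor: string,
  llmModel: string,
  llmProvider: string,
): AgentDispatch {
  return {
    llmUrl: buildLlmUrl(aiAssistantUrl, vendor),
    metadata: { llmModel, llmProvider },
  };
}

const dispatch = buildDispatch(
  "https://ai-assistant.example.invalid/",
  "example-vendor",
  "example-model",
  "example-provider",
);
```

The sketch only restates the pattern: the endpoint is fixed by the assistant URL and the vendor, while the model choice rides alongside it.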


11. One full turn, narrated (with real payloads)

The two turns that matter most, with payload shapes pulled from the code. 🟡 = illustrative value.

11a. Turn 1 — Vlad says his name (optimistic fast path)

[Diagram]

Real payload shapes behind this turn (verbatim field names from code):

11b. Turn 2 — Vlad says what he does (awaited + gate + knowledge + transition)

[Diagram]

Why this turn is different:


12. Two brains and the one open question

The platform has two LLM brains, and onboarding's relationship to them is the last 🔶:

Aspect | Onboarding (Atlas) | Published persona chat
Brain | onboarding-service: standalone LangGraph, exposes /v1/chat/completions, pulls prompt/tool config from Genie assistant_hub, runs its own LLM (OpenRouter/Cerebras) | the shared Genie ai-assistant at {aiAssistantUrl}/<vendor>/chat/completions
Reached how | 🔶 see below | LiveKit Cloud agent → {aiAssistantUrl}/<vendor> (PROVEN)
Vendor / face | Tavus avatar in video mode | HourOne/ElevenLabs/Hume (voice) + Tavus (video)

What's proven:

The single open question (🔶): for a live Atlas voice turn, is onboarding-service the endpoint that {aiAssistantUrl} resolves to (or that Genie invokes) for the onboarding configurationId? That binding lives in two runtime/deploy values not in any repo: the aiAssistantUrl databag, and the onboarding persona's AssistantConfiguration. Everything in code is consistent with onboarding-service being the brain (its internal logic in §3–§9, §11 is exactly the onboarding behavior); confirming how it's invoked is a runtime lookup.


13. The full journey (Vlad, end to end)

Zooming all the way out — Vlad from first hello to a published, chat-ready twin. (The middle stages are detailed in LEGENDS_ARCHITECTURE.md.)

[Diagram]

Vlad's payoff: his audience talks to the published twin through persona chat (the other brain in §12) — RAG over everything Vlad taught it, wrapped in his persona prompt, streamed back as text, voice, or a Tavus video avatar.


TL;DR — the agentic patterns to take away

Pattern | Where it shows up
Deterministic harness wraps stochastic LLM | the whole StateGraph design
Router + specialized sub-agents | one mode = one prompt + tool set
ReAct loop (reason → act → observe → respond) | get-name-intent.ts tool handling
Tool/function calling with structured args | dolly-tools.ts (data + spokenResponse)
Guardrails as code + self-correction | tool-gates.ts + tool_gated rejection
Three-axis tool control (visibility/timing/precondition) | Genie Builder × state machine × gates
Speculative / optimistic execution | fire-and-forget tools + spokenResponse fast path
Environment events as observations | hidden-signal system
Externalized memory + stateless idempotent turns | CloudStore load/save per turn
Context engineering + hot-swappable prompts | assembleSystemPrompt + Genie assistant_hub
Pluggable transport in front of a fixed brain | LiveKit Cloud agent → onboarding-service

Non-agentic backbone (the unsung 🟦): LiveKit (WebRTC rooms + Cloud agent for STT/turn/TTS), Tavus (video avatar only), voice-service + vendor-gateway, the Genie ai-assistant endpoint, SSE streaming, Duplexer WebSocket push, CloudStore (CAS KV), ambassador RPC, auth/identity injection, and per-pod dedup. None of it is "AI" — but the agent is only reliable because this scaffolding is.

Provenance note: transport facts are from legends-platform (LiveKit SDK), catalyst-server/legends/{voice-service,vendor-gateway-service}, and wix-vmr-repo/legends-platform/persona-chat-service. Brain facts are from legends/packages/onboarding-service (code + ADRs 002/006/008). 🔶 markers denote the one binding that lives in runtime/deploy config, not code.