A visual tour of the Dolly onboarding agent (codename Atlas), told as the story of one creator — Vlad — building his AI twin by voice. Every claim is grounded in the code across five repos:
legends/packages/onboarding-service (the agent), legends/packages/dolly (the Atlas UI), legends-platform (the LiveKit SDK), and catalyst-server/legends + wix-vmr-repo/legends-platform (the Scala backend: voice-service, vendor-gateway-service, persona-chat-service).

Legend for the diagrams: 🟣 = agentic (LLM-driven / non-deterministic) · 🟦 = deterministic scaffolding · 🟡 = an illustrative value I filled in (field names + structure are from code) · 🔶 = a link whose mechanism is proven but whose exact runtime wiring lives in deploy config, not in any repo (see §12).
Vlad is a startup coach. He opens Dolly, and a friendly avatar named Atlas ("Mike") voice-interviews him to build his AI twin. We follow Vlad's words through the system, turn by turn.
There are two layers to keep separate, because conflating them causes endless confusion:
onboarding-service LangGraph (a deterministic state machine wrapping a stochastic LLM).

The brain's logic is transport-agnostic: it only ever sees an OpenAI-compatible text request and emits text. Whether Tavus or a LiveKit Cloud agent delivers that request doesn't change the agent, which is exactly why this doc spends most of its time on the brain.
This is a structured agent: a deterministic state machine (a LangGraph StateGraph)
wraps a stochastic LLM. The state machine decides when and in what role the agent runs;
the LLM decides what to say and which tools to call; hard-coded guardrails decide
whether a given tool call is actually allowed. The agent is stateless — its memory lives
outside it, reloaded every turn. This "deterministic harness + stochastic core" split is the
single most important agentic pattern in the whole design.
Vlad's lens: when Vlad says "Hi, I'm Vlad," the state machine says "we're in the name/intent step," the LLM decides to call
dollyNameUpdater(name:"Vlad"), and a gate confirms that's allowed right now. Three different parts, three different jobs.
Vlad never talks to the brain directly. He talks into a LiveKit room; a Cloud voice agent
turns his speech into text and the brain's tokens back into speech; in video mode a Tavus
avatar renders the face. The brain (onboarding-service) only ever sees an OpenAI-compatible
text request and emits text tokens. Clean perception → cognition → action.
Agentic concept — embodied agent boundary: the model never touches the world directly. Inputs are normalized into a text "observation"; outputs are text "actions" that scaffolding turns into side effects. Swapping Tavus-direct for a LiveKit Cloud agent didn't change the agent — which is precisely what happened in this codebase.
Onboarding isn't one prompt — it's one specialized agent per step, selected by a router.
Each mode has its own system prompt, its own tool set, and its own job. A routerNode
reads state and dispatches to the right mode agent (the "v11" split).
| Mode (sub-agent) | Has tools? | Vlad's experience |
|---|---|---|
| GetUserNameAndIntent | ✅ | tells Atlas his name + what the twin is for |
| ImageUpload | ✅ | uploads a photo; the twin's first message is generated |
| ImageSelectionLoading | ❌ | waits while styled images render |
| ImageSelection | ✅ | picks a style |
| VoiceRecording | ❌ | records a voice sample |
| Preview | ❌ | hears Atlas narrate the finished twin |
| MikeChat | (getUiReference only) | experiment-endpoint terminal chat |
Agentic concept — router / supervisor + specialist agents: focused agents with small tool sets. Smaller surface = fewer wrong tool calls, cheaper prompts, easier evals. The router is deterministic code, not an LLM — control flow you can trust.
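A minimal sketch of that deterministic routing, assuming the router simply looks up the agent for the stored mode. The mode names and the "has tools" flags come from the table above; `ModeAgent`, `MODE_AGENTS`, `advanceMode`, and the placeholder prompts are hypothetical names for illustration, not the real routerNode code. MikeChat is left out because it lives on the experiment endpoint.

```typescript
type Mode =
  | "GetUserNameAndIntent"
  | "ImageUpload"
  | "ImageSelectionLoading"
  | "ImageSelection"
  | "VoiceRecording"
  | "Preview";

// Hypothetical per-mode bundle: one prompt plus whether the mode exposes tools.
interface ModeAgent {
  systemPrompt: string;
  hasTools: boolean;
}

const MODE_AGENTS: Record<Mode, ModeAgent> = {
  GetUserNameAndIntent: { systemPrompt: "<name/intent prompt>", hasTools: true },
  ImageUpload: { systemPrompt: "<image upload prompt>", hasTools: true },
  ImageSelectionLoading: { systemPrompt: "<loading prompt>", hasTools: false },
  ImageSelection: { systemPrompt: "<style selection prompt>", hasTools: true },
  VoiceRecording: { systemPrompt: "<voice recording prompt>", hasTools: false },
  Preview: { systemPrompt: "<preview prompt>", hasTools: false },
};

interface RouterState {
  currentMode: Mode;
  userName?: string;
  userIntent?: string;
}

// Plain code, no LLM call: the same state always yields the same agent.
function routerNode(state: RouterState): ModeAgent {
  return MODE_AGENTS[state.currentMode];
}

// The one transition the document spells out: once name and intent are both
// captured, the state machine moves on to ImageUpload.
function advanceMode(state: RouterState): RouterState {
  if (state.currentMode === "GetUserNameAndIntent" && state.userName && state.userIntent) {
    return { ...state, currentMode: "ImageUpload" };
  }
  return state;
}
```

Because the lookup is a total `Record<Mode, ModeAgent>`, adding a mode without registering its agent fails at compile time rather than at runtime.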
Every request from the Cloud agent runs one pass of a perceive → reason → act → observe → respond loop. Steps 1–2 and 5–7 are deterministic; the agent (🟣) is only steps 3–4.
Vlad's lens: "Hi, I'm Vlad" is plain speech → it skips the signal branches, routes to
GetUserNameAndIntent, reasons, calls a gated tool, and streams a reply. A button-click like "photo uploaded" instead enters as a hidden signal (§7).
Agentic concept — ReAct loop with a thin model slice: "agentic" ≠ "the model does everything." The model reasons and chooses actions inside a controlled loop.
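To make the "thin model slice" concrete, here is a rough sketch of one turn with the model injected as a plain function. The grouping of steps in the comments is an assumed breakdown of the loop, `runTurn` and its types are hypothetical, and the real model call would be asynchronous and streamed; only dollyNameUpdater is a name taken from the code.

```typescript
interface ProposedToolCall {
  name: string;
  args: { name?: string };
}

interface ModelTurn {
  text: string;
  toolCalls: ProposedToolCall[];
}

interface LoopState {
  userName?: string;
}

// The stochastic core, reduced to a function so the harness around it is testable.
type ModelFn = (systemPrompt: string, observation: string) => ModelTurn;

function runTurn(
  state: LoopState,
  userText: string,
  model: ModelFn,
): { reply: string; state: LoopState } {
  // Steps 1-2 (deterministic): normalize the input and pick the prompt for the current step.
  const observation = userText.trim();
  const systemPrompt = "You are Atlas. Current step: name and intent.";

  // Steps 3-4 (agentic): the model reasons and proposes tool calls.
  const turn = model(systemPrompt, observation);

  // Steps 5-7 (deterministic): apply the allowed action, record the result, respond.
  let next: LoopState = state;
  for (const call of turn.toolCalls) {
    if (call.name === "dollyNameUpdater" && call.args.name) {
      next = { ...next, userName: call.args.name };
    }
    // Any other proposed tool is ignored in this sketch.
  }
  return { reply: turn.text, state: next };
}
```

Swapping in a scripted `ModelFn` is enough to exercise every deterministic step without calling an LLM, which is the practical payoff of keeping the model slice this thin.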
This is the richest agentic logic (graph/modes/get-name-intent.ts). When the LLM emits tool
calls, each one is gated, then executed by one of three strategies depending on whether
the agent needs to see the result before speaking.
Vlad's lens: Vlad's name fires the optimistic path — Atlas says "It's great to meet you, Vlad" instantly while the backend write settles in the background (§6). If Vlad had stated his intent before his name, the gate would reject
dollyIntentUpdater and Atlas would self-correct: "I'd love to. First, what should I call you?"
Two safety rules worth calling out (both real, both from PR-review comments in the code):
A tool firing requires passing three independent gates, each owned by a different party. This is how you keep a non-deterministic model safe and PM-controllable at once.
Vlad's lens:
dollyIntentUpdater is invisible outside its mode (axis 1), its mode only runs after name+intent flags (axis 2), and it's hard-blocked unless userName is set (axis 3), so even if the LLM tries to record Vlad's intent before his name, the gate stops it. Prompts are soft; gates are hard.
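The three axes can be sketched as one check that runs before any tool executes. The tool names, the mode names, and the `tool_gated` result shape are from the code; `checkToolCall`, the table layout, and the wording of the visibility rejection are my assumptions, not the contents of tool-gates.ts.

```typescript
type GateMode = "GetUserNameAndIntent" | "Preview";

interface GateState {
  currentMode: GateMode; // axis 2 (timing): set by the state machine, never by the LLM
  userName?: string;
  userIntent?: string;
}

type GateResult =
  | { success: true }
  | { success: false; error: "tool_gated"; reason: string; tool: string };

// Axis 1 (visibility): which tools each mode exposes. In the real system this
// list is edited in the Genie Builder; here it is hard-coded.
const VISIBLE_TOOLS: Record<GateMode, readonly string[]> = {
  GetUserNameAndIntent: ["dollyNameUpdater", "dollyIntentUpdater"],
  Preview: [],
};

// Axis 3 (precondition): returns a rejection reason, or null when the call is allowed.
const PRECONDITIONS = new Map<string, (state: GateState) => string | null>([
  [
    "dollyIntentUpdater",
    (state) => (state.userName ? null : "User's name has not been captured yet…"),
  ],
]);

function gated(tool: string, reason: string): GateResult {
  return { success: false, error: "tool_gated", reason, tool };
}

function checkToolCall(tool: string, state: GateState): GateResult {
  if (!VISIBLE_TOOLS[state.currentMode].includes(tool)) {
    // Hypothetical wording for a tool the current mode does not expose.
    return gated(tool, "Tool is not available in the current mode");
  }
  const precondition = PRECONDITIONS.get(tool);
  const reason = precondition ? precondition(state) : null;
  return reason === null ? { success: true } : gated(tool, reason);
}
```

Returning the rejection as a structured tool result, rather than throwing, is what lets the model read the reason and self-correct on the same turn.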
Voice agents can't afford a second round-trip before speaking. So the agent speaks its confirmation optimistically and runs the real backend write in the background.
Agentic concept — speculative action & optimistic UI: when an action almost always succeeds and the response doesn't depend on its result, act and respond in parallel. Reserve the slower observe-then-speak loop for when the answer genuinely depends on tool output.
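A small sketch of the two paths, assuming the tool arguments arrive as the JSON string described in §11 (`name` plus `spokenResponse`). `handleNameToolOptimistically`, `handleToolAwaited`, and `BackendWrite` are hypothetical names, and the write function stands in for the real backend call.

```typescript
interface NameToolArgs {
  name: string;
  spokenResponse: string;
}

// Stand-in for the real backend write; assumed to return a promise.
type BackendWrite = (name: string) => Promise<void>;

// Optimistic path: return the line to speak immediately and let the write
// settle in the background. Failures go to onWriteError instead of blocking speech.
function handleNameToolOptimistically(
  rawArguments: string,
  write: BackendWrite,
  onWriteError: (error: unknown) => void,
): { speakNow: string; settled: Promise<void> } {
  const args = JSON.parse(rawArguments) as NameToolArgs;
  const settled = write(args.name).catch(onWriteError);
  return { speakNow: args.spokenResponse, settled };
}

// Awaited path: used when the reply depends on the tool result, so the agent
// must observe before it speaks.
async function handleToolAwaited<T>(
  run: () => Promise<T>,
  respond: (result: T) => string,
): Promise<string> {
  const result = await run();
  return respond(result);
}
```

The `settled` promise is returned only so callers (or tests) can wait for the background write; the speaking path never awaits it.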
Vlad doesn't only speak — he also clicks ("upload photo," "pick this style"). Those UI events
can't reach the brain directly, so they're smuggled in as <HIDDEN>{signal}</HIDDEN> envelopes
inside the user message. The harness decides how each signal enters perception — as an
observation, a silent state change, or not at all.
Why this matters: the LLM would happily parrot back any raw marker it sees, so signals are
rewritten into natural language (or hidden entirely), and a two-sided sanitizer strips any
<HIDDEN> block the model hallucinates on the way out.
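The envelope handling might look roughly like this. Only the `<HIDDEN>…</HIDDEN>` wrapper and the three outcomes (observation, silent state change, dropped) are from the text; the signal names, the rewritten wording, and the function names are made up for illustration, and the envelope body is treated as an opaque string, which may not match hidden-protocol.ts.

```typescript
// Matches one <HIDDEN>...</HIDDEN> envelope. A fresh RegExp per call avoids shared lastIndex state.
function hiddenPattern(): RegExp {
  return /<HIDDEN>([\s\S]*?)<\/HIDDEN>/g;
}

type SignalRule =
  | { kind: "observe"; observation: string } // rewritten into natural language for the model
  | { kind: "silent" } // changes state; the model never sees it
  | { kind: "drop" }; // ignored entirely

// Hypothetical signal names and wording.
const SIGNAL_RULES = new Map<string, SignalRule>([
  ["photo_uploaded", { kind: "observe", observation: "The user has uploaded a photo." }],
  ["style_picked", { kind: "silent" }],
  ["heartbeat", { kind: "drop" }],
]);

// Inbound side: strip envelopes from the user message and decide how each signal is perceived.
function perceive(rawUserMessage: string): { llmText: string; silentSignals: string[] } {
  const signals: string[] = [];
  const spoken = rawUserMessage
    .replace(hiddenPattern(), (_match: string, body: string) => {
      signals.push(body.trim());
      return "";
    })
    .trim();

  const parts: string[] = spoken ? [spoken] : [];
  const silentSignals: string[] = [];
  for (const signal of signals) {
    const rule = SIGNAL_RULES.get(signal);
    if (!rule || rule.kind === "drop") continue; // unknown signals never reach the model
    if (rule.kind === "observe") parts.push(rule.observation);
    else silentSignals.push(signal);
  }
  return { llmText: parts.join(" "), silentSignals };
}

// Outbound side: remove any envelope the model hallucinated before the text is spoken.
function sanitizeModelOutput(modelText: string): string {
  return modelText.replace(hiddenPattern(), "").replace(/\s{2,}/g, " ").trim();
}
```

Running the same pattern on both sides means a raw marker can neither be read by the model nor spoken to the user.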
🔶 Open ingress detail:
onboarding-service definitely still parses <HIDDEN> markers (hidden-protocol.ts, ADR-006). In the original Tavus-direct model the UI's sendAppMessage went through Tavus into the message stream. In the LiveKit world the UI publishes over the data channel and the Cloud agent forwards a transcript + context to the LLM, but the exact place UI clicks become <HIDDEN> text now lives in the external Cloud agent, not in any repo I can read. The brain's handling is unchanged either way.
The graph keeps no in-process memory. Every turn, state is reloaded from CloudStore and the conversation transcript is replayed. State is split into a server half (agent-owned) and a client half (UI-owned), sharing one record with optimistic-concurrency writes.
Vlad's lens: the CloudStore key is Vlad's
conversationId, a stable Wix id minted when the conversation is created and echoed back on every turn (never invented per-request). Because the agent is stateless, any pod can serve Vlad's next turn and restarts never lose his progress.
Agentic concept — externalized memory + idempotent turns: memory is a store the harness manages, not hidden state inside the model. The CAS write means a slow UI update can't clobber a fresh agent write.
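The compare-and-swap idea can be illustrated with an in-memory stand-in. This is not the CloudStore API, which I have not reproduced here; `InMemoryCasStore`, the revision counter, and the `photoUploaded` field are assumptions chosen to show why a stale write is rejected.

```typescript
// One record per conversation, split into an agent-owned and a UI-owned half.
interface ConversationRecord {
  server: { userName?: string; currentMode?: string };
  client: { photoUploaded?: boolean }; // hypothetical client field
}

interface StoredRecord<T> {
  value: T;
  revision: number;
}

class InMemoryCasStore<T> {
  private records = new Map<string, StoredRecord<T>>();

  load(key: string): StoredRecord<T> | undefined {
    return this.records.get(key);
  }

  // The write succeeds only if the caller saw the latest revision (0 = no record yet).
  save(key: string, value: T, expectedRevision: number): boolean {
    const current = this.records.get(key);
    const currentRevision = current ? current.revision : 0;
    if (currentRevision !== expectedRevision) return false;
    this.records.set(key, { value, revision: currentRevision + 1 });
    return true;
  }
}

// Reload-and-retry: on a conflict, re-read the fresh record and reapply the patch to it.
function updateWithRetry<T>(
  store: InMemoryCasStore<T>,
  key: string,
  initial: T,
  patch: (current: T) => T,
  maxAttempts = 3,
): boolean {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const current = store.load(key);
    const base = current ? current.value : initial;
    const revision = current ? current.revision : 0;
    if (store.save(key, patch(base), revision)) return true;
  }
  return false;
}
```

A UI write that was prepared against an older revision fails instead of overwriting the agent's half, and the retry reapplies the UI's change on top of the fresh record.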
The agent's behavior per turn is assembled from slots — most of them PM-editable in the Genie
AI Assistant Builder (assistant_hub) and hot-reloaded within 5 minutes, no deploy.
Two deliberate choices:
- currentState includes only what's captured (e.g. name: Vlad, not intent: not set).
  Telling the model what's missing tempts it to prematurely "fill the gap." Omission is a
  prompt-design guardrail.

Now the layer under §1. The Atlas UI uses the platform LiveKit SDK
(@wix/legends-platform/livekit/sdk) and supports both audio and video modes. Both modes ride
the same machinery; only the rendered face differs:
Proven (Scala backend): voice-service is a thin orchestrator → vendor-gateway-service
mints a LiveKit token + session payload and dispatches a LiveKit Cloud agent (STT/turn/TTS).
The agent's LLM URL is always {aiAssistantUrl}/<vendor>/chat/completions
(HourOneImplementor.scala, HttpClientUtils.scala); llmModel/llmProvider ride as metadata,
and CustomLlm.baseUrl is text-chat only. So the brain is reached via the Genie ai-assistant
endpoint, with onboarding-service as the standalone LangGraph implementing that onboarding skill
(the 🔶 in §12).
The two turns that matter most, with payload shapes pulled from the code. 🟡 = illustrative value.
Real payload shapes behind this turn (verbatim field names from code):
- body.metadata.{conversationId,userId} (completion-request.ts): ids minted Wix-side at conversation creation, echoed back; conversationId is the CloudStore key.
- function.arguments is a JSON string → {name:"Vlad", spokenResponse:"…"}. The verbatim baseline spokenResponse is "It's great to meet you, Vlad. Now we're going to shape your legend's goal…" (full-happy-path-v11.baseline.json).
- {success:true, extractedName:"Vlad"} (dolly-tools.ts).
- updateDollyAssistantFlow({conversationId, dolly:{name:"Vlad"}, fieldMask:["name"]}).
- currentMode stays.

Why this turn is different:

- dollyIntentUpdater is awaited because ordering matters; its result is parsed before responding.
- userName is set. If it weren't, the brain returns {success:false, error:"tool_gated", reason:"User's name has not been captured yet…", tool:"dollyIntentUpdater"} (tool-gates.ts) and self-corrects without speaking the optimistic line.
- bulkUpsertKnowledgeSnippetAssistantFlow({chatId, namespace:"DOLLY", generatedSnippet:[…]}).
- userName ✅ + userIntent ✅, the state machine moves Vlad to ImageUpload and emits a mode-changed Duplexer event.

The platform has two LLM brains, and onboarding's relationship to them is the last 🔶:
| | Onboarding (Atlas) | Published persona chat |
|---|---|---|
| Brain | onboarding-service: standalone LangGraph, exposes /v1/chat/completions, pulls prompt/tool config from Genie assistant_hub, runs its own LLM (OpenRouter/Cerebras) | the shared Genie ai-assistant at {aiAssistantUrl}/<vendor>/chat/completions |
| Reached how | 🔶 see below | LiveKit Cloud agent → {aiAssistantUrl}/<vendor> (PROVEN) |
| Vendor / face | Tavus avatar in video mode | HourOne/ElevenLabs/Hume (voice) + Tavus (video) |
What's proven:
- createInteractiveConversation with a namespace + configurationId).
- {aiAssistantUrl}/<vendor>/chat/completions, not a caller-supplied base_url.
- onboarding-service exposes only /v1/chat/completions (labeled "for Tavus custom LLM") and is a standalone re-implementation of a Genie assistant skill (ADR-008).

The single open question (🔶): for a live Atlas voice turn, is onboarding-service the
endpoint that {aiAssistantUrl} resolves to (or that Genie invokes) for the onboarding
configurationId? That binding lives in two runtime/deploy values not in any repo: the
aiAssistantUrl databag, and the onboarding persona's AssistantConfiguration. Everything in
code is consistent with onboarding-service being the brain (its internal logic in §3–§9, §11 is
exactly the onboarding behavior); confirming how it's invoked is a runtime lookup.
Zooming all the way out — Vlad from first hello to a published, chat-ready twin. (The middle
stages are detailed in LEGENDS_ARCHITECTURE.md.)
Vlad's payoff: his audience talks to the published twin through persona chat (the other brain in §12) — RAG over everything Vlad taught it, wrapped in his persona prompt, streamed back as text, voice, or a Tavus video avatar.
| Pattern | Where it shows up |
|---|---|
| Deterministic harness wraps stochastic LLM | the whole StateGraph design |
| Router + specialized sub-agents | one mode = one prompt + tool set |
| ReAct loop (reason → act → observe → respond) | get-name-intent.ts tool handling |
| Tool/function calling with structured args | dolly-tools.ts (data + spokenResponse) |
| Guardrails as code + self-correction | tool-gates.ts → tool_gated rejection |
| Three-axis tool control (visibility/timing/precondition) | Genie Builder × state machine × gates |
| Speculative / optimistic execution | fire-and-forget tools + spokenResponse fast path |
| Environment events as observations | hidden-signal system |
| Externalized memory + stateless idempotent turns | CloudStore load/save per turn |
| Context engineering + hot-swappable prompts | assembleSystemPrompt + Genie assistant_hub |
| Pluggable transport in front of a fixed brain | LiveKit Cloud agent → onboarding-service |
Non-agentic backbone (the unsung 🟦): LiveKit (WebRTC rooms + Cloud agent for STT/turn/TTS), Tavus (video avatar only),
voice-service+vendor-gateway, the Genie ai-assistant endpoint, SSE streaming, Duplexer WebSocket push, CloudStore (CAS KV), ambassador RPC, auth/identity injection, and per-pod dedup. None of it is "AI" — but the agent is only reliable because this scaffolding is.
Provenance note: transport facts are from
legends-platform (LiveKit SDK), catalyst-server/legends/{voice-service,vendor-gateway-service}, and wix-vmr-repo/legends-platform/persona-chat-service. Brain facts are from legends/packages/onboarding-service (code + ADRs 002/006/008). 🔶 markers denote the one binding that lives in runtime/deploy config, not code.