The companion to ONBOARDING_AGENTIC_DEEP_DIVE.md. Onboarding was about building the twin. This is the payoff: an audience member talks to the finished twin. We follow Mia, a founder, chatting with Vlad's published Dolly. Grounded in wix-vmr-repo/legends-platform/persona-chat-service (Scala), with the cross‑repo voice pieces from catalyst-server/legends/{voice-service,vendor-gateway-service}.
Legend: 🟣 = the LLM brain (non‑deterministic) · 🟦 = deterministic scaffolding · 🟡 = an illustrative value I filled in · 🔶 = a link whose mechanism is proven but whose exact runtime wiring lives in deploy config / another repo (see §9).
Provenance: persona-chat-service is read closely (file:line cited). The actual model inference (Genie, AI Gateway), the knowledge-service, and the external Cloud agent live in other services; where the trail goes cold, I say so.
Vlad finished onboarding; his twin is published. Mia opens it and types "How do you help with fundraising?" — or speaks it to the video avatar.
The one idea to anchor on: the published twin is a RAG‑grounded persona agent, and its brain is
Genie by default. Text and voice/video are not two systems — they converge on one pipeline
(ChatService.sendChatMessage), and the vendor (Tavus, etc.) only renders tokens Genie
produced. This is the "other brain" we kept hitting in the onboarding dig — now pinned down.
| | Onboarding (Atlas) | Published chat (this doc) |
|---|---|---|
| Who talks | the creator (Vlad) | an audience member (Mia) |
| Brain | onboarding-service LangGraph | Genie (default), behind persona-chat-service |
| Job | capture fields via tools + gates | answer questions via RAG + persona prompt |
| Memory | CloudStore per‑turn state | persisted chat messages + the persona's knowledge |
| Personality | mode guidelines | persona data JSON + behaviorSettings |
Where onboarding was a state machine driving tools, chat is retrieval‑augmented generation in the persona's voice — with optional tools so the twin keeps learning.
Two doors into persona-chat-service, one room:
- Text: SendChatMessage(chatId, …) (PersonaChatService.scala:134).
- Voice/video: GenerateChatCompletionsStreamWithVendor (PersonaChatService.scala:248). The vendor (Tavus/ElevenLabs) POSTs an OpenAI‑style request to /v1/generate-completions/{vendor}/chat/completions/stream; VendorChatService maps the vendor conversation id → chat.

Both call ChatService.sendChatMessage (ChatService.scala:513), the single pipeline that does RAG → prompt assembly → LLM routing → stream → persist.
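To make the "two doors, one room" shape concrete, here is a minimal sketch. It is not the service's code: `ChatReply`, `textDoor`, `vendorDoor`, and the `Map`-based lookup are hypothetical stand-ins, and the placeholder `sendChatMessage` only echoes its input instead of running the pipeline.

```scala
object TwoDoorsSketch {
  // Hypothetical reply type; the real service streams tokens instead.
  final case class ChatReply(chatId: String, text: String)

  // Stand-in for ChatService.sendChatMessage: the one shared pipeline.
  def sendChatMessage(chatId: String, userText: String): ChatReply =
    ChatReply(chatId, s"[reply to] $userText")

  // Door 1 (text): the caller already knows the chatId.
  def textDoor(chatId: String, userText: String): ChatReply =
    sendChatMessage(chatId, userText)

  // Door 2 (voice/video): the vendor only knows its own conversation id,
  // so it is mapped to a chat first, then handed to the same pipeline.
  def vendorDoor(
      conversationToChat: Map[String, String],
      vendorConversationId: String,
      userText: String
  ): Option[ChatReply] =
    conversationToChat
      .get(vendorConversationId)
      .map(chatId => sendChatMessage(chatId, userText))
}
```

Both doors return the same reply for the same chat and text; the vendor door only adds the id lookup in front, and yields `None` when the conversation id is unknown.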
Code path: sendChatMessage (ChatService.scala:513) resolves config →
routeToLlmClient (:422) → (default) resolveAssistantResponse (:724) → Genie; messages are
persisted via chatMessageSDL.createBulk (:574).
The "two brains" question, resolved. One fork decides the LLM:
- Genie (default): AssistantChatServiceClient POSTs to ${chatServiceUrl}/v1/send-message-streamed (AssistantChatServiceClient.scala:235). Genie holds the unified‑assistant inference; persona context + knowledge + prompt are passed in.
- AI Gateway / direct: taken when the useDirectAiGateway toggle is on or a system‑prompt override is set (ChatService.scala:744). This is the path with tool calling wired in.
- CustomLlm: base_url from the AssistantConfiguration (ChatService.scala:423). (Note: this is the text path; the voice path's LLM url is fixed by vendor‑gateway, see §7.)
- SPI (:509).

Bottom line: a published‑twin chat, text or voice, is answered by Genie by default, with AI Gateway / direct / custom as configurable alternatives. 🔶 Genie's internal inference is an external service (wix.genie.chat_service.v1); the trail goes cold at its RPC boundary.
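The fork can be pictured as a small pattern match. This is a sketch under assumptions: `RoutingConfig` and its field names other than `useDirectAiGateway` are hypothetical, the order in which the checks run is my guess rather than something read from `routeToLlmClient`, and the SPI branch is left out because its trigger is not described here.

```scala
object LlmRoutingSketch {
  sealed trait Backend
  case object Genie extends Backend
  case object AiGateway extends Backend
  final case class CustomLlm(baseUrl: String) extends Backend

  // Hypothetical, trimmed-down view of what the router reads.
  final case class RoutingConfig(
      customBaseUrl: Option[String] = None,        // assumed shape of the custom base_url
      useDirectAiGateway: Boolean = false,
      systemPromptOverride: Option[String] = None
  )

  // Assumed precedence: custom LLM first, then AI Gateway, else Genie.
  def route(cfg: RoutingConfig): Backend =
    cfg.customBaseUrl match {
      case Some(url) => CustomLlm(url)
      case None if cfg.useDirectAiGateway || cfg.systemPromptOverride.isDefined =>
        AiGateway
      case None => Genie
    }
}
```

With an empty `RoutingConfig()` the sketch falls through to `Genie`, which matches the "Genie by default" bottom line.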
The system prompt is built from a persona template plus layered context
(AIGatewayChatServiceClient.scala:193, Prompts.scala):

    effectivePrompt =
        persona template (publishedMode_PersonaTemplate.txt)       # "You are [name]… respond as you naturally would"
      + <behaviorSettings>: conversationStyle, communicationTone,  # the very knobs set on the Dolly
                            responseStyle, specialInstructions
      + renderRelevantKnowledge(RAG snippets)                      # facts pulled for THIS question
      + renderDynamicKnowledge(assistant-domain context)
      + renderConversationMode(...)

The template is strikingly engineered: "You are [name]… respond exactly as you would
naturally," with hard rules to never reveal the behavior settings — so Mia can't get the twin to
dump its own config. This is where the BehaviorSettings/CommunicationStyle/AnswerPreferences
that Vlad set during onboarding (and dolly-service) actually take effect.
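The layering can be sketched as plain string assembly. Everything about the rendered format here is an assumption: the tag layout, the "Relevant knowledge:" heading, and the `[name]` substitution are illustrative, and the dynamic-knowledge and conversation-mode layers are omitted for brevity. Only the four behaviorSettings field names come from the outline above.

```scala
object PromptAssemblySketch {
  final case class BehaviorSettings(
      conversationStyle: String,
      communicationTone: String,
      responseStyle: String,
      specialInstructions: String
  )

  // Hypothetical rendering of the behavior knobs as a tagged block.
  def renderBehaviorSettings(b: BehaviorSettings): String =
    s"""<behaviorSettings>
       |conversationStyle: ${b.conversationStyle}
       |communicationTone: ${b.communicationTone}
       |responseStyle: ${b.responseStyle}
       |specialInstructions: ${b.specialInstructions}
       |</behaviorSettings>""".stripMargin

  // Hypothetical rendering of RAG snippets; empty input adds no layer.
  def renderRelevantKnowledge(snippets: Seq[String]): String =
    if (snippets.isEmpty) ""
    else snippets.map("- " + _).mkString("Relevant knowledge:\n", "\n", "")

  // Template first, then each non-empty layer, separated by blank lines.
  def effectivePrompt(
      template: String,
      name: String,
      behavior: BehaviorSettings,
      snippets: Seq[String]
  ): String =
    Seq(
      template.replace("[name]", name),
      renderBehaviorSettings(behavior),
      renderRelevantKnowledge(snippets)
    ).filter(_.nonEmpty).mkString("\n\n")
}
```

Keeping each layer as its own render function means a layer with nothing to say simply drops out of the final prompt.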
Before the LLM call, the service retrieves the most relevant knowledge for Mia's question
(AIGatewayChatServiceClient.scala:156 → AssistantTools.scala:51):

    knowledgeServiceClient.searchKnowledge(
      SearchKnowledgeRequest(personaId, query, namespace, mode))

knowledge-service is the unified facade — it federates snippets + documents (Vespa) +
links (the same store onboarding seeded and the teaching phase filled, §"How is knowledge
captured" earlier). The ranked results are injected as renderRelevantKnowledge into the prompt. So
the twin answers from Vlad's actual material, not just the base model.
🔶 The knowledge-service + Vespa internals live in other services (trail cold there).
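As a toy picture of the retrieval step, the sketch below puts an in-memory client behind the same request shape. The `KnowledgeClient` trait, `Snippet`, and the word-overlap ranking are all hypothetical; the real knowledge-service ranks via Vespa and its internals are not visible from here. `namespace` and `mode` are carried but ignored.

```scala
object RagSketch {
  // Field names follow the call above; the types are assumptions.
  final case class SearchKnowledgeRequest(
      personaId: String,
      query: String,
      namespace: String,
      mode: String
  )
  final case class Snippet(personaId: String, text: String)

  trait KnowledgeClient {
    def searchKnowledge(req: SearchKnowledgeRequest): Seq[Snippet]
  }

  // Toy client: ranks one persona's snippets by shared query words.
  final class InMemoryKnowledgeClient(store: Seq[Snippet], topK: Int = 3)
      extends KnowledgeClient {

    private def words(s: String): Set[String] =
      s.toLowerCase.split("\\W+").filter(_.nonEmpty).toSet

    def searchKnowledge(req: SearchKnowledgeRequest): Seq[Snippet] = {
      val queryWords = words(req.query)
      store
        .filter(_.personaId == req.personaId)   // retrieval is keyed on personaId
        .map(s => (s, words(s.text).intersect(queryWords).size))
        .filter(_._2 > 0)                        // drop snippets with no overlap
        .sortBy(-_._2)                           // highest overlap first
        .take(topK)
        .map(_._1)
    }
  }
}
```

The point of the sketch is the scoping: only snippets for the requested persona are candidates, and only the top few reach the prompt.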
On the AI Gateway path, the assistant can call tools during the chat (AIGatewayChatServiceClient.scala:173; tools in AssistantTools.scala):
- retrievePersonaKnowledge: fetch more snippets mid‑turn if the first RAG pass wasn't enough.
- triggerKnowledgeSnippetCreation: fire‑and‑forget; mine new snippets from the ongoing conversation (so a good Q&A with Mia can become future knowledge).
- upsertKnowledgeSnippet: create/update a snippet.

This mirrors onboarding's "knowledge as a byproduct" idea: the published twin doesn't just answer; it can grow its own knowledge from conversations. (Tool calling is wired in the AI Gateway path; the default Genie path's tool orchestration is internal to Genie, 🔶.)
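A rough sketch of how three such tools might be dispatched, including the fire‑and‑forget one. The case‑class payloads, `SnippetStore`, the returned strings, and the `sameThread` helper are all hypothetical; only the three tool names come from the list above.

```scala
import scala.collection.concurrent.TrieMap
import scala.concurrent.{ExecutionContext, Future}

object ToolDispatchSketch {
  sealed trait ToolCall
  final case class RetrievePersonaKnowledge(query: String) extends ToolCall
  final case class TriggerKnowledgeSnippetCreation(transcript: String) extends ToolCall
  final case class UpsertKnowledgeSnippet(id: String, text: String) extends ToolCall

  // Hypothetical in-memory snippet store keyed by snippet id.
  final class SnippetStore {
    private val snippets = TrieMap.empty[String, String]
    def upsert(id: String, text: String): Unit = snippets.update(id, text)
    def search(query: String): Seq[String] =
      snippets.values.filter(_.toLowerCase.contains(query.toLowerCase)).toSeq.sorted
    def size: Int = snippets.size
  }

  // Runs tasks on the calling thread; handy for deterministic demos.
  val sameThread: ExecutionContext =
    ExecutionContext.fromExecutor(new java.util.concurrent.Executor {
      def execute(r: Runnable): Unit = r.run()
    })

  def dispatch(call: ToolCall, store: SnippetStore)(implicit ec: ExecutionContext): String =
    call match {
      case RetrievePersonaKnowledge(q) =>
        store.search(q).mkString("\n")
      case UpsertKnowledgeSnippet(id, text) =>
        store.upsert(id, text)
        s"upserted $id"
      case TriggerKnowledgeSnippetCreation(transcript) =>
        // Fire-and-forget: schedule the mining, reply without awaiting it.
        Future(store.upsert(s"mined-${transcript.hashCode}", transcript))
        "accepted"
    }
}
```

On a real thread pool the `"accepted"` reply returns before the mined snippet is stored, which is the property that keeps the chat turn from waiting on knowledge creation.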
{aiAssistantUrl} loop
This reconciles the open 🔶 from the onboarding voice dig. When Mia talks to the video twin, the LiveKit Cloud agent (or Tavus) doesn't have its own brain: it calls back into persona‑chat‑service, which runs the same pipeline.
The reconciliation: vendor‑gateway pointed the Cloud agent's LLM at {aiAssistantUrl}/<vendor>/chat/completions, and that endpoint is almost certainly persona‑chat‑service's
GenerateChatCompletionsStreamWithVendor (path /v1/generate-completions/{vendor}/chat/completions/stream, VendorChatService.scala:39). It maps the vendor conversation-id →
InteractiveConversationMapping → chatId/personaId, then runs the identical RAG+prompt+Genie
pipeline and streams tokens back to the vendor to render as voice/video.
🔶 The exact {aiAssistantUrl} value is deploy config (we couldn't read it), but the path shapes
match — this is a strong inference, not a guess.
PersonaChat (chatId, personaId, mode, assistantId, configurationId) and ChatMessage (role, content, position, previousMessageId, vendorChatId). Stored via personaChatSDL / chatMessageSDL (ChatService.scala:183, :574).
personaId resolution: from the chat record (getChat), and mirrored into Genie conversation metadata (AssistantChatServiceClient.scala:145) and the vendor mapping for voice/video.
personaId is the through‑line: the same key onboarding minted (Persona at "Create"), knowledge attaches to, and chat retrieves by. Capture → store → retrieve, all on personaId.

                              persona_id (the through-line)
                ┌───────────────────────────┴───────────────────────────┐
    ONBOARDING (Atlas)                                  PUBLISHED CHAT (this doc)
    onboarding-service LangGraph                        persona-chat-service → Genie
    capture name/intent/photo/voice                     RAG over knowledge + persona prompt
    seeds snippets inline                               can mine new snippets mid-chat
    transport: LiveKit + Cloud agent                    transport: text OR LiveKit + Cloud agent
    brain: onboarding-service                           brain: Genie (default)

Two agents, two brains, one persona and one transport layer. Onboarding writes the twin; chat reads it (and keeps writing, via tool‑driven snippet creation).
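For the persisted side, the ChatMessage fields listed earlier (role, content, position, previousMessageId) suggest an ordered, linked history. The sketch below shows one way a turn could be appended in bulk. The `id` field, the `m<n>` id scheme, and `appendTurn` itself are hypothetical, and vendorChatId is omitted.

```scala
object ChatRecordsSketch {
  // Fields follow the ChatMessage description; `id` is an assumed addition.
  final case class ChatMessage(
      id: String,
      role: String,
      content: String,
      position: Int,
      previousMessageId: Option[String]
  )

  // Appends one turn (user + assistant) in a single step, continuing
  // both the position counter and the previousMessageId chain.
  def appendTurn(
      history: Vector[ChatMessage],
      userText: String,
      assistantText: String
  ): Vector[ChatMessage] = {
    val start = history.size
    val user =
      ChatMessage(s"m$start", "user", userText, start, history.lastOption.map(_.id))
    val assistant =
      ChatMessage(s"m${start + 1}", "assistant", assistantText, start + 1, Some(user.id))
    history ++ Vector(user, assistant)
  }
}
```

Writing the user and assistant messages together mirrors the bulk write noted for chatMessageSDL.createBulk, though the real record layout may differ.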
| Pattern | Where it shows up |
|---|---|
| Text + voice converge on one pipeline | ChatService.sendChatMessage |
| RAG / grounded generation | searchKnowledge → renderRelevantKnowledge |
| Persona prompt from stored settings | publishedMode_PersonaTemplate.txt + behaviorSettings |
| "Never reveal the config" guardrail | persona template hard rules |
| Pluggable LLM backend | Genie (default) / AI Gateway / direct / CustomLlm / SPI |
| Vendor renders, doesn't think | GenerateChatCompletionsStreamWithVendor → Genie → stream back |
| Twin keeps learning | triggerKnowledgeSnippetCreation mid-chat |
| persona_id as the through-line | capture → store → retrieve |
Open 🔶 (runtime/deploy or external): Genie's internal inference, AI Gateway internals, knowledge-service + Vespa internals, and the literal {aiAssistantUrl} value all live outside persona-chat-service. Everything inside persona-chat-service (routing, RAG call, prompt assembly, persistence, vendor mapping) is code-grounded with file:line above.