The chat module provides an AI-driven conversational customer assistance system for the Muni Slowbar storefront. It uses a Retrieval-Augmented Generation (RAG) architecture and function calling (tool use) to answer customer inquiries about menu items, rewards, promotions, operating hours, locations, and order statuses.
| What it does | What it does NOT do |
|---|---|
| Conversational Support: Streams natural, warm, and helpful responses to storefront users. | Financial Transactions: Cannot process payments or modify orders directly. |
| Menu & Knowledge Retrieval (RAG): Accesses uploaded documents (.txt, .pdf) using hybrid keyword + semantic vector search. | POS Admin Operations: Does not expose cashier, staff management, or store management capabilities. |
| Customer Loyalty Context: Queries customer points and rewards balances using registered tool calls. | Direct Database Mutation: Writes are strictly limited to auditing conversations/messages and uploading vector chunks; it cannot modify catalogs or user details. |
| Hallucination Prevention: Enforces strict formatting and data policies preventing the AI from fabricating prices or menu items. | |
| Request Security Guardrails: Filters query inputs to block malicious prompt injections or overly long inputs. | |
This module follows Clean Architecture and Domain-Driven Design (DDD) principles. It is organized into four layers under `src/modules/chat/storefront/`:
* `domain/`: pure business entities (no external libraries or NestJS).
* `infrastructure/`: database schemas, repositories, and the module definition.
* `application/`: use cases, services (RAG, caching, guardrails), and tools.
* `presentation/`: controllers and request DTOs.
| Layer | File / Directory | Class / Interface | Responsibility |
|---|---|---|---|
| Domain | `conversation.entity.ts` | `Conversation` | Represents a customer chat session. |
| | `document.entity.ts` | `Document`, `DocumentChunk` | Represents uploaded knowledge files and their chunked sections. |
| | `message.entity.ts` | `Message` | Represents a single message in a conversation (user, assistant, or tool). |
| Application | `chat.use-case.ts` | `ChatUseCase` | Core orchestration: runs guardrail validation, semantic cache checks, context lookup, and the tool-resolution loop. |
| | `document-upload.use-case.ts` | `DocumentUploadUseCase` | Receives raw files (`.txt` or `.pdf`), extracts text (using a PDF parser for PDFs), saves the file metadata, and triggers chunking. |
| | `document-ingestion.use-case.ts` | `DocumentIngestionUseCase` | Thin wrapper that calls `DocumentService.ingest` to store documents. |
| | `document-search.use-case.ts` | `DocumentSearchUseCase` | Backs the search endpoint by running a query string against the retrieval services. |
| | `document-embedding.use-case.ts` | `DocumentEmbeddingUseCase` | Generates embeddings for legacy documents that are missing vectors. |
| | `document.service.ts` | `DocumentService` | Coordinates semantic chunking (maximum chunk size 600, overlap 120), context enrichment, and embedding generation. |
| | `rag.service.ts` | `RagService` | Builds the structured prompt context, injects the temporal (date/time) header, and streams chat responses. |
| | `hybrid-retrieval.service.ts` | `HybridRetrievalService` | Combines full-text and vector search results using Reciprocal Rank Fusion (RRF). |
| | `guardrail.service.ts` | `GuardrailService` | Enforces the input length limit (800 characters) and checks for prompt injection. |
| | `semantic-cache.service.ts` | `SemanticCacheService` | Caches answers to common informational requests, matched by cosine similarity (0.95 threshold). |
| | `tool.registry.ts` | `ToolRegistry` | Registers and executes the function-calling tools requested by the LLM. |
| | `tool-definition.ts` | `TOOLS` | Defines the names, descriptions, and JSON parameter schemas of the tools exposed to the model (OpenAI/OpenRouter format). |
| | `tool-bootstrap.service.ts` | `ToolBootstrapService` | Hooks into NestJS module initialization to bind each tool to its corresponding use case. |
| Infrastructure | `storefront-chat.module.ts` | `StorefrontChatModule` | NestJS module declaring dependencies (Supabase, LLM, Products, Orders, Rate Limiter) and services. |
| | `document.repository.ts` | `DocumentRepository` | Runs queries against `documents` and `document_chunks`, including the vector, full-text, and hybrid search RPCs. |
| | `storefront-chat.repository.ts` | `StorefrontChatRepository` | Manages conversation rows and message history records. |
| Presentation | `storefront-chat.controller.ts` | `StorefrontChatController` | REST endpoints for streaming RAG chat (Server-Sent Events), file uploads, and search. |
| | `chat.dto.ts` | `ChatDto` | Validates the streaming chat request body. |
| | `ingest-document.dto.ts` | `IngestDocumentDto` | Schema for document metadata parameters. |
**Chat request workflow:** `ChatUseCase` orchestrates security validation, checks the semantic cache, retrieves hybrid-search context, streams the response text, and runs a multi-iteration loop that resolves tool calls, including several parallel calls in a single model turn.
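The orchestration described above can be sketched as follows. This is a simplified, non-streaming illustration, not the actual `ChatUseCase`: the `ChatDeps` interface, the message shapes, and the `MAX_TOOL_ITERATIONS` cap of 5 are all assumptions. Each model turn returns either final text or a batch of tool calls; every call in a batch runs concurrently via `Promise.all`, and a personal-data flag keeps the answer out of the shared cache.

```typescript
// Hypothetical shapes; the real ChatUseCase dependencies may differ.
interface ToolCall { id: string; name: string; args: Record<string, unknown>; }
type LlmTurn = { kind: "text"; text: string } | { kind: "tools"; calls: ToolCall[] };
interface ChatMessage { role: "system" | "user" | "assistant" | "tool"; content: string; toolCallId?: string; }

interface ChatDeps {
  validate(input: string): string | null; // refusal text, or null when the input is allowed
  isCacheEligible(query: string): boolean;
  cacheGet(query: string): Promise<string | null>;
  cacheSet(query: string, answer: string): Promise<void>;
  retrieveContext(query: string): Promise<string>;
  complete(messages: ChatMessage[]): Promise<LlmTurn>;
  runTool(name: string, args: Record<string, unknown>): Promise<unknown>;
}

const PERSONAL_DATA_TOOLS = new Set(["get_user_rewards_info_by_member_code"]);
const MAX_TOOL_ITERATIONS = 5; // assumption: the real iteration cap is not documented

async function chat(deps: ChatDeps, query: string): Promise<string> {
  // Guardrails first: oversized or injection-like input never reaches the model.
  const refusal = deps.validate(query);
  if (refusal !== null) return refusal;

  // Semantic cache lookup, only for queries that do not look account-specific.
  const eligible = deps.isCacheEligible(query);
  if (eligible) {
    const cached = await deps.cacheGet(query);
    if (cached !== null) return cached;
  }

  // Hybrid-search context is placed in the system message.
  const context = await deps.retrieveContext(query);
  const messages: ChatMessage[] = [
    { role: "system", content: `Context:\n${context}` },
    { role: "user", content: query },
  ];

  let touchedPersonalData = false;
  for (let i = 0; i < MAX_TOOL_ITERATIONS; i++) {
    const turn = await deps.complete(messages);
    if (turn.kind === "text") {
      // Only answers that never touched personal data may enter the shared cache.
      if (eligible && !touchedPersonalData) await deps.cacheSet(query, turn.text);
      return turn.text;
    }
    if (turn.calls.some((c) => PERSONAL_DATA_TOOLS.has(c.name))) touchedPersonalData = true;
    // Run all tool calls from this turn concurrently, then feed the results back.
    const results = await Promise.all(turn.calls.map((c) => deps.runTool(c.name, c.args)));
    messages.push({ role: "assistant", content: JSON.stringify(turn.calls) }); // simplified tool-call record
    turn.calls.forEach((c, idx) =>
      messages.push({ role: "tool", content: JSON.stringify(results[idx]), toolCallId: c.id }),
    );
  }
  // Assumption: fall back to the documented disclaimer when the loop is exhausted.
  return "I don't currently have that information available.";
}
```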
**Document ingestion workflow:** when store administrators upload knowledge-base documents (`.txt` or `.pdf`), the text is extracted and stored, split into chunks, enriched with context, and embedded to populate the RAG context.
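The chunking step can be approximated with a fixed-size sliding window using the documented limits (maximum 600, overlap 120). This sketch assumes sizes are measured in characters and that context enrichment simply prefixes the document title; `DocumentService` is described as *semantic* chunking, which likely also respects sentence or paragraph boundaries that this window ignores. `chunkText` and `enrichChunks` are hypothetical names.

```typescript
interface ChunkDraft { chunkIndex: number; content: string; }

const MAX_CHUNK_SIZE = 600; // documented maximum; unit assumed to be characters
const CHUNK_OVERLAP = 120;  // documented overlap

// Splits text into overlapping windows; each window starts (maxSize - overlap) characters after the previous one.
function chunkText(text: string, maxSize = MAX_CHUNK_SIZE, overlap = CHUNK_OVERLAP): string[] {
  if (overlap >= maxSize) throw new Error("overlap must be smaller than maxSize");
  const chunks: string[] = [];
  const step = maxSize - overlap;
  for (let start = 0; start < text.length; start += step) {
    const end = Math.min(start + maxSize, text.length);
    chunks.push(text.slice(start, end));
    if (end === text.length) break; // this window already reaches the end of the text
  }
  return chunks;
}

// Assumed enrichment: prefix every chunk with its document title so the title is embedded too.
function enrichChunks(title: string, text: string): ChunkDraft[] {
  return chunkText(text.trim()).map((content, chunkIndex) => ({
    chunkIndex,
    content: `Document: ${title}\n${content}`,
  }));
}
```

A 1,500-character text yields three chunks starting at offsets 0, 480, and 960, with each pair sharing 120 characters.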
The module operates on four Postgres tables hosted in Supabase. Vector indexing and search operations require the pgvector Postgres extension.
**`documents`**: stores metadata and the full raw contents of files uploaded for RAG.

| Column | Type | Description |
|---|---|---|
| `id` | uuid, primary key | Unique document identifier. |
| `title` | text | File name (e.g., `Muni_Loyalty_Policy.pdf`). |
| `content` | text | Complete extracted plain text. |
| `embedding` | vector(1024) | Optional document-level semantic representation. |
| `created_at` | timestamp with time zone | Creation timestamp. |

**`document_chunks`**: stores the chunks generated from documents, each paired with an embedding vector for semantic lookup.

| Column | Type | Description |
|---|---|---|
| `id` | uuid, primary key | Chunk identifier. |
| `document_id` | uuid, foreign key to `documents.id` | Parent document. |
| `chunk_index` | integer | Position of the chunk within its document. |
| `content` | text | Context-enriched text snippet (includes the document title). |
| `embedding` | vector(1024) | Embedding generated via Hugging Face. |
| `created_at` | timestamp with time zone | Creation timestamp. |

**`conversations`**: tracks chat sessions created by storefront users.

| Column | Type | Description |
|---|---|---|
| `id` | uuid, primary key | Unique session ID. |
| `created_at` | timestamp with time zone | Creation timestamp. |

**`messages`**: logs every interaction that occurs inside a conversation.

| Column | Type | Description |
|---|---|---|
| `id` | bigint, primary key | Auto-incrementing log ID. |
| `conversation_id` | uuid, foreign key to `conversations.id` | Owning conversation. |
| `role` | text | Message source: `user`, `assistant`, or `tool`. |
| `content` | text | Plain text content or a serialized tool result. |
| `tools` | text, optional | Details of tool executions. |
| `created_at` | timestamp with time zone | Creation timestamp. |

The database exposes three SQL functions, which `DocumentRepository` invokes through Supabase RPC:
1. **`do_vector_search_document_chunks`**: cosine similarity lookup.
   * Parameters: `query_embedding` (vector, 1024), `match_count` (integer).
   * Score: `1 - (chunk.embedding <=> query_embedding)`, where `<=>` is the pgvector cosine distance operator.
2. **`do_fulltext_search_document_chunks`**: token matching with Postgres full-text search.
   * Parameters: `search_text` (string), `match_count` (integer).
   * Matching: `to_tsvector` and `to_tsquery`.
3. **`do_hybrid_search_document_chunks`**: Reciprocal Rank Fusion (RRF), combining the strengths of vector search and keyword matching.
   * Parameters: `search_text` (string), `query_embedding` (vector, 1024), `match_count` (integer).

$$
\mathrm{RRF}(d) = \sum_{m \in M} \frac{1}{k + Rank_m(d)}
$$

Here $M$ is the set of searches (vector and keyword), $Rank_m(d)$ is the position of document chunk $d$ in the results of search method $m$ (capped at the default limit), and $k$ is the RRF smoothing constant. If a chunk is absent from a search, that search contributes $0$ to its score.
* **Relevance Threshold:** Fused results scoring below `MIN_RELEVANCE_SCORE` (`0.01`, defined in [rag.service.ts](file:///C:/Projects/muni-backend/src/modules/chat/storefront/application/services/rag.service.ts)) are dropped as low-ranking noise before the prompt is assembled.
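To make the scoring concrete, here is an in-memory TypeScript version of RRF over ranked ID lists, followed by the `0.01` cut-off. It is a sketch, not the SQL function: the smoothing constant `k = 60` (the value commonly used in the RRF literature) is an assumption, since the constant used by `do_hybrid_search_document_chunks` is not documented here, as is treating the threshold as inclusive. Ranks are 1-based.

```typescript
// Fuses ranked result lists with Reciprocal Rank Fusion.
// Each list holds chunk IDs ordered best-first; rank = index + 1.
function reciprocalRankFusion(lists: string[][], k = 60): Map<string, number> {
  const scores = new Map<string, number>();
  for (const list of lists) {
    list.forEach((id, index) => {
      // A chunk missing from a list simply receives no term from that list.
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + index + 1));
    });
  }
  return scores;
}

const MIN_RELEVANCE_SCORE = 0.01;

// Keeps scores at or above the threshold, sorted best-first.
function rankAndFilter(scores: Map<string, number>, minScore = MIN_RELEVANCE_SCORE): Array<[string, number]> {
  return Array.from(scores.entries())
    .filter(([, score]) => score >= minScore)
    .sort((a, b) => b[1] - a[1]);
}
```

For example, fusing a vector list `["c1", "c2", "c3"]` with a keyword list `["c2", "c4"]` ranks `c2` first, because appearing in both lists adds two terms. With `k = 60`, any chunk ranked beyond 40th in a single list falls below `0.01`, which is one way the threshold trims the tail.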
---
## External Services & Configuration
To enable the RAG system and LLM responses, set the following environment variables in the deployment environment:
### 1. Chat Completion API
* **Provider:** [OpenRouter](https://openrouter.ai/)
* **Default Model:** `openrouter/owl-alpha`
* **Base URL:** `https://openrouter.ai/api/v1`
* **Config key:** `LLM_CHAT_API_KEY` (OpenRouter API Token)
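OpenRouter exposes an OpenAI-compatible Chat Completions endpoint, so a streaming request can be assembled as below. This sketch only builds the request and does not send it; `buildChatCompletionRequest` is a hypothetical helper, and taking the key as an argument (rather than reading `LLM_CHAT_API_KEY` from the environment inside the function) is a choice made to keep the example self-contained.

```typescript
interface ChatCompletionRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

const OPENROUTER_BASE_URL = "https://openrouter.ai/api/v1";
const DEFAULT_CHAT_MODEL = "openrouter/owl-alpha";

// Hypothetical helper: builds a streaming chat completion request for OpenRouter.
function buildChatCompletionRequest(
  apiKey: string | undefined, // value of LLM_CHAT_API_KEY
  messages: Array<{ role: "system" | "user" | "assistant"; content: string }>,
  model = DEFAULT_CHAT_MODEL,
): ChatCompletionRequest {
  if (!apiKey) throw new Error("LLM_CHAT_API_KEY is not set");
  return {
    url: `${OPENROUTER_BASE_URL}/chat/completions`,
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}`, "Content-Type": "application/json" },
    // stream: true asks the API to respond with server-sent events.
    body: JSON.stringify({ model, messages, stream: true }),
  };
}
```

In Node 18 or later, the returned object can be passed to `fetch(req.url, req)`, and the response body read as an SSE stream.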
### 2. Embedding Model API
* **Provider:** [Hugging Face Serverless Inference API](https://huggingface.co/docs/api-inference)
* **Model:** `BAAI/bge-large-en-v1.5`
* **Vector Dimensions:** `1024`
* **API Base Domain:** `router.huggingface.co`
* **Config key:** `HF_TOKEN` or `HUGGINGFACE_API_KEY`
> [!WARNING]
> **DNS Resolution Failures:** Legacy endpoints such as `api-inference.huggingface.co` are prone to DNS resolution errors (such as `getaddrinfo ENOTFOUND`). To avoid crashes, the embedding pipeline targets only the unified **Hugging Face Inference Router** endpoint: `https://router.huggingface.co/hf-inference/models/BAAI/bge-large-en-v1.5/pipeline/feature-extraction`.
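
A sketch of the embedding request against the router endpoint above. It assumes `HF_TOKEN` takes precedence over `HUGGINGFACE_API_KEY` when both are set (the actual precedence is not documented), sends the standard `{ "inputs": ... }` feature-extraction payload, and rejects any vector whose length is not 1024, since the `vector(1024)` columns could not store it. Environment variables are passed in as a plain object, and all function names are hypothetical.

```typescript
const HF_EMBEDDING_URL =
  "https://router.huggingface.co/hf-inference/models/BAAI/bge-large-en-v1.5/pipeline/feature-extraction";
const EMBEDDING_DIMENSIONS = 1024;

// Assumption: HF_TOKEN wins over HUGGINGFACE_API_KEY when both are present.
function resolveHfToken(env: Record<string, string | undefined>): string {
  const token = env.HF_TOKEN || env.HUGGINGFACE_API_KEY;
  if (!token) throw new Error("Set HF_TOKEN or HUGGINGFACE_API_KEY");
  return token;
}

// Builds (but does not send) a feature-extraction request for one text.
function buildEmbeddingRequest(token: string, text: string) {
  return {
    url: HF_EMBEDDING_URL,
    method: "POST" as const,
    headers: { Authorization: `Bearer ${token}`, "Content-Type": "application/json" },
    body: JSON.stringify({ inputs: text }),
  };
}

// Validates the response before the vector reaches pgvector.
function parseEmbedding(payload: unknown): number[] {
  // Defensive: unwrap one level of nesting in case a batch-shaped response comes back.
  const vector = Array.isArray(payload) && Array.isArray(payload[0]) ? payload[0] : payload;
  if (!Array.isArray(vector) || vector.length !== EMBEDDING_DIMENSIONS || !vector.every((x) => typeof x === "number")) {
    throw new Error(`Expected a ${EMBEDDING_DIMENSIONS}-dimensional embedding`);
  }
  return vector as number[];
}
```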
---
## Security & Multi-Tenancy
* **Public Visibility:** The storefront chat endpoints are explicitly marked with `@Public()` in the controller. Authenticated customer accounts are optional; anyone browsing the web catalog can query the assistant.
* **Input Length Caps:** The [GuardrailService](file:///C:/Projects/muni-backend/src/modules/chat/storefront/application/services/guardrail.service.ts) blocks inputs exceeding **800 characters** to prevent denial of service (DoS) via context token bloat.
* **Injection Protections:** Simple regex patterns detect attempts to bypass system instructions (e.g., `ignore all previous instructions`, `bypass restrictions`). If a pattern matches, the request is short-circuited and a static security disclaimer is returned.
* **Tenant/Customer Privacy (Cache Isolation):**
  * To speed up responses, the system uses semantic caching.
  * To prevent cross-customer data leaks, queries that appear to concern a specific account (e.g., containing member codes like `MBR-XXXX` or phrases like `my points`, `my rewards`) are ineligible for caching.
  * As an additional safeguard, if the LLM calls a sensitive tool (such as `get_user_rewards_info_by_member_code`) during its execution loop, the request is flagged as containing **Personal Data**, and its final response is never saved to the shared semantic cache.
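The guardrail and cache-isolation rules above can be sketched as plain functions. The exact regex lists, the assumed `MBR-` code shape (letters or digits after the prefix), and the function names are illustrative assumptions; only the 800-character limit, the example phrases, and the personal-data tool name come from this document.

```typescript
const MAX_INPUT_LENGTH = 800;

// Illustrative injection patterns built from the documented example phrases.
const INJECTION_PATTERNS: RegExp[] = [
  /ignore\s+(all\s+)?previous\s+instructions/i,
  /bypass\s+(the\s+)?restrictions/i,
];

// Assumed account-specific signals: member codes and first-person loyalty phrases.
const ACCOUNT_PATTERNS: RegExp[] = [/\bMBR-[A-Z0-9]+\b/i, /\bmy\s+(points|rewards)\b/i];

const PERSONAL_DATA_TOOLS = new Set(["get_user_rewards_info_by_member_code"]);

type GuardResult = { allowed: true } | { allowed: false; reason: "too_long" | "injection" };

function checkInput(input: string): GuardResult {
  if (input.length > MAX_INPUT_LENGTH) return { allowed: false, reason: "too_long" };
  if (INJECTION_PATTERNS.some((p) => p.test(input))) return { allowed: false, reason: "injection" };
  return { allowed: true };
}

// Before the LLM runs: a query may use the shared cache only if it does not look account-specific.
function isCacheEligible(query: string): boolean {
  return !ACCOUNT_PATTERNS.some((p) => p.test(query));
}

// After the tool loop: never cache a response if any personal-data tool was called.
function mayStoreInCache(query: string, toolsCalled: string[]): boolean {
  return isCacheEligible(query) && !toolsCalled.some((t) => PERSONAL_DATA_TOOLS.has(t));
}
```

The patterns deliberately omit the `g` flag, because `RegExp.prototype.test` on a global regex keeps `lastIndex` state between calls.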
---
## Services Deep-Dive
### 1. `RagService` & Prompt Engineering
The system prompt in [rag.service.ts](file:///C:/Projects/muni-backend/src/modules/chat/storefront/application/services/rag.service.ts) enforces several strict policies:
* **Temporal Context Integration:** Before each request, the service reads the current server date and time and converts it to Manila time (`Asia/Manila`). The LLM compares this timestamp against catalog schedules and campaign rules.
* **Source of Truth Ordering:**
  1. *Tool Execution Outputs* (highest priority, for real-time menu query results and loyalty statuses).
  2. *RAG Context documents* (priority for general rules and policies).
* **Response Style Guidelines:**
  * The assistant must converse as a warm, human member of the Muni Slowbar team.
  * **Prohibited Vocabulary:** Under no circumstances should the LLM mention terms like `"tools"`, `"databases"`, `"retrieved documents"`, `"provided context"`, or `"knowledge base"`. It must never write phrases like *"Based on the information available..."*.
  * **Hallucination Prevention:** If a fact is present in neither the RAG context nor the tool results, the model must reply with a friendly customer-facing disclaimer: *"I don't currently have that information available."* It must never estimate or guess.
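The temporal header and the prompt policies can be sketched as follows. `Intl.DateTimeFormat` with `timeZone: "Asia/Manila"` and `formatToParts` are standard JavaScript; the header wording, the prompt lines, and the `manilaTimestamp` / `buildSystemPrompt` helpers are hypothetical and are not the actual prompt in `rag.service.ts`.

```typescript
// Formats an instant as Manila wall-clock time, e.g. "2024-01-01 08:00 (Asia/Manila)".
// en-GB defaults to a 24-hour clock (00-23); formatToParts avoids locale-specific separators.
function manilaTimestamp(now: Date): string {
  const parts = new Intl.DateTimeFormat("en-GB", {
    timeZone: "Asia/Manila",
    year: "numeric", month: "2-digit", day: "2-digit",
    hour: "2-digit", minute: "2-digit",
  }).formatToParts(now);
  const get = (type: string) => parts.find((p) => p.type === type)?.value ?? "";
  return `${get("year")}-${get("month")}-${get("day")} ${get("hour")}:${get("minute")} (Asia/Manila)`;
}

// Hypothetical system prompt encoding the temporal header, source priority, and style rules.
function buildSystemPrompt(now: Date, ragContext: string[]): string {
  return [
    "You are a warm, friendly member of the Muni Slowbar team.",
    `Current date and time: ${manilaTimestamp(now)}.`,
    "Source priority: 1) live tool results, 2) the reference notes below.",
    "Never mention tools, databases, retrieved documents, provided context, or a knowledge base.",
    "If a fact is missing, reply: \"I don't currently have that information available.\"",
    "Reference notes:",
    ...(ragContext.length > 0 ? ragContext : ["(none)"]),
  ].join("\n");
}
```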
### 2. `SemanticCacheService`
Matches incoming queries against previously answered queries using cosine similarity between their embeddings.
* **Threshold:** Cosine similarity score must be `>= 0.95`.
* **TTL:** Cache items expire after **10 minutes** (`defaultTtlMs = 600,000 ms`).
* **DI Providers:** The cache backend is swappable through [cache-provider.interface.ts](file:///C:/Projects/muni-backend/src/modules/chat/storefront/application/services/cache-provider.interface.ts). Available implementations include `InMemoryCacheProvider` (process-local) and `NoopCacheProvider` (cache disabled).
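A process-local sketch of the semantic cache in the spirit of `InMemoryCacheProvider`: a linear scan over stored query embeddings that returns the best entry with cosine similarity of at least 0.95 and an age under 10 minutes. The `SemanticCache` class, its method names, and the injectable clock are assumptions for illustration; the real `cache-provider.interface.ts` contract may differ.

```typescript
interface CacheEntry { embedding: number[]; answer: string; expiresAt: number; }

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // zero vectors have no direction
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

class SemanticCache {
  private entries: CacheEntry[] = [];
  private readonly threshold: number;
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(threshold = 0.95, defaultTtlMs = 600_000, now: () => number = () => Date.now()) {
    this.threshold = threshold;
    this.ttlMs = defaultTtlMs; // 10 minutes by default
    this.now = now;            // injectable clock, handy for tests
  }

  set(embedding: number[], answer: string): void {
    this.entries.push({ embedding, answer, expiresAt: this.now() + this.ttlMs });
  }

  get(embedding: number[]): string | null {
    const t = this.now();
    this.entries = this.entries.filter((e) => e.expiresAt > t); // lazy eviction of expired items
    let best: CacheEntry | null = null;
    let bestScore = -Infinity;
    for (const entry of this.entries) {
      const score = cosineSimilarity(entry.embedding, embedding);
      if (score >= this.threshold && score > bestScore) {
        best = entry;
        bestScore = score;
      }
    }
    return best ? best.answer : null;
  }
}
```

A linear scan is fine for a small, short-lived cache; a larger shared cache would typically use an approximate nearest-neighbour index instead.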
---
## Tool Registry & Bootstrap
Function calling enables the storefront AI to interact with live core business services. Mappings between LLM tool definitions and use cases are established in the [ToolBootstrapService](file:///C:/Projects/muni-backend/src/modules/chat/storefront/application/tools/tool-bootstrap.service.ts):
| Tool Name | Parameters | Core Use Case / Service | Caching Restriction |
| :--- | :--- | :--- | :--- |
| `search_knowledge_base` | `query` (string) | [SearchKnowledgeBaseTool](file:///C:/Projects/muni-backend/src/modules/chat/storefront/application/use-cases/search-knowledge-base.use-case.ts) | Eligible |
| `get_products` | `search` (string, optional) | [SearchProductUseCase](file:///C:/Projects/muni-backend/src/modules/chat/storefront/application/use-cases/search_product.use-case.ts) | Eligible |
| `get_user_rewards_info_by_member_code` | `memberCode` (string) | [GetRewardByMemberCodeUseCase](file:///C:/Projects/muni-backend/src/modules/loyalty/application/use-cases/get-reward-by-member-code.use-case.ts) | **Strictly Blocked** (Contains personal customer data) |
| `get_order_status` | `orderCode` (string), `storeId` (string, optional) | [GetOrderStatusUseCase](file:///C:/Projects/muni-backend/src/modules/chat/storefront/application/use-cases/get-order-status.use-case.ts) | Eligible (but tool registration is currently commented out in bootstrap) |
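A minimal sketch of what a registry like `ToolRegistry` might look like: a map from tool name to async handler that parses the model's JSON argument string and reports unknown tools and handler failures back as data rather than throwing. The `ToolHandler` type, the `execute(name, rawArgs)` signature, and the `{ error }` result shape are assumptions, not the module's actual API.

```typescript
type ToolHandler = (args: Record<string, unknown>) => Promise<unknown>;

class ToolRegistry {
  private readonly handlers = new Map<string, ToolHandler>();

  register(name: string, handler: ToolHandler): void {
    if (this.handlers.has(name)) throw new Error(`Tool already registered: ${name}`);
    this.handlers.set(name, handler);
  }

  has(name: string): boolean {
    return this.handlers.has(name);
  }

  // Parses the model's JSON argument string and runs the matching handler.
  async execute(name: string, rawArgs: string): Promise<unknown> {
    const handler = this.handlers.get(name);
    if (!handler) return { error: `Unknown tool: ${name}` };
    try {
      const args = rawArgs ? (JSON.parse(rawArgs) as Record<string, unknown>) : {};
      return await handler(args);
    } catch (err) {
      // Invalid JSON or a failing use case becomes a tool result the model can read.
      return { error: err instanceof Error ? err.message : String(err) };
    }
  }
}
```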
### Registering a New Tool
To expose a new operation to the storefront chat assistant:
1. Add the JSON definition schema in the `TOOLS` array within [tool-definition.ts](file:///C:/Projects/muni-backend/src/modules/chat/storefront/application/tools/tool-definition.ts).
2. If the tool accesses customer-specific information, add its string name to `PERSONAL_DATA_TOOLS` inside [chat.use-case.ts](file:///C:/Projects/muni-backend/src/modules/chat/storefront/application/use-cases/chat.use-case.ts).
3. Inject the relevant business Use Case in [tool-bootstrap.service.ts](file:///C:/Projects/muni-backend/src/modules/chat/storefront/application/tools/tool-bootstrap.service.ts) and register the tool handler inside the `onModuleInit()` hook:
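```typescript
this.registry.register('my_new_tool_name', async (args) => {
  // `args` is the model-supplied argument object; narrow it instead of deep-cloning via JSON.
  const { param1 } = args as { param1?: unknown };
  if (typeof param1 !== 'string') {
    throw new Error('my_new_tool_name requires a string "param1" argument');
  }
  return this.myNewUseCase.execute(param1);
});
```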
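For step 1, entries in `TOOLS` follow the OpenAI-style function-calling schema, which OpenRouter also accepts. Below is a sketch of a definition for the placeholder `my_new_tool_name` tool used in the handler above; the `ToolDefinition` interface, the description text, the `param1` schema, and the `requiredParamsDeclared` check are illustrative assumptions.

```typescript
// Shape of one function-calling tool definition (OpenAI Chat Completions `tools` format).
interface ToolDefinition {
  type: "function";
  function: {
    name: string;
    description: string;
    parameters: {
      type: "object";
      properties: Record<string, { type: string; description?: string }>;
      required?: string[];
    };
  };
}

// Hypothetical entry to append to the TOOLS array in tool-definition.ts.
const myNewTool: ToolDefinition = {
  type: "function",
  function: {
    name: "my_new_tool_name",
    description: "Describe when the assistant should call this tool and what it returns.",
    parameters: {
      type: "object",
      properties: {
        param1: { type: "string", description: "Placeholder parameter passed to the use case." },
      },
      required: ["param1"],
    },
  },
};

// Sanity check: every required parameter must also be declared in properties.
function requiredParamsDeclared(def: ToolDefinition): boolean {
  const props = def.function.parameters.properties;
  return (def.function.parameters.required ?? []).every((p) => p in props);
}
```

The `name` must match the string passed to `this.registry.register(...)`; if the two drift apart, the model can request a tool the registry cannot resolve.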