# Storefront Chat Module

The chat module provides an AI-driven conversational customer assistance system for the Muni Slowbar storefront. It uses a Retrieval-Augmented Generation (RAG) architecture and function calling (tool use) to answer customer inquiries about menu items, rewards, promotions, operating hours, locations, and order statuses.


## 🎯 Responsibilities

| What it does | What it does NOT do |
| :--- | :--- |
| **Conversational Support:** Streams natural, warm, and helpful responses to storefront users. | **Financial Transactions:** Cannot process payments or modify orders directly. |
| **Menu & Knowledge Retrieval (RAG):** Accesses uploaded documents (`.txt`, `.pdf`) using hybrid keyword + semantic vector search. | **POS Admin Operations:** Does not expose cashier, staff management, or store management capabilities. |
| **Customer Loyalty Context:** Queries customer points and rewards balances using registered tool calls. | **Direct Database Mutation:** Writes are strictly limited to auditing conversations/messages and uploading vector chunks; it cannot modify catalogs or user details. |
| **Hallucination Prevention:** Enforces strict formatting and data policies preventing the AI from fabricating prices or menu items. | |
| **Request Security Guardrails:** Filters query inputs to block malicious prompt injections or overly long inputs. | |

## 🏗️ Module Architecture Blueprint

This module adheres to Clean Architecture and Domain-Driven Design (DDD) principles. It is structured into four distinct layers in src/modules/chat/storefront:

    src/modules/chat/storefront/
    ├── domain/                  # Pure Business Entities (No external libs or NestJS)
    ├── infrastructure/          # Database schemas, Repositories, Module definition
    ├── application/             # Use cases, Services (RAG, Caching, Guardrails), Tools
    └── presentation/            # Controllers, Request DTOs

### Layer Details & File Breakdown

| Layer | File / Directory | Class / Interface | Responsibility |
| :--- | :--- | :--- | :--- |
| **Domain** | `conversation.entity.ts` | `Conversation` | Represents a customer chat session. |
| | `document.entity.ts` | `Document`, `DocumentChunk` | Represents uploaded knowledge files and their chunked sections. |
| | `message.entity.ts` | `Message` | Represents a single message in a conversation (user, assistant, or tool). |
| **Application** | `chat.use-case.ts` | `ChatUseCase` | Core orchestration logic: runs the guardrail validations, semantic cache checks, context lookup, and the tool-resolution loop. |
| | `document-upload.use-case.ts` | `DocumentUploadUseCase` | Receives raw files (`.txt` or `.pdf`), extracts text using a PDF parser, saves the file metadata, and triggers chunking. |
| | `document-ingestion.use-case.ts` | `DocumentIngestionUseCase` | Wrapper that invokes `DocumentService.ingest` to store documents. |
| | `document-search.use-case.ts` | `DocumentSearchUseCase` | Wrapper for the search endpoint; runs a query string against the retrieval services. |
| | `document-embedding.use-case.ts` | `DocumentEmbeddingUseCase` | Generates semantic embeddings for legacy documents missing vectors. |
| | `document.service.ts` | `DocumentService` | Coordinates semantic chunking (600 max size, 120 overlap), context enrichment, and embedding generation. |
| | `rag.service.ts` | `RagService` | Formulates structured prompt context, injects temporal headers, and handles chat response streaming. |
| | `hybrid-retrieval.service.ts` | `HybridRetrievalService` | Fuses full-text and semantic vector search results using Reciprocal Rank Fusion (RRF). |
| | `guardrail.service.ts` | `GuardrailService` | Enforces the input length limit (maximum 800 characters) and checks for prompt injections. |
| | `semantic-cache.service.ts` | `SemanticCacheService` | Manages cosine similarity caching (0.95 threshold) for common informational requests. |
| | `tool.registry.ts` | `ToolRegistry` | Manages registration and execution of function calling actions requested by the LLM. |
| | `tool-definition.ts` | `TOOLS` | Defines JSON parameters and descriptions of tools exposed to OpenAI/OpenRouter. |
| | `tool-bootstrap.service.ts` | `ToolBootstrapService` | Hooks NestJS initialization to bind specific application tools to their corresponding Use Cases. |
| **Infrastructure** | `storefront-chat.module.ts` | `StorefrontChatModule` | NestJS module declaring dependencies (Supabase, LLM, Products, Orders, Rate Limiter) and services. |
| | `document.repository.ts` | `DocumentRepository` | Executes database transactions on `documents` and `document_chunks` (including Vector, Fulltext, and Hybrid Search RPCs). |
| | `storefront-chat.repository.ts` | `StorefrontChatRepository` | Manages conversation rows and history message records. |
| **Presentation** | `storefront-chat.controller.ts` | `StorefrontChatController` | REST endpoints for the RAG chat stream (using SSE), file uploads, and document search. |
| | `chat.dto.ts` | `ChatDto` | Validates streaming chat body payload parameters. |
| | `ingest-document.dto.ts` | `IngestDocumentDto` | Schema for document metadata parameters. |

## 🔄 Core Workflows

### 1. Chat Execution & Parallel Tool Calling Loop

This workflow depicts how the ChatUseCase orchestrates security validations, checks the cache, queries hybrid search context, streams response text, and executes a multi-iteration loop to support parallel tool calls.

[Diagram]
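
The orchestration order described above could be sketched roughly as follows. This is a minimal illustration and not the actual `ChatUseCase`: the `ChatDeps` interface, `runChat`, the `LlmTurn` shape, and `MAX_TOOL_ITERATIONS` are hypothetical names, the iteration cap is an assumed value, and response streaming is left out for brevity.

```typescript
// Hypothetical shapes introduced for this sketch; none come from the module's source.
type ToolCall = { id: string; name: string; args: Record<string, unknown> };
type ToolResult = { id: string; output: unknown };
type LlmTurn = { text: string; toolCalls: ToolCall[] };

interface ChatDeps {
  validate(query: string): string | null; // rejection message, or null when the input is acceptable
  cacheLookup(query: string): Promise<string | null>;
  retrieveContext(query: string): Promise<string[]>;
  callLlm(query: string, context: string[], toolResults: ToolResult[]): Promise<LlmTurn>;
  executeTool(call: ToolCall): Promise<unknown>;
}

const MAX_TOOL_ITERATIONS = 3; // assumed cap; the real limit is not stated in this document

async function runChat(query: string, deps: ChatDeps): Promise<string> {
  // 1. Guardrails run first, so rejected input never reaches the cache or the LLM.
  const rejection = deps.validate(query);
  if (rejection !== null) return rejection;

  // 2. A semantic cache hit short-circuits retrieval and generation.
  const cached = await deps.cacheLookup(query);
  if (cached !== null) return cached;

  // 3. Hybrid retrieval supplies the RAG context.
  const context = await deps.retrieveContext(query);

  // 4. Tool loop: keep calling the LLM until it stops requesting tools or the cap is hit.
  const toolResults: ToolResult[] = [];
  let turn = await deps.callLlm(query, context, toolResults);
  for (let i = 0; i < MAX_TOOL_ITERATIONS && turn.toolCalls.length > 0; i++) {
    // Every tool call requested in the same turn runs concurrently.
    const outputs = await Promise.all(
      turn.toolCalls.map(async (call) => ({ id: call.id, output: await deps.executeTool(call) })),
    );
    toolResults.push(...outputs);
    turn = await deps.callLlm(query, context, toolResults);
  }
  return turn.text;
}
```

Passing the collaborators in through one interface keeps the loop easy to exercise with fakes; the real use case presumably receives its services through NestJS dependency injection instead.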

### 2. Document Ingestion, Chunking & Semantic Vectorization

This workflow outlines what happens when store administrators upload knowledge base documents (.txt or .pdf) to populate the RAG context.

[Diagram]
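
To make the chunking parameters concrete (600 maximum size, 120 overlap), here is a small sketch. It uses a plain fixed-size sliding window over characters, which is a simplified stand-in for the semantic chunking that `DocumentService` performs; the function name `chunkText` and the choice of characters as the unit are assumptions.

```typescript
// Sliding-window chunker: a simplified stand-in for the module's semantic chunking.
// Sizes are assumed to be measured in characters.
function chunkText(text: string, maxSize = 600, overlap = 120): string[] {
  if (overlap >= maxSize) throw new Error("overlap must be smaller than maxSize");
  const chunks: string[] = [];
  const step = maxSize - overlap; // with the defaults, each chunk starts 480 characters after the previous one
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + maxSize));
    if (start + maxSize >= text.length) break; // this chunk already reached the end of the text
  }
  return chunks;
}
```

The overlap means the last 120 characters of one chunk reappear at the start of the next, so a sentence that straddles a boundary is still fully present in at least one chunk.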

## 📁 Data Access & Database Schema

The module operates on four Postgres tables hosted in Supabase. Vector indexing and search operations require the pgvector Postgres extension.

### 1. Database Table Details

#### `documents`

Stores metadata and full raw contents of files uploaded for RAG.

#### `document_chunks`

Stores isolated chunks generated from documents, coupled with high-dimensional vectors for semantic lookup.

#### `conversations`

Tracks chat sessions created by storefront users.

#### `messages`

Maintains a log of all interactions occurring inside a conversation.

### 2. Search Stored Procedures (RPCs)

The database exposes three SQL procedures, which the `DocumentRepository` calls through Supabase RPC:

#### A. Vector Search (`do_vector_search_document_chunks`)

Performs a cosine-distance similarity lookup.

#### B. Full-Text Search (`do_fulltext_search_document_chunks`)

Uses Postgres full-text search indexes to match tokens.

#### C. Hybrid Search (`do_hybrid_search_document_chunks`)

Runs Reciprocal Rank Fusion (RRF), combining the strengths of vector search and keyword matching.

*   **RRF Score:**

    $$RRF(d) = \sum_{m \in M} \frac{1}{k + Rank_m(d)}$$

    Where $M$ is the set of searches (Vector and Keyword), $Rank_m(d)$ is the position rank of document chunk $d$ in search method $m$ (capped at the default limit), and $k$ is the RRF smoothing constant. If a chunk is absent from a search, its term for that search is evaluated as $0$.
*   **Relevance Threshold:** The system enforces a strict [RagService.ts:MIN_RELEVANCE_SCORE](file:///C:/Projects/muni-backend/src/modules/chat/storefront/application/services/rag.service.ts) filter of `0.01` to drop low-ranking tail search noise prior to prompt compilation.
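
The fusion and threshold steps described above can be illustrated in application code. The module performs the fusion inside the `do_hybrid_search_document_chunks` SQL procedure, so the TypeScript below is only a sketch of the arithmetic; `fuseRankings` is a hypothetical name, the smoothing constant `RRF_K = 60` is a commonly used default rather than a value confirmed by this document, and treating the threshold as inclusive is also an assumption.

```typescript
const RRF_K = 60; // assumed smoothing constant (a common default), not confirmed by this document
const MIN_RELEVANCE_SCORE = 0.01; // threshold named in the text; inclusive comparison is an assumption

// Each input list holds chunk ids ordered best match first.
function fuseRankings(lists: string[][], k = RRF_K): { chunkId: string; score: number }[] {
  const scores = new Map<string, number>();
  for (const list of lists) {
    list.forEach((chunkId, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(chunkId, (scores.get(chunkId) ?? 0) + 1 / (k + rank));
    });
    // A chunk missing from this list simply adds nothing for this search method.
  }
  return Array.from(scores.entries())
    .map(([chunkId, score]) => ({ chunkId, score }))
    .filter((result) => result.score >= MIN_RELEVANCE_SCORE)
    .sort((a, b) => b.score - a.score);
}
```

With `k = 60`, a chunk ranked first in both searches scores about `0.033`, while a chunk that appears in only one search at rank 41 or lower falls below `0.01` and is dropped.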

---

## 🔌 External Services & Configuration

To enable the RAG system and LLM responses, set the following environment variables in the deployment environment:

### 1. Chat Completion API
*   **Provider:** [OpenRouter](https://openrouter.ai/)
*   **Default Model:** `openrouter/owl-alpha`
*   **Base URL:** `https://openrouter.ai/api/v1`
*   **Config key:** `LLM_CHAT_API_KEY` (OpenRouter API Token)

### 2. Embedding Model API
*   **Provider:** [Hugging Face Serverless Inference API](https://huggingface.co/docs/api-inference)
*   **Model:** `BAAI/bge-large-en-v1.5`
*   **Vector Dimensions:** `1024`
*   **API Base Domain:** `router.huggingface.co`
*   **Config key:** `HF_TOKEN` or `HUGGINGFACE_API_KEY`

> [!WARNING]
> **DNS Resolution Fallbacks:** Legacy endpoints like `api-inference.huggingface.co` are prone to DNS resolution failures (such as `getaddrinfo ENOTFOUND`). To prevent execution crashes, the embedding pipeline strictly targets the unified **Hugging Face Inference Router** domain: `https://router.huggingface.co/hf-inference/models/BAAI/bge-large-en-v1.5/pipeline/feature-extraction`.
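
As a rough sketch of how the embedding settings above fit together, the helpers below resolve the token, build the HTTP request for the router endpoint, and check the vector size. The function names are hypothetical, the precedence of `HF_TOKEN` over `HUGGINGFACE_API_KEY` is an assumption, and the `{ inputs }` request body follows the Hugging Face feature-extraction task format as I understand it, so it should be verified against the current API documentation.

```typescript
const EMBEDDING_URL =
  "https://router.huggingface.co/hf-inference/models/BAAI/bge-large-en-v1.5/pipeline/feature-extraction";
const EMBEDDING_DIMENSIONS = 1024;

type Env = Record<string, string | undefined>;

// Assumption: HF_TOKEN wins when both keys are set; the document does not state a precedence.
function resolveHfToken(env: Env): string {
  const token = env["HF_TOKEN"] || env["HUGGINGFACE_API_KEY"];
  if (!token) throw new Error("Set HF_TOKEN or HUGGINGFACE_API_KEY");
  return token;
}

// Builds the request without sending it, so the shape can be inspected or unit tested.
function buildEmbeddingRequest(text: string, env: Env) {
  return {
    url: EMBEDDING_URL,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${resolveHfToken(env)}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ inputs: text }),
    },
  };
}

// Rejects vectors of the wrong size before they are written to a pgvector column.
function assertEmbedding(vector: unknown): number[] {
  if (!Array.isArray(vector) || vector.length !== EMBEDDING_DIMENSIONS) {
    throw new Error(`Expected a ${EMBEDDING_DIMENSIONS}-dimensional vector`);
  }
  return vector as number[];
}
```

A caller would pass `url` and `init` to `fetch` and run the parsed JSON response through `assertEmbedding`; checking the dimension early surfaces a model or endpoint mismatch before it turns into a database error.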

---

## 🔐 Security & Multi-Tenancy

*   **Public Visibility:** The storefront chat endpoints are explicitly marked with `@Public()` in the controller. Authenticated customer accounts are optional; anyone browsing the web catalog can query the assistant.
*   **Input Length Caps:** The [GuardrailService](file:///C:/Projects/muni-backend/src/modules/chat/storefront/application/services/guardrail.service.ts) blocks inputs exceeding **800 characters** to prevent denial of service (DoS) via context token bloat.
*   **Injection Protections:** Simple regex patterns detect attempts to bypass the system instructions (e.g. `ignore all previous instructions`, `bypass restrictions`). If a pattern matches, the request is cut short and a static security disclaimer is returned.
*   **Tenant/Customer Privacy (Cache Isolation):**
    *   To speed up requests, the system implements semantic caching.
    *   To prevent cross-customer data leaks, queries that appear to ask about an account (e.g., those containing member codes like `MBR-XXXX` or phrases like `my points`, `my rewards`) are ineligible for caching.
    *   As a final safeguard, if the LLM initiates a sensitive tool call (such as `get_user_rewards_info_by_member_code`) during its execution loop, the exchange is marked as containing **Personal Data**, and its final response is never saved to the shared semantic cache.
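
The guardrail and cache-isolation rules above can be sketched as two small checks. The regular expressions are illustrative only, since the actual pattern lists in `GuardrailService` and the cache eligibility logic are not shown in this document, and the function names `checkInput` and `isCacheable` are hypothetical.

```typescript
const MAX_INPUT_LENGTH = 800;

// Illustrative patterns; the real GuardrailService list is not shown in this document.
const INJECTION_PATTERNS: RegExp[] = [
  /ignore\s+(all\s+)?previous\s+instructions/i,
  /bypass\s+restrictions/i,
];

// Member codes such as MBR-XXXX and first-person account phrases mark a query as personal.
const PERSONAL_QUERY_PATTERNS: RegExp[] = [/\bMBR-\w+/i, /\bmy\s+points\b/i, /\bmy\s+rewards\b/i];

const PERSONAL_DATA_TOOLS = new Set<string>(["get_user_rewards_info_by_member_code"]);

function checkInput(query: string): { allowed: boolean; reason?: string } {
  if (query.length > MAX_INPUT_LENGTH) return { allowed: false, reason: "too_long" };
  if (INJECTION_PATTERNS.some((pattern) => pattern.test(query))) {
    return { allowed: false, reason: "injection" };
  }
  return { allowed: true };
}

// A response may be cached only if neither the query nor any executed tool touched personal data.
function isCacheable(query: string, toolsCalled: string[]): boolean {
  if (PERSONAL_QUERY_PATTERNS.some((pattern) => pattern.test(query))) return false;
  return !toolsCalled.some((name) => PERSONAL_DATA_TOOLS.has(name));
}
```

The two layers are deliberately redundant: the query patterns catch obvious account questions before any work is done, and the tool check still blocks caching when a personal query is phrased in a way the patterns miss.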

---

## 🧠 Services Deep-Dive

### 1. `RagService` & Prompt Engineering

The system prompt in [rag.service.ts](file:///C:/Projects/muni-backend/src/modules/chat/storefront/application/services/rag.service.ts) enforces several strict policies:

*   **Temporal Context Integration:** Before every query, the system fetches the current server date and time, translating it to the Manila Timezone (`Asia/Manila`). The LLM compares this temporal envelope against catalog schedules or campaign rules.
*   **Source of Truth Ordering:**
    1.  *Tool Execution Outputs* (highest priority for real-time menu query results or loyalty statuses).
    2.  *RAG Context documents* (priority for general rules and policies).
*   **Response Style Guidelines:**
    *   The assistant must converse as a warm, human member of the Muni Slowbar team.
    *   **Prohibited Vocabulary:** Under no circumstances should the LLM mention terms like `"tools"`, `"databases"`, `"retrieved documents"`, `"provided context"`, or `"knowledge base"`. It must never write phrases like *"Based on the information available..."*.
*   **Hallucination Prevention:** If facts are not present in either the RAG context or the tool results, the model must output a friendly customer-facing disclaimer: *"I don't currently have that information available."* It is strictly forbidden from estimating or guessing.
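
The temporal context step can be illustrated with the standard `Intl.DateTimeFormat` API, which converts a server timestamp to `Asia/Manila` without extra libraries. This is a sketch under assumptions: `buildTemporalHeader` and the header wording are hypothetical, and the actual format used by `RagService` is not shown in this document.

```typescript
// Extracts date and time fields for the Asia/Manila time zone from a server timestamp.
function manilaParts(now: Date): Record<string, string> {
  const parts = new Intl.DateTimeFormat("en-US", {
    timeZone: "Asia/Manila",
    weekday: "long",
    year: "numeric",
    month: "2-digit",
    day: "2-digit",
    hour: "2-digit",
    minute: "2-digit",
    hour12: false,
  }).formatToParts(now);
  const out: Record<string, string> = {};
  for (const part of parts) out[part.type] = part.value;
  // Some engines render midnight as "24" when hour12 is false; normalise it to "00".
  if (out["hour"] === "24") out["hour"] = "00";
  return out;
}

// Hypothetical header text; the wording injected by the real prompt may differ.
function buildTemporalHeader(now: Date): string {
  const p = manilaParts(now);
  return `Current date and time (Asia/Manila): ${p["weekday"]}, ${p["year"]}-${p["month"]}-${p["day"]} ${p["hour"]}:${p["minute"]}`;
}
```

Including the weekday alongside the date lets the model reason about schedules such as weekend-only promotions without doing calendar arithmetic itself.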

### 2. `SemanticCacheService`

Matches user queries against previous answers using vector cosine similarity.
*   **Threshold:** Cosine similarity score must be `>= 0.95`.
*   **TTL:** Cache items expire after **10 minutes** (`defaultTtlMs = 600,000 ms`).
*   **DI Providers:** Swappable implementation through [cache-provider.interface.ts](file:///C:/Projects/muni-backend/src/modules/chat/storefront/application/services/cache-provider.interface.ts). Standard configurations include `InMemoryCacheProvider` (process-local) and `NoopCacheProvider` (cache disabled).
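
The lookup rule above (cosine similarity of at least `0.95`, entries expiring after 10 minutes) can be sketched with a small process-local cache. This is an illustration only: `SketchSemanticCache` is a hypothetical class, it does not implement the interface in `cache-provider.interface.ts`, and the current time is passed in explicitly to keep the example deterministic.

```typescript
type CacheEntry = { embedding: number[]; answer: string; expiresAt: number };

const SIMILARITY_THRESHOLD = 0.95;
const DEFAULT_TTL_MS = 600_000; // 10 minutes

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vectors must have the same length");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  const denominator = Math.sqrt(normA) * Math.sqrt(normB);
  return denominator === 0 ? 0 : dot / denominator;
}

class SketchSemanticCache {
  private entries: CacheEntry[] = [];

  set(embedding: number[], answer: string, now: number, ttlMs = DEFAULT_TTL_MS): void {
    this.entries.push({ embedding, answer, expiresAt: now + ttlMs });
  }

  // Returns the most similar unexpired answer at or above the threshold, or null.
  get(embedding: number[], now: number): string | null {
    this.entries = this.entries.filter((entry) => entry.expiresAt > now); // evict expired entries
    let best: CacheEntry | null = null;
    let bestScore = SIMILARITY_THRESHOLD;
    for (const entry of this.entries) {
      const score = cosineSimilarity(embedding, entry.embedding);
      if (score >= bestScore) {
        best = entry;
        bestScore = score;
      }
    }
    return best ? best.answer : null;
  }
}
```

A linear scan is adequate for a small in-memory cache; a high threshold such as `0.95` trades hit rate for safety, since a near-miss simply falls through to the normal RAG path.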

---

## 🛠️ Tool Registry & Bootstrap

Function calling enables the storefront AI to interact with live core business services. Mappings between LLM tool definitions and use cases are established in the [ToolBootstrapService](file:///C:/Projects/muni-backend/src/modules/chat/storefront/application/tools/tool-bootstrap.service.ts):

| Tool Name | Parameters | Core Use Case / Service | Caching Restriction |
| :--- | :--- | :--- | :--- |
| `search_knowledge_base` | `query` (string) | [SearchKnowledgeBaseTool](file:///C:/Projects/muni-backend/src/modules/chat/storefront/application/use-cases/search-knowledge-base.use-case.ts) | Eligible |
| `get_products` | `search` (string, optional) | [SearchProductUseCase](file:///C:/Projects/muni-backend/src/modules/chat/storefront/application/use-cases/search_product.use-case.ts) | Eligible |
| `get_user_rewards_info_by_member_code` | `memberCode` (string) | [GetRewardByMemberCodeUseCase](file:///C:/Projects/muni-backend/src/modules/loyalty/application/use-cases/get-reward-by-member-code.use-case.ts) | **Strictly Blocked** (Contains personal customer data) |
| `get_order_status` | `orderCode` (string), `storeId` (string, optional) | [GetOrderStatusUseCase](file:///C:/Projects/muni-backend/src/modules/chat/storefront/application/use-cases/get-order-status.use-case.ts) | Eligible (but tool registration is currently commented out in bootstrap) |

### Registering a New Tool
To expose a new operation to the storefront chat assistant:
1.  Add the JSON definition schema in the `TOOLS` array within [tool-definition.ts](file:///C:/Projects/muni-backend/src/modules/chat/storefront/application/tools/tool-definition.ts).
2.  If the tool accesses customer-specific information, add its string name to `PERSONAL_DATA_TOOLS` inside [chat.use-case.ts](file:///C:/Projects/muni-backend/src/modules/chat/storefront/application/use-cases/chat.use-case.ts).
3.  Inject the relevant business Use Case in [tool-bootstrap.service.ts](file:///C:/Projects/muni-backend/src/modules/chat/storefront/application/tools/tool-bootstrap.service.ts) and register the tool handler inside the `onModuleInit()` hook:
    ```typescript
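    // The name must match the tool name declared in the TOOLS array.
    this.registry.register('my_new_tool_name', async (args) => {
      // The JSON round-trip produces a plain copy of the arguments before destructuring.
      const { param1 } = JSON.parse(JSON.stringify(args));
      // Delegate to the injected use case; its result becomes the tool's output.
      return await this.myNewUseCase.execute(param1);
    });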
    ```
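
For step 1, a tool definition in the OpenAI-compatible function calling format might look like the sketch below, paired with a minimal registry to show how a handler is looked up by name. Everything here is illustrative: `get_store_hours` is a hypothetical tool, `SketchToolRegistry` is a stand-in whose API may differ from the real `ToolRegistry`, and the handler returns canned data.

```typescript
// Shape of an OpenAI-compatible function tool definition.
type ToolDefinition = {
  type: "function";
  function: {
    name: string;
    description: string;
    parameters: {
      type: "object";
      properties: Record<string, { type: string; description: string }>;
      required: string[];
    };
  };
};

// Hypothetical tool used only for illustration.
const storeHoursTool: ToolDefinition = {
  type: "function",
  function: {
    name: "get_store_hours",
    description: "Returns today's opening hours for a store.",
    parameters: {
      type: "object",
      properties: {
        storeId: { type: "string", description: "Identifier of the store." },
      },
      required: ["storeId"],
    },
  },
};

type ToolHandler = (args: Record<string, unknown>) => Promise<unknown>;

// Minimal registry stand-in; the real ToolRegistry API may differ.
class SketchToolRegistry {
  private handlers = new Map<string, ToolHandler>();

  register(name: string, handler: ToolHandler): void {
    this.handlers.set(name, handler);
  }

  async execute(name: string, args: Record<string, unknown>): Promise<unknown> {
    const handler = this.handlers.get(name);
    if (!handler) throw new Error(`Unknown tool: ${name}`);
    return handler(args);
  }
}

const sketchRegistry = new SketchToolRegistry();
// The registered name reuses the definition's name so the two cannot drift apart.
sketchRegistry.register(storeHoursTool.function.name, async (args) => {
  return { storeId: String(args["storeId"]), hours: "08:00-20:00" }; // canned data for the sketch
});
```

Keeping the registered name tied to `storeHoursTool.function.name` avoids the most common wiring mistake, where the LLM requests a tool under one name and the registry knows it under another.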