# Storefront Chat Module

The chat module provides an AI-driven conversational customer assistance system for the Muni Slowbar storefront. It uses a Retrieval-Augmented Generation (RAG) architecture and function calling (tool use) to answer customer inquiries about menu items, rewards, promotions, operating hours, locations, and order statuses.

## 🎯 Responsibilities

| What it does | What it does NOT do |
| :--- | :--- |
| **Conversational Support:** Streams natural, warm, and helpful responses to storefront users. | **Financial Transactions:** Cannot process payments or modify orders directly. |
| **Menu & Knowledge Retrieval (RAG):** Accesses uploaded documents (.txt, .pdf) using hybrid keyword + semantic vector search. | **POS Admin Operations:** Does not expose cashier, staff management, or store management capabilities. |
| **Customer Loyalty Context:** Queries customer points and rewards balances using registered tool calls. | **Direct Database Mutation:** Writes are strictly limited to auditing conversations/messages and uploading vector chunks; it cannot modify catalogs or user details. |
| **Hallucination Prevention:** Enforces strict formatting and data policies preventing the AI from fabricating prices or menu items. | |
| **Request Security Guardrails:** Filters query inputs to block malicious prompt injections or overly long inputs. | |

## 🏗️ Module Architecture Blueprint

This module adheres to Clean Architecture and Domain-Driven Design (DDD) principles. It is structured into four distinct layers in src/modules/chat/storefront:

src/modules/chat/storefront/
├── domain/                  # Pure Business Entities (No external libs or NestJS)
├── infrastructure/          # Database schemas, Repositories, Module definition
├── application/             # Use cases, Services (RAG, Caching, Guardrails), Tools
└── presentation/            # Controllers, Request DTOs

### Layer Details & File Breakdown

| Layer | File / Directory | Class / Interface | Responsibility |
| :--- | :--- | :--- | :--- |
| Domain | conversation.entity.ts | `Conversation` | Represents a customer chat session. |
| | document.entity.ts | `Document`, `DocumentChunk` | Represents uploaded knowledge files and their chunked sections. |
| | message.entity.ts | `Message` | Represents a single message in a conversation (user, assistant, or tool). |
| Application | chat.use-case.ts | `ChatUseCase` | Core orchestration logic: runs the guardrail validations, semantic cache checks, context lookup, and the tool-resolution loop. |
| | document-upload.use-case.ts | `DocumentUploadUseCase` | Receives raw files (.txt or .pdf), extracts text using PDF parser, saves the file metadata, and triggers chunking. |
| | document-ingestion.use-case.ts | `DocumentIngestionUseCase` | Wrapper invoking `DocumentService.ingest` for storing documents. |
| | document-search.use-case.ts | `DocumentSearchUseCase` | Core search endpoint wrapper executing a query string against retrieval services. |
| | document-embedding.use-case.ts | `DocumentEmbeddingUseCase` | Generates semantic embeddings for legacy documents missing vectors. |
| | document.service.ts | `DocumentService` | Coordinates semantic chunking (600 max size, 120 overlap), context enrichment, and embedding generation. |
| | rag.service.ts | `RagService` | Formulates structured prompt context, injects temporal headers, and handles chat response streaming. |
| | hybrid-retrieval.service.ts | `HybridRetrievalService` | Integrates full-text and semantic search vectors using Reciprocal Rank Fusion (RRF). |
| | guardrail.service.ts | `GuardrailService` | Inspects input length (< 800 chars) and checks for prompt injections. |
| | semantic-cache.service.ts | `SemanticCacheService` | Manages cosine similarity caching (0.95 threshold) for common informational requests. |
| | tool.registry.ts | `ToolRegistry` | Manages registration and execution of function calling actions requested by the LLM. |
| | tool-definition.ts | `TOOLS` | Defines JSON parameters and descriptions of tools exposed to OpenAI/OpenRouter. |
| | tool-bootstrap.service.ts | `ToolBootstrapService` | Hooks NestJS initialization to bind specific application tools to their corresponding Use Cases. |
| Infrastructure | storefront-chat.module.ts | `StorefrontChatModule` | NestJS module declaring dependencies (Supabase, LLM, Products, Orders, Rate Limiter) and services. |
| | document.repository.ts | `DocumentRepository` | Executes database transactions on `documents` and `document_chunks` (including Vector, Fulltext, and Hybrid Search RPCs). |
| | storefront-chat.repository.ts | `StorefrontChatRepository` | Manages conversation rows and history message records. |
| Presentation | storefront-chat.controller.ts | `StorefrontChatController` | REST endpoints exposing RAG stream channels (using SSE), file uploads, and search controllers. |
| | chat.dto.ts | `ChatDto` | Validates streaming chat body payload parameters. |
| | ingest-document.dto.ts | `IngestDocumentDto` | Schema for document metadata parameters. |

## 🔄 Core Workflows

### 1. Chat Execution & Parallel Tool Calling Loop

This workflow depicts how the ChatUseCase orchestrates security validations, checks the cache, queries hybrid search context, streams response text, and executes a multi-iteration loop to support parallel tool calls.

[Diagram]
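
The tool-resolution part of this workflow can be sketched as follows. This is a minimal illustration under stated assumptions, not the `ChatUseCase` code: `ModelReply`, `ChatTurn`, `CallModel`, `RunTool`, `runToolLoop`, and `MAX_TOOL_ITERATIONS` are hypothetical names, the model and tools are injected as plain functions, and the guardrail, cache, retrieval, and streaming steps are left out.

```typescript
// Hypothetical shapes; the real ChatUseCase types are not shown in this document.
interface ToolCall {
  toolName: string;
  args: Record<string, unknown>;
}
interface ModelReply {
  text: string;
  toolCalls: ToolCall[];
}
interface ChatTurn {
  role: "user" | "assistant" | "tool";
  content: string;
}
type CallModel = (turns: ChatTurn[]) => Promise<ModelReply>;
type RunTool = (call: ToolCall) => Promise<unknown>;

// Assumed safety cap on loop iterations; the real limit is not stated here.
const MAX_TOOL_ITERATIONS = 5;

async function runToolLoop(
  question: string,
  callModel: CallModel,
  runTool: RunTool,
): Promise<string> {
  const turns: ChatTurn[] = [{ role: "user", content: question }];
  for (let i = 0; i < MAX_TOOL_ITERATIONS; i++) {
    const reply = await callModel(turns);
    // No tool requests means the model produced its final answer.
    if (reply.toolCalls.length === 0) return reply.text;
    turns.push({ role: "assistant", content: reply.text });
    // Tool calls requested in the same turn run concurrently.
    const results = await Promise.all(reply.toolCalls.map((call) => runTool(call)));
    for (const result of results) {
      // Each tool result is serialized and fed back to the model as a tool turn.
      turns.push({ role: "tool", content: JSON.stringify(result) });
    }
  }
  throw new Error("Tool loop did not converge");
}
```

Running the tool calls through `Promise.all` is what makes them parallel: the loop waits once per model turn rather than once per tool.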

### 2. Document Ingestion, Chunking & Semantic Vectorization

This workflow outlines what happens when store administrators upload knowledge base documents (.txt or .pdf) to populate the RAG context.

[Diagram]
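
The chunking step of this workflow can be pictured with a small sliding-window sketch. It is a simplification rather than the `DocumentService` implementation: it splits on fixed character offsets instead of semantic boundaries, it assumes the 600 / 120 figures quoted for `DocumentService` are measured in characters, and `chunkText` and `ChunkDraft` are hypothetical names.

```typescript
// Assumed to be character counts; the source only gives the numbers 600 and 120.
const MAX_CHUNK_SIZE = 600;
const CHUNK_OVERLAP = 120;

interface ChunkDraft {
  chunkIndex: number;
  content: string;
}

function chunkText(
  title: string,
  text: string,
  maxSize: number = MAX_CHUNK_SIZE,
  overlap: number = CHUNK_OVERLAP,
): ChunkDraft[] {
  if (overlap >= maxSize) throw new Error("overlap must be smaller than maxSize");
  const chunks: ChunkDraft[] = [];
  // Each window starts (maxSize - overlap) characters after the previous one.
  const step = maxSize - overlap;
  for (let start = 0; start < text.length; start += step) {
    const body = text.slice(start, start + maxSize);
    // Context enrichment: prefix every chunk with the document title.
    chunks.push({ chunkIndex: chunks.length, content: `${title}\n${body}` });
    // Stop once a window has reached the end of the text.
    if (start + maxSize >= text.length) break;
  }
  return chunks;
}
```

With these defaults, consecutive chunks share their last and first 120 characters, so a sentence cut at a boundary still appears whole in at least one chunk in most cases.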

## 📁 Data Access & Database Schema

The module operates on four Postgres tables hosted in Supabase. Vector indexing and search operations require the pgvector Postgres extension.

### 1. Database Table Details

#### `documents`

Stores metadata and full raw contents of files uploaded for RAG.

*   `id` (uuid, Primary Key): Unique document identifier.
*   `title` (text): File name (e.g., `Muni_Loyalty_Policy.pdf`).
*   `content` (text): Complete extracted plain text.
*   `embedding` (vector, 1024): Optional document-level semantic representation.
*   `created_at` (timestamp with time zone).

#### `document_chunks`

Stores isolated chunks generated from documents, coupled with high-dimensional vectors for semantic lookup.

*   `id` (uuid, Primary Key): Chunk identifier.
*   `document_id` (uuid, Foreign Key -> `documents.id`): References parent document.
*   `chunk_index` (integer): Order index of the chunk.
*   `content` (text): Context-enriched text snippet (contains document title).
*   `embedding` (vector, 1024): Embeddings generated via Hugging Face.
*   `created_at` (timestamp with time zone).

#### `conversations`

Tracks chat sessions created by storefront users.

*   `id` (uuid, Primary Key): Unique session ID.
*   `created_at` (timestamp with time zone).

#### `messages`

Maintains a log of all interactions occurring inside a conversation.

*   `id` (bigint, Primary Key): Incremental log ID.
*   `conversation_id` (uuid, Foreign Key -> `conversations.id`).
*   `role` (text): Message source (user, assistant, or tool).
*   `content` (text): Plain text content or serialized tool result.
*   `tools` (text, optional): Details of tool executions.
*   `created_at` (timestamp with time zone).

### 2. Search Stored Procedures (RPCs)

The database utilizes three SQL procedures called through Supabase RPC. These are defined inside the DocumentRepository:

#### A. Vector Search (`do_vector_search_document_chunks`)

Performs cosine distance similarity lookup.

*   **Parameters:** `query_embedding` (vector, 1024), `match_count` (integer).
*   **Query logic:** Selects chunks, calculating similarity score as `(1 - (chunk.embedding <=> query_embedding))`.
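
The similarity score above can be reproduced in application code for testing or debugging. The sketch below is an in-memory stand-in for the SQL function, not the procedure itself: pgvector's `<=>` operator returns cosine distance, so `1 - distance` is the cosine similarity computed here. `EmbeddedChunk`, `ScoredChunk`, `cosineSimilarity`, and `vectorSearch` are hypothetical names, and returning `0` for a zero-length vector is a choice made for this sketch.

```typescript
interface EmbeddedChunk {
  id: string;
  embedding: number[];
}
interface ScoredChunk {
  id: string;
  similarity: number;
}

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("Embedding dimensions differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((x, i) => {
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  });
  // Sketch-level choice: treat a zero vector as having no similarity.
  if (normA === 0 || normB === 0) return 0;
  return dot / Math.sqrt(normA * normB);
}

// Scores every chunk against the query and keeps the matchCount best.
function vectorSearch(
  queryEmbedding: number[],
  chunks: EmbeddedChunk[],
  matchCount: number,
): ScoredChunk[] {
  return chunks
    .map((chunk) => ({
      id: chunk.id,
      similarity: cosineSimilarity(queryEmbedding, chunk.embedding),
    }))
    .sort((x, y) => y.similarity - x.similarity)
    .slice(0, matchCount);
}
```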

#### B. Full-Text Search (`do_fulltext_search_document_chunks`)

Uses Postgres natural language indexes to match tokens.

*   **Parameters:** `search_text` (string), `match_count` (integer).
*   **Query logic:** Executes text matches using Postgres `to_tsvector` and `to_tsquery`.

#### C. Hybrid Search (`do_hybrid_search_document_chunks`)

Runs Reciprocal Rank Fusion (RRF) combining the strengths of vector search and keyword match.

*   **Parameters:** `search_text` (string), `query_embedding` (vector, 1024), `match_count` (integer).
*   **Mathematical Formula:**

    $$
    RRF(d) = \sum_{m \in M} \frac{1}{k + Rank_m(d)}
    $$

    Where $M$ is the set of searches (Vector and Keyword), $k$ is the RRF smoothing constant, and $Rank_m(d)$ is the position rank of document chunk $d$ in search method $m$ (capped at the default limit). If a chunk is absent in a search, its rank term is evaluated as $0$.
*   **Relevance Threshold:** The system enforces a strict [RagService.ts:MIN_RELEVANCE_SCORE](file:///C:/Projects/muni-backend/src/modules/chat/storefront/application/services/rag.service.ts) filter of `0.01` to drop low-ranking tail search noise prior to prompt compilation.
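
The fusion and threshold steps can be sketched in application code as below. This is an illustration of Reciprocal Rank Fusion under stated assumptions, not the SQL procedure: `RRF_K = 60` is the value commonly used for RRF and is assumed here because the actual constant is not given in this document, the `>=` comparison against the threshold is also an assumption, and `reciprocalRankFusion`, `fuseAndFilter`, and `FusedChunk` are hypothetical names.

```typescript
// Assumed smoothing constant (a common RRF default); not taken from the source.
const RRF_K = 60;
// Threshold quoted for RagService.
const MIN_RELEVANCE_SCORE = 0.01;

interface FusedChunk {
  chunkId: string;
  score: number;
}

// Each ranking is an ordered list of chunk ids, best match first.
function reciprocalRankFusion(rankings: string[][], k: number = RRF_K): Map<string, number> {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((chunkId, index) => {
      const rank = index + 1; // ranks are 1-based
      // A chunk missing from a ranking simply contributes nothing for it.
      scores.set(chunkId, (scores.get(chunkId) ?? 0) + 1 / (k + rank));
    });
  }
  return scores;
}

function fuseAndFilter(
  vectorIds: string[],
  keywordIds: string[],
  matchCount: number,
): FusedChunk[] {
  const fused = reciprocalRankFusion([vectorIds, keywordIds]);
  return Array.from(fused.entries())
    .map(([chunkId, score]) => ({ chunkId, score }))
    .filter((item) => item.score >= MIN_RELEVANCE_SCORE)
    .sort((x, y) => y.score - x.score)
    .slice(0, matchCount);
}
```

Under the assumed `k = 60`, a chunk found by only one search method falls below `0.01` once its rank exceeds 40, which is the kind of tail noise the threshold is described as removing.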

---

## 🔌 External Services & Configuration

To enable the RAG system and LLM responses, set the following environment parameters in the deployment context:

### 1. Chat Completion API
*   **Provider:** [OpenRouter](https://openrouter.ai/)
*   **Default Model:** `openrouter/owl-alpha`
*   **Base URL:** `https://openrouter.ai/api/v1`
*   **Config key:** `LLM_CHAT_API_KEY` (OpenRouter API Token)

### 2. Embedding Model API
*   **Provider:** [Hugging Face Serverless Inference API](https://huggingface.co/docs/api-inference)
*   **Model:** `BAAI/bge-large-en-v1.5`
*   **Vector Dimensions:** `1024`
*   **API Base Domain:** `router.huggingface.co`
*   **Config key:** `HF_TOKEN` or `HUGGINGFACE_API_KEY`

> [!WARNING]
> **DNS Resolution Fallbacks:** Legacy endpoints like `api-inference.huggingface.co` are prone to DNS resolution failures (such as `getaddrinfo ENOTFOUND`). To prevent execution crashes, the embedding pipeline strictly targets the unified **Hugging Face Inference Router** domain: `https://router.huggingface.co/hf-inference/models/BAAI/bge-large-en-v1.5/pipeline/feature-extraction`.
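
A request against the router endpoint can be assembled as in the sketch below. It only builds the request and checks the vector size; `buildEmbeddingRequest`, `assertEmbeddingDimensions`, and `EmbeddingRequest` are hypothetical names, the `{ inputs: ... }` body follows the Hugging Face feature-extraction payload format, and how the module actually sends the request and parses the response is not shown in this document.

```typescript
const HF_FEATURE_EXTRACTION_URL =
  "https://router.huggingface.co/hf-inference/models/BAAI/bge-large-en-v1.5/pipeline/feature-extraction";
const EMBEDDING_DIMENSIONS = 1024;

interface EmbeddingRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// `token` is the value of HF_TOKEN or HUGGINGFACE_API_KEY.
function buildEmbeddingRequest(text: string, token: string): EmbeddingRequest {
  return {
    url: HF_FEATURE_EXTRACTION_URL,
    method: "POST",
    headers: {
      Authorization: `Bearer ${token}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ inputs: text }),
  };
}

// Guards against storing vectors that do not fit the vector(1024) columns.
function assertEmbeddingDimensions(vector: number[]): number[] {
  if (vector.length !== EMBEDDING_DIMENSIONS) {
    throw new Error(`Expected ${EMBEDDING_DIMENSIONS} dimensions, got ${vector.length}`);
  }
  return vector;
}
```

The resulting object can be passed to `fetch(req.url, { method: req.method, headers: req.headers, body: req.body })` or to any other HTTP client.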

---

## 🔐 Security & Multi-Tenancy

*   **Public Visibility:** The storefront chat endpoints are explicitly marked with `@Public()` in the controller. Authenticated customer accounts are optional; anyone browsing the web catalog can query the assistant.
*   **Input Length Caps:** The [GuardrailService](file:///C:/Projects/muni-backend/src/modules/chat/storefront/application/services/guardrail.service.ts) blocks inputs exceeding **800 characters** to prevent denial of service (DoS) via context token bloat.
*   **Injection Protections:** Simple regex patterns detect attempts to bypass system instructions (e.g. `ignore all previous instructions`, `bypass restrictions`). If triggered, the request is cut short and a static security disclaimer is returned.
*   **Tenant/Customer Privacy (Cache Isolation):**
    *   To speed up requests, the system implements semantic caching.
    *   To prevent cross-customer data leaks, queries that appear to ask about an account (e.g., containing member codes like `MBR-XXXX` or phrases like `my points`, `my rewards`) are ineligible for caching.
    *   As an absolute safety measure, if the LLM initiates a sensitive tool call (such as `get_user_rewards_info_by_member_code`) during its execution loop, the transaction is marked as containing **Personal Data**, and its final response is blocked from being saved to the shared semantic cache.
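
The input checks and the cache-eligibility rule above can be sketched together. The patterns are illustrative only: they are built from the example phrases in this section, and the real lists in `GuardrailService` and `ChatUseCase` are not shown here. `checkInput`, `isCacheEligible`, `GuardrailVerdict`, and the two pattern constants are hypothetical names; `PERSONAL_DATA_TOOLS` is named in this document, but its full contents are assumed.

```typescript
const MAX_INPUT_LENGTH = 800;

// Illustrative patterns based on the two example phrases; not the real list.
const INJECTION_PATTERNS: RegExp[] = [
  /ignore\s+all\s+previous\s+instructions/i,
  /bypass\s+restrictions/i,
];

// Assumed markers of account-specific intent.
const PERSONAL_INTENT_PATTERNS: RegExp[] = [
  /\bMBR-[A-Z0-9]+/i,
  /\bmy\s+points\b/i,
  /\bmy\s+rewards\b/i,
];

// Assumed contents: only the one tool this document marks as blocked.
const PERSONAL_DATA_TOOLS: string[] = ["get_user_rewards_info_by_member_code"];

type GuardrailVerdict =
  | { allowed: true }
  | { allowed: false; reason: "too_long" | "prompt_injection" };

function checkInput(input: string): GuardrailVerdict {
  // Inputs longer than 800 characters are rejected before any model call.
  if (input.length > MAX_INPUT_LENGTH) return { allowed: false, reason: "too_long" };
  if (INJECTION_PATTERNS.some((pattern) => pattern.test(input))) {
    return { allowed: false, reason: "prompt_injection" };
  }
  return { allowed: true };
}

// A response may be cached only if neither the query text nor the tools
// that ran during the loop point at personal data.
function isCacheEligible(query: string, calledTools: string[]): boolean {
  const looksPersonal = PERSONAL_INTENT_PATTERNS.some((pattern) => pattern.test(query));
  const usedPersonalTool = calledTools.some(
    (toolName) => PERSONAL_DATA_TOOLS.indexOf(toolName) !== -1,
  );
  return !looksPersonal && !usedPersonalTool;
}
```

Checking the executed tools as well as the query text covers the case where a question looks generic but the model still fetches account data.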

---

## 🧠 Services Deep-Dive

### 1. `RagService` & Prompt Engineering

The system prompt in [rag.service.ts](file:///C:/Projects/muni-backend/src/modules/chat/storefront/application/services/rag.service.ts) enforces several strict policies:

*   **Temporal Context Integration:** Before every query, the system fetches the current server date and time and converts it to the Manila time zone (`Asia/Manila`). The LLM compares this timestamp against catalog schedules or campaign rules.
*   **Source of Truth Ordering:**
    1.  *Tool Execution Outputs* (highest priority for real-time menu query results or loyalty statuses).
    2.  *RAG Context documents* (priority for general rules and policies).
*   **Response Style Guidelines:**
    *   The assistant must converse as a warm, human member of the Muni Slowbar team.
    *   **Prohibited Vocabulary:** Under no circumstances should the LLM mention terms like `"tools"`, `"databases"`, `"retrieved documents"`, `"provided context"`, or `"knowledge base"`. It must never write phrases like *"Based on the information available..."*.
*   **Hallucination Prevention:** If facts are not present in either the RAG context or the tool results, the model must output a friendly customer-facing disclaimer: *"I don't currently have that information available."* It is strictly forbidden from estimating or guessing.
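
The temporal header mentioned in the first policy can be produced with the standard `Intl.DateTimeFormat` API, as in this minimal sketch. The function names and the header wording are hypothetical; the actual text injected by `RagService` is not shown in this document.

```typescript
// Formats an instant as wall-clock time in Manila, regardless of the
// server's own time zone.
function formatManilaTime(now: Date): string {
  return new Intl.DateTimeFormat("en-US", {
    timeZone: "Asia/Manila",
    weekday: "long",
    year: "numeric",
    month: "long",
    day: "numeric",
    hour: "2-digit",
    minute: "2-digit",
  }).format(now);
}

// Hypothetical header wording, prepended to the prompt context.
function buildTemporalHeader(now: Date): string {
  return `Current date and time (Asia/Manila): ${formatManilaTime(now)}`;
}
```

Converting at formatting time matters because the server may run in UTC: an instant late on one evening in UTC is already the next calendar day in Manila, which changes which schedules and promotions apply.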

### 2. `SemanticCacheService`

Matches user queries against previous answers using vector cosine similarity.
*   **Threshold:** Cosine similarity score must be `>= 0.95`.
*   **TTL:** Cache items expire after **10 minutes** (`defaultTtlMs = 600,000 ms`).
*   **DI Providers:** Swappable implementation through [cache-provider.interface.ts](file:///C:/Projects/muni-backend/src/modules/chat/storefront/application/services/cache-provider.interface.ts). Standard configurations include `InMemoryCacheProvider` (process-local) and `NoopCacheProvider` (cache disabled).
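
The threshold and TTL rules can be sketched with a small process-local cache. This is an illustration only, not the `SemanticCacheService` or `InMemoryCacheProvider` code: `SemanticCacheSketch`, `CachedAnswer`, and `Similarity` are hypothetical names, and the similarity function is injected so that any cosine implementation can be plugged in.

```typescript
type Similarity = (a: number[], b: number[]) => number;

interface CachedAnswer {
  embedding: number[];
  answer: string;
  expiresAt: number; // epoch milliseconds
}

class SemanticCacheSketch {
  private entries: CachedAnswer[] = [];
  private readonly similarity: Similarity;
  private readonly threshold: number;
  private readonly defaultTtlMs: number;

  constructor(similarity: Similarity, threshold: number = 0.95, defaultTtlMs: number = 600000) {
    this.similarity = similarity;
    this.threshold = threshold;
    this.defaultTtlMs = defaultTtlMs;
  }

  set(embedding: number[], answer: string, now: number = Date.now()): void {
    this.entries.push({ embedding, answer, expiresAt: now + this.defaultTtlMs });
  }

  // Returns the best cached answer whose similarity meets the threshold.
  get(queryEmbedding: number[], now: number = Date.now()): string | undefined {
    // Expired entries are dropped before matching.
    this.entries = this.entries.filter((entry) => entry.expiresAt > now);
    let best: CachedAnswer | undefined;
    let bestScore = -Infinity;
    for (const entry of this.entries) {
      const score = this.similarity(queryEmbedding, entry.embedding);
      if (score >= this.threshold && score > bestScore) {
        best = entry;
        bestScore = score;
      }
    }
    return best?.answer;
  }
}
```

Passing `now` explicitly keeps the expiry logic testable without waiting for real time to pass.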

---

## 🛠️ Tool Registry & Bootstrap

Function calling enables the storefront AI to interact with live core business services. Mappings between LLM tool definitions and use cases are established in the [ToolBootstrapService](file:///C:/Projects/muni-backend/src/modules/chat/storefront/application/tools/tool-bootstrap.service.ts):

| Tool Name | Parameters | Core Use Case / Service | Caching Restriction |
| :--- | :--- | :--- | :--- |
| `search_knowledge_base` | `query` (string) | [SearchKnowledgeBaseTool](file:///C:/Projects/muni-backend/src/modules/chat/storefront/application/use-cases/search-knowledge-base.use-case.ts) | Eligible |
| `get_products` | `search` (string, optional) | [SearchProductUseCase](file:///C:/Projects/muni-backend/src/modules/chat/storefront/application/use-cases/search_product.use-case.ts) | Eligible |
| `get_user_rewards_info_by_member_code` | `memberCode` (string) | [GetRewardByMemberCodeUseCase](file:///C:/Projects/muni-backend/src/modules/loyalty/application/use-cases/get-reward-by-member-code.use-case.ts) | **Strictly Blocked** (Contains personal customer data) |
| `get_order_status` | `orderCode` (string), `storeId` (string, optional) | [GetOrderStatusUseCase](file:///C:/Projects/muni-backend/src/modules/chat/storefront/application/use-cases/get-order-status.use-case.ts) | Eligible (but tool registration is currently commented out in bootstrap) |

### Registering a New Tool
To expose a new operation to the storefront chat assistant:
1.  Add the JSON definition schema in the `TOOLS` array within [tool-definition.ts](file:///C:/Projects/muni-backend/src/modules/chat/storefront/application/tools/tool-definition.ts).
2.  If the tool accesses customer-specific information, add its string name to `PERSONAL_DATA_TOOLS` inside [chat.use-case.ts](file:///C:/Projects/muni-backend/src/modules/chat/storefront/application/use-cases/chat.use-case.ts).
3.  Inject the relevant business Use Case in [tool-bootstrap.service.ts](file:///C:/Projects/muni-backend/src/modules/chat/storefront/application/tools/tool-bootstrap.service.ts) and register the tool handler inside the `onModuleInit()` hook:
    ```typescript
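    // `args` carries the arguments the LLM supplied for this tool call.
    this.registry.register('my_new_tool_name', async (args) => {
      // The JSON round-trip yields a plain copy of `args` before destructuring.
      const { param1 } = JSON.parse(JSON.stringify(args));
      return await this.myNewUseCase.execute(param1);
    });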
    ```
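
For orientation, the registry side of that `register` call can be pictured as a name-to-handler map. The sketch below is an assumption about the general shape, not the `ToolRegistry` source: `ToolRegistrySketch` and `ToolHandler` are hypothetical names, and rejecting duplicate or unknown tool names is a choice made for this example.

```typescript
type ToolHandler = (args: Record<string, unknown>) => Promise<unknown>;

class ToolRegistrySketch {
  private readonly handlers = new Map<string, ToolHandler>();

  // Binds a tool name (as declared in TOOLS) to the function that runs it.
  register(toolName: string, handler: ToolHandler): void {
    if (this.handlers.has(toolName)) {
      throw new Error(`Tool already registered: ${toolName}`);
    }
    this.handlers.set(toolName, handler);
  }

  has(toolName: string): boolean {
    return this.handlers.has(toolName);
  }

  // Looks up the handler for a tool call requested by the LLM and runs it.
  async execute(toolName: string, args: Record<string, unknown>): Promise<unknown> {
    const handler = this.handlers.get(toolName);
    if (!handler) throw new Error(`Unknown tool: ${toolName}`);
    return handler(args);
  }
}
```

Failing loudly on an unknown name makes a mismatch between `TOOLS` and the bootstrap registrations visible, for example when a definition exists but its registration is commented out.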