InnoGen AP — Solution Architecture Document

Version 1.0.0 | June 2026 | InnoWave360 Consulting


Table of Contents

  1. Executive Summary
  2. System Overview & Design Principles
  3. End-to-End Architecture Flowchart
  4. PII Masking Layer
  5. Agent Architecture
  6. Data Models & State Schema
  7. FastAPI Service Layer
  8. Infrastructure & Docker Architecture
  9. Key Sequence Flows
  10. Technology Stack
  11. Matching Engine Detail
  12. Non-Functional Requirements
  13. Deployment Environments
  14. Success Metrics & KPIs

1. Executive Summary

InnoGen AP is an AI-native, agentic Accounts Payable platform. It automates the full invoice-to-pay lifecycle by orchestrating a network of specialised AI agents — each responsible for a discrete processing step — from inbox monitoring and OCR extraction through PO/GRN matching, compliance checks, exception management, cost allocation, accounting entry generation, and ERP posting.

The platform is built on LangGraph for stateful agent orchestration, Pydantic AI for type-safe agent definitions and tool calling, PostgreSQL for transactional persistence, and FastAPI as the API service layer. All components are containerised with Docker Compose. A PII masking layer, embedded early in the ingestion pipeline, is a first-class architectural component.


2. System Overview & Design Principles

InnoGen AP operates as a headless intelligence engine. Business-facing UIs (AP portals, dashboards, approval workflows) are consumers of the FastAPI service layer and are out of scope for this document.

Design Principles

Principle | Description
Agent Isolation | Each agent is stateless between invocations. Communication flows only through the LangGraph state graph and typed Pydantic models.
PII-First Design | All inbound documents pass through a PII Masking Agent before any data leaves the secure ingestion boundary.
Confidence-Gated Routing | Every agent emits a confidence score. A score below the configured threshold routes the invoice to a human-in-the-loop queue.
Idempotent Processing | All processing steps are idempotent, enforced via invoice hash deduplication.
Audit-Native | Every agent decision, tool call, score, and state transition is persisted to an immutable audit log.
ERP-Agnostic | The ERP adapter layer abstracts SAP / Oracle / Dynamics behind a canonical document schema.
Dockerised Everything | All services run as Docker containers orchestrated via Docker Compose.
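
The idempotency principle can be illustrated with a minimal sketch of invoice hash deduplication. The choice of fields in the hash, the normalisation rules, and the names invoice_fingerprint and DedupRegistry are assumptions made for illustration; the document does not specify them. The in-memory registry stands in for what would more plausibly be a unique-indexed column in PostgreSQL.

```python
import hashlib
from decimal import Decimal


def invoice_fingerprint(vendor_id: str, invoice_number: str,
                        invoice_date: str, total_amount: Decimal,
                        currency: str) -> str:
    """Return a stable SHA-256 hex digest over an invoice's identifying fields."""
    # Normalise so cosmetic differences (case, whitespace, trailing zeros)
    # do not yield different hashes for the same invoice.
    parts = [
        vendor_id.strip().upper(),
        invoice_number.strip().upper(),
        invoice_date.strip(),  # expected in ISO 8601 form, e.g. 2026-06-01
        str(total_amount.quantize(Decimal("0.01"))),
        currency.strip().upper(),
    ]
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()


class DedupRegistry:
    """In-memory stand-in for a unique-indexed hash column (hypothetical)."""

    def __init__(self) -> None:
        self._seen: dict[str, str] = {}

    def register(self, fingerprint: str, invoice_id: str) -> str:
        """Return the invoice_id that owns this fingerprint (first writer wins)."""
        return self._seen.setdefault(fingerprint, invoice_id)
```

Re-submitting the same invoice returns the original invoice_id instead of creating a second record, which is what makes a retried step safe to repeat.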

3. End-to-End Architecture Flowchart

[Diagram]

4. PII Masking Layer

The PII Masking Layer is interposed between Intake & Ingestion and Document Processing. It ensures personally identifiable and commercially sensitive data is never transmitted to AI inference services in raw form.

4.1 Entity Masking Map

Entity Type | Examples | Masking Strategy
Vendor PII | Vendor name, address, contact name | Token substitution (VENDOR_001)
Financial IDs | Bank account, IBAN, SWIFT | Deterministic hash token
Tax Identifiers | PAN, TAN, ABN, VAT, GST registration | Category token + vault reference
Invoice Numbers | Vendor invoice numbers | Pseudonymised (INV_TOKEN_xxx)
PO Numbers | Purchase order identifiers | Pseudonymised (PO_TOKEN_xxx)
Email Addresses | AP contact, vendor contact | CONTACT_EMAIL_n
Phone Numbers | Phone / fax patterns | PHONE_n
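
Three of the strategies in the map (sequential token substitution, deterministic hash tokens, and reversible de-masking) can be sketched as follows. This is an illustration under stated assumptions, not the service's implementation: the Masker class, the FIN_ prefix, the simplified regular expressions, and the use of HMAC-SHA256 for the deterministic token are all hypothetical, and a caller-supplied list of vendor names stands in for the NER engine. The in-memory token map stands in for the vault.

```python
import hashlib
import hmac
import re

# Simplified patterns for illustration only; production patterns would be stricter.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
IBAN_RE = re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b")


class Masker:
    """Replaces sensitive values with tokens and records the reverse mapping."""

    def __init__(self, secret: bytes) -> None:
        self._secret = secret
        self.token_map: dict[str, str] = {}   # token -> original value
        self._by_value: dict[str, str] = {}   # original value -> token
        self._counters: dict[str, int] = {}

    def _remember(self, token: str, value: str) -> str:
        self.token_map[token] = value
        self._by_value[value] = token
        return token

    def _sequential(self, prefix: str, value: str, padded: bool) -> str:
        """Token substitution: VENDOR_001, CONTACT_EMAIL_1, and so on."""
        if value in self._by_value:
            return self._by_value[value]
        n = self._counters.get(prefix, 0) + 1
        self._counters[prefix] = n
        token = f"{prefix}_{n:03d}" if padded else f"{prefix}_{n}"
        return self._remember(token, value)

    def _hashed(self, prefix: str, value: str) -> str:
        """Deterministic token: the same value and key always give the same token."""
        digest = hmac.new(self._secret, value.encode("utf-8"), hashlib.sha256)
        return self._remember(f"{prefix}_{digest.hexdigest()[:12].upper()}", value)

    def mask(self, text: str, vendor_names: list[str]) -> str:
        text = IBAN_RE.sub(lambda m: self._hashed("FIN", m.group()), text)
        text = EMAIL_RE.sub(
            lambda m: self._sequential("CONTACT_EMAIL", m.group(), padded=False), text)
        for name in vendor_names:  # stand-in for NER output
            if name in text:
                text = text.replace(name, self._sequential("VENDOR", name, padded=True))
        return text

    def unmask(self, text: str) -> str:
        # Longest tokens first so CONTACT_EMAIL_10 is not clobbered by CONTACT_EMAIL_1.
        for token in sorted(self.token_map, key=len, reverse=True):
            text = text.replace(token, self.token_map[token])
        return text
```

A keyed hash is used for the deterministic token so that the same bank identifier maps to the same token across documents without the token being reproducible by anyone who lacks the key.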

4.2 PII Masking Sequence

[Diagram]

4.3 De-masking on Output

[Diagram]

5. Agent Architecture

5.1 LangGraph StateGraph

[Diagram]

5.2 Pydantic AI Agent Pattern

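from pydantic_ai import Agent, RunContext
from pydantic_ai.models.anthropic import AnthropicModel
# Assumed locations for POData, VendorData and the prompt constant; adjust to the real package layout.
from models.invoice import ExtractionOutput, InvoiceContext, POData, VendorData
from prompts import EXTRACTION_SYSTEM_PROMPT

extraction_agent = Agent(
    model=AnthropicModel("claude-sonnet-4-6"),
    deps_type=InvoiceContext,        # type of ctx.deps inside the tools below
    result_type=ExtractionOutput,    # structured, validated output (Pydantic AI 0.0.x parameter name)
    system_prompt=EXTRACTION_SYSTEM_PROMPT,
    retries=3,
)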

@extraction_agent.tool
async def fetch_po_reference(ctx: RunContext[InvoiceContext], po_number: str) -> POData:
    """Fetch PO details from database for cross-reference during extraction."""
    return await ctx.deps.db.get_po(po_number)

@extraction_agent.tool
async def lookup_vendor_master(ctx: RunContext[InvoiceContext], vendor_token: str) -> VendorData:
    """Look up vendor master record by masked token."""
    return await ctx.deps.db.get_vendor_by_token(vendor_token)

5.3 Agent Catalogue

Agent | LangGraph Node | Key Tools | Outputs
Intake Agent | intake | email_reader, file_classifier | RawDocument, source_meta
PII Masking Agent | pii_mask | ner_engine, vault_write | MaskedDocument, pii_token_map
Extraction Agent | extract | fetch_po_reference, lookup_vendor_master | ExtractionOutput + confidence
Validation Agent | validate | tax_lookup, field_rules_engine | ValidationResult per field
Matching Agent | match | fetch_po, fetch_grn, tolerance_check | MatchResult (2-way/3-way)
Audit Agent | audit | duplicate_check, fraud_score, gst_validate | AuditResult
Cost Allocation Agent | cost_alloc | gl_rules_lookup, history_similarity | CostAllocationResult
Accounting Agent | accounting | journal_template, accrual_engine | JournalEntry (ERP-ready)
Exception Agent | exception | queue_assign, sla_compute, notify | ExceptionRecord
ERP Integration Agent | erp_post | erp_adapter, pii_unmask | ERPPostingResult
Insights Agent | insights | kpi_emit, metrics_publish | KPI events
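
Confidence-gated routing between the nodes listed above can be pictured as plain functions that read the state and return the name of the next node. The sketch below is an assumption-laden illustration: the node names come from the catalogue, but the ordering of nodes, the RoutingView type, and the 0.90 extraction threshold are hypothetical (0.85 mirrors the example auto-approve value in section 11.3).

```python
from typing import Optional, TypedDict

# Hypothetical thresholds; real values would come from configuration.
EXTRACTION_MIN_CONFIDENCE = 0.90
MATCH_AUTO_APPROVE = 0.85


class RoutingView(TypedDict):
    """Only the fields the routers read (a simplified stand-in for InvoiceState)."""
    extraction_confidence: float
    match_confidence: Optional[float]
    tolerance_breaches: list[str]


def route_after_extract(state: RoutingView) -> str:
    """Send low-confidence extractions to the exception node."""
    if state["extraction_confidence"] >= EXTRACTION_MIN_CONFIDENCE:
        return "validate"
    return "exception"


def route_after_match(state: RoutingView) -> str:
    """Continue to audit only for confident matches with no tolerance breach."""
    confidence = state["match_confidence"]
    if confidence is None or state["tolerance_breaches"]:
        return "exception"
    return "audit" if confidence >= MATCH_AUTO_APPROVE else "exception"
```

In LangGraph, a function of this shape is typically registered on the StateGraph with add_conditional_edges, which uses the returned string to select the next node.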

6. Data Models & State Schema

6.1 InvoiceState (LangGraph)

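# models/state.py
import operator
from typing import Annotated, TypedDict

# Assumed import location: the document does not show where these types are defined.
from models.invoice import (
    AuditEvent, AuditResult, CostAllocationResult, ERPPostingResult,
    ExceptionRecord, ExtractionOutput, JournalEntry, MaskedDocument,
    MatchResult, ProcessingStep, RawDocument, ValidationResult,
)
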
class InvoiceState(TypedDict):
    invoice_id:        str
    raw_document:      RawDocument
    masked_document:   MaskedDocument | None
    pii_token_map:     dict[str, str] | None
    extraction:        ExtractionOutput | None
    validation:        ValidationResult | None
    match_result:      MatchResult | None
    audit_result:      AuditResult | None
    cost_allocation:   CostAllocationResult | None
    journal_entry:     JournalEntry | None
    exception_records: Annotated[list[ExceptionRecord], operator.add]
    erp_result:        ERPPostingResult | None
    audit_trail:       Annotated[list[AuditEvent], operator.add]
    current_step:      ProcessingStep
    human_approved:    bool
    override_reason:   str | None
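
The two fields declared with Annotated[..., operator.add] behave differently from the rest. LangGraph merges each node's partial update into the state: a key annotated with a reducer is combined through that reducer, while any other key is overwritten. The following pure-Python sketch imitates that merge rule on a reduced state so the effect can be seen without running a graph. MiniState and apply_update are hypothetical names and are not part of LangGraph.

```python
import operator
from typing import Annotated, Any, Optional, TypedDict, get_args, get_origin, get_type_hints


class MiniState(TypedDict):
    """Cut-down stand-in for InvoiceState with string payloads."""
    current_step: str
    erp_result: Optional[str]
    audit_trail: Annotated[list[str], operator.add]


def apply_update(state: dict[str, Any], update: dict[str, Any]) -> dict[str, Any]:
    """Merge a node's partial update into the state, honouring reducers."""
    hints = get_type_hints(MiniState, include_extras=True)
    merged = dict(state)
    for key, value in update.items():
        hint = hints[key]
        if get_origin(hint) is Annotated:
            reducer = get_args(hint)[1]        # here: operator.add
            merged[key] = reducer(state[key], value)
        else:
            merged[key] = value                # plain keys: last write wins
    return merged
```

With this rule, each node returns only its new audit events, and the trail accumulates across nodes instead of being replaced.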

6.2 Core Pydantic Models

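# models/invoice.py
from datetime import date
from decimal import Decimal
from typing import Literal

from pydantic import BaseModel, Field

# LineItem is assumed to be defined elsewhere in this module (not shown).
class ExtractionOutput(BaseModel):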
    vendor_token: str                     # masked vendor reference
    invoice_number_token: str             # masked invoice number
    invoice_date: date
    due_date: date | None
    currency: str
    subtotal: Decimal
    tax_amount: Decimal
    total_amount: Decimal
    po_token: str | None                  # masked PO reference
    line_items: list[LineItem]
    confidence: float = Field(ge=0, le=1) # overall extraction confidence
    confidence_breakdown: dict[str, float] # field-level confidence, keyed by field name

class MatchResult(BaseModel):
    match_type: Literal["2-way", "3-way", "no-match"]
    overall_confidence: float = Field(ge=0, le=1)
    vendor_match: bool
    value_variance_pct: Decimal
    qty_variance_pct: Decimal | None
    grn_confirmed: bool | None           # None for 2-way
    tolerance_breaches: list[str]
    auto_approvable: bool

class AuditResult(BaseModel):
    is_duplicate: bool
    duplicate_invoice_id: str | None
    fraud_score: float = Field(ge=0, le=1)
    fraud_flags: list[str]
    gst_valid: bool
    gst_issues: list[str]
    policy_violations: list[str]
    overall_clear: bool

6.3 PostgreSQL Schema — Entity Relationship

[Diagram]

7. FastAPI Service Layer

7.1 API Endpoint Map

[Diagram]

7.2 Request/Response Flow (Invoice Submission)

[Diagram]

8. Infrastructure & Docker Architecture

8.1 Container Network Topology

[Diagram]

8.2 Directory Structure

innogen-ap/
├── services/
│   ├── api/
│   │   ├── Dockerfile
│   │   ├── main.py
│   │   ├── routers/
│   │   │   ├── invoices.py
│   │   │   ├── exceptions.py
│   │   │   ├── analytics.py
│   │   │   └── health.py
│   │   ├── dependencies.py        # Auth, DB pool
│   │   └── middleware/
│   ├── agent_worker/
│   │   ├── Dockerfile
│   │   ├── graph.py               # LangGraph StateGraph definition
│   │   ├── agents/
│   │   │   ├── intake.py
│   │   │   ├── pii_masking.py
│   │   │   ├── extraction.py
│   │   │   ├── validation.py
│   │   │   ├── matching.py
│   │   │   ├── audit.py
│   │   │   ├── cost_allocation.py
│   │   │   ├── accounting.py
│   │   │   ├── exception.py
│   │   │   └── erp_integration.py
│   │   ├── models/
│   │   │   ├── state.py           # InvoiceState TypedDict
│   │   │   └── invoice.py         # Pydantic models
│   │   └── tools/                 # Agent tool implementations
│   ├── pii_service/
│   │   ├── Dockerfile
│   │   ├── main.py
│   │   ├── masking_agent.py       # spaCy NER + rules
│   │   └── vault.py               # AES-256 vault operations
│   └── ocr_service/
│       ├── Dockerfile
│       └── main.py
├── shared/
│   ├── models/                    # Shared Pydantic schemas
│   └── config/                    # Settings (pydantic-settings)
├── alembic/
│   └── versions/                  # DB migrations
├── nginx/
│   └── nginx.conf
├── prometheus/
│   └── prometheus.yml
├── docker-compose.yml
├── docker-compose.prod.yml
└── tests/
    ├── unit/
    ├── integration/
    └── e2e/

8.3 Docker Compose (Core Services)

# docker-compose.yml
version: "3.9"

services:
  nginx:
    image: nginx:alpine
    ports: ["80:80", "443:443"]
    volumes: ["./nginx/nginx.conf:/etc/nginx/nginx.conf:ro"]
    networks: [ap_public]
    depends_on: [api]

  api:
    build: ./services/api
    env_file: .env
    networks: [ap_public, ap_internal]
    depends_on: [postgres, redis, pii-service]
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/api/v1/health"]
      interval: 30s

  agent-worker:
    build: ./services/agent_worker
    env_file: .env
    networks: [ap_internal]
    depends_on: [postgres, redis, pii-service, ocr-service]
    deploy:
      replicas: 2

  pii-service:
    build: ./services/pii_service
    env_file: .env
    networks: [ap_internal]  # NOT exposed externally
    depends_on: [postgres]

  ocr-service:
    build: ./services/ocr_service
    networks: [ap_internal]

  postgres:
    image: postgres:16-alpine
    env_file: .env
    volumes: ["pgdata:/var/lib/postgresql/data"]
    networks: [ap_internal]

  redis:
    image: redis:7-alpine
    networks: [ap_internal]

  prometheus:
    image: prom/prometheus
    volumes: ["./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml"]
    networks: [ap_internal]

  grafana:
    image: grafana/grafana
    networks: [ap_internal]
    ports: ["3000:3000"]  # internal access only via VPN

volumes:
  pgdata:

networks:
  ap_public:
  ap_internal:
    internal: true  # No external egress

9. Key Sequence Flows

9.1 Happy Path — Invoice to ERP (STP)

[Diagram]

9.2 Exception Path — Match Failure with Human Override

[Diagram]

9.3 Duplicate Invoice Detection

[Diagram]

10. Technology Stack

Layer | Technology | Version | Rationale
Agent Orchestration | LangGraph | 0.2.x | Stateful, resumable graph; PostgreSQL checkpointing; conditional routing; HITL support
Agent Definition | Pydantic AI | 0.0.x | Type-safe agent I/O; structured tool calling; multi-LLM; automatic retry/validation
LLM Backend | Anthropic Claude | claude-sonnet-4-6 | Primary extraction + reasoning; swappable via Pydantic AI model string
API Framework | FastAPI + Uvicorn | 0.111+ | Async-native; auto OpenAPI; Pydantic models; WebSocket native
Database | PostgreSQL | 16 | ACID; LangGraph checkpointer; JSONB; row-level security for PII schema
Async DB Driver | asyncpg + SQLAlchemy | 2.x | High-throughput async; connection pooling
Task Queue | Celery | 5.x | Async agent dispatch; retry policies; priority queues
Message Broker | Redis | 7 | Celery broker + rate limit token bucket + WebSocket pub/sub
OCR | Tesseract / Azure Form Recognizer | - | Tesseract OSS for standard; Form Recognizer for complex layouts
Schema Migrations | Alembic | 1.x | Version-controlled migrations; CI/CD integrated
PII NER | spaCy | 3.x | Entity recognition for vendor names, addresses, tax IDs, bank details
Containerisation | Docker + Compose | v2 | Full-stack parity; all services containerised
Observability | Prometheus + Grafana | - | Agent latency, queue depth, STP rate; custom AP KPI dashboards
Structured Logging | structlog | - | JSON logs; correlated by invoice_id and trace_id
Distributed Tracing | OpenTelemetry | - | Agent-to-agent trace propagation
API Auth | OAuth2 + JWT | RS256 | SSO-compatible; RBAC at FastAPI dependency layer
Secret Management | Docker Secrets → HashiCorp Vault | - | No secrets in env vars in production

11. Matching Engine Detail

11.1 Match Type Decision

[Diagram]

11.2 Confidence Score Weights

Match Component | Default Weight | Scoring Method | Exception Trigger
Vendor identity | 25% | Exact token match | < 1.0 → hard block
Total value match | 30% | % deviation from PO | > 2% → flag
Line quantity match | 25% | % deviation from PO/GRN | > 5% → flag
GRN confirmation | 15% | Boolean + date check | Missing on 3-way → flag
Currency & terms | 5% | Exact match | Mismatch → advisory
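
How the weights might combine into a single confidence value is sketched below. The weights and the 2% / 5% triggers are taken from the table. Everything else is an assumption made for illustration: the linear scoring curve in deviation_score, the renormalisation of weights when no GRN applies (2-way match), and the names score_match and MatchScore.

```python
from dataclasses import dataclass, field
from typing import Optional

WEIGHTS = {
    "vendor": 0.25,
    "total_value": 0.30,
    "line_quantity": 0.25,
    "grn": 0.15,
    "currency_terms": 0.05,
}


@dataclass
class MatchScore:
    overall_confidence: float
    hard_block: bool
    flags: list[str] = field(default_factory=list)


def deviation_score(deviation_pct: float, tolerance_pct: float) -> float:
    """Assumed curve: 1.0 at zero deviation, falling linearly to 0.0 at twice the tolerance."""
    return max(0.0, 1.0 - abs(deviation_pct) / (2 * tolerance_pct))


def score_match(vendor_exact: bool, value_dev_pct: float, qty_dev_pct: float,
                grn_confirmed: Optional[bool], currency_terms_match: bool,
                value_tol_pct: float = 2.0, qty_tol_pct: float = 5.0) -> MatchScore:
    scores = {
        "vendor": 1.0 if vendor_exact else 0.0,
        "total_value": deviation_score(value_dev_pct, value_tol_pct),
        "line_quantity": deviation_score(qty_dev_pct, qty_tol_pct),
        "currency_terms": 1.0 if currency_terms_match else 0.0,
    }
    if grn_confirmed is not None:  # None means 2-way match: GRN weight is dropped
        scores["grn"] = 1.0 if grn_confirmed else 0.0

    total_weight = sum(WEIGHTS[k] for k in scores)
    overall = sum(WEIGHTS[k] * scores[k] for k in scores) / total_weight

    flags = []
    if abs(value_dev_pct) > value_tol_pct:
        flags.append("value_variance")
    if abs(qty_dev_pct) > qty_tol_pct:
        flags.append("qty_variance")
    if grn_confirmed is False:
        flags.append("grn_missing")
    if not currency_terms_match:
        flags.append("currency_terms_advisory")

    return MatchScore(overall, hard_block=not vendor_exact, flags=flags)
```

For example, a 3-way match that is perfect except for a 3% value deviation scores 0.25 + 0.30 × 0.25 + 0.25 + 0.15 + 0.05 = 0.775 under these assumptions and carries a value_variance flag.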

11.3 Tolerance Configuration

Tolerance thresholds are stored in ap_core.match_config and are operator-configurable per vendor category, without code changes:

CREATE TABLE ap_core.match_config (
    config_id       UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    vendor_category VARCHAR(100),   -- NULL = default for all
    value_tolerance DECIMAL(5,2),   -- percentage, e.g. 2.00
    qty_tolerance   DECIMAL(5,2),   -- percentage, e.g. 5.00
    auto_approve_threshold DECIMAL(4,3), -- e.g. 0.850
    effective_from  DATE NOT NULL,
    effective_to    DATE
);
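
Resolving which row applies to a given invoice could look like the sketch below, which mirrors the table's columns in a dataclass. The resolution rules are assumptions, since the document does not state them: a category-specific row takes precedence over the NULL default, effective_to is treated as inclusive, and the most recent effective_from wins among overlapping rows. The names MatchConfig and resolve_config are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal
from typing import Optional


@dataclass(frozen=True)
class MatchConfig:
    vendor_category: Optional[str]      # None = default for all categories
    value_tolerance: Decimal            # percentage, e.g. 2.00
    qty_tolerance: Decimal              # percentage, e.g. 5.00
    auto_approve_threshold: Decimal     # e.g. 0.850
    effective_from: date
    effective_to: Optional[date] = None


def resolve_config(rows: list[MatchConfig], vendor_category: Optional[str],
                   on_date: date) -> MatchConfig:
    """Pick the active row for a category, falling back to the default row."""
    active = [
        r for r in rows
        if r.effective_from <= on_date
        and (r.effective_to is None or on_date <= r.effective_to)
    ]
    specific = [r for r in active if r.vendor_category == vendor_category]
    candidates = specific or [r for r in active if r.vendor_category is None]
    if not candidates:
        raise LookupError(f"no match_config active on {on_date} for {vendor_category!r}")
    # Among overlapping rows, prefer the most recently effective one.
    return max(candidates, key=lambda r: r.effective_from)
```

Because the lookup depends only on data, an operator can tighten a tolerance for one vendor category by inserting a new dated row, with no code change.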

12. Non-Functional Requirements

12.1 Security Architecture

[Diagram]

13. Deployment Environments

Config | Development | Staging | Production
Compose file | docker-compose.yml | docker-compose.staging.yml | docker-compose.prod.yml
LLM model | claude-haiku-4-5 | claude-sonnet-4-6 | claude-sonnet-4-6 / claude-opus-4-6
OCR | Gemini/Mistral/Tesseract only | Gemini/Mistral/Tesseract | Gemini/Mistral/Tesseract
PostgreSQL | Single node | Single + WAL archive | Primary + read replica + WAL
Celery replicas | 1 | 2 | 4+ (autoscaled)
Flower | Enabled | Enabled | Disabled (Grafana replaces)
Secret store | .env file | .env (sops encrypted) | HashiCorp Vault / AWS Secrets Manager
PII vault | Local Postgres | Postgres + key rotation | Postgres + KMS (AWS/Azure)
Log level | DEBUG | INFO | WARNING + structured
OpenTelemetry | Disabled | Enabled (Jaeger) | Enabled (vendor OTEL)

14. Success Metrics & KPIs (To Be Updated)

KPI | Baseline | Target | Stretch | Measurement Method
STP Rate | | > 60% | > 75% | auto_posted / total_invoices
Extraction Accuracy | | > 95% | > 98% | Correct fields / sampled fields
First-Pass Match Rate | | > 85% | > 95% | Auto-matched / total matchable
Duplicate Detection | | > 90% | > 99% | Duplicates caught / true duplicates
Average STP Time | | < 60 seconds | < 30 seconds | intake_at to erp_confirmation_at
Exception Rate | | < 40% | < 25% | Exception invoices / total
Manual Effort Reduction | | > 70% | > 80% | FTE hours vs baseline
API Availability | | > 99.5% | > 99.9% | Prometheus uptime probe
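
Two of the measurement methods in the table (STP Rate and Average STP Time) reduce to simple arithmetic over per-invoice records. The sketch below shows one way to compute them. The InvoiceOutcome record and its auto_posted flag are hypothetical; only the intake_at and erp_confirmation_at names come from the table.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class InvoiceOutcome:
    intake_at: datetime
    erp_confirmation_at: Optional[datetime]  # None if not yet posted
    auto_posted: bool                        # True when posted with no human intervention


def stp_rate(outcomes: list[InvoiceOutcome]) -> float:
    """auto_posted / total_invoices, as a fraction between 0 and 1."""
    if not outcomes:
        return 0.0
    return sum(o.auto_posted for o in outcomes) / len(outcomes)


def average_stp_seconds(outcomes: list[InvoiceOutcome]) -> Optional[float]:
    """Mean intake-to-ERP-confirmation time over straight-through invoices only."""
    durations = [
        (o.erp_confirmation_at - o.intake_at).total_seconds()
        for o in outcomes
        if o.auto_posted and o.erp_confirmation_at is not None
    ]
    return sum(durations) / len(durations) if durations else None
```

Restricting the average to auto-posted invoices keeps human queue time out of the STP latency figure; whether that matches the intended definition is for the KPI owners to confirm.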

Appendix A — Glossary

Term | Definition
STP | Straight-Through Processing: invoice processed end-to-end without human intervention
2-way Match | Invoice matched against Purchase Order only
3-way Match | Invoice matched against Purchase Order and Goods Receipt Note
GRN | Goods Receipt Note: confirmation that goods have been physically received
PO | Purchase Order: authorised order issued by the buying organisation
PII | Personally Identifiable Information: vendor names, addresses, tax IDs, bank details
LangGraph | LangChain graph orchestration framework for stateful multi-step AI agent workflows
Pydantic AI | Type-safe AI agent framework built on Pydantic for structured LLM interactions
RBAC | Role-Based Access Control: permissions assigned to roles, not individuals
WAL | Write-Ahead Log: PostgreSQL mechanism for crash recovery and replication
NER | Named Entity Recognition: ML technique for identifying named entities in text
HITL | Human-in-the-Loop: human intervention gate within automated workflow
ERP | Enterprise Resource Planning: SAP, Oracle Fusion, Microsoft Dynamics, etc.
KMS | Key Management Service: cloud-managed cryptographic key service

Appendix B — Document History

Version | Date | Change | Author
1.0 | June 2026 | Initial architecture document: full coverage including PII masking layer | InnoWave360 Consulting

InnoGen AP Solution Architecture Document v1.0 — InnoWave360 Consulting — CONFIDENTIAL