InnoGen AP — Solution Architecture Document
Version 1.0.0 | June 2026 | InnoWave360 Consulting
Table of Contents
- Executive Summary
- System Overview & Design Principles
- End-to-End Architecture Flowchart
- PII Masking Layer
- Agent Architecture
- Data Models & State Schema
- FastAPI Service Layer
- Infrastructure & Docker Architecture
- Key Sequence Flows
- Technology Stack
- Matching Engine Detail
- Non-Functional Requirements
- Deployment Environments
- Success Metrics & KPIs
1. Executive Summary
InnoGen AP is an AI-native, agentic Accounts Payable platform. It automates the full invoice-to-pay lifecycle by orchestrating a network of specialised AI agents — each responsible for a discrete processing step — from inbox monitoring and OCR extraction through PO/GRN matching, compliance checks, exception management, cost allocation, accounting entry generation, and ERP posting.
The platform is built on LangGraph for stateful agent orchestration, Pydantic AI for type-safe agent definitions and tool calling, PostgreSQL for transactional persistence, and FastAPI as the API service layer. All components are containerised with Docker Compose. A PII masking layer, embedded early in the ingestion pipeline, is a first-class architectural component.
2. System Overview & Design Principles
InnoGen AP operates as a headless intelligence engine. Business-facing UIs (AP portals, dashboards, approval workflows) are consumers of the FastAPI service layer and are out of scope for this document.
Design Principles
| Principle | Description |
| Agent Isolation | Each agent is stateless between invocations. Communication flows only through the LangGraph state graph and typed Pydantic models. |
| PII-First Design | All inbound documents pass through a PII Masking Agent before any data leaves the secure ingestion boundary. |
| Confidence-Gated Routing | Every agent emits a confidence score. Scores below thresholds gate to human-in-the-loop queues. |
| Idempotent Processing | All processing steps are idempotent — enforced via invoice hash deduplication. |
| Audit-Native | Every agent decision, tool call, score, and state transition is persisted to an immutable audit log. |
| ERP-Agnostic | The ERP adapter layer abstracts SAP / Oracle / Dynamics behind a canonical document schema. |
| Dockerised Everything | All services run as Docker containers orchestrated via Docker Compose. |
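The Confidence-Gated Routing principle can be illustrated with a small router function of the kind that could back a conditional edge in the state graph. This is a minimal sketch only: the threshold values, the step names used as keys, and the `route_by_confidence` helper are hypothetical and are not taken from the platform's configuration.

```python
# Hypothetical per-step thresholds; real values would come from configuration.
CONFIDENCE_THRESHOLDS = {"extract": 0.90, "match": 0.85}
DEFAULT_THRESHOLD = 0.80


def route_by_confidence(step: str, confidence: float) -> str:
    """Return the next route: carry on, or divert to a human-in-the-loop queue."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence must be in [0, 1], got {confidence}")
    threshold = CONFIDENCE_THRESHOLDS.get(step, DEFAULT_THRESHOLD)
    return "continue" if confidence >= threshold else "human_review"
```

Keeping the decision in a pure function like this makes the gating rule easy to unit test independently of the orchestration layer.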
3. End-to-End Architecture Flowchart
[Diagram]
4. PII Masking Layer
The PII Masking Layer is interposed between Intake & Ingestion and Document Processing. It ensures personally identifiable and commercially sensitive data is never transmitted to AI inference services in raw form.
4.1 Entity Masking Map
| Entity Type | Examples | Masking Strategy |
| Vendor PII | Vendor name, address, contact name | Token substitution (VENDOR_001) |
| Financial IDs | Bank account, IBAN, SWIFT | Deterministic hash token |
| Tax Identifiers | PAN, TAN, ABN, VAT, GST registration | Category token + vault reference |
| Invoice Numbers | Vendor invoice numbers | Pseudonymised (INV_TOKEN_xxx) |
| PO Numbers | Purchase order identifiers | Pseudonymised (PO_TOKEN_xxx) |
| Email Addresses | AP contact, vendor contact | CONTACT_EMAIL_n |
| Phone Numbers | Phone / fax patterns | PHONE_n |
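The numbered-token strategies in the map above (CONTACT_EMAIL_n, PHONE_n) can be sketched as follows. This is an illustration under stated assumptions, not the masking agent itself: simple regular expressions stand in for the NER engine, the returned dictionary stands in for the vault, and the names `mask_text` and `unmask_text` are hypothetical.

```python
import re
from collections import defaultdict

# Deliberately simple patterns; a production masker would use NER plus rules.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")
TOKEN_RE = re.compile(r"\b(?:CONTACT_EMAIL|PHONE)_\d+\b")


def mask_text(text: str) -> tuple[str, dict[str, str]]:
    """Replace emails and phone numbers with numbered tokens.

    Returns the masked text and a token -> original map. A repeated value
    receives the same token every time it appears in the document.
    """
    token_map: dict[str, str] = {}
    seen: dict[str, str] = {}
    counters: dict[str, int] = defaultdict(int)

    def substitute(prefix: str):
        def _replace(match: re.Match) -> str:
            original = match.group(0)
            if original not in seen:
                counters[prefix] += 1
                token = f"{prefix}_{counters[prefix]}"
                seen[original] = token
                token_map[token] = original
            return seen[original]
        return _replace

    masked = EMAIL_RE.sub(substitute("CONTACT_EMAIL"), text)
    masked = PHONE_RE.sub(substitute("PHONE"), masked)
    return masked, token_map


def unmask_text(masked: str, token_map: dict[str, str]) -> str:
    """Restore original values; unknown tokens are left untouched."""
    return TOKEN_RE.sub(lambda m: token_map.get(m.group(0), m.group(0)), masked)
```

Matching whole tokens on de-masking (rather than plain string replacement) avoids PHONE_1 being substituted inside PHONE_10.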
4.2 PII Masking Sequence
[Diagram]
4.3 De-masking on Output
[Diagram]
5. Agent Architecture
5.1 LangGraph StateGraph
[Diagram]
5.2 Pydantic AI Agent Pattern
from pydantic_ai import Agent, RunContext
from pydantic_ai.models.anthropic import AnthropicModel

from models.invoice import ExtractionOutput, InvoiceContext
# POData, VendorData and EXTRACTION_SYSTEM_PROMPT are defined elsewhere
# in the codebase; their imports are omitted here.

extraction_agent = Agent(
    model=AnthropicModel("claude-sonnet-4-6"),
    deps_type=InvoiceContext,  # dependencies exposed to tools as ctx.deps
    result_type=ExtractionOutput,
    system_prompt=EXTRACTION_SYSTEM_PROMPT,
    retries=3,
)


@extraction_agent.tool
async def fetch_po_reference(ctx: RunContext[InvoiceContext], po_number: str) -> POData:
    """Fetch PO details from database for cross-reference during extraction."""
    return await ctx.deps.db.get_po(po_number)


@extraction_agent.tool
async def lookup_vendor_master(ctx: RunContext[InvoiceContext], vendor_token: str) -> VendorData:
    """Look up vendor master record by masked token."""
    return await ctx.deps.db.get_vendor_by_token(vendor_token)
5.3 Agent Catalogue
| Agent | LangGraph Node | Key Tools | Outputs |
| Intake Agent | intake | email_reader, file_classifier | RawDocument, source_meta |
| PII Masking Agent | pii_mask | ner_engine, vault_write | MaskedDocument, pii_token_map |
| Extraction Agent | extract | fetch_po_reference, lookup_vendor | ExtractionOutput + confidence |
| Validation Agent | validate | tax_lookup, field_rules_engine | ValidationResult per field |
| Matching Agent | match | fetch_po, fetch_grn, tolerance_check | MatchResult (2-way/3-way) |
| Audit Agent | audit | duplicate_check, fraud_score, gst_validate | AuditResult |
| Cost Allocation Agent | cost_alloc | gl_rules_lookup, history_similarity | CostAllocationResult |
| Accounting Agent | accounting | journal_template, accrual_engine | JournalEntry (ERP-ready) |
| Exception Agent | exception | queue_assign, sla_compute, notify | ExceptionRecord |
| ERP Integration Agent | erp_post | erp_adapter, pii_unmask | ERPPostingResult |
| Insights Agent | insights | kpi_emit, metrics_publish | KPI events |
6. Data Models & State Schema
6.1 InvoiceState (LangGraph)
# models/state.py
import operator
from typing import Annotated, TypedDict

# The model and enum types used below (RawDocument, ExtractionOutput,
# ProcessingStep, ...) are defined in the models package; imports omitted.


class InvoiceState(TypedDict):
    invoice_id: str
    raw_document: RawDocument
    masked_document: MaskedDocument | None
    pii_token_map: dict[str, str] | None
    extraction: ExtractionOutput | None
    validation: ValidationResult | None
    match_result: MatchResult | None
    audit_result: AuditResult | None
    cost_allocation: CostAllocationResult | None
    journal_entry: JournalEntry | None
    # operator.add reducer: node updates are appended rather than overwritten
    exception_records: Annotated[list[ExceptionRecord], operator.add]
    erp_result: ERPPostingResult | None
    audit_trail: Annotated[list[AuditEvent], operator.add]
    current_step: ProcessingStep
    human_approved: bool
    override_reason: str | None
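The two fields annotated with `operator.add` behave differently from the rest of the state: a node's update to them is combined with the existing value instead of replacing it. The sketch below mimics that merge behaviour in plain Python so the semantics are visible. It is not LangGraph's implementation, and `MiniState` and `apply_update` are hypothetical names used only for illustration.

```python
import operator
from typing import Annotated, TypedDict, get_type_hints


class MiniState(TypedDict):
    current_step: str                                # plain field: last write wins
    audit_trail: Annotated[list[str], operator.add]  # reducer field: appended


def apply_update(state: dict, update: dict, schema: type) -> dict:
    """Merge a node's partial update into state, honouring Annotated reducers."""
    hints = get_type_hints(schema, include_extras=True)
    merged = dict(state)
    for key, value in update.items():
        metadata = getattr(hints[key], "__metadata__", ())
        reducer = metadata[0] if metadata else None
        if callable(reducer) and key in merged:
            merged[key] = reducer(merged[key], value)  # e.g. list + list
        else:
            merged[key] = value
    return merged


state = {"current_step": "extract", "audit_trail": ["intake ok"]}
state = apply_update(
    state,
    {"current_step": "validate", "audit_trail": ["extract ok"]},
    MiniState,
)
```

This is why agents can each return only their own audit events: the accumulated `audit_trail` and `exception_records` are built up by the reducer rather than by any single node.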
6.2 Core Pydantic Models
# models/invoice.py
from datetime import date
from decimal import Decimal
from typing import Literal

from pydantic import BaseModel, Field


class ExtractionOutput(BaseModel):
    vendor_token: str                        # masked vendor reference
    invoice_number_token: str                # masked invoice number
    invoice_date: date
    due_date: date | None
    currency: str
    subtotal: Decimal
    tax_amount: Decimal
    total_amount: Decimal
    po_token: str | None                     # masked PO reference
    line_items: list[LineItem]               # LineItem is defined in this module (not shown)
    confidence: float = Field(ge=0, le=1)    # overall extraction confidence
    confidence_breakdown: dict[str, float]   # field-level confidence


class MatchResult(BaseModel):
    match_type: Literal["2-way", "3-way", "no-match"]
    overall_confidence: float
    vendor_match: bool
    value_variance_pct: Decimal
    qty_variance_pct: Decimal | None
    grn_confirmed: bool | None               # None for 2-way
    tolerance_breaches: list[str]
    auto_approvable: bool


class AuditResult(BaseModel):
    is_duplicate: bool
    duplicate_invoice_id: str | None
    fraud_score: float = Field(ge=0, le=1)
    fraud_flags: list[str]
    gst_valid: bool
    gst_issues: list[str]
    policy_violations: list[str]
    overall_clear: bool
6.3 PostgreSQL Schema — Entity Relationship
[Diagram]
7. FastAPI Service Layer
7.1 API Endpoint Map
[Diagram]
7.2 Request/Response Flow (Invoice Submission)
[Diagram]
8. Infrastructure & Docker Architecture
8.1 Container Network Topology
[Diagram]
8.2 Directory Structure
innogen-ap/
├── services/
│   ├── api/
│   │   ├── Dockerfile
│   │   ├── main.py
│   │   ├── routers/
│   │   │   ├── invoices.py
│   │   │   ├── exceptions.py
│   │   │   ├── analytics.py
│   │   │   └── health.py
│   │   ├── dependencies.py         # Auth, DB pool
│   │   └── middleware/
│   ├── agent_worker/
│   │   ├── Dockerfile
│   │   ├── graph.py                # LangGraph StateGraph definition
│   │   ├── agents/
│   │   │   ├── intake.py
│   │   │   ├── pii_masking.py
│   │   │   ├── extraction.py
│   │   │   ├── validation.py
│   │   │   ├── matching.py
│   │   │   ├── audit.py
│   │   │   ├── cost_allocation.py
│   │   │   ├── accounting.py
│   │   │   ├── exception.py
│   │   │   └── erp_integration.py
│   │   ├── models/
│   │   │   ├── state.py            # InvoiceState TypedDict
│   │   │   └── invoice.py          # Pydantic models
│   │   └── tools/                  # Agent tool implementations
│   ├── pii_service/
│   │   ├── Dockerfile
│   │   ├── main.py
│   │   ├── masking_agent.py        # spaCy NER + rules
│   │   └── vault.py                # AES-256 vault operations
│   └── ocr_service/
│       ├── Dockerfile
│       └── main.py
├── shared/
│   ├── models/                     # Shared Pydantic schemas
│   └── config/                     # Settings (pydantic-settings)
├── alembic/
│   └── versions/                   # DB migrations
├── nginx/
│   └── nginx.conf
├── prometheus/
│   └── prometheus.yml
├── docker-compose.yml
├── docker-compose.prod.yml
└── tests/
    ├── unit/
    ├── integration/
    └── e2e/
8.3 Docker Compose (Core Services)
# docker-compose.yml
version: "3.9"
services:
  nginx:
    image: nginx:alpine
    ports: ["80:80", "443:443"]
    volumes: ["./nginx/nginx.conf:/etc/nginx/nginx.conf:ro"]
    networks: [ap_public]
    depends_on: [api]
  api:
    build: ./services/api
    env_file: .env
    networks: [ap_public, ap_internal]
    depends_on: [postgres, redis, pii-service]
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/api/v1/health"]
      interval: 30s
  agent-worker:
    build: ./services/agent_worker
    env_file: .env
    networks: [ap_internal]
    depends_on: [postgres, redis, pii-service, ocr-service]
    deploy:
      replicas: 2
  pii-service:
    build: ./services/pii_service
    env_file: .env
    networks: [ap_internal] # NOT exposed externally
    depends_on: [postgres]
  ocr-service:
    build: ./services/ocr_service
    networks: [ap_internal]
  postgres:
    image: postgres:16-alpine
    env_file: .env
    volumes: ["pgdata:/var/lib/postgresql/data"]
    networks: [ap_internal]
  redis:
    image: redis:7-alpine
    networks: [ap_internal]
  prometheus:
    image: prom/prometheus
    volumes: ["./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml"]
    networks: [ap_internal]
  grafana:
    image: grafana/grafana
    networks: [ap_internal]
    ports: ["3000:3000"] # internal access only via VPN
volumes:
  pgdata:
networks:
  ap_public:
  ap_internal:
    internal: true # No external egress
9. Key Sequence Flows
9.1 Happy Path — Invoice to ERP (STP)
[Diagram]
9.2 Exception Path — Match Failure with Human Override
[Diagram]
9.3 Duplicate Invoice Detection
[Diagram]
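Duplicate detection rests on the invoice hash named under the Idempotent Processing principle. The sketch below shows one way such a fingerprint could be derived and checked. The choice of fields, the normalisation rules, and the names `invoice_hash` and `DuplicateRegistry` are assumptions made for illustration; the in-memory registry stands in for a unique index in the database.

```python
import hashlib
from decimal import Decimal
from typing import Optional


def invoice_hash(vendor_token: str, invoice_number: str, invoice_date: str,
                 total: Decimal, currency: str) -> str:
    """Build a stable SHA-256 fingerprint from normalised key fields."""
    parts = [
        vendor_token.strip().upper(),
        invoice_number.strip().upper(),
        invoice_date,                           # expected as an ISO 8601 date string
        str(total.quantize(Decimal("0.01"))),   # 100 and 100.00 hash identically
        currency.strip().upper(),
    ]
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()


class DuplicateRegistry:
    """In-memory stand-in for a unique constraint on the hash column."""

    def __init__(self) -> None:
        self._seen: dict[str, str] = {}

    def register(self, invoice_id: str, digest: str) -> Optional[str]:
        """Return the earlier invoice_id if the digest is known, else record it."""
        existing = self._seen.get(digest)
        if existing is not None:
            return existing
        self._seen[digest] = invoice_id
        return None
```

Because the hash is computed from normalised values, resubmitting the same invoice with different casing or whitespace resolves to the same digest, which is what makes reprocessing safe.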
10. Technology Stack
| Layer | Technology | Version | Rationale |
| Agent Orchestration | LangGraph | 0.2.x | Stateful, resumable graph; PostgreSQL checkpointing; conditional routing; HITL support |
| Agent Definition | Pydantic AI | 0.0.x | Type-safe agent I/O; structured tool calling; multi-LLM; automatic retry/validation |
| LLM Backend | Anthropic Claude | claude-sonnet-4-6 | Primary extraction + reasoning; swappable via Pydantic AI model string |
| API Framework | FastAPI + Uvicorn | 0.111+ | Async-native; auto OpenAPI; Pydantic models; WebSocket native |
| Database | PostgreSQL | 16 | ACID; LangGraph checkpointer; JSONB; row-level security for PII schema |
| Async DB Driver | asyncpg + SQLAlchemy | 2.x | High-throughput async; connection pooling |
| Task Queue | Celery | 5.x | Async agent dispatch; retry policies; priority queues |
| Message Broker | Redis | 7 | Celery broker + rate limit token bucket + WebSocket pub/sub |
| OCR | Tesseract / Azure Form Recognizer | — | Tesseract OSS for standard; Form Recognizer for complex layouts |
| Schema Migrations | Alembic | 1.x | Version-controlled migrations; CI/CD integrated |
| PII NER | spaCy | 3.x | Entity recognition for vendor names, addresses, tax IDs, bank details |
| Containerisation | Docker + Compose | v2 | Full-stack parity; all services containerised |
| Observability | Prometheus + Grafana | — | Agent latency, queue depth, STP rate; custom AP KPI dashboards |
| Structured Logging | structlog | — | JSON logs; correlated by invoice_id and trace_id |
| Distributed Tracing | OpenTelemetry | — | Agent-to-agent trace propagation |
| API Auth | OAuth2 + JWT RS256 | — | SSO-compatible; RBAC at FastAPI dependency layer |
| Secret Management | Docker Secrets → HashiCorp Vault | — | No secrets in env vars in production |
11. Matching Engine Detail
11.1 Match Type Decision
[Diagram]
11.2 Confidence Score Weights
| Match Component | Default Weight | Scoring Method | Exception Trigger |
| Vendor identity | 25% | Exact token match | < 1.0 → hard block |
| Total value match | 30% | % deviation from PO | > 2% → flag |
| Line quantity match | 25% | % deviation from PO/GRN | > 5% → flag |
| GRN confirmation | 15% | Boolean + date check | Missing on 3-way → flag |
| Currency & terms | 5% | Exact match | Mismatch → advisory |
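The default weights above sum to 100%, so the overall match confidence can be read as a weighted sum of component scores in [0, 1]. The sketch below works through that arithmetic. The weights come from the table; the linear mapping from percentage deviation to a score (`deviation_score`) is an assumption, since the table names the scoring method but not its exact formula, and the hard-block rule for vendor identity is not modelled here.

```python
# Default weights from the table above, expressed as fractions.
WEIGHTS = {
    "vendor_identity": 0.25,
    "total_value": 0.30,
    "line_quantity": 0.25,
    "grn_confirmation": 0.15,
    "currency_terms": 0.05,
}


def deviation_score(deviation_pct: float, tolerance_pct: float) -> float:
    """Assumed mapping: 1.0 at zero deviation, falling linearly to 0.0 at 2x tolerance."""
    if tolerance_pct <= 0:
        raise ValueError("tolerance_pct must be positive")
    return max(0.0, 1.0 - abs(deviation_pct) / (2 * tolerance_pct))


def overall_confidence(component_scores: dict[str, float]) -> float:
    """Weighted sum of component scores, each expected in [0, 1]."""
    missing = WEIGHTS.keys() - component_scores.keys()
    if missing:
        raise ValueError(f"missing component scores: {sorted(missing)}")
    return sum(WEIGHTS[name] * component_scores[name] for name in WEIGHTS)


scores = {
    "vendor_identity": 1.0,
    "total_value": deviation_score(1.0, tolerance_pct=2.0),    # 1% off the PO value
    "line_quantity": deviation_score(0.0, tolerance_pct=5.0),  # quantities agree
    "grn_confirmation": 1.0,
    "currency_terms": 1.0,
}
confidence = overall_confidence(scores)
```

With these inputs the total value component scores 0.75, giving 0.25 + 0.225 + 0.25 + 0.15 + 0.05 = 0.925 overall.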
11.3 Tolerance Configuration
Tolerance thresholds are stored in ap_core.match_config and are operator-configurable per vendor category, without code changes:
CREATE TABLE ap_core.match_config (
    config_id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    vendor_category        VARCHAR(100),   -- NULL = default for all
    value_tolerance        DECIMAL(5,2),   -- percentage, e.g. 2.00
    qty_tolerance          DECIMAL(5,2),   -- percentage, e.g. 5.00
    auto_approve_threshold DECIMAL(4,3),   -- e.g. 0.850
    effective_from         DATE NOT NULL,
    effective_to           DATE
);
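Given this table, the matching engine needs to pick the row in force for a vendor category on a given date, falling back to the NULL-category default. The sketch below shows one plausible resolution rule in plain Python. Treating `effective_to` as inclusive, and preferring the most recently effective row when several overlap, are assumptions; `MatchConfig` and `resolve_config` are hypothetical names mirroring the table columns.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal
from typing import Optional


@dataclass(frozen=True)
class MatchConfig:
    vendor_category: Optional[str]   # None = default for all categories
    value_tolerance: Decimal
    qty_tolerance: Decimal
    auto_approve_threshold: Decimal
    effective_from: date
    effective_to: Optional[date] = None


def resolve_config(rows: list[MatchConfig], vendor_category: Optional[str],
                   on: date) -> Optional[MatchConfig]:
    """Pick the row in force on `on`: category-specific first, then the default."""
    def in_force(row: MatchConfig) -> bool:
        return row.effective_from <= on and (
            row.effective_to is None or on <= row.effective_to
        )

    active = [row for row in rows if in_force(row)]
    for wanted in (vendor_category, None):
        candidates = [row for row in active if row.vendor_category == wanted]
        if candidates:
            # If several rows overlap, prefer the most recently effective one.
            return max(candidates, key=lambda row: row.effective_from)
    return None
```

In SQL the same rule would typically be a filtered query ordered so that the category-specific row sorts ahead of the NULL default.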
12. Non-Functional Requirements
12.1 Security Architecture
[Diagram]
13. Deployment Environments
| Config | Development | Staging | Production |
| Compose file | docker-compose.yml | docker-compose.staging.yml | docker-compose.prod.yml |
| LLM model | claude-haiku-4-5 | claude-sonnet-4-6 | claude-sonnet-4-6 / claude-opus-4-6 |
| OCR | Gemini/Mistral/Tesseract only | Gemini/Mistral/Tesseract | Gemini/Mistral/Tesseract |
| PostgreSQL | Single node | Single + WAL archive | Primary + read replica + WAL |
| Celery replicas | 1 | 2 | 4+ (autoscaled) |
| Flower | Enabled | Enabled | Disabled (Grafana replaces) |
| Secret store | .env file | .env (sops encrypted) | HashiCorp Vault / AWS Secrets Manager |
| PII vault | Local Postgres | Postgres + key rotation | Postgres + KMS (AWS/Azure) |
| Log level | DEBUG | INFO | WARNING + structured |
| OpenTelemetry | Disabled | Enabled (Jaeger) | Enabled (vendor OTEL) |
14. Success Metrics & KPIs (To Be Updated)
| KPI | Baseline Target | Stretch | Measurement Method |
| STP Rate | > 60% | > 75% | auto_posted / total_invoices |
| Extraction Accuracy | > 95% | > 98% | Correct fields / sampled fields |
| First-Pass Match Rate | > 85% | > 95% | Auto-matched / total matchable |
| Duplicate Detection | > 90% | > 99% | Duplicates caught / true duplicates |
| Average STP Time | < 60 seconds | < 30 seconds | intake_at to erp_confirmation_at |
| Exception Rate | < 40% | < 25% | Exception invoices / total |
| Manual Effort Reduction | > 70% | > 80% | FTE hours vs baseline |
| API Availability | > 99.5% | > 99.9% | Prometheus uptime probe |
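The ratio-style measurement methods above reduce to simple arithmetic. As a small worked sketch, the STP rate and a target check could be computed as follows; the helper names `stp_rate` and `meets_target` are hypothetical, and the invoice counts are made-up illustration values, not measured results.

```python
def stp_rate(auto_posted: int, total_invoices: int) -> float:
    """STP rate as defined in the table: auto_posted / total_invoices."""
    if total_invoices <= 0:
        raise ValueError("total_invoices must be positive")
    return auto_posted / total_invoices


def meets_target(value: float, target: float, higher_is_better: bool = True) -> bool:
    """Compare a KPI value with its target; use higher_is_better=False for '<' KPIs."""
    return value > target if higher_is_better else value < target


# Illustrative numbers only: 650 of 1,000 invoices posted without human touch.
rate = stp_rate(auto_posted=650, total_invoices=1000)
baseline_met = meets_target(rate, 0.60)   # baseline target: > 60%
stretch_met = meets_target(rate, 0.75)    # stretch target: > 75%
```

Here the rate is 0.65, which clears the baseline target but not the stretch target.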
Appendix A — Glossary
| Term | Definition |
| STP | Straight-Through Processing — invoice processed end-to-end without human intervention |
| 2-way Match | Invoice matched against Purchase Order only |
| 3-way Match | Invoice matched against Purchase Order and Goods Receipt Note |
| GRN | Goods Receipt Note — confirmation that goods have been physically received |
| PO | Purchase Order — authorised order issued by the buying organisation |
| PII | Personally Identifiable Information — vendor names, addresses, tax IDs, bank details |
| LangGraph | LangChain graph orchestration framework for stateful multi-step AI agent workflows |
| Pydantic AI | Type-safe AI agent framework built on Pydantic for structured LLM interactions |
| RBAC | Role-Based Access Control — permissions assigned to roles, not individuals |
| WAL | Write-Ahead Log — PostgreSQL mechanism for crash recovery and replication |
| NER | Named Entity Recognition — ML technique for identifying named entities in text |
| HITL | Human-in-the-Loop — human intervention gate within automated workflow |
| ERP | Enterprise Resource Planning — SAP, Oracle Fusion, Microsoft Dynamics, etc. |
| KMS | Key Management Service — cloud-managed cryptographic key service |
Appendix B — Document History
| Version | Date | Change | Author |
| 1.0 | June 2026 | Initial architecture document — full coverage including PII masking layer | InnoWave360 Consulting |
InnoGen AP Solution Architecture Document v1.0 — InnoWave360 Consulting — CONFIDENTIAL