ElaRide — Implementation Plan

1. Toolchain for AI-Driven Development

Before picking tools, here's the core problem: Claude Code loses context across sessions in a large monorepo. Every tool below targets this in a different way. Most overlap — pick one per concern.

Use these:

Headroom (headroom-ai) is the most immediately practical tool for this project. It wraps Claude Code as a local proxy and compresses tool outputs, file reads, and logs before they reach the model, by a reported 60–95%. On a large monorepo like ElaRide (5 apps, 12 NestJS modules, shared packages), the context window will fill up constantly. Run it as headroom wrap claude. Its headroom learn command mines failed sessions and writes corrections back to CLAUDE.md automatically. Use it from day one.

CLAUDE.md (native Claude Code) is the highest-leverage thing after Headroom. Every app and the API service get their own CLAUDE.md. These are loaded automatically at every session. Keep them under 150 lines; tighter is better. The three scopes (project, local, user) let you layer project-wide rules, machine-specific overrides, and personal preferences without conflict.

Claude Code subagents are built in and should be used for domain-isolated tasks. When implementing a NestJS module, delegate the work to a subagent with a clean context so it doesn't drag in unrelated history.

Skip or defer:

context-mode, claude-mem, gsd-core: these overlap heavily with CLAUDE.md + Headroom and add complexity without a clear advantage at this stage.
HiveMind (DeepLake): vector-DB project memory is powerful but adds infra complexity. Revisit if the codebase grows past ~10 domains.

Summary: Headroom + CLAUDE.md per app + subagents for domains is the recommended stack. It is simple and needs no extra hosted services.

2. Provider Abstraction Strategy

Swapping providers without breaking the system is achievable, and NestJS's DI system makes it clean. The pattern is the same across SMS, email, file storage, maps, and payments.

The approach: Define an interface (contract) per concern. Each provider implements that interface. NestJS resolves the correct implementation from an environment variable at startup. Business logic only ever talks to the interface, never the provider SDK directly.

ISmsProvider     →  SevenProvider | PlivoProvider | MockProvider (test)
IEmailProvider   →  BrevoProvider | MockProvider (test)
IStorageProvider →  R2Provider | S3Provider | LocalProvider (dev)
IMapProvider     →  GoogleMapsProvider | MockProvider (test)
IPaymentProvider →  StripeProvider | MockProvider (test)

Each interface lives in packages/shared-types. Each concrete implementation is a NestJS provider registered conditionally. A factory module reads ACTIVE_SMS_PROVIDER, ACTIVE_EMAIL_PROVIDER, and ACTIVE_STORAGE_PROVIDER from env and registers the matching class.
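A minimal sketch of the contract-plus-factory pattern, written without NestJS imports so it runs standalone. The send signature, the SmsResult shape, the provider keys ("mock", "seven"), and the fallback to the mock when the variable is unset are all assumptions for illustration. The variable name ACTIVE_SMS_PROVIDER follows the Environment Configuration section. In a NestJS module, createSmsProvider would typically be called from a custom provider's useFactory.

```typescript
// Contract that business logic depends on (would live in packages/shared-types).
interface SmsResult {
  accepted: boolean;
  providerId: string; // hypothetical field, for illustration only
}

interface ISmsProvider {
  send(to: string, body: string): Promise<SmsResult>;
}

// Dev/test implementation: logs instead of calling an external API.
class MockSmsProvider implements ISmsProvider {
  async send(to: string, body: string): Promise<SmsResult> {
    console.log(`[mock-sms] to=${to} body=${body}`);
    return { accepted: true, providerId: "mock" };
  }
}

// Stub for a real adapter; the SDK call is deliberately left out of this sketch.
class SevenProvider implements ISmsProvider {
  async send(to: string, _body: string): Promise<SmsResult> {
    throw new Error(`SevenProvider is a stub in this sketch (to=${to})`);
  }
}

type Env = Record<string, string | undefined>;

// Resolve the implementation once at startup; unknown values fail fast.
function createSmsProvider(env: Env): ISmsProvider {
  const key = env["ACTIVE_SMS_PROVIDER"] ?? "mock"; // assumed default
  switch (key) {
    case "mock":
      return new MockSmsProvider();
    case "seven":
      return new SevenProvider();
    default:
      throw new Error(`Unknown ACTIVE_SMS_PROVIDER: ${key}`);
  }
}
```

Business code would receive an ISmsProvider and never import SevenProvider directly, which is what keeps a provider swap down to an adapter class plus an env var.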

In practice: swapping from Brevo to another email service means writing one adapter class that implements IEmailProvider and changing one env var. No business logic changes. Every external call is also mockable in tests: a MockProvider returns fixtures, so no real API calls are needed.

For local development: LocalStorageProvider writes to disk, MockSmsProvider logs to console, MockEmailProvider saves to a local file. No external accounts needed to run the full stack locally.

3. Project Foundation — What to Build First

Before any domain work, these foundations must exist. Apart from shared-types, which every other package imports and which is therefore built first, the items below are largely independent and can be built in parallel across the team.

Monorepo scaffold with Turborepo + pnpm. The full app structure (apps, services, packages) is set up with empty stubs and working pnpm dev across all surfaces. Turborepo remote cache configured. Every app must build before domain work starts.

shared-types package is the source of truth for all TypeScript types, Zod schemas, and enums. This is built first because every other package depends on it. All enums from the database design document live here.

api-client package — TanStack Query hooks wrapping the Axios instance. Every frontend calls the backend through this, never with raw fetch.

ui package — shadcn/ui primitives, TailwindCSS config, shared between the two Next.js apps only. Expo apps use React Native components; only logic and types cross over.

Database bootstrap — Prisma schema, migrations, and seed data. Neon branches for dev, staging, main. Prisma client output goes to packages/shared-types.

Provider abstraction layer — all interfaces and factory modules built upfront, with mock implementations for every provider. Real provider implementations can come later.

Auth foundation — JWT, refresh tokens, Redis session store, TOTP scaffold. This unlocks everything else.

Docker Compose for local dev — PostgreSQL, Redis, and the NestJS API all run locally with a single command. No cloud dependencies for development.

4. AI-Driven Development Workflow

The workflow that works for large agentic projects is: Spec → Plan → Implement → Verify, with Claude Code driving the implement phase under human oversight.

Per domain, the cycle is:

Spec session — Start a fresh Claude Code session. Paste the relevant SRS sections and ask Claude to interview you with AskUserQuestion until edge cases are covered. Output: a SPEC.md in the domain folder. This is the contract.
Plan session — New session, attach the spec. Use /plan or plan mode (Opus handles this). Output: task breakdown with file-level changes listed. Review and approve before proceeding.
Implement session — New session with clean context. Claude Code executes the plan. Use subagents for isolated modules. Headroom runs throughout.
Verify — Run the test suite, check types, run lint. Commit. If something regresses, headroom learn captures the failure.

CLAUDE.md structure per service/app:

Each app's CLAUDE.md covers: build and test commands, module structure, naming conventions, import rules, anything the linter doesn't enforce, and cross-references to domain specs. It does not contain code samples or prose documentation — pointers to files only.

Session hygiene: Use /compact before sessions get too large. Use /btw for quick questions that shouldn't pollute history. Use subagents for research tasks ("investigate how Stripe handles partial capture") to keep the implementation context clean.

5. Domain Development Plan

Domains are ordered by three criteria: independence (fewest dependencies on other domains) first, then user-facing value, then operator tooling. Each phase targets a deployable slice.

Phase 0 — Foundation

Prerequisite for everything. No domain work starts without this.

Monorepo scaffold · shared-types · api-client · ui package · Prisma schema + migrations + seed · Docker Compose local stack · Provider interfaces + mock implementations · Auth scaffold (JWT + Redis + TOTP) · CLAUDE.md per app · CI pipeline (local via act)

Phase 1 — Identity & Access Management

First consumer-facing domain. Independent of all others.

Guardian registration + phone OTP · Dependent profile CRUD (up to 3) · Saved places per dependent (encrypted) · allow_login toggle + age check · Dependent login + session · Trusted Circle management · Password reset + phone change · TOTP for ops roles · RBAC (Roles guard + CASL + RLS on sensitive tables) · Driver onboarding state machine + document upload

Frontend: Registration flows (rider web + rider mobile) · Profile management screens · Dependent profile screens

Phase 2 — Ride Booking

Depends on IAM. Consumer-facing booking flows.

ElaRide+ booking (24h+ standard, 12–24h short-notice flag) · ElaAbo subscription configuration (days, times, add-ons) · Recurring ride generation · Fare estimate calculation · Minor Safety Layer auto-apply (Guardian Mode + PIN + Safe Handover for under-18) · Ops booking on behalf of guardian · Booking validation rules · Stripe PaymentIntent authorisation at booking

Frontend: Full booking forms on rider web + mobile · Fare breakdown display · Subscription config screens
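The notice-period rule for ElaRide+ (24h+ standard, 12–24h short-notice flag) can be sketched as a pure function. The boundary handling (exactly 24h counts as standard, exactly 12h as short-notice) and the "below_minimum" outcome for anything under 12h are assumptions; the plan does not say what happens below 12h.

```typescript
type NoticeClass = "standard" | "short_notice" | "below_minimum";

const HOUR_MS = 60 * 60 * 1000;

// Classify a booking by how far ahead of pickup it is made.
function classifyNotice(pickupAt: Date, now: Date): NoticeClass {
  const leadHours = (pickupAt.getTime() - now.getTime()) / HOUR_MS;
  if (leadHours >= 24) return "standard";
  if (leadHours >= 12) return "short_notice";
  return "below_minimum"; // assumed outcome, not specified in the plan
}
```

Passing now as a parameter instead of reading the clock inside the function keeps the rule deterministic in unit tests.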

Phase 3 — Billing & Payments

Depends on IAM + Booking. Can be developed partially in parallel with Phase 2.

Stripe customer creation · ElaAbo subscription via Stripe Subscriptions · ElaRide+ via PaymentIntent (manual capture) · Apple Pay + Google Pay via Stripe Payment Element · Stripe webhook handler · Subscription period tracking + overage calculation · Payment failure + grace period · Cancellation fee calculation + capture · Brevo receipt + invoice emails (or mock email provider for now) · Dependent restricted payment view · Immutable ride pricing records

Phase 4 — Notifications

Depends on IAM. Can run in parallel with Phase 2/3.

Expo Push Notification Service integration · SMS provider integration (behind ISmsProvider — use mock until provider is selected) · Brevo email integration (behind IEmailProvider) · Full notification trigger matrix (guardian, dependent, driver, ops) · SOS notification path (push + SMS in parallel, no single point of failure) · Notification preference settings · GDPR-compliant preview text (no minor names or full addresses)

Phase 5 — Ride Lifecycle & State Machine

Depends on Booking + IAM.

Full state machine implementation (all transitions, all actors) · Immutable ride event log · Ops approval flow for ElaRide+ · Driver offer + 5-minute expiry (BullMQ) · Re-dispatch protocol when driver cancels · Pickup PIN validation (bcrypt) · PIN forgotten flow (guardian remote approval, identity verify, support unlock) · Safe Handover checkout flow · arrived_at / started_at / completed_at denormalized timestamps · Stripe capture on completion · Cancel + refund flows

Frontend: Ride status screens · PIN entry screen (driver app) · Guardian "Journey started / Safe Arrival" notifications
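One way to keep "all transitions, all actors" and the immutable event log in a single place is a typed transition table. The sketch below is illustrative only: the state names, the actor permissions, and the RideEvent shape are hypothetical and would come from the SRS, not from this plan.

```typescript
// Hypothetical state and actor names.
type RideState =
  | "requested" | "approved" | "assigned" | "arrived"
  | "in_progress" | "completed" | "cancelled";
type Actor = "guardian" | "driver" | "ops" | "system";

interface RideEvent {
  readonly rideId: string;
  readonly from: RideState;
  readonly to: RideState;
  readonly actor: Actor;
  readonly at: Date;
}

// For each state: which target states are reachable, and by whom.
const TRANSITIONS: Record<RideState, Partial<Record<RideState, readonly Actor[]>>> = {
  requested: { approved: ["ops"], cancelled: ["guardian", "ops"] },
  approved: { assigned: ["driver"], cancelled: ["guardian", "ops"] },
  // assigned -> approved models a driver cancelling, which triggers re-dispatch.
  assigned: { arrived: ["driver"], approved: ["driver"], cancelled: ["guardian", "ops"] },
  arrived: { in_progress: ["driver"], cancelled: ["ops"] },
  in_progress: { completed: ["driver"] },
  completed: {},
  cancelled: {},
};

// Validates the transition and returns a new log; the input log is never mutated.
function applyTransition(
  log: readonly RideEvent[],
  rideId: string,
  from: RideState,
  to: RideState,
  actor: Actor,
  at: Date,
): readonly RideEvent[] {
  const allowedActors = TRANSITIONS[from][to];
  if (allowedActors === undefined || allowedActors.indexOf(actor) === -1) {
    throw new Error(`Illegal transition ${from} -> ${to} by ${actor}`);
  }
  const event: RideEvent = Object.freeze({ rideId, from, to, actor, at });
  return [...log, event];
}
```

Because the table is plain data, a unit test can iterate over every state and actor pair and assert that only the listed combinations succeed.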

Phase 6 — Live Tracking & Guardian Mode

Depends on Ride Lifecycle.

Adaptive GPS transmission from driver app (4s / 5s / 10s by state) · JWT + assignment validation on every location POST · Redis ride:{id}:loc cache · Async batch insert to ride_live_locations · Socket.io room broadcast to guardian + dependent + ops · Live map on guardian app · Guardian Mode (non-disableable for dependents) · Live location purge on terminal state · BullMQ ride.cleanup worker · ride_live_locations range partitioning by month
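A small sketch of two pieces from the list above: the Redis key format ride:{id}:loc and the assignment check that runs on every location POST. The ActiveRide shape and its field names are hypothetical, and JWT verification itself is assumed to have happened before this check.

```typescript
// Hypothetical projection of a ride, just enough for the check.
interface ActiveRide {
  id: string;
  assignedDriverId: string | null;
  isActive: boolean;
}

// Builds the cache key in the ride:{id}:loc format.
function locationCacheKey(rideId: string): string {
  return `ride:${rideId}:loc`;
}

// A driver may post locations only for an active ride assigned to them.
function canPostLocation(driverIdFromJwt: string, ride: ActiveRide): boolean {
  return ride.isActive && ride.assignedDriverId === driverIdFromJwt;
}
```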

Phase 7 — Safety Systems

Depends on Live Tracking + Ride Lifecycle.

SOS trigger (guardian + dependent only, during active ride) · Parallel dispatch: ops Socket.io alert + guardian push + Trusted Circle SMS · 3-minute unacknowledged escalation (SMS + email to ops lead) · Route deviation detection worker (4 levels, BullMQ) · Incident CRUD + status workflow (open → in_review → resolved | escalated) · Incident escalation → ops lead email · SF-004 deviation thresholds + guardian notifications at Level 3+

Frontend: SOS button (always visible during active ride) · 112 call button · Incident report form
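The 3-minute unacknowledged escalation rule reduces to a small predicate that a scheduled worker could evaluate. The SosAlert shape and field names are hypothetical, and treating exactly 3 minutes as due for escalation is an assumption.

```typescript
const ESCALATION_AFTER_MS = 3 * 60 * 1000;

// Hypothetical alert record.
interface SosAlert {
  triggeredAt: Date;
  acknowledgedAt: Date | null;
  escalatedAt: Date | null;
}

// True when the alert is still unacknowledged, not yet escalated, and at least 3 minutes old.
function needsEscalation(alert: SosAlert, now: Date): boolean {
  if (alert.acknowledgedAt !== null || alert.escalatedAt !== null) return false;
  return now.getTime() - alert.triggeredAt.getTime() >= ESCALATION_AFTER_MS;
}
```

Checking escalatedAt keeps the rule idempotent, so a retried job would not send the ops lead a second SMS and email.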

Phase 8 — Add-ons (ElaExtras)

Depends on Ride Lifecycle + Billing.

wait_plus, begleitung_plus, return_plus selection per booking · Per-subscription add-on flags for ElaAbo · Add-on usage recording (addon_usages table) · Billing engine: Safe Handover 5-min tolerance, Wait+ 15-min block billing, Begleitung+ 10-min flat then Wait+ rate, Return+ as linked leg · Integration with ride pricing record on completion
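The block-billing arithmetic for Wait+ and Begleitung+ can be sketched as follows. Prices are passed in as parameters because the plan does not state them. Billing every started 15-minute block in full, and charging Begleitung+ overtime from minute 11 onward in Wait+ blocks, are assumptions about how the rules are meant to be read.

```typescript
const WAIT_BLOCK_MINUTES = 15;
const BEGLEITUNG_FLAT_MINUTES = 10;

// Wait+: every started 15-minute block is billed in full (assumed interpretation).
function waitPlusChargeCents(waitedMinutes: number, blockPriceCents: number): number {
  if (waitedMinutes <= 0) return 0;
  const blocks = Math.ceil(waitedMinutes / WAIT_BLOCK_MINUTES);
  return blocks * blockPriceCents;
}

// Begleitung+: flat fee covers the first 10 minutes, the remainder is billed at the Wait+ rate.
function begleitungPlusChargeCents(
  minutes: number,
  flatCents: number,
  waitBlockPriceCents: number,
): number {
  if (minutes <= 0) return 0; // assumption: no usage, no charge
  const overtime = minutes - BEGLEITUNG_FLAT_MINUTES;
  return flatCents + waitPlusChargeCents(overtime, waitBlockPriceCents);
}
```

Working in integer cents avoids floating-point rounding when the result is written to the immutable ride pricing record.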

Phase 9 — Driver Operations & Shift Management

Depends on IAM. Can run partially in parallel with Phase 5+.

Driver availability blocks + recurring weekly patterns · Shift check-in/check-out + vehicle + health checks · ArbZG compliance checks before every offer (10h shift limit, 11h rest, break requirements) · is_in_dispatch_pool flag management · Führungszeugnis 30-day renewal reminder (BullMQ cron) · Document expiry → action_required status · Data retention for driver documents per policy
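The ArbZG pre-offer check can be sketched as a pure function over the two limits named above (10h shift limit, 11h rest). Break requirements are left out of this sketch, and the ShiftContext shape, the result type, and the idea of checking against the projected ride end time are assumptions, not legal guidance.

```typescript
const HOUR_MS = 60 * 60 * 1000;
const MAX_SHIFT_MS = 10 * HOUR_MS;
const MIN_REST_MS = 11 * HOUR_MS;

// Hypothetical input shape.
interface ShiftContext {
  shiftStartedAt: Date;
  previousShiftEndedAt: Date | null;
}

type OfferCheck =
  | { ok: true }
  | { ok: false; reason: "insufficient_rest" | "shift_limit" };

// Run before every offer: would accepting this ride violate rest or shift limits?
function checkBeforeOffer(ctx: ShiftContext, rideEndsAt: Date): OfferCheck {
  if (ctx.previousShiftEndedAt !== null) {
    const rest = ctx.shiftStartedAt.getTime() - ctx.previousShiftEndedAt.getTime();
    if (rest < MIN_REST_MS) return { ok: false, reason: "insufficient_rest" };
  }
  const projectedShift = rideEndsAt.getTime() - ctx.shiftStartedAt.getTime();
  if (projectedShift > MAX_SHIFT_MS) return { ok: false, reason: "shift_limit" };
  return { ok: true };
}
```

Returning a reason rather than a bare boolean lets the dispatch UI explain why a driver was skipped.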

Phase 10 — Dispatch & Route Optimization

Depends on Ride Lifecycle + Driver Operations.

Dispatcher-triggered optimization flow · Vroom integration (Docker, internal only) · route_plans proposal → approval flow · Driver offer lifecycle post-approval · Manual assignment fallback (graceful degradation when Vroom is unavailable) · Driver schedule board data APIs · Ops Reservation Board APIs · Ops Live Ride Board (WebSocket)

Phase 11 — Operations Dashboard

Depends on all backend domains.

Role-based dashboard views (admin / dispatcher / support) · Reservation Board · Live Ride Board · Driver Schedule Board · Safety & Incidents Board · Driver Management (onboarding pipeline, document review) · System Configuration (prices, timeouts, thresholds from system_config table) · Booking on behalf of guardian

This is the last surface built because it depends on all backend APIs being stable.

6. Testing Strategy

Unit tests — Service layer logic in isolation. Every NestJS service tested with mocked repositories and mocked provider implementations. Test the state machine transitions, fee calculation logic, ArbZG checks, PIN validation, and deviation level logic here. Jest for all TypeScript. Target: every business rule has a unit test.

Integration tests: API endpoint tests against a real test database (Neon branch or local Docker Postgres). These test the full request pipeline: route guard → CASL check → service → database. The database and Redis are never mocked; tests run against real local infrastructure. External providers (SMS, email, payments) still go through their mock implementations, so no third-party API is called. Supertest for NestJS. Critical paths: booking creation, ride state transitions, PIN flow, SOS trigger, Stripe webhook handling.

E2E tests — Full user flows across the frontend and backend. Playwright for Next.js web apps. Detox for Expo apps (deferred until apps are stable — add in Phase 6 onwards). These are slow; run them on the CI gate only, not on every save.

Contract tests — The shared-types package is the contract between frontend and backend. TypeScript strict mode + Zod validation enforces this at compile time and runtime. Any breaking change to a shared type fails the build in all consumers simultaneously.

TDD posture — Apply TDD on business-critical logic (state machine, fee calculation, ArbZG checks, route deviation thresholds). For UI and plumbing, write tests after. Don't make TDD a religion — apply it where it saves debugging time.

Provider mocks in test — Every external provider (ISmsProvider, IEmailProvider, etc.) has a MockProvider that records calls and returns fixtures. Integration tests never hit real APIs.
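A recording mock might look like the sketch below. The EmailMessage shape, the send signature, and the ReceiptService class are hypothetical; they stand in for whatever the real IEmailProvider contract and services end up being.

```typescript
// Hypothetical contract shape.
interface EmailMessage {
  to: string;
  subject: string;
  html: string;
}

interface IEmailProvider {
  send(message: EmailMessage): Promise<void>;
}

// Records every call so tests can assert on what would have been sent.
class MockEmailProvider implements IEmailProvider {
  readonly sent: EmailMessage[] = [];

  async send(message: EmailMessage): Promise<void> {
    this.sent.push(message);
  }
}

// Hypothetical service under test; it only knows the interface.
class ReceiptService {
  constructor(private readonly email: IEmailProvider) {}

  async sendReceipt(to: string, amountCents: number): Promise<void> {
    await this.email.send({
      to,
      subject: "Your ElaRide receipt",
      html: `<p>Total: ${amountCents} cents</p>`,
    });
  }
}
```

In a test, the mock is passed to the service (or bound to the provider token in the NestJS testing module), and assertions run against mock.sent instead of a real inbox.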

7. Local CI/CD Setup

All CI/CD runs locally using act (nektos/act), a tool that runs GitHub Actions workflows locally in Docker containers. Write real GitHub Actions YAML in .github/workflows/ and test it with act before pushing. No cloud runner dependency until deployment is decided.

Pipeline jobs per PR:

lint → typecheck → unit-tests → integration-tests → build (Turborepo cached)

Turborepo only rebuilds affected packages, so a change to NotificationModule doesn't rebuild the rider mobile app.

Local environment matrix:

The local environment runs against Docker Compose (Postgres + Redis). The ci environment, run inside act, uses the service containers defined in the workflow's services: block. The same .env.test drives both, so there are no environment-specific code paths.

Secret management for local: .env.local (gitignored) for real API keys. .env.test (committed, no secrets) for test suites using mock providers. act reads secrets from .secrets file (gitignored).

Pre-commit hooks via Husky: lint-staged runs ESLint + Prettier on changed files, then tsc --noEmit runs on the affected packages. This keeps broken code from reaching the pipeline.

When deployment is decided, the act-tested workflow files are already correct — they just need the deployment step added. The pipeline itself requires no rewrite.

8. Environment Configuration

Three environments, all runnable locally:

local — Docker Compose runs Postgres + Redis. All external providers use mock implementations. pnpm dev across all apps. No Neon, no Upstash, no Stripe live keys.

staging — When deployment is decided, this maps to a Neon branch + staging service. Until then, staging is just the local environment run with NODE_ENV=staging and the real (test-mode) Stripe keys, pointing to mock SMS/email providers.

production — Finalised after deployment platform decision.

All configuration lives in a single ConfigModule per NestJS app. The module reads from environment variables. The ACTIVE_SMS_PROVIDER, ACTIVE_EMAIL_PROVIDER, and ACTIVE_STORAGE_PROVIDER variables drive the factory modules. Switching between providers that already have an adapter is one env var change and a restart: no code change, no deployment of new logic.
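Since a typo in one of these variables would otherwise surface only when the first SMS or email is sent, it may help to validate them once at startup. The sketch below is framework-free; the allowed value lists are derived from the provider names in the Provider Abstraction Strategy section, while the lowercase keys and the fallback values for local development are assumptions.

```typescript
// Allowed keys per concern (lowercase spellings are an assumption).
const SMS_PROVIDERS = ["seven", "plivo", "mock"] as const;
const EMAIL_PROVIDERS = ["brevo", "mock"] as const;
const STORAGE_PROVIDERS = ["r2", "s3", "local"] as const;

interface ProviderConfig {
  sms: (typeof SMS_PROVIDERS)[number];
  email: (typeof EMAIL_PROVIDERS)[number];
  storage: (typeof STORAGE_PROVIDERS)[number];
}

type Env = Record<string, string | undefined>;

// Returns the validated value, the fallback when unset, or throws on an unknown value.
function pick<T extends string>(
  name: string,
  raw: string | undefined,
  allowed: readonly T[],
  fallback: T,
): T {
  if (raw === undefined || raw === "") return fallback;
  const match = allowed.find((candidate) => candidate === raw);
  if (match === undefined) {
    throw new Error(`${name} must be one of ${allowed.join(", ")}; got "${raw}"`);
  }
  return match;
}

function loadProviderConfig(env: Env): ProviderConfig {
  return {
    sms: pick("ACTIVE_SMS_PROVIDER", env["ACTIVE_SMS_PROVIDER"], SMS_PROVIDERS, "mock"),
    email: pick("ACTIVE_EMAIL_PROVIDER", env["ACTIVE_EMAIL_PROVIDER"], EMAIL_PROVIDERS, "mock"),
    storage: pick("ACTIVE_STORAGE_PROVIDER", env["ACTIVE_STORAGE_PROVIDER"], STORAGE_PROVIDERS, "local"),
  };
}
```

Calling loadProviderConfig(process.env) during bootstrap would make a misconfigured environment fail immediately with a readable message. For staging and production it is probably safer to drop the fallbacks and require explicit values.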