ElaRide — System Design Document

Version: 1.1 | Date: June 2026 | Status: Internal

This document covers architecture decisions, component design, infrastructure, and every critical system flow. The database schema is in ElaRide-DATABASE.md.

Changelog v1.1: Added Section 4.6 (Provider Abstraction Layer). Updated the Section 4.1 module graph to include ProvidersModule. Added a provider abstraction row to the Technology Stack table.


Table of Contents

  1. Repository Architecture
  2. System Context
  3. Container Architecture
  4. Backend Design
  5. Security Architecture
  6. Real-time & Caching
  7. Infrastructure & Deployment
  8. Observability
  9. System Flows

1. Repository Architecture

Single Turborepo monorepo with pnpm workspaces. All TypeScript types, API client hooks, and shared UI primitives live in packages/ — consumed by every app without duplication or drift.

[Diagram]

Key decisions:

Decision | Rationale
Monorepo over polyrepo | Single PR can update backend + client types atomically; Zod schemas in shared-types are the source of truth for both API validation and client forms
services/api/ distinct from apps/ | Signals that the backend is a service, not a frontend app; avoids confusion in CI pipelines
packages/ui for Next.js only | Expo apps use React Native components; sharing raw HTML/DOM components with mobile is not possible, so only logic and types are shared
Turborepo remote cache (Vercel) | ~70% CI time reduction on unchanged packages

2. System Context

[Diagram]

External services marked "configurable" are resolved at runtime via the Provider Abstraction Layer (see Section 4.6). The specific vendor is an environment variable, not a hardcoded dependency.


3. Container Architecture

[Diagram]

Deployment rationale:

Surface | Platform | Why
Next.js apps | Vercel | Zero-config SSR, edge CDN, preview deployments per PR
NestJS API | Railway | Simple Docker-based Node.js, persistent WebSocket support, no cold starts
PostgreSQL | Neon | Serverless branching (dev/staging per Neon branch), PITR, PostGIS, migrates to AWS RDS without code changes
Redis | Upstash | Serverless, compatible with BullMQ, Upstash REST fallback for edge functions
Vroom | Railway Docker | Co-located with API; internal HTTP only, never public
Mobile | EAS + App Stores | Over-the-air updates via EAS Update for JS-only changes

4. Backend Design

4.1 NestJS Module Graph

[Diagram]

ProvidersModule owns all swappable external service adapters. NotificationModule depends on it for SMS and email dispatch. DriverModule depends on it for document storage. No other module reaches an external SDK directly — they go through the interfaces exported by ProvidersModule. See Section 4.6 for full detail.

4.2 Request Pipeline

Every request passes through these layers in order. Failure at any layer short-circuits with a structured JSON error.

[Diagram]

4.3 REST API Surface

All endpoints prefixed /api/v1/. Breaking changes bump to /api/v2/.

Prefix | Roles | Key operations
/auth | public / all | login, register, refresh, logout, OTP, MFA
/users/me | all | get + update own profile
/dependents | guardian, admin | CRUD, login toggle, PIN reset
/trusted-circle | guardian, admin | manage per-dependent contacts
/saved-places | guardian, admin | manage per-dependent locations
/bookings | guardian, dispatcher, admin | create, approve, reject, list
/rides | all (scoped) | get, cancel, events log
/rides/:id/location | driver | GPS position POST
/rides/:id/validate-pin | driver | PIN entry and override flows
/safety/sos | guardian, dependent | trigger; ops acknowledges
/incidents | all (scoped) | CRUD, status transitions
/drivers/me | driver | profile, documents, shifts, availability
/drivers | dispatcher, admin | list, onboarding pipeline
/dispatch/optimize | dispatcher | confirmed rides list, Vroom solve
/route-plans | dispatcher, admin | approve, discard
/subscriptions | guardian, admin | create, cancel, pause
/payments | guardian, dependent (limited), admin | list, detail
/webhooks/stripe | Stripe (HMAC-verified) | event handler
/admin/config | admin | system config read/write

4.4 WebSocket Room & Event Reference

[Diagram]

Redis adapter (@socket.io/redis-adapter + Upstash) enables multi-replica horizontal scaling. Socket.io events emitted on any Railway replica are fanned out to all replicas via Redis pub/sub.

4.5 Background Workers (BullMQ)

Queue | Trigger | Worker action
location.persist | GPS POST received | Batch-insert ride_live_locations async
location.anomaly | Location inserted | Compare GPS vs planned route; escalate deviation level
notification.dispatch | Any domain event | Route to push/SMS/email via provider interfaces; exponential retry on failure
ride.offer.expire | Assignment created | Mark offer expired after configurable timeout
billing.cycle_close | ElaAbo cycle end (cron) | Calculate overage; issue Stripe charge
billing.webhook | Stripe event received | Process payment_succeeded / payment_failed
driver.doc_reminder | Daily cron 08:00 Berlin | Query docs expiring ≤30 days; send reminder push + email
ride.cleanup | Ride → terminal state | Delete ride_live_locations; clear Redis GPS key
data.retention | Weekly cron Sunday 02:00 | Execute per-type retention policy; log to audit table

All queues have a dead-letter queue. Jobs that fail permanently (after at most 5 attempts with exponential backoff) trigger an alert to the on-call Slack channel via a Sentry alert rule.
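
To make the retry budget concrete, the sketch below computes the wait before each retry for a job limited to 5 attempts. It assumes BullMQ's built-in exponential strategy, which is understood to wait roughly delay × 2^(n−1) after the n-th failed attempt. The 1000 ms base delay and the helper name retryDelaysMs are illustrative assumptions; the document does not state the base delay.

```typescript
// Job options in the shape BullMQ accepts when adding a job.
// attempts: 5 comes from the document; delay: 1000 is an assumed base delay.
const retryOptions = {
  attempts: 5,
  backoff: { type: "exponential", delay: 1000 },
};

// Hypothetical helper: lists the wait (in ms) before each retry.
// With N attempts there are at most N - 1 retries; after the last
// failed attempt the job is treated as permanently failed.
function retryDelaysMs(attempts: number, baseDelayMs: number): number[] {
  const delays: number[] = [];
  for (let failed = 1; failed < attempts; failed++) {
    delays.push(Math.round(Math.pow(2, failed - 1) * baseDelayMs));
  }
  return delays;
}
```

Under these assumptions a job is retried after 1 s, 2 s, 4 s, and 8 s, so it reaches the dead-letter queue about 15 seconds (plus processing time) after its first failure.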


4.6 Provider Abstraction Layer

Several external services in ElaRide are either not yet finalised (SMS provider, final hosting platform) or are likely to change as the business scales. The Provider Abstraction Layer decouples business logic from vendor SDKs so that swapping a provider is a single environment variable change with no modifications to any service, flow, or test.

Pattern

Each swappable concern has three artefacts:

  1. Interface: defined in packages/shared-types. Declares the contract: method signatures and return types. Business logic only imports this interface, never a concrete SDK.
  2. Concrete adapter: lives in services/api/src/providers/<concern>/. Implements the interface by wrapping the vendor SDK. One file per vendor.
  3. Factory module: a NestJS DynamicModule that reads an environment variable and registers the correct adapter as the interface token. Consuming modules declare a dependency on the token; NestJS injects the correct implementation automatically.

[Diagram]

The same pattern applies identically to email, storage, and maps.
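
The three artefacts can be sketched for SMS in plain TypeScript, without NestJS imports, so the shape is visible in isolation. The send method signature, the stubbed SevenAdapter body, and the helper names createSmsProvider and smsProviderDefinition are assumptions for illustration, not the actual contract. The object returned by smsProviderDefinition has the shape of a NestJS custom provider ({ provide, useFactory }), which a DynamicModule could list under providers.

```typescript
// 1. Interface (per the document, it lives in packages/shared-types).
// The method name and result shape are assumed for this sketch.
interface ISmsProvider {
  send(to: string, body: string): Promise<{ messageId: string }>;
}

// Injection token; the name matches the Interface Catalogue.
const SMS_PROVIDER = "SMS_PROVIDER";

// 2. Concrete adapters. A real vendor adapter would wrap the vendor SDK;
// this one is a stub so the sketch stays free of network calls.
class SevenAdapter implements ISmsProvider {
  async send(to: string, body: string): Promise<{ messageId: string }> {
    throw new Error(`SevenAdapter is a stub (to=${to}, length=${body.length})`);
  }
}

// Mock adapter: records messages in memory instead of sending them.
class MockSmsAdapter implements ISmsProvider {
  readonly sent: Array<{ to: string; body: string }> = [];

  async send(to: string, body: string): Promise<{ messageId: string }> {
    this.sent.push({ to, body });
    return { messageId: `mock-${this.sent.length}` };
  }
}

// 3. Factory: maps the env var value to an adapter instance.
function createSmsProvider(name: string): ISmsProvider {
  switch (name) {
    case "seven":
      return new SevenAdapter();
    case "mock":
      return new MockSmsAdapter();
    default:
      throw new Error(`Unknown SMS_PROVIDER value: ${name}`);
  }
}

// Provider definition in the { provide, useFactory } shape.
// Simplification: an unset variable falls back to "mock" here.
function smsProviderDefinition(env: Record<string, string | undefined>) {
  return {
    provide: SMS_PROVIDER,
    useFactory: (): ISmsProvider => createSmsProvider(env.SMS_PROVIDER ?? "mock"),
  };
}
```

A consuming NestJS service would then request the token, for example with @Inject(SMS_PROVIDER) on a constructor parameter typed as ISmsProvider, and never import an adapter class directly.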

Interface Catalogue

Token | Interface | Consumed by | Adapters
SMS_PROVIDER | ISmsProvider | NotificationModule | SevenAdapter · PlivoAdapter · GatewayApiAdapter · MockSmsAdapter
EMAIL_PROVIDER | IEmailProvider | NotificationModule | BrevoAdapter · MockEmailAdapter
STORAGE_PROVIDER | IStorageProvider | DriverModule | R2Adapter · S3Adapter · LocalStorageAdapter
MAP_PROVIDER | IMapProvider | TrackingModule · BookingModule | GoogleMapsAdapter · MockMapAdapter

Stripe (PaymentModule) and Expo Push (NotificationModule) are not abstracted. Stripe is the stated payment provider and switching it would require a billing schema migration, not just an adapter swap. Expo Push is a thin wrapper over FCM/APNs with no meaningful alternative in the React Native ecosystem.

Environment Variables

SMS_PROVIDER=seven        # seven | plivo | gatewayapi | mock
EMAIL_PROVIDER=brevo      # brevo | mock
STORAGE_PROVIDER=r2       # r2 | s3 | local
MAP_PROVIDER=google       # google | mock

mock is the default for all providers when NODE_ENV=test or NODE_ENV=local. No external accounts or network calls are required to run the full stack locally.
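
One way to read the defaulting rule is sketched below for SMS_PROVIDER. The helper name resolveProviderName is hypothetical, and two behaviours are assumptions: an explicitly set value always wins, and an unset variable outside NODE_ENV=test or NODE_ENV=local is treated as a configuration error. The document does not say what should happen in that last case.

```typescript
type Env = Record<string, string | undefined>;

// Allowed values for SMS_PROVIDER, as listed in the document.
const SMS_CHOICES = ["seven", "plivo", "gatewayapi", "mock"] as const;

// Hypothetical helper: decides which adapter name to use for one provider variable.
function resolveProviderName(env: Env, variable: string, allowed: readonly string[]): string {
  const explicit = env[variable];
  if (explicit !== undefined && explicit !== "") {
    // An explicit value must be one of the known adapter names.
    if (!allowed.includes(explicit)) {
      throw new Error(`${variable}=${explicit} is not one of: ${allowed.join(", ")}`);
    }
    return explicit;
  }
  // Unset: fall back to the mock adapter in test and local environments.
  if (env.NODE_ENV === "test" || env.NODE_ENV === "local") {
    return "mock";
  }
  // Assumption: any other environment must configure the provider explicitly.
  throw new Error(`${variable} must be set when NODE_ENV=${env.NODE_ENV ?? "(unset)"}`);
}
```

For example, resolveProviderName({ NODE_ENV: "test" }, "SMS_PROVIDER", SMS_CHOICES) yields "mock", while the same call with NODE_ENV set to "production" throws instead of silently sending nothing.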

Mock Adapters

Each mock adapter implements the full interface contract and is designed for three uses:

Adding a New Provider

  1. Write a new adapter class in services/api/src/providers/<concern>/ that implements the interface.
  2. Add a case to the factory module for the new env var value.
  3. Export a mock if needed.
  4. Update the SMS_PROVIDER (or equivalent) env var.

No business logic changes. No test changes (unless testing the adapter itself). No other modules are touched.


5. Security Architecture

5.1 Authentication Flow

[Diagram]

JWT payload structure:

{
  "sub": "uuid",
  "profileId": "uuid",
  "role": "guardian",
  "familyId": "uuid",
  "jti": "uuid",
  "iat": 1234567890,
  "exp": 1234568790
}

familyId = guardian_profiles.id. For non-guardian roles this is null. CASL uses it to scope family-level resource checks without a DB query.
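
The payload can be typed and checked as in the sketch below. This is a plain-function illustration of the family-scoping idea, not the CASL ability definition itself. The role list is inferred from the REST API table, and the helper names isSameFamily and isExpired are hypothetical.

```typescript
// Roles as they appear in the REST API table (inferred, may be incomplete).
type Role = "guardian" | "dependent" | "driver" | "dispatcher" | "admin";

interface AccessTokenPayload {
  sub: string;             // user id
  profileId: string;       // role-specific profile id
  role: Role;
  familyId: string | null; // guardian_profiles.id, null for non-guardian roles
  jti: string;             // token id, used for revocation
  iat: number;             // issued at, seconds since epoch
  exp: number;             // expiry, seconds since epoch
}

// Family-level scope check using only token claims, so no DB query is needed.
function isSameFamily(token: AccessTokenPayload, resourceFamilyId: string): boolean {
  return token.familyId !== null && token.familyId === resourceFamilyId;
}

// True once the current time (in seconds) has reached the exp claim.
function isExpired(token: AccessTokenPayload, nowSeconds: number): boolean {
  return nowSeconds >= token.exp;
}
```

In the sample payload above, exp minus iat is 900 seconds, which matches the 15-minute access token lifetime listed in the technology stack.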

5.2 Authorization Layers

[Diagram]

No database permission tables. Permissions are fixed business rules — they are code, not data. The role enum on the users table is sufficient for storage. Database permission tables are only warranted when an admin UI needs to configure permissions at runtime (multi-tenant SaaS). ElaRide has neither the complexity nor the runtime configuration requirement.

5.3 Field Encryption

Prisma middleware intercepts read/write operations on designated fields and applies AES-256-GCM transparently. The encryption key is stored in Railway's secret environment store, never in code or logs.

Field | Table | Classification
date_of_birth | dependent_profiles | Minor PII (highest sensitivity)
special_needs_notes | dependent_profiles | Medical/disability PII
address_encrypted | saved_places | Home/school address
pickup_address_encrypted | rides | Exact pickup address
dropoff_address_encrypted | rides | Exact destination
totp_secret | users | MFA seed; encrypted with a separate key
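
The per-field transformation could look like the sketch below, which uses Node's built-in crypto module for AES-256-GCM. The stored layout (IV, then auth tag, then ciphertext, base64-encoded) and the function names encryptField and decryptField are assumptions; the document specifies the algorithm but not the format. The Prisma hook that would call these functions is not shown.

```typescript
import { Buffer } from "node:buffer";
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

const IV_BYTES = 12;  // 96-bit nonce, the size commonly recommended for GCM
const TAG_BYTES = 16; // default GCM authentication tag length

// Encrypts one field value. The key must be 32 bytes for AES-256.
// Assumed output layout: base64(iv || authTag || ciphertext).
function encryptField(plaintext: string, key: Buffer): string {
  const iv = randomBytes(IV_BYTES); // fresh nonce for every write
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  const tag = cipher.getAuthTag();
  return Buffer.concat([iv, tag, ciphertext]).toString("base64");
}

// Reverses encryptField. Throws if the value was tampered with
// or the wrong key is used, because the auth tag check fails.
function decryptField(encoded: string, key: Buffer): string {
  const raw = Buffer.from(encoded, "base64");
  const iv = raw.subarray(0, IV_BYTES);
  const tag = raw.subarray(IV_BYTES, IV_BYTES + TAG_BYTES);
  const ciphertext = raw.subarray(IV_BYTES + TAG_BYTES);
  const decipher = createDecipheriv("aes-256-gcm", key, iv);
  decipher.setAuthTag(tag);
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString("utf8");
}
```

Because a new random IV is generated per write, encrypting the same value twice produces different stored strings, so equality queries on these columns are not possible without a separate lookup strategy.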

Driver documents (Führungszeugnis) are stored in a private storage bucket (R2 or S3 depending on STORAGE_PROVIDER). The API generates short-lived signed URLs (15-minute expiry) server-side, issued only to the admin role. No direct public bucket access.

5.4 Secret Management

[Diagram]

6. Real-time & Caching

6.1 GPS Data Path

[Diagram]

6.2 Redis Key Schema

Key pattern | Type | TTL | Content
ride:{id}:loc | STRING (JSON) | Ride duration | Latest GPS position
session:{userId}:{jti} | STRING | 7 days | Refresh token validity flag
throttle:{userId}:{endpoint} | STRING | 60s | Rate limit counter
throttle:ip:{ip} | STRING | 60s | IP-level rate limit
bull:* | Multiple | Managed by BullMQ | Queue backing store
io:* | Multiple | Managed by Socket.io | WebSocket adapter
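
Centralising the application-owned key patterns in one helper keeps them from drifting between modules. The sketch below is one possible shape; the names redisKeys and ttlSeconds are hypothetical, and the bull:* and io:* prefixes are left out because those libraries manage their own keys.

```typescript
// Builders for the application-owned key patterns in the table above.
const redisKeys = {
  rideLocation: (rideId: string): string => `ride:${rideId}:loc`,
  session: (userId: string, jti: string): string => `session:${userId}:${jti}`,
  throttleUser: (userId: string, endpoint: string): string => `throttle:${userId}:${endpoint}`,
  throttleIp: (ip: string): string => `throttle:ip:${ip}`,
};

// Fixed TTLs from the table, in seconds. ride:{id}:loc has no fixed TTL:
// it lives for the duration of the ride and is cleared by ride.cleanup.
const ttlSeconds = {
  session: 7 * 24 * 60 * 60, // 7 days
  throttle: 60,              // 60s rate-limit window
};
```

For example, redisKeys.session("u1", "j1") produces "session:u1:j1", and ttlSeconds.session evaluates to 604800.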

7. Infrastructure & Deployment

7.1 Environment Matrix

Env | Frontend | Backend | Database
Local | next dev / Expo Go | nest start:dev (hot reload) | Neon branch: dev-{name}
Staging | Vercel Preview (auto on PR) | Railway: staging service | Neon branch: staging
Production | Vercel Production | Railway: prod service (2 replicas) | Neon: main (+ read replica)

7.2 CI/CD Pipeline

[Diagram]

Zero-downtime deploys: Railway uses rolling deploy with health check on GET /health. New instance must respond 200 before old instance drains. Database migrations run in the pre-deploy hook and must be backward-compatible (additive only — no breaking schema changes in a single deploy).


8. Observability

Concern | Tool | Detail
Error tracking | Sentry (NestJS + Next.js + Expo SDKs) | Unhandled exceptions, slow transactions, source maps, user context
Structured logging | Pino (NestJS) | JSON logs: {requestId, userId, rideId, role, durationMs, statusCode}
Log aggregation | Logtail (BetterStack) | Queryable log storage; alert on error rate spike
Performance | Sentry Performance | P50/P95 per endpoint; Core Web Vitals on Next.js
Uptime | Railway health check | GET /health every 30s; auto-restart on failure
Queue health | Bull Board (internal /ops/queues) | Depth, failed jobs, retry counts; admin auth required
Database | Neon metrics dashboard | Connection pool, query latency, WAL size

Request ID propagation: Every request is stamped with a UUID X-Request-ID at the edge (Cloudflare or first middleware). This ID is attached to every Pino log line, every Sentry event, and returned in error response bodies. End-to-end correlation across frontend → backend → database → external service is possible with one ID.
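
A first-middleware fallback for the stamping step might look like the sketch below. It assumes that an incoming X-Request-ID is trusted only if it is a well-formed UUID and that a fresh one is generated otherwise; the document does not describe validation. The helper names ensureRequestId and withRequestId are hypothetical, and wiring into NestJS, Pino, and Sentry is not shown.

```typescript
import { randomUUID } from "node:crypto";

// Node's HTTP server exposes incoming header names in lower case.
type HeaderMap = Record<string, string | undefined>;

const UUID_PATTERN = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// Reuses the ID stamped at the edge when it is a valid UUID,
// otherwise generates one so every request has exactly one ID.
function ensureRequestId(headers: HeaderMap): string {
  const incoming = headers["x-request-id"];
  if (incoming !== undefined && UUID_PATTERN.test(incoming)) {
    return incoming;
  }
  return randomUUID();
}

// Attaches the ID to any structured payload, such as a log line or an error body.
function withRequestId<T extends object>(requestId: string, fields: T): T & { requestId: string } {
  return { ...fields, requestId };
}
```

Rejecting malformed incoming values keeps arbitrary client-supplied strings out of log lines while still preserving an ID set upstream.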


9. System Flows

9.1 Ride Booking Creation

[Diagram]

9.2 Ride Lifecycle State Machine

[Diagram]

9.3 GPS Tracking

[Diagram]

9.4 SOS Trigger & Escalation

[Diagram]

The SOS flow calls ISmsProvider via NotificationModule, never a vendor SDK directly. Swapping the SMS provider has no effect on this flow.

9.5 Dispatch & Driver Assignment

[Diagram]

9.6 PIN Validation & Forgotten PIN

[Diagram]

9.7 Payment Flow

[Diagram]

9.8 Driver Onboarding State Machine

[Diagram]

9.9 Ride Cancellation Flow

[Diagram]

Technology Stack

Layer | Choice | Rationale
Rider mobile app | Expo SDK 51+ (React Native) | Cross-platform iOS/Android. expo-location for GPS, expo-task-manager for background tasks, expo-notifications for push.
Rider web app | Next.js 14 (App Router, TypeScript) | Proper desktop-first web UI. Shares TailwindCSS, shadcn/ui, and design system with ops dashboard. Types and API client shared via monorepo packages.
Driver app | Expo SDK 51+ (separate app) | Persistent background GPS via expo-task-manager. Separate app store listing, permissions, and UX.
Ops dashboard | Next.js 14 (App Router, TypeScript) | Complex desktop UI: full web capabilities, TailwindCSS + shadcn/ui.
Backend | NestJS 10 (TypeScript) | Modular, strong DI, built-in WebSocket, Passport guards, auto-generated OpenAPI.
Provider abstraction | Interface + Factory pattern (NestJS DI) | All swappable external services (SMS, email, storage, maps) are resolved at runtime via env var. Swapping a vendor is a single config change with no business logic changes. Mock adapters enable fully offline local development and deterministic tests. See Section 4.6.
Authorization | CASL (@casl/ability) | Resource-level attribute-based authorization in NestJS service layer. Handles ownership and family scoping conditions. Paired with custom @Roles() guard for route-level checks. No database permission tables: permissions are fixed business rules coded in the application.
ORM | Prisma | Type-safe queries, excellent migration tooling, auto-generated types shared via monorepo.
Database | PostgreSQL 16 + PostGIS (Neon) | Industry standard for ride-hailing (used by Uber, Lyft, Grab). PostGIS for geo queries. Neon: serverless, scales-to-zero for MVP cost, PITR, per-environment branching, migrates to AWS RDS as volume grows with no code changes.
Cache & Queue | Redis via Upstash | Serverless Redis. GPS position cache, JWT revocation store, BullMQ backing.
Job queue | BullMQ | Redis-backed, reliable retries, dead-letter queues, cron scheduling.
Real-time | Socket.io (@nestjs/platform-socket.io) | Room-based broadcasting per ride, handles reconnection, compatible with both Expo and Next.js clients.
File storage | Cloudflare R2 (default) | S3-compatible API, zero egress fees, private buckets with signed URLs for driver documents. Swappable to S3 or local via IStorageProvider (see Section 4.6).
Maps | Google Maps Platform | Maps SDK for React Native (mobile apps), Maps JavaScript API (Next.js web apps). Directions API, Geocoding API, Distance Matrix API.
Route optimization | Vroom (self-hosted Docker) | Open-source VRPTW solver, REST API, handles fleet routing constraints. Self-hosted alongside the API.
Payments | Stripe | PaymentIntents (ElaRide+), Subscriptions (ElaAbo), Payment Element (Apple/Google Pay), webhooks. Not abstracted: switching payment providers requires a billing schema migration.
Push notifications | Expo Push Notification Service | Free, wraps FCM (Android) and APNs (iOS), delivery receipts API. Not abstracted: no meaningful alternative in the React Native ecosystem.
SMS | Configurable via SMS_PROVIDER env var (shortlist: seven.io, Plivo, GatewayAPI) | All GDPR-compliant, EU-hosted. Final selection after pricing benchmark. Swappable without code changes via ISmsProvider.
Email | Configurable via EMAIL_PROVIDER env var (default: Brevo) | EU-hosted, GDPR and ISO 27001 certified. Swappable without code changes via IEmailProvider.
Authentication | Custom (Passport.js + JWT + TOTP via otplib) | No external auth vendor. JWT (access 15 min, refresh 7 days rotating), Redis token store with revocation. TOTP for ops MFA.
Monorepo | Turborepo | Single repo: apps/rider-mobile, apps/rider-web, apps/driver, apps/admin, services/api, packages/api-client, packages/shared-types, packages/ui. Shared types prevent API/client drift; shared UI primitives between the two Next.js apps.
Error monitoring | Sentry | SDK for NestJS, Next.js, and Expo. Free tier covers MVP.
CI/CD | GitHub Actions (tested locally via act) | Lint, type-check, unit and integration tests on every PR. Pipeline developed and validated locally before any cloud deployment dependency.

End of ElaRide SDD v1.1