Chapter 02: Architecture Decision Records

Documenting Key Technical Decisions

This chapter documents the major architectural decisions for the POS Platform using Architecture Decision Records (ADRs). Each ADR captures the context, decision, and consequences of a significant technical choice.


What is an ADR?

Architecture Decision Records provide a structured way to document important technical decisions:

ADR Structure
=============

+------------------------------------------------------------------+
|  ADR-XXX: [Title]                                                 |
+------------------------------------------------------------------+
|  Status: [proposed | accepted | deprecated | superseded]         |
|  Date: YYYY-MM-DD                                                 |
|  Deciders: [who made the decision]                               |
+------------------------------------------------------------------+
|                                                                   |
|  CONTEXT                                                          |
|  - What is the issue?                                            |
|  - What forces are at play?                                      |
|  - What constraints exist?                                       |
|                                                                   |
|  DECISION                                                         |
|  - What is the change?                                           |
|  - What did we choose?                                           |
|                                                                   |
|  CONSEQUENCES                                                     |
|  - What are the positive outcomes?                               |
|  - What are the negative outcomes?                               |
|  - What risks are introduced?                                    |
|                                                                   |
+------------------------------------------------------------------+

ADR-001: Shared Tables with Row-Level Security Multi-Tenancy

Note: This ADR originally documented Schema-Per-Tenant (Strategy C) but was corrected to reflect the actual decision: Shared Tables with Row-Level Security (Strategy A). The RLS implementation is detailed in Chapter 04, Section L.10A.4.

+==================================================================+
|  ADR-001: Shared Tables with Row-Level Security Multi-Tenancy    |
+==================================================================+
|  Status: SUPERSEDED (corrected to Row-Level Security, Ch 04      |
|          Section L.10A.4)                                         |
|  Date: 2025-12-29                                                |
|  Deciders: Architecture Team                                      |
+==================================================================+

CONTEXT
-------
We are building a multi-tenant POS platform that will serve multiple
independent retail businesses. Each tenant needs:

1. Strong data isolation for security and compliance
2. Easy backup and restore of individual tenant data
3. Ability to scale individual tenants independently
4. Compliance with SOC 2 and potential HIPAA requirements
5. Efficient connection pooling across all tenants

We evaluated three multi-tenancy strategies:

  Strategy A: Shared Tables (Row-Level Security)
  - All tenants share tables
  - tenant_id column on every business table
  - PostgreSQL RLS policies enforce isolation

  Strategy B: Separate Databases
  - Each tenant gets own database
  - Complete isolation
  - High connection overhead

  Strategy C: Schema-Per-Tenant
  - Single database, separate schemas
  - SET search_path per request
  - Logical isolation, shared infrastructure

DECISION
--------
We will use SHARED TABLES with ROW-LEVEL SECURITY multi-tenancy
(Strategy A).

Each tenant is identified by a tenant_id column on every business
table. PostgreSQL Row-Level Security (RLS) policies enforce isolation:

  CREATE POLICY tenant_isolation ON <table>
    USING (tenant_id = current_setting('app.current_tenant')::uuid);

The tenant is resolved from the subdomain (e.g., nexus.pos-platform.com)
and SET app.current_tenant is called per request via middleware.
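
The per-request scoping might look like the following sketch (TypeScript,
assuming Express and the Prisma client; the subdomain-to-tenant lookup and the
example values are hypothetical):

  import type { NextFunction, Request, Response } from 'express';
  import { PrismaClient } from '@prisma/client';

  const prisma = new PrismaClient();

  // Hypothetical lookup: subdomain -> tenant UUID. In practice this would be a
  // cached query against a tenants table that is not itself RLS-protected.
  const tenantBySubdomain = new Map<string, string>([
    ['acme', '00000000-0000-0000-0000-000000000001'],
  ]);

  function resolveTenantIdFromHost(host: string): string | undefined {
    return tenantBySubdomain.get(host.split('.')[0]);
  }

  export async function tenantContext(req: Request, res: Response, next: NextFunction): Promise<void> {
    const tenantId = resolveTenantIdFromHost(req.hostname);
    if (!tenantId) {
      res.status(404).json({ error: 'Unknown tenant' });
      return;
    }
    // Expose the tenant to PostgreSQL so RLS policies can compare tenant_id
    // against current_setting('app.current_tenant'). With a pooled connection
    // this is normally applied per transaction (set_config(..., true) inside
    // prisma.$transaction) so the value cannot leak to another request.
    await prisma.$executeRaw`SELECT set_config('app.current_tenant', ${tenantId}, false)`;
    next();
  }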

CONSEQUENCES
------------
Positive:
  + Strong data isolation via PostgreSQL RLS (database-enforced)
  + Single schema — migrations apply once, not per-tenant
  + tenant_id enables straightforward cross-tenant analytics
    (platform admin)
  + Standard PostgreSQL feature — no custom middleware risk
  + All tenants share connection pool

Negative:
  - tenant_id required on every business table (discipline needed)
  - Every query must be RLS-aware (mitigated by Prisma middleware)
  - Cross-tenant queries require explicit bypasses
    (SET app.current_tenant = '')
  - Noisy neighbor risk on shared tables (mitigated by index
    partitioning)

Risks:
  - Forgetting tenant_id on new tables breaks isolation
  - RLS policies must be applied to every new table
  - Need robust middleware to always set app.current_tenant

Mitigations:
  - Prisma middleware automatically injects tenant_id on every query
  - CI/CD linter checks all tables for tenant_id + RLS policy
  - Integration tests verify tenant isolation per API endpoint

ADR-002: Offline-First POS Architecture

Superseded: The offline-first approach has been replaced by an online-first with offline fallback strategy. Target retail environments have reliable internet (outages measured in minutes/year). The online-first approach eliminates CRDTs, reduces SQLite from 6 tables to 2, and simplifies integration flows while preserving sales continuity during brief outages. See ADR-048.

+==================================================================+
|  ADR-002: Offline-First POS Architecture                         |
+==================================================================+
|  Status: SUPERSEDED (by ADR-048: Online-First POS Data Strategy) |
|  Date: 2025-12-29                                                |
|  Deciders: Architecture Team                                      |
+==================================================================+

CONTEXT
-------
POS terminals operate in retail environments where network
connectivity is unreliable:

1. Internet outages occur (ISP issues, weather, accidents)
2. WiFi can be congested during peak shopping hours
3. Store networks may have maintenance windows
4. Rural locations may have poor connectivity

A traditional online-required POS would:
- Block sales during outages (lost revenue)
- Show errors during slow connections (poor UX)
- Require manual workarounds (paper receipts)

Business requirements:
- Sales must NEVER be blocked by network issues
- Receipts must print immediately
- Data must eventually sync to central system
- Inventory should be reasonably accurate

DECISION
--------
We will implement OFFLINE-FIRST architecture for POS clients.

Key design elements:
1. Local SQLite database on each POS terminal
2. All operations work against local database first
3. Event queue for pending changes
4. Background sync when connectivity available
5. Conflict resolution for concurrent changes

Data flow:
  User Action -> Local DB -> Event Queue -> [Background] -> Central API
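
A minimal sketch of that flow (TypeScript; all names are illustrative, and the
queue would be persisted in the terminal's SQLite database rather than kept in
memory):

  interface QueuedEvent {
    id: string;         // client-generated UUID, lets the server deduplicate replays
    type: string;       // e.g. 'SaleCompleted'
    payload: unknown;
    queuedAt: string;   // ISO timestamp
  }

  const queue: QueuedEvent[] = [];

  // 1. Write to the local database, 2. enqueue for background sync.
  function recordLocally(event: QueuedEvent): void {
    queue.push(event);
  }

  // Drain oldest-first; stop on the first failure and retry on the next tick.
  async function syncPending(apiBaseUrl: string): Promise<void> {
    while (queue.length > 0) {
      const event = queue[0];
      const res = await fetch(`${apiBaseUrl}/sync/events`, {
        method: 'POST',
        headers: { 'content-type': 'application/json' },
        body: JSON.stringify(event),
      });
      if (!res.ok) return;   // offline or server error: keep the event queued
      queue.shift();         // acknowledged: remove from the queue
    }
  }

  // "Aggressive sync when online" (per the mitigations below): retry every 30 seconds.
  setInterval(() => void syncPending('https://api.example.com'), 30_000);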

CONSEQUENCES
------------
Positive:
  + Sales never blocked by network issues
  + Instant response time (local operations)
  + Resilient to any connectivity problem
  + Business continues regardless of server status
  + Better user experience for cashiers

Negative:
  - Data is eventually consistent (not immediate)
  - Inventory counts may drift until sync
  - More complex architecture
  - Conflict resolution logic required
  - Local storage management needed

Risks:
  - Data loss if local device fails before sync
  - Inventory overselling possible during outages
  - Conflict resolution edge cases

Mitigations:
  - Aggressive sync when online (every 30 seconds)
  - Local database backup to secondary storage
  - Conservative inventory thresholds
  - Clear offline indicator in UI
  - Deterministic conflict resolution rules

ADR-003: Event Sourcing for Sales Domain

+==================================================================+
|  ADR-003: Event Sourcing for Sales Domain                        |
+==================================================================+
|  Status: ACCEPTED                                                 |
|  Date: 2025-12-29                                                |
|  Deciders: Architecture Team                                      |
+==================================================================+

CONTEXT
-------
The Sales domain has specific requirements that traditional CRUD
does not adequately address:

1. Complete audit trail required (PCI-DSS compliance)
2. Need to answer "what happened?" not just "what is?"
3. Offline clients need conflict-free merge capability
4. Historical analysis (sales trends, patterns)
5. Debugging production issues by replaying events

Traditional CRUD limitations:
- Only stores current state
- Updates overwrite history
- Hard to reconstruct past states
- Audit logs separate from data model

DECISION
--------
We will use EVENT SOURCING for the Sales aggregate.

Implementation:
1. Append-only event store in PostgreSQL
2. Events are the source of truth
3. Read models (projections) for queries
4. Snapshots for performance on long streams

Events captured:
- SaleCreated, SaleLineItemAdded, PaymentReceived, SaleCompleted
- SaleVoided, RefundProcessed
- All inventory changes (InventorySold, InventoryAdjusted)

NOT event-sourced (traditional CRUD):
- Products (read-heavy, infrequent changes)
- Employees (HR data, simple lifecycle)
- Locations (configuration data)
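
A brief sketch of the event-sourced write path and one projection (TypeScript;
the sale_events table and event names are illustrative, not the final schema):

  import { PrismaClient } from '@prisma/client';

  const prisma = new PrismaClient();

  interface SaleEvent {
    streamId: string;                  // the Sale aggregate id
    version: number;                   // position within the stream
    type: 'SaleCreated' | 'SaleLineItemAdded' | 'PaymentReceived' | 'SaleCompleted';
    payload: Record<string, unknown>;  // stored as JSONB
  }

  // Append is the only write path: events are never updated or deleted.
  async function appendEvent(e: SaleEvent): Promise<void> {
    await prisma.$executeRaw`
      INSERT INTO sale_events (stream_id, version, type, payload, recorded_at)
      VALUES (${e.streamId}::uuid, ${e.version}, ${e.type}, ${JSON.stringify(e.payload)}::jsonb, now())`;
  }

  // A projection folds the stream into a read model for queries (CQRS read side).
  function projectSaleTotal(events: SaleEvent[]): number {
    return events
      .filter((e) => e.type === 'SaleLineItemAdded')
      .reduce((sum, e) => sum + Number(e.payload['lineTotal'] ?? 0), 0);
  }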

CONSEQUENCES
------------
Positive:
  + Complete audit trail built into data model
  + Temporal queries ("inventory on Dec 15 at 3pm")
  + Offline sync via event merge (append-only = no conflicts)
  + Debugging by event replay
  + Analytics on event streams
  + Natural fit for CQRS pattern

Negative:
  - More complex than CRUD
  - Requires event versioning strategy
  - Projections must be rebuilt if logic changes
  - Storage grows over time (mitigated by snapshots)
  - Learning curve for developers

Risks:
  - Event schema evolution complexity
  - Projection bugs cause stale read models
  - Performance without proper snapshotting

Mitigations:
  - Event versioning from day one
  - Automated projection rebuild process
  - Snapshot every 100 events
  - Clear documentation and training

ADR-004: JWT + PIN Authentication

+==================================================================+
|  ADR-004: JWT + PIN Authentication                               |
+==================================================================+
|  Status: ACCEPTED                                                 |
|  Date: 2025-12-29                                                |
|  Deciders: Architecture Team, Security Team                       |
+==================================================================+

CONTEXT
-------
POS systems have unique authentication requirements:

1. API access needs secure, stateless authentication
2. Cashiers need quick clock-in at physical terminals
3. Sensitive actions need additional verification
4. Multiple employees may share a terminal
5. Terminals may be offline

Requirements:
- Strong authentication for API/Admin access
- Fast authentication for cashiers (< 2 seconds)
- Manager override capability
- Works offline for cashier PIN

Industry standards:
- JWT is standard for API authentication
- PINs are standard for POS quick access
- Password + MFA for Nexus Admin access

DECISION
--------
We will implement a HYBRID authentication system:

1. JWT for API Authentication
   - Nexus Admin uses email + password + optional MFA
   - Issues JWT token (15 min access, 7 day refresh)
   - Standard Bearer token in Authorization header

2. PIN for POS Terminal Access
   - 4-6 digit PIN per employee
   - Stored as bcrypt hash in database
   - Used for: clock-in, sale attribution, drawer access

3. Manager Override
   - Sensitive actions require manager PIN
   - Void, large discount, price override
   - Manager enters their PIN to authorize

4. Offline PIN Validation
   - Employee records with PIN hashes cached locally
   - Validated against local cache when offline
   - Sync employee changes when online
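
Offline PIN validation might look like the following sketch (TypeScript,
assuming the bcrypt npm package; the cached record shape and lockout values
are illustrative):

  import bcrypt from 'bcrypt';

  interface CachedEmployee {
    id: string;
    pinHash: string;        // bcrypt hash synced from the Central API
    failedAttempts: number;
    lockedUntil?: number;   // epoch milliseconds
  }

  const MAX_ATTEMPTS = 3;              // "3 failures = lockout" per the mitigations below
  const LOCKOUT_MS = 5 * 60 * 1000;

  async function verifyPinOffline(employee: CachedEmployee, pin: string): Promise<boolean> {
    if (employee.lockedUntil && Date.now() < employee.lockedUntil) return false;

    const ok = await bcrypt.compare(pin, employee.pinHash);
    if (ok) {
      employee.failedAttempts = 0;
      return true;
    }
    employee.failedAttempts += 1;
    if (employee.failedAttempts >= MAX_ATTEMPTS) {
      employee.lockedUntil = Date.now() + LOCKOUT_MS;
    }
    return false;
  }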

CONSEQUENCES
------------
Positive:
  + Secure API access with industry-standard JWT
  + Fast cashier workflow with PIN
  + Manager oversight on sensitive operations
  + Works offline for POS operations
  + Clear audit trail (who did what)

Negative:
  - Two authentication systems to maintain
  - PIN is less secure than password (brute force)
  - Local PIN cache could be extracted
  - Token refresh complexity

Risks:
  - PIN guessing attacks
  - Stolen JWT tokens
  - Stale employee cache (terminated employee)

Mitigations:
  - Rate limiting on PIN attempts (3 failures = lockout)
  - Short JWT expiry (15 minutes)
  - Aggressive employee sync (every 5 minutes)
  - PIN attempt logging and alerting
  - Secure local storage encryption

ADR-005: PostgreSQL as Primary Database

+==================================================================+
|  ADR-005: PostgreSQL as Primary Database                         |
+==================================================================+
|  Status: ACCEPTED                                                 |
|  Date: 2025-12-29                                                |
|  Deciders: Architecture Team                                      |
+==================================================================+

CONTEXT
-------
We need a database that supports:

1. Row-Level Security multi-tenancy
2. JSONB for flexible event storage
3. Strong ACID guarantees for financial data
4. Good performance at scale
5. Mature ecosystem and tooling

Options considered:
- PostgreSQL: Schema support, JSONB, mature
- MySQL: Popular, but weaker schema support
- SQL Server: Good, but licensing costs
- MongoDB: Document store; weaker fit for relational, strongly consistent financial data
- CockroachDB: Distributed, but complexity

DECISION
--------
We will use POSTGRESQL 16 as the primary database.

Justifications:
1. Native Row-Level Security (RLS) for multi-tenancy isolation
   (Originally: schema support; updated per ADR-001 supersession)
2. Excellent JSONB for event storage
3. Strong ACID for financial transactions
4. Proven at scale (Instagram, Uber, etc.)
5. Rich extension ecosystem (PostGIS, etc.)
6. Open source, no licensing costs
7. Excellent tooling (pgAdmin, pg_dump)

CONSEQUENCES
------------
Positive:
  + Native RLS for multi-tenant data isolation (see ADR-001 supersession)
  + JSONB enables flexible event data
  + Strong consistency guarantees
  + Mature, well-documented
  + No licensing costs
  + Excellent community support

Negative:
  - Single point of failure without replication
  - Requires PostgreSQL expertise
  - Not as horizontally scalable as NoSQL
  - Schema migrations need coordination

Mitigations:
  - Streaming replication for HA
  - Regular backups with pg_dump
  - Team training on PostgreSQL
  - Migration automation tooling

ADR-006: Node.js + TypeScript for Central API

Status: Accepted
Date: 2026-02-28
Decision Makers: Architecture Review Team
Context: The Central API needs a backend framework that supports high-performance I/O, strong typing, real-time features, and alignment with the frontend TypeScript ecosystem.

Context

The Central API is the backbone of the POS platform — serving REST endpoints for sales, inventory, customers, reporting, admin/setup, and integrations. It must support real-time inventory broadcasts to connected POS terminals, type-safe database access with automatic migrations, and Docker-based deployment on commodity hardware.

With the frontend stack standardized on React/TypeScript (Nexus POS via Tauri, Nexus Admin via web, Nexus Raptag via React Native), selecting a TypeScript-based backend enables a unified language across the entire platform. Shared types, validation schemas (Zod), and API contracts can be published as npm packages consumed by all clients.
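
As an illustration, a shared schema might live in a published contracts
package along these lines (the package contents and field names are
hypothetical):

  import { z } from 'zod';

  export const SaleLineItemSchema = z.object({
    productId: z.string().uuid(),
    quantity: z.number().int().positive(),
    unitPrice: z.number().nonnegative(),
  });

  export const CreateSaleSchema = z.object({
    locationId: z.string().uuid(),
    lineItems: z.array(SaleLineItemSchema).min(1),
  });

  // The same type is inferred on the API (request validation) and in the React
  // clients (form typing), so DTOs are never duplicated by hand.
  export type CreateSaleRequest = z.infer<typeof CreateSaleSchema>;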

Decision

We will use Node.js + Express/Fastify + TypeScript for the Central API.

Considered Options

  1. ASP.NET Core (C#) — High performance, strong typing, EF Core, SignalR
  2. Node.js + Express/Fastify (TypeScript) — Unified TypeScript stack, Prisma ORM, Socket.io
  3. Go (Gin) — Raw performance, small binary, but no type sharing with frontend
  4. Python (FastAPI) — Excellent for ML, but weaker typing and slower I/O
  5. Java (Spring) — Enterprise-grade, but verbose and no frontend code sharing

Decision Outcome

Chosen: Node.js + Express/Fastify + TypeScript because it unifies the entire platform on a single language (TypeScript), enables shared types between API and all client applications via npm packages, provides excellent I/O performance for the database-heavy POS workload, and offers the largest package ecosystem (2M+ npm packages).

Team context: Full TypeScript expertise aligned with React (Nexus POS/Admin) and React Native (Nexus Raptag) frontends. No language context-switching between backend and frontend development.

Trade-offs

Pros:

  • Unified TypeScript across entire stack — API, Nexus POS (Tauri + React), Nexus Admin (React web), Nexus Raptag (React Native)
  • Prisma ORM for type-safe PostgreSQL queries with automatic migrations and introspection
  • Socket.io for real-time inventory broadcasts to connected POS terminals (replaces SignalR)
  • Massive npm ecosystem (2M+ packages) — battle-tested libraries for every integration need
  • Excellent Docker support — Alpine Node.js images with small footprint (~50MB)
  • Same language for frontend and backend eliminates context-switching and enables code sharing
  • Strong typing via TypeScript catches errors at compile time with strict mode enabled
  • Shared validation schemas (Zod) and API types published as npm packages

Cons:

  • Single-threaded event loop — CPU-bound tasks require worker threads (mitigated by worker_threads for report generation)
  • Less raw compute performance than Go/Rust/C# — acceptable for I/O-bound POS workloads (database queries, Redis lookups, HTTP calls)
  • Node.js ecosystem moves fast — dependency churn (mitigated by pinned versions and lock file, see ADR-014)

References

  • Ch 04: Architecture Styles, Section L.9A (System Architecture)
  • ADR-014: npm Package Versioning (Pinned Major.Minor with Lock File)
  • ADR-046: Nexus Dual Deployment Architecture

ADR Index

ADR | Title | Status | Date
ADR-001 | Shared Tables with Row-Level Security Multi-Tenancy | Superseded (corrected to Row-Level RLS, Ch 04 L.10A.4) | 2025-12-29
ADR-002 | Offline-First POS Architecture | Superseded (by ADR-048) | 2025-12-29
ADR-003 | Event Sourcing for Sales Domain | Accepted | 2025-12-29
ADR-004 | JWT + PIN Authentication | Accepted | 2025-12-29
ADR-005 | PostgreSQL as Primary Database | Accepted | 2025-12-29
ADR-006 | Node.js + TypeScript for Central API | Accepted | 2026-02-28
ADR-007 | Admin Portal Framework (Blazor Server) | Superseded (by ADR-046) | 2026-02-27
ADR-008 | POS Client Framework (Tauri 2.0 + React/TypeScript) | Accepted | 2026-02-28
ADR-009 | Redis for Session & Cache | Accepted | 2026-02-27
ADR-010 | Shopify Sync Strategy (Webhook + Polling) | Accepted | 2026-02-27
ADR-011 | Payment Gateway (SAQ-A Semi-Integrated) | Accepted | 2026-02-27
ADR-012 | Logging & Monitoring (LGTM Stack) | Accepted | 2026-02-27
ADR-013 | RFID Configuration in Tenant Admin | Superseded (by ADR-046) | 2026-01-01
ADR-014 | npm Package Versioning (Pinned Major.Minor with Lock File) | Accepted | 2026-02-28
ADR-015 | Offline Sync Strategy (Queue-and-Sync with CRDTs) | Superseded (by ADR-048) | 2026-02-27
ADR-016 | Error Code Structure (ERR-Mxxx Hierarchical) | Accepted | 2026-02-27
ADR-017 | Test Strategy (Layered Testing Pyramid) | Accepted | 2026-02-27
ADR-018 | Affirm BNPL Integration | Accepted | 2026-02-27
ADR-019 | SAQ-A Semi-Integrated Payment Scope | Accepted | 2026-02-27
ADR-020 | Split Tender Payment Support | Accepted | 2026-02-27
ADR-021 | Layaway Payment Plans | Accepted | 2026-02-27
ADR-022 | Tax-Inclusive Display with Compound Calculation | Accepted | 2026-02-27
ADR-023 | Compound Tax (3-Level State/County/City) | Accepted | 2026-02-27
ADR-024 | Gift Card Compliance (State Escheatment) | Accepted | 2026-02-27
ADR-025 | 6-Status Inventory State Machine | Accepted | 2026-02-27
ADR-026 | Reservation-Based Inventory Hold Model | Accepted | 2026-02-27
ADR-027 | RFID Counting-Only Scope (No Lifecycle) | Accepted | 2026-02-27
ADR-028 | Physical Count Freeze Period | Accepted | 2026-02-27
ADR-029 | Adjustment Manager Approval (Universal) | Accepted | 2026-02-27
ADR-030 | Auto-Suggest Transfers Algorithm | Accepted | 2026-02-27
ADR-031 | Shopify Webhook + Polling Dual Sync | Accepted | 2026-02-27
ADR-032 | Strictest-Rule-Wins Cross-Platform Validation | Accepted | 2026-02-27
ADR-033 | Amazon SP-API Integration Strategy | Accepted | 2026-02-27
ADR-034 | Google Merchant Center Feed Strategy | Accepted | 2026-02-27
ADR-035 | Channel Safety Buffer Calculation | Accepted | 2026-02-27
ADR-036 | POS-Master Default for External Channels | Accepted | 2026-02-27
ADR-037 | Offline Conflict Resolution via CRDTs | Accepted | 2026-02-27
ADR-038 | Transactional Outbox for Event Publishing | Accepted | 2026-02-27
ADR-039 | CQRS Boundary (Sales Domain Only) | Accepted | 2026-02-27
ADR-040 | Eventual Consistency SLA (5s Online, 30min Offline) | Accepted | 2026-02-27
ADR-041 | 6-Gate Security Pyramid | Accepted | 2026-02-27
ADR-042 | E2E Testing Strategy | Removed (duplicate of ADR-017) | 2026-02-27
ADR-043 | LGTM Observability Stack | Removed (duplicate of ADR-012) | 2026-02-27
ADR-044 | API Performance Targets | Accepted | 2026-02-27
ADR-045 | Blue-Green Deployment Strategy | Accepted | 2026-02-27
ADR-046 | Nexus Dual Deployment Architecture | Accepted | 2026-02-28
ADR-047 | Raptag Mobile Framework (React Native) | Accepted | 2026-02-28
ADR-048 | Online-First POS Data Strategy | Accepted | 2026-03-01

ADR-013: RFID Configuration Embedded in Tenant Admin Portal

Superseded: The “Admin Portal” concept has been eliminated. RFID configuration is now accessed via Nexus Admin web app > Settings > RFID section. The decision to embed RFID in the main application (rather than a separate portal) remains valid — only the product surface name has changed. See ADR-046.

+==================================================================+
|  ADR-013: RFID Configuration Embedded in Tenant Admin Portal     |
+==================================================================+
|  Status: SUPERSEDED (by ADR-046: Nexus Dual Deployment           |
|          Architecture)                                            |
|  Date: 2026-01-01                                                |
|  Deciders: Architecture Team                                      |
+==================================================================+

CONTEXT
-------
RapOS includes RFID inventory capabilities via the Raptag mobile app.
The question arose: where should RFID configuration (device management,
printer setup, tag encoding settings, templates) be managed?

We evaluated three options:

  Option A: Embed in Tenant Admin Portal (app.rapos.com)
  - RFID settings as feature-flagged section in existing portal
  - Uses existing authentication, permissions, navigation
  - Shared context with products, locations, users

  Option B: Separate RFID Portal (rfid.rapos.com)
  - Dedicated portal just for RFID configuration
  - 4th portal in the architecture
  - Independent scaling and development

  Option C: Hybrid Approach
  - Basic settings in Tenant Admin
  - Advanced configuration in separate portal
  - Users navigate between portals

Research was conducted on major RFID vendors:
- SML Clarity: Single platform, modular components
- Checkpoint HALO/ItemOptix: Unified SaaS platform
- Avery Dennison atma.io: Role-based dashboards in one platform
- Impinj ItemSense: Single Management Console

Key finding: NO major RFID vendor uses separate portals for RFID
configuration. All embed RFID features within unified platforms.

DECISION
--------
We will EMBED RFID configuration in the Tenant Admin Portal (Option A).

Implementation:
- Settings > RFID section (feature-flagged)
- Devices tab: Claim codes, device list, release
- Printers tab: IP configuration, test connectivity
- Tag Configuration tab: EPC prefix (read-only), variance thresholds
- Templates tab: Label template library

Mobile app downloads configuration from central API on startup.
No RFID configuration in the mobile app itself.

CONSEQUENCES
------------
Positive:
  + Matches industry pattern (SML, Checkpoint, Avery Dennison)
  + Single login/URL for all tenant management
  + Shared context with products, locations, users
  + Lower development cost (one portal, not two)
  + Progressive disclosure manages complexity
  + Same permissions system applies to RFID

Negative:
  - Could become bloated if RFID features grow significantly
  - Enterprise customers might want dedicated RFID admin
  - Feature flags add slight complexity

Risks:
  - Tenant Admin may feel "cluttered" with many features
  - RFID power users may want more dedicated experience

Mitigations:
  - Use progressive disclosure (collapse advanced settings)
  - Role-based visibility (hide RFID from non-RFID users)
  - Monitor feedback; re-evaluate if enterprise demand grows
  - Feature-flagged sections can be extracted later if needed

Re-evaluation Triggers:
  - Multiple enterprise customers (100+ stores) request separation
  - RFID feature count exceeds 20+ configuration screens
  - Evidence that RFID admins are different people than Tenant admins

ADR-007: Admin Portal Framework — Blazor Server

Superseded: This ADR documents the original C#/Blazor Server architecture that was rejected during the v6.1.0 tech stack pivot. The separate Admin Portal has been eliminated. Administration is now integrated into the Nexus web application — the same React/TypeScript codebase deployed as both a Tauri desktop app (Nexus POS) and a standard web app (Nexus Admin). The current architecture uses React/TypeScript with Tauri 2.0 for the desktop POS client (see ADR-046 Nexus Dual Deployment). This record is preserved for historical context.

Status: Superseded (by ADR-046)
Date: 2026-02-27
Decision Makers: Architecture Review Team
Context: The Admin Portal needs a frontend framework that integrates with the .NET backend and supports real-time features.

Context

The Admin Portal (app.rapos.com) is the central management interface for tenant administrators. It provides dashboards, product management, employee management, reporting, and configuration. The portal requires real-time data updates (inventory levels, sales dashboards, integration sync status) and must share authentication and authorization logic with the Central API.

The team already uses C# for the Central API (ASP.NET Core 8.0), the POS Client (.NET MAUI), and the Mobile App (.NET MAUI). Introducing a JavaScript-based frontend would require maintaining two toolchains, two build systems, and two sets of domain models with mapping layers.

Admin portals are inherently server-heavy workloads: data-dense tables, reporting dashboards, configuration forms, and audit logs. Unlike consumer-facing SPAs, admin portals benefit more from server-side rendering and direct database access than from client-side interactivity.

Decision

We will use Blazor Server for the Admin Portal.

Considered Options

  1. React SPA — JavaScript/TypeScript Single Page Application with REST/GraphQL API calls
  2. Angular SPA — TypeScript-based enterprise SPA framework
  3. Vue.js SPA — Progressive JavaScript framework
  4. Blazor Server — Server-side Razor components with SignalR real-time updates

Decision Outcome

Chosen: Blazor Server because it unifies the entire stack on C#/.NET, eliminates the need for a separate JavaScript build toolchain, provides built-in real-time updates via SignalR (already used for inventory broadcasts), and enables sharing of domain models, validation logic, and DTOs directly between the API and the portal.

Trade-offs

Pros:

  • Unified .NET stack — same language, same models, same tooling across API, Admin Portal, POS Client, and Mobile
  • Built-in real-time via SignalR — dashboard updates, inventory alerts, sync status without polling
  • No separate build toolchain — no Node.js, npm, webpack, or Vite required for the Admin Portal
  • Server-side rendering — thin client, no large JavaScript bundles, fast initial load
  • Shared Blazor components with POS Client (.NET MAUI Blazor Hybrid)
  • Full access to .NET ecosystem (FluentValidation, MediatR, EF Core) in UI logic
  • Simplified authentication — shares the same ASP.NET Core Identity/JWT infrastructure

Cons:

  • Requires persistent SignalR connection — higher server memory per concurrent user
  • Latency on every UI interaction (round-trip to server) — acceptable for admin workloads, not for consumer SPAs
  • Smaller UI component ecosystem compared to React (mitigated by MudBlazor, Radzen, Syncfusion)
  • Team must learn Razor component model if unfamiliar (low risk given existing C# expertise)

References

  • Ch 04: Architecture Styles, Section L.9A (System Architecture) (Admin Portal details — planned future rewrite)
  • ADR-006: Node.js + TypeScript for Central API

ADR-008: POS Client Framework — Tauri 2.0 + React/TypeScript

Note (v7.0.0): The Tauri 2.0 desktop wrapper has been replaced by a pure React web application (ADR-052). Hardware peripherals now use web protocols (Star WebPRNT for receipt printers, USB HID for barcode scanners, Stripe Terminal SDK for payment terminals). SQLite offline storage uses WASM (sql.js/wa-sqlite + OPFS) instead of native better-sqlite3. The React/TypeScript architecture and shared codebase principles from this ADR remain valid.

Status: Accepted (Tauri-specific parts superseded by ADR-052)
Date: 2026-02-28
Decision Makers: Architecture Review Team
Context: The POS Client runs on store terminals (Windows desktops/tablets), needs native hardware access (receipt printers ESC/POS, barcode scanners HID/serial, cash drawers RJ-11), offline-first local SQLite, and cross-platform desktop deployment.

Context

The POS Client (Nexus POS) runs on store terminals (Windows desktops/tablets) and must integrate with physical retail hardware: receipt printers (ESC/POS protocol), barcode scanners (HID/serial), cash drawers (RJ-11 trigger). It must operate fully offline with a local SQLite database and sync queued transactions when connectivity is restored.

With the tech stack pivot to TypeScript (ADR-006), the POS Client should use the same React/TypeScript codebase as the Nexus Admin web application. Tauri 2.0 enables wrapping a React web app as a native desktop application with Rust-powered backend commands for hardware access and performance-critical operations. The same React codebase is deployed as both a Tauri desktop app (Nexus POS) and a standard web app (Nexus Admin) — see ADR-046.

Decision

We will use Tauri 2.0 + React/TypeScript for the POS Client, with better-sqlite3 for local offline storage.

Considered Options

  1. Electron — Chromium-based desktop app with Node.js backend (rejected: 150MB+ bundle, Chromium overhead)
  2. Tauri 2.0 — Rust-based lightweight desktop app with web frontend (chosen)
  3. PWA (Progressive Web App) — Browser-based with service worker caching (rejected: no native hardware access)
  4. .NET MAUI Blazor Hybrid — Native .NET desktop app with embedded Blazor WebView (rejected: different language ecosystem from TypeScript stack)

Decision Outcome

Chosen: Tauri 2.0 + React/TypeScript because it provides full offline capability with local SQLite via better-sqlite3 (Tauri sidecar or native plugin), direct hardware access via Tauri Rust commands (receipt printer ESC/POS, barcode scanner, cash drawer), shares the same React codebase with Nexus Admin web app (dual deployment from single source), and produces a lightweight binary (~10MB vs Electron 150MB+).
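
On the React side, a hardware call is a thin wrapper around a Rust command. A
sketch, assuming the Tauri 2.0 JavaScript API; the print_receipt command name
is illustrative and would need a matching #[tauri::command] on the Rust side:

  import { invoke } from '@tauri-apps/api/core';

  interface ReceiptPayload {
    saleId: string;
    lines: string[];   // pre-rendered ESC/POS text lines
  }

  export async function printReceipt(payload: ReceiptPayload): Promise<void> {
    // In the web (Nexus Admin) build the Tauri API is absent, so hardware calls
    // are guarded behind a deployment check (see ADR-046).
    await invoke('print_receipt', { payload });
  }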

Trade-offs

Pros:

  • Full offline capability with local SQLite via better-sqlite3 (Tauri sidecar or native plugin)
  • Direct hardware access via Tauri Rust commands (receipt printer ESC/POS, barcode scanner, cash drawer)
  • Same React codebase as Nexus Admin web app — dual deployment from single source (ADR-046)
  • Lightweight binary (~10MB vs Electron 150MB+) — important for store terminal hardware
  • No bundled Chromium — uses system WebView2 (Windows) reducing memory footprint
  • Rust backend for performance-critical paths (encryption, local DB operations, sync)
  • TypeScript shared types with Central API via npm packages
  • Single design system (TailwindCSS + shadcn/ui) across Nexus POS and Nexus Admin

Cons:

  • Tauri 2.0 is newer than Electron — smaller community, fewer third-party plugins (growing rapidly)
  • Rust commands require Rust expertise for hardware integration layer (contained scope)
  • WebView2 dependency on Windows (auto-installed on Windows 10 21H2+ and Windows 11)
  • Some rendering differences between WebView2 and Chrome (mitigated by consistent React component library)

References

  • Ch 04: Architecture Styles, Section L.10A.1 (Offline Strategy) (POS Client details — planned future rewrite)
  • ADR-002: Offline-First POS Architecture
  • ADR-046: Nexus Dual Deployment Architecture

ADR-009: Redis for Session & Cache

Status: Accepted
Date: 2026-02-27
Decision Makers: Architecture Review Team
Context: The platform needs distributed session management and caching for a horizontally scaled API layer.

Context

The Central API is deployed as multiple stateless instances behind a load balancer. User sessions (JWT refresh tokens, active cart state for the Nexus Admin) and frequently accessed data (product catalog, tax rates, tenant configuration) must be available to any API instance. In-memory caching per-instance leads to inconsistency when requests are load-balanced across instances.

Additionally, Module 6 (Integrations) requires real-time pub/sub for broadcasting inventory updates to connected POS terminals via Socket.io, and caching safety buffer computations to avoid recalculating on every channel sync.

Decision

We will use Redis 7.x for distributed session management, cache-aside pattern, and pub/sub real-time notifications.

Considered Options

  1. In-memory per-instance — Each API instance maintains its own cache
  2. Memcached — Simple distributed key-value cache
  3. PostgreSQL-based sessions — Store sessions in the primary database
  4. Redis 7.x — Distributed cache, session store, and pub/sub

Decision Outcome

Chosen: Redis 7.x because it supports all three use cases (session, cache, pub/sub) in a single infrastructure component, has excellent Node.js integration via ioredis, and provides sub-millisecond read latency for product lookups during checkout.
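
A sketch of the cache-aside read path with ioredis (the key naming, TTL, and
database loader are illustrative):

  import Redis from 'ioredis';

  const redis = new Redis(process.env.REDIS_URL ?? 'redis://localhost:6379');

  // Stand-in for the real Prisma query (hypothetical).
  async function loadProductFromDb(tenantId: string, productId: string): Promise<unknown> {
    return { id: productId, tenantId, name: 'placeholder' };
  }

  export async function getProduct(tenantId: string, productId: string): Promise<unknown> {
    const key = `tenant:${tenantId}:product:${productId}`;

    const cached = await redis.get(key);
    if (cached) return JSON.parse(cached);                          // cache hit

    const product = await loadProductFromDb(tenantId, productId);   // cache miss
    await redis.set(key, JSON.stringify(product), 'EX', 300);       // 5-minute TTL
    return product;
  }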

Trade-offs

Pros:

  • Distributed session — any API instance can serve any user without sticky sessions
  • Cache-aside pattern — product catalog, tax rates, and tenant config cached with configurable TTL
  • Pub/sub for real-time — inventory update broadcasts to Socket.io rooms without polling PostgreSQL
  • Sub-millisecond read latency — critical for checkout performance (NFR-PERF-001: < 500ms p99)
  • Built-in data structures (sorted sets for leaderboards, streams for event buffering)
  • Proven at scale — used by GitHub, Twitter, Stack Overflow

Cons:

  • Additional infrastructure component to deploy and monitor
  • Data loss on restart if not using AOF persistence (mitigated by AOF + RDB snapshots)
  • Memory-bound — cost increases with data volume (mitigated by TTL eviction policies)
  • Single-threaded command processing — throughput limited per instance (mitigated by Redis Cluster for scale)

References

  • Chapter 04: Architecture Styles, Section L.9A (Data Layer)
  • Chapter 09: Indexes & Performance

ADR-010: Shopify Sync Strategy — Webhook + Polling Hybrid

Status: Accepted
Date: 2026-02-27
Decision Makers: Architecture Review Team
Context: Shopify integration requires real-time inventory sync with fallback for missed webhooks.

Context

The POS platform syncs product catalog and inventory levels bidirectionally with Shopify. Shopify provides webhooks for real-time notifications (products/update, inventory_levels/update, orders/create) but webhooks can be missed due to network issues, Shopify outages, or endpoint failures. The platform must guarantee eventual consistency between POS inventory and Shopify inventory.

BRD v18.0 Module 6 (Section 6.3) defines Shopify as the primary e-commerce integration with OAuth 2.0/PKCE authentication, GraphQL Admin API at 50 points/second rate limiting, and mandatory @idempotent mutations (required 2026-04).

Decision

We will use a Webhook + Polling hybrid strategy for Shopify synchronization.

Considered Options

  1. Pure Webhook — Rely solely on Shopify webhooks for all sync
  2. Pure Polling — Poll Shopify API on intervals for all changes
  3. Shopify Flow — Use Shopify’s built-in automation workflows
  4. Webhook + Polling hybrid — Webhooks for real-time, polling as fallback

Decision Outcome

Chosen: Webhook + Polling hybrid because webhooks provide near-real-time sync (< 5 seconds processing) for the common case, while scheduled polling (every 15 minutes) catches any missed webhooks and ensures eventual consistency. Both paths use idempotent processing with the same event pipeline.
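
Webhook authenticity is checked before any processing. A sketch using Node's
crypto module (Shopify signs the raw request body and sends the base64 HMAC in
the X-Shopify-Hmac-Sha256 header):

  import { createHmac, timingSafeEqual } from 'node:crypto';

  export function isValidShopifyWebhook(rawBody: Buffer, hmacHeader: string, sharedSecret: string): boolean {
    const expected = createHmac('sha256', sharedSecret).update(rawBody).digest('base64');
    const a = Buffer.from(expected);
    const b = Buffer.from(hmacHeader);
    // Constant-time comparison; reject immediately on length mismatch.
    return a.length === b.length && timingSafeEqual(a, b);
  }

Verified webhooks and polled deltas are then handed to the same idempotent
event handler, so a change observed through both paths is applied only once.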

Trade-offs

Pros:

  • Near-real-time sync via webhooks (< 5 seconds processing per NFR-INTG-001)
  • Guaranteed eventual consistency via polling fallback
  • Idempotent processing — same handler for webhook and polling events (no double-counting)
  • Resilient to webhook delivery failures (Shopify retries for 48 hours, polling catches the rest)
  • Rate-limit-aware polling with adaptive backoff

Cons:

  • More complex than pure polling (webhook endpoint, signature verification, retry handling)
  • Polling adds API calls that count against Shopify rate limits (mitigated by delta queries with updated_at_min)
  • Must handle duplicate events from both webhook and poll (mitigated by idempotency framework with 24-hour dedup window)

References

  • Ch 04: Architecture Styles, Section L.4B (Integration Architecture) (Integration patterns — see also Ch 05 Module 6)
  • BRD v20.0 Section 6.3 (Shopify Integration)

ADR-011: Payment Gateway — SAQ-A Semi-Integrated

Status: Accepted
Date: 2026-02-27
Decision Makers: Architecture Review Team, Security Team
Context: The platform must process card payments with minimal PCI compliance scope.

Context

POS terminals must accept card payments (chip, tap, swipe) in physical stores. PCI-DSS compliance is mandatory, but the level of compliance effort varies dramatically based on how card data is handled. Full integration (SAQ-D) requires 300+ controls; semi-integrated (SAQ-A) requires ~30 controls because card data never touches our system.

The platform must support multiple payment providers to avoid vendor lock-in and enable tenant choice. The offline capability requires that payment tokens (not card data) can be stored locally for void/refund operations.

Decision

We will use SAQ-A Semi-Integrated terminals with Stripe Terminal and Square Terminal as supported providers.

Considered Options

  1. Full Integration (SAQ-D) — Card data flows through our system, encrypted and tokenized
  2. Semi-Integrated (SAQ-A) — Card data handled entirely by terminal/processor, we receive tokens only
  3. Redirect-only — Customer redirected to payment page (not applicable for in-store POS)
  4. Hosted Fields — Embedded payment form from provider (web-only, not applicable for desktop POS)

Decision Outcome

Chosen: SAQ-A Semi-Integrated because no card data (PAN, CVV, track data, PIN block) ever touches our system. The POS Client sends a payment request to the terminal SDK, the terminal communicates directly with the payment processor, and we receive only a token, approval code, and masked card number (****1234). This reduces PCI scope from 300+ controls to ~30.
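
The persisted payment record is therefore limited to non-sensitive fields. An
illustrative shape (TypeScript; field names are not final):

  interface StoredPaymentResult {
    provider: 'stripe_terminal' | 'square_terminal';
    paymentToken: string;    // opaque token used later for void/refund
    approvalCode: string;
    maskedCard: string;      // e.g. "****1234"
    cardBrand: string;       // e.g. "visa"
    amountCents: number;
    capturedAt: string;      // ISO timestamp
  }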

Trade-offs

Pros:

  • Minimal PCI scope (SAQ-A: ~30 controls vs. SAQ-D: 300+ controls)
  • No card data storage, transmission, or processing in our system
  • Multi-provider support — Stripe Terminal and Square Terminal via provider abstraction
  • Token-based void/refund — works offline using stored payment tokens
  • Terminal firmware managed by provider (no EMV kernel maintenance)

Cons:

  • Dependent on terminal hardware availability and provider SDK updates
  • Terminal communication adds latency (~1-3 seconds for chip transactions)
  • Limited control over payment UX (terminal screen is provider-controlled)
  • Two provider SDKs to maintain (Stripe Terminal SDK, Square Terminal SDK)

References

  • Ch 04: Architecture Styles, Section L.10A.3 (Payment Integration) (Security & Auth details — planned future rewrite)
  • Ch 04: Architecture Styles, Section L.8 (Security — 6-Gate Pyramid)
  • BRD v20.0 Section 1.18 (Payments)

ADR-012: Logging & Monitoring — LGTM Stack

Status: Accepted
Date: 2026-02-27
Decision Makers: Architecture Review Team, Infrastructure Team
Context: The platform needs unified observability across API, POS clients, integrations, and infrastructure.

Context

The POS platform has multiple observable surfaces: the Central API (multiple instances), POS terminals in stores (offline-capable), external integrations (Shopify, Amazon, Google Merchant), and infrastructure (PostgreSQL, Redis, Kafka v2.0). Operators need logs, metrics, and distributed traces to diagnose issues like “why did this sale fail to sync?” or “why is the Shopify circuit breaker open?”

Cloud-native SaaS solutions (Datadog, New Relic) offer convenience but at significant cost ($15-25/host/month) and with vendor lock-in. The platform uses OpenTelemetry for instrumentation, which enables backend-agnostic telemetry collection.

Decision

We will use the LGTM Stack (Loki, Grafana, Tempo, Mimir/Prometheus) for observability.

Considered Options

  1. ELK Stack (Elasticsearch, Logstash, Kibana) — Established log aggregation platform
  2. Datadog — Cloud-native SaaS observability platform
  3. Cloud-native (CloudWatch, Azure Monitor) — Cloud provider native tools
  4. LGTM Stack (Loki, Grafana, Tempo, Prometheus) — Open-source observability platform

Decision Outcome

Chosen: LGTM Stack because it is fully open-source (no per-host licensing), self-hosted (data stays on our infrastructure for PCI compliance), and designed for the OpenTelemetry ecosystem. Grafana provides unified dashboards for logs (Loki), traces (Tempo), and metrics (Prometheus), with native Node.js auto-instrumentation via @opentelemetry/sdk-node.
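
Instrumentation bootstrap is a small amount of code. A sketch, assuming the
OpenTelemetry Node SDK packages named above and an OTLP endpoint exposed by a
local collector (the URL is illustrative):

  import { NodeSDK } from '@opentelemetry/sdk-node';
  import { getNodeAutoInstrumentations } from '@opentelemetry/auto-instrumentations-node';
  import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';

  const sdk = new NodeSDK({
    serviceName: 'central-api',
    traceExporter: new OTLPTraceExporter({ url: 'http://localhost:4318/v1/traces' }),
    instrumentations: [getNodeAutoInstrumentations()],   // HTTP, Express/Fastify, ioredis, pg, ...
  });

  // Start before the HTTP server so incoming requests are traced from the first byte.
  sdk.start();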

Trade-offs

Pros:

  • Open-source — no per-host licensing costs, no vendor lock-in
  • Self-hosted — data stays on infrastructure (PCI compliance for audit logs)
  • Unified dashboards in Grafana — logs, metrics, and traces correlated by trace ID
  • Loki uses label-based indexing (not full-text) — lower storage costs than Elasticsearch
  • Native OpenTelemetry support — Node.js auto-instrumentation for Express/Fastify, Prisma, HTTP client
  • Integration-specific dashboards: circuit breaker state, DLQ depth, sync latency, safety buffer violations

Cons:

  • Operational overhead — must manage Loki, Tempo, Prometheus, Grafana infrastructure
  • Less feature-rich than Datadog for APM (no automatic service maps, no AI anomaly detection)
  • Grafana alerting is functional but less sophisticated than PagerDuty/OpsGenie (mitigated by Alertmanager integration)
  • Storage management required for long-term log/metric retention

References

  • Ch 04: Architecture Styles, Section L.7 (Observability) (Monitoring details — planned future rewrite)
  • Ch 04: Architecture Styles, Section L.8 (Security — FIM via Wazuh)

ADR-014: npm Package Versioning — Pinned Major.Minor with Lock File

Status: Accepted
Date: 2026-02-28
Decision Makers: Architecture Review Team
Context: The platform depends on critical npm packages that must be version-controlled for build reproducibility and security.

Context

The POS platform uses multiple npm packages for core functionality: Prisma for type-safe PostgreSQL access, ioredis for caching, Socket.io for real-time broadcasts, Zod for schema validation, pino for structured logging, jose for JWT operations, and argon2 for password hashing. The frontend uses React, TailwindCSS, shadcn/ui, React Query, and Zustand.

Floating version ranges (*, latest) can introduce breaking changes in CI/CD. Exact pinning (4.18.0) prevents security patches. A balanced approach is needed. The monorepo uses pnpm as the package manager with a committed lock file.

Decision

We will use pinned major.minor in package.json (e.g., "express": "^4.18") with pnpm-lock.yaml committed for full reproducibility. Dependabot/Renovate automates PR-based updates.

Considered Options

  1. Floating versions (*) — Always use latest available
  2. Exact pinning (4.18.2) — Lock to specific patch version
  3. Caret ranges (^4.18.0) — Allow minor + patch updates
  4. Pinned major.minor with lock file (^4.18 + committed pnpm-lock.yaml)

Decision Outcome

Chosen: Pinned major.minor with lock file because it ensures build reproducibility via the committed pnpm-lock.yaml (identical installs across developer machines and CI/CD) while allowing patch-level security fixes. Dependabot/Renovate creates PRs for major/minor bumps with changelog review.

Key Package Versions:

Package | Pinned Version | Purpose
express or fastify | ^4.18 / ^5.0 | HTTP framework
@prisma/client | ^5.x | PostgreSQL ORM (type-safe)
ioredis | ^5.x | Redis client
socket.io | ^4.x | Real-time WebSocket
zod | ^3.x | Schema validation
pino | ^8.x | Structured logging
@opentelemetry/sdk-node | ^1.x | Observability instrumentation
jose | ^5.x | JWT signing/verification (RS256)
argon2 | ^0.x | Password hashing (Argon2id)
better-sqlite3 | ^11.x | SQLite for Tauri POS local DB
kafkajs | ^2.x | Kafka client (v2.0 future)

Trade-offs

Pros:

  • Build reproducibility — pnpm-lock.yaml ensures identical dependency trees across all environments
  • Automatic security patches — patch versions flow through automatically
  • Consistent across developer machines and CI/CD
  • Dependabot/Renovate creates PRs for major/minor bumps with changelog review
  • pnpm strict mode prevents phantom dependencies

Cons:

  • Patch-level changes could theoretically introduce bugs (extremely rare, mitigated by CI test suite)
  • Requires manual intervention for major/minor upgrades (by design — these are reviewed)
  • Lock file must be committed and kept up to date (pnpm-lock.yaml)
  • npm ecosystem has higher dependency churn than .NET (mitigated by lock file + Renovate)

References

  • Ch 04: Architecture Styles, Section L.8 (SCA — Snyk/OWASP) (Dev Environment details — planned future rewrite)
  • PCI-DSS 4.0 Req 6.3.2 (SBOM generation)

ADR-015: Offline Sync Strategy — Queue-and-Sync with CRDTs

SUPERSEDED: This ADR has been superseded by ADR-048 (Online-First with Offline Fallback). CRDTs were eliminated in v6.2.0. This record is preserved for historical context.

Status: Superseded (by ADR-048)
Date: 2026-02-27
Decision Makers: Architecture Review Team
Context: POS terminals operating offline must sync transactions and inventory changes without data loss or conflicts.

Context

ADR-002 established offline-first as a core requirement. This ADR specifies the sync mechanism. When POS terminals are offline, sales, payments, and inventory changes accumulate locally. When connectivity is restored, these changes must be pushed to the Central API and merged with changes from other terminals and the Nexus Admin.

The key challenge is conflict resolution: two terminals may sell the last unit of a product simultaneously, or an admin may update a price while a terminal is offline. The sync strategy must handle these cases deterministically without data loss.

Decision

We will use Queue-and-Sync with CRDTs for offline synchronization.

Considered Options

  1. Sync-on-connect — Full database sync when connectivity is restored
  2. Optimistic sync — Push local changes, accept server response as authority
  3. Operational Transforms (OT) — Transform operations based on concurrent changes
  4. Queue-and-Sync with CRDTs — Priority-based sync queue with CRDT merge for conflict-free data types

Decision Outcome

Chosen: Queue-and-Sync with CRDTs because it combines append-only event queuing (sales are conflict-free by nature) with CRDT data structures for data types that need merge (inventory counters, price updates, cart items). Priority-based queuing ensures critical data (sales, payments) syncs before less critical data (customer updates, analytics).

Sync Priority Tiers:

Priority | Event Types | Sync Timing
1 (Critical) | Sales, Payments, Refunds, Voids | Immediate when online
2 (Important) | Inventory adjustments, Transfers | Within 5 minutes
3 (Normal) | Customer updates, Loyalty changes | Within 15 minutes
4 (Low) | Analytics events, Logs | Batch sync hourly

CRDT Usage:

CRDT Type | Use Case | Merge Strategy
PN-Counter | Inventory levels (+/-) | Sum increments, sum decrements
LWW-Register | Price updates, last modified | Highest timestamp wins
OR-Set | Cart items, applied discounts | Union with tombstones
G-Counter | Transaction counts, sales counts | Sum all increments
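
To make the PN-Counter concrete, a sketch of its value and merge functions as
they applied in this superseded design (TypeScript; terminal ids and the
example are illustrative):

  interface PNCounter {
    increments: Record<string, number>;   // terminalId -> total units added
    decrements: Record<string, number>;   // terminalId -> total units sold/removed
  }

  const sum = (m: Record<string, number>) => Object.values(m).reduce((a, b) => a + b, 0);

  function value(c: PNCounter): number {
    return sum(c.increments) - sum(c.decrements);
  }

  // Merge is an element-wise max per terminal (each terminal only grows its own
  // entry), so replicas converge regardless of sync order and duplicate syncs
  // are harmless.
  function merge(a: PNCounter, b: PNCounter): PNCounter {
    const mergeMap = (x: Record<string, number>, y: Record<string, number>) => {
      const out: Record<string, number> = { ...x };
      for (const [k, v] of Object.entries(y)) out[k] = Math.max(out[k] ?? 0, v);
      return out;
    };
    return {
      increments: mergeMap(a.increments, b.increments),
      decrements: mergeMap(a.decrements, b.decrements),
    };
  }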

Trade-offs

Pros:

  • Sales never conflict — append-only events with unique IDs
  • Inventory converges automatically — PN-Counter CRDTs are mathematically guaranteed to converge
  • Priority-based sync — critical financial data syncs before convenience data
  • Parked sales support — up to 5 parked sales per terminal with 4-hour TTL
  • Queue limit (100 transactions) prevents unbounded offline operation

Cons:

  • CRDT implementation adds complexity to the sync layer
  • PN-Counters can temporarily show incorrect inventory (converges after sync)
  • Tombstone management for OR-Sets requires periodic compaction (7-day TTL)
  • Some operations blocked offline (customer create, gift card activation) to prevent inconsistencies

References

  • Chapter 04: Architecture Styles, Section L.10A.1 (Online-First with Offline Fallback)
  • ADR-002: Offline-First POS Architecture (superseded)
  • ADR-003: Event Sourcing for Sales Domain
  • ADR-048: Online-First POS Data Strategy (supersedes this ADR)

ADR-016: Error Code Structure — ERR-Mxxx Hierarchical

Status: Accepted
Date: 2026-02-27
Decision Makers: Architecture Review Team
Context: The platform needs a structured error code system for consistent error handling across 7 modules.

Context

The POS platform has 7 BRD modules, each generating different types of errors. Without a structured error code system, error handling degrades to HTTP status codes and free-form messages, making it difficult for POS Client developers, integration partners, and support teams to programmatically handle specific error conditions.

BRD v20.0 already defines module-specific error codes (ERR-5xxx for Module 5, ERR-6xxx for Module 6). This ADR formalizes the structure across all modules.

Decision

We will use a hierarchical ERR-Mxxx error code structure where M identifies the module (1-6) and xxx identifies the specific error within that module.

Considered Options

  1. HTTP-only — Rely solely on HTTP status codes (400, 404, 409, 500)
  2. Free-form strings — Arbitrary error codes like “SALE_NOT_FOUND”, “INVENTORY_INSUFFICIENT”
  3. Exception-based — Let exception types define error categories
  4. ERR-Mxxx hierarchical — Structured numeric codes with module prefix

Decision Outcome

Chosen: ERR-Mxxx hierarchical because it provides predictable, documented, machine-parseable error codes that map directly to BRD module boundaries. POS Client developers can switch on error code ranges, and support teams can triage by module.

Error Code Ranges:

Range | Module | Examples
ERR-1xxx | Module 1: Sales | ERR-1001 (sale not found), ERR-1010 (void window expired)
ERR-2xxx | Module 2: Inventory | ERR-2001 (insufficient stock), ERR-2010 (transfer rejected)
ERR-3xxx | Module 3: Customers | ERR-3001 (duplicate email), ERR-3010 (loyalty balance insufficient)
ERR-4xxx | Module 4: Reporting | ERR-4001 (date range too large), ERR-4010 (export limit exceeded)
ERR-5xxx | Module 5: Admin/Setup | ERR-5071 (register IP change limit), ERR-5072 (register retire requires OWNER)
ERR-6xxx | Module 6: Integrations | ERR-6001 (provider auth failed), ERR-6010 (circuit breaker open)
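
On the client, errors can be handled by range. A sketch of the error envelope
and a range-based switch (the response shape is illustrative):

  interface ApiError {
    code: string;       // e.g. "ERR-2001"
    message: string;    // human-readable; localized client-side using the code
    details?: Record<string, unknown>;
  }

  function moduleOf(error: ApiError): number {
    return Number(error.code.slice(4, 5));   // "ERR-2001" -> 2 (Module 2: Inventory)
  }

  function handleApiError(error: ApiError): void {
    switch (moduleOf(error)) {
      case 1: /* sales: offer retry or void guidance */ break;
      case 2: /* inventory: show stock warning */ break;
      case 6: /* integrations: surface in the sync status panel */ break;
      default: /* generic toast with code + message */ break;
    }
  }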

Trade-offs

Pros:

  • Predictable structure — POS Client can switch on error range (1xxx = sales, 2xxx = inventory)
  • Machine-parseable — error codes are numeric, not free-form strings
  • Aligned with BRD module boundaries — easy to trace errors to requirements
  • Supports i18n — error codes mapped to localized messages on the client
  • Documented in API reference (Appendix A — planned future rewrite) — developers know all possible errors per endpoint

Cons:

  • Requires maintaining error code registry (mitigated by code generation from registry file)
  • Must avoid error code conflicts as modules grow (mitigated by 1000-code range per module)
  • Error codes are less self-descriptive than string codes (mitigated by including message field in error response)

References

  • Ch 05: Architecture Components (BRD v20.0 Sections 5.x and 6.x — error code definitions) (API Design chapter — planned future rewrite)

ADR-017: Test Strategy — Layered Testing Pyramid

Status: Accepted
Date: 2026-02-27
Decision Makers: Architecture Review Team, QA Team
Context: The platform needs a testing strategy that balances coverage, speed, and confidence for a multi-tenant POS system.

Context

The POS platform processes financial transactions, manages inventory across multiple locations, and integrates with 6 external provider families. Testing must verify correctness at multiple levels: domain logic (tax calculation, commission reversal), API contracts (multi-tenant isolation, error codes), integration behavior (Shopify webhooks, payment terminals), and end-to-end workflows (offline sale → sync → inventory update).

BRD v18.0 defines 36 user stories with Gherkin acceptance criteria. Three platform sandboxes (Shopify Dev Store, Amazon SP-API Sandbox, Google Merchant test account) must be exercised in CI/CD.

Decision

We will use a Layered Testing Pyramid with specific tool choices per layer.

Considered Options

  1. Flat testing — Equal effort at all levels, no pyramid structure
  2. E2E-heavy — Focus on end-to-end tests with minimal unit tests
  3. Property-based — Use property-based testing (QuickCheck/FsCheck) as primary strategy
  4. Layered Testing Pyramid — Traditional pyramid: many unit, fewer integration, fewest E2E

Decision Outcome

Chosen: Layered Testing Pyramid because it provides fast feedback at the bottom (unit tests in < 5 seconds), confidence in the middle (integration tests with real PostgreSQL via Testcontainers-node), and end-to-end validation at the top (Playwright for browser automation). This matches the team’s TypeScript expertise and CI/CD pipeline constraints.

Testing Pyramid:

Layer | Tool | Coverage Target | Speed | Scope
Unit | Vitest | 80% | < 5 sec | Domain logic, validators, calculators
Integration | Testcontainers-node + Vitest | 15% | < 2 min | API endpoints, DB queries, Redis, RLS
E2E | Playwright | 5% | < 10 min | Full workflows: login → sale → receipt
Load | k6 | N/A | 30 min | Black Friday simulation: 500 concurrent, 1000 TPS
Contract | Pact | N/A | < 1 min | Shopify/Amazon/Google sandbox API contracts
Security | 6-Gate Pyramid | N/A | < 5 min | SAST, SCA, Secrets, ArchUnit, Pact, Manual
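
As an illustration, an integration-layer test for tenant isolation might look
like this (Vitest; createApiClient and seedTenant are hypothetical test
utilities backed by a Testcontainers-managed PostgreSQL instance):

  import { describe, expect, it } from 'vitest';
  import { createApiClient, seedTenant } from './test-utils';

  describe('multi-tenant isolation', () => {
    it('does not expose tenant A products to tenant B', async () => {
      await seedTenant({ products: [{ sku: 'A-001' }] });       // tenant A
      const tenantB = await seedTenant({ products: [] });

      const clientB = createApiClient(tenantB);
      const res = await clientB.get('/api/products');

      expect(res.status).toBe(200);
      expect(res.body).toHaveLength(0);   // RLS filters out tenant A's rows
    });
  });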

Trade-offs

Pros:

  • Fast feedback — Vitest unit tests run in seconds with native TypeScript support, catching regressions immediately
  • Real database testing — Testcontainers-node spins up PostgreSQL 16 with RLS for integration tests
  • Multi-tenant isolation verified — integration tests confirm tenant_id RLS policies prevent cross-tenant access
  • Contract testing with external platforms — Pact verifies Shopify/Amazon/Google API contracts
  • Load testing prevents performance regressions — k6 validates NFR-PERF-001 (< 500ms p99 checkout)

Cons:

  • Testcontainers-node requires Docker in CI/CD (standard in modern CI)
  • Playwright E2E tests are slower and more brittle (mitigated by limiting to critical paths only)
  • Load testing requires dedicated environment (not run on every commit, only on release candidates)
  • Contract tests depend on external sandbox availability (mitigated by recorded responses as fallback)

References

  • Chapter 04: Architecture Styles, Section L.6 (QA & Testing)
  • Chapter 04: Architecture Styles, Section L.8 (6-Gate Security Pyramid)
  • (Dev Environment and Checklists chapters — planned future rewrite)

2.18 ADR-018: Affirm BNPL Integration

Status: Accepted
Date: 2026-02-27
Decision Makers: Architecture Review Team
Context: The platform needs a Buy Now Pay Later (BNPL) option for high-value retail transactions at the point of sale.

Context

Retail clothing transactions can reach $200-$500+, creating friction for customers who prefer installment payments. The POS must offer a third-party financing option that does not add PCI scope, integrates with the existing checkout flow, and pays the merchant in full immediately while the customer repays the financing provider directly.

BRD v20.0 Section 1.3 defines Third-Party Financing as a payment method alongside cash, card, gift card, on-account, and layaway. The financing flow must support both in-store QR code presentation and customer-device redirect.

Decision

We will integrate Affirm as the BNPL provider for in-store financing.

Considered Options

  1. Affirm — Established BNPL provider with in-store POS SDK, QR code flow, and merchant dashboard
  2. Klarna — Popular BNPL with strong e-commerce presence but limited in-store POS integration
  3. Afterpay/Clearpay — Fixed 4-installment model, limited flexibility for higher-value purchases
  4. In-house installment plans — Build custom financing directly in the POS system

Decision Outcome

Chosen: Affirm because it provides a well-documented in-store API, supports variable loan terms (3-36 months), pays the merchant the full amount immediately (the store receives 100% of the sale amount from Affirm), and the customer completes the entire application on their own device. No card data or financial data touches the POS system — only a charge_id, loan_id, and approval status are stored.

Trade-offs

Pros:

  • Full payment received from Affirm immediately — no credit risk for the merchant
  • No PCI scope increase — customer’s financial data handled entirely by Affirm
  • QR code flow integrates cleanly into existing POS checkout sequence
  • Affirm handles all underwriting, collections, and customer communication
  • Established retail brand (Peloton, Shopify, Walmart) provides customer trust

Cons:

  • Affirm charges merchant fees (typically 3-6% per transaction) reducing margin
  • Approval is not guaranteed — customer may be declined, requiring fallback to another payment method
  • Adds dependency on Affirm API availability during checkout (mitigated by circuit breaker)
  • Limited to Affirm-supported markets (US primarily)

References

  • Chapter 05: Architecture Components, Section 1.3 (Financial Settlement)
  • Ch 05: Architecture Components, Module 6 (Integrations) (Integration Patterns chapter — planned future rewrite)
  • ADR-019: SAQ-A Semi-Integrated Payment Scope

2.19 ADR-019: SAQ-A Semi-Integrated Payment Scope

Status: Accepted
Date: 2026-02-27
Decision Makers: Architecture Review Team, Security Team
Context: Card payment processing requires PCI-DSS compliance; the scope of compliance depends on how card data is handled.

Context

POS terminals must accept chip, tap, and swipe card payments. PCI-DSS compliance levels range from SAQ-A (~30 controls, card data never touches our system) to SAQ-D (300+ controls, full card data flow through our system). The choice fundamentally shapes the security architecture, development effort, and ongoing compliance cost.

BRD v20.0 Section 1.18 mandates that “Card data NEVER touches your system” and specifies a semi-integrated terminal architecture where the payment terminal communicates directly with the payment processor. The POS backend receives only tokens, approval codes, and masked card numbers (last 4 digits).

Decision

We will implement SAQ-A semi-integrated payment terminals where card data is handled entirely by the terminal hardware and payment processor SDK. The POS system stores only: transaction_id, payment_token, approval_code, masked_card_number (****1234), card_brand, entry_method, terminal_id, timestamp, and amount.
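
A minimal sketch of the token-only record the POS persists under this decision; the interface name and the cents-based amount are illustrative assumptions, while the field list mirrors the decision above.

    // Token-only card payment record persisted by the POS; no PAN or track data is ever stored.
    interface CardPaymentRecord {
      transactionId: string;
      paymentToken: string;       // processor token used for later void/refund
      approvalCode: string;
      maskedCardNumber: string;   // e.g. "****1234" (last 4 digits only)
      cardBrand: string;
      entryMethod: 'chip' | 'tap' | 'swipe';
      terminalId: string;
      timestamp: string;          // ISO-8601
      amountCents: number;
    }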

Considered Options

  1. SAQ-D Full Integration — Card data encrypted and tokenized through our system (300+ PCI controls)
  2. SAQ-A Semi-Integrated — Card data handled by terminal/processor, we receive tokens only (~30 controls)
  3. SAQ-A-EP (E-commerce) — Redirect to hosted payment page (not applicable for in-store POS)

Decision Outcome

Chosen: SAQ-A Semi-Integrated because it reduces PCI compliance scope by 90% (from 300+ to ~30 controls), eliminates the risk of card data breach from our systems, and supports token-based void/refund operations that work offline. Stripe Terminal and Square Terminal are supported as interchangeable providers via the IIntegrationProvider abstraction (Ch 05 Section 6.2.1).

Trade-offs

Pros:

  • 90% reduction in PCI compliance scope and audit effort
  • Zero card data in our system — breach of our database exposes no payment card information
  • Token-based refund/void works offline using stored payment tokens
  • Terminal firmware and EMV kernel managed by provider — no maintenance burden
  • Multi-provider support via provider abstraction prevents vendor lock-in

Cons:

  • Dependent on terminal hardware availability and SDK compatibility
  • Terminal communication adds 1-3 seconds latency for chip transactions
  • Limited control over payment UX (terminal screen controlled by provider)
  • Must maintain two provider SDKs (Stripe Terminal, Square Terminal)

References

  • Chapter 05: Architecture Components, Section 1.18 (Payment Integration)
  • Ch 04: Architecture Styles, Section L.8 (Security) (Security chapters — planned future rewrite)
  • ADR-011: Payment Gateway (SAQ-A Semi-Integrated)

2.20 ADR-020: Split Tender Payment Support

Status: Accepted
Date: 2026-02-27
Decision Makers: Architecture Review Team
Context: Retail customers frequently need to pay with multiple payment methods in a single transaction.

Context

Retail transactions commonly involve multiple payment methods: cash + card, multiple credit cards, gift card + card, on-account + cash, or Affirm for the remaining balance. BRD v20.0 Section 1.3 defines a tender loop where the cashier selects payment methods iteratively until the remaining balance reaches zero. Each tender is tracked independently for refund routing — a refund must be returned to the original payment method.

Decision

We will support unlimited split tender combinations where any payment method can be combined with any other. Each tender in a transaction is stored as a separate payment record with its own token/reference, enabling per-tender refund routing.
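
For illustration, a minimal sketch of the tender-loop data shape under this decision; the type and helper names are assumptions, not the actual API.

    interface Tender {
      method: 'cash' | 'card' | 'gift_card' | 'on_account' | 'layaway_deposit' | 'affirm';
      amountCents: number;
      reference?: string;   // card payment token, Affirm charge_id, gift card number, ... (used for refund routing)
    }

    // The cashier keeps adding tenders until the remaining balance reaches zero.
    function remainingBalanceCents(totalDueCents: number, tenders: Tender[]): number {
      const paid = tenders.reduce((sum, t) => sum + t.amountCents, 0);
      return Math.max(totalDueCents - paid, 0);
    }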

Considered Options

  1. Single tender only — One payment method per transaction (simplest but poor UX)
  2. Two-tender maximum — Allow at most two payment methods (limits flexibility)
  3. Unlimited split tender — Any number of payment methods per transaction

Decision Outcome

Chosen: Unlimited split tender because retail customers expect payment flexibility, and gift card partial balances naturally require a second tender for the remainder. Each payment record stores its own token (for card), reference (for Affirm), or cash amount, enabling precise refund routing back to the original payment source.

Trade-offs

Pros:

  • Maximum payment flexibility — matches customer expectations in retail
  • Gift card partial balance + card is a common scenario handled naturally
  • Per-tender refund routing — each payment token tracked independently
  • Supports combining all 6 payment types: cash, card, gift card, on-account, layaway deposit, Affirm

Cons:

  • Refund logic complexity — must track which tender to refund to and in what order
  • Multiple card tenders mean multiple terminal interactions during checkout
  • Receipt layout must accommodate variable number of payment lines
  • Reconciliation reports must aggregate across tender types

References

  • Chapter 05: Architecture Components, Section 1.3 (Financial Settlement)
  • Ch 05: Architecture Components, Section 3.8 (Payment Processing) (API Design chapter — planned future rewrite)

2.21 ADR-021: Layaway Payment Plans

Status: Accepted
Date: 2026-02-27
Decision Makers: Architecture Review Team
Context: Some customers need to pay for high-value items over time with a deposit and installments, with inventory reserved until paid in full.

Context

Layaway is a traditional retail financing model where the customer pays a minimum deposit, inventory is reserved (not released), and the customer makes additional payments over time until the full amount is paid. BRD v20.0 Section 1.3 defines a layaway state machine: DEPOSIT_PAID -> RESERVED -> PAID_IN_FULL -> COMPLETED, with CANCELLED and FORFEITED as terminal states. The credit limit calculation must include pending layaway balances.

Decision

We will implement native layaway with configurable minimum deposit percentage, reservation-based inventory hold, and a state machine governing the layaway lifecycle. Layaway balances are included in the credit limit calculation: Available Credit = Credit Limit - (Current Debt + Pending Layaway Balances + Current Cart Total).
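
A minimal sketch of the credit check implied by this formula, assuming amounts in cents and illustrative field names.

    // All amounts in cents; field names are illustrative.
    function availableCreditCents(input: {
      creditLimit: number;
      currentDebt: number;
      pendingLayawayBalances: number;
      currentCartTotal: number;
    }): number {
      const { creditLimit, currentDebt, pendingLayawayBalances, currentCartTotal } = input;
      return creditLimit - (currentDebt + pendingLayawayBalances + currentCartTotal);
    }
    // A new on-account charge or layaway deposit is accepted only while the result stays >= 0.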

Considered Options

  1. No layaway — Direct customers to Affirm BNPL instead
  2. Basic layaway — Deposit + single final payment, no partial installments
  3. Full layaway with installments — Deposit + multiple partial payments with deadline tracking

Decision Outcome

Chosen: Full layaway with installments because it is a standard expectation in brick-and-mortar retail, allows flexible payment schedules, and reserves inventory to guarantee availability. Unlike Affirm, layaway involves no third-party fees — the store manages the payment plan directly.

Trade-offs

Pros:

  • No third-party fees — merchant keeps full margin
  • Inventory reserved for customer until paid in full
  • Configurable minimum deposit percentage per tenant
  • Overdue tracking with forfeiture rules protects against abandoned layaways
  • Familiar model for retail staff and customers

Cons:

  • Inventory is tied up during the layaway period (not available for other sales)
  • Risk of forfeiture — must handle cancellation refund policies (configurable)
  • Adds complexity to credit limit calculations
  • Reporting must track outstanding layaway liability

References

  • Chapter 05: Architecture Components, Section 1.3 (Layaway State Machine)
  • Chapter 05: Architecture Components, Module 7 (State Machine Reference)
  • ADR-020: Split Tender Payment Support

2.22 ADR-022: Tax-Exclusive Pricing with Compound Calculation

Status: Accepted
Date: 2026-02-27
Decision Makers: Architecture Review Team
Context: The POS must calculate and display tax correctly for US retail, where tax is calculated externally (not embedded in the price).

Context

US retail uses tax-exclusive pricing — product prices on the shelf do not include tax, and tax is calculated at checkout based on the store’s jurisdiction. BRD v20.0 Section 1.17 defines a tax hierarchy where product-level exemptions have highest priority, followed by customer-level exemptions, followed by the store’s location-based compound jurisdiction rate. Tax is computed per line item and displayed as a separate total on the receipt.

Section 5.9 defines the compound tax model: State + County + City rates summed at time of sale. Example: Norfolk, VA = State 4.3% + Regional 0.7% + City 1.0% = 6.0% compound rate.

Decision

We will use tax-exclusive pricing with compound tax calculation at checkout. Product prices are stored without tax. At checkout, all active rates for the store’s tax jurisdiction are summed and applied to each taxable line item. The receipt displays subtotal, tax breakdown (optionally by level), and total.
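
For illustration, a minimal sketch of per-line-item compound tax under this decision; the names, the cents representation, and the rounding policy are assumptions.

    interface TaxRate { level: 'STATE' | 'COUNTY' | 'CITY'; ratePercent: number; }

    // Tax-exclusive pricing: the shelf price is the subtotal; tax is added at checkout.
    function lineItemTaxCents(lineSubtotalCents: number, rates: TaxRate[], exempt: boolean): number {
      if (exempt) return 0;  // product- or customer-level exemption takes priority over the location rate
      const compoundPercent = rates.reduce((sum, r) => sum + r.ratePercent, 0);
      // Norfolk, VA: 4.3 + 0.7 + 1.0 = 6.0%; a $20.00 line item yields $1.20 tax
      return Math.round((lineSubtotalCents * compoundPercent) / 100);
    }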

Considered Options

  1. Tax-inclusive pricing — Embed tax in the product price (common in EU/UK, not US)
  2. Tax-exclusive with flat rate — Single tax rate per location
  3. Tax-exclusive with compound rate — Multi-level (State/County/City) summed at checkout
  4. External tax service — Delegate to TaxJar/Avalara API for real-time calculation

Decision Outcome

Chosen: Tax-exclusive with compound rate because it matches US retail practice, supports the 3-level Virginia tax structure (the reference implementation), and enables future expansion to other states with complex district overlays (California) or no sales tax (Oregon). The tax engine is built internally rather than delegated to external services to ensure offline capability.

Trade-offs

Pros:

  • Matches US retail standard — prices on shelf exclude tax
  • 3-level compound model handles all US jurisdictions (State + County + City + special districts)
  • Offline-capable — tax rates cached locally on POS terminal, no API call needed
  • Product-level and customer-level exemptions supported (reseller, non-profit, diplomatic)
  • Future-proof for multi-state expansion (California district taxes, Oregon no-tax)

Cons:

  • More complex than flat-rate tax — must manage jurisdiction-to-location mapping
  • Tax rate changes require admin updates (mitigated by scheduled effective dates)
  • Multi-jurisdiction reporting adds complexity to tax liability reports
  • Not suitable for EU/UK VAT without redesign (acceptable — target market is US)

References

  • Chapter 05: Architecture Components, Section 1.17 (Tax Calculation Engine)
  • Chapter 05: Architecture Components, Section 5.9 (Tax Configuration)
  • Chapter 07: Schema Design (tax_jurisdictions, tax_rates tables)

2.23 ADR-023: Compound Tax (3-Level State/County/City)

Status: Accepted
Date: 2026-02-27
Decision Makers: Architecture Review Team
Context: The tax data model must support compound (additive) tax rates at multiple jurisdictional levels.

Context

US sales tax varies by jurisdiction and can consist of multiple additive layers: state tax, county tax, city tax, and sometimes special district surcharges. BRD v20.0 Section 5.9 defines a tax_jurisdictions table (jurisdiction code, name, state) and a tax_rates table with a level enum (STATE, COUNTY, CITY). Each location references a jurisdiction, and at time of sale all active rates for that jurisdiction are summed.

Example: Norfolk, VA = State 4.300% + Regional 0.700% + City 1.000% = 6.000% compound. Northern Virginia adds an additional 0.7% regional rate. Rate changes can be scheduled via effective_date with automatic activation.

Decision

We will implement a 3-level compound tax model using tax_jurisdictions and tax_rates tables. Each jurisdiction can have up to 3 active rate levels (STATE, COUNTY, CITY). Rates are summed at time of sale. Future rates are scheduled via effective_date with background activation.

Considered Options

  1. Single flat rate per location — One rate column on the location table
  2. 2-level (State + Local) — State rate plus a single combined local rate
  3. 3-level compound (State/County/City) — Separate rate rows per level, summed at checkout
  4. N-level with district overlay — Unlimited levels including special taxing districts

Decision Outcome

Chosen: 3-level compound because it covers the vast majority of US jurisdictions without the complexity of unlimited district overlays. The Virginia reference implementation (4 stores across different regions) validates this model. Special districts (California Proposition) can be modeled as a CITY-level rate until N-level support is needed. Unique constraint on (jurisdiction_id, level, effective_date) prevents duplicate rates.

Trade-offs

Pros:

  • Covers all current US jurisdictions (State + County + City covers 95%+ of cases)
  • Scheduled rate changes via effective_date — no manual intervention on tax change dates
  • Preserves historical rates for audit — rate changes never modify existing records
  • Simple SUM query at checkout: SELECT SUM(rate_percent) FROM tax_rates WHERE jurisdiction_id = ? AND is_active = true

Cons:

  • Cannot model California special district overlays (4th+ level) without schema extension
  • Requires admin to configure jurisdiction-to-location mapping per tenant
  • Rate scheduling background job must run reliably at midnight

References

  • Chapter 05: Architecture Components, Section 5.9 (Tax Configuration)
  • Chapter 07: Schema Design (Domain 15: Tax)
  • ADR-022: Tax-Exclusive Pricing with Compound Calculation

2.24 ADR-024: Gift Card Compliance (State Escheatment)

Status: Accepted
Date: 2026-02-27
Decision Makers: Architecture Review Team, Legal
Context: Gift card management must comply with varying state-level escheatment and consumer protection laws.

Context

Gift cards are subject to state-specific regulations governing expiration, inactivity fees, and mandatory cash-out thresholds. BRD v20.0 Section 1.5 defines a jurisdiction compliance matrix: Virginia allows 5-year minimum expiry and inactivity fees after 12 months; California prohibits expiry, prohibits fees, and mandates cash-out at $10.00; New York prohibits both expiry and fees. The gift card state machine includes INACTIVE, ACTIVE, DEPLETED, EXPIRED, and CASHED_OUT states.

The system must default to the most restrictive rules (California-style: no expiry, no fees, cash-out required) and enable features only where jurisdiction permits.

Decision

We will implement jurisdiction-aware gift card rules that default to the most restrictive configuration (California-style) and enable expiry, fees, and cash-out thresholds per store location’s jurisdiction. The store’s physical location determines which rules apply.

Considered Options

  1. Uniform national policy — Apply the most restrictive state’s rules everywhere (simple but limits flexibility)
  2. Per-jurisdiction rules — Configure rules per state/jurisdiction with most-restrictive defaults
  3. External compliance service — Delegate gift card compliance to a third-party service

Decision Outcome

Chosen: Per-jurisdiction rules with most-restrictive defaults because multi-state retail operations need location-specific compliance. Defaulting to California-style (no expiry, no fees, mandatory cash-out) ensures legal compliance even if jurisdiction configuration is incomplete. Stores in permissive jurisdictions can enable expiry and fees explicitly.

Trade-offs

Pros:

  • Legal compliance across all US jurisdictions from day one
  • Safe defaults — unconfigured jurisdictions use most restrictive rules
  • Cash-out workflow at POS for California compliance (balance <= $10.00)
  • Gift card liability reporting for accounting (outstanding balances = liability)
  • State machine enforces valid transitions (no invalid state changes)

Cons:

  • Jurisdiction rules must be maintained as laws change
  • Cash-out workflow adds complexity to POS checkout flow
  • Escheatment reporting (unclaimed property) required in some states after dormancy period
  • Gift card liability grows over time — reporting must track aging and dormant cards

References

  • Chapter 05: Architecture Components, Section 1.5 (Gift Card Management)
  • Chapter 05: Architecture Components, Section 1.5.2 (Jurisdiction Compliance Matrix)
  • Chapter 05: Architecture Components, Module 7 (Gift Card State Machine)

2.25 ADR-025: 6-Status Inventory State Machine

Status: Accepted
Date: 2026-02-27
Decision Makers: Architecture Review Team
Context: Inventory at each location needs status tracking beyond simple quantity to manage quality holds, transit, reservations, and damage.

Context

Retail inventory is not simply “in stock” or “out of stock.” BRD v20.0 Section 4.2 defines six inventory statuses with a strict state machine governing transitions: AVAILABLE (sellable), QUARANTINE (quality hold), DAMAGED (cannot sell), PENDING_INSPECTION (received, needs review), RESERVED (allocated to order/transfer), and IN_TRANSIT (moving between locations). Only AVAILABLE stock can be sold at POS or transferred. All status changes require reason codes and are logged to the movement history audit trail.

Decision

We will implement a 6-status inventory state machine where each product-variant-location combination tracks quantity per status. Only AVAILABLE status is sellable. Transitions follow a strict state machine validated at the application layer against a state_transitions reference table.
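
A minimal sketch of transition validation, assuming an in-memory mirror of the state_transitions reference table; the exact transition set shown here is illustrative, not the authoritative table.

    type InventoryStatus =
      | 'AVAILABLE' | 'QUARANTINE' | 'DAMAGED'
      | 'PENDING_INSPECTION' | 'RESERVED' | 'IN_TRANSIT';

    // Illustrative transition map; e.g. QUARANTINE cannot move directly to RESERVED.
    const allowedTransitions: Record<InventoryStatus, InventoryStatus[]> = {
      PENDING_INSPECTION: ['AVAILABLE', 'QUARANTINE', 'DAMAGED'],
      AVAILABLE:          ['RESERVED', 'IN_TRANSIT', 'QUARANTINE', 'DAMAGED'],
      RESERVED:           ['AVAILABLE', 'IN_TRANSIT'],
      IN_TRANSIT:         ['PENDING_INSPECTION', 'AVAILABLE'],
      QUARANTINE:         ['AVAILABLE', 'DAMAGED'],
      DAMAGED:            [],
    };

    function canTransition(from: InventoryStatus, to: InventoryStatus): boolean {
      return allowedTransitions[from].includes(to);
    }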

Considered Options

  1. Binary (in-stock / out-of-stock) — Simple quantity tracking
  2. 3-status (Available / Reserved / Damaged) — Minimal status tracking
  3. 6-status state machine — Full lifecycle with quality management and transit tracking
  4. Continuous status field — Free-form status string (no transition enforcement)

Decision Outcome

Chosen: 6-status state machine because retail clothing operations require quality holds (QUARANTINE for items with potential defects), receiving inspection (PENDING_INSPECTION for new deliveries), reservation management (RESERVED for carts, transfers, online orders), and transit tracking (IN_TRANSIT between locations). Invalid transitions (e.g., QUARANTINE directly to RESERVED) are rejected by the API.

Trade-offs

Pros:

  • Only AVAILABLE stock appears as sellable — prevents selling damaged or quarantined items
  • RESERVED status prevents overselling in multi-terminal, multi-channel environments
  • IN_TRANSIT gives visibility into inventory movement between locations
  • Reason codes on every transition create a complete audit trail
  • State machine prevents invalid transitions (enforced at API and DB level)

Cons:

  • More complex than simple quantity tracking — 6 quantities per product-location instead of 1
  • Staff must understand status meanings and transition rules
  • Reporting must aggregate or filter by status
  • State machine logic adds validation overhead to every inventory operation

References

  • Chapter 05: Architecture Components, Section 4.2 (Inventory Status Model)
  • Chapter 05: Architecture Components, Module 7 (State Machine Reference)
  • Chapter 08: Entity Specifications

2.26 ADR-026: Reservation-Based Inventory Hold Model

Status: Accepted
Date: 2026-02-27
Decision Makers: Architecture Review Team
Context: Multiple terminals, parked transactions, online orders, and transfers all compete for the same inventory. A mechanism is needed to prevent overselling.

Context

BRD v20.0 Section 4.2.2 defines five reservation types: Sale Cart (hard reserve until payment or void), Parked Transaction (soft reserve with 4-hour TTL, overridable with warning), Transfer (hard reserve at source until shipped), Online Order (hard reserve at assigned store), and Hold-for-Pickup (hard reserve with configurable expiry, default 48 hours). When two terminals attempt to reserve the last unit simultaneously, first-commit-wins via database-level row locking.

Decision

We will implement a reservation-based inventory hold model with 5 reservation types, each with its own lifecycle and TTL. Reservations atomically move quantity from AVAILABLE to RESERVED. Concurrent conflicts are resolved by database row locking (first-commit-wins).
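
For illustration, a first-commit-wins reservation sketch using a single conditional UPDATE (the row lock taken by the UPDATE serializes concurrent attempts); the db.query wrapper, table, and column names are assumptions, not the actual schema.

    type DbLike = { query: (sql: string, params: unknown[]) => Promise<{ rowCount: number }> };

    // A single conditional UPDATE takes the row lock, so of two terminals racing for the
    // last unit, exactly one sees available_qty >= $2 and wins (first-commit-wins).
    async function reserveStock(db: DbLike, inventoryId: string, qty: number): Promise<boolean> {
      const result = await db.query(
        `UPDATE inventory_levels
            SET available_qty = available_qty - $2,
                reserved_qty  = reserved_qty  + $2
          WHERE id = $1 AND available_qty >= $2`,
        [inventoryId, qty]
      );
      return result.rowCount === 1;   // false: another terminal committed first, so do not reserve
    }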

Considered Options

  1. No reservation (optimistic) — Check quantity at payment time only, accept oversell risk
  2. Soft reservation with warnings — Show warnings but allow selling through reserved stock
  3. Hard reservation with TTL — Atomic reserve on add-to-cart, auto-release on expiry
  4. Mixed hard/soft by type — Hard for carts and online orders, soft for parked transactions

Decision Outcome

Chosen: Mixed hard/soft by type because sale carts, online orders, and transfers need hard reserves to prevent overselling, while parked transactions benefit from soft reserves (other terminals can sell through with a warning, since parked sales may never be completed). Auto-release via background job (every 5 minutes) prevents inventory from being permanently locked by abandoned sessions.

Trade-offs

Pros:

  • Prevents overselling across multi-terminal, multi-channel environments
  • Parked transaction soft reserve allows override when stock is genuinely needed
  • Auto-release on expiry prevents permanent inventory lockup
  • 5 reservation types cover all business scenarios (sale, park, transfer, online, hold)
  • Database row locking guarantees first-commit-wins under concurrent access

Cons:

  • Reservation management adds overhead to every cart operation (add/remove/void)
  • Background expiry job must run reliably (5-minute interval)
  • Soft reserve override can lead to parked transactions that can’t be recalled (reconciled at recall time)
  • Reservation table grows with transaction volume (mitigated by archival of COMMITTED/RELEASED records)

References

  • Chapter 05: Architecture Components, Section 4.2.2 (Reservation Model)
  • ADR-025: 6-Status Inventory State Machine
  • ADR-002: Offline-First POS Architecture

2.27 ADR-027: RFID Counting-Only Scope (No Lifecycle)

Status: Accepted
Date: 2026-02-27
Decision Makers: Architecture Review Team
Context: RFID integration scope must be defined — either counting-only or full lifecycle tracking (sales, transfers, receiving).

Context

BRD v20.0 Section 5.16 explicitly scopes RFID as a “dedicated inventory counting subsystem.” RFID readers (Zebra MC3390R, RFD40, FX9600) are used for bulk inventory counting and auditing via the Raptag mobile app. Barcode scanners remain the input device for sales transactions, receiving, and transfers. The rfid_tags table tracks tag status as active, void, or lost — there are no sold_at, transferred_at, or sold_order_id fields.

This separation means RFID and barcode scanning are independent abstractions that coexist: Scanner = barcode (POS register, one-item-at-a-time via USB HID); RFID = counting (Raptag app, 40+ tags/second via radio frequency).

Decision

We will scope RFID to counting and auditing only. RFID does not participate in sales, receiving, or transfer workflows. The core inventory system tracks stock movements via barcode. RFID provides a parallel counting channel for physical inventory verification.

Considered Options

  1. Full RFID lifecycle — Track every tag through sale, transfer, receiving, and returns
  2. Counting-only — RFID for inventory counting and auditing, barcode for all other workflows
  3. Hybrid (phased) — Start with counting, extend to receiving in v2.0

Decision Outcome

Chosen: Counting-only because full lifecycle RFID tracking would require replacing the barcode-based POS checkout flow with RFID readers at every register, fundamentally changing the hardware requirements and staff workflows. Counting-only provides the highest ROI (bulk counts in minutes vs. hours) with minimal disruption to existing barcode-based workflows. Tag status is limited to active, void, lost — no sales or transfer lifecycle fields.

Trade-offs

Pros:

  • Highest ROI — bulk inventory counts (2,000-100,000 items) completed in minutes vs. hours
  • No disruption to existing barcode-based POS, receiving, and transfer workflows
  • Simpler RFID schema — 12 tables vs. potentially 20+ for full lifecycle
  • Raptag mobile app focused on single purpose (counting) with clear UX
  • Scope can be expanded to receiving in v2.0 if business case emerges

Cons:

  • Cannot automatically decrement RFID tag counts on sale (counting snapshot may drift)
  • Receiving workflow still requires barcode scanning (no RFID speed benefit)
  • Two parallel inventory tracking systems (barcode quantity vs. RFID tag count) — reconciliation needed
  • Cannot provide real-time tag location or anti-theft alerts

References

  • Chapter 05: Architecture Components, Section 5.16 (RFID Configuration)
  • Ch 05: Architecture Components, Section 5.16 (RFID Counting) (Raptag Mobile chapter — planned future rewrite)
  • ADR-013: RFID Configuration in Tenant Admin

2.28 ADR-028: Physical Count Freeze Period

Status: Accepted
Date: 2026-02-27
Decision Makers: Architecture Review Team
Context: During physical inventory counts, sales and transfers can change stock levels, causing reconciliation errors.

Context

BRD v20.0 Section 4.6.4 defines two counting modes: FREEZE mode (POS sales blocked at counting location, transfers queued) and SNAPSHOT mode (operations continue normally, system reconciles movements post-count). FREEZE mode provides highest accuracy for annual audits. SNAPSHOT mode enables counting during business hours without blocking sales. The mode is chosen per count by the manager and cannot be changed after the count starts.

Decision

We will support configurable count freeze with two modes (FREEZE and SNAPSHOT), selected per count session. FREEZE blocks POS sales at the counting location and queues inbound transfers. SNAPSHOT takes a point-in-time inventory snapshot and reconciles against post-count movements.

Considered Options

  1. Always freeze — Block sales during every count (accurate but high business impact)
  2. Never freeze (snapshot only) — Always count during business hours (lower accuracy)
  3. Configurable per count — Manager chooses FREEZE or SNAPSHOT per count session

Decision Outcome

Chosen: Configurable per count because different counting scenarios have different accuracy requirements. Annual full physical counts benefit from FREEZE mode (after hours, maximum accuracy). Weekly cycle counts and monthly scans use SNAPSHOT mode (during hours, minimal disruption). The system defaults to SNAPSHOT; FREEZE must be explicitly selected by MANAGER/OWNER role.

Trade-offs

Pros:

  • Maximum flexibility — manager picks the right mode for each situation
  • FREEZE mode: perfect accuracy, no reconciliation needed
  • SNAPSHOT mode: zero business disruption, counts during peak hours
  • SNAPSHOT reconciliation formula: adjusted_expected = snapshot_qty - sales_during_count + receives_during_count
  • Only MANAGER/OWNER can initiate counts (access-controlled)

Cons:

  • FREEZE mode blocks revenue during the count window (mitigated by off-hours scheduling)
  • SNAPSHOT reconciliation is more complex and has slightly lower accuracy
  • Staff must understand the difference between modes
  • FREEZE mode queues transfers that must be processed after count approval

References

  • Chapter 05: Architecture Components, Section 4.6.4 (Configurable Count Freeze)
  • Chapter 05: Architecture Components, Section 4.6 (Inventory Counting & Auditing)
  • ADR-025: 6-Status Inventory State Machine

2.29 ADR-029: Adjustment Manager Approval (Universal)

Status: Accepted
Date: 2026-02-27
Decision Makers: Architecture Review Team
Context: Manual inventory adjustments directly affect stock levels and financial records. A control mechanism is needed.

Context

BRD v20.0 Section 4.7 mandates that all inventory adjustments require manager approval — positive (found stock), negative (shrinkage), and zero-net (reclassification). There is no auto-approval threshold. Adjustments are created with approval_status = PENDING and inventory is NOT changed until a MANAGER or OWNER explicitly approves. Rejected adjustments are preserved for audit. The cost impact (qty_change x weighted_avg_cost) is calculated and shown to the manager before approval.

Decision

We will require universal manager approval for all manual inventory adjustments, regardless of quantity or direction. No threshold-based auto-approval. Inventory quantities change only upon explicit manager approval.

Considered Options

  1. No approval — Staff adjustments apply immediately (fast but no oversight)
  2. Threshold-based — Small adjustments auto-approve, large adjustments require manager
  3. Universal approval — All adjustments require manager review before inventory changes

Decision Outcome

Chosen: Universal approval because inventory accuracy is critical for a multi-store retail operation with financial audit requirements. Even small adjustments can indicate systematic issues (repeated theft, receiving errors). The cost impact display enables managers to make informed decisions. Approved adjustments are logged as ADJUSTMENT_UP or ADJUSTMENT_DOWN movements in the audit trail.

Trade-offs

Pros:

  • Complete management oversight of all inventory changes
  • Cost impact shown before approval — managers see financial consequence
  • PENDING status prevents premature inventory changes
  • Rejected adjustments preserved for audit — pattern analysis possible
  • Standard reason codes + custom tenant-defined codes for categorization

Cons:

  • Manager bottleneck — adjustments may wait for approval (mitigated by push notifications)
  • Additional workflow steps compared to instant adjustments
  • Managers must be responsive to avoid approval backlog
  • No fast-track for trivially small adjustments (by design)

References

  • Chapter 05: Architecture Components, Section 4.7 (Inventory Adjustments)
  • ADR-025: 6-Status Inventory State Machine

2.30 ADR-030: Auto-Suggest Transfers Algorithm

Status: Accepted
Date: 2026-02-27
Decision Makers: Architecture Review Team
Context: Multi-store retail operations frequently have inventory imbalances — one store overstocked while another is understocked on the same product.

Context

BRD v20.0 Section 4.8.7 defines an auto-suggest transfer algorithm that continuously monitors inventory distribution relative to sales velocity across all locations. The algorithm calculates days of supply per product per location (qty_on_hand / avg_daily_sales_velocity), detects imbalances (one location >60 days of supply, another <15 days), and generates transfer suggestions targeting 30 days of supply at each location. The algorithm runs weekly (configurable) and never creates transfers automatically — all suggestions require manager review.

Decision

We will implement a velocity-based auto-suggest transfer algorithm that analyzes days of supply across locations, generates rebalancing suggestions, and presents them to managers for review and approval. Suggestions that are approved create transfer requests via the standard transfer workflow.
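
A minimal sketch of the days-of-supply check and the suggested quantity, using the default thresholds above; function and variable names are illustrative.

    // Default thresholds from the algorithm description; all configurable per tenant.
    const OVERSTOCKED_DAYS = 60;
    const UNDERSTOCKED_DAYS = 15;
    const TARGET_DAYS = 30;

    function daysOfSupply(qtyOnHand: number, avgDailySalesVelocity: number): number {
      return avgDailySalesVelocity > 0 ? qtyOnHand / avgDailySalesVelocity : Infinity;
    }

    // Suggest moving enough stock to bring the understocked location up to the 30-day target
    // without pulling the overstocked source below that same target.
    function suggestedTransferQty(
      sourceQty: number, sourceVelocity: number,
      destQty: number, destVelocity: number
    ): number {
      const sourceDays = daysOfSupply(sourceQty, sourceVelocity);
      const destDays = daysOfSupply(destQty, destVelocity);
      if (sourceDays <= OVERSTOCKED_DAYS || destDays >= UNDERSTOCKED_DAYS) return 0;  // no imbalance
      const destNeed = Math.ceil(TARGET_DAYS * destVelocity - destQty);
      const sourceSurplus = Math.floor(sourceQty - TARGET_DAYS * sourceVelocity);
      return Math.max(Math.min(destNeed, sourceSurplus), 0);
    }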

Considered Options

  1. Manual only — Managers identify imbalances and create transfers manually
  2. Rule-based alerts — Alert when stock is below threshold, but no transfer suggestion
  3. Auto-suggest with manager review — Algorithm suggests specific transfers, manager approves
  4. Fully automated — Algorithm creates and ships transfers without human review

Decision Outcome

Chosen: Auto-suggest with manager review because it combines algorithmic efficiency (analyzing hundreds of product-location combinations weekly) with human judgment (manager knowledge of upcoming promotions, seasonal shifts, display requirements). The algorithm provides data-driven starting points; managers adjust quantities and approve or reject.

Trade-offs

Pros:

  • Data-driven rebalancing across all locations — impossible to replicate manually at scale
  • Manager review preserves business judgment (upcoming promotions, seasonal knowledge)
  • Configurable thresholds: overstocked (>60 days), understocked (<15 days), target (30 days)
  • Trailing 30-day sales velocity adapts to changing demand patterns
  • Suggestions expire after 7 days if unreviewed — no stale recommendations

Cons:

  • Algorithm may suggest transfers that conflict with upcoming promotions (mitigated by manager review)
  • Dead stock (zero velocity at both locations) excluded — requires separate manual review
  • HQ warehouse uses different thresholds than stores (90-day overstocked threshold)
  • Weekly batch analysis may miss rapid demand changes (mitigated by on-demand trigger option)

References

  • Chapter 05: Architecture Components, Section 4.8.7 (Auto-Suggest Transfers)
  • Chapter 05: Architecture Components, Section 4.5 (Reorder Management)

2.31 ADR-031: Shopify Webhook + Polling Dual Sync

Status: Accepted
Date: 2026-02-27
Decision Makers: Architecture Review Team
Context: Shopify inventory and product sync must be near-real-time with guaranteed eventual consistency.

Context

BRD v20.0 Section 6.3 defines Shopify as the primary e-commerce integration. Shopify webhooks provide near-real-time notifications (products/update, inventory_levels/update, orders/create) but can be missed due to network issues, Shopify outages, or endpoint failures. Shopify retries failed webhook deliveries for 48 hours, but delivery is not guaranteed. The platform must guarantee eventual consistency between POS and Shopify inventory.

This ADR formalizes the dual-sync strategy previously captured in ADR-010. The detailed implementation in Section 6.3 specifies OAuth 2.0/PKCE authentication, GraphQL Admin API at 50 points/second rate limiting, and mandatory @idempotent mutations (required April 2026).

Decision

We will use a Webhook + Polling hybrid for Shopify synchronization. Webhooks provide near-real-time sync (<5 seconds processing) for the common case. Scheduled polling (every 15 minutes) using updated_at_min delta queries catches any missed webhooks. Both paths use the same idempotent event handler pipeline (24-hour dedup window).
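
For illustration, a sketch of the shared idempotent handler with a Redis-backed 24-hour dedup window; the key shape and the Redis wrapper type are assumptions.

    type RedisLike = {
      set: (key: string, value: string, mode: 'EX', ttl: number, flag: 'NX') => Promise<string | null>;
    };

    const DEDUP_TTL_SECONDS = 24 * 60 * 60;

    // Shared by the webhook handler and the polling job: apply() runs only for the first
    // delivery of a given event key within the 24-hour dedup window.
    async function handleShopifyEvent(redis: RedisLike, eventKey: string, apply: () => Promise<void>): Promise<void> {
      const firstSeen = await redis.set(`shopify:dedup:${eventKey}`, '1', 'EX', DEDUP_TTL_SECONDS, 'NX');
      if (firstSeen === null) return;   // duplicate from the other path (webhook vs. poll): drop it
      await apply();
    }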

Considered Options

  1. Pure Webhook — Rely solely on Shopify webhooks for all sync
  2. Pure Polling — Poll Shopify API on intervals for all changes
  3. Webhook + Polling hybrid — Real-time webhooks with polling fallback

Decision Outcome

Chosen: Webhook + Polling hybrid because webhooks alone cannot guarantee delivery, and polling alone introduces unacceptable latency for inventory updates that could cause overselling. The hybrid approach provides <5-second normal latency with guaranteed eventual consistency via 15-minute polling catchup. Idempotent processing prevents double-counting when both webhook and poll detect the same change.

Trade-offs

Pros:

  • Near-real-time sync via webhooks (<5 seconds for the common case)
  • Guaranteed eventual consistency via polling fallback
  • Idempotent processing handles duplicates from webhook + poll overlap
  • Rate-limit-aware polling with adaptive backoff protects against API throttling

Cons:

  • More complex than either pure approach
  • Polling adds API calls against Shopify rate limits (mitigated by delta queries)
  • Webhook endpoint requires HMAC signature verification and retry handling
  • Must maintain webhook registration lifecycle (register on connect, deregister on disconnect)

References

  • Chapter 05: Architecture Components, Section 6.3 (Shopify Integration)
  • ADR-010: Shopify Sync Strategy (foundational decision)
  • Ch 05: Architecture Components, Module 6 (Integrations) (Integration Patterns chapter — planned future rewrite)

2.32 ADR-032: Strictest-Rule-Wins Cross-Platform Validation

Status: Accepted
Date: 2026-02-27
Decision Makers: Architecture Review Team
Context: Products listed on Shopify, Amazon, and Google Merchant Center must meet each platform’s distinct validation requirements.

Context

BRD v20.0 Section 6.6 defines a unified product data validation matrix comparing field-level requirements across Shopify, Amazon, and Google Merchant Center. Each platform has different constraints — Google limits titles to 150 chars, Amazon requires 1000x1000px minimum images, Shopify is most permissive. The common pattern of “create now, fix later” leads to suppressed listings, disapproved products, and lost revenue.

The strictest-rule-wins principle means: title max 150 chars (Google strictest), image min 1000x1000px (Amazon strictest), no watermarks (Amazon + Google), barcode required (treat as mandatory for channel eligibility), brand required (Amazon + Google).

Decision

We will enforce strictest-rule-wins validation at the point of product data entry in the POS system. Any product passing POS validation is immediately eligible for listing on all connected platforms without remediation. Products failing validation can still be used for in-store POS sales but are blocked from external channel sync.
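
A minimal sketch of the strictest-rule-wins check at data entry; it omits the WARN level for brevity, and the field names and rule set shown are illustrative.

    interface ProductDraft {
      title: string;
      brand?: string;
      barcode?: string;
      imageWidthPx: number;
      imageHeightPx: number;
    }

    // Each check applies the strictest platform rule so a passing product is listable everywhere.
    function validateForAllChannels(p: ProductDraft): { status: 'PASS' | 'FAIL'; errors: string[] } {
      const errors: string[] = [];
      if (p.title.length > 150) errors.push('Title exceeds 150 characters (Google limit)');
      if (p.imageWidthPx < 1000 || p.imageHeightPx < 1000) errors.push('Image below 1000x1000 px (Amazon minimum)');
      if (!p.barcode) errors.push('Barcode missing (required for channel eligibility)');
      if (!p.brand) errors.push('Brand missing (required by Amazon and Google)');
      return { status: errors.length === 0 ? 'PASS' : 'FAIL', errors };
    }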

Considered Options

  1. Per-platform validation at sync time — Validate only when pushing to each platform
  2. Strictest-rule-wins at data entry — Enforce most restrictive requirements for all platforms upfront
  3. Tiered validation — POS-only products have relaxed rules; channel-listed products have strict rules

Decision Outcome

Chosen: Strictest-rule-wins at data entry because it eliminates the expensive “fix after suppression” cycle. Products created correctly the first time avoid listing delays, disapprovals, and the operational cost of chasing validation errors across three platforms. The pre-sync validation engine (PASS/WARN/FAIL) provides clear feedback at product creation time.

Trade-offs

Pros:

  • Any product passing POS validation is immediately listable on all channels
  • Eliminates suppressed listings, disapprovals, and remediation cycles
  • Single validation standard — staff learns one set of rules, not three
  • Pre-sync engine provides actionable PASS/WARN/FAIL feedback with remediation guidance
  • Image validation catches the #1 cause of listing suppression (watermarks, resolution, background)

Cons:

  • POS-only products must meet stricter requirements than necessary (mitigated by channel-listing flag)
  • Requirements may change as platforms update their rules (mitigated by configurable validation matrix)
  • More fields required at product creation (brand, weight, barcode) — slightly longer data entry
  • Google’s 150-char title limit is more restrictive than many retailers want

References

  • Chapter 05: Architecture Components, Section 6.6 (Cross-Platform Product Data Requirements)
  • Chapter 05: Architecture Components, Section 6.6.2 (Image Requirements Matrix)

2.33 ADR-033: Amazon SP-API Integration Strategy

Status: Accepted
Date: 2026-02-27
Decision Makers: Architecture Review Team
Context: Multi-channel retail requires Amazon marketplace integration for product listings, order fulfillment, and inventory sync.

Context

BRD v20.0 Section 6.4 defines Amazon integration via the Selling Partner API (SP-API) with OAuth 2.0/LWA authentication, regional endpoints (NA/EU/FE), and support for both FBA (Fulfilled by Amazon) and FBM (Fulfilled by Merchant) fulfillment models. The integration covers catalog items API, listings API, orders API, feeds API, and push notifications (SQS). Amazon SP-API polls every 2 minutes for inventory updates.

Key constraints: rate limits vary by API (5 requests/second for catalog, 1 request/second for feeds), access tokens expire every 1 hour, and per-marketplace pricing is required.

Decision

We will integrate with Amazon SP-API supporting both FBA and FBM fulfillment models, with OAuth 2.0/LWA token lifecycle management, per-marketplace catalog sync, and SQS-based push notifications for order and inventory events.

Considered Options

  1. No Amazon integration — POS-only retail without Amazon marketplace
  2. Amazon MWS (legacy) — Older Marketplace Web Service API (deprecated)
  3. Amazon SP-API — Current Selling Partner API with OAuth 2.0 and modern endpoints
  4. Third-party aggregator — Use a service like ChannelAdvisor to manage Amazon listings

Decision Outcome

Chosen: Direct Amazon SP-API integration because it provides full control over the integration, avoids third-party aggregator fees, and aligns with the provider abstraction architecture (IIntegrationProvider interface). MWS is deprecated. The POS backend handles token lifecycle transparently — proactive refresh at T-5 minutes before expiry, fallback force-refresh on 401 responses.

Trade-offs

Pros:

  • Full control over catalog, listing, order, and inventory sync
  • FBA + FBM support — tenants choose fulfillment model per product
  • SQS push notifications reduce polling overhead for order/inventory events
  • OAuth 2.0/LWA aligns with modern authentication standards
  • Per-marketplace support (US, CA, MX under NA region)

Cons:

  • Complex API with different rate limits per endpoint
  • Token management complexity (1-hour expiry, proactive refresh)
  • Amazon-specific field mappings (Browse Node taxonomy, product type definitions)
  • Amazon Brand Registry requirements add complexity for branded products
  • FBA inventory is read-only (Amazon manages stock) — requires separate monitoring

References

  • Chapter 05: Architecture Components, Section 6.4 (Amazon SP-API Integration)
  • Ch 05: Architecture Components, Module 6 (Integrations) (Integration Patterns chapter — planned future rewrite)
  • ADR-032: Strictest-Rule-Wins Cross-Platform Validation

2.34 ADR-034: Google Merchant Center Feed Strategy

Status: Accepted
Date: 2026-02-27
Decision Makers: Architecture Review Team
Context: Google Shopping and Local Inventory Ads require product data feeds managed via the Google Merchant API.

Context

BRD v20.0 Section 6.5 defines Google Merchant Center integration for product data management, local inventory advertising, and Google Business Profile linkage. CRITICAL: The Content API for Shopping reaches end-of-life on August 18, 2026 — all new development MUST target the Merchant API (v1beta/v1). Google uses OAuth 2.0 with service accounts (self-signed JWTs exchanged for 60-minute access tokens).

The Merchant API separates writes (ProductInput resource) from reads (Product resource — Google-enriched version after validation). Disapproval prevention is critical as Google can suspend product listings for policy violations.

Decision

We will target the Merchant API (v1beta/v1) from day one, with OAuth 2.0 service account authentication, outbound product feed management, and local inventory advertising. No development against the deprecated Content API.

Considered Options

  1. Content API (v2.1) — Current API but reaching EOL August 2026
  2. Merchant API (v1beta/v1) — New API, long-term supported
  3. Supplemental feed only — Use Google’s automated crawl with supplemental data
  4. Third-party feed manager — Delegate to GoDataFeed, DataFeedWatch, etc.

Decision Outcome

Chosen: Direct Merchant API integration because Content API EOL is August 2026 (within the platform’s launch timeline), the Merchant API provides new features (local inventory, GBP integration) only available on the new API, and direct integration avoids third-party feed manager costs. Service account auth with tenant-specific encryption keys aligns with the credential vault architecture.

Trade-offs

Pros:

  • Future-proof — no migration needed when Content API shuts down
  • Local Inventory Ads support for brick-and-mortar stores
  • Google Business Profile linkage for “available nearby” search results
  • Product status API enables proactive disapproval monitoring and remediation
  • Service account auth avoids user-interactive OAuth flows

Cons:

  • Merchant API is still in v1beta — minor API changes possible before GA
  • Google processing adds 30-minute latency to inventory updates
  • Product disapproval rules are complex and change frequently
  • Service account JSON key management adds security complexity (AES-256-GCM encrypted at rest)
  • 2x daily batch sync cadence for Google (vs. near-real-time for Shopify)

References

  • Chapter 05: Architecture Components, Section 6.5 (Google Merchant API Integration)
  • ADR-032: Strictest-Rule-Wins Cross-Platform Validation

2.35 ADR-035: Channel Safety Buffer Calculation

Status: Accepted
Date: 2026-02-27
Decision Makers: Architecture Review Team
Context: External channels sync inventory with varying latency (Shopify <5s, Amazon <2min, Google <30min). During sync gaps, concurrent sales can cause overselling.

Context

BRD v20.0 Section 6.7.2 defines safety buffers that withhold a configurable number of units from external channel listings. The primary formula is: Channel Available Qty = POS Available Qty - Safety Buffer. Three buffer modes are supported: FIXED (subtract fixed units), PERCENTAGE (subtract % of stock), and MIN_RESERVE (floor-based). Buffers are configurable per-product, per-channel, with a 4-level priority resolution: product+channel > product > channel > tenant-wide default.

Recommended defaults: Shopify 0-2 units (low latency), Amazon FBM 5-10% (2-minute lag), Google Merchant 10-15% (30-minute processing delay).

Decision

We will implement configurable safety buffers per product per channel with 3 calculation modes and 4-level priority resolution. Higher-latency channels receive larger default buffers to compensate for sync lag.
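
For illustration, a sketch of buffer-rule resolution and the channel quantity calculation under the three modes; names are assumptions, and the MIN_RESERVE semantics shown (hold back a floor of value units) follow the floor-based description above.

    type BufferMode = 'FIXED' | 'PERCENTAGE' | 'MIN_RESERVE';
    interface BufferRule { mode: BufferMode; value: number; }

    // 4-level priority resolution: product+channel > product > channel > tenant-wide default.
    function resolveBufferRule(rules: {
      productChannel?: BufferRule;
      product?: BufferRule;
      channel?: BufferRule;
      tenantDefault: BufferRule;
    }): BufferRule {
      return rules.productChannel ?? rules.product ?? rules.channel ?? rules.tenantDefault;
    }

    // Channel Available Qty = POS Available Qty - Safety Buffer; below min_channel_qty the
    // product is hidden from the channel entirely.
    function channelAvailableQty(posAvailableQty: number, rule: BufferRule, minChannelQty = 1): number {
      const buffered =
        rule.mode === 'PERCENTAGE'
          ? Math.floor(posAvailableQty * (1 - rule.value / 100))
          : posAvailableQty - rule.value;   // FIXED and MIN_RESERVE both hold back `value` units here
      return buffered < minChannelQty ? 0 : buffered;
    }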

Considered Options

  1. No buffers — List full POS quantity on all channels (highest oversell risk)
  2. Flat global buffer — Same buffer for all channels and products
  3. Per-channel default buffers — Different buffer per channel, same for all products
  4. Per-product per-channel configurable — Full flexibility with priority resolution

Decision Outcome

Chosen: Per-product per-channel configurable because sync latency varies dramatically between channels (Shopify <5s vs. Google <30min), and high-velocity products need different buffers than slow movers. The 4-level priority resolution enables tenants to set sensible defaults while overriding for specific products or channels. min_channel_qty threshold hides products from channel when available falls below minimum (default: 1).

Trade-offs

Pros:

  • Tunable oversell protection per channel based on sync latency
  • High-velocity products can have larger buffers than slow movers
  • max_channel_qty cap prevents revealing full warehouse stock to competitors
  • Priority resolution (product+channel > product > channel > tenant) minimizes configuration effort
  • Walk-in customer stock protected — buffers ensure in-store availability

Cons:

  • Configuration complexity — many possible combinations of product x channel x mode
  • Buffers reduce listed quantity — may lose online sales if set too aggressively
  • Buffer calculations add overhead to every inventory sync event
  • Must recalculate buffers when POS quantity changes (event-driven)

References

  • Chapter 05: Architecture Components, Section 6.7.2 (Safety Buffer Configuration)
  • Chapter 05: Architecture Components, Section 6.7.3 (Oversell Prevention Rules)
  • ADR-031: Shopify Webhook + Polling Dual Sync

2.36 ADR-036: POS-Master Default for External Channels

Status: Accepted
Date: 2026-02-27
Decision Makers: Architecture Review Team
Context: When product data conflicts exist between POS and external channels, a source-of-truth must be defined.

Context

BRD v20.0 Section 6.1 establishes that the POS system is the “single source of truth for product data.” All external channels (Shopify, Amazon, Google Merchant) receive product catalog and inventory levels from the POS system. No external channel can directly modify POS inventory — all inbound changes are processed through the sync engine with conflict resolution. Section 6.7 states: “Auto-correction pushes POS quantity to the platform (POS always wins in reconciliation).”

Decision

We will use POS-master default where the POS system is the authoritative source for product data and inventory levels. External channels receive computed quantities. During reconciliation, discrepancies between POS and channel-reported quantities are resolved by pushing the POS value to the channel.

Considered Options

  1. POS-master — POS is source of truth, channels receive data from POS
  2. Channel-master — Each channel is source of truth for its own data
  3. Bidirectional merge — Changes from any source merged via conflict resolution
  4. Last-write-wins — Most recent change from any source wins

Decision Outcome

Chosen: POS-master because the physical store is where inventory physically exists. The POS system tracks every stock movement (sale, return, adjustment, transfer, count, receiving) with a complete audit trail. External channels may report stale or incorrect quantities due to sync delays, customer cancellations, or platform glitches. POS-master ensures one source of truth for financial reporting and inventory accuracy.

Trade-offs

Pros:

  • Single source of truth — no ambiguity about correct inventory levels
  • Reconciliation is deterministic — POS always wins, no merge conflicts
  • Financial reports based on POS data (auditable, event-sourced)
  • Protects against external platform data corruption or unauthorized changes
  • Simplifies sync architecture — one-way authority, bidirectional data flow

Cons:

  • Shopify admin inventory adjustments are overwritten at next reconciliation
  • Staff must make all inventory changes in the POS system, not in external platforms
  • If POS data is incorrect, the error propagates to all channels
  • External-only inventory (e.g., FBA stock managed by Amazon) must be handled as read-only exception

References

  • Chapter 05: Architecture Components, Section 6.1 (Integration Overview)
  • Chapter 05: Architecture Components, Section 6.7 (Cross-Platform Inventory Sync)
  • ADR-035: Channel Safety Buffer Calculation

2.37 ADR-037: Offline Conflict Resolution via CRDTs

SUPERSEDED: This ADR has been superseded by ADR-048 (Online-First with Offline Fallback). CRDTs were eliminated in v6.2.0. This record is preserved for historical context.

Status: Superseded (by ADR-048)
Date: 2026-02-27
Decision Makers: Architecture Review Team
Context: When multiple POS terminals operate offline simultaneously, their local changes must merge without data loss when connectivity is restored.

Context

Chapter 04, Section L.10A.1H defines CRDTs (Conflict-free Replicated Data Types) as the merge strategy for offline POS terminals. The traditional sync problem: Terminal A sells 5 units offline (local: 95), Terminal B receives shipment +20 offline (local: 120) — neither 95 nor 120 is correct; the answer is 115. CRDTs solve this by tracking operations, not state.

Four CRDT types are used: PN-Counter for inventory levels (+/-), LWW-Register for price updates (highest timestamp wins), OR-Set for cart items and discounts (union with tombstones), and G-Counter for transaction counts (sum all increments). Sales themselves are conflict-free by nature (append-only events with unique IDs).

Decision

We will use CRDTs for offline data merge alongside append-only event queuing for sales. PN-Counters track inventory, LWW-Registers track last-modified data, OR-Sets track collections, and G-Counters track monotonic counts. This is complementary to ADR-015 (Queue-and-Sync).
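
A minimal PN-Counter sketch showing why concurrent offline changes converge (per-terminal increment and decrement maps merged by taking the per-terminal maximum); all names are illustrative.

    // Per-terminal increment/decrement maps; merging takes the max per terminal, so replaying
    // or reordering sync messages cannot double-count.
    interface PNCounter {
      increments: Record<string, number>;   // terminalId -> total units added (receives, returns)
      decrements: Record<string, number>;   // terminalId -> total units removed (sales)
    }

    const total = (m: Record<string, number>) => Object.values(m).reduce((a, b) => a + b, 0);

    function value(counter: PNCounter): number {
      return total(counter.increments) - total(counter.decrements);
    }

    function merge(a: PNCounter, b: PNCounter): PNCounter {
      const mergeMap = (x: Record<string, number>, y: Record<string, number>) => {
        const out: Record<string, number> = { ...x };
        for (const [k, v] of Object.entries(y)) out[k] = Math.max(out[k] ?? 0, v);
        return out;
      };
      return {
        increments: mergeMap(a.increments, b.increments),
        decrements: mergeMap(a.decrements, b.decrements),
      };
    }
    // Context example: starting at 100, Terminal A records 5 decrements (sales) and Terminal B
    // records 20 increments (shipment); after merge the converged delta is +15, i.e. 115 units.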

Considered Options

  1. Last-write-wins globally — Most recent change overwrites all others
  2. Server-authoritative — Server state overwrites all offline changes
  3. Operational transforms (OT) — Transform operations based on concurrent edits
  4. CRDTs — Mathematically guaranteed convergence without coordination

Decision Outcome

Chosen: CRDTs because they are mathematically guaranteed to converge regardless of message ordering, duplication, or network partition duration. PN-Counters are particularly suited for inventory (sum increments, sum decrements, compute net). Sales events are inherently conflict-free (append-only with unique IDs), so CRDTs complement rather than replace event sourcing.

Trade-offs

Pros:

  • Mathematically guaranteed convergence — no coordination required between terminals
  • PN-Counter correctly handles concurrent sales and receives (example: 100 - 5 + 20 = 115)
  • LWW-Register handles price updates with deterministic resolution (highest timestamp)
  • OR-Set handles cart item additions/removals with tombstone-based conflict resolution
  • No data loss — all offline operations are preserved and merged

Cons:

  • CRDT implementation adds complexity to the sync layer
  • PN-Counters can temporarily show incorrect inventory until all terminals sync
  • OR-Set tombstones require periodic compaction (7-day TTL)
  • Development team must understand CRDT semantics for correct implementation
  • MV-Register (for customer preferences) keeps all concurrent values — may need manual resolution

References

  • Chapter 04: Architecture Styles, Section L.10A.1H (CRDTs)
  • ADR-015: Offline Sync Strategy (Queue-and-Sync with CRDTs)
  • ADR-002: Offline-First POS Architecture

2.38 ADR-038: Transactional Outbox for Event Publishing

Status: Accepted
Date: 2026-02-27
Decision Makers: Architecture Review Team
Context: Domain events must be reliably published to downstream consumers (Socket.io, webhooks, sync engine) without losing events or creating inconsistency.

Context

Chapter 04, Section L.4A defines the Transactional Outbox pattern: business data and outbox event are written atomically in the same database transaction. A relay process polls the outbox and publishes events, guaranteeing at-least-once delivery without distributed transactions. The event_outbox table (Section L.4A.1) stores: event_id, destination (socketio/webhook/sync), status (pending/processed), attempts, last_error, and timestamps.

This eliminates the dual-write problem: if the application writes to the database but the event publish fails (or vice versa), data and events become inconsistent. The outbox ensures both succeed or both fail within the same DB transaction.

Decision

We will use a Transactional Outbox pattern with a PostgreSQL event_outbox table. Domain events are written to the outbox in the same transaction as the business data. A background relay polls the outbox and publishes events to destinations (Socket.io rooms, webhook endpoints, sync engine).
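
A hedged sketch of both halves follows, assuming hypothetical Prisma models sale and eventOutbox whose columns mirror the event_outbox description above; model and field names are illustrative, not the platform's actual schema.

// Write path: business row and outbox row commit or roll back together.
import { PrismaClient } from '@prisma/client';
import { randomUUID } from 'node:crypto';

const prisma = new PrismaClient();

async function completeSale(tenantId: string, total: number) {
  return prisma.$transaction(async (tx) => {
    const sale = await tx.sale.create({ data: { tenantId, total } });
    await tx.eventOutbox.create({
      data: {
        eventId: randomUUID(),
        destination: 'socketio',            // socketio | webhook | sync
        status: 'pending',
        attempts: 0,
        payload: { type: 'sale.completed', saleId: sale.id, tenantId },
      },
    });
    return sale;
  });
}

// Relay loop (simplified): poll pending events, publish, mark processed or record the error.
async function relayOnce(publish: (payload: unknown) => Promise<void>) {
  const pending = await prisma.eventOutbox.findMany({
    where: { status: 'pending' },
    orderBy: { createdAt: 'asc' },
    take: 100,
  });
  for (const event of pending) {
    try {
      await publish(event.payload);
      await prisma.eventOutbox.update({
        where: { eventId: event.eventId },
        data: { status: 'processed' },
      });
    } catch (err) {
      await prisma.eventOutbox.update({
        where: { eventId: event.eventId },
        data: { attempts: { increment: 1 }, lastError: String(err) },
      });
    }
  }
}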

Considered Options

  1. Publish-then-write — Publish event first, then write to database (lost data if DB write fails)
  2. Write-then-publish — Write to database first, then publish event (lost events if publish fails)
  3. Distributed transaction (2PC) — Coordinate DB and message broker atomically (complex, slow)
  4. Transactional Outbox — Write data + event in same DB transaction, relay publishes asynchronously

Decision Outcome

Chosen: Transactional Outbox because it guarantees at-least-once delivery using only the existing PostgreSQL database — no additional message broker infrastructure required for v1.0. The outbox relay runs as a background service, polling every 1 second for pending events. Failed publications are retried with exponential backoff and eventually routed to a dead-letter table.

Trade-offs

Pros:

  • Atomic write — business data and event are guaranteed consistent
  • No additional infrastructure — uses PostgreSQL (already deployed)
  • At-least-once delivery with retry and dead-letter handling
  • Destinations are pluggable (Socket.io, webhook, sync, future Kafka)
  • Works with PostgreSQL LISTEN/NOTIFY for low-latency relay notification

Cons:

  • Polling relay adds slight latency vs. direct publish (~1 second)
  • Outbox table grows and needs periodic cleanup (processed events archived)
  • At-least-once means consumers must be idempotent (handled by idempotency framework)
  • Single relay process is a potential bottleneck (mitigated by partition-based relay in v2.0)

References

  • Chapter 04: Architecture Styles, Section L.4A.1 (Event Store & Outbox Schema)
  • Chapter 05: Architecture Components, Section 6.2.3 (Transactional Outbox)
  • ADR-003: Event Sourcing for Sales Domain

ADR-039: CQRS Boundary (Sales Domain Only)

2.39 ADR-039: CQRS Boundary (Sales Domain Only)

Status: Accepted
Date: 2026-02-27
Decision Makers: Architecture Review Team
Context: CQRS adds complexity; it should be applied only where the read/write model divergence justifies the overhead.

Context

Chapter 04, Section L.4A defines per-module CQRS scope. Module 1 (Sales) uses full CQRS with separate read/write models and Event Sourcing. Module 4 (Inventory) uses materialized read models with ES for audit trail. Modules 2 (Customers), 3 (Catalog), 5 (Setup) use standard CRUD. Module 6 (Integrations) uses audit-trail-only ES. The Sales domain has the strongest case for CQRS: financial audit requirements, offline sync via event replay, temporal queries (“what was inventory at 3pm?”), and complex read models (dashboard aggregations).

Decision

We will apply full CQRS only to the Sales domain (Module 1). All other modules use standard CRUD with optional materialized views for performance. A command/query bus dispatches commands and queries in the Sales module.
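
A minimal command-bus sketch shows the dispatch mechanism; the interfaces and the CompleteSale command are illustrative, not the platform's actual API.

// Commands mutate by appending events; queries read from projections instead.
interface Command { readonly type: string }

interface CommandHandler<C extends Command, R> {
  handle(command: C): Promise<R>;
}

class CommandBus {
  private handlers = new Map<string, CommandHandler<Command, unknown>>();

  register<C extends Command, R>(type: C['type'], handler: CommandHandler<C, R>): void {
    this.handlers.set(type, handler as unknown as CommandHandler<Command, unknown>);
  }

  async dispatch<R>(command: Command): Promise<R> {
    const handler = this.handlers.get(command.type);
    if (!handler) throw new Error(`No handler registered for ${command.type}`);
    return handler.handle(command) as Promise<R>;
  }
}

// Example Sales command (hypothetical shape).
interface CompleteSale extends Command {
  type: 'sales.CompleteSale';
  saleId: string;
  lines: Array<{ sku: string; qty: number; unitPrice: number }>;
}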

Considered Options

  1. CQRS everywhere — Full CQRS for all modules
  2. CQRS for Sales only — Full CQRS for Sales, CRUD for everything else
  3. CQRS for Sales + Inventory — Full CQRS for both financial domains
  4. No CQRS — Standard CRUD everywhere with audit logging

Decision Outcome

Chosen: CQRS for Sales only because Sales has the strongest requirement (PCI-DSS audit trail, offline event replay, complex read models for dashboards). Inventory uses a lighter pattern — materialized read models for current levels with ES for the movement audit trail, but not full CQRS command/query separation. Applying CQRS everywhere would add unnecessary complexity to simple CRUD modules like Customers and Setup.

Trade-offs

Pros:

  • Full audit trail and temporal queries for the financial domain (Sales)
  • Command/query bus dispatch provides clean separation of concerns
  • Read models optimized for dashboard queries without affecting write performance
  • Non-Sales modules remain simple CRUD — lower development and maintenance cost
  • Event replay capability for offline sync and debugging

Cons:

  • Developers must understand two patterns (CQRS for Sales, CRUD for others)
  • Read model projections must be rebuilt if projection logic changes
  • Event versioning adds complexity for Sales domain events
  • Boundary between CQRS and CRUD modules must be clearly documented

References

  • Chapter 04: Architecture Styles, Section L.4A (CQRS & Event Sourcing Scope)
  • ADR-003: Event Sourcing for Sales Domain
  • ADR-038: Transactional Outbox for Event Publishing

ADR-040: Eventual Consistency SLA (5s Online, 30min Offline)

2.40 ADR-040: Eventual Consistency SLA

Status: Accepted
Date: 2026-02-27
Decision Makers: Architecture Review Team
Context: The platform accepts eventual consistency for inventory sync. Concrete SLA targets are needed for each sync channel.

Context

Chapter 04, Section L.10A.1 establishes that the online-first architecture accepts eventual consistency for inventory sync across channels. BRD v20.0 Section 6.7.1 defines per-channel sync latency targets: Shopify <5 seconds via webhooks, Amazon FBM <2 minutes via SP-API push, Google Merchant <30 minutes (Google processing time). Reconciliation polls run at defined intervals (Shopify 15min, Amazon 30min, Google 6hr). POS terminals operating in offline fallback mode sync critical data within 30 seconds of connectivity restoration.

Decision

We will define explicit eventual consistency SLAs per sync channel with target latencies, reconciliation intervals, and maximum acceptable lag. Online POS terminals have 5-second consistency targets; offline terminals sync critical data within 30 seconds of reconnection.
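
For illustration, the per-channel targets above could be captured as a monitoring configuration; the constant name and shape are hypothetical.

// Hypothetical SLA table used by sync monitoring/alerting (values from the Context above).
interface ChannelSla {
  targetLatencySec: number;           // expected propagation latency
  reconciliationIntervalMin: number;  // periodic drift-correction poll (0 = on reconnect only)
}

const SYNC_SLA: Record<string, ChannelSla> = {
  shopify:        { targetLatencySec: 5,       reconciliationIntervalMin: 15 },
  amazonFbm:      { targetLatencySec: 120,     reconciliationIntervalMin: 30 },
  googleMerchant: { targetLatencySec: 30 * 60, reconciliationIntervalMin: 6 * 60 },
  posOffline:     { targetLatencySec: 30,      reconciliationIntervalMin: 0 },
};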

Considered Options

  1. Strong consistency — All changes immediately visible everywhere (requires always-online)
  2. Best-effort eventual — No defined SLA, sync when possible
  3. Tiered SLA per channel — Explicit targets per sync channel and data priority

Decision Outcome

Chosen: Tiered SLA per channel because different channels have fundamentally different latency characteristics and business impact. Shopify needs near-real-time to prevent overselling; Google Merchant tolerates 30-minute processing; offline POS terminals prioritize sales/payment sync over analytics. The SLA framework provides measurable targets for monitoring and alerting.

Trade-offs

Pros:

  • Measurable sync targets for monitoring and SLA alerting
  • Priority-based sync ensures critical financial data (sales, payments) syncs first
  • Channel-specific targets match actual platform capabilities
  • Reconciliation intervals catch drift before it becomes operationally significant

Cons:

  • Inventory counts may be temporarily inaccurate across channels during sync windows
  • Overselling possible during sync gaps (mitigated by safety buffers — ADR-035)
  • Monitoring infrastructure needed to track sync latency per channel
  • Offline sync queue may grow large during extended outages (capped at 100 transactions)

References

  • Chapter 04: Architecture Styles, Section L.10A.1 (Online-First with Offline Fallback)
  • Chapter 05: Architecture Components, Section 6.7.1 (Sync Latency Targets)
  • ADR-048: Online-First POS Data Strategy
  • ADR-035: Channel Safety Buffer Calculation

ADR-041: 6-Gate Security Pyramid

2.41 ADR-041: 6-Gate Security Pyramid

Status: Accepted
Date: 2026-02-27
Decision Makers: Architecture Review Team, Security Team
Context: The codebase is generated by Claude Code agents. A single security gate is insufficient for AI-generated code that processes financial transactions.

Context

Chapter 04, Section L.8 identifies that AI-generated code requires defense-in-depth security validation. A single SonarQube gate cannot catch missing authorization checks, incorrect OAuth implementation, SAQ-A violations, architecture drift, or insecure CORS/CSP headers. The platform processes PCI-scoped financial transactions and stores encrypted credentials for 6 external provider families.

The 6-Gate Security Pyramid provides layered verification: SAST (Gate 1), SCA + SBOM (Gate 2), Secrets Detection (Gate 3), Architecture Conformance (Gate 4), Contract Tests (Gate 5), and Manual Security Review (Gate 6). All 6 gates block merge. FIM via Wazuh monitors deployed systems.

Decision

We will implement a 6-Gate Security Test Pyramid in the CI/CD pipeline where all 6 gates must pass before code can be merged to the main branch.

Considered Options

  1. Single SAST gate — SonarQube/CodeQL only
  2. SAST + SCA — Static analysis plus dependency scanning
  3. Cloud security suite — Snyk/Datadog full platform (vendor-dependent)
  4. 6-Gate Pyramid — Layered security with SAST, SCA, Secrets, ArchUnit, Pact, Manual

Decision Outcome

Chosen: 6-Gate Pyramid because each gate catches different vulnerability classes that others miss. SAST finds code-level bugs; SCA finds vulnerable dependencies; Secrets Detection finds leaked credentials; Architecture Conformance prevents module boundary violations; Contract Tests verify external API behavior; Manual Review covers security-critical paths that automated tools cannot fully validate. All gates are merge-blocking.

Trade-offs

Pros:

  • Defense-in-depth — 6 independent verification layers
  • SBOM generation (Gate 2) satisfies PCI-DSS 4.0 Req 6.3.2
  • Architecture Conformance (Gate 4) prevents Module 6 from accessing Module 1 internals
  • Contract Tests (Gate 5) verify Shopify/Amazon/Google sandbox API behavior
  • Manual Review (Gate 6) provides human oversight for payment and credential flows

Cons:

  • 6 gates add CI/CD pipeline time (mitigated by parallel execution of Gates 1-4)
  • Manual Review (Gate 6) creates human bottleneck for security-tagged PRs
  • Must maintain ArchUnit rules and Pact contracts as system evolves
  • Tooling cost: SonarQube, Snyk, GitLeaks, ArchUnit, Pact licenses

References

  • Chapter 04: Architecture Styles, Section L.8 (Security & Compliance Strategy)
  • Security Compliance chapter (planned future rewrite; currently covered by Ch 04: Architecture Styles, Section L.8)
  • ADR-019: SAQ-A Semi-Integrated Payment Scope

ADR-042: [REMOVED — Duplicate of ADR-017]

This ADR was removed in v6.1.0. The E2E testing strategy (Playwright + k6) is fully covered by ADR-017: Test Strategy (Layered Testing Pyramid). Consolidating to avoid duplicated guidance.


ADR-043: [REMOVED — Duplicate of ADR-012]

This ADR was removed in v6.1.0. The LGTM Observability Stack is fully covered by ADR-012: Logging & Monitoring (LGTM Stack). Consolidating to avoid duplicated guidance.


ADR-044: API Performance Targets

2.44 ADR-044: API Performance Targets

Status: Accepted
Date: 2026-02-27
Decision Makers: Architecture Review Team
Context: The POS API must meet specific latency targets to ensure responsive checkout and withstand peak retail traffic.

Context

Chapter 04, Section L.6 defines the Black Friday load testing scenario: 500 concurrent users, 1000 TPS target, p99 latency <500ms over 30 minutes. The API Gateway processes requests through 5 stages: rate limiting (100 req/min/client), JWT authentication, tenant resolution, request logging, and route dispatch. Redis caching provides sub-millisecond reads for product catalog, tax rates, and tenant configuration during checkout.

The POS checkout path is latency-critical: cashiers expect instant response to item scan, price lookup, and payment initiation. Non-checkout paths (reporting, configuration) have relaxed targets.

Decision

We will define explicit API performance targets with p99 latency budgets per endpoint category, validated by k6 load testing in CI/CD.

Performance Targets:

Endpoint Category               p99 Target   Measurement
Checkout (item scan, payment)   < 500ms      k6 load test
Product lookup (cached)         < 100ms      Redis cache hit
Inventory query                 < 200ms      Materialized read model
Reporting / Dashboard           < 2s         Acceptable for non-interactive
Webhook processing              < 5s         Shopify, Amazon inbound
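
A hedged k6 sketch (valid TypeScript with @types/k6) that enforces the checkout budget; the host, endpoint path, and payload are placeholders, not the platform's actual routes.

// Black Friday scenario: 500 VUs for 30 minutes, p99 checkout latency under 500ms.
import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  vus: 500,
  duration: '30m',
  thresholds: {
    http_req_duration: ['p(99)<500'],   // fails the run if the budget is exceeded
  },
};

export default function () {
  const res = http.post(
    'https://api.example.com/v1/checkout/scan',
    JSON.stringify({ sku: 'SKU-123', qty: 1 }),
    { headers: { 'Content-Type': 'application/json' } },
  );
  check(res, { 'status is 200': (r) => r.status === 200 });
  sleep(1);
}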

Considered Options

  1. No defined targets — Optimize as needed based on user complaints
  2. Single global target — One latency target for all endpoints
  3. Tiered targets by category — Different targets for checkout vs. reporting vs. webhook

Decision Outcome

Chosen: Tiered targets by category because checkout latency directly impacts cashier productivity and customer experience, while reporting and dashboard queries are inherently slower and non-blocking. The k6 load testing framework (ADR-017) validates these targets on every release candidate.

Trade-offs

Pros:

  • Clear, measurable targets for development teams
  • k6 load tests enforce targets in CI/CD — performance regressions caught before deployment
  • Redis caching ensures sub-100ms product lookups during checkout
  • Tiered approach avoids over-engineering low-priority endpoints

Cons:

  • Must maintain k6 test scripts as API evolves
  • Load testing requires dedicated environment (resource cost)
  • Targets may need revision as user base scales
  • p99 targets require careful measurement methodology (warm-up periods, steady state)

References

  • Chapter 04: Architecture Styles, Section L.6 (Load Testing)
  • Chapter 04: Architecture Styles, Section L.9A (System Architecture)
  • ADR-009: Redis for Session & Cache
  • ADR-017: Test Strategy (Layered Testing Pyramid)

ADR-045: Blue-Green Deployment Strategy

2.45 ADR-045: Blue-Green Deployment Strategy

Status: Accepted
Date: 2026-02-27
Decision Makers: Architecture Review Team, Infrastructure Team
Context: Deployments of the Central API must not disrupt active POS terminals, ongoing payment transactions, or integration sync operations.

Context

The Architecture Styles Review (Upload/Architecture-Styles-Review.md, Finding HIGH-5) identified that no deployment strategy was specified. A failed deployment could break inventory sync across all channels for all tenants. BRD Section 6.7.5 mandates channel freeze after 2 hours of sync failure. The modular monolith architecture means the entire Central API deploys as a single unit — a failed deployment affects all modules simultaneously.

Database migration rollback, integration freeze procedures, and health check-based automatic rollback are required for safe deployments.

Decision

We will use blue-green deployment with automatic rollback on health check failure. The load balancer switches traffic from the current (blue) environment to the new (green) environment only after health checks pass. If health checks fail, traffic automatically routes back to blue.

Considered Options

  1. Rolling update — Gradually replace instances (risk of mixed-version routing)
  2. Canary deployment — Route small percentage to new version, gradually increase
  3. Blue-green deployment — Full parallel environment with instant switchover
  4. Feature flags only — Deploy code but gate features behind flags

Decision Outcome

Chosen: Blue-green deployment because the modular monolith deploys as a single unit, making canary (partial routing) complex without microservices boundaries. Blue-green provides instant rollback by switching the load balancer back to the previous environment. Database migrations must be backward-compatible (expand-then-contract pattern) so both blue and green can run against the same database schema during transition.

Trade-offs

Pros:

  • Instant rollback — switch load balancer back to previous environment
  • Zero-downtime deployment — green environment validated before receiving traffic
  • Health check validation before cutover (API, database connectivity, Redis, integration endpoints)
  • Full environment parity — green runs the same infrastructure as blue
  • Simplifies post-deployment verification — green serves all traffic or none

Cons:

  • Requires 2x infrastructure during deployment window (cost)
  • Database migrations must be backward-compatible (expand-then-contract)
  • Long-running transactions during switchover may be interrupted
  • POS terminal WebSocket (Socket.io) connections must reconnect after switchover
  • Integration webhook endpoints must handle brief unavailability during DNS propagation

References

  • Architecture Styles Review, Finding HIGH-5 (deployment strategy gap)
  • Chapter 04: Architecture Styles, Section L.9A (System Architecture)
  • Deployment Guide chapter (planned future rewrite; deployment is currently covered by Ch 04: Architecture Styles, Section L.9A)

ADR-046: Nexus Dual Deployment Architecture (Tauri Desktop + Web App)

Superseded by ADR-052: The dual deployment architecture (Tauri desktop + separate web admin) has been replaced by a unified React web application. “Nexus POS” is now a single web app with role-based navigation. Hardware peripherals use web protocols (Star WebPRNT, USB HID, Stripe Terminal SDK) instead of Tauri Rust commands. See ADR-052.

2.46 ADR-046: Nexus Dual Deployment Architecture

Status: SUPERSEDED (by ADR-052: Unified Web Application)
Date: 2026-02-28
Decision Makers: Architecture Review Team
Context: The platform needs both a desktop POS application (store terminals with hardware access and offline capability) and a web-based admin interface (browser-based management). Previously these were separate applications with separate codebases (ADR-007: Blazor Server admin, ADR-008: .NET MAUI POS). This created duplicated UI development and inconsistent UX.

Context

The platform requires two deployment targets for its user interface: (1) a desktop POS application running on store terminals with hardware access (receipt printers, barcode scanners, cash drawers), offline-first SQLite storage, and sync capability; and (2) a web-based administration interface for tenant managers to configure products, employees, locations, integrations, and view reports from any browser.

Previously, these were planned as separate applications with separate codebases (ADR-007: Blazor Server for Admin Portal, ADR-008: .NET MAUI for POS Client). This approach would have duplicated UI components, state management logic, and design system implementation across two different frameworks.

With the tech stack pivot to TypeScript (ADR-006), both targets can share a single React/TypeScript codebase — deployed as a Tauri 2.0 desktop app for POS terminals and as a standard React web app for admin browser access.

Decision

We will use a single React/TypeScript codebase deployed in two modes:

  • Nexus POS (Tauri 2.0 desktop): For store terminals. Includes hardware access (printers, scanners, drawers via Tauri commands), local SQLite database (better-sqlite3), offline-first capability with sync queue.
  • Nexus Admin (React web app): For administrator browser access. Standard React SPA served via CDN or Central API static hosting. No hardware access needed, always-online, connects directly to Central API.

Product naming: Nexus POS (desktop), Nexus Admin (web), Nexus Raptag (mobile RFID).

Considered Options

  1. Separate codebases — Different frameworks for desktop (Tauri) and web (Next.js/React)
  2. Single codebase with dual deployment — Same React app, Tauri wraps for desktop, deployed as web for admin
  3. Desktop-only with remote access — All users including admins use Tauri app

Decision Outcome

Chosen: Single codebase with dual deployment because it reduces UI development by ~40%, ensures consistent UX between POS and admin, and shares all components, routing, and state management. Hardware-dependent features are abstracted behind isTauri() runtime checks. Admin-only and POS-only routes use role-based code splitting.

Trade-offs

Pros:

  • Single React component library — design once, deploy twice
  • Shared state management (React Query + Zustand) across both targets
  • Consistent UX — admin and POS share visual language
  • Conditional hardware features via Tauri API detection (window.__TAURI__)
  • One design system (TailwindCSS + shadcn/ui or Radix UI)
  • Shared authentication flow — JWT tokens work identically in both targets

Cons:

  • Must carefully abstract hardware-dependent code behind feature checks
  • Some admin-only views (reporting, user management) not needed on POS (managed via route-based code splitting and lazy loading)
  • Tauri-specific Rust commands need separate build pipeline alongside TypeScript
  • Web and desktop share the same online-first data strategy (React Query → Central API) but desktop adds a thin offline fallback (2-table SQLite: product cache + sales queue). The abstraction layer detects connectivity state and routes transparently (see ADR-048).

Implementation Risks

  1. Testing surface doubles — Every data hook needs testing in both Tauri and web mode. Mitigation: CI runs test suite with isTauri() mocked to both true and false.
  2. Feature drift — POS gets hardware features, Admin gets reporting dashboards, shared codebase becomes conditional-heavy. Mitigation: Route-based code splitting, lazy loading, shared components must never import platform-specific code.
  3. Offline cache staleness — SQLite product cache (2 tables) may have stale prices during brief outages. Mitigation: Flag-on-sync detects price discrepancies; cache shows last_refreshed warning after 1 hour offline (see ADR-048, L.10A.1E).
  4. Tauri Rust command maintenance — Custom Rust commands for hardware access need Rust-capable developers. Mitigation: Limit custom commands to thin wrappers; most hardware access via established Tauri plugins.

Supersedes

  • ADR-007: Admin Portal Framework (Blazor Server) — the separate Admin Portal has been eliminated; administration is now integrated into the Nexus web application
  • ADR-013: RFID Configuration in Tenant Admin Portal — the “Admin Portal” concept has been replaced by Nexus Admin; RFID configuration is accessed via Nexus Admin > Settings > RFID section

References

  • ADR-008: POS Client Framework (Tauri 2.0 + React/TypeScript)
  • ADR-006: Node.js + TypeScript for Central API
  • ADR-047: Raptag Mobile Framework (React Native)
  • ADR-048: Online-First POS Data Strategy

ADR-047: Raptag Mobile Framework — React Native

2.47 ADR-047: Raptag Mobile Framework

Status: Accepted
Date: 2026-02-28
Decision Makers: Architecture Review Team
Context: The Nexus Raptag RFID counting app runs on Android mobile devices with Zebra RFID readers (TC21/TC26 with RFD40 sleds). It requires Zebra RFID SDK integration, offline-first SQLite storage, barcode scanning, and sync with the Central API. With the tech stack pivot to TypeScript (ADR-006), the mobile framework should align with the unified language strategy.

Context

The Nexus Raptag mobile app is a dedicated RFID inventory counting application used by store staff on Android handheld devices (Zebra TC21/TC26 with RFD40 RFID sleds). It requires integration with the Zebra RFID SDK for bulk tag reading (40+ tags/second), offline-first SQLite storage for counting sessions, barcode scanning for item lookup, and background sync with the Central API for uploading count results.

With the platform standardized on TypeScript (Central API via Node.js, Nexus POS via React per ADR-052), the mobile framework should maintain the unified language strategy to enable code sharing and reduce developer context-switching.

Decision

We will use React Native with Expo for the Nexus Raptag mobile RFID app.

Considered Options

  1. .NET MAUI — Cross-platform .NET mobile framework (rejected: different language ecosystem from TypeScript stack)
  2. React Native + Expo — TypeScript-based mobile framework with native module support (chosen)
  3. Flutter — Dart-based cross-platform framework (rejected: Dart language breaks TypeScript unity)
  4. Kotlin native — Android-only native development (rejected: no code sharing with web/desktop)

Decision Outcome

Chosen: React Native with Expo because it maintains TypeScript as the unified language across the entire platform, enables sharing of business logic, domain types, and validation schemas with the Central API via npm packages, and provides Expo OTA updates for pushing RFID configuration changes to field devices without app store review cycles.

Trade-offs

Pros:

  • Unified TypeScript — same types, validators (Zod), and API client shared with Central API
  • Expo OTA updates — critical for deploying RFID configuration changes to field devices without app store review
  • Shared npm packages — domain models, API types, validation schemas reused across all platform clients
  • React component patterns familiar to Nexus POS developers — reduced learning curve
  • Hot reload during development — fast iteration on scanning UI
  • React Native New Architecture (Fabric + TurboModules) provides near-native performance for scanning UI

Cons:

  • Zebra RFID SDK bridge requires native Java/Kotlin module maintenance
  • React Native performance adequate for scanning UI but not compute-heavy tasks (acceptable — RFID counting is I/O-bound)
  • Deep Zebra hardware integration requires custom native modules beyond managed Expo (handled with an Expo Dev Client build; no full ejection required)
  • Larger APK size than pure native (~30MB vs ~10MB) — acceptable for enterprise devices

References

  • ADR-027: RFID Counting-Only Scope
  • ADR-052: Unified Web Application (Nexus POS)
  • Ch 05: Architecture Components, Section 5.16 (RFID Counting Subsystem)

ADR-048: Online-First POS Data Strategy

2.48 ADR-048: Online-First POS Data Strategy

Status: Accepted
Date: 2026-03-01
Decision Makers: Architecture Review Team
Context: The Blueprint originally specified offline-first architecture for POS terminals (ADR-002) — 6-table SQLite cache, CRDTs for conflict resolution, sync queue with priority tiers, platform-aware data hooks. Through structured analysis of the target retail environment, this was found to create daily complexity for a scenario (internet outages) that occurs minutes per year.

Context

The POS platform’s data architecture must address two concerns: (1) how POS terminals access product, inventory, and customer data during normal operation; and (2) how sales continue during internet outages.

ADR-002 chose an offline-first strategy where all POS operations run against a local SQLite database first, with background sync to the Central API. This required 6 SQLite tables, CRDTs for conflict resolution, platform-aware data hooks (useLocalFirst() vs useAPI()), sync priority tiers, and complex conflict resolution logic.

Analysis of the target market revealed:

  • Internet outages are rare and brief (minutes per year) for target tenants
  • Near real-time sync (1-5 minutes) is required for Shopify inventory accuracy
  • Immediate config propagation is expected (admin saves → POS reflects within seconds)
  • Integration flows (Shopify, Amazon, Google Merchant) are simpler when all data flows through the Central API in real-time
  • The offline-first architecture doubles the testing surface (every data hook tested in both Tauri and web mode) and introduces daily consistency gaps (stale caches, integration timing) for a scenario that barely occurs

Decision

We will use an online-first data strategy for POS terminals, with a thin offline safety net:

  • ONLINE (99.99% of time): Nexus POS reads/writes directly to the Central API via React Query. WebSocket push delivers real-time config updates and inventory changes. React Query’s in-memory cache provides instant lookups for recently-scanned products.
  • OFFLINE (rare, brief): Nexus POS falls back to a 2-table SQLite WASM store (sql.js/wa-sqlite + OPFS) — a read-only product cache for pricing lookups and an append-only sales queue for transactions. Sales never stop.

Nexus POS uses React Query → Central API for all data access. The SQLite WASM offline fallback is a thin safety net — not a separate data layer.

Product names: Nexus POS (unified web app, ADR-052), Nexus Raptag (mobile RFID — retains full offline-first per ADR-047, as RFID counting sessions are legitimately disconnected).

Considered Options

  1. Keep offline-first (ADR-002 status quo) — 6-table SQLite, CRDTs, platform-aware hooks, sync priority tiers
  2. Online-first with thin offline fallback — React Query → API, 2-table SQLite WASM (product cache + sales queue), no CRDTs
  3. Online-only (no offline capability) — reject sales during outages; simplest but unacceptable for retail

Decision Outcome

Chosen: Online-first with thin offline fallback because it optimizes for the 99.99% online case (immediate data consistency, simpler integrations, unified data access) while preserving sales continuity for the rare offline scenario. Eliminates CRDTs, reduces SQLite from 6 to 2 tables, removes platform-aware data hooks, and simplifies the sync layer from priority-tiered event queue to simple FIFO sales flush. SQLite runs in the browser via WASM (sql.js/wa-sqlite) with OPFS for persistence.

3-State Connection Monitor

The POS terminal uses a layered detection system to determine connectivity state:

State      Detection                                      Behavior
ONLINE     WebSocket connected + health ping OK           All reads/writes → Central API via React Query
DEGRADED   WebSocket dropped, health ping intermittent    Reads: try API (2s timeout) → fallback to SQLite cache. Writes: API + local backup queue
OFFLINE    3 consecutive health pings fail (~15 sec)      Reads: SQLite product cache. Writes: local sales queue only

Detection layers (fastest → most reliable):

  1. Socket.io events — instant connect/disconnect signals
  2. Health ping — HTTP GET /health every 5 seconds (catches stale WebSocket)
  3. navigator.onLine — Browser API (instant hint, verified by health ping)

The DEGRADED state prevents rapid flapping between ONLINE and OFFLINE during spotty internet. Components never observe the connection state directly — the data access layer routes transparently.
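
A hedged TypeScript sketch of the monitor, assuming a /health endpoint and standard socket.io-client events; the host, thresholds, and event wiring mirror the description above but are illustrative.

import { io } from 'socket.io-client';

type ConnectionState = 'ONLINE' | 'DEGRADED' | 'OFFLINE';

let state: ConnectionState = 'ONLINE';
let failedPings = 0;

const socket = io('https://api.example.com', { transports: ['websocket', 'polling'] });

socket.on('disconnect', () => setState('DEGRADED'));                 // instant hint
socket.on('connect', () => { failedPings = 0; setState('ONLINE'); });

// Authoritative check: HTTP health ping every 5 seconds with a 2-second timeout.
async function healthPing(): Promise<void> {
  try {
    const res = await fetch('https://api.example.com/health', { signal: AbortSignal.timeout(2000) });
    if (!res.ok) throw new Error(`status ${res.status}`);
    failedPings = 0;
    setState(socket.connected ? 'ONLINE' : 'DEGRADED');
  } catch {
    failedPings += 1;
    setState(failedPings >= 3 ? 'OFFLINE' : 'DEGRADED');             // 3 misses ≈ 15 seconds
  }
}

function setState(next: ConnectionState): void {
  if (next !== state) {
    state = next;
    // The data access layer subscribes to this transition and reroutes reads/writes;
    // UI components never read the connection state directly.
  }
}

setInterval(healthPing, 5_000);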

SQLite Schema (2 Tables)

product_cache — Read-only, server-authoritative (SQLite WASM via OPFS):

  • Pre-warmed on Nexus POS startup (full catalog download in background)
  • Updated incrementally via WebSocket push events (product.updated, product.created)
  • Never written to by Nexus POS — only by sync from Central API
  • Includes last_refreshed timestamp for staleness detection

sales_queue — Append-only, offline transactions (SQLite WASM via OPFS):

  • Written only during OFFLINE/DEGRADED states
  • Each sale has a UUID (sale_id) for idempotent processing
  • Flushed to Central API on recovery (FIFO order, oldest first)
  • API uses sale_id for upsert — safe to retry partial flushes
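
A minimal sketch of the two tables using sql.js (SQLite WASM); columns beyond those described above are assumptions, and the OPFS persistence wiring is omitted.

import initSqlJs from 'sql.js';

async function createOfflineStore() {
  const SQL = await initSqlJs();
  const db = new SQL.Database();

  // Read-only catalog snapshot, written only by sync from the Central API.
  db.run(`CREATE TABLE IF NOT EXISTS product_cache (
    sku            TEXT PRIMARY KEY,
    name           TEXT NOT NULL,
    unit_price     REAL NOT NULL,
    last_refreshed TEXT NOT NULL      -- staleness detection (>1h shows the outdated-data banner)
  )`);

  // Append-only queue of offline sales, flushed FIFO on reconnection.
  db.run(`CREATE TABLE IF NOT EXISTS sales_queue (
    sale_id    TEXT PRIMARY KEY,      -- UUID, the idempotency key for upsert on flush
    payload    TEXT NOT NULL,         -- serialized sale (lines, totals, payment)
    created_at TEXT NOT NULL DEFAULT (datetime('now'))
  )`);

  return db;
}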

Recovery Sequence

When connectivity restores (OFFLINE → DEGRADED → ONLINE):

  1. Flush sales queue — POST each queued sale to Central API (oldest first, in order)
  2. Idempotent processing — API uses sale_id UUID for upsert; partial retries are safe
  3. Wait for confirmations — each sale acknowledged before moving to next
  4. Refresh product cache — prices/inventory may have changed during outage
  5. Resume WebSocket — re-subscribe to real-time push events
  6. Switch to API mode — data layer routes all reads/writes through Central API

Cashier sees: status indicator transitions from red (offline) → yellow (flushing) → green (online). No manual action required.
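
A hedged sketch of the FIFO flush; the store interface, endpoint, and field names are illustrative rather than the platform's actual API.

interface QueuedSale { saleId: string; payload: unknown }

interface OfflineStore {
  getQueuedSales(): QueuedSale[];               // oldest first (FIFO)
  removeQueuedSale(saleId: string): void;
  refreshProductCache(): Promise<void>;
}

async function flushSalesQueue(store: OfflineStore, apiBase: string): Promise<boolean> {
  for (const sale of store.getQueuedSales()) {
    const res = await fetch(`${apiBase}/v1/sales`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      // sale_id is the idempotency key, so partial retries are safe (the API upserts).
      body: JSON.stringify({ sale_id: sale.saleId, sale: sale.payload }),
    });
    if (!res.ok) return false;                  // stop; resume later from the same position
    store.removeQueuedSale(sale.saleId);        // acknowledged, safe to drop locally
  }
  await store.refreshProductCache();            // prices/inventory may have changed during the outage
  return true;                                  // caller re-subscribes WebSocket and switches to API mode
}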

Price Discrepancy Handling (Flag-on-Sync)

When offline sales sync, the API compares sale.unit_price against product.current_price:

  • If prices match: sale accepted normally
  • If prices differ: sale accepted but flagged as price_discrepancy: true with sold_price, current_price, and difference recorded
  • Admin sees a “Price Discrepancies” alert in Nexus POS (MANAGER+ role) with options to issue a credit or dismiss
  • Additional safeguard: if the product cache is older than 1 hour during offline mode, the POS shows a subtle banner: “Product data may be outdated”

Shopify Inventory Protection (Safety Buffers)

The existing BRD safety buffer mechanism (Section 6.x) protects against overselling during outages:

  • Channel Available = POS Available - Safety Buffer
  • Buffer absorbs the discrepancy window during brief outages (configurable per product category, default 2-3 units)
  • On recovery: queue flush → inventory adjusted → integration layer immediately pushes corrected counts to Shopify/Amazon/Google Merchant
  • Overselling requires: outage + in-store sales exceeding buffer + simultaneous online sales on same SKU (extremely unlikely given minutes/year outages)

Trade-offs

Pros:

  • Eliminates CRDTs — no two-way merge needed (cache is read-only, queue is append-only)
  • Reduces SQLite from 6 tables to 2 — dramatically simpler local schema
  • Removes platform-aware data hooks — Nexus POS uses React Query → API uniformly
  • Simplifies sync from priority-tiered event queue to simple FIFO sales flush
  • Real-time config propagation — admin changes reflected on POS within seconds via WebSocket
  • Simpler integration flows — all data flows through Central API; Shopify/Amazon see real-time inventory
  • Reduced testing surface — single web deployment target, no platform-conditional code paths for data access

Cons:

  • API latency affects scan speed when online (mitigated by React Query in-memory cache + Redis server-side cache; <200ms for simple lookups)
  • Central API becomes a hard dependency when online (mitigated by 3-state fallback; SQLite WASM takes over within 15 seconds)
  • Product cache may be stale during offline periods (mitigated by flag-on-sync + staleness warning)
  • Does not support extended offline operation (hours/days) as gracefully as offline-first (accepted: target market has reliable internet)
  • SQLite WASM has slightly higher overhead than native better-sqlite3 (acceptable for the offline fallback use case; OPFS provides persistence)

Note on Raptag: This ADR does not change the Nexus Raptag mobile app’s data strategy. RFID counting sessions are legitimately disconnected (warehouse floor, intermittent device connectivity) and retain full offline-first per ADR-047.

Supersedes

  • ADR-002: Offline-First POS Architecture — replaced by online-first with offline fallback

References

  • ADR-002: Offline-First POS Architecture (superseded)
  • ADR-052: Unified Web Application (Nexus POS is now a single React web app)
  • ADR-047: Raptag Mobile Framework (React Native) — retains offline-first
  • Ch 04: Architecture Styles, Section L.10A.1 (Online-First with Offline Fallback)
  • Ch 05: Architecture Components, Section 6.x (Safety Buffers for Channel Inventory)

ADR-049: Real-Time Transport — Socket.io

2.49 ADR-049: Real-Time Transport — Socket.io

Status: Accepted
Date: 2026-03-01
Decision Makers: Architecture Review Team
Context: The 3-state connection monitor (ONLINE/DEGRADED/OFFLINE from ADR-048) needs a real-time transport for server-push updates, connection health heartbeats, and multi-device coordination. Nexus POS is a unified web application (ADR-052).

Context

ADR-048 defines a 3-state connection monitor (ONLINE/DEGRADED/OFFLINE) that requires a real-time transport for: (1) server-push price and inventory updates to Nexus POS, (2) connection health heartbeats that feed the DEGRADED state detection, and (3) multi-device coordination such as register locking (preventing two users from operating the same register simultaneously). The transport must work reliably in retail network environments and gracefully handle intermittent connectivity.

Decision

We will use Socket.io with WebSocket as the primary transport and HTTP long-polling as the automatic fallback.

Considered Options

  1. Socket.io — Bidirectional, room-based, auto-reconnect, transport fallback
  2. Server-Sent Events (SSE) — Unidirectional server-to-client push over HTTP
  3. Raw WebSocket — Native browser WebSocket API without abstraction layer
  4. Long Polling — Periodic HTTP requests simulating real-time

Decision Outcome

Chosen: Socket.io because it provides bidirectional communication (needed for register lock/unlock commands from server to client), built-in reconnection with exponential backoff (critical for 3-state monitor DEGRADED detection), room-based broadcasting (per-tenant, per-location event routing), and automatic transport fallback (WebSocket → HTTP long-polling) for restrictive network environments.
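
A hedged client-side sketch showing reconnection, transport fallback, and room subscription with socket.io-client; the room names, event names, and token handling are assumptions.

import { io } from 'socket.io-client';

const socket = io('https://api.example.com', {
  transports: ['websocket', 'polling'],   // WebSocket first, HTTP long-polling fallback
  reconnection: true,
  reconnectionDelay: 1_000,
  reconnectionDelayMax: 30_000,           // exponential backoff cap
  auth: { token: localStorage.getItem('jwt') ?? '' },
});

// Join tenant/location rooms once the server has authenticated the connection.
socket.on('connect', () => {
  socket.emit('subscribe', { tenantId: 'tenant-123', locationId: 'store-7' });
});

// Server-push updates and commands.
socket.on('product.updated', (product) => { /* update the React Query cache */ });
socket.on('register.locked', (msg) => { /* show the "register in use" dialog */ });
socket.on('disconnect', () => { /* feeds the 3-state monitor (ADR-048) */ });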

Trade-offs

Pros:

  • Bidirectional — server can push updates AND send commands (register lock, force-logout, config refresh)
  • Built-in reconnection with exponential backoff — feeds directly into ADR-048’s 3-state connection monitor
  • Room-based broadcasting — events routed per-tenant and per-location without client-side filtering
  • Automatic transport fallback — WebSocket → HTTP long-polling handles corporate firewalls and proxy servers
  • Mature ecosystem — well-tested with Node.js/Express, Redis adapter for horizontal scaling

Cons:

  • Socket.io client dependency adds ~50KB to the client bundle
  • Sticky sessions are required when scaling horizontally (mitigated by the Redis adapter: @socket.io/redis-adapter)
  • Not a standard protocol — custom framing on top of WebSocket (mitigated by widespread adoption and tooling)

References

  • ADR-048: Online-First POS Data Strategy (3-state connection monitor)
  • ADR-052: Unified Web Application (Nexus POS)
  • Ch 04: Architecture Styles, Section L.9A (System Architecture)

ADR-050: Prisma Migrate with Custom RLS Policies

2.50 ADR-050: Prisma Migrate with Custom RLS Policies

Status: Accepted
Date: 2026-03-01
Decision Makers: Architecture Review Team
Context: Prisma ORM provides Prisma Migrate for schema management but has no native understanding of PostgreSQL Row-Level Security (RLS) policies required by ADR-001.

Context

Prisma ORM (selected as part of the ADR-046 tech stack) provides Prisma Migrate for schema management — generating migration files from schema changes, tracking migration history, and applying migrations in order. However, Prisma has no native understanding of PostgreSQL Row-Level Security (RLS) policies (ADR-001). Every tenant-scoped table requires both standard DDL (CREATE TABLE, indexes) and custom RLS SQL (CREATE POLICY, ENABLE ROW LEVEL SECURITY). These must be created and updated together as part of the same migration workflow.

Decision

We will use Prisma Migrate for schema DDL with custom SQL migration files for RLS policies. Tenant provisioning uses a dedicated service that: (1) runs Prisma Migrate for schema, (2) executes RLS policy SQL scripts, and (3) seeds tenant configuration.

Considered Options

  1. Prisma Migrate + custom SQL files — Prisma handles DDL, companion .sql files handle RLS
  2. Raw SQL migrations only — Skip Prisma Migrate, manage all DDL and RLS in hand-written SQL
  3. Prisma Migrate with $executeRaw in seed scripts — RLS policies applied outside the migration system
  4. Third-party migration tool (dbmate, golang-migrate) — Replace Prisma Migrate entirely

Decision Outcome

Chosen: Prisma Migrate + custom SQL files because it preserves Prisma’s schema diffing, TypeScript type generation, and migration history while accommodating RLS policies that Prisma cannot generate. Each migration that adds a tenant-scoped table includes a companion RLS policy file in the same migrations folder.

Implementation pattern:

  • Standard Prisma schema changes generate migration SQL via prisma migrate dev
  • Developer adds a companion SQL file in the same migration folder for RLS policies
  • CI validation checks every table with tenant_id has a corresponding RLS policy
  • Prisma Client middleware runs SET LOCAL app.current_tenant = $tenantId at the start of every transaction via $queryRaw (SET LOCAL is transaction-scoped)
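
A hedged sketch of the pattern: a companion RLS policy file plus a transaction-scoped tenant setter on the Prisma client; the table, policy, and helper names are illustrative.

// Companion SQL placed next to the generated migration (shown here as a comment):
//   ALTER TABLE products ENABLE ROW LEVEL SECURITY;
//   CREATE POLICY tenant_isolation ON products
//     USING (tenant_id = current_setting('app.current_tenant')::uuid);

import { PrismaClient, Prisma } from '@prisma/client';

const prisma = new PrismaClient();

// Every tenant-scoped unit of work runs inside a transaction with the tenant GUC set.
// set_config(..., true) is transaction-scoped, i.e. equivalent to SET LOCAL.
async function withTenant<T>(
  tenantId: string,
  work: (tx: Prisma.TransactionClient) => Promise<T>,
): Promise<T> {
  return prisma.$transaction(async (tx) => {
    await tx.$queryRaw`SELECT set_config('app.current_tenant', ${tenantId}, true)`;
    return work(tx);
  });
}

// Usage: const products = await withTenant(tenantId, (tx) => tx.product.findMany());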

Trade-offs

Pros:

  • Preserves Prisma’s schema diffing, type generation, and migration history tracking
  • RLS policies live alongside the DDL migrations they relate to (co-located, not scattered)
  • CI can validate RLS coverage: every table with tenant_id must have a corresponding policy
  • Prisma Client middleware provides a clean interception point for SET LOCAL app.current_tenant
  • Migration rollback includes both DDL and RLS changes

Cons:

  • Every new tenant-scoped table requires both a Prisma schema change AND a custom RLS SQL file (discipline needed)
  • Prisma Migrate does not track or diff the custom SQL files — developer must remember to add them
  • Custom SQL files are not reflected in the Prisma schema (RLS is invisible to schema.prisma)
  • SET LOCAL app.current_tenant must be set in every tenant-scoped transaction; forgetting it breaks isolation

References

  • ADR-001: Shared Tables with Row-Level Security Multi-Tenancy
  • ADR-052: Unified Web Application (Prisma ORM selection, originally ADR-046)
  • Ch 04: Architecture Styles, Section L.10A.4 (Multi-Tenancy)
  • Ch 06: Database Strategy

ADR-051: State Management — React Query (Server) + Zustand (Client)

2.51 ADR-051: State Management — React Query + Zustand

Status: Accepted
Date: 2026-03-01
Decision Makers: Architecture Review Team
Context: Two client applications (Nexus POS web app, Nexus Raptag mobile) need state management for both server-fetched data and local UI state under the ADR-048 online-first architecture.

Context

Two client applications (Nexus POS unified web app per ADR-052, Nexus Raptag mobile per ADR-047) need state management for both server-fetched data and local UI state. The ADR-048 online-first architecture means most state comes from the Central API — product data, inventory, customer records, and configuration are all server-authoritative. However, the Nexus POS active cart must survive brief offline periods without losing items, and UI state (connection status, preferences, active modals) is purely local.

The state management solution must clearly separate server state (cached API responses) from client state (cart, UI, connection status) to avoid the common pitfall of treating all state identically.

Decision

We will use React Query (TanStack Query) for all server state and Zustand for client-only state. Clear boundary: if data exists on the server, use React Query; if data is local-only or must survive offline, use Zustand with optional persistence.

Considered Options

  1. React Query + Zustand — Dedicated server-state cache + lightweight client-state store
  2. Redux Toolkit (RTK Query + slices) — Unified state management with built-in API cache
  3. React Context API + custom hooks — Built-in React state with no external dependencies
  4. Jotai / Recoil — Atomic state management libraries

Decision Outcome

Chosen: React Query + Zustand because React Query eliminates manual fetch/cache/retry code for the 80% of state that comes from the API, while Zustand provides a minimal, unopinionated store for the 20% that is local-only. The two libraries have no overlap and no conflict.

State boundary rule: “If it has a REST endpoint, use React Query. If it’s local-only, use Zustand.”

React Query manages:

  • Product catalog, inventory levels, customer records (cached API responses)
  • Background refetch, optimistic updates, retry logic, pagination
  • Stale-while-revalidate for instant UI with background freshness checks

Zustand manages:

  • Active cart items (must survive DEGRADED state without losing items mid-sale)
  • Register UI state (active modal, selected tab, sidebar collapsed)
  • 3-state connection monitor status (ONLINE/DEGRADED/OFFLINE from ADR-048)
  • User preferences (theme, receipt format, default payment method)
  • Cart persistence: Zustand persist middleware writes to localStorage (Nexus POS web) or AsyncStorage (Nexus Raptag mobile). On reconnection, cart syncs via API
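
A hedged sketch of the boundary in practice: the cart lives in a persisted Zustand store while product data stays in React Query; the store shape and storage key are assumptions.

import { create } from 'zustand';
import { persist } from 'zustand/middleware';

interface CartLine { sku: string; name: string; qty: number; unitPrice: number }

interface CartState {
  lines: CartLine[];
  addLine: (line: CartLine) => void;
  clear: () => void;
}

// Persisted to localStorage, so the active cart survives DEGRADED/OFFLINE states and reloads.
export const useCartStore = create<CartState>()(
  persist(
    (set) => ({
      lines: [],
      addLine: (line) => set((s) => ({ lines: [...s.lines, line] })),
      clear: () => set({ lines: [] }),
    }),
    { name: 'nexus-pos-cart' },
  ),
);

// Server state stays in React Query, e.g.:
// const { data: products } = useQuery({ queryKey: ['products'], queryFn: fetchProducts });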

Trade-offs

Pros:

  • React Query handles caching, background refetch, optimistic updates, retry, pagination — eliminates manual fetch/cache code
  • Zustand is ~1KB, no boilerplate, no reducers, no actions — just a function that returns state
  • Clear separation prevents the “everything in Redux” anti-pattern
  • Cart items survive DEGRADED/OFFLINE states via Zustand persist middleware
  • Both libraries are TypeScript-first with excellent type inference

Cons:

  • Two state libraries to learn (mitigated by clear boundary rule and small Zustand API surface)
  • Cart items live in Zustand during a sale but must be persisted to the server on sale completion via React Query mutation (two-step)
  • Zustand persist middleware uses localStorage which has ~5MB limit (sufficient for cart state, not for catalog)

References

  • ADR-048: Online-First POS Data Strategy
  • ADR-052: Unified Web Application (Nexus POS)

ADR-052: Unified Web Application (Nexus POS)

2.52 ADR-052: Unified Web Application

Status: Accepted
Date: 2026-03-02
Decision Makers: Architecture Review Team
Context: ADR-046 defined dual deployment (Tauri desktop + React web). Analysis revealed target retailers use standard PCs/tablets with Chrome/Edge, hardware peripherals work via web protocols, and the “Admin Portal” vs “POS Terminal” split creates artificial product complexity when role-based routing achieves the same outcome.

Context

ADR-046 established a dual deployment architecture: “Nexus POS” as a Tauri 2.0 desktop application for store terminals with native hardware access, and “Nexus Admin” as a React web application for browser-based administration. Both shared a single React/TypeScript codebase but required separate build pipelines, platform-conditional code (isTauri() checks), and Rust-based hardware integration for the desktop variant.

Analysis of the target market (small-to-medium multi-location retailers) revealed:

  1. Standard hardware: Target retailers use commodity PCs and tablets running Chrome or Edge — not dedicated POS terminals requiring native desktop wrappers
  2. Web-based peripherals: Modern receipt printers (Star Micronics, Epson) expose HTTP/WebSocket APIs (Star WebPRNT, Epson ePOS SDK); barcode scanners operate as USB HID keyboard wedge devices; cash drawers connect to receipt printers via kick-out cables; payment terminals use browser-compatible SDKs (Stripe Terminal)
  3. Artificial product split: The “Nexus POS” vs “Nexus Admin” distinction created two product names for what is functionally one application with different role-based views. A CASHIER needs the sales terminal; a MANAGER needs reports and configuration; both use the same codebase
  4. Build complexity: Tauri requires a Rust build pipeline, WebView2 dependency management, and platform-specific installers — overhead for minimal benefit when web deployment achieves the same outcome

Decision

We will deploy a single React/TypeScript web application called “Nexus POS”. Users see different menus and pages based on their assigned roles (OWNER, MANAGER, CASHIER, BUYER, AUDITOR). There is no separate “Nexus Admin” product.

Product names: Nexus POS (web app), Nexus Raptag (mobile RFID, unchanged per ADR-047).

Hardware integration via web protocols:

  • Receipt Printers: Star WebPRNT (HTTP POST to printer’s built-in web server) or Epson ePOS SDK (WebSocket). ESC/POS commands sent over network — no native access required.
  • Barcode Scanners: USB HID keyboard wedge — scanner outputs keystrokes captured by standard keydown event listeners. Works identically in any browser.
  • Cash Drawers: Connected to the receipt printer via an RJ-11 kick-out cable. The drawer opens when the printer receives an ESC/POS drawer-open command, so drawer control comes for free once receipt printing works.
  • Payment Terminals: Stripe Terminal JavaScript SDK communicates with Verifone/WisePOS reader over local network. Browser-native, SAQ-A compliant (no card data touches our system).
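
As an illustration of the keyboard-wedge capture described above, a hedged TypeScript sketch that buffers fast keystrokes and emits a scan on Enter; the timing threshold and minimum length are assumptions.

// Scanners "type" the barcode far faster than a human and terminate with Enter.
export function listenForBarcodeScans(onScan: (code: string) => void): () => void {
  let buffer = '';
  let lastKeyTime = 0;

  const handler = (e: KeyboardEvent) => {
    const now = Date.now();
    if (now - lastKeyTime > 50) buffer = '';   // gap too long: treat as human typing, reset
    lastKeyTime = now;

    if (e.key === 'Enter') {
      if (buffer.length >= 4) onScan(buffer);  // ignore stray Enter presses
      buffer = '';
    } else if (e.key.length === 1) {
      buffer += e.key;                         // printable character from the scanner
    }
  };

  window.addEventListener('keydown', handler);
  return () => window.removeEventListener('keydown', handler);  // cleanup for React effects
}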

Offline storage: SQLite WASM (sql.js/wa-sqlite) with OPFS for browser-persistent storage. Same 2-table schema from ADR-048 (product_cache + sales_queue), same 3-state connection monitor — different runtime (WASM instead of native better-sqlite3).

Considered Options

  1. Keep dual deployment (Tauri + web) — Maintain ADR-046 architecture with isTauri() conditional code
  2. Unified web application — Single React SPA, role-based routing, web-based hardware integration
  3. Progressive Web App (PWA) — Web app with service worker for offline, installable on desktop
  4. Electron — Desktop wrapper with full Node.js access (rejected: 150MB+ bundle, Chromium overhead)

Decision Outcome

Chosen: Unified web application because it eliminates the Rust build pipeline, removes platform-conditional code, unifies the product naming, and uses web-standard hardware protocols that work across all modern browsers. The SQLite WASM runtime provides the same offline fallback capability as native better-sqlite3 with slightly higher overhead (acceptable for the rare offline scenario).

Trade-offs

Pros:

  • Single build pipeline — React + Vite, no Rust compilation
  • No platform-conditional code — eliminates isTauri() checks and all window.__TAURI__ detection
  • Unified product name — “Nexus POS” for all users regardless of role
  • Role-based navigation — CASHIER sees sales terminal, MANAGER sees dashboard + reports, OWNER sees configuration
  • Web-standard hardware — Star WebPRNT and USB HID work across Chrome, Edge, Firefox
  • Instant deployment — CDN-served SPA, no desktop installer distribution
  • Simpler testing — single deployment target, no dual-mode test matrix

Cons:

  • SQLite WASM has ~2-3x overhead vs native better-sqlite3 (acceptable: offline fallback is rare, performance-critical path is online API access)
  • Web Serial API and WebUSB have limited browser support (not available in Firefox); mitigated by targeting Chrome/Edge, which dominate enterprise retail
  • No offline application startup — web app requires network to load initially (mitigated: service worker can cache app shell for offline reload)
  • Browser tab can be accidentally closed — no system tray or always-on-top (mitigated: POS terminals use kiosk mode or dedicated browser profile)

Supersedes

  • ADR-046: Nexus Dual Deployment Architecture (Tauri Desktop + Web App)
  • ADR-007: Admin Portal Framework (Blazor Server) — already superseded by ADR-046, now further obsoleted
  • ADR-013: RFID Configuration in Tenant Admin Portal — “Admin Portal” concept fully eliminated

References

  • ADR-008: POS Client Framework (React/TypeScript architecture principles remain valid; Tauri-specific parts superseded)
  • ADR-047: Raptag Mobile Framework (React Native — unchanged)
  • ADR-048: Online-First POS Data Strategy (unchanged; SQLite runtime changes from native to WASM)
  • Ch 04: Architecture Styles, Section L.9A (System Architecture)

How to Propose a New ADR

ADR Proposal Process
====================

1. Copy the ADR template
2. Fill in Context, Decision, Consequences
3. Set Status to "proposed"
4. Submit for architecture review
5. Discuss in architecture meeting
6. Update based on feedback
7. Set Status to "accepted" when approved
8. Add to ADR Index

MADR Template (Markdown Any Decision Records)

We use the MADR (Markdown Any Decision Records) format, which is more comprehensive than the basic ADR format and better suited for complex architectural decisions.

Full MADR Template

# ADR-XXX: [Short Title of Solved Problem and Solution]

## Status

[proposed | accepted | deprecated | superseded by ADR-YYY]

## Date

YYYY-MM-DD

## Decision-Makers

- [Name/Role 1]
- [Name/Role 2]

## Technical Story

[Link to ticket/issue: JIRA-123, GitHub Issue #456]

## Context and Problem Statement

[Describe the context and problem statement, e.g., in free form
using two to three sentences or in the form of an illustrative
story. You may want to articulate the problem in form of a question.]

## Decision Drivers

* [Driver 1, e.g., a force, facing concern, …]
* [Driver 2, e.g., a force, facing concern, …]
* [Driver 3, e.g., a force, facing concern, …]

## Considered Options

1. [Option 1]
2. [Option 2]
3. [Option 3]
4. [Option 4]

## Decision Outcome

**Chosen Option**: "[Option X]"

### Justification

[Justification for why this option was chosen. Reference the
decision drivers and explain how this option best addresses them.]

### Positive Consequences

* [e.g., improvement of quality attribute satisfaction, follow-up
  decisions required, …]
* …

### Negative Consequences

* [e.g., compromising quality attribute, follow-up decisions required,
  technical debt introduced, …]
* …

## Pros and Cons of the Options

### [Option 1]

[Example: Schema-per-tenant multi-tenancy]

**Pros:**
* Good, because [argument a]
* Good, because [argument b]

**Cons:**
* Bad, because [argument c]
* Bad, because [argument d]

### [Option 2]

[Example: Row-level multi-tenancy]

**Pros:**
* Good, because [argument a]
* Good, because [argument b]

**Cons:**
* Bad, because [argument c]

### [Option 3]

[Example: Database-per-tenant]

**Pros:**
* Good, because [argument a]

**Cons:**
* Bad, because [argument b]
* Bad, because [argument c]

## Links

* [Link type] [Link to ADR] <!-- example: Refined by ADR-007 -->
* [Link type] [Link to external resource]
* Supersedes ADR-XXX
* Related to ADR-YYY

## Notes

[Any additional notes, discussion points, or future considerations]

MADR Example: Kafka Selection

# ADR-014: Apache Kafka for Event Streaming

## Status

accepted

## Date

2026-01-15

## Decision-Makers

- Architecture Team
- Infrastructure Team

## Technical Story

ARCH-456: Select event streaming platform for POS event sourcing

## Context and Problem Statement

Our POS platform uses event sourcing for the Sales and Inventory
domains. We need an event streaming platform that supports:
- Event replay for new consumers
- Durable storage for audit compliance
- High throughput during peak retail periods (Black Friday)
- Multi-datacenter replication for disaster recovery

Which event streaming platform should we use?

## Decision Drivers

* Replayability - New analytics services must process historical events
* Durability - Events must survive broker failures (PCI compliance)
* Throughput - Handle 10,000+ events/second during peak
* Ecosystem - Good client libraries for .NET
* Operations - Team can manage without dedicated staff

## Considered Options

1. Apache Kafka
2. RabbitMQ with Shovel plugin
3. Amazon Kinesis
4. Redis Streams
5. PostgreSQL LISTEN/NOTIFY

## Decision Outcome

**Chosen Option**: "Apache Kafka (with KRaft mode)"

### Justification

Kafka is the only option that provides true event replayability with
configurable retention. New consumers can start from the beginning
of the log and process all historical events. This is critical for:
- Adding new analytics modules
- Rebuilding projections after bugs
- Audit investigations

KRaft mode eliminates ZooKeeper dependency, simplifying operations.

### Positive Consequences

* Complete replayability for compliance and analytics
* Proven at massive scale (LinkedIn, Uber)
* Strong .NET client (Confluent.Kafka)
* Schema Registry for event versioning

### Negative Consequences

* More complex than RabbitMQ
* Requires understanding of partitioning
* Higher resource usage than simpler queues

## Pros and Cons of the Options

### Apache Kafka

**Pros:**
* Good, because events are retained for configurable duration
* Good, because consumers can replay from any offset
* Good, because it handles 100K+ messages/second
* Good, because KRaft mode simplifies deployment

**Cons:**
* Bad, because it requires more operational knowledge
* Bad, because partition management adds complexity

### RabbitMQ with Shovel

**Pros:**
* Good, because it's simpler to operate
* Good, because team has existing experience

**Cons:**
* Bad, because messages are deleted after consumption
* Bad, because replay requires external archival

### Amazon Kinesis

**Pros:**
* Good, because it's fully managed
* Good, because it has replay capability

**Cons:**
* Bad, because of vendor lock-in
* Bad, because pricing is complex at scale

### Redis Streams

**Pros:**
* Good, because it's simple
* Good, because it's low latency

**Cons:**
* Bad, because durability is limited
* Bad, because it's not designed for long-term storage

### PostgreSQL LISTEN/NOTIFY

**Pros:**
* Good, because no additional infrastructure

**Cons:**
* Bad, because it doesn't scale
* Bad, because messages are ephemeral

## Links

* Refined by ADR-015 (Schema Registry Selection)
* Related to ADR-003 (Event Sourcing for Sales Domain)
* [Kafka Documentation](https://kafka.apache.org/documentation/)

## Notes

Evaluated during Q1 2026 architecture review. Confluent Cloud was
considered but rejected due to cost; self-hosted Kafka preferred.

**UPDATE (v3.0.0)**: Kafka is **deferred to v2.0**. Per the Architecture
Styles analysis (Chapter 04, Section L.4A.2),
v1.0 uses PostgreSQL event tables with LISTEN/NOTIFY for event notification
and Transactional Outbox for guaranteed delivery. This ADR remains valid
for v2.0 planning when scale justifies the Kafka operational overhead.
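
For orientation only, here is a minimal sketch of the v1.0 interim approach (assuming the `pg` driver; the `event_outbox` table, `outbox_new_event` channel, and `sales` table are illustrative names, with the authoritative design in Chapter 04): the event row is written in the same transaction as the state change, and NOTIFY merely wakes the relay that publishes pending rows.

```typescript
// Hypothetical sketch of the v1.0 Transactional Outbox + LISTEN/NOTIFY flow.
// Table and channel names are assumptions for illustration only.
import { Pool, Client } from 'pg';

const pool = new Pool({ connectionString: process.env.DATABASE_URL });

// Writer: persist the domain change and its event atomically.
export async function recordSale(saleId: string, payload: object): Promise<void> {
  const client = await pool.connect();
  try {
    await client.query('BEGIN');
    await client.query('INSERT INTO sales (id, data) VALUES ($1, $2)', [saleId, payload]);
    await client.query(
      'INSERT INTO event_outbox (aggregate_id, event_type, payload) VALUES ($1, $2, $3)',
      [saleId, 'SaleRecorded', payload],
    );
    // NOTIFY only signals "new work"; delivery is guaranteed by the outbox row itself.
    await client.query('NOTIFY outbox_new_event');
    await client.query('COMMIT');
  } catch (err) {
    await client.query('ROLLBACK');
    throw err;
  } finally {
    client.release();
  }
}

// Relay: wake on NOTIFY and drain unpublished outbox rows.
export async function startOutboxRelay(): Promise<void> {
  const listener = new Client({ connectionString: process.env.DATABASE_URL });
  await listener.connect();
  await listener.query('LISTEN outbox_new_event');
  listener.on('notification', async () => {
    // SELECT ... FOR UPDATE SKIP LOCKED, publish to consumers, then mark rows published.
  });
}
```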

ADR Tooling & Automation

| Tool | Purpose | Installation |
|------|---------|--------------|
| adr-tools | CLI for creating/managing ADRs | `brew install adr-tools` |
| Log4brains | ADR documentation site generator | `npm install -g log4brains` |
| adr-viewer | Web-based ADR viewer | Docker image available |

ADR Tools CLI

# Install adr-tools
brew install adr-tools  # macOS
# or
sudo apt install adr-tools  # Ubuntu

# Initialize ADR directory
adr init docs/adr

# Create new ADR
adr new "Use Kafka for Event Streaming"
# Creates: docs/adr/0014-use-kafka-for-event-streaming.md

# Supersede an ADR
adr new -s 3 "Replace Event Sourcing with Outbox Pattern"
# Creates new ADR that supersedes ADR-003

# List all ADRs
adr list

# Generate ADR index
adr generate toc > docs/adr/README.md

Log4brains Integration

Log4brains generates a searchable documentation website from ADRs:

# Install Log4brains
npm install -g log4brains

# Initialize in project
log4brains init

# Start preview server
log4brains preview

# Build static site
log4brains build

# Deploy to GitHub Pages
log4brains build --basePath /pos-platform-adr

A GitHub Actions workflow can rebuild and deploy the site automatically whenever ADRs change:

# .github/workflows/adr-docs.yml

name: ADR Documentation

on:
  push:
    branches: [main]
    paths:
      - 'docs/adr/**'

jobs:
  build-adr-site:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0  # Full history for dates

      - uses: actions/setup-node@v4
        with:
          node-version: '20'

      - name: Install Log4brains
        run: npm install -g log4brains

      - name: Build ADR site
        run: log4brains build --basePath /pos-platform-adr

      - name: Deploy to GitHub Pages
        uses: peaceiris/actions-gh-pages@v3
        with:
          github_token: ${{ secrets.GITHUB_TOKEN }}
          publish_dir: .log4brains/out

ADR Linting

A lightweight CI check validates that each ADR contains the required sections and that numbering stays sequential:

# .github/workflows/adr-lint.yml

name: ADR Lint

on:
  pull_request:
    paths:
      - 'docs/adr/**'

jobs:
  lint-adr:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Validate ADR Format
        run: |
          for file in docs/adr/*.md; do
            # Check required sections
            if ! grep -q "## Status" "$file"; then
              echo "ERROR: $file missing Status section"
              exit 1
            fi
            if ! grep -q "## Context" "$file" && ! grep -q "## Context and Problem Statement" "$file"; then
              echo "ERROR: $file missing Context section"
              exit 1
            fi
            if ! grep -q "## Decision" "$file" && ! grep -q "## Decision Outcome" "$file"; then
              echo "ERROR: $file missing Decision section"
              exit 1
            fi
          done
          echo "All ADRs pass validation"

      - name: Check ADR Numbering
        run: |
          # Ensure sequential numbering (filenames are zero-padded, e.g. 0014-...)
          expected=1
          for file in docs/adr/[0-9]*.md; do
            num=$(basename "$file" | grep -o '^[0-9]*')
            num=$((10#$num))  # strip leading zeros before comparing
            if [ "$num" -ne "$expected" ]; then
              echo "WARNING: Expected ADR-$expected, found ADR-$num"
            fi
            expected=$((expected + 1))
          done

ADR Review Checklist

# ADR Review Checklist

Before accepting an ADR, verify:

## Structure
- [ ] Uses MADR template
- [ ] Has clear title
- [ ] Status is set correctly
- [ ] Date is current
- [ ] Decision-makers are listed

## Content Quality
- [ ] Context clearly explains the problem
- [ ] Decision drivers are explicit
- [ ] At least 3 options were considered
- [ ] Pros/cons are documented for each option
- [ ] Chosen option justification references drivers

## Completeness
- [ ] Positive consequences listed
- [ ] Negative consequences listed (be honest!)
- [ ] Risks identified
- [ ] Mitigations proposed for risks
- [ ] Links to related ADRs

## Traceability
- [ ] Linked to technical story/ticket
- [ ] References relevant documentation
- [ ] Supersedes/relates to other ADRs if applicable

## Approval
- [ ] Architecture team reviewed
- [ ] Security team reviewed (if applicable)
- [ ] Infrastructure team reviewed (if applicable)



Summary

These Architecture Decision Records capture the foundational technical decisions for the POS Platform:

| ADR | Key Decision | Primary Benefit |
|-----|--------------|-----------------|
| ADR-001 | Schema-per-tenant *(corrected → Row-Level RLS, Ch 04 L.10A.4)* | Tenant isolation via tenant_id + PostgreSQL RLS policies |
| ADR-002 | Offline-first *(superseded by ADR-048)* | Replaced by online-first with offline fallback |
| ADR-003 | Event sourcing | Complete audit trail and temporal queries |
| ADR-004 | JWT + PIN | Secure API + fast cashier workflow |
| ADR-005 | PostgreSQL | RLS multi-tenancy and JSONB flexibility |
| ADR-006 | Node.js + TypeScript (Central API) | Unified TypeScript stack, Prisma ORM, Socket.io |
| ADR-007 | Blazor Server *(superseded by ADR-046)* | Replaced by Nexus dual deployment (React web app) |
| ADR-008 | Tauri 2.0 + React/TypeScript (Nexus POS) | Native hardware access, shared React codebase, lightweight binary |
| ADR-009 | Redis for session & cache | Distributed session, sub-ms cache, pub/sub |
| ADR-010 | Webhook + Polling (Shopify) | Real-time sync with fallback consistency |
| ADR-011 | SAQ-A Semi-Integrated payments | Minimal PCI scope, no card data in system |
| ADR-012 | LGTM Stack (observability) | Open-source, self-hosted, unified dashboards |
| ADR-013 | RFID in Admin Portal *(superseded by ADR-046)* | RFID config now in Nexus Admin > Settings > RFID |
| ADR-014 | Pinned major.minor npm versions with lock file | Build reproducibility with security patches |
| ADR-015 | Queue-and-Sync with CRDTs *(superseded by ADR-048)* | Replaced by online-first; CRDTs eliminated |
| ADR-016 | ERR-Mxxx error codes | Structured, machine-parseable, module-aligned |
| ADR-017 | Layered Testing Pyramid (Vitest + Playwright + k6) | Fast feedback, real DB tests, contract testing |
| ADR-018 | Affirm BNPL Integration | Third-party financing without PCI scope |
| ADR-019 | SAQ-A Semi-Integrated Payment Scope | Zero card data in POS, minimal PCI burden |
| ADR-020 | Split Tender Payment Support | Multiple payment methods per transaction |
| ADR-021 | Layaway Payment Plans | State-machine-driven installment lifecycle |
| ADR-022 | Tax-Inclusive Display with Compound Calc | Accurate 3-level tax, customer transparency |
| ADR-023 | Compound Tax (3-Level State/County/City) | Jurisdiction-accurate tax computation |
| ADR-024 | Gift Card Compliance (State Escheatment) | Legal compliance with state unclaimed-property laws |
| ADR-025 | 6-Status Inventory State Machine | Deterministic status transitions, audit trail |
| ADR-026 | Reservation-Based Inventory Hold Model | Prevents overselling across channels |
| ADR-027 | RFID Counting-Only Scope | Focused RFID value, reduced complexity |
| ADR-028 | Physical Count Freeze Period | Data integrity during full counts |
| ADR-029 | Adjustment Manager Approval | Shrinkage control, accountability |
| ADR-030 | Auto-Suggest Transfers Algorithm | Velocity-based stock redistribution |
| ADR-031 | Shopify Webhook + Polling Dual Sync | Real-time events with polling fallback |
| ADR-032 | Strictest-Rule-Wins Validation | Cross-platform compliance by default |
| ADR-033 | Amazon SP-API Integration | Marketplace reach with OAuth 2.0/LWA auth |
| ADR-034 | Google Merchant Center Feed | Product visibility before Content API EOL |
| ADR-035 | Channel Safety Buffer Calculation | Prevents overselling across channels |
| ADR-036 | POS-Master Default for Channels | Single source of truth for product data |
| ADR-037 | Offline Conflict Resolution via CRDTs *(superseded by ADR-048)* | Replaced by online-first; CRDTs eliminated |
| ADR-038 | Transactional Outbox for Events | Reliable event publishing, no dual-write |
| ADR-039 | CQRS Boundary (Sales Domain Only) | Targeted complexity where value is highest |
| ADR-040 | Eventual Consistency SLA | Predictable sync guarantees per channel |
| ADR-041 | 6-Gate Security Pyramid | Automated layered security scanning |
| ADR-042 | E2E Testing *(removed; duplicate of ADR-017)* | |
| ADR-043 | LGTM Observability *(removed; duplicate of ADR-012)* | |
| ADR-044 | API Performance Targets | SLA-driven p99 latency budgets |
| ADR-045 | Blue-Green Deployment Strategy | Zero-downtime releases with instant rollback |
| ADR-046 | Nexus Dual Deployment Architecture *(superseded by ADR-052)* | Replaced by unified web application |
| ADR-047 | Raptag Mobile Framework (React Native) | Unified TypeScript for RFID mobile app |
| ADR-048 | Online-First POS Data Strategy | Online-first API access, 2-table SQLite WASM offline fallback |
| ADR-049 | Real-Time Transport — Socket.io | Bidirectional push, auto-reconnect, room-based routing |
| ADR-050 | Prisma Migrate with Custom RLS Policies | Schema DDL + companion RLS SQL in same migration |
| ADR-051 | State Management — React Query + Zustand | Server-state cache + lightweight client-state store |
| ADR-052 | Unified Web Application (Nexus POS) | Single React web app, role-based navigation, web hardware protocols |

These 52 records (43 active, 7 superseded [001, 002, 007, 013, 015, 037, 046], 2 removed [042, 043]) form the architectural foundation upon which the rest of the system is built.
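
As a closing illustration of how ADR-001 and ADR-050 fit together at runtime (a hedged sketch, not the project's actual code: the `app.current_tenant_id` setting, the `withTenant` helper, and the `sale` model are assumptions), the API scopes each unit of work to one tenant before touching RLS-protected tables:

```typescript
// Hypothetical sketch: scoping Prisma queries to a tenant so RLS policies apply.
import { PrismaClient, Prisma } from '@prisma/client';

const prisma = new PrismaClient();

export async function withTenant<T>(
  tenantId: string,
  work: (tx: Prisma.TransactionClient) => Promise<T>,
): Promise<T> {
  return prisma.$transaction(async (tx) => {
    // SET LOCAL scopes the setting to this transaction; the RLS policies created
    // in the companion migration SQL compare each row's tenant_id against it.
    // (Naive quote-escaping here only because SET LOCAL cannot be parameterized.)
    await tx.$executeRawUnsafe(
      `SET LOCAL app.current_tenant_id = '${tenantId.replace(/'/g, "''")}'`,
    );
    return work(tx);
  });
}

// Usage sketch: all queries inside the callback see only this tenant's rows.
// const sales = await withTenant(tenantId, (tx) => tx.sale.findMany());
```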


Document Information

| Attribute | Value |
|-----------|-------|
| Version | 7.0.0 |
| Created | 2025-12-29 |
| Updated | 2026-03-02 |
| Author | Claude Code |
| Status | Active |
| Part | II - Architecture |
| Chapter | 02 of 9 |

Change Log

| Version | Date | Changes |
|---------|------|---------|
| 7.0.0 | 2026-03-02 | Unified Web Application: Added ADR-052 (single React web app replacing Tauri desktop + web admin split). Superseded ADR-046 (Dual Deployment). Updated ADR-048 (better-sqlite3 → SQLite WASM via sql.js/wa-sqlite + OPFS). Updated ADR-008 (Tauri-specific parts superseded by ADR-052). Updated ADR-049 (Socket.io references). Updated ADR-051 (2 apps not 3, localStorage not Tauri). Updated ADR-047 references. Total: 52 records (43 active, 7 superseded [001, 002, 007, 013, 015, 037, 046], 2 removed [042, 043]). |
| 6.3.0 | 2026-03-01 | Online-first consolidation: Superseded ADR-015 (CRDTs) and ADR-037 (CRDT conflict resolution) by ADR-048. Fixed ADR-040 context (offline-first → online-first, ADR-015 → ADR-048). Fixed ADR-038 destination (signalr → socketio). Expanded ADR-007 superseded note. Added ADR-049 (Socket.io real-time transport), ADR-050 (Prisma Migrate + RLS), ADR-051 (React Query + Zustand state management). Total: 51 records (43 active, 6 superseded [001, 002, 007, 013, 015, 037], 2 removed [042, 043]). |
| 6.2.0 | 2026-03-01 | Online-first pivot: Added ADR-048 (Online-First POS Data Strategy), superseding ADR-002 (Offline-First). ADR-046 con and Implementation Risk #3 updated for 2-table SQLite. Total: 48 records (42 active, 4 superseded [001, 002, 007, 013], 2 removed [042, 043]). |
| 6.1.0 | 2026-02-28 | Tech stack pivot Phase 1: ADR-006 rewritten (ASP.NET Core → Node.js + TypeScript), ADR-008 rewritten (.NET MAUI → Tauri 2.0 + React), ADR-014 rewritten (NuGet → npm), ADR-017 updated (xUnit → Vitest), ADR-001 corrected (Strategy C → Strategy A RLS), ADR-007 superseded (by ADR-046), ADR-013 superseded (by ADR-046), ADR-042 removed (duplicate of ADR-017), ADR-043 removed (duplicate of ADR-012). Added ADR-046 (Nexus Dual Deployment Architecture) and ADR-047 (Raptag Mobile Framework — React Native). Updated SignalR → Socket.io, MediatR → command/query bus, StackExchange.Redis → ioredis. Total: 47 records (42 active, 3 superseded, 2 removed). |
| 5.2.1 | 2026-02-27 | Added 28 new ADRs (018-045) covering Payment & Financials, Inventory & Stock Management, Multi-Channel Integration, Data Consistency & Conflict Resolution, and Architecture Patterns & Infrastructure. Sourced from Ch 04 and Ch 05. Total ADRs: 45. |
| 5.2.0 | 2026-02-27 | Added 10 new ADRs (007-012, 014-017): Blazor Server (Admin), .NET MAUI Blazor Hybrid (POS), Redis, Shopify Webhook+Polling, SAQ-A Payments, LGTM Stack, NuGet Versioning, Queue-and-Sync CRDTs, ERR-Mxxx Error Codes, Layered Testing Pyramid. Removed Future ADRs table (all now accepted). Updated Summary table with all 17 ADRs. |
| 3.0.0 | 2026-02-22 | ADR-001 marked SUPERSEDED (Schema-Per-Tenant replaced by Row-Level RLS per Ch 04 L.10A.4); added Kafka v2.0 deferral note to ADR-014 example (per Ch 04 L.4A.2); fixed Next Chapter link; renumbered chapter references for v3.0.0. |
| 2.0.0 | 2026-01-01 | Added ADR-013 (RFID Configuration), MADR template, tooling section. |
| 1.0.0 | 2025-12-29 | Initial ADRs (001-006). |

Next Chapter: Chapter 03: Architecture Characteristics


This chapter is part of the POS Blueprint Book. All content is self-contained.