Chapter 02: Architecture Decision Records

Documenting Key Technical Decisions

This chapter documents the major architectural decisions for the POS Platform using Architecture Decision Records (ADRs). Each ADR captures the context, decision, and consequences of a significant technical choice.


What is an ADR?

Architecture Decision Records provide a structured way to document important technical decisions:

ADR Structure
=============

+------------------------------------------------------------------+
|  ADR-XXX: [Title]                                                 |
+------------------------------------------------------------------+
|  Status: [proposed | accepted | deprecated | superseded]         |
|  Date: YYYY-MM-DD                                                 |
|  Deciders: [who made the decision]                               |
+------------------------------------------------------------------+
|                                                                   |
|  CONTEXT                                                          |
|  - What is the issue?                                            |
|  - What forces are at play?                                      |
|  - What constraints exist?                                       |
|                                                                   |
|  DECISION                                                         |
|  - What is the change?                                           |
|  - What did we choose?                                           |
|                                                                   |
|  CONSEQUENCES                                                     |
|  - What are the positive outcomes?                               |
|  - What are the negative outcomes?                               |
|  - What risks are introduced?                                    |
|                                                                   |
+------------------------------------------------------------------+

ADR-001: Shared Tables with Row-Level Security Multi-Tenancy

Note: This ADR originally documented Schema-Per-Tenant (Strategy C) but was corrected to reflect the actual decision: Shared Tables with Row-Level Security (Strategy A). The RLS implementation is detailed in Chapter 04, Section L.10A.4.

+==================================================================+
|  ADR-001: Shared Tables with Row-Level Security Multi-Tenancy    |
+==================================================================+
|  Status: SUPERSEDED (corrected to Row-Level Security, Ch 04      |
|          Section L.10A.4)                                         |
|  Date: 2025-12-29                                                |
|  Deciders: Architecture Team                                      |
+==================================================================+

CONTEXT
-------
We are building a multi-tenant POS platform that will serve multiple
independent retail businesses. Each tenant needs:

1. Strong data isolation for security and compliance
2. Easy backup and restore of individual tenant data
3. Ability to scale individual tenants independently
4. Compliance with SOC 2 and potential HIPAA requirements
5. Efficient connection pooling across all tenants

We evaluated three multi-tenancy strategies:

  Strategy A: Shared Tables (Row-Level Security)
  - All tenants share tables
  - tenant_id column on every business table
  - PostgreSQL RLS policies enforce isolation

  Strategy B: Separate Databases
  - Each tenant gets own database
  - Complete isolation
  - High connection overhead

  Strategy C: Schema-Per-Tenant
  - Single database, separate schemas
  - SET search_path per request
  - Logical isolation, shared infrastructure

DECISION
--------
We will use SHARED TABLES with ROW-LEVEL SECURITY multi-tenancy
(Strategy A).

Each tenant is identified by a tenant_id column on every business
table. PostgreSQL Row-Level Security (RLS) policies enforce isolation:

  CREATE POLICY tenant_isolation ON <table>
    USING (tenant_id = current_setting('app.current_tenant')::uuid);

The tenant is resolved from the subdomain (e.g., nexus.pos-platform.com)
and SET app.current_tenant is called per request via middleware.
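
The per-request scoping might look like the following sketch (TypeScript,
assuming Express and the Prisma client; the subdomain-to-tenant lookup and the
example values are hypothetical):

  import type { NextFunction, Request, Response } from 'express';
  import { PrismaClient } from '@prisma/client';

  const prisma = new PrismaClient();

  // Hypothetical lookup: subdomain -> tenant UUID. In practice this would be a
  // cached query against a tenants table that is not itself RLS-protected.
  const tenantBySubdomain = new Map<string, string>([
    ['acme', '00000000-0000-0000-0000-000000000001'],
  ]);

  function resolveTenantIdFromHost(host: string): string | undefined {
    return tenantBySubdomain.get(host.split('.')[0]);
  }

  export async function tenantContext(req: Request, res: Response, next: NextFunction): Promise<void> {
    const tenantId = resolveTenantIdFromHost(req.hostname);
    if (!tenantId) {
      res.status(404).json({ error: 'Unknown tenant' });
      return;
    }
    // Expose the tenant to PostgreSQL so RLS policies can compare tenant_id
    // against current_setting('app.current_tenant'). With a pooled connection
    // this is normally applied per transaction (set_config(..., true) inside
    // prisma.$transaction) so the value cannot leak to another request.
    await prisma.$executeRaw`SELECT set_config('app.current_tenant', ${tenantId}, false)`;
    next();
  }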

CONSEQUENCES
------------
Positive:
  + Strong data isolation via PostgreSQL RLS (database-enforced)
  + Single schema — migrations apply once, not per-tenant
  + tenant_id enables straightforward cross-tenant analytics
    (platform admin)
  + Standard PostgreSQL feature — no custom middleware risk
  + All tenants share connection pool

Negative:
  - tenant_id required on every business table (discipline needed)
  - Every query must be RLS-aware (mitigated by Prisma middleware)
  - Cross-tenant queries require explicit bypasses
    (SET app.current_tenant = '')
  - Noisy neighbor risk on shared tables (mitigated by index
    partitioning)

Risks:
  - Forgetting tenant_id on new tables breaks isolation
  - RLS policies must be applied to every new table
  - Need robust middleware to always set app.current_tenant

Mitigations:
  - Prisma middleware automatically injects tenant_id on every query
  - CI/CD linter checks all tables for tenant_id + RLS policy
  - Integration tests verify tenant isolation per API endpoint

ADR-002: Offline-First POS Architecture

Superseded: The offline-first approach has been replaced by an online-first with offline fallback strategy. Target retail environments have reliable internet (outages measured in minutes/year). The online-first approach eliminates CRDTs, reduces SQLite from 6 tables to 2, and simplifies integration flows while preserving sales continuity during brief outages. See ADR-048.

+==================================================================+
|  ADR-002: Offline-First POS Architecture                         |
+==================================================================+
|  Status: SUPERSEDED (by ADR-048: Online-First POS Data Strategy) |
|  Date: 2025-12-29                                                |
|  Deciders: Architecture Team                                      |
+==================================================================+

CONTEXT
-------
POS terminals operate in retail environments where network
connectivity is unreliable:

1. Internet outages occur (ISP issues, weather, accidents)
2. WiFi can be congested during peak shopping hours
3. Store networks may have maintenance windows
4. Rural locations may have poor connectivity

A traditional online-required POS would:
- Block sales during outages (lost revenue)
- Show errors during slow connections (poor UX)
- Require manual workarounds (paper receipts)

Business requirements:
- Sales must NEVER be blocked by network issues
- Receipts must print immediately
- Data must eventually sync to central system
- Inventory should be reasonably accurate

DECISION
--------
We will implement OFFLINE-FIRST architecture for POS clients.

Key design elements:
1. Local SQLite database on each POS terminal
2. All operations work against local database first
3. Event queue for pending changes
4. Background sync when connectivity available
5. Conflict resolution for concurrent changes

Data flow:
  User Action -> Local DB -> Event Queue -> [Background] -> Central API
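
A minimal sketch of that flow (TypeScript; all names are illustrative, and the
queue would be persisted in the terminal's SQLite database rather than kept in
memory):

  interface QueuedEvent {
    id: string;         // client-generated UUID, lets the server deduplicate replays
    type: string;       // e.g. 'SaleCompleted'
    payload: unknown;
    queuedAt: string;   // ISO timestamp
  }

  const queue: QueuedEvent[] = [];

  // 1. Write to the local database, 2. enqueue for background sync.
  function recordLocally(event: QueuedEvent): void {
    queue.push(event);
  }

  // Drain oldest-first; stop on the first failure and retry on the next tick.
  async function syncPending(apiBaseUrl: string): Promise<void> {
    while (queue.length > 0) {
      const event = queue[0];
      const res = await fetch(`${apiBaseUrl}/sync/events`, {
        method: 'POST',
        headers: { 'content-type': 'application/json' },
        body: JSON.stringify(event),
      });
      if (!res.ok) return;   // offline or server error: keep the event queued
      queue.shift();         // acknowledged: remove from the queue
    }
  }

  // "Aggressive sync when online" (per the mitigations below): retry every 30 seconds.
  setInterval(() => void syncPending('https://api.example.com'), 30_000);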

CONSEQUENCES
------------
Positive:
  + Sales never blocked by network issues
  + Instant response time (local operations)
  + Resilient to any connectivity problem
  + Business continues regardless of server status
  + Better user experience for cashiers

Negative:
  - Data is eventually consistent (not immediate)
  - Inventory counts may drift until sync
  - More complex architecture
  - Conflict resolution logic required
  - Local storage management needed

Risks:
  - Data loss if local device fails before sync
  - Inventory overselling possible during outages
  - Conflict resolution edge cases

Mitigations:
  - Aggressive sync when online (every 30 seconds)
  - Local database backup to secondary storage
  - Conservative inventory thresholds
  - Clear offline indicator in UI
  - Deterministic conflict resolution rules

ADR-003: Event Sourcing for Sales Domain

+==================================================================+
|  ADR-003: Event Sourcing for Sales Domain                        |
+==================================================================+
|  Status: ACCEPTED                                                 |
|  Date: 2025-12-29                                                |
|  Deciders: Architecture Team                                      |
+==================================================================+

CONTEXT
-------
The Sales domain has specific requirements that traditional CRUD
does not adequately address:

1. Complete audit trail required (PCI-DSS compliance)
2. Need to answer "what happened?" not just "what is?"
3. Offline clients need conflict-free merge capability
4. Historical analysis (sales trends, patterns)
5. Debugging production issues by replaying events

Traditional CRUD limitations:
- Only stores current state
- Updates overwrite history
- Hard to reconstruct past states
- Audit logs separate from data model

DECISION
--------
We will use EVENT SOURCING for the Sales aggregate.

Implementation:
1. Append-only event store in PostgreSQL
2. Events are the source of truth
3. Read models (projections) for queries
4. Snapshots for performance on long streams

Events captured:
- SaleCreated, SaleLineItemAdded, PaymentReceived, SaleCompleted
- SaleVoided, RefundProcessed
- All inventory changes (InventorySold, InventoryAdjusted)

NOT event-sourced (traditional CRUD):
- Products (read-heavy, infrequent changes)
- Employees (HR data, simple lifecycle)
- Locations (configuration data)
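
A brief sketch of the event-sourced write path and one projection (TypeScript;
the sale_events table and event names are illustrative, not the final schema):

  import { PrismaClient } from '@prisma/client';

  const prisma = new PrismaClient();

  interface SaleEvent {
    streamId: string;                  // the Sale aggregate id
    version: number;                   // position within the stream
    type: 'SaleCreated' | 'SaleLineItemAdded' | 'PaymentReceived' | 'SaleCompleted';
    payload: Record<string, unknown>;  // stored as JSONB
  }

  // Append is the only write path: events are never updated or deleted.
  async function appendEvent(e: SaleEvent): Promise<void> {
    await prisma.$executeRaw`
      INSERT INTO sale_events (stream_id, version, type, payload, recorded_at)
      VALUES (${e.streamId}::uuid, ${e.version}, ${e.type}, ${JSON.stringify(e.payload)}::jsonb, now())`;
  }

  // A projection folds the stream into a read model for queries (CQRS read side).
  function projectSaleTotal(events: SaleEvent[]): number {
    return events
      .filter((e) => e.type === 'SaleLineItemAdded')
      .reduce((sum, e) => sum + Number(e.payload['lineTotal'] ?? 0), 0);
  }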

CONSEQUENCES
------------
Positive:
  + Complete audit trail built into data model
  + Temporal queries ("inventory on Dec 15 at 3pm")
  + Offline sync via event merge (append-only = no conflicts)
  + Debugging by event replay
  + Analytics on event streams
  + Natural fit for CQRS pattern

Negative:
  - More complex than CRUD
  - Requires event versioning strategy
  - Projections must be rebuilt if logic changes
  - Storage grows over time (mitigated by snapshots)
  - Learning curve for developers

Risks:
  - Event schema evolution complexity
  - Projection bugs cause stale read models
  - Performance without proper snapshotting

Mitigations:
  - Event versioning from day one
  - Automated projection rebuild process
  - Snapshot every 100 events
  - Clear documentation and training

ADR-004: JWT + PIN Authentication

+==================================================================+
|  ADR-004: JWT + PIN Authentication                               |
+==================================================================+
|  Status: ACCEPTED                                                 |
|  Date: 2025-12-29                                                |
|  Deciders: Architecture Team, Security Team                       |
+==================================================================+

CONTEXT
-------
POS systems have unique authentication requirements:

1. API access needs secure, stateless authentication
2. Cashiers need quick clock-in at physical terminals
3. Sensitive actions need additional verification
4. Multiple employees may share a terminal
5. Terminals may be offline

Requirements:
- Strong authentication for API/Admin access
- Fast authentication for cashiers (< 2 seconds)
- Manager override capability
- Works offline for cashier PIN

Industry standards:
- JWT is standard for API authentication
- PINs are standard for POS quick access
- Password + MFA for Nexus Admin access

DECISION
--------
We will implement a HYBRID authentication system:

1. JWT for API Authentication
   - Nexus Admin uses email + password + optional MFA
   - Issues JWT token (15 min access, 7 day refresh)
   - Standard Bearer token in Authorization header

2. PIN for POS Terminal Access
   - 4-6 digit PIN per employee
   - Stored as bcrypt hash in database
   - Used for: clock-in, sale attribution, drawer access

3. Manager Override
   - Sensitive actions require manager PIN
   - Void, large discount, price override
   - Manager enters their PIN to authorize

4. Offline PIN Validation
   - Employee records with PIN hashes cached locally
   - Validated against local cache when offline
   - Sync employee changes when online
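
Offline PIN validation might look like the following sketch (TypeScript,
assuming the bcrypt npm package; the cached record shape and lockout values
are illustrative):

  import bcrypt from 'bcrypt';

  interface CachedEmployee {
    id: string;
    pinHash: string;        // bcrypt hash synced from the Central API
    failedAttempts: number;
    lockedUntil?: number;   // epoch milliseconds
  }

  const MAX_ATTEMPTS = 3;              // "3 failures = lockout" per the mitigations below
  const LOCKOUT_MS = 5 * 60 * 1000;

  async function verifyPinOffline(employee: CachedEmployee, pin: string): Promise<boolean> {
    if (employee.lockedUntil && Date.now() < employee.lockedUntil) return false;

    const ok = await bcrypt.compare(pin, employee.pinHash);
    if (ok) {
      employee.failedAttempts = 0;
      return true;
    }
    employee.failedAttempts += 1;
    if (employee.failedAttempts >= MAX_ATTEMPTS) {
      employee.lockedUntil = Date.now() + LOCKOUT_MS;
    }
    return false;
  }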

CONSEQUENCES
------------
Positive:
  + Secure API access with industry-standard JWT
  + Fast cashier workflow with PIN
  + Manager oversight on sensitive operations
  + Works offline for POS operations
  + Clear audit trail (who did what)

Negative:
  - Two authentication systems to maintain
  - PIN is less secure than password (brute force)
  - Local PIN cache could be extracted
  - Token refresh complexity

Risks:
  - PIN guessing attacks
  - Stolen JWT tokens
  - Stale employee cache (terminated employee)

Mitigations:
  - Rate limiting on PIN attempts (3 failures = lockout)
  - Short JWT expiry (15 minutes)
  - Aggressive employee sync (every 5 minutes)
  - PIN attempt logging and alerting
  - Secure local storage encryption

ADR-005: PostgreSQL as Primary Database

+==================================================================+
|  ADR-005: PostgreSQL as Primary Database                         |
+==================================================================+
|  Status: ACCEPTED                                                 |
|  Date: 2025-12-29                                                |
|  Deciders: Architecture Team                                      |
+==================================================================+

CONTEXT
-------
We need a database that supports:

1. Row-Level Security multi-tenancy
2. JSONB for flexible event storage
3. Strong ACID guarantees for financial data
4. Good performance at scale
5. Mature ecosystem and tooling

Options considered:
- PostgreSQL: Schema support, JSONB, mature
- MySQL: Popular, but weaker schema support
- SQL Server: Good, but licensing costs
- MongoDB: Document store; weaker fit for relational, strongly consistent financial data
- CockroachDB: Distributed, but complexity

DECISION
--------
We will use POSTGRESQL 16 as the primary database.

Justifications:
1. Native Row-Level Security (RLS) for multi-tenancy isolation
   (Originally: schema support; updated per ADR-001 supersession)
2. Excellent JSONB for event storage
3. Strong ACID for financial transactions
4. Proven at scale (Instagram, Uber, etc.)
5. Rich extension ecosystem (PostGIS, etc.)
6. Open source, no licensing costs
7. Excellent tooling (pgAdmin, pg_dump)

CONSEQUENCES
------------
Positive:
  + Native RLS for multi-tenant data isolation (see ADR-001 supersession)
  + JSONB enables flexible event data
  + Strong consistency guarantees
  + Mature, well-documented
  + No licensing costs
  + Excellent community support

Negative:
  - Single point of failure without replication
  - Requires PostgreSQL expertise
  - Not as horizontally scalable as NoSQL
  - Schema migrations need coordination

Mitigations:
  - Streaming replication for HA
  - Regular backups with pg_dump
  - Team training on PostgreSQL
  - Migration automation tooling

ADR-006: Node.js + TypeScript for Central API

Status: Accepted
Date: 2026-02-28
Decision Makers: Architecture Review Team
Context: The Central API needs a backend framework that supports high-performance I/O, strong typing, real-time features, and alignment with the frontend TypeScript ecosystem.

Context

The Central API is the backbone of the POS platform — serving REST endpoints for sales, inventory, customers, reporting, admin/setup, and integrations. It must support real-time inventory broadcasts to connected POS terminals, type-safe database access with automatic migrations, and Docker-based deployment on commodity hardware.

With the frontend stack standardized on React/TypeScript (Nexus POS via Tauri, Nexus Admin via web, Nexus Raptag via React Native), selecting a TypeScript-based backend enables a unified language across the entire platform. Shared types, validation schemas (Zod), and API contracts can be published as npm packages consumed by all clients.
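
As an illustration, a shared schema might live in a published contracts
package along these lines (the package contents and field names are
hypothetical):

  import { z } from 'zod';

  export const SaleLineItemSchema = z.object({
    productId: z.string().uuid(),
    quantity: z.number().int().positive(),
    unitPrice: z.number().nonnegative(),
  });

  export const CreateSaleSchema = z.object({
    locationId: z.string().uuid(),
    lineItems: z.array(SaleLineItemSchema).min(1),
  });

  // The same type is inferred on the API (request validation) and in the React
  // clients (form typing), so DTOs are never duplicated by hand.
  export type CreateSaleRequest = z.infer<typeof CreateSaleSchema>;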

Decision

We will use Node.js + Express/Fastify + TypeScript for the Central API.

Considered Options

  1. ASP.NET Core (C#) — High performance, strong typing, EF Core, SignalR
  2. Node.js + Express/Fastify (TypeScript) — Unified TypeScript stack, Prisma ORM, Socket.io
  3. Go (Gin) — Raw performance, small binary, but no type sharing with frontend
  4. Python (FastAPI) — Excellent for ML, but weaker typing and slower I/O
  5. Java (Spring) — Enterprise-grade, but verbose and no frontend code sharing

Decision Outcome

Chosen: Node.js + Express/Fastify + TypeScript because it unifies the entire platform on a single language (TypeScript), enables shared types between API and all client applications via npm packages, provides excellent I/O performance for the database-heavy POS workload, and offers the largest package ecosystem (2M+ npm packages).

Team context: Full TypeScript expertise aligned with React (Nexus POS/Admin) and React Native (Nexus Raptag) frontends. No language context-switching between backend and frontend development.

Trade-offs

Pros:

  • Unified TypeScript across entire stack — API, Nexus POS (Tauri + React), Nexus Admin (React web), Nexus Raptag (React Native)
  • Prisma ORM for type-safe PostgreSQL queries with automatic migrations and introspection
  • Socket.io for real-time inventory broadcasts to connected POS terminals (replaces SignalR)
  • Massive npm ecosystem (2M+ packages) — battle-tested libraries for every integration need
  • Excellent Docker support — Alpine Node.js images with small footprint (~50MB)
  • Same language for frontend and backend eliminates context-switching and enables code sharing
  • Strong typing via TypeScript catches errors at compile time with strict mode enabled
  • Shared validation schemas (Zod) and API types published as npm packages

Cons:

  • Single-threaded event loop — CPU-bound tasks require worker threads (mitigated by worker_threads for report generation)
  • Less raw compute performance than Go/Rust/C# — acceptable for I/O-bound POS workloads (database queries, Redis lookups, HTTP calls)
  • Node.js ecosystem moves fast — dependency churn (mitigated by pinned versions and lock file, see ADR-014)

References

  • Ch 04: Architecture Styles, Section L.9A (System Architecture)
  • ADR-014: npm Package Versioning (Pinned Major.Minor with Lock File)
  • ADR-046: Nexus Dual Deployment Architecture

ADR Index

ADR | Title | Status | Date
ADR-001 | Shared Tables with Row-Level Security Multi-Tenancy | Superseded (corrected to Row-Level RLS, Ch 04 L.10A.4) | 2025-12-29
ADR-002 | Offline-First POS Architecture | Superseded (by ADR-048) | 2025-12-29
ADR-003 | Event Sourcing for Sales Domain | Accepted | 2025-12-29
ADR-004 | JWT + PIN Authentication | Accepted | 2025-12-29
ADR-005 | PostgreSQL as Primary Database | Accepted | 2025-12-29
ADR-006 | Node.js + TypeScript for Central API | Accepted | 2026-02-28
ADR-007 | Admin Portal Framework (Blazor Server) | Superseded (by ADR-046) | 2026-02-27
ADR-008 | POS Client Framework (Tauri 2.0 + React/TypeScript) | Accepted | 2026-02-28
ADR-009 | Redis for Session & Cache | Accepted | 2026-02-27
ADR-010 | Shopify Sync Strategy (Webhook + Polling) | Accepted | 2026-02-27
ADR-011 | Payment Gateway (SAQ-A Semi-Integrated) | Accepted | 2026-02-27
ADR-012 | Logging & Monitoring (LGTM Stack) | Accepted | 2026-02-27
ADR-013 | RFID Configuration in Tenant Admin | Superseded (by ADR-046) | 2026-01-01
ADR-014 | npm Package Versioning (Pinned Major.Minor with Lock File) | Accepted | 2026-02-28
ADR-015 | Offline Sync Strategy (Queue-and-Sync with CRDTs) | Superseded (by ADR-048) | 2026-02-27
ADR-016 | Error Code Structure (ERR-Mxxx Hierarchical) | Accepted | 2026-02-27
ADR-017 | Test Strategy (Layered Testing Pyramid) | Accepted | 2026-02-27
ADR-018 | Affirm BNPL Integration | Accepted | 2026-02-27
ADR-019 | SAQ-A Semi-Integrated Payment Scope | Accepted | 2026-02-27
ADR-020 | Split Tender Payment Support | Accepted | 2026-02-27
ADR-021 | Layaway Payment Plans | Accepted | 2026-02-27
ADR-022 | Tax-Inclusive Display with Compound Calculation | Accepted | 2026-02-27
ADR-023 | Compound Tax (3-Level State/County/City) | Accepted | 2026-02-27
ADR-024 | Gift Card Compliance (State Escheatment) | Accepted | 2026-02-27
ADR-025 | 6-Status Inventory State Machine | Accepted | 2026-02-27
ADR-026 | Reservation-Based Inventory Hold Model | Accepted | 2026-02-27
ADR-027 | RFID Counting-Only Scope (No Lifecycle) | Accepted | 2026-02-27
ADR-028 | Physical Count Freeze Period | Accepted | 2026-02-27
ADR-029 | Adjustment Manager Approval (Universal) | Accepted | 2026-02-27
ADR-030 | Auto-Suggest Transfers Algorithm | Accepted | 2026-02-27
ADR-031 | Shopify Webhook + Polling Dual Sync | Accepted | 2026-02-27
ADR-032 | Strictest-Rule-Wins Cross-Platform Validation | Accepted | 2026-02-27
ADR-033 | Amazon SP-API Integration Strategy | Accepted | 2026-02-27
ADR-034 | Google Merchant Center Feed Strategy | Accepted | 2026-02-27
ADR-035 | Channel Safety Buffer Calculation | Accepted | 2026-02-27
ADR-036 | POS-Master Default for External Channels | Accepted | 2026-02-27
ADR-037 | Offline Conflict Resolution via CRDTs | Accepted | 2026-02-27
ADR-038 | Transactional Outbox for Event Publishing | Accepted | 2026-02-27
ADR-039 | CQRS Boundary (Sales Domain Only) | Accepted | 2026-02-27
ADR-040 | Eventual Consistency SLA (5s Online, 30min Offline) | Accepted | 2026-02-27
ADR-041 | 6-Gate Security Pyramid | Accepted | 2026-02-27
ADR-042 | E2E Testing Strategy | Removed (duplicate of ADR-017) | 2026-02-27
ADR-043 | LGTM Observability Stack | Removed (duplicate of ADR-012) | 2026-02-27
ADR-044 | API Performance Targets | Accepted | 2026-02-27
ADR-045 | Blue-Green Deployment Strategy | Accepted | 2026-02-27
ADR-046 | Nexus Dual Deployment Architecture | Accepted | 2026-02-28
ADR-047 | Raptag Mobile Framework (React Native) | Accepted | 2026-02-28
ADR-048 | Online-First POS Data Strategy | Accepted | 2026-03-01

ADR-013: RFID Configuration Embedded in Tenant Admin Portal

Superseded: The “Admin Portal” concept has been eliminated. RFID configuration is now accessed via Nexus Admin web app > Settings > RFID section. The decision to embed RFID in the main application (rather than a separate portal) remains valid — only the product surface name has changed. See ADR-046.

+==================================================================+
|  ADR-013: RFID Configuration Embedded in Tenant Admin Portal     |
+==================================================================+
|  Status: SUPERSEDED (by ADR-046: Nexus Dual Deployment           |
|          Architecture)                                            |
|  Date: 2026-01-01                                                |
|  Deciders: Architecture Team                                      |
+==================================================================+

CONTEXT
-------
RapOS includes RFID inventory capabilities via the Raptag mobile app.
The question arose: where should RFID configuration (device management,
printer setup, tag encoding settings, templates) be managed?

We evaluated three options:

  Option A: Embed in Tenant Admin Portal (app.rapos.com)
  - RFID settings as feature-flagged section in existing portal
  - Uses existing authentication, permissions, navigation
  - Shared context with products, locations, users

  Option B: Separate RFID Portal (rfid.rapos.com)
  - Dedicated portal just for RFID configuration
  - 4th portal in the architecture
  - Independent scaling and development

  Option C: Hybrid Approach
  - Basic settings in Tenant Admin
  - Advanced configuration in separate portal
  - Users navigate between portals

Research was conducted on major RFID vendors:
- SML Clarity: Single platform, modular components
- Checkpoint HALO/ItemOptix: Unified SaaS platform
- Avery Dennison atma.io: Role-based dashboards in one platform
- Impinj ItemSense: Single Management Console

Key finding: NO major RFID vendor uses separate portals for RFID
configuration. All embed RFID features within unified platforms.

DECISION
--------
We will EMBED RFID configuration in the Tenant Admin Portal (Option A).

Implementation:
- Settings > RFID section (feature-flagged)
- Devices tab: Claim codes, device list, release
- Printers tab: IP configuration, test connectivity
- Tag Configuration tab: EPC prefix (read-only), variance thresholds
- Templates tab: Label template library

Mobile app downloads configuration from central API on startup.
No RFID configuration in the mobile app itself.

CONSEQUENCES
------------
Positive:
  + Matches industry pattern (SML, Checkpoint, Avery Dennison)
  + Single login/URL for all tenant management
  + Shared context with products, locations, users
  + Lower development cost (one portal, not two)
  + Progressive disclosure manages complexity
  + Same permissions system applies to RFID

Negative:
  - Could become bloated if RFID features grow significantly
  - Enterprise customers might want dedicated RFID admin
  - Feature flags add slight complexity

Risks:
  - Tenant Admin may feel "cluttered" with many features
  - RFID power users may want more dedicated experience

Mitigations:
  - Use progressive disclosure (collapse advanced settings)
  - Role-based visibility (hide RFID from non-RFID users)
  - Monitor feedback; re-evaluate if enterprise demand grows
  - Feature-flagged sections can be extracted later if needed

Re-evaluation Triggers:
  - Multiple enterprise customers (100+ stores) request separation
  - RFID feature count exceeds 20+ configuration screens
  - Evidence that RFID admins are different people than Tenant admins

ADR-007: Admin Portal Framework — Blazor Server

Superseded: This ADR documents the original C#/Blazor Server architecture that was rejected during the v6.1.0 tech stack pivot. The separate Admin Portal has been eliminated. Administration is now integrated into the Nexus web application — the same React/TypeScript codebase deployed as both a Tauri desktop app (Nexus POS) and a standard web app (Nexus Admin). The current architecture uses React/TypeScript with Tauri 2.0 for the desktop POS client (see ADR-046 Nexus Dual Deployment). This record is preserved for historical context.

Status: Superseded (by ADR-046)
Date: 2026-02-27
Decision Makers: Architecture Review Team
Context: The Admin Portal needs a frontend framework that integrates with the .NET backend and supports real-time features.

Context

The Admin Portal (app.rapos.com) is the central management interface for tenant administrators. It provides dashboards, product management, employee management, reporting, and configuration. The portal requires real-time data updates (inventory levels, sales dashboards, integration sync status) and must share authentication and authorization logic with the Central API.

The team already uses C# for the Central API (ASP.NET Core 8.0), the POS Client (.NET MAUI), and the Mobile App (.NET MAUI). Introducing a JavaScript-based frontend would require maintaining two toolchains, two build systems, and two sets of domain models with mapping layers.

Admin portals are inherently server-heavy workloads: data-dense tables, reporting dashboards, configuration forms, and audit logs. Unlike consumer-facing SPAs, admin portals benefit more from server-side rendering and direct database access than from client-side interactivity.

Decision

We will use Blazor Server for the Admin Portal.

Considered Options

  1. React SPA — JavaScript/TypeScript Single Page Application with REST/GraphQL API calls
  2. Angular SPA — TypeScript-based enterprise SPA framework
  3. Vue.js SPA — Progressive JavaScript framework
  4. Blazor Server — Server-side Razor components with SignalR real-time updates

Decision Outcome

Chosen: Blazor Server because it unifies the entire stack on C#/.NET, eliminates the need for a separate JavaScript build toolchain, provides built-in real-time updates via SignalR (already used for inventory broadcasts), and enables sharing of domain models, validation logic, and DTOs directly between the API and the portal.

Trade-offs

Pros:

  • Unified .NET stack — same language, same models, same tooling across API, Admin Portal, POS Client, and Mobile
  • Built-in real-time via SignalR — dashboard updates, inventory alerts, sync status without polling
  • No separate build toolchain — no Node.js, npm, webpack, or Vite required for the Admin Portal
  • Server-side rendering — thin client, no large JavaScript bundles, fast initial load
  • Shared Blazor components with POS Client (.NET MAUI Blazor Hybrid)
  • Full access to .NET ecosystem (FluentValidation, MediatR, EF Core) in UI logic
  • Simplified authentication — shares the same ASP.NET Core Identity/JWT infrastructure

Cons:

  • Requires persistent SignalR connection — higher server memory per concurrent user
  • Latency on every UI interaction (round-trip to server) — acceptable for admin workloads, not for consumer SPAs
  • Smaller UI component ecosystem compared to React (mitigated by MudBlazor, Radzen, Syncfusion)
  • Team must learn Razor component model if unfamiliar (low risk given existing C# expertise)

References

  • Ch 04: Architecture Styles, Section L.9A (System Architecture) (Admin Portal details — planned future rewrite)
  • ADR-006: Node.js + TypeScript for Central API

ADR-008: POS Client Framework — Tauri 2.0 + React/TypeScript

Note (v7.0.0): The Tauri 2.0 desktop wrapper has been replaced by a pure React web application (ADR-052). Hardware peripherals now use web protocols (Star WebPRNT for receipt printers, USB HID for barcode scanners, Stripe Terminal SDK for payment terminals). SQLite offline storage uses WASM (sql.js/wa-sqlite + OPFS) instead of native better-sqlite3. The React/TypeScript architecture and shared codebase principles from this ADR remain valid.

Status: Accepted (Tauri-specific parts superseded by ADR-052)
Date: 2026-02-28
Decision Makers: Architecture Review Team
Context: The POS Client runs on store terminals (Windows desktops/tablets), needs native hardware access (receipt printers ESC/POS, barcode scanners HID/serial, cash drawers RJ-11), offline-first local SQLite, and cross-platform desktop deployment.

Context

The POS Client (Nexus POS) runs on store terminals (Windows desktops/tablets) and must integrate with physical retail hardware: receipt printers (ESC/POS protocol), barcode scanners (HID/serial), cash drawers (RJ-11 trigger). It must operate fully offline with a local SQLite database and sync queued transactions when connectivity is restored.

With the tech stack pivot to TypeScript (ADR-006), the POS Client should use the same React/TypeScript codebase as the Nexus Admin web application. Tauri 2.0 enables wrapping a React web app as a native desktop application with Rust-powered backend commands for hardware access and performance-critical operations. The same React codebase is deployed as both a Tauri desktop app (Nexus POS) and a standard web app (Nexus Admin) — see ADR-046.

Decision

We will use Tauri 2.0 + React/TypeScript for the POS Client, with better-sqlite3 for local offline storage.

Considered Options

  1. Electron — Chromium-based desktop app with Node.js backend (rejected: 150MB+ bundle, Chromium overhead)
  2. Tauri 2.0 — Rust-based lightweight desktop app with web frontend (chosen)
  3. PWA (Progressive Web App) — Browser-based with service worker caching (rejected: no native hardware access)
  4. .NET MAUI Blazor Hybrid — Native .NET desktop app with embedded Blazor WebView (rejected: different language ecosystem from TypeScript stack)

Decision Outcome

Chosen: Tauri 2.0 + React/TypeScript because it provides full offline capability with local SQLite via better-sqlite3 (Tauri sidecar or native plugin), direct hardware access via Tauri Rust commands (receipt printer ESC/POS, barcode scanner, cash drawer), shares the same React codebase with Nexus Admin web app (dual deployment from single source), and produces a lightweight binary (~10MB vs Electron 150MB+).
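
On the React side, a hardware call is a thin wrapper around a Rust command. A
sketch, assuming the Tauri 2.0 JavaScript API; the print_receipt command name
is illustrative and would need a matching #[tauri::command] on the Rust side:

  import { invoke } from '@tauri-apps/api/core';

  interface ReceiptPayload {
    saleId: string;
    lines: string[];   // pre-rendered ESC/POS text lines
  }

  export async function printReceipt(payload: ReceiptPayload): Promise<void> {
    // In the web (Nexus Admin) build the Tauri API is absent, so hardware calls
    // are guarded behind a deployment check (see ADR-046).
    await invoke('print_receipt', { payload });
  }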

Trade-offs

Pros:

  • Full offline capability with local SQLite via better-sqlite3 (Tauri sidecar or native plugin)
  • Direct hardware access via Tauri Rust commands (receipt printer ESC/POS, barcode scanner, cash drawer)
  • Same React codebase as Nexus Admin web app — dual deployment from single source (ADR-046)
  • Lightweight binary (~10MB vs Electron 150MB+) — important for store terminal hardware
  • No bundled Chromium — uses system WebView2 (Windows) reducing memory footprint
  • Rust backend for performance-critical paths (encryption, local DB operations, sync)
  • TypeScript shared types with Central API via npm packages
  • Single design system (TailwindCSS + shadcn/ui) across Nexus POS and Nexus Admin

Cons:

  • Tauri 2.0 is newer than Electron — smaller community, fewer third-party plugins (growing rapidly)
  • Rust commands require Rust expertise for hardware integration layer (contained scope)
  • WebView2 dependency on Windows (auto-installed on Windows 10 21H2+ and Windows 11)
  • Some rendering differences between WebView2 and Chrome (mitigated by consistent React component library)

References

  • Ch 04: Architecture Styles, Section L.10A.1 (Offline Strategy) (POS Client details — planned future rewrite)
  • ADR-002: Offline-First POS Architecture
  • ADR-046: Nexus Dual Deployment Architecture

ADR-009: Redis for Session & Cache

Status: Accepted
Date: 2026-02-27
Decision Makers: Architecture Review Team
Context: The platform needs distributed session management and caching for a horizontally scaled API layer.

Context

The Central API is deployed as multiple stateless instances behind a load balancer. User sessions (JWT refresh tokens, active cart state for the Nexus Admin) and frequently accessed data (product catalog, tax rates, tenant configuration) must be available to any API instance. In-memory caching per-instance leads to inconsistency when requests are load-balanced across instances.

Additionally, Module 6 (Integrations) requires real-time pub/sub for broadcasting inventory updates to connected POS terminals via Socket.io, and caching safety buffer computations to avoid recalculating on every channel sync.

Decision

We will use Redis 7.x for distributed session management, cache-aside pattern, and pub/sub real-time notifications.

Considered Options

  1. In-memory per-instance — Each API instance maintains its own cache
  2. Memcached — Simple distributed key-value cache
  3. PostgreSQL-based sessions — Store sessions in the primary database
  4. Redis 7.x — Distributed cache, session store, and pub/sub

Decision Outcome

Chosen: Redis 7.x because it supports all three use cases (session, cache, pub/sub) in a single infrastructure component, has excellent Node.js integration via ioredis, and provides sub-millisecond read latency for product lookups during checkout.
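
A sketch of the cache-aside read path with ioredis (the key naming, TTL, and
database loader are illustrative):

  import Redis from 'ioredis';

  const redis = new Redis(process.env.REDIS_URL ?? 'redis://localhost:6379');

  // Stand-in for the real Prisma query (hypothetical).
  async function loadProductFromDb(tenantId: string, productId: string): Promise<unknown> {
    return { id: productId, tenantId, name: 'placeholder' };
  }

  export async function getProduct(tenantId: string, productId: string): Promise<unknown> {
    const key = `tenant:${tenantId}:product:${productId}`;

    const cached = await redis.get(key);
    if (cached) return JSON.parse(cached);                          // cache hit

    const product = await loadProductFromDb(tenantId, productId);   // cache miss
    await redis.set(key, JSON.stringify(product), 'EX', 300);       // 5-minute TTL
    return product;
  }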

Trade-offs

Pros:

  • Distributed session — any API instance can serve any user without sticky sessions
  • Cache-aside pattern — product catalog, tax rates, and tenant config cached with configurable TTL
  • Pub/sub for real-time — inventory update broadcasts to Socket.io rooms without polling PostgreSQL
  • Sub-millisecond read latency — critical for checkout performance (NFR-PERF-001: < 500ms p99)
  • Built-in data structures (sorted sets for leaderboards, streams for event buffering)
  • Proven at scale — used by GitHub, Twitter, Stack Overflow

Cons:

  • Additional infrastructure component to deploy and monitor
  • Data loss on restart if not using AOF persistence (mitigated by AOF + RDB snapshots)
  • Memory-bound — cost increases with data volume (mitigated by TTL eviction policies)
  • Single-threaded command processing — throughput limited per instance (mitigated by Redis Cluster for scale)

References

  • Chapter 04: Architecture Styles, Section L.9A (Data Layer)
  • Chapter 09: Indexes & Performance

ADR-010: Shopify Sync Strategy — Webhook + Polling Hybrid

Status: Accepted
Date: 2026-02-27
Decision Makers: Architecture Review Team
Context: Shopify integration requires real-time inventory sync with fallback for missed webhooks.

Context

The POS platform syncs product catalog and inventory levels bidirectionally with Shopify. Shopify provides webhooks for real-time notifications (products/update, inventory_levels/update, orders/create) but webhooks can be missed due to network issues, Shopify outages, or endpoint failures. The platform must guarantee eventual consistency between POS inventory and Shopify inventory.

BRD v18.0 Module 6 (Section 6.3) defines Shopify as the primary e-commerce integration with OAuth 2.0/PKCE authentication, GraphQL Admin API at 50 points/second rate limiting, and mandatory @idempotent mutations (required 2026-04).

Decision

We will use a Webhook + Polling hybrid strategy for Shopify synchronization.

Considered Options

  1. Pure Webhook — Rely solely on Shopify webhooks for all sync
  2. Pure Polling — Poll Shopify API on intervals for all changes
  3. Shopify Flow — Use Shopify’s built-in automation workflows
  4. Webhook + Polling hybrid — Webhooks for real-time, polling as fallback

Decision Outcome

Chosen: Webhook + Polling hybrid because webhooks provide near-real-time sync (< 5 seconds processing) for the common case, while scheduled polling (every 15 minutes) catches any missed webhooks and ensures eventual consistency. Both paths use idempotent processing with the same event pipeline.
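
Webhook authenticity is checked before any processing. A sketch using Node's
crypto module (Shopify signs the raw request body and sends the base64 HMAC in
the X-Shopify-Hmac-Sha256 header):

  import { createHmac, timingSafeEqual } from 'node:crypto';

  export function isValidShopifyWebhook(rawBody: Buffer, hmacHeader: string, sharedSecret: string): boolean {
    const expected = createHmac('sha256', sharedSecret).update(rawBody).digest('base64');
    const a = Buffer.from(expected);
    const b = Buffer.from(hmacHeader);
    // Constant-time comparison; reject immediately on length mismatch.
    return a.length === b.length && timingSafeEqual(a, b);
  }

Verified webhooks and polled deltas are then handed to the same idempotent
event handler, so a change observed through both paths is applied only once.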

Trade-offs

Pros:

  • Near-real-time sync via webhooks (< 5 seconds processing per NFR-INTG-001)
  • Guaranteed eventual consistency via polling fallback
  • Idempotent processing — same handler for webhook and polling events (no double-counting)
  • Resilient to webhook delivery failures (Shopify retries for 48 hours, polling catches the rest)
  • Rate-limit-aware polling with adaptive backoff

Cons:

  • More complex than pure polling (webhook endpoint, signature verification, retry handling)
  • Polling adds API calls that count against Shopify rate limits (mitigated by delta queries with updated_at_min)
  • Must handle duplicate events from both webhook and poll (mitigated by idempotency framework with 24-hour dedup window)

References

  • Ch 04: Architecture Styles, Section L.4B (Integration Architecture) (Integration patterns — see also Ch 05 Module 6)
  • BRD v20.0 Section 6.3 (Shopify Integration)

ADR-011: Payment Gateway — SAQ-A Semi-Integrated

Status: Accepted
Date: 2026-02-27
Decision Makers: Architecture Review Team, Security Team
Context: The platform must process card payments with minimal PCI compliance scope.

Context

POS terminals must accept card payments (chip, tap, swipe) in physical stores. PCI-DSS compliance is mandatory, but the level of compliance effort varies dramatically based on how card data is handled. Full integration (SAQ-D) requires 300+ controls; semi-integrated (SAQ-A) requires ~30 controls because card data never touches our system.

The platform must support multiple payment providers to avoid vendor lock-in and enable tenant choice. The offline capability requires that payment tokens (not card data) can be stored locally for void/refund operations.

Decision

We will use SAQ-A Semi-Integrated terminals with Stripe Terminal and Square Terminal as supported providers.

Considered Options

  1. Full Integration (SAQ-D) — Card data flows through our system, encrypted and tokenized
  2. Semi-Integrated (SAQ-A) — Card data handled entirely by terminal/processor, we receive tokens only
  3. Redirect-only — Customer redirected to payment page (not applicable for in-store POS)
  4. Hosted Fields — Embedded payment form from provider (web-only, not applicable for desktop POS)

Decision Outcome

Chosen: SAQ-A Semi-Integrated because no card data (PAN, CVV, track data, PIN block) ever touches our system. The POS Client sends a payment request to the terminal SDK, the terminal communicates directly with the payment processor, and we receive only a token, approval code, and masked card number (****1234). This reduces PCI scope from 300+ controls to ~30.
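
The persisted payment record is therefore limited to non-sensitive fields. An
illustrative shape (TypeScript; field names are not final):

  interface StoredPaymentResult {
    provider: 'stripe_terminal' | 'square_terminal';
    paymentToken: string;    // opaque token used later for void/refund
    approvalCode: string;
    maskedCard: string;      // e.g. "****1234"
    cardBrand: string;       // e.g. "visa"
    amountCents: number;
    capturedAt: string;      // ISO timestamp
  }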

Trade-offs

Pros:

  • Minimal PCI scope (SAQ-A: ~30 controls vs. SAQ-D: 300+ controls)
  • No card data storage, transmission, or processing in our system
  • Multi-provider support — Stripe Terminal and Square Terminal via provider abstraction
  • Token-based void/refund — works offline using stored payment tokens
  • Terminal firmware managed by provider (no EMV kernel maintenance)

Cons:

  • Dependent on terminal hardware availability and provider SDK updates
  • Terminal communication adds latency (~1-3 seconds for chip transactions)
  • Limited control over payment UX (terminal screen is provider-controlled)
  • Two provider SDKs to maintain (Stripe Terminal SDK, Square Terminal SDK)

References

  • Ch 04: Architecture Styles, Section L.10A.3 (Payment Integration) (Security & Auth details — planned future rewrite)
  • Ch 04: Architecture Styles, Section L.8 (Security — 6-Gate Pyramid)
  • BRD v20.0 Section 1.18 (Payments)

ADR-012: Logging & Monitoring — LGTM Stack

Status: Accepted
Date: 2026-02-27
Decision Makers: Architecture Review Team, Infrastructure Team
Context: The platform needs unified observability across API, POS clients, integrations, and infrastructure.

Context

The POS platform has multiple observable surfaces: the Central API (multiple instances), POS terminals in stores (offline-capable), external integrations (Shopify, Amazon, Google Merchant), and infrastructure (PostgreSQL, Redis, Kafka v2.0). Operators need logs, metrics, and distributed traces to diagnose issues like “why did this sale fail to sync?” or “why is the Shopify circuit breaker open?”

Cloud-native SaaS solutions (Datadog, New Relic) offer convenience but at significant cost ($15-25/host/month) and with vendor lock-in. The platform uses OpenTelemetry for instrumentation, which enables backend-agnostic telemetry collection.

Decision

We will use the LGTM Stack (Loki, Grafana, Tempo, Mimir/Prometheus) for observability.

Considered Options

  1. ELK Stack (Elasticsearch, Logstash, Kibana) — Established log aggregation platform
  2. Datadog — Cloud-native SaaS observability platform
  3. Cloud-native (CloudWatch, Azure Monitor) — Cloud provider native tools
  4. LGTM Stack (Loki, Grafana, Tempo, Prometheus) — Open-source observability platform

Decision Outcome

Chosen: LGTM Stack because it is fully open-source (no per-host licensing), self-hosted (data stays on our infrastructure for PCI compliance), and designed for the OpenTelemetry ecosystem. Grafana provides unified dashboards for logs (Loki), traces (Tempo), and metrics (Prometheus), with native Node.js auto-instrumentation via @opentelemetry/sdk-node.
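
Instrumentation bootstrap is a small amount of code. A sketch, assuming the
OpenTelemetry Node SDK packages named above and an OTLP endpoint exposed by a
local collector (the URL is illustrative):

  import { NodeSDK } from '@opentelemetry/sdk-node';
  import { getNodeAutoInstrumentations } from '@opentelemetry/auto-instrumentations-node';
  import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';

  const sdk = new NodeSDK({
    serviceName: 'central-api',
    traceExporter: new OTLPTraceExporter({ url: 'http://localhost:4318/v1/traces' }),
    instrumentations: [getNodeAutoInstrumentations()],   // HTTP, Express/Fastify, ioredis, pg, ...
  });

  // Start before the HTTP server so incoming requests are traced from the first byte.
  sdk.start();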

Trade-offs

Pros:

  • Open-source — no per-host licensing costs, no vendor lock-in
  • Self-hosted — data stays on infrastructure (PCI compliance for audit logs)
  • Unified dashboards in Grafana — logs, metrics, and traces correlated by trace ID
  • Loki uses label-based indexing (not full-text) — lower storage costs than Elasticsearch
  • Native OpenTelemetry support — Node.js auto-instrumentation for Express/Fastify, Prisma, HTTP client
  • Integration-specific dashboards: circuit breaker state, DLQ depth, sync latency, safety buffer violations

Cons:

  • Operational overhead — must manage Loki, Tempo, Prometheus, Grafana infrastructure
  • Less feature-rich than Datadog for APM (no automatic service maps, no AI anomaly detection)
  • Grafana alerting is functional but less sophisticated than PagerDuty/OpsGenie (mitigated by Alertmanager integration)
  • Storage management required for long-term log/metric retention

References

  • Ch 04: Architecture Styles, Section L.7 (Observability) (Monitoring details — planned future rewrite)
  • Ch 04: Architecture Styles, Section L.8 (Security — FIM via Wazuh)

ADR-014: npm Package Versioning — Pinned Major.Minor with Lock File

Status: Accepted
Date: 2026-02-28
Decision Makers: Architecture Review Team
Context: The platform depends on critical npm packages that must be version-controlled for build reproducibility and security.

Context

The POS platform uses multiple npm packages for core functionality: Prisma for type-safe PostgreSQL access, ioredis for caching, Socket.io for real-time broadcasts, Zod for schema validation, pino for structured logging, jose for JWT operations, and argon2 for password hashing. The frontend uses React, TailwindCSS, shadcn/ui, React Query, and Zustand.

Floating version ranges (*, latest) can introduce breaking changes in CI/CD. Exact pinning (4.18.0) prevents security patches. A balanced approach is needed. The monorepo uses pnpm as the package manager with a committed lock file.

Decision

We will use pinned major.minor in package.json (e.g., "express": "^4.18") with pnpm-lock.yaml committed for full reproducibility. Dependabot/Renovate automates PR-based updates.

Considered Options

  1. Floating versions (*) — Always use latest available
  2. Exact pinning (4.18.2) — Lock to specific patch version
  3. Caret ranges (^4.18.0) — Allow minor + patch updates
  4. Pinned major.minor with lock file (^4.18 + committed pnpm-lock.yaml)

Decision Outcome

Chosen: Pinned major.minor with lock file because it ensures build reproducibility via the committed pnpm-lock.yaml (identical installs across developer machines and CI/CD) while allowing patch-level security fixes. Dependabot/Renovate creates PRs for major/minor bumps with changelog review.

Key Package Versions:

Package | Pinned Version | Purpose
express or fastify | ^4.18 / ^5.0 | HTTP framework
@prisma/client | ^5.x | PostgreSQL ORM (type-safe)
ioredis | ^5.x | Redis client
socket.io | ^4.x | Real-time WebSocket
zod | ^3.x | Schema validation
pino | ^8.x | Structured logging
@opentelemetry/sdk-node | ^1.x | Observability instrumentation
jose | ^5.x | JWT signing/verification (RS256)
argon2 | ^0.x | Password hashing (Argon2id)
better-sqlite3 | ^11.x | SQLite for Tauri POS local DB
kafkajs | ^2.x | Kafka client (v2.0 future)

Trade-offs

Pros:

  • Build reproducibility — pnpm-lock.yaml ensures identical dependency trees across all environments
  • Automatic security patches — patch versions flow through automatically
  • Consistent across developer machines and CI/CD
  • Dependabot/Renovate creates PRs for major/minor bumps with changelog review
  • pnpm strict mode prevents phantom dependencies

Cons:

  • Patch-level changes could theoretically introduce bugs (extremely rare, mitigated by CI test suite)
  • Requires manual intervention for major/minor upgrades (by design — these are reviewed)
  • Lock file must be committed and kept up to date (pnpm-lock.yaml)
  • npm ecosystem has higher dependency churn than .NET (mitigated by lock file + Renovate)

References

  • Ch 04: Architecture Styles, Section L.8 (SCA — Snyk/OWASP) (Dev Environment details — planned future rewrite)
  • PCI-DSS 4.0 Req 6.3.2 (SBOM generation)

ADR-015: Offline Sync Strategy — Queue-and-Sync with CRDTs

SUPERSEDED: This ADR has been superseded by ADR-048 (Online-First with Offline Fallback). CRDTs were eliminated in v6.2.0. This record is preserved for historical context.

Status: Superseded (by ADR-048)
Date: 2026-02-27
Decision Makers: Architecture Review Team
Context: POS terminals operating offline must sync transactions and inventory changes without data loss or conflicts.

Context

ADR-002 established offline-first as a core requirement. This ADR specifies the sync mechanism. When POS terminals are offline, sales, payments, and inventory changes accumulate locally. When connectivity is restored, these changes must be pushed to the Central API and merged with changes from other terminals and the Nexus Admin.

The key challenge is conflict resolution: two terminals may sell the last unit of a product simultaneously, or an admin may update a price while a terminal is offline. The sync strategy must handle these cases deterministically without data loss.

Decision

We will use Queue-and-Sync with CRDTs for offline synchronization.

Considered Options

  1. Sync-on-connect — Full database sync when connectivity is restored
  2. Optimistic sync — Push local changes, accept server response as authority
  3. Operational Transforms (OT) — Transform operations based on concurrent changes
  4. Queue-and-Sync with CRDTs — Priority-based sync queue with CRDT merge for conflict-free data types

Decision Outcome

Chosen: Queue-and-Sync with CRDTs because it combines append-only event queuing (sales are conflict-free by nature) with CRDT data structures for data types that need merge (inventory counters, price updates, cart items). Priority-based queuing ensures critical data (sales, payments) syncs before less critical data (customer updates, analytics).

Sync Priority Tiers:

Priority | Event Types | Sync Timing
1 (Critical) | Sales, Payments, Refunds, Voids | Immediate when online
2 (Important) | Inventory adjustments, Transfers | Within 5 minutes
3 (Normal) | Customer updates, Loyalty changes | Within 15 minutes
4 (Low) | Analytics events, Logs | Batch sync hourly

CRDT Usage:

CRDT Type | Use Case | Merge Strategy
PN-Counter | Inventory levels (+/-) | Sum increments, sum decrements
LWW-Register | Price updates, last modified | Highest timestamp wins
OR-Set | Cart items, applied discounts | Union with tombstones
G-Counter | Transaction counts, sales counts | Sum all increments
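
To make the PN-Counter concrete, a sketch of its value and merge functions as
they applied in this superseded design (TypeScript; terminal ids and the
example are illustrative):

  interface PNCounter {
    increments: Record<string, number>;   // terminalId -> total units added
    decrements: Record<string, number>;   // terminalId -> total units sold/removed
  }

  const sum = (m: Record<string, number>) => Object.values(m).reduce((a, b) => a + b, 0);

  function value(c: PNCounter): number {
    return sum(c.increments) - sum(c.decrements);
  }

  // Merge is an element-wise max per terminal (each terminal only grows its own
  // entry), so replicas converge regardless of sync order and duplicate syncs
  // are harmless.
  function merge(a: PNCounter, b: PNCounter): PNCounter {
    const mergeMap = (x: Record<string, number>, y: Record<string, number>) => {
      const out: Record<string, number> = { ...x };
      for (const [k, v] of Object.entries(y)) out[k] = Math.max(out[k] ?? 0, v);
      return out;
    };
    return {
      increments: mergeMap(a.increments, b.increments),
      decrements: mergeMap(a.decrements, b.decrements),
    };
  }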

Trade-offs

Pros:

  • Sales never conflict — append-only events with unique IDs
  • Inventory converges automatically — PN-Counter CRDTs are mathematically guaranteed to converge
  • Priority-based sync — critical financial data syncs before convenience data
  • Parked sales support — up to 5 parked sales per terminal with 4-hour TTL
  • Queue limit (100 transactions) prevents unbounded offline operation

Cons:

  • CRDT implementation adds complexity to the sync layer
  • PN-Counters can temporarily show incorrect inventory (converges after sync)
  • Tombstone management for OR-Sets requires periodic compaction (7-day TTL)
  • Some operations blocked offline (customer create, gift card activation) to prevent inconsistencies

References

  • Chapter 04: Architecture Styles, Section L.10A.1 (Online-First with Offline Fallback)
  • ADR-002: Offline-First POS Architecture (superseded)
  • ADR-003: Event Sourcing for Sales Domain
  • ADR-048: Online-First POS Data Strategy (supersedes this ADR)

ADR-016: Error Code Structure — ERR-Mxxx Hierarchical

Status: Accepted
Date: 2026-02-27
Decision Makers: Architecture Review Team
Context: The platform needs a structured error code system for consistent error handling across 7 modules.

Context

The POS platform has 7 BRD modules, each generating different types of errors. Without a structured error code system, error handling degrades to HTTP status codes and free-form messages, making it difficult for POS Client developers, integration partners, and support teams to programmatically handle specific error conditions.

BRD v20.0 already defines module-specific error codes (ERR-5xxx for Module 5, ERR-6xxx for Module 6). This ADR formalizes the structure across all modules.

Decision

We will use a hierarchical ERR-Mxxx error code structure where M identifies the module (1-6) and xxx identifies the specific error within that module.

Considered Options

  1. HTTP-only — Rely solely on HTTP status codes (400, 404, 409, 500)
  2. Free-form strings — Arbitrary error codes like “SALE_NOT_FOUND”, “INVENTORY_INSUFFICIENT”
  3. Exception-based — Let exception types define error categories
  4. ERR-Mxxx hierarchical — Structured numeric codes with module prefix

Decision Outcome

Chosen: ERR-Mxxx hierarchical because it provides predictable, documented, machine-parseable error codes that map directly to BRD module boundaries. POS Client developers can switch on error code ranges, and support teams can triage by module.

Error Code Ranges:

Range | Module | Examples
ERR-1xxx | Module 1: Sales | ERR-1001 (sale not found), ERR-1010 (void window expired)
ERR-2xxx | Module 2: Inventory | ERR-2001 (insufficient stock), ERR-2010 (transfer rejected)
ERR-3xxx | Module 3: Customers | ERR-3001 (duplicate email), ERR-3010 (loyalty balance insufficient)
ERR-4xxx | Module 4: Reporting | ERR-4001 (date range too large), ERR-4010 (export limit exceeded)
ERR-5xxx | Module 5: Admin/Setup | ERR-5071 (register IP change limit), ERR-5072 (register retire requires OWNER)
ERR-6xxx | Module 6: Integrations | ERR-6001 (provider auth failed), ERR-6010 (circuit breaker open)
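
On the client, errors can be handled by range. A sketch of the error envelope
and a range-based switch (the response shape is illustrative):

  interface ApiError {
    code: string;       // e.g. "ERR-2001"
    message: string;    // human-readable; localized client-side using the code
    details?: Record<string, unknown>;
  }

  function moduleOf(error: ApiError): number {
    return Number(error.code.slice(4, 5));   // "ERR-2001" -> 2 (Module 2: Inventory)
  }

  function handleApiError(error: ApiError): void {
    switch (moduleOf(error)) {
      case 1: /* sales: offer retry or void guidance */ break;
      case 2: /* inventory: show stock warning */ break;
      case 6: /* integrations: surface in the sync status panel */ break;
      default: /* generic toast with code + message */ break;
    }
  }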

Trade-offs

Pros:

  • Predictable structure — POS Client can switch on error range (1xxx = sales, 2xxx = inventory)
  • Machine-parseable — error codes are numeric, not free-form strings
  • Aligned with BRD module boundaries — easy to trace errors to requirements
  • Supports i18n — error codes mapped to localized messages on the client
  • Documented in API reference (Appendix A — planned future rewrite) — developers know all possible errors per endpoint

Cons:

  • Requires maintaining error code registry (mitigated by code generation from registry file)
  • Must avoid error code conflicts as modules grow (mitigated by 1000-code range per module)
  • Error codes are less self-descriptive than string codes (mitigated by including message field in error response)

References

  • Ch 05: Architecture Components (BRD v20.0 Sections 5.x and 6.x — error code definitions) (API Design chapter — planned future rewrite)

ADR-017: Test Strategy — Layered Testing Pyramid

Status: Accepted
Date: 2026-02-27
Decision Makers: Architecture Review Team, QA Team
Context: The platform needs a testing strategy that balances coverage, speed, and confidence for a multi-tenant POS system.

Context

The POS platform processes financial transactions, manages inventory across multiple locations, and integrates with 6 external provider families. Testing must verify correctness at multiple levels: domain logic (tax calculation, commission reversal), API contracts (multi-tenant isolation, error codes), integration behavior (Shopify webhooks, payment terminals), and end-to-end workflows (offline sale → sync → inventory update).

BRD v18.0 defines 36 user stories with Gherkin acceptance criteria. Three platform sandboxes (Shopify Dev Store, Amazon SP-API Sandbox, Google Merchant test account) must be exercised in CI/CD.

Decision

We will use a Layered Testing Pyramid with specific tool choices per layer.

Considered Options

  1. Flat testing — Equal effort at all levels, no pyramid structure
  2. E2E-heavy — Focus on end-to-end tests with minimal unit tests
  3. Property-based — Use property-based testing (QuickCheck/FsCheck) as primary strategy
  4. Layered Testing Pyramid — Traditional pyramid: many unit, fewer integration, fewest E2E

Decision Outcome

Chosen: Layered Testing Pyramid because it provides fast feedback at the bottom (unit tests in < 5 seconds), confidence in the middle (integration tests with real PostgreSQL via Testcontainers-node), and end-to-end validation at the top (Playwright for browser automation). This matches the team’s TypeScript expertise and CI/CD pipeline constraints.

Testing Pyramid:

Layer | Tool | Coverage Target | Speed | Scope
Unit | Vitest | 80% | < 5 sec | Domain logic, validators, calculators
Integration | Testcontainers-node + Vitest | 15% | < 2 min | API endpoints, DB queries, Redis, RLS
E2E | Playwright | 5% | < 10 min | Full workflows: login → sale → receipt
Load | k6 | N/A | 30 min | Black Friday simulation: 500 concurrent, 1000 TPS
Contract | Pact | N/A | < 1 min | Shopify/Amazon/Google sandbox API contracts
Security | 6-Gate Pyramid | N/A | < 5 min | SAST, SCA, Secrets, ArchUnit, Pact, Manual
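
As an illustration, an integration-layer test for tenant isolation might look
like this (Vitest; createApiClient and seedTenant are hypothetical test
utilities backed by a Testcontainers-managed PostgreSQL instance):

  import { describe, expect, it } from 'vitest';
  import { createApiClient, seedTenant } from './test-utils';

  describe('multi-tenant isolation', () => {
    it('does not expose tenant A products to tenant B', async () => {
      await seedTenant({ products: [{ sku: 'A-001' }] });       // tenant A
      const tenantB = await seedTenant({ products: [] });

      const clientB = createApiClient(tenantB);
      const res = await clientB.get('/api/products');

      expect(res.status).toBe(200);
      expect(res.body).toHaveLength(0);   // RLS filters out tenant A's rows
    });
  });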

Trade-offs

Pros:

  • Fast feedback — Vitest unit tests run in seconds with native TypeScript support, catching regressions immediately
  • Real database testing — Testcontainers-node spins up PostgreSQL 16 with RLS for integration tests
  • Multi-tenant isolation verified — integration tests confirm tenant_id RLS policies prevent cross-tenant access
  • Contract testing with external platforms — Pact verifies Shopify/Amazon/Google API contracts
  • Load testing prevents performance regressions — k6 validates NFR-PERF-001 (< 500ms p99 checkout)

Cons:

  • Testcontainers-node requires Docker in CI/CD (standard in modern CI)
  • Playwright E2E tests are slower and more brittle (mitigated by limiting to critical paths only)
  • Load testing requires dedicated environment (not run on every commit, only on release candidates)
  • Contract tests depend on external sandbox availability (mitigated by recorded responses as fallback)

References

  • Chapter 04: Architecture Styles, Section L.6 (QA & Testing)
  • Chapter 04: Architecture Styles, Section L.8 (6-Gate Security Pyramid)
  • (Dev Environment and Checklists chapters — planned future rewrite)

2.18 ADR-018: Affirm BNPL Integration

Status: Accepted
Date: 2026-02-27
Decision Makers: Architecture Review Team
Context: The platform needs a Buy Now Pay Later (BNPL) option for high-value retail transactions at the point of sale.

Context

Retail clothing transactions can reach $200-$500+, creating friction for customers who prefer installment payments. The POS must offer a third-party financing option that does not add PCI scope, integrates with the existing checkout flow, and pays the merchant in full immediately while the customer repays the financing provider directly.

BRD v20.0 Section 1.3 defines Third-Party Financing as a payment method alongside cash, card, gift card, on-account, and layaway. The financing flow must support both in-store QR code presentation and customer-device redirect.

Decision

We will integrate Affirm as the BNPL provider for in-store financing.

Considered Options

  1. Affirm — Established BNPL provider with in-store POS SDK, QR code flow, and merchant dashboard
  2. Klarna — Popular BNPL with strong e-commerce presence but limited in-store POS integration
  3. Afterpay/Clearpay — Fixed 4-installment model, limited flexibility for higher-value purchases
  4. In-house installment plans — Build custom financing directly in the POS system

Decision Outcome

Chosen: Affirm because it provides a well-documented in-store API, supports variable loan terms (3-36 months), pays the merchant the full amount immediately (the store receives 100% of the sale amount from Affirm), and the customer completes the entire application on their own device. No card data or financial data touches the POS system — only a charge_id, loan_id, and approval status are stored.

Trade-offs

Pros:

  • Full payment received from Affirm immediately — no credit risk for the merchant
  • No PCI scope increase — customer’s financial data handled entirely by Affirm
  • QR code flow integrates cleanly into existing POS checkout sequence
  • Affirm handles all underwriting, collections, and customer communication
  • Established retail brand (Peloton, Shopify, Walmart) provides customer trust

Cons:

  • Affirm charges merchant fees (typically 3-6% per transaction) reducing margin
  • Approval is not guaranteed — customer may be declined, requiring fallback to another payment method
  • Adds dependency on Affirm API availability during checkout (mitigated by circuit breaker)
  • Limited to Affirm-supported markets (US primarily)

References

  • Chapter 05: Architecture Components, Section 1.3 (Financial Settlement)
  • Ch 05: Architecture Components, Module 6 (Integrations) (Integration Patterns chapter — planned future rewrite)
  • ADR-019: SAQ-A Semi-Integrated Payment Scope

2.19 ADR-019: SAQ-A Semi-Integrated Payment Scope

Status: Accepted
Date: 2026-02-27
Decision Makers: Architecture Review Team, Security Team
Context: Card payment processing requires PCI-DSS compliance; the scope of compliance depends on how card data is handled.

Context

POS terminals must accept chip, tap, and swipe card payments. PCI-DSS compliance levels range from SAQ-A (~30 controls, card data never touches our system) to SAQ-D (300+ controls, full card data flow through our system). The choice fundamentally shapes the security architecture, development effort, and ongoing compliance cost.

BRD v20.0 Section 1.18 mandates that “Card data NEVER touches your system” and specifies a semi-integrated terminal architecture where the payment terminal communicates directly with the payment processor. The POS backend receives only tokens, approval codes, and masked card numbers (last 4 digits).

Decision

We will implement SAQ-A semi-integrated payment terminals where card data is handled entirely by the terminal hardware and payment processor SDK. The POS system stores only: transaction_id, payment_token, approval_code, masked_card_number (****1234), card_brand, entry_method, terminal_id, timestamp, and amount.
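
A minimal sketch of the token-only record the POS persists under this decision; the interface name and the cents-based amount are illustrative assumptions, while the field list mirrors the decision above.

    // Token-only card payment record persisted by the POS; no PAN or track data is ever stored.
    interface CardPaymentRecord {
      transactionId: string;
      paymentToken: string;       // processor token used for later void/refund
      approvalCode: string;
      maskedCardNumber: string;   // e.g. "****1234" (last 4 digits only)
      cardBrand: string;
      entryMethod: 'chip' | 'tap' | 'swipe';
      terminalId: string;
      timestamp: string;          // ISO-8601
      amountCents: number;
    }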

Considered Options

  1. SAQ-D Full Integration — Card data encrypted and tokenized through our system (300+ PCI controls)
  2. SAQ-A Semi-Integrated — Card data handled by terminal/processor, we receive tokens only (~30 controls)
  3. SAQ-A-EP (E-commerce) — Redirect to hosted payment page (not applicable for in-store POS)

Decision Outcome

Chosen: SAQ-A Semi-Integrated because it reduces PCI compliance scope by 90% (from 300+ to ~30 controls), eliminates the risk of card data breach from our systems, and supports token-based void/refund operations that work offline. Stripe Terminal and Square Terminal are supported as interchangeable providers via the IIntegrationProvider abstraction (Ch 05 Section 6.2.1).

Trade-offs

Pros:

  • 90% reduction in PCI compliance scope and audit effort
  • Zero card data in our system — breach of our database exposes no payment card information
  • Token-based refund/void works offline using stored payment tokens
  • Terminal firmware and EMV kernel managed by provider — no maintenance burden
  • Multi-provider support via provider abstraction prevents vendor lock-in

Cons:

  • Dependent on terminal hardware availability and SDK compatibility
  • Terminal communication adds 1-3 seconds latency for chip transactions
  • Limited control over payment UX (terminal screen controlled by provider)
  • Must maintain two provider SDKs (Stripe Terminal, Square Terminal)

References

  • Chapter 05: Architecture Components, Section 1.18 (Payment Integration)
  • Ch 04: Architecture Styles, Section L.8 (Security) (Security chapters — planned future rewrite)
  • ADR-011: Payment Gateway (SAQ-A Semi-Integrated)

2.20 ADR-020: Split Tender Payment Support

Status: Accepted
Date: 2026-02-27
Decision Makers: Architecture Review Team
Context: Retail customers frequently need to pay with multiple payment methods in a single transaction.

Context

Retail transactions commonly involve multiple payment methods: cash + card, multiple credit cards, gift card + card, on-account + cash, or Affirm for the remaining balance. BRD v20.0 Section 1.3 defines a tender loop where the cashier selects payment methods iteratively until the remaining balance reaches zero. Each tender is tracked independently for refund routing — a refund must be returned to the original payment method.

Decision

We will support unlimited split tender combinations where any payment method can be combined with any other. Each tender in a transaction is stored as a separate payment record with its own token/reference, enabling per-tender refund routing.
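
For illustration, a minimal sketch of the tender-loop data shape under this decision; the type and helper names are assumptions, not the actual API.

    interface Tender {
      method: 'cash' | 'card' | 'gift_card' | 'on_account' | 'layaway_deposit' | 'affirm';
      amountCents: number;
      reference?: string;   // card payment token, Affirm charge_id, gift card number, ... (used for refund routing)
    }

    // The cashier keeps adding tenders until the remaining balance reaches zero.
    function remainingBalanceCents(totalDueCents: number, tenders: Tender[]): number {
      const paid = tenders.reduce((sum, t) => sum + t.amountCents, 0);
      return Math.max(totalDueCents - paid, 0);
    }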

Considered Options

  1. Single tender only — One payment method per transaction (simplest but poor UX)
  2. Two-tender maximum — Allow at most two payment methods (limits flexibility)
  3. Unlimited split tender — Any number of payment methods per transaction

Decision Outcome

Chosen: Unlimited split tender because retail customers expect payment flexibility, and gift card partial balances naturally require a second tender for the remainder. Each payment record stores its own token (for card), reference (for Affirm), or cash amount, enabling precise refund routing back to the original payment source.

Trade-offs

Pros:

  • Maximum payment flexibility — matches customer expectations in retail
  • Gift card partial balance + card is a common scenario handled naturally
  • Per-tender refund routing — each payment token tracked independently
  • Supports combining all 6 payment types: cash, card, gift card, on-account, layaway deposit, Affirm

Cons:

  • Refund logic complexity — must track which tender to refund to and in what order
  • Multiple card tenders mean multiple terminal interactions during checkout
  • Receipt layout must accommodate variable number of payment lines
  • Reconciliation reports must aggregate across tender types

References

  • Chapter 05: Architecture Components, Section 1.3 (Financial Settlement)
  • Ch 05: Architecture Components, Section 3.8 (Payment Processing) (API Design chapter — planned future rewrite)

2.21 ADR-021: Layaway Payment Plans

Status: Accepted
Date: 2026-02-27
Decision Makers: Architecture Review Team
Context: Some customers need to pay for high-value items over time with a deposit and installments, with inventory reserved until paid in full.

Context

Layaway is a traditional retail financing model where the customer pays a minimum deposit, inventory is reserved (not released), and the customer makes additional payments over time until the full amount is paid. BRD v20.0 Section 1.3 defines a layaway state machine: DEPOSIT_PAID -> RESERVED -> PAID_IN_FULL -> COMPLETED, with CANCELLED and FORFEITED as terminal states. The credit limit calculation must include pending layaway balances.

Decision

We will implement native layaway with configurable minimum deposit percentage, reservation-based inventory hold, and a state machine governing the layaway lifecycle. Layaway balances are included in the credit limit calculation: Available Credit = Credit Limit - (Current Debt + Pending Layaway Balances + Current Cart Total).
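
A minimal sketch of the credit check implied by this formula, assuming amounts in cents and illustrative field names.

    // All amounts in cents; field names are illustrative.
    function availableCreditCents(input: {
      creditLimit: number;
      currentDebt: number;
      pendingLayawayBalances: number;
      currentCartTotal: number;
    }): number {
      const { creditLimit, currentDebt, pendingLayawayBalances, currentCartTotal } = input;
      return creditLimit - (currentDebt + pendingLayawayBalances + currentCartTotal);
    }
    // A new on-account charge or layaway deposit is accepted only while the result stays >= 0.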

Considered Options

  1. No layaway — Direct customers to Affirm BNPL instead
  2. Basic layaway — Deposit + single final payment, no partial installments
  3. Full layaway with installments — Deposit + multiple partial payments with deadline tracking

Decision Outcome

Chosen: Full layaway with installments because it is a standard expectation in brick-and-mortar retail, allows flexible payment schedules, and reserves inventory to guarantee availability. Unlike Affirm, layaway involves no third-party fees — the store manages the payment plan directly.

Trade-offs

Pros:

  • No third-party fees — merchant keeps full margin
  • Inventory reserved for customer until paid in full
  • Configurable minimum deposit percentage per tenant
  • Overdue tracking with forfeiture rules protects against abandoned layaways
  • Familiar model for retail staff and customers

Cons:

  • Inventory is tied up during the layaway period (not available for other sales)
  • Risk of forfeiture — must handle cancellation refund policies (configurable)
  • Adds complexity to credit limit calculations
  • Reporting must track outstanding layaway liability

References

  • Chapter 05: Architecture Components, Section 1.3 (Layaway State Machine)
  • Chapter 05: Architecture Components, Module 7 (State Machine Reference)
  • ADR-020: Split Tender Payment Support

2.22 ADR-022: Tax-Exclusive Pricing with Compound Calculation

Status: Accepted
Date: 2026-02-27
Decision Makers: Architecture Review Team
Context: The POS must calculate and display tax correctly for US retail, where tax is calculated externally (not embedded in the price).

Context

US retail uses tax-exclusive pricing — product prices on the shelf do not include tax, and tax is calculated at checkout based on the store’s jurisdiction. BRD v20.0 Section 1.17 defines a tax hierarchy where product-level exemptions have highest priority, followed by customer-level exemptions, followed by the store’s location-based compound jurisdiction rate. Tax is computed per line item and displayed as a separate total on the receipt.

Section 5.9 defines the compound tax model: State + County + City rates summed at time of sale. Example: Norfolk, VA = State 4.3% + Regional 0.7% + City 1.0% = 6.0% compound rate.

Decision

We will use tax-exclusive pricing with compound tax calculation at checkout. Product prices are stored without tax. At checkout, all active rates for the store’s tax jurisdiction are summed and applied to each taxable line item. The receipt displays subtotal, tax breakdown (optionally by level), and total.
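
For illustration, a minimal sketch of per-line-item compound tax under this decision; the names, the cents representation, and the rounding policy are assumptions.

    interface TaxRate { level: 'STATE' | 'COUNTY' | 'CITY'; ratePercent: number; }

    // Tax-exclusive pricing: the shelf price is the subtotal; tax is added at checkout.
    function lineItemTaxCents(lineSubtotalCents: number, rates: TaxRate[], exempt: boolean): number {
      if (exempt) return 0;  // product- or customer-level exemption takes priority over the location rate
      const compoundPercent = rates.reduce((sum, r) => sum + r.ratePercent, 0);
      // Norfolk, VA: 4.3 + 0.7 + 1.0 = 6.0%; a $20.00 line item yields $1.20 tax
      return Math.round((lineSubtotalCents * compoundPercent) / 100);
    }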

Considered Options

  1. Tax-inclusive pricing — Embed tax in the product price (common in EU/UK, not US)
  2. Tax-exclusive with flat rate — Single tax rate per location
  3. Tax-exclusive with compound rate — Multi-level (State/County/City) summed at checkout
  4. External tax service — Delegate to TaxJar/Avalara API for real-time calculation

Decision Outcome

Chosen: Tax-exclusive with compound rate because it matches US retail practice, supports the 3-level Virginia tax structure (the reference implementation), and enables future expansion to other states with complex district overlays (California) or no sales tax (Oregon). The tax engine is built internally rather than delegated to external services to ensure offline capability.

Trade-offs

Pros:

  • Matches US retail standard — prices on shelf exclude tax
  • 3-level compound model handles all US jurisdictions (State + County + City + special districts)
  • Offline-capable — tax rates cached locally on POS terminal, no API call needed
  • Product-level and customer-level exemptions supported (reseller, non-profit, diplomatic)
  • Future-proof for multi-state expansion (California district taxes, Oregon no-tax)

Cons:

  • More complex than flat-rate tax — must manage jurisdiction-to-location mapping
  • Tax rate changes require admin updates (mitigated by scheduled effective dates)
  • Multi-jurisdiction reporting adds complexity to tax liability reports
  • Not suitable for EU/UK VAT without redesign (acceptable — target market is US)

References

  • Chapter 05: Architecture Components, Section 1.17 (Tax Calculation Engine)
  • Chapter 05: Architecture Components, Section 5.9 (Tax Configuration)
  • Chapter 07: Schema Design (tax_jurisdictions, tax_rates tables)

2.23 ADR-023: Compound Tax (3-Level State/County/City)

Status: Accepted
Date: 2026-02-27
Decision Makers: Architecture Review Team
Context: The tax data model must support compound (additive) tax rates at multiple jurisdictional levels.

Context

US sales tax varies by jurisdiction and can consist of multiple additive layers: state tax, county tax, city tax, and sometimes special district surcharges. BRD v20.0 Section 5.9 defines a tax_jurisdictions table (jurisdiction code, name, state) and a tax_rates table with a level enum (STATE, COUNTY, CITY). Each location references a jurisdiction, and at time of sale all active rates for that jurisdiction are summed.

Example: Norfolk, VA = State 4.300% + Regional 0.700% + City 1.000% = 6.000% compound. Northern Virginia adds an additional 0.7% regional rate. Rate changes can be scheduled via effective_date with automatic activation.

Decision

We will implement a 3-level compound tax model using tax_jurisdictions and tax_rates tables. Each jurisdiction can have up to 3 active rate levels (STATE, COUNTY, CITY). Rates are summed at time of sale. Future rates are scheduled via effective_date with background activation.

Considered Options

  1. Single flat rate per location — One rate column on the location table
  2. 2-level (State + Local) — State rate plus a single combined local rate
  3. 3-level compound (State/County/City) — Separate rate rows per level, summed at checkout
  4. N-level with district overlay — Unlimited levels including special taxing districts

Decision Outcome

Chosen: 3-level compound because it covers the vast majority of US jurisdictions without the complexity of unlimited district overlays. The Virginia reference implementation (4 stores across different regions) validates this model. Special districts (California Proposition) can be modeled as a CITY-level rate until N-level support is needed. Unique constraint on (jurisdiction_id, level, effective_date) prevents duplicate rates.

Trade-offs

Pros:

  • Covers all current US jurisdictions (State + County + City covers 95%+ of cases)
  • Scheduled rate changes via effective_date — no manual intervention on tax change dates
  • Preserves historical rates for audit — rate changes never modify existing records
  • Simple SUM query at checkout: SELECT SUM(rate_percent) FROM tax_rates WHERE jurisdiction_id = ? AND is_active = true

Cons:

  • Cannot model California special district overlays (4th+ level) without schema extension
  • Requires admin to configure jurisdiction-to-location mapping per tenant
  • Rate scheduling background job must run reliably at midnight

References

  • Chapter 05: Architecture Components, Section 5.9 (Tax Configuration)
  • Chapter 07: Schema Design (Domain 15: Tax)
  • ADR-022: Tax-Exclusive Pricing with Compound Calculation

2.24 ADR-024: Gift Card Compliance (State Escheatment)

Status: Accepted
Date: 2026-02-27
Decision Makers: Architecture Review Team, Legal
Context: Gift card management must comply with varying state-level escheatment and consumer protection laws.

Context

Gift cards are subject to state-specific regulations governing expiration, inactivity fees, and mandatory cash-out thresholds. BRD v20.0 Section 1.5 defines a jurisdiction compliance matrix: Virginia allows 5-year minimum expiry and inactivity fees after 12 months; California prohibits expiry, prohibits fees, and mandates cash-out at $10.00; New York prohibits both expiry and fees. The gift card state machine includes INACTIVE, ACTIVE, DEPLETED, EXPIRED, and CASHED_OUT states.

The system must default to the most restrictive rules (California-style: no expiry, no fees, cash-out required) and enable features only where jurisdiction permits.

Decision

We will implement jurisdiction-aware gift card rules that default to the most restrictive configuration (California-style) and enable expiry, fees, and cash-out thresholds per store location’s jurisdiction. The store’s physical location determines which rules apply.

Considered Options

  1. Uniform national policy — Apply the most restrictive state’s rules everywhere (simple but limits flexibility)
  2. Per-jurisdiction rules — Configure rules per state/jurisdiction with most-restrictive defaults
  3. External compliance service — Delegate gift card compliance to a third-party service

Decision Outcome

Chosen: Per-jurisdiction rules with most-restrictive defaults because multi-state retail operations need location-specific compliance. Defaulting to California-style (no expiry, no fees, mandatory cash-out) ensures legal compliance even if jurisdiction configuration is incomplete. Stores in permissive jurisdictions can enable expiry and fees explicitly.

Trade-offs

Pros:

  • Legal compliance across all US jurisdictions from day one
  • Safe defaults — unconfigured jurisdictions use most restrictive rules
  • Cash-out workflow at POS for California compliance (balance <= $10.00)
  • Gift card liability reporting for accounting (outstanding balances = liability)
  • State machine enforces valid transitions (no invalid state changes)

Cons:

  • Jurisdiction rules must be maintained as laws change
  • Cash-out workflow adds complexity to POS checkout flow
  • Escheatment reporting (unclaimed property) required in some states after dormancy period
  • Gift card liability grows over time — reporting must track aging and dormant cards

References

  • Chapter 05: Architecture Components, Section 1.5 (Gift Card Management)
  • Chapter 05: Architecture Components, Section 1.5.2 (Jurisdiction Compliance Matrix)
  • Chapter 05: Architecture Components, Module 7 (Gift Card State Machine)

2.25 ADR-025: 6-Status Inventory State Machine

Status: Accepted
Date: 2026-02-27
Decision Makers: Architecture Review Team
Context: Inventory at each location needs status tracking beyond simple quantity to manage quality holds, transit, reservations, and damage.

Context

Retail inventory is not simply “in stock” or “out of stock.” BRD v20.0 Section 4.2 defines six inventory statuses with a strict state machine governing transitions: AVAILABLE (sellable), QUARANTINE (quality hold), DAMAGED (cannot sell), PENDING_INSPECTION (received, needs review), RESERVED (allocated to order/transfer), and IN_TRANSIT (moving between locations). Only AVAILABLE stock can be sold at POS or transferred. All status changes require reason codes and are logged to the movement history audit trail.

Decision

We will implement a 6-status inventory state machine where each product-variant-location combination tracks quantity per status. Only AVAILABLE status is sellable. Transitions follow a strict state machine validated at the application layer against a state_transitions reference table.
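
A minimal sketch of transition validation, assuming an in-memory mirror of the state_transitions reference table; the exact transition set shown here is illustrative, not the authoritative table.

    type InventoryStatus =
      | 'AVAILABLE' | 'QUARANTINE' | 'DAMAGED'
      | 'PENDING_INSPECTION' | 'RESERVED' | 'IN_TRANSIT';

    // Illustrative transition map; e.g. QUARANTINE cannot move directly to RESERVED.
    const allowedTransitions: Record<InventoryStatus, InventoryStatus[]> = {
      PENDING_INSPECTION: ['AVAILABLE', 'QUARANTINE', 'DAMAGED'],
      AVAILABLE:          ['RESERVED', 'IN_TRANSIT', 'QUARANTINE', 'DAMAGED'],
      RESERVED:           ['AVAILABLE', 'IN_TRANSIT'],
      IN_TRANSIT:         ['PENDING_INSPECTION', 'AVAILABLE'],
      QUARANTINE:         ['AVAILABLE', 'DAMAGED'],
      DAMAGED:            [],
    };

    function canTransition(from: InventoryStatus, to: InventoryStatus): boolean {
      return allowedTransitions[from].includes(to);
    }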

Considered Options

  1. Binary (in-stock / out-of-stock) — Simple quantity tracking
  2. 3-status (Available / Reserved / Damaged) — Minimal status tracking
  3. 6-status state machine — Full lifecycle with quality management and transit tracking
  4. Continuous status field — Free-form status string (no transition enforcement)

Decision Outcome

Chosen: 6-status state machine because retail clothing operations require quality holds (QUARANTINE for items with potential defects), receiving inspection (PENDING_INSPECTION for new deliveries), reservation management (RESERVED for carts, transfers, online orders), and transit tracking (IN_TRANSIT between locations). Invalid transitions (e.g., QUARANTINE directly to RESERVED) are rejected by the API.

Trade-offs

Pros:

  • Only AVAILABLE stock appears as sellable — prevents selling damaged or quarantined items
  • RESERVED status prevents overselling in multi-terminal, multi-channel environments
  • IN_TRANSIT gives visibility into inventory movement between locations
  • Reason codes on every transition create a complete audit trail
  • State machine prevents invalid transitions (enforced at API and DB level)

Cons:

  • More complex than simple quantity tracking — 6 quantities per product-location instead of 1
  • Staff must understand status meanings and transition rules
  • Reporting must aggregate or filter by status
  • State machine logic adds validation overhead to every inventory operation

References

  • Chapter 05: Architecture Components, Section 4.2 (Inventory Status Model)
  • Chapter 05: Architecture Components, Module 7 (State Machine Reference)
  • Chapter 08: Entity Specifications

2.26 ADR-026: Reservation-Based Inventory Hold Model

Status: Accepted
Date: 2026-02-27
Decision Makers: Architecture Review Team
Context: Multiple terminals, parked transactions, online orders, and transfers all compete for the same inventory. A mechanism is needed to prevent overselling.

Context

BRD v20.0 Section 4.2.2 defines five reservation types: Sale Cart (hard reserve until payment or void), Parked Transaction (soft reserve with 4-hour TTL, overridable with warning), Transfer (hard reserve at source until shipped), Online Order (hard reserve at assigned store), and Hold-for-Pickup (hard reserve with configurable expiry, default 48 hours). When two terminals attempt to reserve the last unit simultaneously, first-commit-wins via database-level row locking.

Decision

We will implement a reservation-based inventory hold model with 5 reservation types, each with its own lifecycle and TTL. Reservations atomically move quantity from AVAILABLE to RESERVED. Concurrent conflicts are resolved by database row locking (first-commit-wins).
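
For illustration, a first-commit-wins reservation sketch using a single conditional UPDATE (the row lock taken by the UPDATE serializes concurrent attempts); the db.query wrapper, table, and column names are assumptions, not the actual schema.

    type DbLike = { query: (sql: string, params: unknown[]) => Promise<{ rowCount: number }> };

    // A single conditional UPDATE takes the row lock, so of two terminals racing for the
    // last unit, exactly one sees available_qty >= $2 and wins (first-commit-wins).
    async function reserveStock(db: DbLike, inventoryId: string, qty: number): Promise<boolean> {
      const result = await db.query(
        `UPDATE inventory_levels
            SET available_qty = available_qty - $2,
                reserved_qty  = reserved_qty  + $2
          WHERE id = $1 AND available_qty >= $2`,
        [inventoryId, qty]
      );
      return result.rowCount === 1;   // false: another terminal committed first, so do not reserve
    }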

Considered Options

  1. No reservation (optimistic) — Check quantity at payment time only, accept oversell risk
  2. Soft reservation with warnings — Show warnings but allow selling through reserved stock
  3. Hard reservation with TTL — Atomic reserve on add-to-cart, auto-release on expiry
  4. Mixed hard/soft by type — Hard for carts and online orders, soft for parked transactions

Decision Outcome

Chosen: Mixed hard/soft by type because sale carts, online orders, and transfers need hard reserves to prevent overselling, while parked transactions benefit from soft reserves (other terminals can sell through with a warning, since parked sales may never be completed). Auto-release via background job (every 5 minutes) prevents inventory from being permanently locked by abandoned sessions.

Trade-offs

Pros:

  • Prevents overselling across multi-terminal, multi-channel environments
  • Parked transaction soft reserve allows override when stock is genuinely needed
  • Auto-release on expiry prevents permanent inventory lockup
  • 5 reservation types cover all business scenarios (sale, park, transfer, online, hold)
  • Database row locking guarantees first-commit-wins under concurrent access

Cons:

  • Reservation management adds overhead to every cart operation (add/remove/void)
  • Background expiry job must run reliably (5-minute interval)
  • Soft reserve override can lead to parked transactions that can’t be recalled (reconciled at recall time)
  • Reservation table grows with transaction volume (mitigated by archival of COMMITTED/RELEASED records)

References

  • Chapter 05: Architecture Components, Section 4.2.2 (Reservation Model)
  • ADR-025: 6-Status Inventory State Machine
  • ADR-002: Offline-First POS Architecture

2.27 ADR-027: RFID Counting-Only Scope (No Lifecycle)

Status: Accepted
Date: 2026-02-27
Decision Makers: Architecture Review Team
Context: RFID integration scope must be defined — either counting-only or full lifecycle tracking (sales, transfers, receiving).

Context

BRD v20.0 Section 5.16 explicitly scopes RFID as a “dedicated inventory counting subsystem.” RFID readers (Zebra MC3390R, RFD40, FX9600) are used for bulk inventory counting and auditing via the Raptag mobile app. Barcode scanners remain the input device for sales transactions, receiving, and transfers. The rfid_tags table tracks tag status as active, void, or lost — there are no sold_at, transferred_at, or sold_order_id fields.

This separation means RFID and barcode scanning are independent abstractions that coexist: Scanner = barcode (POS register, one-item-at-a-time via USB HID); RFID = counting (Raptag app, 40+ tags/second via radio frequency).

Decision

We will scope RFID to counting and auditing only. RFID does not participate in sales, receiving, or transfer workflows. The core inventory system tracks stock movements via barcode. RFID provides a parallel counting channel for physical inventory verification.

Considered Options

  1. Full RFID lifecycle — Track every tag through sale, transfer, receiving, and returns
  2. Counting-only — RFID for inventory counting and auditing, barcode for all other workflows
  3. Hybrid (phased) — Start with counting, extend to receiving in v2.0

Decision Outcome

Chosen: Counting-only because full lifecycle RFID tracking would require replacing the barcode-based POS checkout flow with RFID readers at every register, fundamentally changing the hardware requirements and staff workflows. Counting-only provides the highest ROI (bulk counts in minutes vs. hours) with minimal disruption to existing barcode-based workflows. Tag status is limited to active, void, lost — no sales or transfer lifecycle fields.

Trade-offs

Pros:

  • Highest ROI — bulk inventory counts (2,000-100,000 items) completed in minutes vs. hours
  • No disruption to existing barcode-based POS, receiving, and transfer workflows
  • Simpler RFID schema — 12 tables vs. potentially 20+ for full lifecycle
  • Raptag mobile app focused on single purpose (counting) with clear UX
  • Scope can be expanded to receiving in v2.0 if business case emerges

Cons:

  • Cannot automatically decrement RFID tag counts on sale (counting snapshot may drift)
  • Receiving workflow still requires barcode scanning (no RFID speed benefit)
  • Two parallel inventory tracking systems (barcode quantity vs. RFID tag count) — reconciliation needed
  • Cannot provide real-time tag location or anti-theft alerts

References

  • Chapter 05: Architecture Components, Section 5.16 (RFID Configuration)
  • Ch 05: Architecture Components, Section 5.16 (RFID Counting) (Raptag Mobile chapter — planned future rewrite)
  • ADR-013: RFID Configuration in Tenant Admin

2.28 ADR-028: Physical Count Freeze Period

Status: Accepted
Date: 2026-02-27
Decision Makers: Architecture Review Team
Context: During physical inventory counts, sales and transfers can change stock levels, causing reconciliation errors.

Context

BRD v20.0 Section 4.6.4 defines two counting modes: FREEZE mode (POS sales blocked at counting location, transfers queued) and SNAPSHOT mode (operations continue normally, system reconciles movements post-count). FREEZE mode provides highest accuracy for annual audits. SNAPSHOT mode enables counting during business hours without blocking sales. The mode is chosen per count by the manager and cannot be changed after the count starts.

Decision

We will support configurable count freeze with two modes (FREEZE and SNAPSHOT), selected per count session. FREEZE blocks POS sales at the counting location and queues inbound transfers. SNAPSHOT takes a point-in-time inventory snapshot and reconciles against post-count movements.

Considered Options

  1. Always freeze — Block sales during every count (accurate but high business impact)
  2. Never freeze (snapshot only) — Always count during business hours (lower accuracy)
  3. Configurable per count — Manager chooses FREEZE or SNAPSHOT per count session

Decision Outcome

Chosen: Configurable per count because different counting scenarios have different accuracy requirements. Annual full physical counts benefit from FREEZE mode (after hours, maximum accuracy). Weekly cycle counts and monthly scans use SNAPSHOT mode (during hours, minimal disruption). The system defaults to SNAPSHOT; FREEZE must be explicitly selected by MANAGER/OWNER role.

Trade-offs

Pros:

  • Maximum flexibility — manager picks the right mode for each situation
  • FREEZE mode: perfect accuracy, no reconciliation needed
  • SNAPSHOT mode: zero business disruption, counts during peak hours
  • SNAPSHOT reconciliation formula: adjusted_expected = snapshot_qty - sales_during_count + receives_during_count
  • Only MANAGER/OWNER can initiate counts (access-controlled)

Cons:

  • FREEZE mode blocks revenue during the count window (mitigated by off-hours scheduling)
  • SNAPSHOT reconciliation is more complex and has slightly lower accuracy
  • Staff must understand the difference between modes
  • FREEZE mode queues transfers that must be processed after count approval

References

  • Chapter 05: Architecture Components, Section 4.6.4 (Configurable Count Freeze)
  • Chapter 05: Architecture Components, Section 4.6 (Inventory Counting & Auditing)
  • ADR-025: 6-Status Inventory State Machine

2.29 ADR-029: Adjustment Manager Approval (Universal)

Status: Accepted
Date: 2026-02-27
Decision Makers: Architecture Review Team
Context: Manual inventory adjustments directly affect stock levels and financial records. A control mechanism is needed.

Context

BRD v20.0 Section 4.7 mandates that all inventory adjustments require manager approval — positive (found stock), negative (shrinkage), and zero-net (reclassification). There is no auto-approval threshold. Adjustments are created with approval_status = PENDING and inventory is NOT changed until a MANAGER or OWNER explicitly approves. Rejected adjustments are preserved for audit. The cost impact (qty_change x weighted_avg_cost) is calculated and shown to the manager before approval.

Decision

We will require universal manager approval for all manual inventory adjustments, regardless of quantity or direction. No threshold-based auto-approval. Inventory quantities change only upon explicit manager approval.

Considered Options

  1. No approval — Staff adjustments apply immediately (fast but no oversight)
  2. Threshold-based — Small adjustments auto-approve, large adjustments require manager
  3. Universal approval — All adjustments require manager review before inventory changes

Decision Outcome

Chosen: Universal approval because inventory accuracy is critical for a multi-store retail operation with financial audit requirements. Even small adjustments can indicate systematic issues (repeated theft, receiving errors). The cost impact display enables managers to make informed decisions. Approved adjustments are logged as ADJUSTMENT_UP or ADJUSTMENT_DOWN movements in the audit trail.

Trade-offs

Pros:

  • Complete management oversight of all inventory changes
  • Cost impact shown before approval — managers see financial consequence
  • PENDING status prevents premature inventory changes
  • Rejected adjustments preserved for audit — pattern analysis possible
  • Standard reason codes + custom tenant-defined codes for categorization

Cons:

  • Manager bottleneck — adjustments may wait for approval (mitigated by push notifications)
  • Additional workflow steps compared to instant adjustments
  • Managers must be responsive to avoid approval backlog
  • No fast-track for trivially small adjustments (by design)

References

  • Chapter 05: Architecture Components, Section 4.7 (Inventory Adjustments)
  • ADR-025: 6-Status Inventory State Machine

2.30 ADR-030: Auto-Suggest Transfers Algorithm

Status: Accepted
Date: 2026-02-27
Decision Makers: Architecture Review Team
Context: Multi-store retail operations frequently have inventory imbalances — one store overstocked while another is understocked on the same product.

Context

BRD v20.0 Section 4.8.7 defines an auto-suggest transfer algorithm that continuously monitors inventory distribution relative to sales velocity across all locations. The algorithm calculates days of supply per product per location (qty_on_hand / avg_daily_sales_velocity), detects imbalances (one location >60 days of supply, another <15 days), and generates transfer suggestions targeting 30 days of supply at each location. The algorithm runs weekly (configurable) and never creates transfers automatically — all suggestions require manager review.

Decision

We will implement a velocity-based auto-suggest transfer algorithm that analyzes days of supply across locations, generates rebalancing suggestions, and presents them to managers for review and approval. Suggestions that are approved create transfer requests via the standard transfer workflow.
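
A minimal sketch of the days-of-supply check and the suggested quantity, using the default thresholds above; function and variable names are illustrative.

    // Default thresholds from the algorithm description; all configurable per tenant.
    const OVERSTOCKED_DAYS = 60;
    const UNDERSTOCKED_DAYS = 15;
    const TARGET_DAYS = 30;

    function daysOfSupply(qtyOnHand: number, avgDailySalesVelocity: number): number {
      return avgDailySalesVelocity > 0 ? qtyOnHand / avgDailySalesVelocity : Infinity;
    }

    // Suggest moving enough stock to bring the understocked location up to the 30-day target
    // without pulling the overstocked source below that same target.
    function suggestedTransferQty(
      sourceQty: number, sourceVelocity: number,
      destQty: number, destVelocity: number
    ): number {
      const sourceDays = daysOfSupply(sourceQty, sourceVelocity);
      const destDays = daysOfSupply(destQty, destVelocity);
      if (sourceDays <= OVERSTOCKED_DAYS || destDays >= UNDERSTOCKED_DAYS) return 0;  // no imbalance
      const destNeed = Math.ceil(TARGET_DAYS * destVelocity - destQty);
      const sourceSurplus = Math.floor(sourceQty - TARGET_DAYS * sourceVelocity);
      return Math.max(Math.min(destNeed, sourceSurplus), 0);
    }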

Considered Options

  1. Manual only — Managers identify imbalances and create transfers manually
  2. Rule-based alerts — Alert when stock is below threshold, but no transfer suggestion
  3. Auto-suggest with manager review — Algorithm suggests specific transfers, manager approves
  4. Fully automated — Algorithm creates and ships transfers without human review

Decision Outcome

Chosen: Auto-suggest with manager review because it combines algorithmic efficiency (analyzing hundreds of product-location combinations weekly) with human judgment (manager knowledge of upcoming promotions, seasonal shifts, display requirements). The algorithm provides data-driven starting points; managers adjust quantities and approve or reject.

Trade-offs

Pros:

  • Data-driven rebalancing across all locations — impossible to replicate manually at scale
  • Manager review preserves business judgment (upcoming promotions, seasonal knowledge)
  • Configurable thresholds: overstocked (>60 days), understocked (<15 days), target (30 days)
  • Trailing 30-day sales velocity adapts to changing demand patterns
  • Suggestions expire after 7 days if unreviewed — no stale recommendations

Cons:

  • Algorithm may suggest transfers that conflict with upcoming promotions (mitigated by manager review)
  • Dead stock (zero velocity at both locations) excluded — requires separate manual review
  • HQ warehouse uses different thresholds than stores (90-day overstocked threshold)
  • Weekly batch analysis may miss rapid demand changes (mitigated by on-demand trigger option)

References

  • Chapter 05: Architecture Components, Section 4.8.7 (Auto-Suggest Transfers)
  • Chapter 05: Architecture Components, Section 4.5 (Reorder Management)

2.31 ADR-031: Shopify Webhook + Polling Dual Sync

Status: Accepted
Date: 2026-02-27
Decision Makers: Architecture Review Team
Context: Shopify inventory and product sync must be near-real-time with guaranteed eventual consistency.

Context

BRD v20.0 Section 6.3 defines Shopify as the primary e-commerce integration. Shopify webhooks provide near-real-time notifications (products/update, inventory_levels/update, orders/create) but can be missed due to network issues, Shopify outages, or endpoint failures. Shopify retries failed webhook deliveries for 48 hours, but delivery is not guaranteed. The platform must guarantee eventual consistency between POS and Shopify inventory.

This ADR formalizes the dual-sync strategy previously captured in ADR-010. The detailed implementation in Section 6.3 specifies OAuth 2.0/PKCE authentication, GraphQL Admin API at 50 points/second rate limiting, and mandatory @idempotent mutations (required April 2026).

Decision

We will use a Webhook + Polling hybrid for Shopify synchronization. Webhooks provide near-real-time sync (<5 seconds processing) for the common case. Scheduled polling (every 15 minutes) using updated_at_min delta queries catches any missed webhooks. Both paths use the same idempotent event handler pipeline (24-hour dedup window).
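
For illustration, a sketch of the shared idempotent handler with a Redis-backed 24-hour dedup window; the key shape and the Redis wrapper type are assumptions.

    type RedisLike = {
      set: (key: string, value: string, mode: 'EX', ttl: number, flag: 'NX') => Promise<string | null>;
    };

    const DEDUP_TTL_SECONDS = 24 * 60 * 60;

    // Shared by the webhook handler and the polling job: apply() runs only for the first
    // delivery of a given event key within the 24-hour dedup window.
    async function handleShopifyEvent(redis: RedisLike, eventKey: string, apply: () => Promise<void>): Promise<void> {
      const firstSeen = await redis.set(`shopify:dedup:${eventKey}`, '1', 'EX', DEDUP_TTL_SECONDS, 'NX');
      if (firstSeen === null) return;   // duplicate from the other path (webhook vs. poll): drop it
      await apply();
    }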

Considered Options

  1. Pure Webhook — Rely solely on Shopify webhooks for all sync
  2. Pure Polling — Poll Shopify API on intervals for all changes
  3. Webhook + Polling hybrid — Real-time webhooks with polling fallback

Decision Outcome

Chosen: Webhook + Polling hybrid because webhooks alone cannot guarantee delivery, and polling alone introduces unacceptable latency for inventory updates that could cause overselling. The hybrid approach provides <5-second normal latency with guaranteed eventual consistency via 15-minute polling catchup. Idempotent processing prevents double-counting when both webhook and poll detect the same change.

Trade-offs

Pros:

  • Near-real-time sync via webhooks (<5 seconds for the common case)
  • Guaranteed eventual consistency via polling fallback
  • Idempotent processing handles duplicates from webhook + poll overlap
  • Rate-limit-aware polling with adaptive backoff protects against API throttling

Cons:

  • More complex than either pure approach
  • Polling adds API calls against Shopify rate limits (mitigated by delta queries)
  • Webhook endpoint requires HMAC signature verification and retry handling
  • Must maintain webhook registration lifecycle (register on connect, deregister on disconnect)

References

  • Chapter 05: Architecture Components, Section 6.3 (Shopify Integration)
  • ADR-010: Shopify Sync Strategy (foundational decision)
  • Ch 05: Architecture Components, Module 6 (Integrations) (Integration Patterns chapter — planned future rewrite)

2.32 ADR-032: Strictest-Rule-Wins Cross-Platform Validation

Status: Accepted
Date: 2026-02-27
Decision Makers: Architecture Review Team
Context: Products listed on Shopify, Amazon, and Google Merchant Center must meet each platform’s distinct validation requirements.

Context

BRD v20.0 Section 6.6 defines a unified product data validation matrix comparing field-level requirements across Shopify, Amazon, and Google Merchant Center. Each platform has different constraints — Google limits titles to 150 chars, Amazon requires 1000x1000px minimum images, Shopify is most permissive. The common pattern of “create now, fix later” leads to suppressed listings, disapproved products, and lost revenue.

The strictest-rule-wins principle means: title max 150 chars (Google strictest), image min 1000x1000px (Amazon strictest), no watermarks (Amazon + Google), barcode required (treat as mandatory for channel eligibility), brand required (Amazon + Google).

Decision

We will enforce strictest-rule-wins validation at the point of product data entry in the POS system. Any product passing POS validation is immediately eligible for listing on all connected platforms without remediation. Products failing validation can still be used for in-store POS sales but are blocked from external channel sync.
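
A minimal sketch of the strictest-rule-wins check at data entry; it omits the WARN level for brevity, and the field names and rule set shown are illustrative.

    interface ProductDraft {
      title: string;
      brand?: string;
      barcode?: string;
      imageWidthPx: number;
      imageHeightPx: number;
    }

    // Each check applies the strictest platform rule so a passing product is listable everywhere.
    function validateForAllChannels(p: ProductDraft): { status: 'PASS' | 'FAIL'; errors: string[] } {
      const errors: string[] = [];
      if (p.title.length > 150) errors.push('Title exceeds 150 characters (Google limit)');
      if (p.imageWidthPx < 1000 || p.imageHeightPx < 1000) errors.push('Image below 1000x1000 px (Amazon minimum)');
      if (!p.barcode) errors.push('Barcode missing (required for channel eligibility)');
      if (!p.brand) errors.push('Brand missing (required by Amazon and Google)');
      return { status: errors.length === 0 ? 'PASS' : 'FAIL', errors };
    }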

Considered Options

  1. Per-platform validation at sync time — Validate only when pushing to each platform
  2. Strictest-rule-wins at data entry — Enforce most restrictive requirements for all platforms upfront
  3. Tiered validation — POS-only products have relaxed rules; channel-listed products have strict rules

Decision Outcome

Chosen: Strictest-rule-wins at data entry because it eliminates the expensive “fix after suppression” cycle. Products created correctly the first time avoid listing delays, disapprovals, and the operational cost of chasing validation errors across three platforms. The pre-sync validation engine (PASS/WARN/FAIL) provides clear feedback at product creation time.

Trade-offs

Pros:

  • Any product passing POS validation is immediately listable on all channels
  • Eliminates suppressed listings, disapprovals, and remediation cycles
  • Single validation standard — staff learns one set of rules, not three
  • Pre-sync engine provides actionable PASS/WARN/FAIL feedback with remediation guidance
  • Image validation catches the #1 cause of listing suppression (watermarks, resolution, background)

Cons:

  • POS-only products must meet stricter requirements than necessary (mitigated by channel-listing flag)
  • Requirements may change as platforms update their rules (mitigated by configurable validation matrix)
  • More fields required at product creation (brand, weight, barcode) — slightly longer data entry
  • Google’s 150-char title limit is more restrictive than many retailers want

References

  • Chapter 05: Architecture Components, Section 6.6 (Cross-Platform Product Data Requirements)
  • Chapter 05: Architecture Components, Section 6.6.2 (Image Requirements Matrix)

2.33 ADR-033: Amazon SP-API Integration Strategy

Status: Accepted
Date: 2026-02-27
Decision Makers: Architecture Review Team
Context: Multi-channel retail requires Amazon marketplace integration for product listings, order fulfillment, and inventory sync.

Context

BRD v20.0 Section 6.4 defines Amazon integration via the Selling Partner API (SP-API) with OAuth 2.0/LWA authentication, regional endpoints (NA/EU/FE), and support for both FBA (Fulfilled by Amazon) and FBM (Fulfilled by Merchant) fulfillment models. The integration covers catalog items API, listings API, orders API, feeds API, and push notifications (SQS). Amazon SP-API polls every 2 minutes for inventory updates.

Key constraints: rate limits vary by API (5 requests/second for catalog, 1 request/second for feeds), access tokens expire every 1 hour, and per-marketplace pricing is required.

Decision

We will integrate with Amazon SP-API supporting both FBA and FBM fulfillment models, with OAuth 2.0/LWA token lifecycle management, per-marketplace catalog sync, and SQS-based push notifications for order and inventory events.

Considered Options

  1. No Amazon integration — POS-only retail without Amazon marketplace
  2. Amazon MWS (legacy) — Older Marketplace Web Service API (deprecated)
  3. Amazon SP-API — Current Selling Partner API with OAuth 2.0 and modern endpoints
  4. Third-party aggregator — Use a service like ChannelAdvisor to manage Amazon listings

Decision Outcome

Chosen: Direct Amazon SP-API integration because it provides full control over the integration, avoids third-party aggregator fees, and aligns with the provider abstraction architecture (IIntegrationProvider interface). MWS is deprecated. The POS backend handles token lifecycle transparently — proactive refresh at T-5 minutes before expiry, fallback force-refresh on 401 responses.

Trade-offs

Pros:

  • Full control over catalog, listing, order, and inventory sync
  • FBA + FBM support — tenants choose fulfillment model per product
  • SQS push notifications reduce polling overhead for order/inventory events
  • OAuth 2.0/LWA aligns with modern authentication standards
  • Per-marketplace support (US, CA, MX under NA region)

Cons:

  • Complex API with different rate limits per endpoint
  • Token management complexity (1-hour expiry, proactive refresh)
  • Amazon-specific field mappings (Browse Node taxonomy, product type definitions)
  • Amazon Brand Registry requirements add complexity for branded products
  • FBA inventory is read-only (Amazon manages stock) — requires separate monitoring

References

  • Chapter 05: Architecture Components, Section 6.4 (Amazon SP-API Integration)
  • Ch 05: Architecture Components, Module 6 (Integrations) (Integration Patterns chapter — planned future rewrite)
  • ADR-032: Strictest-Rule-Wins Cross-Platform Validation

2.34 ADR-034: Google Merchant Center Feed Strategy

Status: Accepted
Date: 2026-02-27
Decision Makers: Architecture Review Team
Context: Google Shopping and Local Inventory Ads require product data feeds managed via the Google Merchant API.

Context

BRD v20.0 Section 6.5 defines Google Merchant Center integration for product data management, local inventory advertising, and Google Business Profile linkage. CRITICAL: The Content API for Shopping reaches end-of-life on August 18, 2026 — all new development MUST target the Merchant API (v1beta/v1). Google uses OAuth 2.0 with service accounts (self-signed JWTs exchanged for 60-minute access tokens).

The Merchant API separates writes (ProductInput resource) from reads (Product resource — Google-enriched version after validation). Disapproval prevention is critical as Google can suspend product listings for policy violations.

Decision

We will target the Merchant API (v1beta/v1) from day one, with OAuth 2.0 service account authentication, outbound product feed management, and local inventory advertising. No development against the deprecated Content API.

Considered Options

  1. Content API (v2.1) — Current API but reaching EOL August 2026
  2. Merchant API (v1beta/v1) — New API, long-term supported
  3. Supplemental feed only — Use Google’s automated crawl with supplemental data
  4. Third-party feed manager — Delegate to GoDataFeed, DataFeedWatch, etc.

Decision Outcome

Chosen: Direct Merchant API integration because Content API EOL is August 2026 (within the platform’s launch timeline), the Merchant API provides new features (local inventory, GBP integration) only available on the new API, and direct integration avoids third-party feed manager costs. Service account auth with tenant-specific encryption keys aligns with the credential vault architecture.

Trade-offs

Pros:

  • Future-proof — no migration needed when Content API shuts down
  • Local Inventory Ads support for brick-and-mortar stores
  • Google Business Profile linkage for “available nearby” search results
  • Product status API enables proactive disapproval monitoring and remediation
  • Service account auth avoids user-interactive OAuth flows

Cons:

  • Merchant API is still in v1beta — minor API changes possible before GA
  • Google processing adds 30-minute latency to inventory updates
  • Product disapproval rules are complex and change frequently
  • Service account JSON key management adds security complexity (AES-256-GCM encrypted at rest)
  • 2x daily batch sync cadence for Google (vs. near-real-time for Shopify)

References

  • Chapter 05: Architecture Components, Section 6.5 (Google Merchant API Integration)
  • ADR-032: Strictest-Rule-Wins Cross-Platform Validation

2.35 ADR-035: Channel Safety Buffer Calculation

Status: Accepted
Date: 2026-02-27
Decision Makers: Architecture Review Team
Context: External channels sync inventory with varying latency (Shopify <5s, Amazon <2min, Google <30min). During sync gaps, concurrent sales can cause overselling.

Context

BRD v20.0 Section 6.7.2 defines safety buffers that withhold a configurable number of units from external channel listings. The primary formula is: Channel Available Qty = POS Available Qty - Safety Buffer. Three buffer modes are supported: FIXED (subtract fixed units), PERCENTAGE (subtract % of stock), and MIN_RESERVE (floor-based). Buffers are configurable per-product, per-channel, with a 4-level priority resolution: product+channel > product > channel > tenant-wide default.

Recommended defaults: Shopify 0-2 units (low latency), Amazon FBM 5-10% (2-minute lag), Google Merchant 10-15% (30-minute processing delay).

Decision

We will implement configurable safety buffers per product per channel with 3 calculation modes and 4-level priority resolution. Higher-latency channels receive larger default buffers to compensate for sync lag.
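
For illustration, a sketch of buffer-rule resolution and the channel quantity calculation under the three modes; names are assumptions, and the MIN_RESERVE semantics shown (hold back a floor of value units) follow the floor-based description above.

    type BufferMode = 'FIXED' | 'PERCENTAGE' | 'MIN_RESERVE';
    interface BufferRule { mode: BufferMode; value: number; }

    // 4-level priority resolution: product+channel > product > channel > tenant-wide default.
    function resolveBufferRule(rules: {
      productChannel?: BufferRule;
      product?: BufferRule;
      channel?: BufferRule;
      tenantDefault: BufferRule;
    }): BufferRule {
      return rules.productChannel ?? rules.product ?? rules.channel ?? rules.tenantDefault;
    }

    // Channel Available Qty = POS Available Qty - Safety Buffer; below min_channel_qty the
    // product is hidden from the channel entirely.
    function channelAvailableQty(posAvailableQty: number, rule: BufferRule, minChannelQty = 1): number {
      const buffered =
        rule.mode === 'PERCENTAGE'
          ? Math.floor(posAvailableQty * (1 - rule.value / 100))
          : posAvailableQty - rule.value;   // FIXED and MIN_RESERVE both hold back `value` units here
      return buffered < minChannelQty ? 0 : buffered;
    }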

Considered Options

  1. No buffers — List full POS quantity on all channels (highest oversell risk)
  2. Flat global buffer — Same buffer for all channels and products
  3. Per-channel default buffers — Different buffer per channel, same for all products
  4. Per-product per-channel configurable — Full flexibility with priority resolution

Decision Outcome

Chosen: Per-product per-channel configurable because sync latency varies dramatically between channels (Shopify <5s vs. Google <30min), and high-velocity products need different buffers than slow movers. The 4-level priority resolution enables tenants to set sensible defaults while overriding for specific products or channels. min_channel_qty threshold hides products from channel when available falls below minimum (default: 1).

Trade-offs

Pros:

  • Tunable oversell protection per channel based on sync latency
  • High-velocity products can have larger buffers than slow movers
  • max_channel_qty cap prevents revealing full warehouse stock to competitors
  • Priority resolution (product+channel > product > channel > tenant) minimizes configuration effort
  • Walk-in customer stock protected — buffers ensure in-store availability

Cons:

  • Configuration complexity — many possible combinations of product x channel x mode
  • Buffers reduce listed quantity — may lose online sales if set too aggressively
  • Buffer calculations add overhead to every inventory sync event
  • Must recalculate buffers when POS quantity changes (event-driven)

References

  • Chapter 05: Architecture Components, Section 6.7.2 (Safety Buffer Configuration)
  • Chapter 05: Architecture Components, Section 6.7.3 (Oversell Prevention Rules)
  • ADR-031: Shopify Webhook + Polling Dual Sync

2.36 ADR-036: POS-Master Default for External Channels

Status: Accepted
Date: 2026-02-27
Decision Makers: Architecture Review Team
Context: When product data conflicts exist between POS and external channels, a source-of-truth must be defined.

Context

BRD v20.0 Section 6.1 establishes that the POS system is the “single source of truth for product data.” All external channels (Shopify, Amazon, Google Merchant) receive product catalog and inventory levels from the POS system. No external channel can directly modify POS inventory — all inbound changes are processed through the sync engine with conflict resolution. Section 6.7 states: “Auto-correction pushes POS quantity to the platform (POS always wins in reconciliation).”

Decision

We will use POS-master default where the POS system is the authoritative source for product data and inventory levels. External channels receive computed quantities. During reconciliation, discrepancies between POS and channel-reported quantities are resolved by pushing the POS value to the channel.

Considered Options

  1. POS-master — POS is source of truth, channels receive data from POS
  2. Channel-master — Each channel is source of truth for its own data
  3. Bidirectional merge — Changes from any source merged via conflict resolution
  4. Last-write-wins — Most recent change from any source wins

Decision Outcome

Chosen: POS-master because the physical store is where inventory physically exists. The POS system tracks every stock movement (sale, return, adjustment, transfer, count, receiving) with a complete audit trail. External channels may report stale or incorrect quantities due to sync delays, customer cancellations, or platform glitches. POS-master ensures one source of truth for financial reporting and inventory accuracy.

Trade-offs

Pros:

  • Single source of truth — no ambiguity about correct inventory levels
  • Reconciliation is deterministic — POS always wins, no merge conflicts
  • Financial reports based on POS data (auditable, event-sourced)
  • Protects against external platform data corruption or unauthorized changes
  • Simplifies sync architecture — one-way authority, bidirectional data flow

Cons:

  • Shopify admin inventory adjustments are overwritten at next reconciliation
  • Staff must make all inventory changes in the POS system, not in external platforms
  • If POS data is incorrect, the error propagates to all channels
  • External-only inventory (e.g., FBA stock managed by Amazon) must be handled as read-only exception

References

  • Chapter 05: Architecture Components, Section 6.1 (Integration Overview)
  • Chapter 05: Architecture Components, Section 6.7 (Cross-Platform Inventory Sync)
  • ADR-035: Channel Safety Buffer Calculation

2.37 ADR-037: Offline Conflict Resolution via CRDTs

SUPERSEDED: This ADR has been superseded by ADR-048 (Online-First with Offline Fallback). CRDTs were eliminated in v6.2.0. This record is preserved for historical context.

Status: Superseded (by ADR-048)
Date: 2026-02-27
Decision Makers: Architecture Review Team
Context: When multiple POS terminals operate offline simultaneously, their local changes must merge without data loss when connectivity is restored.

Context

Chapter 04, Section L.10A.1H defines CRDTs (Conflict-free Replicated Data Types) as the merge strategy for offline POS terminals. The traditional sync problem: Terminal A sells 5 units offline (local: 95), Terminal B receives shipment +20 offline (local: 120) — neither 95 nor 120 is correct; the answer is 115. CRDTs solve this by tracking operations, not state.

Four CRDT types are used: PN-Counter for inventory levels (+/-), LWW-Register for price updates (highest timestamp wins), OR-Set for cart items and discounts (union with tombstones), and G-Counter for transaction counts (sum all increments). Sales themselves are conflict-free by nature (append-only events with unique IDs).

Decision

We will use CRDTs for offline data merge alongside append-only event queuing for sales. PN-Counters track inventory, LWW-Registers track last-modified data, OR-Sets track collections, and G-Counters track monotonic counts. This is complementary to ADR-015 (Queue-and-Sync).
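
A minimal PN-Counter sketch showing why concurrent offline changes converge (per-terminal increment and decrement maps merged by taking the per-terminal maximum); all names are illustrative.

    // Per-terminal increment/decrement maps; merging takes the max per terminal, so replaying
    // or reordering sync messages cannot double-count.
    interface PNCounter {
      increments: Record<string, number>;   // terminalId -> total units added (receives, returns)
      decrements: Record<string, number>;   // terminalId -> total units removed (sales)
    }

    const total = (m: Record<string, number>) => Object.values(m).reduce((a, b) => a + b, 0);

    function value(counter: PNCounter): number {
      return total(counter.increments) - total(counter.decrements);
    }

    function merge(a: PNCounter, b: PNCounter): PNCounter {
      const mergeMap = (x: Record<string, number>, y: Record<string, number>) => {
        const out: Record<string, number> = { ...x };
        for (const [k, v] of Object.entries(y)) out[k] = Math.max(out[k] ?? 0, v);
        return out;
      };
      return {
        increments: mergeMap(a.increments, b.increments),
        decrements: mergeMap(a.decrements, b.decrements),
      };
    }
    // Context example: starting at 100, Terminal A records 5 decrements (sales) and Terminal B
    // records 20 increments (shipment); after merge the converged delta is +15, i.e. 115 units.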

Considered Options

  1. Last-write-wins globally — Most recent change overwrites all others
  2. Server-authoritative — Server state overwrites all offline changes
  3. Operational transforms (OT) — Transform operations based on concurrent edits
  4. CRDTs — Mathematically guaranteed convergence without coordination

Decision Outcome

Chosen: CRDTs because they are mathematically guaranteed to converge regardless of message ordering, duplication, or network partition duration. PN-Counters are particularly suited for inventory (sum increments, sum decrements, compute net). Sales events are inherently conflict-free (append-only with unique IDs), so CRDTs complement rather than replace event sourcing.

Trade-offs

Pros:

  • Mathematically guaranteed convergence — no coordination required between terminals
  • PN-Counter correctly handles concurrent sales and receives (example: 100 - 5 + 20 = 115)
  • LWW-Register handles price updates with deterministic resolution (highest timestamp)
  • OR-Set handles cart item additions/removals with tombstone-based conflict resolution
  • No data loss — all offline operations are preserved and merged

Cons:

  • CRDT implementation adds complexity to the sync layer
  • PN-Counters can temporarily show incorrect inventory until all terminals sync
  • OR-Set tombstones require periodic compaction (7-day TTL)
  • Development team must understand CRDT semantics for correct implementation
  • MV-Register (for customer preferences) keeps all concurrent values — may need manual resolution

References

  • Chapter 04: Architecture Styles, Section L.10A.1H (CRDTs)
  • ADR-015: Offline Sync Strategy (Queue-and-Sync with CRDTs)
  • ADR-002: Offline-First POS Architecture

2.38 ADR-038: Transactional Outbox for Event Publishing

Status: Accepted
Date: 2026-02-27
Decision Makers: Architecture Review Team
Context: Domain events must be reliably published to downstream consumers (Socket.io, webhooks, sync engine) without losing events or creating inconsistency.

Context

Chapter 04, Section L.4A defines the Transactional Outbox pattern: business data and outbox event are written atomically in the same database transaction. A relay process polls the outbox and publishes events, guaranteeing at-least-once delivery without distributed transactions. The event_outbox table (Section L.4A.1) stores: event_id, destination (socketio/webhook/sync), status (pending/processed), attempts, last_error, and timestamps.

This eliminates the dual-write problem: if the application writes to the database but the event publish fails (or vice versa), data and events become inconsistent. The outbox ensures both succeed or both fail within the same DB transaction.

Decision

We will use a Transactional Outbox pattern with a PostgreSQL event_outbox table. Domain events are written to the outbox in the same transaction as the business data. A background relay polls the outbox and publishes events to destinations (Socket.io rooms, webhook endpoints, sync engine).
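
A hedged sketch of both halves follows, assuming hypothetical Prisma models sale and eventOutbox whose columns mirror the event_outbox description above; model and field names are illustrative, not the platform's actual schema.

// Write path: business row and outbox row commit or roll back together.
import { PrismaClient } from '@prisma/client';
import { randomUUID } from 'node:crypto';

const prisma = new PrismaClient();

async function completeSale(tenantId: string, total: number) {
  return prisma.$transaction(async (tx) => {
    const sale = await tx.sale.create({ data: { tenantId, total } });
    await tx.eventOutbox.create({
      data: {
        eventId: randomUUID(),
        destination: 'socketio',            // socketio | webhook | sync
        status: 'pending',
        attempts: 0,
        payload: { type: 'sale.completed', saleId: sale.id, tenantId },
      },
    });
    return sale;
  });
}

// Relay loop (simplified): poll pending events, publish, mark processed or record the error.
async function relayOnce(publish: (payload: unknown) => Promise<void>) {
  const pending = await prisma.eventOutbox.findMany({
    where: { status: 'pending' },
    orderBy: { createdAt: 'asc' },
    take: 100,
  });
  for (const event of pending) {
    try {
      await publish(event.payload);
      await prisma.eventOutbox.update({
        where: { eventId: event.eventId },
        data: { status: 'processed' },
      });
    } catch (err) {
      await prisma.eventOutbox.update({
        where: { eventId: event.eventId },
        data: { attempts: { increment: 1 }, lastError: String(err) },
      });
    }
  }
}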

Considered Options

  1. Publish-then-write — Publish event first, then write to database (lost data if DB write fails)
  2. Write-then-publish — Write to database first, then publish event (lost events if publish fails)
  3. Distributed transaction (2PC) — Coordinate DB and message broker atomically (complex, slow)
  4. Transactional Outbox — Write data + event in same DB transaction, relay publishes asynchronously

Decision Outcome

Chosen: Transactional Outbox because it guarantees at-least-once delivery using only the existing PostgreSQL database — no additional message broker infrastructure required for v1.0. The outbox relay runs as a background service, polling every 1 second for pending events. Failed publications are retried with exponential backoff and eventually routed to a dead-letter table.

Trade-offs

Pros:

  • Atomic write — business data and event are guaranteed consistent
  • No additional infrastructure — uses PostgreSQL (already deployed)
  • At-least-once delivery with retry and dead-letter handling
  • Destinations are pluggable (Socket.io, webhook, sync, future Kafka)
  • Works with PostgreSQL LISTEN/NOTIFY for low-latency relay notification

Cons:

  • Polling relay adds slight latency vs. direct publish (~1 second)
  • Outbox table grows and needs periodic cleanup (processed events archived)
  • At-least-once means consumers must be idempotent (handled by idempotency framework)
  • Single relay process is a potential bottleneck (mitigated by partition-based relay in v2.0)

References

  • Chapter 04: Architecture Styles, Section L.4A.1 (Event Store & Outbox Schema)
  • Chapter 05: Architecture Components, Section 6.2.3 (Transactional Outbox)
  • ADR-003: Event Sourcing for Sales Domain

ADR-039: CQRS Boundary (Sales Domain Only)

2.39 ADR-039: CQRS Boundary (Sales Domain Only)

Status: Accepted
Date: 2026-02-27
Decision Makers: Architecture Review Team
Context: CQRS adds complexity; it should be applied only where the read/write model divergence justifies the overhead.

Context

Chapter 04, Section L.4A defines per-module CQRS scope. Module 1 (Sales) uses full CQRS with separate read/write models and Event Sourcing. Module 4 (Inventory) uses materialized read models with ES for audit trail. Modules 2 (Customers), 3 (Catalog), 5 (Setup) use standard CRUD. Module 6 (Integrations) uses audit-trail-only ES. The Sales domain has the strongest case for CQRS: financial audit requirements, offline sync via event replay, temporal queries (“what was inventory at 3pm?”), and complex read models (dashboard aggregations).

Decision

We will apply full CQRS only to the Sales domain (Module 1). All other modules use standard CRUD with optional materialized views for performance. A command/query bus dispatches commands and queries in the Sales module.
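
A minimal command-bus sketch shows the dispatch mechanism; the interfaces and the CompleteSale command are illustrative, not the platform's actual API.

// Commands mutate by appending events; queries read from projections instead.
interface Command { readonly type: string }

interface CommandHandler<C extends Command, R> {
  handle(command: C): Promise<R>;
}

class CommandBus {
  private handlers = new Map<string, CommandHandler<Command, unknown>>();

  register<C extends Command, R>(type: C['type'], handler: CommandHandler<C, R>): void {
    this.handlers.set(type, handler as unknown as CommandHandler<Command, unknown>);
  }

  async dispatch<R>(command: Command): Promise<R> {
    const handler = this.handlers.get(command.type);
    if (!handler) throw new Error(`No handler registered for ${command.type}`);
    return handler.handle(command) as Promise<R>;
  }
}

// Example Sales command (hypothetical shape).
interface CompleteSale extends Command {
  type: 'sales.CompleteSale';
  saleId: string;
  lines: Array<{ sku: string; qty: number; unitPrice: number }>;
}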

Considered Options

  1. CQRS everywhere — Full CQRS for all modules
  2. CQRS for Sales only — Full CQRS for Sales, CRUD for everything else
  3. CQRS for Sales + Inventory — Full CQRS for both financial domains
  4. No CQRS — Standard CRUD everywhere with audit logging

Decision Outcome

Chosen: CQRS for Sales only because Sales has the strongest requirement (PCI-DSS audit trail, offline event replay, complex read models for dashboards). Inventory uses a lighter pattern — materialized read models for current levels with ES for the movement audit trail, but not full CQRS command/query separation. Applying CQRS everywhere would add unnecessary complexity to simple CRUD modules like Customers and Setup.

Trade-offs

Pros:

  • Full audit trail and temporal queries for the financial domain (Sales)
  • Command/query bus dispatch provides clean separation of concerns
  • Read models optimized for dashboard queries without affecting write performance
  • Non-Sales modules remain simple CRUD — lower development and maintenance cost
  • Event replay capability for offline sync and debugging

Cons:

  • Developers must understand two patterns (CQRS for Sales, CRUD for others)
  • Read model projections must be rebuilt if projection logic changes
  • Event versioning adds complexity for Sales domain events
  • Boundary between CQRS and CRUD modules must be clearly documented

References

  • Chapter 04: Architecture Styles, Section L.4A (CQRS & Event Sourcing Scope)
  • ADR-003: Event Sourcing for Sales Domain
  • ADR-038: Transactional Outbox for Event Publishing

ADR-040: Eventual Consistency SLA (5s Online, 30min Offline)

2.40 ADR-040: Eventual Consistency SLA

Status: Accepted
Date: 2026-02-27
Decision Makers: Architecture Review Team
Context: The platform accepts eventual consistency for inventory sync. Concrete SLA targets are needed for each sync channel.

Context

Chapter 04, Section L.10A.1 establishes that the online-first architecture accepts eventual consistency for inventory sync across channels. BRD v20.0 Section 6.7.1 defines per-channel sync latency targets: Shopify <5 seconds via webhooks, Amazon FBM <2 minutes via SP-API push, Google Merchant <30 minutes (Google processing time). Reconciliation polls run at defined intervals (Shopify 15min, Amazon 30min, Google 6hr). POS terminals operating in offline fallback mode sync critical data within 30 seconds of connectivity restoration.

Decision

We will define explicit eventual consistency SLAs per sync channel with target latencies, reconciliation intervals, and maximum acceptable lag. Online POS terminals have 5-second consistency targets; offline terminals sync critical data within 30 seconds of reconnection.
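
For illustration, the per-channel targets above could be captured as a monitoring configuration; the constant name and shape are hypothetical.

// Hypothetical SLA table used by sync monitoring/alerting (values from the Context above).
interface ChannelSla {
  targetLatencySec: number;           // expected propagation latency
  reconciliationIntervalMin: number;  // periodic drift-correction poll (0 = on reconnect only)
}

const SYNC_SLA: Record<string, ChannelSla> = {
  shopify:        { targetLatencySec: 5,       reconciliationIntervalMin: 15 },
  amazonFbm:      { targetLatencySec: 120,     reconciliationIntervalMin: 30 },
  googleMerchant: { targetLatencySec: 30 * 60, reconciliationIntervalMin: 6 * 60 },
  posOffline:     { targetLatencySec: 30,      reconciliationIntervalMin: 0 },
};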

Considered Options

  1. Strong consistency — All changes immediately visible everywhere (requires always-online)
  2. Best-effort eventual — No defined SLA, sync when possible
  3. Tiered SLA per channel — Explicit targets per sync channel and data priority

Decision Outcome

Chosen: Tiered SLA per channel because different channels have fundamentally different latency characteristics and business impact. Shopify needs near-real-time to prevent overselling; Google Merchant tolerates 30-minute processing; offline POS terminals prioritize sales/payment sync over analytics. The SLA framework provides measurable targets for monitoring and alerting.

Trade-offs

Pros:

  • Measurable sync targets for monitoring and SLA alerting
  • Priority-based sync ensures critical financial data (sales, payments) syncs first
  • Channel-specific targets match actual platform capabilities
  • Reconciliation intervals catch drift before it becomes operationally significant

Cons:

  • Inventory counts may be temporarily inaccurate across channels during sync windows
  • Overselling possible during sync gaps (mitigated by safety buffers — ADR-035)
  • Monitoring infrastructure needed to track sync latency per channel
  • Offline sync queue may grow large during extended outages (capped at 100 transactions)

References

  • Chapter 04: Architecture Styles, Section L.10A.1 (Online-First with Offline Fallback)
  • Chapter 05: Architecture Components, Section 6.7.1 (Sync Latency Targets)
  • ADR-048: Online-First POS Data Strategy
  • ADR-035: Channel Safety Buffer Calculation

ADR-041: 6-Gate Security Pyramid

2.41 ADR-041: 6-Gate Security Pyramid

Status: Accepted
Date: 2026-02-27
Decision Makers: Architecture Review Team, Security Team
Context: The codebase is generated by Claude Code agents. A single security gate is insufficient for AI-generated code that processes financial transactions.

Context

Chapter 04, Section L.8 identifies that AI-generated code requires defense-in-depth security validation. A single SonarQube gate cannot catch missing authorization checks, incorrect OAuth implementation, SAQ-A violations, architecture drift, or insecure CORS/CSP headers. The platform processes PCI-scoped financial transactions and stores encrypted credentials for 6 external provider families.

The 6-Gate Security Pyramid provides layered verification: SAST (Gate 1), SCA + SBOM (Gate 2), Secrets Detection (Gate 3), Architecture Conformance (Gate 4), Contract Tests (Gate 5), and Manual Security Review (Gate 6). All 6 gates block merge. FIM via Wazuh monitors deployed systems.

Decision

We will implement a 6-Gate Security Test Pyramid in the CI/CD pipeline where all 6 gates must pass before code can be merged to the main branch.

Considered Options

  1. Single SAST gate — SonarQube/CodeQL only
  2. SAST + SCA — Static analysis plus dependency scanning
  3. Cloud security suite — Snyk/Datadog full platform (vendor-dependent)
  4. 6-Gate Pyramid — Layered security with SAST, SCA, Secrets, ArchUnit, Pact, Manual

Decision Outcome

Chosen: 6-Gate Pyramid because each gate catches different vulnerability classes that others miss. SAST finds code-level bugs; SCA finds vulnerable dependencies; Secrets Detection finds leaked credentials; Architecture Conformance prevents module boundary violations; Contract Tests verify external API behavior; Manual Review covers security-critical paths that automated tools cannot fully validate. All gates are merge-blocking.

Trade-offs

Pros:

  • Defense-in-depth — 6 independent verification layers
  • SBOM generation (Gate 2) satisfies PCI-DSS 4.0 Req 6.3.2
  • Architecture Conformance (Gate 4) prevents Module 6 from accessing Module 1 internals
  • Contract Tests (Gate 5) verify Shopify/Amazon/Google sandbox API behavior
  • Manual Review (Gate 6) provides human oversight for payment and credential flows

Cons:

  • 6 gates add CI/CD pipeline time (mitigated by parallel execution of Gates 1-4)
  • Manual Review (Gate 6) creates human bottleneck for security-tagged PRs
  • Must maintain ArchUnit rules and Pact contracts as system evolves
  • Tooling cost: SonarQube, Snyk, GitLeaks, ArchUnit, Pact licenses

References

  • Chapter 04: Architecture Styles, Section L.8 (Security & Compliance Strategy)
  • Security Compliance chapter (planned future rewrite; currently covered by Ch 04: Architecture Styles, Section L.8)
  • ADR-019: SAQ-A Semi-Integrated Payment Scope

ADR-042: [REMOVED — Duplicate of ADR-017]

This ADR was removed in v6.1.0. The E2E testing strategy (Playwright + k6) is fully covered by ADR-017: Test Strategy (Layered Testing Pyramid). Consolidating to avoid duplicated guidance.


ADR-043: [REMOVED — Duplicate of ADR-012]

This ADR was removed in v6.1.0. The LGTM Observability Stack is fully covered by ADR-012: Logging & Monitoring (LGTM Stack). Consolidating to avoid duplicated guidance.


ADR-044: API Performance Targets

2.44 ADR-044: API Performance Targets

Status: Accepted
Date: 2026-02-27
Decision Makers: Architecture Review Team
Context: The POS API must meet specific latency targets to ensure responsive checkout and withstand peak retail traffic.

Context

Chapter 04, Section L.6 defines the Black Friday load testing scenario: 500 concurrent users, 1000 TPS target, p99 latency <500ms over 30 minutes. The API Gateway processes requests through 5 stages: rate limiting (100 req/min/client), JWT authentication, tenant resolution, request logging, and route dispatch. Redis caching provides sub-millisecond reads for product catalog, tax rates, and tenant configuration during checkout.

The POS checkout path is latency-critical: cashiers expect instant response to item scan, price lookup, and payment initiation. Non-checkout paths (reporting, configuration) have relaxed targets.

Decision

We will define explicit API performance targets with p99 latency budgets per endpoint category, validated by k6 load testing in CI/CD.

Performance Targets:

Endpoint Category               p99 Target   Measurement
Checkout (item scan, payment)   < 500ms      k6 load test
Product lookup (cached)         < 100ms      Redis cache hit
Inventory query                 < 200ms      Materialized read model
Reporting / Dashboard           < 2s         Acceptable for non-interactive
Webhook processing              < 5s         Shopify, Amazon inbound
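
A hedged k6 sketch (valid TypeScript with @types/k6) that enforces the checkout budget; the host, endpoint path, and payload are placeholders, not the platform's actual routes.

// Black Friday scenario: 500 VUs for 30 minutes, p99 checkout latency under 500ms.
import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  vus: 500,
  duration: '30m',
  thresholds: {
    http_req_duration: ['p(99)<500'],   // fails the run if the budget is exceeded
  },
};

export default function () {
  const res = http.post(
    'https://api.example.com/v1/checkout/scan',
    JSON.stringify({ sku: 'SKU-123', qty: 1 }),
    { headers: { 'Content-Type': 'application/json' } },
  );
  check(res, { 'status is 200': (r) => r.status === 200 });
  sleep(1);
}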

Considered Options

  1. No defined targets — Optimize as needed based on user complaints
  2. Single global target — One latency target for all endpoints
  3. Tiered targets by category — Different targets for checkout vs. reporting vs. webhook

Decision Outcome

Chosen: Tiered targets by category because checkout latency directly impacts cashier productivity and customer experience, while reporting and dashboard queries are inherently slower and non-blocking. The k6 load testing framework (ADR-017) validates these targets on every release candidate.

Trade-offs

Pros:

  • Clear, measurable targets for development teams
  • k6 load tests enforce targets in CI/CD — performance regressions caught before deployment
  • Redis caching ensures sub-100ms product lookups during checkout
  • Tiered approach avoids over-engineering low-priority endpoints

Cons:

  • Must maintain k6 test scripts as API evolves
  • Load testing requires dedicated environment (resource cost)
  • Targets may need revision as user base scales
  • p99 targets require careful measurement methodology (warm-up periods, steady state)

References

  • Chapter 04: Architecture Styles, Section L.6 (Load Testing)
  • Chapter 04: Architecture Styles, Section L.9A (System Architecture)
  • ADR-009: Redis for Session & Cache
  • ADR-017: Test Strategy (Layered Testing Pyramid)

ADR-045: Blue-Green Deployment Strategy

2.45 ADR-045: Blue-Green Deployment Strategy

Status: Accepted
Date: 2026-02-27
Decision Makers: Architecture Review Team, Infrastructure Team
Context: Deployments of the Central API must not disrupt active POS terminals, ongoing payment transactions, or integration sync operations.

Context

The Architecture Styles Review (Upload/Architecture-Styles-Review.md, Finding HIGH-5) identified that no deployment strategy was specified. A failed deployment could break inventory sync across all channels for all tenants. BRD Section 6.7.5 mandates channel freeze after 2 hours of sync failure. The modular monolith architecture means the entire Central API deploys as a single unit — a failed deployment affects all modules simultaneously.

Database migration rollback, integration freeze procedures, and health check-based automatic rollback are required for safe deployments.

Decision

We will use blue-green deployment with automatic rollback on health check failure. The load balancer switches traffic from the current (blue) environment to the new (green) environment only after health checks pass. If health checks fail, traffic automatically routes back to blue.

Considered Options

  1. Rolling update — Gradually replace instances (risk of mixed-version routing)
  2. Canary deployment — Route small percentage to new version, gradually increase
  3. Blue-green deployment — Full parallel environment with instant switchover
  4. Feature flags only — Deploy code but gate features behind flags

Decision Outcome

Chosen: Blue-green deployment because the modular monolith deploys as a single unit, making canary (partial routing) complex without microservices boundaries. Blue-green provides instant rollback by switching the load balancer back to the previous environment. Database migrations must be backward-compatible (expand-then-contract pattern) so both blue and green can run against the same database schema during transition.

Trade-offs

Pros:

  • Instant rollback — switch load balancer back to previous environment
  • Zero-downtime deployment — green environment validated before receiving traffic
  • Health check validation before cutover (API, database connectivity, Redis, integration endpoints)
  • Full environment parity — green runs the same infrastructure as blue
  • Simplifies post-deployment verification — green serves all traffic or none

Cons:

  • Requires 2x infrastructure during deployment window (cost)
  • Database migrations must be backward-compatible (expand-then-contract)
  • Long-running transactions during switchover may be interrupted
  • POS terminal WebSocket (Socket.io) connections must reconnect after switchover
  • Integration webhook endpoints must handle brief unavailability during DNS propagation

References

  • Architecture Styles Review, Finding HIGH-5 (deployment strategy gap)
  • Chapter 04: Architecture Styles, Section L.9A (System Architecture)
  • Deployment Guide chapter (planned future rewrite; deployment is currently covered by Ch 04: Architecture Styles, Section L.9A)

ADR-046: Nexus Dual Deployment Architecture (Tauri Desktop + Web App)

Superseded by ADR-052: The dual deployment architecture (Tauri desktop + separate web admin) has been replaced by a unified React web application. “Nexus POS” is now a single web app with role-based navigation. Hardware peripherals use web protocols (Star WebPRNT, USB HID, Stripe Terminal SDK) instead of Tauri Rust commands. See ADR-052.

2.46 ADR-046: Nexus Dual Deployment Architecture

Status: SUPERSEDED (by ADR-052: Unified Web Application)
Date: 2026-02-28
Decision Makers: Architecture Review Team
Context: The platform needs both a desktop POS application (store terminals with hardware access and offline capability) and a web-based admin interface (browser-based management). Previously these were separate applications with separate codebases (ADR-007: Blazor Server admin, ADR-008: .NET MAUI POS). This created duplicated UI development and inconsistent UX.

Context

The platform requires two deployment targets for its user interface: (1) a desktop POS application running on store terminals with hardware access (receipt printers, barcode scanners, cash drawers), offline-first SQLite storage, and sync capability; and (2) a web-based administration interface for tenant managers to configure products, employees, locations, integrations, and view reports from any browser.

Previously, these were planned as separate applications with separate codebases (ADR-007: Blazor Server for Admin Portal, ADR-008: .NET MAUI for POS Client). This approach would have duplicated UI components, state management logic, and design system implementation across two different frameworks.

With the tech stack pivot to TypeScript (ADR-006), both targets can share a single React/TypeScript codebase — deployed as a Tauri 2.0 desktop app for POS terminals and as a standard React web app for admin browser access.

Decision

We will use a single React/TypeScript codebase deployed in two modes:

  • Nexus POS (Tauri 2.0 desktop): For store terminals. Includes hardware access (printers, scanners, drawers via Tauri commands), local SQLite database (better-sqlite3), offline-first capability with sync queue.
  • Nexus Admin (React web app): For administrator browser access. Standard React SPA served via CDN or Central API static hosting. No hardware access needed, always-online, connects directly to Central API.

Product naming: Nexus POS (desktop), Nexus Admin (web), Nexus Raptag (mobile RFID).

Considered Options

  1. Separate codebases — Different frameworks for desktop (Tauri) and web (Next.js/React)
  2. Single codebase with dual deployment — Same React app, Tauri wraps for desktop, deployed as web for admin
  3. Desktop-only with remote access — All users including admins use Tauri app

Decision Outcome

Chosen: Single codebase with dual deployment because it reduces UI development by ~40%, ensures consistent UX between POS and admin, and shares all components, routing, and state management. Hardware-dependent features are abstracted behind isTauri() runtime checks. Admin-only and POS-only routes use role-based code splitting.

Trade-offs

Pros:

  • Single React component library — design once, deploy twice
  • Shared state management (React Query + Zustand) across both targets
  • Consistent UX — admin and POS share visual language
  • Conditional hardware features via Tauri API detection (window.__TAURI__)
  • One design system (TailwindCSS + shadcn/ui or Radix UI)
  • Shared authentication flow — JWT tokens work identically in both targets

Cons:

  • Must carefully abstract hardware-dependent code behind feature checks
  • Some admin-only views (reporting, user management) not needed on POS (managed via route-based code splitting and lazy loading)
  • Tauri-specific Rust commands need separate build pipeline alongside TypeScript
  • Web and desktop share the same online-first data strategy (React Query → Central API) but desktop adds a thin offline fallback (2-table SQLite: product cache + sales queue). The abstraction layer detects connectivity state and routes transparently (see ADR-048).

Implementation Risks

  1. Testing surface doubles — Every data hook needs testing in both Tauri and web mode. Mitigation: CI runs test suite with isTauri() mocked to both true and false.
  2. Feature drift — POS gets hardware features, Admin gets reporting dashboards, shared codebase becomes conditional-heavy. Mitigation: Route-based code splitting, lazy loading, shared components must never import platform-specific code.
  3. Offline cache staleness — SQLite product cache (2 tables) may have stale prices during brief outages. Mitigation: Flag-on-sync detects price discrepancies; cache shows last_refreshed warning after 1 hour offline (see ADR-048, L.10A.1E).
  4. Tauri Rust command maintenance — Custom Rust commands for hardware access need Rust-capable developers. Mitigation: Limit custom commands to thin wrappers; most hardware access via established Tauri plugins.

Supersedes

  • ADR-007: Admin Portal Framework (Blazor Server) — the separate Admin Portal has been eliminated; administration is now integrated into the Nexus web application
  • ADR-013: RFID Configuration in Tenant Admin Portal — the “Admin Portal” concept has been replaced by Nexus Admin; RFID configuration is accessed via Nexus Admin > Settings > RFID section

References

  • ADR-008: POS Client Framework (Tauri 2.0 + React/TypeScript)
  • ADR-006: Node.js + TypeScript for Central API
  • ADR-047: Raptag Mobile Framework (React Native)
  • ADR-048: Online-First POS Data Strategy

ADR-047: Raptag Mobile Framework — React Native

2.47 ADR-047: Raptag Mobile Framework

Status: Accepted
Date: 2026-02-28
Decision Makers: Architecture Review Team
Context: The Nexus Raptag RFID counting app runs on Android mobile devices with Zebra RFID readers (TC21/TC26 with RFD40 sleds). It requires Zebra RFID SDK integration, offline-first SQLite storage, barcode scanning, and sync with the Central API. With the tech stack pivot to TypeScript (ADR-006), the mobile framework should align with the unified language strategy.

Context

The Nexus Raptag mobile app is a dedicated RFID inventory counting application used by store staff on Android handheld devices (Zebra TC21/TC26 with RFD40 RFID sleds). It requires integration with the Zebra RFID SDK for bulk tag reading (40+ tags/second), offline-first SQLite storage for counting sessions, barcode scanning for item lookup, and background sync with the Central API for uploading count results.

With the platform standardized on TypeScript (Central API via Node.js, Nexus POS via React per ADR-052), the mobile framework should maintain the unified language strategy to enable code sharing and reduce developer context-switching.

Decision

We will use React Native with Expo for the Nexus Raptag mobile RFID app.

Considered Options

  1. .NET MAUI — Cross-platform .NET mobile framework (rejected: different language ecosystem from TypeScript stack)
  2. React Native + Expo — TypeScript-based mobile framework with native module support (chosen)
  3. Flutter — Dart-based cross-platform framework (rejected: Dart language breaks TypeScript unity)
  4. Kotlin native — Android-only native development (rejected: no code sharing with web/desktop)

Decision Outcome

Chosen: React Native with Expo because it maintains TypeScript as the unified language across the entire platform, enables sharing of business logic, domain types, and validation schemas with the Central API via npm packages, and provides Expo OTA updates for pushing RFID configuration changes to field devices without app store review cycles.

Trade-offs

Pros:

  • Unified TypeScript — same types, validators (Zod), and API client shared with Central API
  • Expo OTA updates — critical for deploying RFID configuration changes to field devices without app store review
  • Shared npm packages — domain models, API types, validation schemas reused across all platform clients
  • React component patterns familiar to Nexus POS developers — reduced learning curve
  • Hot reload during development — fast iteration on scanning UI
  • React Native New Architecture (Fabric + TurboModules) provides near-native performance for scanning UI

Cons:

  • Zebra RFID SDK bridge requires native Java/Kotlin module maintenance
  • React Native performance adequate for scanning UI but not compute-heavy tasks (acceptable — RFID counting is I/O-bound)
  • Deep Zebra hardware integration requires custom native modules beyond managed Expo (handled with an Expo Dev Client build; no full ejection required)
  • Larger APK size than pure native (~30MB vs ~10MB) — acceptable for enterprise devices

References

  • ADR-027: RFID Counting-Only Scope
  • ADR-052: Unified Web Application (Nexus POS)
  • Ch 05: Architecture Components, Section 5.16 (RFID Counting Subsystem)

ADR-048: Online-First POS Data Strategy

2.48 ADR-048: Online-First POS Data Strategy

Status: Accepted
Date: 2026-03-01
Decision Makers: Architecture Review Team
Context: The Blueprint originally specified offline-first architecture for POS terminals (ADR-002) — 6-table SQLite cache, CRDTs for conflict resolution, sync queue with priority tiers, platform-aware data hooks. Through structured analysis of the target retail environment, this was found to create daily complexity for a scenario (internet outages) that occurs minutes per year.

Context

The POS platform’s data architecture must address two concerns: (1) how POS terminals access product, inventory, and customer data during normal operation; and (2) how sales continue during internet outages.

ADR-002 chose an offline-first strategy where all POS operations run against a local SQLite database first, with background sync to the Central API. This required 6 SQLite tables, CRDTs for conflict resolution, platform-aware data hooks (useLocalFirst() vs useAPI()), sync priority tiers, and complex conflict resolution logic.

Analysis of the target market revealed:

  • Internet outages are rare and brief (minutes per year) for target tenants
  • Near real-time sync (1-5 minutes) is required for Shopify inventory accuracy
  • Immediate config propagation is expected (admin saves → POS reflects within seconds)
  • Integration flows (Shopify, Amazon, Google Merchant) are simpler when all data flows through the Central API in real-time
  • The offline-first architecture doubles the testing surface (every data hook tested in both Tauri and web mode) and introduces daily consistency gaps (stale caches, integration timing) for a scenario that barely occurs

Decision

We will use an online-first data strategy for POS terminals, with a thin offline safety net:

  • ONLINE (99.99% of time): Nexus POS reads/writes directly to the Central API via React Query. WebSocket push delivers real-time config updates and inventory changes. React Query’s in-memory cache provides instant lookups for recently-scanned products.
  • OFFLINE (rare, brief): Nexus POS falls back to a 2-table SQLite WASM store (sql.js/wa-sqlite + OPFS) — a read-only product cache for pricing lookups and an append-only sales queue for transactions. Sales never stop.

Nexus POS uses React Query → Central API for all data access. The SQLite WASM offline fallback is a thin safety net — not a separate data layer.

Product names: Nexus POS (unified web app, ADR-052), Nexus Raptag (mobile RFID — retains full offline-first per ADR-047, as RFID counting sessions are legitimately disconnected).

Considered Options

  1. Keep offline-first (ADR-002 status quo) — 6-table SQLite, CRDTs, platform-aware hooks, sync priority tiers
  2. Online-first with thin offline fallback — React Query → API, 2-table SQLite WASM (product cache + sales queue), no CRDTs
  3. Online-only (no offline capability) — reject sales during outages; simplest but unacceptable for retail

Decision Outcome

Chosen: Online-first with thin offline fallback because it optimizes for the 99.99% online case (immediate data consistency, simpler integrations, unified data access) while preserving sales continuity for the rare offline scenario. Eliminates CRDTs, reduces SQLite from 6 to 2 tables, removes platform-aware data hooks, and simplifies the sync layer from priority-tiered event queue to simple FIFO sales flush. SQLite runs in the browser via WASM (sql.js/wa-sqlite) with OPFS for persistence.

3-State Connection Monitor

The POS terminal uses a layered detection system to determine connectivity state:

State      Detection                                      Behavior
ONLINE     WebSocket connected + health ping OK           All reads/writes → Central API via React Query
DEGRADED   WebSocket dropped, health ping intermittent    Reads: try API (2s timeout) → fallback to SQLite cache. Writes: API + local backup queue
OFFLINE    3 consecutive health pings fail (~15 sec)      Reads: SQLite product cache. Writes: local sales queue only

Detection layers (fastest → most reliable):

  1. Socket.io events — instant connect/disconnect signals
  2. Health ping — HTTP GET /health every 5 seconds (catches stale WebSocket)
  3. navigator.onLine — Browser API (instant hint, verified by health ping)

The DEGRADED state prevents rapid flapping between ONLINE and OFFLINE during spotty internet. Components never observe the connection state directly — the data access layer routes transparently.
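
A hedged TypeScript sketch of the monitor, assuming a /health endpoint and standard socket.io-client events; the host, thresholds, and event wiring mirror the description above but are illustrative.

import { io } from 'socket.io-client';

type ConnectionState = 'ONLINE' | 'DEGRADED' | 'OFFLINE';

let state: ConnectionState = 'ONLINE';
let failedPings = 0;

const socket = io('https://api.example.com', { transports: ['websocket', 'polling'] });

socket.on('disconnect', () => setState('DEGRADED'));                 // instant hint
socket.on('connect', () => { failedPings = 0; setState('ONLINE'); });

// Authoritative check: HTTP health ping every 5 seconds with a 2-second timeout.
async function healthPing(): Promise<void> {
  try {
    const res = await fetch('https://api.example.com/health', { signal: AbortSignal.timeout(2000) });
    if (!res.ok) throw new Error(`status ${res.status}`);
    failedPings = 0;
    setState(socket.connected ? 'ONLINE' : 'DEGRADED');
  } catch {
    failedPings += 1;
    setState(failedPings >= 3 ? 'OFFLINE' : 'DEGRADED');             // 3 misses ≈ 15 seconds
  }
}

function setState(next: ConnectionState): void {
  if (next !== state) {
    state = next;
    // The data access layer subscribes to this transition and reroutes reads/writes;
    // UI components never read the connection state directly.
  }
}

setInterval(healthPing, 5_000);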

SQLite Schema (2 Tables)

product_cache — Read-only, server-authoritative (SQLite WASM via OPFS):

  • Pre-warmed on Nexus POS startup (full catalog download in background)
  • Updated incrementally via WebSocket push events (product.updated, product.created)
  • Never written to by Nexus POS — only by sync from Central API
  • Includes last_refreshed timestamp for staleness detection

sales_queue — Append-only, offline transactions (SQLite WASM via OPFS):

  • Written only during OFFLINE/DEGRADED states
  • Each sale has a UUID (sale_id) for idempotent processing
  • Flushed to Central API on recovery (FIFO order, oldest first)
  • API uses sale_id for upsert — safe to retry partial flushes
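
A minimal sketch of the two tables using sql.js (SQLite WASM); columns beyond those described above are assumptions, and the OPFS persistence wiring is omitted.

import initSqlJs from 'sql.js';

async function createOfflineStore() {
  const SQL = await initSqlJs();
  const db = new SQL.Database();

  // Read-only catalog snapshot, written only by sync from the Central API.
  db.run(`CREATE TABLE IF NOT EXISTS product_cache (
    sku            TEXT PRIMARY KEY,
    name           TEXT NOT NULL,
    unit_price     REAL NOT NULL,
    last_refreshed TEXT NOT NULL      -- staleness detection (>1h shows the outdated-data banner)
  )`);

  // Append-only queue of offline sales, flushed FIFO on reconnection.
  db.run(`CREATE TABLE IF NOT EXISTS sales_queue (
    sale_id    TEXT PRIMARY KEY,      -- UUID, the idempotency key for upsert on flush
    payload    TEXT NOT NULL,         -- serialized sale (lines, totals, payment)
    created_at TEXT NOT NULL DEFAULT (datetime('now'))
  )`);

  return db;
}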

Recovery Sequence

When connectivity restores (OFFLINE → DEGRADED → ONLINE):

  1. Flush sales queue — POST each queued sale to Central API (oldest first, in order)
  2. Idempotent processing — API uses sale_id UUID for upsert; partial retries are safe
  3. Wait for confirmations — each sale acknowledged before moving to next
  4. Refresh product cache — prices/inventory may have changed during outage
  5. Resume WebSocket — re-subscribe to real-time push events
  6. Switch to API mode — data layer routes all reads/writes through Central API

Cashier sees: status indicator transitions from red (offline) → yellow (flushing) → green (online). No manual action required.
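
A hedged sketch of the FIFO flush; the store interface, endpoint, and field names are illustrative rather than the platform's actual API.

interface QueuedSale { saleId: string; payload: unknown }

interface OfflineStore {
  getQueuedSales(): QueuedSale[];               // oldest first (FIFO)
  removeQueuedSale(saleId: string): void;
  refreshProductCache(): Promise<void>;
}

async function flushSalesQueue(store: OfflineStore, apiBase: string): Promise<boolean> {
  for (const sale of store.getQueuedSales()) {
    const res = await fetch(`${apiBase}/v1/sales`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      // sale_id is the idempotency key, so partial retries are safe (the API upserts).
      body: JSON.stringify({ sale_id: sale.saleId, sale: sale.payload }),
    });
    if (!res.ok) return false;                  // stop; resume later from the same position
    store.removeQueuedSale(sale.saleId);        // acknowledged, safe to drop locally
  }
  await store.refreshProductCache();            // prices/inventory may have changed during the outage
  return true;                                  // caller re-subscribes WebSocket and switches to API mode
}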

Price Discrepancy Handling (Flag-on-Sync)

When offline sales sync, the API compares sale.unit_price against product.current_price:

  • If prices match: sale accepted normally
  • If prices differ: sale accepted but flagged as price_discrepancy: true with sold_price, current_price, and difference recorded
  • Admin sees a “Price Discrepancies” alert in Nexus POS (MANAGER+ role) with options to issue a credit or dismiss
  • Additional safeguard: if the product cache is older than 1 hour during offline mode, the POS shows a subtle banner: “Product data may be outdated”

Shopify Inventory Protection (Safety Buffers)

The existing BRD safety buffer mechanism (Section 6.x) protects against overselling during outages:

  • Channel Available = POS Available - Safety Buffer
  • Buffer absorbs the discrepancy window during brief outages (configurable per product category, default 2-3 units)
  • On recovery: queue flush → inventory adjusted → integration layer immediately pushes corrected counts to Shopify/Amazon/Google Merchant
  • Overselling requires: outage + in-store sales exceeding buffer + simultaneous online sales on same SKU (extremely unlikely given minutes/year outages)

Trade-offs

Pros:

  • Eliminates CRDTs — no two-way merge needed (cache is read-only, queue is append-only)
  • Reduces SQLite from 6 tables to 2 — dramatically simpler local schema
  • Removes platform-aware data hooks — Nexus POS uses React Query → API uniformly
  • Simplifies sync from priority-tiered event queue to simple FIFO sales flush
  • Real-time config propagation — admin changes reflected on POS within seconds via WebSocket
  • Simpler integration flows — all data flows through Central API; Shopify/Amazon see real-time inventory
  • Reduced testing surface — single web deployment target, no platform-conditional code paths for data access

Cons:

  • API latency affects scan speed when online (mitigated by React Query in-memory cache + Redis server-side cache; <200ms for simple lookups)
  • Central API becomes a hard dependency when online (mitigated by 3-state fallback; SQLite WASM takes over within 15 seconds)
  • Product cache may be stale during offline periods (mitigated by flag-on-sync + staleness warning)
  • Does not support extended offline operation (hours/days) as gracefully as offline-first (accepted: target market has reliable internet)
  • SQLite WASM has slightly higher overhead than native better-sqlite3 (acceptable for the offline fallback use case; OPFS provides persistence)

Note on Raptag: This ADR does not change the Nexus Raptag mobile app’s data strategy. RFID counting sessions are legitimately disconnected (warehouse floor, intermittent device connectivity) and retain full offline-first per ADR-047.

Supersedes

  • ADR-002: Offline-First POS Architecture — replaced by online-first with offline fallback

References

  • ADR-002: Offline-First POS Architecture (superseded)
  • ADR-052: Unified Web Application (Nexus POS is now a single React web app)
  • ADR-047: Raptag Mobile Framework (React Native) — retains offline-first
  • Ch 04: Architecture Styles, Section L.10A.1 (Online-First with Offline Fallback)
  • Ch 05: Architecture Components, Section 6.x (Safety Buffers for Channel Inventory)

ADR-049: Real-Time Transport — Socket.io

2.49 ADR-049: Real-Time Transport — Socket.io

Status: Accepted
Date: 2026-03-01
Decision Makers: Architecture Review Team
Context: The 3-state connection monitor (ONLINE/DEGRADED/OFFLINE from ADR-048) needs a real-time transport for server-push updates, connection health heartbeats, and multi-device coordination. Nexus POS is a unified web application (ADR-052).

Context

ADR-048 defines a 3-state connection monitor (ONLINE/DEGRADED/OFFLINE) that requires a real-time transport for: (1) server-push price and inventory updates to Nexus POS, (2) connection health heartbeats that feed the DEGRADED state detection, and (3) multi-device coordination such as register locking (preventing two users from operating the same register simultaneously). The transport must work reliably in retail network environments and gracefully handle intermittent connectivity.

Decision

We will use Socket.io with WebSocket as the primary transport and HTTP long-polling as the automatic fallback.

Considered Options

  1. Socket.io — Bidirectional, room-based, auto-reconnect, transport fallback
  2. Server-Sent Events (SSE) — Unidirectional server-to-client push over HTTP
  3. Raw WebSocket — Native browser WebSocket API without abstraction layer
  4. Long Polling — Periodic HTTP requests simulating real-time

Decision Outcome

Chosen: Socket.io because it provides bidirectional communication (needed for register lock/unlock commands from server to client), built-in reconnection with exponential backoff (critical for 3-state monitor DEGRADED detection), room-based broadcasting (per-tenant, per-location event routing), and automatic transport fallback (WebSocket → HTTP long-polling) for restrictive network environments.
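
A hedged client-side sketch showing reconnection, transport fallback, and room subscription with socket.io-client; the room names, event names, and token handling are assumptions.

import { io } from 'socket.io-client';

const socket = io('https://api.example.com', {
  transports: ['websocket', 'polling'],   // WebSocket first, HTTP long-polling fallback
  reconnection: true,
  reconnectionDelay: 1_000,
  reconnectionDelayMax: 30_000,           // exponential backoff cap
  auth: { token: localStorage.getItem('jwt') ?? '' },
});

// Join tenant/location rooms once the server has authenticated the connection.
socket.on('connect', () => {
  socket.emit('subscribe', { tenantId: 'tenant-123', locationId: 'store-7' });
});

// Server-push updates and commands.
socket.on('product.updated', (product) => { /* update the React Query cache */ });
socket.on('register.locked', (msg) => { /* show the "register in use" dialog */ });
socket.on('disconnect', () => { /* feeds the 3-state monitor (ADR-048) */ });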

Trade-offs

Pros:

  • Bidirectional — server can push updates AND send commands (register lock, force-logout, config refresh)
  • Built-in reconnection with exponential backoff — feeds directly into ADR-048’s 3-state connection monitor
  • Room-based broadcasting — events routed per-tenant and per-location without client-side filtering
  • Automatic transport fallback — WebSocket → HTTP long-polling handles corporate firewalls and proxy servers
  • Mature ecosystem — well-tested with Node.js/Express, Redis adapter for horizontal scaling

Cons:

  • Socket.io client dependency adds ~50KB to the client bundle
  • Sticky sessions are required when scaling horizontally (mitigated by the Redis adapter: @socket.io/redis-adapter)
  • Not a standard protocol — custom framing on top of WebSocket (mitigated by widespread adoption and tooling)

References

  • ADR-048: Online-First POS Data Strategy (3-state connection monitor)
  • ADR-052: Unified Web Application (Nexus POS)
  • Ch 04: Architecture Styles, Section L.9A (System Architecture)

ADR-050: Prisma Migrate with Custom RLS Policies

2.50 ADR-050: Prisma Migrate with Custom RLS Policies

Status: Accepted
Date: 2026-03-01
Decision Makers: Architecture Review Team
Context: Prisma ORM provides Prisma Migrate for schema management but has no native understanding of PostgreSQL Row-Level Security (RLS) policies required by ADR-001.

Context

Prisma ORM (selected as part of the ADR-046 tech stack) provides Prisma Migrate for schema management — generating migration files from schema changes, tracking migration history, and applying migrations in order. However, Prisma has no native understanding of PostgreSQL Row-Level Security (RLS) policies (ADR-001). Every tenant-scoped table requires both standard DDL (CREATE TABLE, indexes) and custom RLS SQL (CREATE POLICY, ENABLE ROW LEVEL SECURITY). These must be created and updated together as part of the same migration workflow.

Decision

We will use Prisma Migrate for schema DDL with custom SQL migration files for RLS policies. Tenant provisioning uses a dedicated service that: (1) runs Prisma Migrate for schema, (2) executes RLS policy SQL scripts, and (3) seeds tenant configuration.

Considered Options

  1. Prisma Migrate + custom SQL files — Prisma handles DDL, companion .sql files handle RLS
  2. Raw SQL migrations only — Skip Prisma Migrate, manage all DDL and RLS in hand-written SQL
  3. Prisma Migrate with $executeRaw in seed scripts — RLS policies applied outside the migration system
  4. Third-party migration tool (dbmate, golang-migrate) — Replace Prisma Migrate entirely

Decision Outcome

Chosen: Prisma Migrate + custom SQL files because it preserves Prisma’s schema diffing, TypeScript type generation, and migration history while accommodating RLS policies that Prisma cannot generate. Each migration that adds a tenant-scoped table includes a companion RLS policy file in the same migrations folder.

Implementation pattern:

  • Standard Prisma schema changes generate migration SQL via prisma migrate dev
  • Developer adds a companion SQL file in the same migration folder for RLS policies
  • CI validation checks every table with tenant_id has a corresponding RLS policy
  • Prisma Client middleware runs SET LOCAL app.current_tenant = $tenantId at the start of every transaction via $queryRaw (SET LOCAL is transaction-scoped)
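
A hedged sketch of the pattern: a companion RLS policy file plus a transaction-scoped tenant setter on the Prisma client; the table, policy, and helper names are illustrative.

// Companion SQL placed next to the generated migration (shown here as a comment):
//   ALTER TABLE products ENABLE ROW LEVEL SECURITY;
//   CREATE POLICY tenant_isolation ON products
//     USING (tenant_id = current_setting('app.current_tenant')::uuid);

import { PrismaClient, Prisma } from '@prisma/client';

const prisma = new PrismaClient();

// Every tenant-scoped unit of work runs inside a transaction with the tenant GUC set.
// set_config(..., true) is transaction-scoped, i.e. equivalent to SET LOCAL.
async function withTenant<T>(
  tenantId: string,
  work: (tx: Prisma.TransactionClient) => Promise<T>,
): Promise<T> {
  return prisma.$transaction(async (tx) => {
    await tx.$queryRaw`SELECT set_config('app.current_tenant', ${tenantId}, true)`;
    return work(tx);
  });
}

// Usage: const products = await withTenant(tenantId, (tx) => tx.product.findMany());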

Trade-offs

Pros:

  • Preserves Prisma’s schema diffing, type generation, and migration history tracking
  • RLS policies live alongside the DDL migrations they relate to (co-located, not scattered)
  • CI can validate RLS coverage: every table with tenant_id must have a corresponding policy
  • Prisma Client middleware provides a clean interception point for SET LOCAL app.current_tenant
  • Migration rollback includes both DDL and RLS changes

Cons:

  • Every new tenant-scoped table requires both a Prisma schema change AND a custom RLS SQL file (discipline needed)
  • Prisma Migrate does not track or diff the custom SQL files — developer must remember to add them
  • Custom SQL files are not reflected in the Prisma schema (RLS is invisible to schema.prisma)
  • SET LOCAL app.current_tenant must be set in every tenant-scoped transaction; forgetting it breaks isolation

References

  • ADR-001: Shared Tables with Row-Level Security Multi-Tenancy
  • ADR-052: Unified Web Application (Prisma ORM selection, originally ADR-046)
  • Ch 04: Architecture Styles, Section L.10A.4 (Multi-Tenancy)
  • Ch 06: Database Strategy

ADR-051: State Management — React Query (Server) + Zustand (Client)

2.51 ADR-051: State Management — React Query + Zustand

Status: Accepted
Date: 2026-03-01
Decision Makers: Architecture Review Team
Context: Two client applications (Nexus POS web app, Nexus Raptag mobile) need state management for both server-fetched data and local UI state under the ADR-048 online-first architecture.

Context

Two client applications (Nexus POS unified web app per ADR-052, Nexus Raptag mobile per ADR-047) need state management for both server-fetched data and local UI state. The ADR-048 online-first architecture means most state comes from the Central API — product data, inventory, customer records, and configuration are all server-authoritative. However, the Nexus POS active cart must survive brief offline periods without losing items, and UI state (connection status, preferences, active modals) is purely local.

The state management solution must clearly separate server state (cached API responses) from client state (cart, UI, connection status) to avoid the common pitfall of treating all state identically.

Decision

We will use React Query (TanStack Query) for all server state and Zustand for client-only state. Clear boundary: if data exists on the server, use React Query; if data is local-only or must survive offline, use Zustand with optional persistence.

Considered Options

  1. React Query + Zustand — Dedicated server-state cache + lightweight client-state store
  2. Redux Toolkit (RTK Query + slices) — Unified state management with built-in API cache
  3. React Context API + custom hooks — Built-in React state with no external dependencies
  4. Jotai / Recoil — Atomic state management libraries

Decision Outcome

Chosen: React Query + Zustand because React Query eliminates manual fetch/cache/retry code for the 80% of state that comes from the API, while Zustand provides a minimal, unopinionated store for the 20% that is local-only. The two libraries have no overlap and no conflict.

State boundary rule: “If it has a REST endpoint, use React Query. If it’s local-only, use Zustand.”

React Query manages:

  • Product catalog, inventory levels, customer records (cached API responses)
  • Background refetch, optimistic updates, retry logic, pagination
  • Stale-while-revalidate for instant UI with background freshness checks

Zustand manages:

  • Active cart items (must survive DEGRADED state without losing items mid-sale)
  • Register UI state (active modal, selected tab, sidebar collapsed)
  • 3-state connection monitor status (ONLINE/DEGRADED/OFFLINE from ADR-048)
  • User preferences (theme, receipt format, default payment method)
  • Cart persistence: Zustand persist middleware writes to localStorage (Nexus POS web) or AsyncStorage (Nexus Raptag mobile). On reconnection, cart syncs via API
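
A hedged sketch of the boundary in practice: the cart lives in a persisted Zustand store while product data stays in React Query; the store shape and storage key are assumptions.

import { create } from 'zustand';
import { persist } from 'zustand/middleware';

interface CartLine { sku: string; name: string; qty: number; unitPrice: number }

interface CartState {
  lines: CartLine[];
  addLine: (line: CartLine) => void;
  clear: () => void;
}

// Persisted to localStorage, so the active cart survives DEGRADED/OFFLINE states and reloads.
export const useCartStore = create<CartState>()(
  persist(
    (set) => ({
      lines: [],
      addLine: (line) => set((s) => ({ lines: [...s.lines, line] })),
      clear: () => set({ lines: [] }),
    }),
    { name: 'nexus-pos-cart' },
  ),
);

// Server state stays in React Query, e.g.:
// const { data: products } = useQuery({ queryKey: ['products'], queryFn: fetchProducts });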

Trade-offs

Pros:

  • React Query handles caching, background refetch, optimistic updates, retry, pagination — eliminates manual fetch/cache code
  • Zustand is ~1KB, no boilerplate, no reducers, no actions — just a function that returns state
  • Clear separation prevents the “everything in Redux” anti-pattern
  • Cart items survive DEGRADED/OFFLINE states via Zustand persist middleware
  • Both libraries are TypeScript-first with excellent type inference

Cons:

  • Two state libraries to learn (mitigated by clear boundary rule and small Zustand API surface)
  • Cart items live in Zustand during a sale but must be persisted to the server on sale completion via React Query mutation (two-step)
  • Zustand persist middleware uses localStorage which has ~5MB limit (sufficient for cart state, not for catalog)

References

  • ADR-048: Online-First POS Data Strategy
  • ADR-052: Unified Web Application (Nexus POS)

ADR-052: Unified Web Application (Nexus POS)

2.52 ADR-052: Unified Web Application

Status: Accepted
Date: 2026-03-02
Decision Makers: Architecture Review Team
Context: ADR-046 defined dual deployment (Tauri desktop + React web). Analysis revealed target retailers use standard PCs/tablets with Chrome/Edge, hardware peripherals work via web protocols, and the “Admin Portal” vs “POS Terminal” split creates artificial product complexity when role-based routing achieves the same outcome.

Context

ADR-046 established a dual deployment architecture: “Nexus POS” as a Tauri 2.0 desktop application for store terminals with native hardware access, and “Nexus Admin” as a React web application for browser-based administration. Both shared a single React/TypeScript codebase but required separate build pipelines, platform-conditional code (isTauri() checks), and Rust-based hardware integration for the desktop variant.

Analysis of the target market (small-to-medium multi-location retailers) revealed:

  1. Standard hardware: Target retailers use commodity PCs and tablets running Chrome or Edge — not dedicated POS terminals requiring native desktop wrappers
  2. Web-based peripherals: Modern receipt printers (Star Micronics, Epson) expose HTTP/WebSocket APIs (Star WebPRNT, Epson ePOS SDK); barcode scanners operate as USB HID keyboard wedge devices; cash drawers connect to receipt printers via kick-out cables; payment terminals use browser-compatible SDKs (Stripe Terminal)
  3. Artificial product split: The “Nexus POS” vs “Nexus Admin” distinction created two product names for what is functionally one application with different role-based views. A CASHIER needs the sales terminal; a MANAGER needs reports and configuration; both use the same codebase
  4. Build complexity: Tauri requires a Rust build pipeline, WebView2 dependency management, and platform-specific installers — overhead for minimal benefit when web deployment achieves the same outcome

Decision

We will deploy a single React/TypeScript web application called “Nexus POS”. Users see different menus and pages based on their assigned roles (OWNER, MANAGER, CASHIER, BUYER, AUDITOR). There is no separate “Nexus Admin” product.

Product names: Nexus POS (web app), Nexus Raptag (mobile RFID, unchanged per ADR-047).

Hardware integration via web protocols:

  • Receipt Printers: Star WebPRNT (HTTP POST to printer’s built-in web server) or Epson ePOS SDK (WebSocket). ESC/POS commands sent over network — no native access required.
  • Barcode Scanners: USB HID keyboard wedge — scanner outputs keystrokes captured by standard keydown event listeners. Works identically in any browser.
  • Cash Drawers: Connected to the receipt printer via an RJ-11 kick-out cable. The drawer opens when the printer receives an ESC/POS drawer-open command, so drawer control comes for free once receipt printing works.
  • Payment Terminals: Stripe Terminal JavaScript SDK communicates with Verifone/WisePOS reader over local network. Browser-native, SAQ-A compliant (no card data touches our system).
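
As an illustration of the keyboard-wedge capture described above, a hedged TypeScript sketch that buffers fast keystrokes and emits a scan on Enter; the timing threshold and minimum length are assumptions.

// Scanners "type" the barcode far faster than a human and terminate with Enter.
export function listenForBarcodeScans(onScan: (code: string) => void): () => void {
  let buffer = '';
  let lastKeyTime = 0;

  const handler = (e: KeyboardEvent) => {
    const now = Date.now();
    if (now - lastKeyTime > 50) buffer = '';   // gap too long: treat as human typing, reset
    lastKeyTime = now;

    if (e.key === 'Enter') {
      if (buffer.length >= 4) onScan(buffer);  // ignore stray Enter presses
      buffer = '';
    } else if (e.key.length === 1) {
      buffer += e.key;                         // printable character from the scanner
    }
  };

  window.addEventListener('keydown', handler);
  return () => window.removeEventListener('keydown', handler);  // cleanup for React effects
}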

Offline storage: SQLite WASM (sql.js/wa-sqlite) with OPFS for browser-persistent storage. Same 2-table schema from ADR-048 (product_cache + sales_queue), same 3-state connection monitor — different runtime (WASM instead of native better-sqlite3).

Considered Options

  1. Keep dual deployment (Tauri + web) — Maintain ADR-046 architecture with isTauri() conditional code
  2. Unified web application — Single React SPA, role-based routing, web-based hardware integration
  3. Progressive Web App (PWA) — Web app with service worker for offline, installable on desktop
  4. Electron — Desktop wrapper with full Node.js access (rejected: 150MB+ bundle, Chromium overhead)

Decision Outcome

Chosen: Unified web application because it eliminates the Rust build pipeline, removes platform-conditional code, unifies the product naming, and uses web-standard hardware protocols that work across all modern browsers. The SQLite WASM runtime provides the same offline fallback capability as native better-sqlite3 with slightly higher overhead (acceptable for the rare offline scenario).

Trade-offs

Pros:

  • Single build pipeline — React + Vite, no Rust compilation
  • No platform-conditional code — eliminates isTauri() checks and all window.__TAURI__ detection
  • Unified product name — “Nexus POS” for all users regardless of role
  • Role-based navigation — CASHIER sees sales terminal, MANAGER sees dashboard + reports, OWNER sees configuration
  • Web-standard hardware — Star WebPRNT and USB HID work across Chrome, Edge, Firefox
  • Instant deployment — CDN-served SPA, no desktop installer distribution
  • Simpler testing — single deployment target, no dual-mode test matrix

Cons:

  • SQLite WASM has ~2-3x overhead vs native better-sqlite3 (acceptable: offline fallback is rare, performance-critical path is online API access)
  • Web Serial API and WebUSB have limited browser support (not available in Firefox); mitigated by targeting Chrome/Edge, which dominate enterprise retail
  • No offline application startup — web app requires network to load initially (mitigated: service worker can cache app shell for offline reload)
  • Browser tab can be accidentally closed — no system tray or always-on-top (mitigated: POS terminals use kiosk mode or dedicated browser profile)

Supersedes

  • ADR-046: Nexus Dual Deployment Architecture (Tauri Desktop + Web App)
  • ADR-007: Admin Portal Framework (Blazor Server) — already superseded by ADR-046, now further obsoleted
  • ADR-013: RFID Configuration in Tenant Admin Portal — “Admin Portal” concept fully eliminated

References

  • ADR-008: POS Client Framework (React/TypeScript architecture principles remain valid; Tauri-specific parts superseded)
  • ADR-047: Raptag Mobile Framework (React Native — unchanged)
  • ADR-048: Online-First POS Data Strategy (unchanged; SQLite runtime changes from native to WASM)
  • Ch 04: Architecture Styles, Section L.9A (System Architecture)

How to Propose a New ADR

ADR Proposal Process
====================

1. Copy the ADR template
2. Fill in Context, Decision, Consequences
3. Set Status to "proposed"
4. Submit for architecture review
5. Discuss in architecture meeting
6. Update based on feedback
7. Set Status to "accepted" when approved
8. Add to ADR Index

MADR Template (Markdown Any Decision Records)

We use the MADR (Markdown Any Decision Records) format, which is more comprehensive than the basic ADR format and better suited for complex architectural decisions.

Full MADR Template

# ADR-XXX: [Short Title of Solved Problem and Solution]

## Status

[proposed | accepted | deprecated | superseded by ADR-YYY]

## Date

YYYY-MM-DD

## Decision-Makers

- [Name/Role 1]
- [Name/Role 2]

## Technical Story

[Link to ticket/issue: JIRA-123, GitHub Issue #456]

## Context and Problem Statement

[Describe the context and problem statement, e.g., in free form
using two to three sentences or in the form of an illustrative
story. You may want to articulate the problem in form of a question.]

## Decision Drivers

* [Driver 1, e.g., a force, facing concern, …]
* [Driver 2, e.g., a force, facing concern, …]
* [Driver 3, e.g., a force, facing concern, …]

## Considered Options

1. [Option 1]
2. [Option 2]
3. [Option 3]
4. [Option 4]

## Decision Outcome

**Chosen Option**: "[Option X]"

### Justification

[Justification for why this option was chosen. Reference the
decision drivers and explain how this option best addresses them.]

### Positive Consequences

* [e.g., improvement of quality attribute satisfaction, follow-up
  decisions required, …]
* …

### Negative Consequences

* [e.g., compromising quality attribute, follow-up decisions required,
  technical debt introduced, …]
* …

## Pros and Cons of the Options

### [Option 1]

[Example: Schema-per-tenant multi-tenancy]

**Pros:**
* Good, because [argument a]
* Good, because [argument b]

**Cons:**
* Bad, because [argument c]
* Bad, because [argument d]

### [Option 2]

[Example: Row-level multi-tenancy]

**Pros:**
* Good, because [argument a]
* Good, because [argument b]

**Cons:**
* Bad, because [argument c]

### [Option 3]

[Example: Database-per-tenant]

**Pros:**
* Good, because [argument a]

**Cons:**
* Bad, because [argument b]
* Bad, because [argument c]

## Links

* [Link type] [Link to ADR] <!-- example: Refined by ADR-007 -->
* [Link type] [Link to external resource]
* Supersedes ADR-XXX
* Related to ADR-YYY

## Notes

[Any additional notes, discussion points, or future considerations]

MADR Example: Kafka Selection

# ADR-014: Apache Kafka for Event Streaming

## Status

accepted

## Date

2026-01-15

## Decision-Makers

- Architecture Team
- Infrastructure Team

## Technical Story

ARCH-456: Select event streaming platform for POS event sourcing

## Context and Problem Statement

Our POS platform uses event sourcing for the Sales and Inventory
domains. We need an event streaming platform that supports:
- Event replay for new consumers
- Durable storage for audit compliance
- High throughput during peak retail periods (Black Friday)
- Multi-datacenter replication for disaster recovery

Which event streaming platform should we use?

## Decision Drivers

* Replayability - New analytics services must process historical events
* Durability - Events must survive broker failures (PCI compliance)
* Throughput - Handle 10,000+ events/second during peak
* Ecosystem - Good client libraries for .NET
* Operations - Team can manage without dedicated staff

## Considered Options

1. Apache Kafka
2. RabbitMQ with Shovel plugin
3. Amazon Kinesis
4. Redis Streams
5. PostgreSQL LISTEN/NOTIFY

## Decision Outcome

**Chosen Option**: "Apache Kafka (with KRaft mode)"

### Justification

Kafka is the only option that provides true event replayability with
configurable retention. New consumers can start from the beginning
of the log and process all historical events. This is critical for:
- Adding new analytics modules
- Rebuilding projections after bugs
- Audit investigations

KRaft mode eliminates ZooKeeper dependency, simplifying operations.

### Positive Consequences

* Complete replayability for compliance and analytics
* Proven at massive scale (LinkedIn, Uber)
* Strong .NET client (Confluent.Kafka)
* Schema Registry for event versioning

### Negative Consequences

* More complex than RabbitMQ
* Requires understanding of partitioning
* Higher resource usage than simpler queues

## Pros and Cons of the Options

### Apache Kafka

**Pros:**
* Good, because events are retained for configurable duration
* Good, because consumers can replay from any offset
* Good, because it handles 100K+ messages/second
* Good, because KRaft mode simplifies deployment

**Cons:**
* Bad, because it requires more operational knowledge
* Bad, because partition management adds complexity

### RabbitMQ with Shovel

**Pros:**
* Good, because it's simpler to operate
* Good, because team has existing experience

**Cons:**
* Bad, because messages are deleted after consumption
* Bad, because replay requires external archival

### Amazon Kinesis

**Pros:**
* Good, because it's fully managed
* Good, because it has replay capability

**Cons:**
* Bad, because of vendor lock-in
* Bad, because pricing is complex at scale

### Redis Streams

**Pros:**
* Good, because it's simple
* Good, because it's low latency

**Cons:**
* Bad, because durability is limited
* Bad, because it's not designed for long-term storage

### PostgreSQL LISTEN/NOTIFY

**Pros:**
* Good, because no additional infrastructure

**Cons:**
* Bad, because it doesn't scale
* Bad, because messages are ephemeral

## Links

* Refined by ADR-015 (Schema Registry Selection)
* Related to ADR-003 (Event Sourcing for Sales Domain)
* [Kafka Documentation](https://kafka.apache.org/documentation/)

## Notes

Evaluated during Q1 2026 architecture review. Confluent Cloud was
considered but rejected due to cost; self-hosted Kafka preferred.

**UPDATE (v3.0.0)**: Kafka is **deferred to v2.0**. Per the Architecture
Styles analysis (Chapter 04, Section L.4A.2),
v1.0 uses PostgreSQL event tables with LISTEN/NOTIFY for event notification
and Transactional Outbox for guaranteed delivery. This ADR remains valid
for v2.0 planning when scale justifies the Kafka operational overhead.
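
For orientation only, here is a minimal sketch of the v1.0 interim approach (assuming the `pg` driver; the `event_outbox` table, `outbox_new_event` channel, and `sales` table are illustrative names, with the authoritative design in Chapter 04): the event row is written in the same transaction as the state change, and NOTIFY merely wakes the relay that publishes pending rows.

```typescript
// Hypothetical sketch of the v1.0 Transactional Outbox + LISTEN/NOTIFY flow.
// Table and channel names are assumptions for illustration only.
import { Pool, Client } from 'pg';

const pool = new Pool({ connectionString: process.env.DATABASE_URL });

// Writer: persist the domain change and its event atomically.
export async function recordSale(saleId: string, payload: object): Promise<void> {
  const client = await pool.connect();
  try {
    await client.query('BEGIN');
    await client.query('INSERT INTO sales (id, data) VALUES ($1, $2)', [saleId, payload]);
    await client.query(
      'INSERT INTO event_outbox (aggregate_id, event_type, payload) VALUES ($1, $2, $3)',
      [saleId, 'SaleRecorded', payload],
    );
    // NOTIFY only signals "new work"; delivery is guaranteed by the outbox row itself.
    await client.query('NOTIFY outbox_new_event');
    await client.query('COMMIT');
  } catch (err) {
    await client.query('ROLLBACK');
    throw err;
  } finally {
    client.release();
  }
}

// Relay: wake on NOTIFY and drain unpublished outbox rows.
export async function startOutboxRelay(): Promise<void> {
  const listener = new Client({ connectionString: process.env.DATABASE_URL });
  await listener.connect();
  await listener.query('LISTEN outbox_new_event');
  listener.on('notification', async () => {
    // SELECT ... FOR UPDATE SKIP LOCKED, publish to consumers, then mark rows published.
  });
}
```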

ADR Tooling & Automation

| Tool | Purpose | Installation |
|------|---------|--------------|
| adr-tools | CLI for creating/managing ADRs | `brew install adr-tools` |
| Log4brains | ADR documentation site generator | `npm install -g log4brains` |
| adr-viewer | Web-based ADR viewer | Docker image available |

ADR Tools CLI

# Install adr-tools
brew install adr-tools  # macOS
# or
sudo apt install adr-tools  # Ubuntu

# Initialize ADR directory
adr init docs/adr

# Create new ADR
adr new "Use Kafka for Event Streaming"
# Creates: docs/adr/0014-use-kafka-for-event-streaming.md

# Supersede an ADR
adr new -s 3 "Replace Event Sourcing with Outbox Pattern"
# Creates new ADR that supersedes ADR-003

# List all ADRs
adr list

# Generate ADR index
adr generate toc > docs/adr/README.md

Log4brains Integration

Log4brains generates a searchable documentation website from ADRs:

# Install Log4brains
npm install -g log4brains

# Initialize in project
log4brains init

# Start preview server
log4brains preview

# Build static site
log4brains build

# Deploy to GitHub Pages
log4brains build --basePath /pos-platform-adr

A GitHub Actions workflow can rebuild and deploy the site automatically whenever ADRs change:

# .github/workflows/adr-docs.yml

name: ADR Documentation

on:
  push:
    branches: [main]
    paths:
      - 'docs/adr/**'

jobs:
  build-adr-site:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0  # Full history for dates

      - uses: actions/setup-node@v4
        with:
          node-version: '20'

      - name: Install Log4brains
        run: npm install -g log4brains

      - name: Build ADR site
        run: log4brains build --basePath /pos-platform-adr

      - name: Deploy to GitHub Pages
        uses: peaceiris/actions-gh-pages@v3
        with:
          github_token: ${{ secrets.GITHUB_TOKEN }}
          publish_dir: .log4brains/out

ADR Linting

A lightweight CI check validates that each ADR contains the required sections and that numbering stays sequential:

# .github/workflows/adr-lint.yml

name: ADR Lint

on:
  pull_request:
    paths:
      - 'docs/adr/**'

jobs:
  lint-adr:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Validate ADR Format
        run: |
          for file in docs/adr/*.md; do
            # Check required sections
            if ! grep -q "## Status" "$file"; then
              echo "ERROR: $file missing Status section"
              exit 1
            fi
            if ! grep -q "## Context" "$file" && ! grep -q "## Context and Problem Statement" "$file"; then
              echo "ERROR: $file missing Context section"
              exit 1
            fi
            if ! grep -q "## Decision" "$file" && ! grep -q "## Decision Outcome" "$file"; then
              echo "ERROR: $file missing Decision section"
              exit 1
            fi
          done
          echo "All ADRs pass validation"

      - name: Check ADR Numbering
        run: |
          # Ensure sequential numbering (filenames are zero-padded, e.g. 0014-...)
          expected=1
          for file in docs/adr/[0-9]*.md; do
            num=$(basename "$file" | grep -o '^[0-9]*')
            num=$((10#$num))  # strip leading zeros before comparing
            if [ "$num" -ne "$expected" ]; then
              echo "WARNING: Expected ADR-$expected, found ADR-$num"
            fi
            expected=$((expected + 1))
          done

ADR Review Checklist

# ADR Review Checklist

Before accepting an ADR, verify:

## Structure
- [ ] Uses MADR template
- [ ] Has clear title
- [ ] Status is set correctly
- [ ] Date is current
- [ ] Decision-makers are listed

## Content Quality
- [ ] Context clearly explains the problem
- [ ] Decision drivers are explicit
- [ ] At least 3 options were considered
- [ ] Pros/cons are documented for each option
- [ ] Chosen option justification references drivers

## Completeness
- [ ] Positive consequences listed
- [ ] Negative consequences listed (be honest!)
- [ ] Risks identified
- [ ] Mitigations proposed for risks
- [ ] Links to related ADRs

## Traceability
- [ ] Linked to technical story/ticket
- [ ] References relevant documentation
- [ ] Supersedes/relates to other ADRs if applicable

## Approval
- [ ] Architecture team reviewed
- [ ] Security team reviewed (if applicable)
- [ ] Infrastructure team reviewed (if applicable)



Summary

These Architecture Decision Records capture the foundational technical decisions for the POS Platform:

| ADR | Key Decision | Primary Benefit |
|-----|--------------|-----------------|
| ADR-001 | Schema-per-tenant *(corrected → Row-Level RLS, Ch 04 L.10A.4)* | Tenant isolation via tenant_id + PostgreSQL RLS policies |
| ADR-002 | Offline-first *(superseded by ADR-048)* | Replaced by online-first with offline fallback |
| ADR-003 | Event sourcing | Complete audit trail and temporal queries |
| ADR-004 | JWT + PIN | Secure API + fast cashier workflow |
| ADR-005 | PostgreSQL | RLS multi-tenancy and JSONB flexibility |
| ADR-006 | Node.js + TypeScript (Central API) | Unified TypeScript stack, Prisma ORM, Socket.io |
| ADR-007 | Blazor Server *(superseded by ADR-046)* | Replaced by Nexus dual deployment (React web app) |
| ADR-008 | Tauri 2.0 + React/TypeScript (Nexus POS) | Native hardware access, shared React codebase, lightweight binary |
| ADR-009 | Redis for session & cache | Distributed session, sub-ms cache, pub/sub |
| ADR-010 | Webhook + Polling (Shopify) | Real-time sync with fallback consistency |
| ADR-011 | SAQ-A Semi-Integrated payments | Minimal PCI scope, no card data in system |
| ADR-012 | LGTM Stack (observability) | Open-source, self-hosted, unified dashboards |
| ADR-013 | RFID in Admin Portal *(superseded by ADR-046)* | RFID config now in Nexus Admin > Settings > RFID |
| ADR-014 | Pinned major.minor npm versions with lock file | Build reproducibility with security patches |
| ADR-015 | Queue-and-Sync with CRDTs *(superseded by ADR-048)* | Replaced by online-first; CRDTs eliminated |
| ADR-016 | ERR-Mxxx error codes | Structured, machine-parseable, module-aligned |
| ADR-017 | Layered Testing Pyramid (Vitest + Playwright + k6) | Fast feedback, real DB tests, contract testing |
| ADR-018 | Affirm BNPL Integration | Third-party financing without PCI scope |
| ADR-019 | SAQ-A Semi-Integrated Payment Scope | Zero card data in POS, minimal PCI burden |
| ADR-020 | Split Tender Payment Support | Multiple payment methods per transaction |
| ADR-021 | Layaway Payment Plans | State-machine-driven installment lifecycle |
| ADR-022 | Tax-Inclusive Display with Compound Calc | Accurate 3-level tax, customer transparency |
| ADR-023 | Compound Tax (3-Level State/County/City) | Jurisdiction-accurate tax computation |
| ADR-024 | Gift Card Compliance (State Escheatment) | Legal compliance with state unclaimed-property laws |
| ADR-025 | 6-Status Inventory State Machine | Deterministic status transitions, audit trail |
| ADR-026 | Reservation-Based Inventory Hold Model | Prevents overselling across channels |
| ADR-027 | RFID Counting-Only Scope | Focused RFID value, reduced complexity |
| ADR-028 | Physical Count Freeze Period | Data integrity during full counts |
| ADR-029 | Adjustment Manager Approval | Shrinkage control, accountability |
| ADR-030 | Auto-Suggest Transfers Algorithm | Velocity-based stock redistribution |
| ADR-031 | Shopify Webhook + Polling Dual Sync | Real-time events with polling fallback |
| ADR-032 | Strictest-Rule-Wins Validation | Cross-platform compliance by default |
| ADR-033 | Amazon SP-API Integration | Marketplace reach with OAuth 2.0/LWA auth |
| ADR-034 | Google Merchant Center Feed | Product visibility before Content API EOL |
| ADR-035 | Channel Safety Buffer Calculation | Prevents overselling across channels |
| ADR-036 | POS-Master Default for Channels | Single source of truth for product data |
| ADR-037 | Offline Conflict Resolution via CRDTs *(superseded by ADR-048)* | Replaced by online-first; CRDTs eliminated |
| ADR-038 | Transactional Outbox for Events | Reliable event publishing, no dual-write |
| ADR-039 | CQRS Boundary (Sales Domain Only) | Targeted complexity where value is highest |
| ADR-040 | Eventual Consistency SLA | Predictable sync guarantees per channel |
| ADR-041 | 6-Gate Security Pyramid | Automated layered security scanning |
| ADR-042 | E2E Testing *(removed; duplicate of ADR-017)* | |
| ADR-043 | LGTM Observability *(removed; duplicate of ADR-012)* | |
| ADR-044 | API Performance Targets | SLA-driven p99 latency budgets |
| ADR-045 | Blue-Green Deployment Strategy | Zero-downtime releases with instant rollback |
| ADR-046 | Nexus Dual Deployment Architecture *(superseded by ADR-052)* | Replaced by unified web application |
| ADR-047 | Raptag Mobile Framework (React Native) | Unified TypeScript for RFID mobile app |
| ADR-048 | Online-First POS Data Strategy | Online-first API access, 2-table SQLite WASM offline fallback |
| ADR-049 | Real-Time Transport — Socket.io | Bidirectional push, auto-reconnect, room-based routing |
| ADR-050 | Prisma Migrate with Custom RLS Policies | Schema DDL + companion RLS SQL in same migration |
| ADR-051 | State Management — React Query + Zustand | Server-state cache + lightweight client-state store |
| ADR-052 | Unified Web Application (Nexus POS) | Single React web app, role-based navigation, web hardware protocols |

These 52 records (43 active, 7 superseded [001, 002, 007, 013, 015, 037, 046], 2 removed [042, 043]) form the architectural foundation upon which the rest of the system is built.
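
As a closing illustration of how ADR-001 and ADR-050 fit together at runtime (a hedged sketch, not the project's actual code: the `app.current_tenant_id` setting, the `withTenant` helper, and the `sale` model are assumptions), the API scopes each unit of work to one tenant before touching RLS-protected tables:

```typescript
// Hypothetical sketch: scoping Prisma queries to a tenant so RLS policies apply.
import { PrismaClient, Prisma } from '@prisma/client';

const prisma = new PrismaClient();

export async function withTenant<T>(
  tenantId: string,
  work: (tx: Prisma.TransactionClient) => Promise<T>,
): Promise<T> {
  return prisma.$transaction(async (tx) => {
    // SET LOCAL scopes the setting to this transaction; the RLS policies created
    // in the companion migration SQL compare each row's tenant_id against it.
    // (Naive quote-escaping here only because SET LOCAL cannot be parameterized.)
    await tx.$executeRawUnsafe(
      `SET LOCAL app.current_tenant_id = '${tenantId.replace(/'/g, "''")}'`,
    );
    return work(tx);
  });
}

// Usage sketch: all queries inside the callback see only this tenant's rows.
// const sales = await withTenant(tenantId, (tx) => tx.sale.findMany());
```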


Document Information

| Attribute | Value |
|-----------|-------|
| Version | 7.0.0 |
| Created | 2025-12-29 |
| Updated | 2026-03-02 |
| Author | Claude Code |
| Status | Active |
| Part | II - Architecture |
| Chapter | 02 of 9 |

Change Log

| Version | Date | Changes |
|---------|------|---------|
| 7.0.0 | 2026-03-02 | Unified Web Application: Added ADR-052 (single React web app replacing Tauri desktop + web admin split). Superseded ADR-046 (Dual Deployment). Updated ADR-048 (better-sqlite3 → SQLite WASM via sql.js/wa-sqlite + OPFS). Updated ADR-008 (Tauri-specific parts superseded by ADR-052). Updated ADR-049 (Socket.io references). Updated ADR-051 (2 apps not 3, localStorage not Tauri). Updated ADR-047 references. Total: 52 records (43 active, 7 superseded [001, 002, 007, 013, 015, 037, 046], 2 removed [042, 043]). |
| 6.3.0 | 2026-03-01 | Online-first consolidation: Superseded ADR-015 (CRDTs) and ADR-037 (CRDT conflict resolution) by ADR-048. Fixed ADR-040 context (offline-first → online-first, ADR-015 → ADR-048). Fixed ADR-038 destination (signalr → socketio). Expanded ADR-007 superseded note. Added ADR-049 (Socket.io real-time transport), ADR-050 (Prisma Migrate + RLS), ADR-051 (React Query + Zustand state management). Total: 51 records (43 active, 6 superseded [001, 002, 007, 013, 015, 037], 2 removed [042, 043]). |
| 6.2.0 | 2026-03-01 | Online-first pivot: Added ADR-048 (Online-First POS Data Strategy), superseding ADR-002 (Offline-First). ADR-046 con and Implementation Risk #3 updated for 2-table SQLite. Total: 48 records (42 active, 4 superseded [001, 002, 007, 013], 2 removed [042, 043]). |
| 6.1.0 | 2026-02-28 | Tech stack pivot Phase 1: ADR-006 rewritten (ASP.NET Core → Node.js + TypeScript), ADR-008 rewritten (.NET MAUI → Tauri 2.0 + React), ADR-014 rewritten (NuGet → npm), ADR-017 updated (xUnit → Vitest), ADR-001 corrected (Strategy C → Strategy A RLS), ADR-007 superseded (by ADR-046), ADR-013 superseded (by ADR-046), ADR-042 removed (duplicate of ADR-017), ADR-043 removed (duplicate of ADR-012). Added ADR-046 (Nexus Dual Deployment Architecture) and ADR-047 (Raptag Mobile Framework — React Native). Updated SignalR → Socket.io, MediatR → command/query bus, StackExchange.Redis → ioredis. Total: 47 records (42 active, 3 superseded, 2 removed). |
| 5.2.1 | 2026-02-27 | Added 28 new ADRs (018-045) covering Payment & Financials, Inventory & Stock Management, Multi-Channel Integration, Data Consistency & Conflict Resolution, and Architecture Patterns & Infrastructure. Sourced from Ch 04 and Ch 05. Total ADRs: 45. |
| 5.2.0 | 2026-02-27 | Added 10 new ADRs (007-012, 014-017): Blazor Server (Admin), .NET MAUI Blazor Hybrid (POS), Redis, Shopify Webhook+Polling, SAQ-A Payments, LGTM Stack, NuGet Versioning, Queue-and-Sync CRDTs, ERR-Mxxx Error Codes, Layered Testing Pyramid. Removed Future ADRs table (all now accepted). Updated Summary table with all 17 ADRs. |
| 3.0.0 | 2026-02-22 | ADR-001 marked SUPERSEDED (Schema-Per-Tenant replaced by Row-Level RLS per Ch 04 L.10A.4); added Kafka v2.0 deferral note to ADR-014 example (per Ch 04 L.4A.2); fixed Next Chapter link; renumbered chapter references for v3.0.0. |
| 2.0.0 | 2026-01-01 | Added ADR-013 (RFID Configuration), MADR template, tooling section. |
| 1.0.0 | 2025-12-29 | Initial ADRs (001-006). |

Next Chapter: Chapter 03: Architecture Characteristics


This chapter is part of the POS Blueprint Book. All content is self-contained.