
Chapter 37: Troubleshooting

Common Issues and Solutions for POS Platform

This chapter provides solutions for common problems encountered during development, deployment, and operation of the POS platform.


Table of Contents

  1. Database Connection Issues
  2. Tenant Isolation Failures
  3. Sync Conflicts
  4. Payment Processing Errors
  5. Offline Mode Problems
  6. Performance Issues
  7. Authentication Failures
  8. Integration Errors
  9. Build and Deployment Failures

1. Database Connection Issues

Issue: Container Cannot Connect to PostgreSQL

Symptoms:

  • Application fails to start
  • Error: “Connection refused” or “Host not found”
  • EF Core throws NpgsqlException

Possible Causes:

  1. PostgreSQL container not running
  2. Container not on correct Docker network
  3. Wrong connection string
  4. Firewall blocking port

Diagnostic Steps:

# Check if postgres16 is running
docker ps | grep postgres16

# Check network connectivity from app container
# (slim images often lack ping and nc; install them temporarily if needed)
docker exec <app-container> ping -c 3 postgres16

# Test port accessibility
docker exec <app-container> nc -zv postgres16 5432

# View PostgreSQL logs
docker logs postgres16 --tail 100

Resolution:

  1. Container not running:

    cd /volume1/docker/postgres
    docker-compose up -d
    
  2. Network misconfiguration:

    # Verify network exists
    docker network ls | grep postgres_default
    
    # Create if missing
    docker network create postgres_default
    
    # Connect container to network
    docker network connect postgres_default <app-container>
    
  3. Wrong connection string:

    # From another container on postgres_default (service name, internal port):
    Host=postgres16;Port=5432;Database=pos_db;Username=pos_user;Password=xxx
    
    # From the host (published port; 5433 assumes a "5433:5432" port mapping):
    Host=localhost;Port=5433;Database=pos_db;Username=pos_user;Password=xxx
    

Prevention:

  • Always specify postgres_default as external network in docker-compose
  • Use environment variables for connection strings
  • Implement connection retry logic with exponential backoff
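The retry bullet can be sketched as follows, in JavaScript (Node.js) for brevity. `retryWithBackoff`, its option names, and the "full jitter" delay formula are illustrative assumptions, not part of the POS codebase. In the C# services, the Npgsql EF Core provider's `EnableRetryOnFailure()` option covers the same need.

```javascript
// Hypothetical helper: retry an async operation with exponential backoff
// and "full jitter" (a random delay between 0 and the capped exponential step).
async function retryWithBackoff(operation, {
  maxAttempts = 5,     // total attempts, including the first one
  baseDelayMs = 200,   // step size for the first retry
  maxDelayMs = 10_000, // upper bound for any single delay
  sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
} = {}) {
  let lastError;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await operation(attempt);
    } catch (err) {
      lastError = err;
      if (attempt === maxAttempts - 1) break; // no attempts left
      const cap = Math.min(maxDelayMs, baseDelayMs * 2 ** attempt);
      await sleep(Math.random() * cap);       // full jitter
    }
  }
  throw lastError;
}
```

Jitter spreads retries out, so many containers restarting at once do not hit PostgreSQL in lockstep.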

Issue: “Role does not exist” Error

Symptoms:

  • Error: FATAL: role "pos_user" does not exist

Possible Causes:

  • Database user not created
  • Wrong username in connection string

Resolution:

# Create the user (use -i, not -it: a heredoc is not a TTY)
docker exec -i postgres16 psql -U postgres << EOF
CREATE USER pos_user WITH PASSWORD 'secure_password';
CREATE DATABASE pos_db OWNER pos_user;
GRANT ALL PRIVILEGES ON DATABASE pos_db TO pos_user;
EOF

2. Tenant Isolation Failures

Issue: Data Leaking Between Tenants

Symptoms:

  • User sees data from another tenant
  • Queries return unexpected results
  • Security audit fails

Possible Causes:

  1. Missing TenantId filter in query
  2. Middleware not setting tenant context
  3. Background job not setting tenant
  4. DbContext not configured for tenant

Diagnostic Steps:

-- List tables that have a tenant_id column (run the NULL check below against each)
SELECT table_name
FROM information_schema.columns
WHERE column_name = 'tenant_id'
  AND table_schema = 'public';

-- Find orphaned records
SELECT COUNT(*) FROM orders WHERE tenant_id IS NULL;

Resolution:

  1. Missing filter - Add global query filter:

    // In DbContext.OnModelCreating
    modelBuilder.Entity<Order>()
        .HasQueryFilter(o => o.TenantId == _tenantProvider.TenantId);
    
  2. Middleware issue:

    // Verify middleware order in Program.cs
    app.UseAuthentication();
    app.UseTenantMiddleware();  // Must be after auth
    app.UseAuthorization();
    
  3. Background job:

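    // Always set the tenant in background jobs, then resolve the DbContext
    // and other services from the same scope so they see that tenant
    using (var scope = _scopeFactory.CreateScope())
    {
        var tenantProvider = scope.ServiceProvider.GetRequiredService<ITenantProvider>();
        tenantProvider.SetTenant(tenantId);
        // ... resolve services from scope.ServiceProvider and do work
    }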
    

Prevention:

  • Enable Row-Level Security in PostgreSQL
  • Add integration tests that verify isolation
  • Review all queries for tenant filtering
  • Use tenant-scoped DbContext factory
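To make the "integration tests that verify isolation" bullet concrete, here is a minimal in-memory sketch in JavaScript. `TenantScopedRepository` and its row shape are hypothetical. It shows two ideas: the tenant filter lives in one place (as an EF Core global query filter does), and the layer fails closed when no tenant is set.

```javascript
// Hypothetical in-memory stand-in for a tenant-scoped data layer.
class TenantScopedRepository {
  constructor(rows, tenantId) {
    // Fail closed: a job that forgot to set the tenant gets an error,
    // not unfiltered data.
    if (!tenantId) throw new Error("Tenant context not set");
    this.rows = rows;
    this.tenantId = tenantId;
  }

  // Every read applies the tenant filter first, like a global query filter.
  find(predicate = () => true) {
    return this.rows.filter((r) => r.tenantId === this.tenantId && predicate(r));
  }

  // Every write is stamped with the current tenant; a caller-supplied
  // tenantId is overwritten so it cannot be spoofed.
  insert(row) {
    const stored = { ...row, tenantId: this.tenantId };
    this.rows.push(stored);
    return stored;
  }
}
```

An isolation test in this style seeds rows for two tenants through their own repositories, then asserts that each tenant's `find()` returns only its own rows, including a row where the caller tried to supply another tenant's ID.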

Issue: “Invalid TenantId” on Valid Request

Symptoms:

  • 400 Bad Request with tenant errors
  • User cannot access their own data

Possible Causes:

  • Tenant ID not in JWT claims
  • Tenant lookup failing
  • Caching stale tenant data

Resolution:

// Debug: Log tenant resolution
_logger.LogDebug("Resolving tenant from claim: {TenantClaim}",
    context.User.FindFirst("tenant_id")?.Value);

// Clear tenant cache
_cache.Remove($"tenant:{tenantId}");

3. Sync Conflicts

Issue: Offline Changes Overwritten

Symptoms:

  • User makes offline edits, they disappear after sync
  • Error: “Conflict detected”
  • Data reverts to old state

Possible Causes:

  1. Last-write-wins without conflict detection
  2. Version mismatch
  3. Sync order incorrect

Diagnostic Steps:

-- Check version history
SELECT id, version, modified_at
FROM inventory_items
WHERE sku = 'ABC123'
ORDER BY version DESC;

-- Check event log
SELECT * FROM inventory_events
WHERE sku = 'ABC123'
ORDER BY created_at DESC LIMIT 10;

Resolution:

  1. Implement optimistic concurrency:

    public async Task<bool> UpdateAsync(Item item, int expectedVersion)
    {
        var affected = await _db.Items
            .Where(i => i.Id == item.Id && i.Version == expectedVersion)
            .ExecuteUpdateAsync(s => s
                .SetProperty(i => i.Name, item.Name)
                .SetProperty(i => i.Version, expectedVersion + 1));
    
        return affected > 0;  // False if another writer bumped the version (or the row is gone)
    }
    
  2. Queue offline changes with timestamps:

    // Store in local queue with client timestamp
    _localQueue.Enqueue(new SyncItem
    {
        Operation = "Update",
        ClientTimestamp = DateTimeOffset.UtcNow,
        Data = item
    });
    

Prevention:

  • Use vector clocks or version vectors
  • Implement merge strategies for specific entity types
  • Show user when conflicts occur and let them choose

Issue: Sync Never Completes

Symptoms:

  • “Syncing…” message never goes away
  • Partial data sync
  • Timeout errors

Possible Causes:

  • Network interruption during sync
  • Large payload timeout
  • Server error during sync

Resolution:

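// Implement chunked sync
public async Task SyncAsync()
{
    // Materialize the chunks first so MarkSynced can modify the queue safely
    var chunks = _localQueue.Chunk(100).ToList();
    foreach (var chunk in chunks)
    {
        try
        {
            await _api.SyncBatchAsync(chunk);
            _localQueue.MarkSynced(chunk);
        }
        catch (Exception ex) when (ex is HttpRequestException or TaskCanceledException or TimeoutException)
        {
            // HttpClient timeouts surface as TaskCanceledException; unsynced
            // chunks stay queued and are retried on the next sync
            break;
        }
    }
}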

4. Payment Processing Errors

Issue: Payment Gateway Timeout

Symptoms:

  • Payment hangs for 30+ seconds
  • Error: “Request timeout”
  • Uncertain if payment processed

Possible Causes:

  1. Network latency
  2. Gateway overloaded
  3. Invalid timeout configuration

Diagnostic Steps:

# Test gateway connectivity
curl -X GET https://api.paymentgateway.com/health -w "\nTime: %{time_total}s\n"

# Check recent payment attempts in logs
grep "payment" /var/log/pos/*.log | tail -50

Resolution:

  1. Implement idempotency:

    public async Task<PaymentResult> ProcessPaymentAsync(
        PaymentRequest request,
        string idempotencyKey)
    {
        // Return the stored result if this key was already processed
        var existing = await _db.Payments
            .FirstOrDefaultAsync(p => p.IdempotencyKey == idempotencyKey);
        if (existing != null)
            return existing.ToResult();
    
        // Process with gateway (forward the key as well if the gateway supports it)
        var result = await _gateway.ChargeAsync(request);
    
        // Save with idempotency key; a UNIQUE index on IdempotencyKey keeps
        // at most one stored payment per key
        _db.Payments.Add(new Payment
        {
            IdempotencyKey = idempotencyKey,
            Status = result.Status
        });
        await _db.SaveChangesAsync();  // without this, nothing is persisted
    
        return result;
    }
    
  2. Add timeout with retry:

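    // using Polly; using Polly.Timeout;
    // Per-attempt timeout. Pessimistic mode works even though ChargeAsync takes
    // no CancellationToken (the abandoned call keeps running in the background).
    var timeout = Policy.TimeoutAsync(30, TimeoutStrategy.Pessimistic);
    
    // Retrying a charge is only safe because the request carries an idempotency key
    var retry = Policy
        .Handle<TimeoutRejectedException>()
        .Or<TimeoutException>()
        .RetryAsync(3, onRetry: (ex, count) =>
        {
            _logger.LogWarning("Payment retry {Count}: {Message}", count, ex.Message);
        });
    
    await retry.WrapAsync(timeout).ExecuteAsync(() => _gateway.ChargeAsync(request));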
    
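The database lookup in step 1 still lets two simultaneous requests with the same key both reach the gateway, because neither has saved its row yet. This JavaScript sketch shows one in-process mitigation: concurrent calls with the same key share a single in-flight promise. `IdempotentCharger` and the `charge` callback are hypothetical; with several API instances, the reservation would have to live in the database instead (for example, a row inserted under a unique index before charging).

```javascript
// Hypothetical in-process idempotency guard. Concurrent calls with the same
// key share one in-flight promise, so the charge callback runs at most once
// per key while that promise is pending or has succeeded.
class IdempotentCharger {
  constructor(charge) {
    this.charge = charge;     // async (request) => result
    this.results = new Map(); // idempotencyKey -> promise of the result
  }

  process(request, idempotencyKey) {
    if (!idempotencyKey) {
      return Promise.reject(new Error("An idempotency key is required"));
    }
    const existing = this.results.get(idempotencyKey);
    if (existing) return existing; // duplicate or retry of a known request

    const pending = this.charge(request).catch((err) => {
      // Forget failures so the caller can retry with the same key. A timed-out
      // charge may still have succeeded, so the gateway must also dedupe on the key.
      this.results.delete(idempotencyKey);
      throw err;
    });
    this.results.set(idempotencyKey, pending);
    return pending;
  }
}
```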

Prevention:

  • Always use idempotency keys
  • Set reasonable timeouts (15-30 seconds)
  • Implement circuit breaker for gateway calls
  • Queue payments if offline
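The circuit-breaker bullet can be illustrated with a minimal state machine in JavaScript. The class name, thresholds, and injected clock are assumptions for illustration; in the .NET services, Polly provides a ready-made circuit-breaker policy.

```javascript
// Minimal circuit breaker: after `failureThreshold` consecutive failures the
// circuit opens and calls fail fast until `resetTimeoutMs` has passed. Then it
// goes half-open: requests are let through as trials, the first failure
// re-opens the circuit, and a success closes it.
class CircuitBreaker {
  constructor({ failureThreshold = 5, resetTimeoutMs = 30_000, now = Date.now } = {}) {
    this.failureThreshold = failureThreshold;
    this.resetTimeoutMs = resetTimeoutMs;
    this.now = now; // injectable clock, handy for tests
    this.state = "closed";
    this.failures = 0;
    this.openedAt = 0;
  }

  async call(fn) {
    if (this.state === "open") {
      if (this.now() - this.openedAt < this.resetTimeoutMs) {
        throw new Error("Circuit open: payment gateway unavailable");
      }
      this.state = "half-open";
    }
    try {
      const result = await fn();
      this.state = "closed";
      this.failures = 0;
      return result;
    } catch (err) {
      this.failures++;
      if (this.state === "half-open" || this.failures >= this.failureThreshold) {
        this.state = "open";
        this.openedAt = this.now();
      }
      throw err;
    }
  }
}
```

While the circuit is open, calls fail immediately instead of waiting out the full gateway timeout, which is the point at which to fall back to queuing the payment.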

Issue: Card Declined

Symptoms:

  • Payment rejected
  • Error code from gateway

Common Decline Codes:

Code               | Meaning            | Action
-------------------|--------------------|--------------------
insufficient_funds | Not enough balance | Try different card
card_declined      | Generic decline    | Contact card issuer
expired_card       | Card expired       | Use different card
incorrect_cvc      | Wrong CVV          | Re-enter
processing_error   | Gateway issue      | Retry

Resolution:

public string GetUserFriendlyMessage(string errorCode)
{
    return errorCode switch
    {
        "insufficient_funds" => "Card declined. Please try a different payment method.",
        "expired_card" => "This card has expired. Please use a different card.",
        "incorrect_cvc" => "The security code is incorrect. Please verify and try again.",
        _ => "Payment could not be processed. Please try again or use a different card."
    };
}

5. Offline Mode Problems

Issue: Application Won’t Start Offline

Symptoms:

  • App requires internet to launch
  • Loading screen indefinitely
  • Error: “Network request failed”

Possible Causes:

  1. Missing service worker
  2. No cached data
  3. API call in startup

Diagnostic Steps:

  • Check browser DevTools > Application > Service Workers
  • Check IndexedDB for cached data
  • Monitor Network tab for failed requests

Resolution:

  1. Ensure service worker registered:

    if ('serviceWorker' in navigator) {
      navigator.serviceWorker.register('/sw.js')
        .then(reg => console.log('SW registered'))
        .catch(err => console.error('SW failed', err));
    }
    
  2. Add offline fallback in startup:

    public async Task InitializeAsync()
    {
        try
        {
            await _api.FetchInitialData();
        }
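        // HttpClient timeouts surface as TaskCanceledException, not HttpRequestException
        catch (Exception ex) when (ex is HttpRequestException or TaskCanceledException)
        {
            _logger.LogWarning(ex, "Offline - using cached data");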
            await LoadFromCache();
        }
    }
    

Prevention:

  • Cache essential data proactively
  • Implement offline-first architecture
  • Test app startup with network disabled
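Testing startup with the network disabled is easier when the startup loader takes its dependencies as parameters. This JavaScript sketch races the initial request against a timeout, so a hung request cannot leave the loading screen up indefinitely, and then falls back to cached data. `loadInitialData`, the `initialData` cache key, and the injected `fetchFromApi`/`cache` objects are hypothetical; in the browser they might wrap `fetch` and IndexedDB.

```javascript
// Hypothetical startup loader: network first, bounded by a timeout, then
// cached data. Dependencies are injected so the offline path is testable.
async function loadInitialData({ fetchFromApi, cache, timeoutMs = 5000 }) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error("Network timeout")), timeoutMs);
  });

  let data;
  try {
    // Whichever settles first wins; a hung request now fails after timeoutMs
    data = await Promise.race([fetchFromApi(), timeout]);
  } catch (err) {
    const cached = await cache.get("initialData");
    if (cached === undefined) {
      throw new Error(`Offline and no cached data: ${err.message}`);
    }
    return { data: cached, source: "cache" };
  } finally {
    clearTimeout(timer); // never leave the timer (or its rejection) pending
  }

  await cache.set("initialData", data); // refresh the cache while online
  return { data, source: "network" };
}
```

A `Promise.race` timeout only stops waiting; to actually cancel the browser request, also pass an `AbortController` signal to `fetch`.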

Issue: Offline Queue Growing Too Large

Symptoms:

  • Local storage filling up
  • App slowing down
  • “Storage quota exceeded”

Possible Causes:

  • Extended offline period
  • Sync failing silently
  • No queue size limit

Resolution:

// Implement queue management
public async Task AddToQueue(SyncItem item)
{
    var queueSize = await _localDb.SyncQueue.CountAsync();

    if (queueSize >= MAX_QUEUE_SIZE)
    {
        // Warn user
        await _notifications.ShowAsync(
            "Sync queue is full. Please connect to the internet.");

        // Optional: Remove oldest low-priority items
        await _localDb.SyncQueue
            .Where(q => q.Priority == Priority.Low)
            .OrderBy(q => q.CreatedAt)
            .Take(100)
            .ExecuteDeleteAsync();
    }

    await _localDb.SyncQueue.AddAsync(item);
    await _localDb.SaveChangesAsync();  // persist the new queue entry
}

6. Performance Issues

Issue: Slow API Responses

Symptoms:

  • API calls taking > 1 second
  • Users complaining of lag
  • Timeouts occurring

Possible Causes:

  1. N+1 query problem
  2. Missing database indexes
  3. Large payloads
  4. No caching

Diagnostic Steps:

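-- Find slow queries (requires the pg_stat_statements extension;
-- PostgreSQL 13+ names the columns mean_exec_time / total_exec_time)
SELECT query, calls, mean_exec_time, total_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;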

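-- Tables read mostly by sequential scans (candidates for missing indexes);
-- idx_scan is NULL for tables with no indexes at all, hence the COALESCE
SELECT relname, seq_scan, idx_scan
FROM pg_stat_user_tables
WHERE seq_scan > COALESCE(idx_scan, 0)
ORDER BY seq_scan DESC;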

Resolution:

  1. Fix N+1 queries:

    // Bad
    var orders = await _db.Orders.ToListAsync();
    foreach (var order in orders)
        order.Items = await _db.OrderItems.Where(...).ToListAsync();
    
    // Good
    var orders = await _db.Orders
        .Include(o => o.Items)
        .ToListAsync();
    
  2. Add missing indexes:

    CREATE INDEX idx_orders_tenant_date
    ON orders (tenant_id, created_at DESC);
    
    CREATE INDEX idx_inventory_sku
    ON inventory_items (sku);
    
  3. Implement caching:

    public async Task<Product> GetProductAsync(string sku)
    {
        return await _cache.GetOrCreateAsync($"product:{sku}", async entry =>
        {
            entry.AbsoluteExpirationRelativeToNow = TimeSpan.FromMinutes(5);
            entry.Size = 1;  // required if the cache has a SizeLimit
            return await _db.Products.FindAsync(sku);
        });
    }
    

Prevention:

  • Enable query logging in development
  • Set up performance monitoring
  • Establish response time budgets
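A response-time budget is easiest to enforce if handlers are timed against it. The JavaScript helper below is a hypothetical sketch (`withBudget` and its options are not from the POS codebase); the one-second figure from the symptoms above would be a natural starting budget.

```javascript
// Hypothetical helper: time an async operation against a budget and report
// overruns (including failed calls), so slow endpoints surface early.
async function withBudget(name, budgetMs, operation, {
  now = () => performance.now(), // injectable clock
  report = console.warn,         // where overrun messages go
} = {}) {
  const start = now();
  try {
    return await operation();
  } finally {
    const elapsed = now() - start;
    if (elapsed > budgetMs) {
      report(`${name} took ${elapsed.toFixed(0)} ms (budget ${budgetMs} ms)`);
    }
  }
}
```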

Issue: Memory Usage Growing

Symptoms:

  • Container memory increasing over time
  • Out of memory errors
  • Slow garbage collection

Possible Causes:

  • Memory leak in code
  • Unbounded caches
  • Event handler accumulation
  • Large objects in memory

Diagnostic Steps:

# Monitor container memory
docker stats <container-name>

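# Get a memory dump (dotnet-dump must be installed where the process runs)
dotnet-dump collect -p <process-id>
# Inspect it; "dumpheap -stat" lists object counts and sizes by type
dotnet-dump analyze <dump-file>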

Resolution:

  1. Dispose resources properly:

    // Use 'using' for disposables
    await using var connection = new NpgsqlConnection(connectionString);
    await connection.OpenAsync();
    
  2. Limit cache size:

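    services.AddMemoryCache(options =>
    {
        // Units are app-defined; with Size = 1 per entry this is a max entry count.
        // Once SizeLimit is set, every entry must specify a Size or adding it throws.
        options.SizeLimit = 1000;
    });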
    
    _cache.Set(key, value, new MemoryCacheEntryOptions
    {
        Size = 1,
        SlidingExpiration = TimeSpan.FromMinutes(10)
    });
    
  3. Unsubscribe from events:

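    public class MyComponent : IDisposable
    {
        // Handle returned by the (app-specific) event bus subscription
        private readonly IDisposable _subscription;
    
        public MyComponent(IEventBus bus)
        {
            _subscription = bus.Subscribe<OrderCreated>(HandleOrder);
        }
    
        public void Dispose()
        {
            // Unsubscribing lets the bus release its reference to this component
            _subscription?.Dispose();
        }
    }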
    
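The "unbounded caches" cause comes down to missing eviction. As an illustration of the general idea (not of how `MemoryCache` works internally), here is a least-recently-used cache in JavaScript with a hard entry limit; `LruCache` is a hypothetical name.

```javascript
// Bounded LRU cache. A Map iterates in insertion order, so re-inserting a key
// on every access moves it to the "most recent" end, and the first key is
// always the least recently used one.
class LruCache {
  constructor(maxEntries) {
    this.maxEntries = maxEntries;
    this.map = new Map();
  }

  get(key) {
    if (!this.map.has(key)) return undefined;
    const value = this.map.get(key);
    this.map.delete(key); // move to the most-recent position
    this.map.set(key, value);
    return value;
  }

  set(key, value) {
    if (this.map.has(key)) this.map.delete(key);
    this.map.set(key, value);
    if (this.map.size > this.maxEntries) {
      const oldest = this.map.keys().next().value; // least recently used
      this.map.delete(oldest);
    }
  }

  get size() {
    return this.map.size;
  }
}
```

Because every `set` beyond the limit evicts one entry, the cache never holds more than `maxEntries` values, no matter how many distinct keys are requested.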

7. Authentication Failures

Issue: JWT Token Rejected

Symptoms:

  • 401 Unauthorized responses
  • “Invalid token” errors
  • User suddenly logged out

Possible Causes:

  1. Token expired
  2. Wrong signing key
  3. Clock skew between servers
  4. Token issued for different audience

Diagnostic Steps:

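# Decode the JWT payload (don't do this with sensitive tokens in production)
# JWTs use unpadded base64url, so translate the alphabet and re-add padding first
payload=$(echo "<token>" | cut -d. -f2 | tr '_-' '/+')
while [ $(( ${#payload} % 4 )) -ne 0 ]; do payload="${payload}="; done
echo "$payload" | base64 -d | jq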

# Check claims
# Look for: exp, iss, aud
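The manual checks of exp, iss, and aud can be scripted. This JavaScript sketch decodes the payload without verifying the signature, so it is only a debugging aid, and lists the likely reasons a token would be rejected. The function names are hypothetical; the 300-second default mirrors the five-minute default `ClockSkew` in .NET's `TokenValidationParameters`.

```javascript
// Decode a JWT payload (base64url, unpadded) WITHOUT verifying the signature.
function decodeJwtPayload(token) {
  const parts = token.split(".");
  if (parts.length !== 3) throw new Error("Not a JWT: expected 3 segments");
  return JSON.parse(Buffer.from(parts[1], "base64url").toString("utf8"));
}

// List likely rejection reasons, allowing `clockSkewSec` of clock drift.
function diagnoseClaims(payload, {
  issuer, audience,
  nowSec = Math.floor(Date.now() / 1000),
  clockSkewSec = 300,
}) {
  const problems = [];
  if (typeof payload.exp === "number" && nowSec > payload.exp + clockSkewSec) {
    problems.push(`expired at ${new Date(payload.exp * 1000).toISOString()}`);
  }
  if (typeof payload.nbf === "number" && nowSec < payload.nbf - clockSkewSec) {
    problems.push("not valid yet (check server clocks)");
  }
  if (issuer && payload.iss !== issuer) problems.push(`issuer is ${payload.iss}`);
  const aud = Array.isArray(payload.aud) ? payload.aud : [payload.aud];
  if (audience && !aud.includes(audience)) problems.push(`audience is ${payload.aud}`);
  return problems;
}
```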

Resolution:

  1. Token expired - Implement refresh flow:

    if (response.StatusCode == HttpStatusCode.Unauthorized)
    {
        var newToken = await RefreshTokenAsync();
        // Retry with new token
    }
    
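  2. Clock skew - Sync server clocks (NTP) and keep the tolerance explicit:

    services.AddAuthentication().AddJwtBearer(options =>
    {
        // Modify the existing parameters instead of replacing the object, so
        // issuer, audience, and signing-key settings configured here are kept.
        // ClockSkew already defaults to 5 minutes; if tokens are still rejected
        // for timing reasons, fix the server clocks rather than widening it.
        options.TokenValidationParameters.ClockSkew = TimeSpan.FromMinutes(5);
    });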
    
  3. Wrong key - Verify signing key matches:

    # Both services must use the same key; compare fingerprints, not the key itself
    printf '%s' "$JWT_SIGNING_KEY" | sha256sum
    

Issue: User Cannot Log In

Symptoms:

  • Login fails with valid credentials
  • “Invalid username or password”
  • Account not locked

Possible Causes:

  1. Password hashing mismatch
  2. User account disabled
  3. Tenant not active
  4. Case sensitivity issues

Resolution:

public async Task<LoginResult> LoginAsync(string email, string password)
{
    // Case-insensitive email lookup
    var user = await _db.Users
        .FirstOrDefaultAsync(u => u.Email.ToLower() == email.ToLower());

    if (user == null)
    {
        _logger.LogWarning("Login failed: user not found for {Email}", email);
        return LoginResult.Failed("Invalid credentials");
    }

    if (!_hasher.Verify(password, user.PasswordHash))
    {
        _logger.LogWarning("Login failed: wrong password for {Email}", email);
        return LoginResult.Failed("Invalid credentials");
    }

    // Check account status only after the password is verified, so the
    // "disabled" message cannot be used to probe which emails have accounts
    if (!user.IsActive)
    {
        _logger.LogWarning("Login failed: user {Email} is inactive", email);
        return LoginResult.Failed("Account is disabled");
    }

    return LoginResult.Success(GenerateToken(user));
}

8. Integration Errors

Issue: Shopify Webhook Not Received

Symptoms:

  • Orders not appearing in POS
  • Inventory not syncing
  • Webhook endpoint returning errors

Possible Causes:

  1. Webhook not registered
  2. HMAC verification failing
  3. Endpoint not accessible
  4. SSL certificate issues

Diagnostic Steps:

# Check webhook registration
curl -X GET "https://{store}.myshopify.com/admin/api/2024-01/webhooks.json" \
  -H "X-Shopify-Access-Token: {token}"

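# Test endpoint accessibility (with HMAC verification in place, expect a 401:
# that still proves the endpoint is reachable; a timeout or TLS error does not)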
curl -X POST https://your-domain.com/webhooks/shopify \
  -H "Content-Type: application/json" \
  -d '{"test": true}'

Resolution:

  1. Register webhook:

    curl -X POST "https://{store}.myshopify.com/admin/api/2024-01/webhooks.json" \
      -H "X-Shopify-Access-Token: {token}" \
      -H "Content-Type: application/json" \
      -d '{
        "webhook": {
          "topic": "orders/create",
          "address": "https://your-domain.com/webhooks/shopify",
          "format": "json"
        }
      }'
    
  2. Fix HMAC verification:

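    // Requires: using System.Security.Cryptography; using System.Text;
    public async Task<bool> VerifyWebhookAsync(HttpRequest request)
    {
        string hmacHeader = request.Headers["X-Shopify-Hmac-SHA256"];
        if (string.IsNullOrEmpty(hmacHeader))
            return false;
    
        // Shopify signs the raw body bytes; buffer so the body can be re-read later
        request.EnableBuffering();
        using var ms = new MemoryStream();
        await request.Body.CopyToAsync(ms);
        request.Body.Position = 0;
    
        using var hmac = new HMACSHA256(Encoding.UTF8.GetBytes(_secret));
        var expected = Convert.ToBase64String(hmac.ComputeHash(ms.ToArray()));
    
        // Constant-time comparison avoids leaking how many characters matched
        return CryptographicOperations.FixedTimeEquals(
            Encoding.UTF8.GetBytes(expected),
            Encoding.UTF8.GetBytes(hmacHeader));
    }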
    
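When debugging HMAC failures, it helps to reproduce the signature by hand for a captured payload. This is the same check sketched in JavaScript (Node.js) with only the built-in `crypto` module; `verifyShopifyWebhook` is a hypothetical name, and `rawBody` must be the exact bytes received, captured before any JSON parsing, because re-serialized JSON will not match the signature.

```javascript
const crypto = require("node:crypto");

// HMAC-SHA256 of the raw body, keyed with the app's shared secret,
// base64-encoded, compared in constant time with X-Shopify-Hmac-SHA256.
function verifyShopifyWebhook(rawBody, hmacHeader, secret) {
  if (typeof hmacHeader !== "string" || hmacHeader.length === 0) return false;
  const expected = crypto.createHmac("sha256", secret).update(rawBody).digest("base64");
  const a = Buffer.from(expected, "utf8");
  const b = Buffer.from(hmacHeader, "utf8");
  // timingSafeEqual throws on a length mismatch, so check the length first
  return a.length === b.length && crypto.timingSafeEqual(a, b);
}
```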

Issue: Bridge Not Connecting

Symptoms:

  • Bridge status shows “Offline”
  • Commands stuck in pending
  • Heartbeats not received

Possible Causes:

  1. Tailscale VPN not connected
  2. Wrong server URL in bridge config
  3. Firewall blocking
  4. Bridge service not running

Diagnostic Steps:

# Check Tailscale status
tailscale status

# Test connectivity from bridge machine
curl http://100.124.10.65:2500/health

# Check bridge logs
Get-Content C:\ProgramData\StanlyBridge\logs\*.log -Tail 50

Resolution:

  1. Reconnect Tailscale:

    tailscale up
    
  2. Verify bridge configuration:

    // appsettings.json
    {
      "ServerUrl": "http://100.124.10.65:2500",
      "StoreCode": "GM"
    }
    
  3. Restart bridge service:

    Restart-Service StanlyBridge
    

9. Build and Deployment Failures

Issue: Docker Build Fails

Symptoms:

  • docker-compose up --build errors
  • Missing dependencies
  • “No such file or directory”

Possible Causes:

  1. Dockerfile syntax error
  2. Missing files in context
  3. Network issues downloading packages
  4. Incompatible base image

Resolution:

  1. Check .dockerignore:

    # Make sure necessary files aren't ignored
    # Bad:
    *.json
    
    # Good:
    *.log
    node_modules
    
  2. Multi-stage build issues:

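    # Ensure COPY --from references the correct stage name
    FROM mcr.microsoft.com/dotnet/sdk:8.0 AS build
    WORKDIR /src
    COPY ["src/App/App.csproj", "src/App/"]
    RUN dotnet restore "src/App/App.csproj"
    COPY . .
    RUN dotnet publish "src/App/App.csproj" -c Release -o /app/publish
    
    FROM mcr.microsoft.com/dotnet/aspnet:8.0 AS final
    WORKDIR /app
    # 'build' must match the stage name above. Comments must be on their own
    # line: a trailing "# ..." would be parsed as extra COPY arguments.
    COPY --from=build /app/publish .
    ENTRYPOINT ["dotnet", "App.dll"]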
    
  3. Clear Docker cache:

    docker builder prune
    docker-compose build --no-cache
    

Issue: Migration Fails on Deployment

Symptoms:

  • Container starts but crashes
  • “Database migration failed”
  • Schema out of sync

Possible Causes:

  1. Migration order issue
  2. Conflicting migrations
  3. Database connection during migration

Resolution:

  1. Run migrations separately:

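    # Don't auto-migrate on startup; run migrations as an explicit step.
    # Runtime (aspnet) images don't include the SDK or dotnet-ef, so build an
    # EF Core migration bundle (EF Core 6+) and run it against the database:
    dotnet ef migrations bundle -o ./efbundle
    ./efbundle --connection "<connection-string>"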
    
  2. Check migration history:

    SELECT * FROM "__EFMigrationsHistory" ORDER BY "MigrationId";
    
  3. Reset if needed (dev only!):

    # Drop the database and re-apply all migrations from scratch
    dotnet ef database drop --force
    dotnet ef database update
    

Prevention:

  • Test migrations on copy of production data
  • Never modify published migrations
  • Keep migrations small and focused

Quick Reference: Error Codes

Error Code | Meaning             | First Step
-----------|---------------------|--------------------------
400        | Bad Request         | Check request body/params
401        | Unauthorized        | Check token validity
403        | Forbidden           | Check user permissions
404        | Not Found           | Check ID/resource exists
409        | Conflict            | Check version/concurrency
422        | Validation Error    | Check input constraints
500        | Server Error        | Check application logs
502        | Bad Gateway         | Check upstream services
503        | Service Unavailable | Check service health
504        | Gateway Timeout     | Check network/timeouts

When All Else Fails

  1. Check the logs: docker logs <container> --tail 500
  2. Check the database: Direct query to verify data
  3. Check the network: docker network inspect postgres_default
  4. Restart the container: Sometimes it just works
  5. Ask for help: Post in team chat with:
    • Exact error message
    • Steps to reproduce
    • What you’ve already tried
    • Relevant log snippets

The best debugging tool is a good night’s sleep. But if you need to fix it now, use these guides.