Chapter 37: Troubleshooting
Common Issues and Solutions for POS Platform
This chapter provides solutions for common problems encountered during development, deployment, and operation of the POS platform.
Table of Contents
- Database Connection Issues
- Tenant Isolation Failures
- Sync Conflicts
- Payment Processing Errors
- Offline Mode Problems
- Performance Issues
- Authentication Failures
- Integration Errors
- Build and Deployment Failures
1. Database Connection Issues
Issue: Container Cannot Connect to PostgreSQL
Symptoms:
- Application fails to start
- Error: “Connection refused” or “Host not found”
- EF Core throws NpgsqlException
Possible Causes:
- PostgreSQL container not running
- Container not on correct Docker network
- Wrong connection string
- Firewall blocking port
Diagnostic Steps:
# Check if postgres16 is running
docker ps | grep postgres16
# Check network connectivity from app container
docker exec <app-container> ping postgres16
# Test port accessibility
docker exec <app-container> nc -zv postgres16 5432
# View PostgreSQL logs
docker logs postgres16 --tail 100
Resolution:
- Container not running:
cd /volume1/docker/postgres
docker-compose up -d
- Network misconfiguration:
# Verify network exists
docker network ls | grep postgres_default
# Create if missing
docker network create postgres_default
# Connect container to network
docker network connect postgres_default <app-container>
- Wrong connection string:
# Correct format from container:
Host=postgres16;Port=5432;Database=pos_db;Username=pos_user;Password=xxx
# Correct format from host:
Host=localhost;Port=5433;Database=pos_db;Username=pos_user;Password=xxx
Prevention:
- Always specify postgres_default as an external network in docker-compose
- Use environment variables for connection strings
- Implement connection retry logic with exponential backoff
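The retry advice above can be sketched as follows. This is a minimal illustration rather than the platform's actual implementation: the `ConnectionRetry` class, its method names, and the choice to retry on any exception are assumptions made for the example.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper: retries an operation with exponential backoff.
public static class ConnectionRetry
{
    // Delay before retry number `attempt` (1-based): baseDelay * 2^(attempt-1), capped at maxDelay.
    public static TimeSpan DelayFor(int attempt, TimeSpan baseDelay, TimeSpan maxDelay)
    {
        var factor = Math.Pow(2, attempt - 1);
        var ms = Math.Min(baseDelay.TotalMilliseconds * factor, maxDelay.TotalMilliseconds);
        return TimeSpan.FromMilliseconds(ms);
    }

    // Runs the operation; on failure waits and retries until maxAttempts is reached,
    // at which point the last exception propagates to the caller.
    public static async Task<T> ExecuteAsync<T>(
        Func<Task<T>> operation, int maxAttempts, TimeSpan baseDelay, TimeSpan maxDelay)
    {
        for (var attempt = 1; ; attempt++)
        {
            try
            {
                return await operation();
            }
            catch (Exception) when (attempt < maxAttempts)
            {
                await Task.Delay(DelayFor(attempt, baseDelay, maxDelay));
            }
        }
    }
}
```

With a base delay of 1 second and a cap of 30 seconds, the waits are 1 s, 2 s, 4 s, 8 s, 16 s, then 30 s for every later attempt. In real code it is usually better to catch only transient exceptions (for example NpgsqlException) instead of every exception.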
Issue: “Role does not exist” Error
Symptoms:
- Error: FATAL: role "pos_user" does not exist
Possible Causes:
- Database user not created
- Wrong username in connection string
Resolution:
# Create the user and database
# Use -i without -t: a heredoc on stdin is not a TTY
docker exec -i postgres16 psql -U postgres << EOF
CREATE USER pos_user WITH PASSWORD 'secure_password';
CREATE DATABASE pos_db OWNER pos_user;
GRANT ALL PRIVILEGES ON DATABASE pos_db TO pos_user;
EOF
2. Tenant Isolation Failures
Issue: Data Leaking Between Tenants
Symptoms:
- User sees data from another tenant
- Queries return unexpected results
- Security audit fails
Possible Causes:
- Missing TenantId filter in query
- Middleware not setting tenant context
- Background job not setting tenant
- DbContext not configured for tenant
Diagnostic Steps:
-- List tables that have a tenant_id column; a tenant-owned table missing from this list cannot be filtered
SELECT table_name
FROM information_schema.columns
WHERE column_name = 'tenant_id'
AND table_schema = 'public';
-- Find orphaned records
SELECT COUNT(*) FROM orders WHERE tenant_id IS NULL;
Resolution:
- Missing filter - Add global query filter:
// In DbContext.OnModelCreating
modelBuilder.Entity<Order>()
    .HasQueryFilter(o => o.TenantId == _tenantProvider.TenantId);
- Middleware issue:
// Verify middleware order in Program.cs
app.UseAuthentication();
app.UseTenantMiddleware(); // Must be after auth
app.UseAuthorization();
- Background job:
// Always set tenant in background jobs
using (var scope = _scopeFactory.CreateScope())
{
    var tenantProvider = scope.ServiceProvider.GetRequiredService<ITenantProvider>();
    tenantProvider.SetTenant(tenantId);
    // ... do work
}
Prevention:
- Enable Row-Level Security in PostgreSQL
- Add integration tests that verify isolation
- Review all queries for tenant filtering
- Use tenant-scoped DbContext factory
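An isolation test, as recommended above, can be reduced to one question: does any row returned for tenant A belong to someone else? The helper below is a sketch of that check. `TenantIsolation` and `FindLeaks` are hypothetical names, and tenant IDs are assumed to be GUIDs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical test helper for tenant isolation checks.
public static class TenantIsolation
{
    // Returns every row that was visible to `tenantId` but belongs to a different tenant.
    // An empty result means the query under test did not leak data.
    public static List<T> FindLeaks<T>(
        IEnumerable<T> rowsSeenByTenant, Guid tenantId, Func<T, Guid> tenantOf) =>
        rowsSeenByTenant.Where(row => tenantOf(row) != tenantId).ToList();
}
```

In an integration test, the rows would come from a real query executed with tenant A's context (for example `await db.Orders.ToListAsync()`), after seeding data for at least two tenants. The test then asserts that `FindLeaks(rows, tenantA, o => o.TenantId)` is empty.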
Issue: “Invalid TenantId” on Valid Request
Symptoms:
- 400 Bad Request with tenant errors
- User cannot access their own data
Possible Causes:
- Tenant ID not in JWT claims
- Tenant lookup failing
- Caching stale tenant data
Resolution:
// Debug: Log tenant resolution
_logger.LogDebug("Resolving tenant from claim: {TenantClaim}",
context.User.FindFirst("tenant_id")?.Value);
// Clear tenant cache
_cache.Remove($"tenant:{tenantId}");
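When the cause is a missing or malformed claim, it helps to validate the claim in one place and fail with a clear reason. The sketch below assumes the claim type is "tenant_id" (as in the logging call above) and that tenant IDs are GUIDs. `TenantClaimReader` is a hypothetical name.

```csharp
using System;
using System.Security.Claims;

// Hypothetical helper: reads and validates the tenant claim from the current user.
public static class TenantClaimReader
{
    // Returns true and the parsed tenant ID when the principal carries
    // a well-formed, non-empty "tenant_id" claim; otherwise returns false.
    public static bool TryGetTenantId(ClaimsPrincipal user, out Guid tenantId)
    {
        tenantId = Guid.Empty;
        var raw = user.FindFirst("tenant_id")?.Value;
        return !string.IsNullOrWhiteSpace(raw)
            && Guid.TryParse(raw, out tenantId)
            && tenantId != Guid.Empty;
    }
}
```

A false result points at token issuance (the claim is absent or malformed). A true result followed by a failed lookup points at the tenant store or a stale cache entry.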
3. Sync Conflicts
Issue: Offline Changes Overwritten
Symptoms:
- User makes offline edits, they disappear after sync
- Error: “Conflict detected”
- Data reverts to old state
Possible Causes:
- Last-write-wins without conflict detection
- Version mismatch
- Sync order incorrect
Diagnostic Steps:
-- Check version history
SELECT id, version, modified_at
FROM inventory_items
WHERE sku = 'ABC123'
ORDER BY version DESC;
-- Check event log
SELECT * FROM inventory_events
WHERE sku = 'ABC123'
ORDER BY created_at DESC LIMIT 10;
Resolution:
- Implement optimistic concurrency:
public async Task<bool> UpdateAsync(Item item, int expectedVersion)
{
    var affected = await _db.Items
        .Where(i => i.Id == item.Id && i.Version == expectedVersion)
        .ExecuteUpdateAsync(s => s
            .SetProperty(i => i.Name, item.Name)
            .SetProperty(i => i.Version, expectedVersion + 1));
    return affected > 0; // False if version mismatch
}
- Queue offline changes with timestamps:
// Store in local queue with client timestamp
_localQueue.Enqueue(new SyncItem
{
    Operation = "Update",
    ClientTimestamp = DateTimeOffset.UtcNow,
    Data = item
});
Prevention:
- Use vector clocks or version vectors
- Implement merge strategies for specific entity types
- Show users when conflicts occur and let them choose which version to keep
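A vector clock keeps one counter per device, which makes it possible to tell a stale write from a true conflict. The sketch below is illustrative only. The `VectorClock` class and the use of device IDs as string keys are assumptions, not part of the platform.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public enum ClockOrder { Equal, Before, After, Concurrent }

// Hypothetical vector clock utilities (device ID -> update counter).
public static class VectorClock
{
    // Compares clock a to clock b. Devices missing from a clock count as 0.
    public static ClockOrder Compare(
        IReadOnlyDictionary<string, long> a, IReadOnlyDictionary<string, long> b)
    {
        bool aAhead = false, bAhead = false;
        foreach (var key in a.Keys.Union(b.Keys))
        {
            var av = a.TryGetValue(key, out var x) ? x : 0;
            var bv = b.TryGetValue(key, out var y) ? y : 0;
            if (av > bv) aAhead = true;
            if (bv > av) bAhead = true;
        }
        if (aAhead && bAhead) return ClockOrder.Concurrent; // true conflict: merge or ask the user
        if (aAhead) return ClockOrder.After;
        if (bAhead) return ClockOrder.Before;
        return ClockOrder.Equal;
    }

    // Element-wise maximum; the clock to store once a conflict has been resolved.
    public static Dictionary<string, long> Merge(
        IReadOnlyDictionary<string, long> a, IReadOnlyDictionary<string, long> b)
    {
        var merged = new Dictionary<string, long>();
        foreach (var key in a.Keys.Union(b.Keys))
        {
            var av = a.TryGetValue(key, out var x) ? x : 0;
            var bv = b.TryGetValue(key, out var y) ? y : 0;
            merged[key] = Math.Max(av, bv);
        }
        return merged;
    }
}
```

Only the Concurrent result needs a merge strategy or a user prompt. Before and After mean one side already includes the other's changes, so the newer version can be applied without data loss.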
Issue: Sync Never Completes
Symptoms:
- “Syncing…” message never goes away
- Partial data sync
- Timeout errors
Possible Causes:
- Network interruption during sync
- Large payload timeout
- Server error during sync
Resolution:
// Implement chunked sync
public async Task SyncAsync()
{
    var chunks = _localQueue.Chunk(100);
    foreach (var chunk in chunks)
    {
        try
        {
            await _api.SyncBatchAsync(chunk);
            _localQueue.MarkSynced(chunk);
        }
        catch (TimeoutException)
        {
            // Will retry next sync
            break;
        }
    }
}
4. Payment Processing Errors
Issue: Payment Gateway Timeout
Symptoms:
- Payment hangs for 30+ seconds
- Error: “Request timeout”
- Uncertain if payment processed
Possible Causes:
- Network latency
- Gateway overloaded
- Invalid timeout configuration
Diagnostic Steps:
# Test gateway connectivity
curl -X GET https://api.paymentgateway.com/health -w "\nTime: %{time_total}s\n"
# Check recent payment attempts in logs
grep "payment" /var/log/pos/*.log | tail -50
Resolution:
- Implement idempotency:
public async Task<PaymentResult> ProcessPaymentAsync(
    PaymentRequest request, string idempotencyKey)
{
    // Check if already processed
    var existing = await _db.Payments
        .FirstOrDefaultAsync(p => p.IdempotencyKey == idempotencyKey);
    if (existing != null)
        return existing.ToResult();
    // Process with gateway
    var result = await _gateway.ChargeAsync(request);
    // Save with idempotency key
    _db.Payments.Add(new Payment
    {
        IdempotencyKey = idempotencyKey,
        Status = result.Status
    });
    await _db.SaveChangesAsync(); // Persist, or the next lookup will not find the payment
    return result;
}
- Add timeout with retry:
// Retry only idempotent charges; otherwise a retry after a timeout can double-charge
var policy = Policy
    .Handle<TimeoutException>()
    .RetryAsync(3, onRetry: (ex, count) =>
    {
        _logger.LogWarning("Payment retry {Count}: {Message}", count, ex.Message);
    });
await policy.ExecuteAsync(() => _gateway.ChargeAsync(request));
Prevention:
- Always use idempotency keys
- Set reasonable timeouts (15-30 seconds)
- Implement circuit breaker for gateway calls
- Queue payments if offline
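The circuit breaker mentioned above stops calling the gateway after repeated failures and allows a trial call once a cool-down has passed. The class below is a minimal sketch of that state machine, not production code. `CircuitBreaker` and its members are hypothetical, it is not thread-safe, and the clock is injected so that it can be tested. Polly, already used in this chapter for retries, also provides a circuit breaker policy that is probably preferable in practice.

```csharp
using System;

// Hypothetical, single-threaded circuit breaker for illustration only.
public sealed class CircuitBreaker
{
    private readonly int _failureThreshold;
    private readonly TimeSpan _openDuration;
    private readonly Func<DateTimeOffset> _now;
    private int _consecutiveFailures;
    private DateTimeOffset? _openedAt;

    public CircuitBreaker(int failureThreshold, TimeSpan openDuration, Func<DateTimeOffset> now)
    {
        _failureThreshold = failureThreshold;
        _openDuration = openDuration;
        _now = now;
    }

    // True when a call may proceed: the circuit is closed, or it has been open
    // long enough that one trial call is allowed (half-open).
    public bool AllowRequest() =>
        _openedAt is null || _now() - _openedAt.Value >= _openDuration;

    // A successful call closes the circuit and resets the failure count.
    public void RecordSuccess()
    {
        _consecutiveFailures = 0;
        _openedAt = null;
    }

    // A failed call opens (or re-opens) the circuit once the threshold is reached.
    public void RecordFailure()
    {
        _consecutiveFailures++;
        if (_consecutiveFailures >= _failureThreshold)
            _openedAt = _now();
    }
}
```

A caller would check `AllowRequest()` before `_gateway.ChargeAsync(request)`, then call `RecordSuccess()` or `RecordFailure()` depending on the outcome. While the circuit is open, the payment can be queued or the cashier offered another tender type instead of waiting on a gateway that is known to be failing.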
Issue: Card Declined
Symptoms:
- Payment rejected
- Error code from gateway
Common Decline Codes:
| Code | Meaning | Action |
|---|---|---|
| insufficient_funds | Not enough balance | Try different card |
| card_declined | Generic decline | Contact card issuer |
| expired_card | Card expired | Use different card |
| incorrect_cvc | Wrong CVV | Re-enter |
| processing_error | Gateway issue | Retry |
Resolution:
public string GetUserFriendlyMessage(string errorCode)
{
    return errorCode switch
    {
        "insufficient_funds" => "Card declined. Please try a different payment method.",
        "expired_card" => "This card has expired. Please use a different card.",
        "incorrect_cvc" => "The security code is incorrect. Please verify and try again.",
        _ => "Payment could not be processed. Please try again or use a different card."
    };
}
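The Action column of the decline table can also drive program logic, for example to decide whether an automatic retry makes sense. The mapping below is a sketch based only on the codes in that table. The `DeclineAction` enum, the `DeclineCodes` class, and the handling of unknown codes are assumptions, and real gateways return more codes than these.

```csharp
public enum DeclineAction { RetrySameCard, ReenterDetails, UseDifferentCard, ContactIssuer }

// Hypothetical mapping from gateway decline codes to the next step.
public static class DeclineCodes
{
    public static DeclineAction ActionFor(string errorCode) => errorCode switch
    {
        "processing_error" => DeclineAction.RetrySameCard,
        "incorrect_cvc" => DeclineAction.ReenterDetails,
        "insufficient_funds" => DeclineAction.UseDifferentCard,
        "expired_card" => DeclineAction.UseDifferentCard,
        "card_declined" => DeclineAction.ContactIssuer,
        // Assumption: unknown codes are treated like a generic decline.
        _ => DeclineAction.ContactIssuer
    };

    // Only a gateway-side processing error is worth retrying without user input.
    public static bool IsAutoRetryable(string errorCode) =>
        ActionFor(errorCode) == DeclineAction.RetrySameCard;
}
```

Retrying anything other than processing_error sends the same rejected card back to the issuer, which will not change the outcome.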
5. Offline Mode Problems
Issue: Application Won’t Start Offline
Symptoms:
- App requires internet to launch
- Loading screen indefinitely
- Error: “Network request failed”
Possible Causes:
- Missing service worker
- No cached data
- API call in startup
Diagnostic Steps:
- Check browser DevTools > Application > Service Workers
- Check IndexedDB for cached data
- Monitor Network tab for failed requests
Resolution:
- Ensure service worker registered:
if ('serviceWorker' in navigator) {
  navigator.serviceWorker.register('/sw.js')
    .then(reg => console.log('SW registered'))
    .catch(err => console.error('SW failed', err));
}
- Add offline fallback in startup:
public async Task InitializeAsync()
{
    try
    {
        await _api.FetchInitialData();
    }
    catch (HttpRequestException)
    {
        _logger.LogWarning("Offline - using cached data");
        await LoadFromCache();
    }
}
Prevention:
- Cache essential data proactively
- Implement offline-first architecture
- Test app startup with network disabled
Issue: Offline Queue Growing Too Large
Symptoms:
- Local storage filling up
- App slowing down
- “Storage quota exceeded”
Possible Causes:
- Extended offline period
- Sync failing silently
- No queue size limit
Resolution:
// Implement queue management
public async Task AddToQueue(SyncItem item)
{
    var queueSize = await _localDb.SyncQueue.CountAsync();
    if (queueSize >= MAX_QUEUE_SIZE)
    {
        // Warn user
        await _notifications.ShowAsync(
            "Sync queue is full. Please connect to internet.");
        // Optional: Remove oldest low-priority items
        await _localDb.SyncQueue
            .Where(q => q.Priority == Priority.Low)
            .OrderBy(q => q.CreatedAt)
            .Take(100)
            .ExecuteDeleteAsync();
    }
    await _localDb.SyncQueue.AddAsync(item);
}
6. Performance Issues
Issue: Slow API Responses
Symptoms:
- API calls taking > 1 second
- Users complaining of lag
- Timeouts occurring
Possible Causes:
- N+1 query problem
- Missing database indexes
- Large payloads
- No caching
Diagnostic Steps:
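-- Find slow queries (requires the pg_stat_statements extension; column names as of PostgreSQL 13+)
SELECT query, calls, mean_exec_time, total_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;
-- Find tables read by sequential scan more often than by index (candidates for a missing index)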
SELECT relname, seq_scan, idx_scan
FROM pg_stat_user_tables
WHERE seq_scan > idx_scan
ORDER BY seq_scan DESC;
Resolution:
- Fix N+1 queries:
// Bad
var orders = await _db.Orders.ToListAsync();
foreach (var order in orders)
    order.Items = await _db.OrderItems.Where(...).ToListAsync();
// Good
var orders = await _db.Orders
    .Include(o => o.Items)
    .ToListAsync();
- Add missing indexes:
CREATE INDEX idx_orders_tenant_date ON orders (tenant_id, created_at DESC);
CREATE INDEX idx_inventory_sku ON inventory_items (sku);
- Implement caching:
public async Task<Product> GetProductAsync(string sku)
{
    return await _cache.GetOrCreateAsync($"product:{sku}", async entry =>
    {
        entry.AbsoluteExpirationRelativeToNow = TimeSpan.FromMinutes(5);
        return await _db.Products.FindAsync(sku);
    });
}
Prevention:
- Enable query logging in development
- Set up performance monitoring
- Establish response time budgets
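A response time budget is only useful if something measures against it. The helper below is a small sketch that times an operation and reports whether it stayed within a given budget. `ResponseBudget` and `MeasureAsync` are hypothetical names, and the budget value is whatever the team agrees on.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

// Hypothetical helper: measures an async operation against a time budget.
public static class ResponseBudget
{
    // Runs the operation and returns its result, the elapsed time,
    // and whether the elapsed time stayed within the budget.
    public static async Task<(T Result, TimeSpan Elapsed, bool WithinBudget)> MeasureAsync<T>(
        Func<Task<T>> operation, TimeSpan budget)
    {
        var stopwatch = Stopwatch.StartNew();
        var result = await operation();
        stopwatch.Stop();
        return (result, stopwatch.Elapsed, stopwatch.Elapsed <= budget);
    }
}
```

In development, a call that exceeds its budget could log a warning that names the endpoint and the elapsed time, so that slow paths surface before users report lag.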
Issue: Memory Usage Growing
Symptoms:
- Container memory increasing over time
- Out of memory errors
- Slow garbage collection
Possible Causes:
- Memory leak in code
- Unbounded caches
- Event handler accumulation
- Large objects in memory
Diagnostic Steps:
# Monitor container memory
docker stats <container-name>
# Get memory dump (if dotnet-dump installed)
dotnet-dump collect -p <process-id>
Resolution:
- Dispose resources properly:
// Use 'using' for disposables
await using var connection = new NpgsqlConnection(connectionString);
await connection.OpenAsync();
- Limit cache size:
services.AddMemoryCache(options =>
{
    options.SizeLimit = 1000; // Total size units; with Size = 1 per entry this caps the entry count
});
_cache.Set(key, value, new MemoryCacheEntryOptions
{
    Size = 1,
    SlidingExpiration = TimeSpan.FromMinutes(10)
});
- Unsubscribe from events:
public class MyComponent : IDisposable
{
    private readonly IDisposable _subscription;
    public MyComponent(IEventBus bus)
    {
        _subscription = bus.Subscribe<OrderCreated>(HandleOrder);
    }
    public void Dispose()
    {
        _subscription?.Dispose();
    }
}
7. Authentication Failures
Issue: JWT Token Rejected
Symptoms:
- 401 Unauthorized responses
- “Invalid token” errors
- User suddenly logged out
Possible Causes:
- Token expired
- Wrong signing key
- Clock skew between servers
- Token issued for different audience
Diagnostic Steps:
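# Decode the JWT payload (do not paste production tokens into shared shells or online tools)
# The payload is base64url without padding, so base64 -d may report "invalid input" for some tokens
echo "<token>" | cut -d. -f2 | base64 -d | jq
# Check claims
# Look for: exp, iss, aud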
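JWT segments use base64url encoding with the padding removed, so a decoder has to restore the standard alphabet and the trailing '=' characters first. The sketch below does that for the payload segment. `JwtInspector` is a hypothetical helper for debugging only: it does not verify the signature and must not be used to make authorization decisions.

```csharp
using System;
using System.Text;

// Hypothetical debugging helper: decodes a JWT payload without validating it.
public static class JwtInspector
{
    // Returns the payload (second segment) of a JWT as a JSON string.
    public static string DecodePayload(string token)
    {
        var parts = token.Split('.');
        if (parts.Length != 3)
            throw new FormatException("Expected three dot-separated segments.");

        // base64url -> base64: swap the URL-safe characters and restore '=' padding.
        var segment = parts[1].Replace('-', '+').Replace('_', '/');
        segment = segment.PadRight(segment.Length + (4 - segment.Length % 4) % 4, '=');
        return Encoding.UTF8.GetString(Convert.FromBase64String(segment));
    }
}
```

The returned JSON contains the exp, iss, and aud claims to compare against the API's expected values.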
Resolution:
- Token expired - Implement refresh flow:
if (response.StatusCode == HttpStatusCode.Unauthorized)
{
    var newToken = await RefreshTokenAsync();
    // Retry with new token
}
- Clock skew - Add tolerance:
services.AddAuthentication().AddJwtBearer(options =>
{
    options.TokenValidationParameters = new TokenValidationParameters
    {
        // 5 minutes is also the library default; raise it only if server clocks drift further
        ClockSkew = TimeSpan.FromMinutes(5)
    };
});
- Wrong key - Verify signing key matches:
# Both services must use the same key; compare a hash instead of printing the secret
echo -n "$JWT_SIGNING_KEY" | sha256sum
Issue: User Cannot Log In
Symptoms:
- Login fails with valid credentials
- “Invalid username or password”
- Account not locked
Possible Causes:
- Password hashing mismatch
- User account disabled
- Tenant not active
- Case sensitivity issues
Resolution:
public async Task<LoginResult> LoginAsync(string email, string password)
{
    // Case-insensitive email lookup
    var user = await _db.Users
        .FirstOrDefaultAsync(u => u.Email.ToLower() == email.ToLower());
    if (user == null)
    {
        _logger.LogWarning("Login failed: user not found for {Email}", email);
        return LoginResult.Failed("Invalid credentials");
    }
    if (!user.IsActive)
    {
        _logger.LogWarning("Login failed: user {Email} is inactive", email);
        return LoginResult.Failed("Account is disabled");
    }
    if (!_hasher.Verify(password, user.PasswordHash))
    {
        _logger.LogWarning("Login failed: wrong password for {Email}", email);
        return LoginResult.Failed("Invalid credentials");
    }
    return LoginResult.Success(GenerateToken(user));
}
8. Integration Errors
Issue: Shopify Webhook Not Received
Symptoms:
- Orders not appearing in POS
- Inventory not syncing
- Webhook endpoint returning errors
Possible Causes:
- Webhook not registered
- HMAC verification failing
- Endpoint not accessible
- SSL certificate issues
Diagnostic Steps:
# Check webhook registration
curl -X GET "https://{store}.myshopify.com/admin/api/2024-01/webhooks.json" \
-H "X-Shopify-Access-Token: {token}"
# Test endpoint accessibility
curl -X POST https://your-domain.com/webhooks/shopify \
-H "Content-Type: application/json" \
-d '{"test": true}'
Resolution:
- Register webhook:
curl -X POST "https://{store}.myshopify.com/admin/api/2024-01/webhooks.json" \
  -H "X-Shopify-Access-Token: {token}" \
  -H "Content-Type: application/json" \
  -d '{
    "webhook": {
      "topic": "orders/create",
      "address": "https://your-domain.com/webhooks/shopify",
      "format": "json"
    }
  }'
- Fix HMAC verification:
// Async because the body is read with await; hash the raw body exactly as received
public async Task<bool> VerifyWebhookAsync(HttpRequest request)
{
    var hmacHeader = request.Headers["X-Shopify-Hmac-SHA256"].ToString();
    using var reader = new StreamReader(request.Body);
    var body = await reader.ReadToEndAsync();
    using var hmac = new HMACSHA256(Encoding.UTF8.GetBytes(_secret));
    var hash = Convert.ToBase64String(hmac.ComputeHash(Encoding.UTF8.GetBytes(body)));
    // Constant-time comparison avoids leaking how much of the value matched
    return CryptographicOperations.FixedTimeEquals(
        Encoding.UTF8.GetBytes(hash), Encoding.UTF8.GetBytes(hmacHeader));
}
Issue: Bridge Not Connecting
Symptoms:
- Bridge status shows “Offline”
- Commands stuck in pending
- Heartbeats not received
Possible Causes:
- Tailscale VPN not connected
- Wrong server URL in bridge config
- Firewall blocking
- Bridge service not running
Diagnostic Steps:
# Check Tailscale status
tailscale status
# Test connectivity from bridge machine
curl http://100.124.10.65:2500/health
# Check bridge logs
Get-Content C:\ProgramData\StanlyBridge\logs\*.log -Tail 50
Resolution:
- Reconnect Tailscale:
tailscale up
- Verify bridge configuration:
// appsettings.json
{
  "ServerUrl": "http://100.124.10.65:2500",
  "StoreCode": "GM"
}
- Restart bridge service:
Restart-Service StanlyBridge
9. Build and Deployment Failures
Issue: Docker Build Fails
Symptoms:
- docker-compose up --build errors
- Missing dependencies
- “No such file or directory”
Possible Causes:
- Dockerfile syntax error
- Missing files in context
- Network issues downloading packages
- Incompatible base image
Resolution:
- Check .dockerignore:
# Make sure necessary files aren't ignored
# Bad:
*.json
# Good:
*.log
node_modules
- Multi-stage build issues:
# Ensure COPY --from references correct stage
FROM mcr.microsoft.com/dotnet/sdk:8.0 AS build
WORKDIR /src
COPY ["src/App/App.csproj", "src/App/"]
RUN dotnet restore "src/App/App.csproj"
COPY . .
RUN dotnet publish "src/App/App.csproj" -c Release -o /app/publish
FROM mcr.microsoft.com/dotnet/aspnet:8.0 AS final
WORKDIR /app
# 'build' must match the stage name; keep comments on their own line in a Dockerfile
COPY --from=build /app/publish .
- Clear Docker cache:
docker builder prune
docker-compose build --no-cache
Issue: Migration Fails on Deployment
Symptoms:
- Container starts but crashes
- “Database migration failed”
- Schema out of sync
Possible Causes:
- Migration order issue
- Conflicting migrations
- Database connection during migration
Resolution:
- Run migrations separately:
# Don't auto-migrate on startup
# Instead, run migrations explicitly
docker exec <container> dotnet ef database update
- Check migration history:
SELECT * FROM "__EFMigrationsHistory" ORDER BY "MigrationId";
- Reset if needed (dev only!):
# Drop the database and reapply all migrations
dotnet ef database drop
dotnet ef database update
Prevention:
- Test migrations on copy of production data
- Never modify published migrations
- Keep migrations small and focused
Quick Reference: Error Codes
| Error Code | Meaning | First Step |
|---|---|---|
| 400 | Bad Request | Check request body/params |
| 401 | Unauthorized | Check token validity |
| 403 | Forbidden | Check user permissions |
| 404 | Not Found | Check ID/resource exists |
| 409 | Conflict | Check version/concurrency |
| 422 | Validation Error | Check input constraints |
| 500 | Server Error | Check application logs |
| 502 | Bad Gateway | Check upstream services |
| 503 | Service Unavailable | Check service health |
| 504 | Gateway Timeout | Check network/timeouts |
When All Else Fails
- Check the logs: docker logs <container> --tail 500
- Check the database: Direct query to verify data
- Check the network: docker network inspect <network>
- Restart the container: Sometimes it just works
- Ask for help: Post in team chat with:
  - Exact error message
  - Steps to reproduce
  - What you’ve already tried
  - Relevant log snippets
The best debugging tool is a good night’s sleep. But if you need to fix it now, use these guides.