Docker and Logs Review - Gitinext-Golang (1-Month Analysis)

Date: 2025-01-31
Scope: Docker Swarm stack, service logs, and codebase error patterns.

1. What Was Reviewed

  • Docker: Dockerfile, docker-compose.swarm.yml, stack services and env vars.
  • Logs: The repo contains no live log files; this doc explains how to fetch them and what to look for.
  • Code: Startup panic/Fatal patterns, error logging, known issues from CRITICAL_ISSUES_REPORT.md, monitoring alerts, config mismatches.

2. How to Fetch Logs (Docker Swarm)

Stack name: gitinext-golang.

docker service ls | grep gitinext-golang
make aux-logs SVC=gateway
make aux-logs SVC=wallet
make aux-logs SVC=payment
make aux-logs SVC=watcher-ton
make aux-logs SVC=watcher-tron
make aux-logs SVC=telegrambot
docker service logs --tail 500 gitinext-golang_gateway
docker service logs --tail 2000 gitinext-golang_gateway 2>&1 | grep -iE 'error|panic|fatal|failed'
docker service logs --tail 2000 gitinext-golang_watcher-ton 2>&1 | grep -iE 'error|circuit|RPC|402|429|422'
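The same fetch-and-filter step can be scripted so it runs identically for every service. The following is a minimal sketch, not part of the repo: the helper names `fetch_logs` and `matching_lines` are hypothetical, and it assumes the `docker` CLI is on the PATH of a Swarm manager node.

```python
import re
import subprocess

STACK = "gitinext-golang"  # stack name given in this section
ERROR_RE = re.compile(r"error|panic|fatal|failed", re.IGNORECASE)


def fetch_logs(service, tail=2000):
    """Return the last `tail` log lines of a Swarm service, stderr merged into stdout."""
    result = subprocess.run(
        ["docker", "service", "logs", "--tail", str(tail), f"{STACK}_{service}"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
        check=False,
    )
    return result.stdout.splitlines()


def matching_lines(lines, pattern=ERROR_RE):
    """Keep only the lines that match the pattern (same effect as grep -iE)."""
    return [line for line in lines if pattern.search(line)]
```

For example, `matching_lines(fetch_logs("gateway"))` corresponds to the gateway grep command above; passing a different compiled pattern covers the watcher filter.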

3. Known Issues

3.1 RPC / Watchers (see docs/CRITICAL_ISSUES_REPORT.md)

TON/TRON watchers: GetBlock 402, NowNodes 422/404, public 429/405. Circuit breakers are open, so no deposits are detected. Fix: renew GetBlock/NowNodes, add TonCenter/TronGrid fallbacks, then restart the watchers.

3.2 RabbitMQ URL - Telegrambot (fixed in repo)

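Telegrambot's default URL used an unencoded username (gitinext@rabbitmq). In the credentials part of an AMQP URL a literal @ must be percent-encoded as %40, because an unencoded @ is read as the end of the credentials; the fix also encodes ! as %21. Default updated to amqp://gitinext%40rabbitmq:GitiNext%40Rabbit2025%21Secure@rabbitmq:5672/. Check telegrambot logs for RabbitMQ connection or authentication errors.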
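To avoid hand-encoding credentials, the URL can be generated. This is a small sketch using Python's standard `urllib.parse.quote`; the function name `build_amqp_url` and its parameters are hypothetical and not taken from the repo.

```python
from urllib.parse import quote


def build_amqp_url(user, password, host, port=5672, vhost=""):
    """Build an AMQP URL with percent-encoded credentials."""
    # safe="" makes quote() encode every reserved character, including @, !, : and /.
    userinfo = f"{quote(user, safe='')}:{quote(password, safe='')}"
    return f"amqp://{userinfo}@{host}:{port}/{quote(vhost, safe='')}"
```

Called with the username gitinext@rabbitmq and host rabbitmq, this yields a URL of the same shape as the updated default, with %40 in place of each @ in the credentials.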

3.3 PostgreSQL 18 Volume

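Compose mounts postgres_data:/var/lib/postgresql/18/docker. In the official postgres:18 image that path is the default PGDATA, while the volume the image declares is the parent directory /var/lib/postgresql (the Debian-packaged layout /var/lib/postgresql/18/main is not what the image uses). If you see Postgres init errors, try mounting postgres_data:/var/lib/postgresql instead.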

3.4 Startup Failures

Gateway, account, wallet, ledger, docs, storage, payment, telegrambot, voucher, and withdrawal exit via panic, log.Fatal, or os.Exit(1) when config is missing or a DB/Redis/proxy connection fails. Ensure .env and the Postgres/Redis/RabbitMQ settings are correct. Run docker service ps <svc> --no-trunc and look for tasks that exited with code 1.
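Scanning task history by eye gets tedious across ten services, so the check can be scripted. The sketch below assumes the `docker` CLI is available on a manager node and uses the `--format` placeholders `.Name`, `.CurrentState`, and `.Error`; the helper names are hypothetical.

```python
import subprocess

# Tab-separated Go template for `docker service ps --format`.
PS_FORMAT = "{{.Name}}\t{{.CurrentState}}\t{{.Error}}"


def service_tasks(service):
    """Return the raw tab-separated task history of one Swarm service."""
    result = subprocess.run(
        ["docker", "service", "ps", "--no-trunc", "--format", PS_FORMAT, service],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout


def failed_tasks(ps_output):
    """Return (name, state, error) for every task whose state is Failed or Rejected."""
    failed = []
    for line in ps_output.splitlines():
        parts = line.split("\t")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        name, state = parts[0], parts[1]
        error = parts[2] if len(parts) > 2 else ""
        if state.startswith(("Failed", "Rejected")):
            # Strip surrounding double quotes from the error text, if present.
            failed.append((name, state, error.strip('"')))
    return failed
```

For example, `failed_tasks(service_tasks("gitinext-golang_payment"))` lists the payment tasks that ended in a Failed or Rejected state together with the reported error.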

3.5 Payment

See monitoring/alerts/payment_service_alerts.yml. Logs: PayStar, API key, refresh, verification, deposit creation.

3.6 Wallet

TWC: ENABLE_TWC_PLUGIN=true with missing library -> panic. Sweeps: TWC returned EMPTY signed tx, broadcast failed. Vault: VAULT_ENABLED=true and unreachable/wrong secrets -> startup or decrypt failures.

3.7 Withdrawal

Missing/invalid hot wallet encryption key -> exit. Exonyx/gRPC errors in logs.

4. Config Checklist

POSTGRES*, REDIS*, RABBITMQ_URL (percent-encoded), JWT_SECRET, ACCOUNT_JWT_SECRET, TELEGRAM_BOT_TOKEN, WALLET_ENC_KEY, Vault secrets if VAULT_ENABLED=true. For external APIs (QuickNode, NowNodes, TonCenter, PayStar, etc.) see docs/EXTERNAL_APIS_AUDIT.md and set valid keys in .env. Run ./scripts/check-external-apis.sh to test endpoints.
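A quick pre-deploy check of this list can be sketched as follows. The variable names come from the checklist above; the helper names are hypothetical, the prefix check only confirms that at least one POSTGRES*/REDIS* variable is set, and the @ check is a heuristic rather than a full URL validator.

```python
# Exact names and prefixes taken from the checklist above.
REQUIRED_NAMES = [
    "RABBITMQ_URL",
    "JWT_SECRET",
    "ACCOUNT_JWT_SECRET",
    "TELEGRAM_BOT_TOKEN",
    "WALLET_ENC_KEY",
]
REQUIRED_PREFIXES = ["POSTGRES", "REDIS"]


def missing_config(env):
    """Return checklist entries that are unset or empty in the given mapping."""
    missing = [name for name in REQUIRED_NAMES if not env.get(name)]
    for prefix in REQUIRED_PREFIXES:
        # A prefix counts as present if at least one non-empty variable starts with it.
        if not any(key.startswith(prefix) and value for key, value in env.items()):
            missing.append(prefix + "*")
    return missing


def has_raw_at_in_credentials(amqp_url):
    """Heuristic: more than one '@' after the scheme suggests an unencoded '@'."""
    rest = amqp_url.split("://", 1)[-1]
    return rest.count("@") > 1
```

Passing `dict(os.environ)` (after `import os`) to `missing_config` inside a container or after sourcing .env reports what is still unset.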

5. One-Month Log Review Checklist

  1. Fetch errors per service: grep error/panic/fatal/failed on last 2k lines.
  2. docker service ps for restarts and Failed state.
  3. Watchers: circuit, 402, 429, 422, non-200, RPC.
  4. Payment: PayStar, refresh, verification.
  5. RabbitMQ: payment, watchers, telegrambot, gateway.
  6. DB/Redis: gateway, account, wallet, ledger, withdrawal.
  7. Prometheus: PaymentServiceUnavailable, PayStarAPIHighErrorRate, PaymentServiceAPIKeyRefreshFailure.
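For a month of logs, a per-category tally is often easier to compare across services than raw grep output. The sketch below mirrors checklist items 1, 3, and 4; the category names and the `tally` helper are hypothetical, and the patterns are deliberately broad, so expect some false positives.

```python
import re
from collections import Counter

# \b keeps 402/422/429 from matching inside longer numbers such as durations.
CATEGORIES = {
    "fatal": re.compile(r"panic|fatal", re.IGNORECASE),
    "rpc": re.compile(r"circuit|RPC|\b(402|422|429)\b", re.IGNORECASE),
    "paystar": re.compile(r"PayStar|refresh|verification", re.IGNORECASE),
}


def tally(lines):
    """Count how many lines hit each category; one line may count in several."""
    counts = Counter()
    for line in lines:
        for name, pattern in CATEGORIES.items():
            if pattern.search(line):
                counts[name] += 1
    return counts
```

Feeding it the lines returned by docker service logs for one service gives a small dictionary of counts that can be recorded per service and per review run.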

6. Files Touched

  • docs/DOCKER_AND_LOGS_REVIEW.md: New.
  • docker-compose.swarm.yml: Telegrambot RABBITMQ_URL default set to percent-encoded value.

7. Mentor Tips

  1. Understood: Full Docker + logs review after ~1 month; find errors and issues.
  2. If unclear: Specify environment or priority services.
  3. Files considered: Dockerfile, docker-compose.swarm.yml, Makefile aux-logs, CRITICAL_ISSUES_REPORT.md, payment_service_alerts.yml, main.go error paths, packages/rpc/provider.go.
  4. Changes: New DOCKER_AND_LOGS_REVIEW.md; fixed telegrambot RABBITMQ_URL in docker-compose.swarm.yml.
  5. Per-file: DOCKER_AND_LOGS_REVIEW.md new; docker-compose.swarm.yml one-line RABBITMQ_URL fix.

8. Service-by-Service Log Review (Run 2025-01-31)

Logs and task status were reviewed one by one starting from gateway. Summary below.

8.1 Gateway

  • Replicas: 2/2 Running.
  • Task history: Past failures ~13 days ago (exit 2, exit 255). Current replicas stable.
  • Recent logs: Mostly GET /healthz 200 (Traefik health checks). No ERROR/WARN in sampled tail.

8.2 Wallet

  • Replicas: 1/2 (one replica not running).
  • Task history: wallet.1 has been Pending for 4 weeks with “no suitable node (insufficient resources on 1 node)”. The swarm has only one node, and the wallet resource limits leave no room for a second replica.
  • Action: Add a worker node, or reduce wallet resource requests, or run with 1 replica by design.

8.3 Payment

  • Replicas: 1/1 Running.
  • Task history: Multiple Failed (exit 1) ~12 days ago. Now stable.
  • Log errors (current): PayStar statement API error, status 400, message (Persian): “توکن احراز هویت اشتباه است” (authentication token is wrong). Deposit sync fails every cycle.
  • Action: Fix PayStar credentials (PAYSTAR_APPLICATION_ID, PAYSTAR_ACCESS_PASSWORD, PAYSTAR_REFRESH_TOKEN). Refresh token in PayStar dashboard and update env.

8.4 Watcher-TON

  • Replicas: 1/1 Running.
  • Task history: Past Failed (exit 1, 255) ~13 days ago. Now running.
  • Log errors (current): Circuit breaker open for provider toncenter; RPC status 429 (rate limited). “All RPC providers failed” for getBlockTransactions, so TON deposit detection is not working.
  • Action: Add/rotate TON RPC providers; reduce rate or add paid tier so circuit breakers can close.
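To make the failure mode concrete, here is a generic circuit breaker with provider fallback. It is an illustrative sketch only and is not the implementation in packages/rpc/provider.go; the class, function names, thresholds, and cooldown are all assumptions.

```python
import time


class CircuitBreaker:
    """Opens after `max_failures` consecutive errors; allows calls again after `cooldown` seconds."""

    def __init__(self, max_failures=3, cooldown=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.cooldown = cooldown
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the breaker is closed

    def allow(self):
        if self.opened_at is None:
            return True
        # Once the cooldown has elapsed, calls are allowed again;
        # a further failure re-opens the breaker.
        return self.clock() - self.opened_at >= self.cooldown

    def record(self, ok):
        if ok:
            self.failures, self.opened_at = 0, None
        else:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()


def call_with_fallback(providers, breakers, request):
    """Try each (name, call) provider whose breaker allows it; raise if none succeeds."""
    for name, call in providers:
        breaker = breakers[name]
        if not breaker.allow():
            continue  # breaker open: skip this provider
        try:
            result = call(request)
        except Exception:
            breaker.record(False)
            continue
        breaker.record(True)
        return result
    raise RuntimeError("all RPC providers failed")
```

With a single rate-limited provider, every breaker ends up open and each request falls through to the final error, which matches the "All RPC providers failed" symptom. Adding a second working provider lets requests succeed while the first one cools down.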

8.5 Watcher-TRON

  • Replicas: 1/1 Running.
  • Task history: Past Failed (exit 1, 255); once Rejected 6 weeks ago: “No such image: registry.nextgiti.cloud/watcher:latest”. Resolved.

8.6 Telegrambot

  • Replicas: 1/1 Running.
  • Task history: Past Failed (exit 1, 255) ~13 days ago and 2–7 weeks ago. Now stable.

8.7 Account

  • Replicas: 2/2 Running.
  • Task history: Past Failed (exit 2, 255). Now stable.

8.8 Withdrawal

  • Replicas: 2/2 Running.
  • Task history: Past Failed (exit 1, 255) ~13 days ago. Now stable.

8.9 Ledger

  • Replicas: 0/2 (service not running).
  • Task history: Rejected with “No such image: registry.nextgiti.cloud/ledger:latest”. The image is missing from the registry.
  • Action: Build and push ledger image, then update service or redeploy stack.

8.10 Priority Summary

Priority  Issue                    Service      Action
P0        Ledger image missing     ledger       Build, push, update service
P0        PayStar token invalid    payment      Update PAYSTAR_* credentials; refresh token
P1        TON deposits broken      watcher-ton  Fix RPC 429; add/rotate providers
P1        Wallet at 1/2 replicas   wallet       Add node or reduce resource request
P2        Historical exit 1/2/255  multiple     Monitor; already recovered
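The replica counts behind this summary can be rechecked after each fix. The sketch below assumes the `docker` CLI on a manager node and the `--format` placeholders `.Name` and `.Replicas`; the helper names are hypothetical.

```python
import subprocess


def list_replicas():
    """Return one `name running/desired` line per service (requires a Swarm manager)."""
    result = subprocess.run(
        ["docker", "service", "ls", "--format", "{{.Name}} {{.Replicas}}"],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout


def under_replicated(ls_output):
    """Return (service, running, desired) where fewer tasks run than desired."""
    short = []
    for line in ls_output.splitlines():
        fields = line.split()
        if len(fields) < 2 or "/" not in fields[1]:
            continue  # skip blank or unexpected lines
        running, desired = fields[1].split("/", 1)
        if int(running) < int(desired):
            short.append((fields[0], int(running), int(desired)))
    return short
```

An empty result from `under_replicated(list_replicas())` means every service has reached its desired replica count.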
