
Architecture

WhatIsUp.dev is one Node process, one Postgres, one Redis. That's the whole runtime topology for v1, and it's a deliberate choice: every additional service is something you have to deploy, migrate, observe, and pay for.

The three pieces

┌─────────────────────────────────────────────────────┐
│   Backend (Fastify)                                 │
│   • REST API — instances, messages, webhooks        │
│   • Baileys session manager (in-process sockets)    │
│   • SSE stream for QR + state events                │
│   • BullMQ queue producer                           │
│   • Webhook delivery worker (same process for v1)   │
└────────────┬────────────────────────────┬───────────┘
             │                            │
             ▼                            ▼
        ┌─────────┐                 ┌─────────┐
        │ Postgres│                 │  Redis  │
        │ (Kysely)│                 │ (BullMQ)│
        └─────────┘                 └─────────┘
  • Postgres holds customers, API keys, instances, webhook endpoints, webhook deliveries (with payload bodies for replay), and an append-only audit log. Migrations run via a tiny in-house runner with an advisory lock and a _migrations ledger — no Prisma, no schema generators, no surprise drift. (A minimal runner sketch follows this list.)
  • Redis is BullMQ's job board. Outbound webhook deliveries land there, the worker drains them with retries (exponential backoff, jitter), and the queue keeps the API path snappy — POST /v1/messages returns the moment the row is staged, not the moment WhatsApp acks. (See the enqueue sketch after this list.)
  • Baileys is loaded inside the Fastify process. Each connected instance owns a WebSocket to WhatsApp Web. We persist auth-state to disk per instance so a restart can resume without a fresh QR scan. (A resume sketch follows this list.)
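
To make the first bullet concrete, here is a minimal sketch of that kind of runner, assuming Kysely's sql tag and a _migrations ledger keyed by migration name. The lock key, column names, and function shape are illustrative, not the real ones.

```ts
import { Kysely, sql } from 'kysely'

// One migration = a name plus a function that applies it.
interface Migration {
  name: string
  up: (db: Kysely<any>) => Promise<void>
}

export async function migrate(db: Kysely<any>, migrations: Migration[]): Promise<void> {
  // Pin a single connection so the session-level advisory lock and unlock
  // happen on the same session even when `db` is backed by a pool.
  await db.connection().execute(async (conn) => {
    await sql`select pg_advisory_lock(727274)`.execute(conn) // arbitrary app-wide lock key
    try {
      await sql`
        create table if not exists _migrations (
          name text primary key,
          applied_at timestamptz not null default now()
        )
      `.execute(conn)

      const applied = await conn.selectFrom('_migrations').select('name').execute()
      const done = new Set(applied.map((row) => row.name))

      for (const migration of migrations) {
        if (done.has(migration.name)) continue
        // A production runner would wrap these two steps in a transaction.
        await migration.up(conn)
        await conn.insertInto('_migrations').values({ name: migration.name }).execute()
      }
    } finally {
      await sql`select pg_advisory_unlock(727274)`.execute(conn)
    }
  })
}
```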
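
The second bullet's enqueue step might look roughly like this. Queue name, payload shape, and retry counts are assumptions for the sketch; the jittered backoff itself is defined on the worker, shown further down.

```ts
import { Queue } from 'bullmq'

// Illustrative job payload: the API path stages a webhook_delivery row first,
// then hands only its id to the queue. Field names are assumptions.
interface WebhookDeliveryJob {
  deliveryId: string
  endpointId: string
}

const connection = { host: process.env.REDIS_HOST ?? 'localhost', port: 6379 }

export const webhookQueue = new Queue<WebhookDeliveryJob>('webhook-deliveries', { connection })

// Called on the API path once the row exists. The HTTP response goes out as soon
// as Redis accepts the job, not when the customer's endpoint (or WhatsApp) responds.
export async function enqueueDelivery(job: WebhookDeliveryJob): Promise<void> {
  await webhookQueue.add('deliver', job, {
    attempts: 8,
    backoff: { type: 'custom' },  // the worker supplies an exponential + jitter delay
    removeOnComplete: true,
    removeOnFail: false,          // keep failures visible; the stored payload enables replay
  })
}
```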
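
And for the third bullet, resuming a Baileys session from persisted auth-state typically looks like this, assuming the @whiskeysockets/baileys package; the auth directory layout and reconnect policy are illustrative.

```ts
import makeWASocket, { useMultiFileAuthState, DisconnectReason } from '@whiskeysockets/baileys'
import type { Boom } from '@hapi/boom'

// One auth-state directory per instance; a restart reads it back and resumes
// the session instead of forcing a new QR scan.
export async function startInstance(instanceId: string) {
  const { state, saveCreds } = await useMultiFileAuthState(`./auth/${instanceId}`)
  const sock = makeWASocket({ auth: state })

  // Persist credential updates as they arrive; this is what survives restarts.
  sock.ev.on('creds.update', saveCreds)

  sock.ev.on('connection.update', ({ connection, lastDisconnect, qr }) => {
    if (qr) {
      // Fresh QR: push it onto the SSE stream so the dashboard can render it.
    }
    if (connection === 'close') {
      const status = (lastDisconnect?.error as Boom | undefined)?.output?.statusCode
      if (status !== DisconnectReason.loggedOut) {
        void startInstance(instanceId) // transient drop: reconnect with saved state
      }
    }
  })

  return sock
}
```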

Process Model A vs B

We picked "Process Model A": the webhook worker runs in-process; "Model B" would run it as a separate service. Pros: single binary, single deploy, no state that has to be made visible across processes through Redis. Cons: a slow customer endpoint can pin Node event-loop time that should be servicing API requests.

The mitigation is the per-host concurrency cap — a customer with a sluggish endpoint can only block their own deliveries, not the worker globally. When this stops being enough, splitting the worker into its own process is a one-file change because the queue, the delivery repo, and the Baileys session manager are all already isolated behind interfaces.
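
One way the cap could look (a sketch of the idea, not the actual worker): an in-memory in-flight counter keyed by hostname, with over-cap jobs failed fast so BullMQ's backoff reschedules them. The payload shape, limits, and the exponential-plus-jitter backoff strategy below are illustrative.

```ts
import { Worker, type Job } from 'bullmq'

const connection = { host: process.env.REDIS_HOST ?? 'localhost', port: 6379 }

// In-flight deliveries per destination host. A sluggish endpoint can hold at most
// MAX_PER_HOST slots, so it starves only its own deliveries, never the whole worker.
const MAX_PER_HOST = 3
const inFlight = new Map<string, number>()

// Payload inlined here to keep the sketch self-contained; the real job likely
// carries just a delivery id and loads the row from Postgres.
type DeliveryJob = { url: string; body: string; headers: Record<string, string> }

export const webhookWorker = new Worker<DeliveryJob>(
  'webhook-deliveries',
  async (job: Job<DeliveryJob>) => {
    const host = new URL(job.data.url).hostname
    if ((inFlight.get(host) ?? 0) >= MAX_PER_HOST) {
      // Over the per-host cap: fail fast and let the backoff reschedule this job.
      // (A real worker might move the job to delayed instead of burning an attempt.)
      throw new Error(`per-host cap reached for ${host}`)
    }

    inFlight.set(host, (inFlight.get(host) ?? 0) + 1)
    try {
      const res = await fetch(job.data.url, {
        method: 'POST',
        headers: job.data.headers,
        body: job.data.body,
        signal: AbortSignal.timeout(10_000), // never wait forever on a dead endpoint
      })
      if (!res.ok) throw new Error(`endpoint responded ${res.status}`)
    } finally {
      inFlight.set(host, (inFlight.get(host) ?? 0) - 1)
    }
  },
  {
    connection,
    concurrency: 20, // global cap across all hosts
    settings: {
      // Exponential backoff with full jitter, used by jobs enqueued with type 'custom'.
      backoffStrategy: (attemptsMade: number) =>
        Math.round(Math.random() * Math.min(60_000, 1_000 * 2 ** attemptsMade)),
    },
  },
)
```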

Customer / instance / API key model

customer
   ├── api_key (one or more)
   ├── instance
   │     ├── webhook_endpoint  (delivery target)
   │     │     └── webhook_delivery (per-event row)
   │     └── audit_event
   └── audit_event

A customer is a billing-and-isolation unit — every row in every other table FKs back to a customer. An instance is one phone number / WhatsApp connection. An API key is scoped to a customer; it can optionally be bound to a single instance for least-privilege use cases (e.g. a marketing app that should only be able to send from one number).
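
A sketch of how the key-scoping check might look, combining the HMAC lookup described in the next section with the optional instance binding. The row shape and env-var name are assumptions.

```ts
import { createHmac } from 'node:crypto'

// Assumed shape of an api_key row; column names are illustrative.
interface ApiKeyRow {
  customerId: string
  instanceId: string | null // null = the key may act on any of this customer's instances
  keyHash: string
}

// Only HMAC-SHA256(prefix|key) is stored, never the key itself. The pepper lives in
// the environment (env-var name assumed here), so a DB dump alone can't be brute-forced offline.
export function hashApiKey(prefix: string, key: string): string {
  const pepper = process.env.API_KEY_PEPPER ?? ''
  return createHmac('sha256', pepper).update(`${prefix}|${key}`).digest('hex')
}

// Least-privilege check: the key must belong to the customer and, if it is bound
// to one instance, it may only act on that instance.
export function canUseInstance(row: ApiKeyRow, customerId: string, instanceId: string): boolean {
  if (row.customerId !== customerId) return false
  return row.instanceId === null || row.instanceId === instanceId
}
```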

Trust boundaries

Each boundary, and what's enforced at it:

  • API key → customer: HMAC-SHA256(prefix|key) with env pepper; no plaintext stored.
  • Outbound webhook URL: public-DNS only; loopback / link-local / RFC1918 / cloud-metadata IPs all rejected at create and delivery time (DNS-rebind defense). HTTPS-only enforced when NODE_ENV=production.
  • Webhook signing secret: AES-256-GCM at rest with key rotation (SECRETS_KEY + optional SECRETS_KEY_PREVIOUS).
  • Cross-customer access: every query filters by customerId. No "global admin" path.
  • Audit log: append-only audit_events table with ON DELETE SET NULL so history outlives the row it referenced.
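
For the outbound-URL boundary, the guard could be sketched like this, assuming Node's built-in BlockList; the ranges below cover the cases named above but are not necessarily the exact production list.

```ts
import { lookup } from 'node:dns/promises'
import { BlockList, isIP } from 'node:net'

// Address ranges a delivery must never reach. Checked when the endpoint is created
// and again right before each delivery, so a record that later starts resolving to
// something private (DNS rebinding) is still caught.
const blocked = new BlockList()
blocked.addSubnet('127.0.0.0', 8, 'ipv4')    // loopback
blocked.addSubnet('10.0.0.0', 8, 'ipv4')     // RFC1918
blocked.addSubnet('172.16.0.0', 12, 'ipv4')  // RFC1918
blocked.addSubnet('192.168.0.0', 16, 'ipv4') // RFC1918
blocked.addSubnet('169.254.0.0', 16, 'ipv4') // link-local / cloud metadata (169.254.169.254)
blocked.addAddress('::1', 'ipv6')            // IPv6 loopback
blocked.addSubnet('fc00::', 7, 'ipv6')       // IPv6 unique-local
blocked.addSubnet('fe80::', 10, 'ipv6')      // IPv6 link-local

export async function assertSafeWebhookUrl(raw: string): Promise<void> {
  const url = new URL(raw)

  if (process.env.NODE_ENV === 'production' && url.protocol !== 'https:') {
    throw new Error('webhook URLs must use https in production')
  }

  // Literal IPs and DNS names both end up as addresses we can screen.
  const host = url.hostname.replace(/^\[|\]$/g, '') // strip IPv6 brackets
  const addresses = isIP(host)
    ? [{ address: host, family: isIP(host) }]
    : await lookup(host, { all: true })

  for (const { address, family } of addresses) {
    if (blocked.check(address, family === 6 ? 'ipv6' : 'ipv4')) {
      throw new Error(`webhook host ${url.hostname} resolves to a blocked address`)
    }
  }
}
```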
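
And for signing secrets at rest, a minimal AES-256-GCM sketch with the two-key rotation scheme; the iv.ciphertext.tag storage format is an assumption.

```ts
import { createCipheriv, createDecipheriv, randomBytes } from 'node:crypto'

// SECRETS_KEY is the current 32-byte key (hex); SECRETS_KEY_PREVIOUS keeps decryption
// working while rows are re-encrypted after a rotation.
const keys = [process.env.SECRETS_KEY, process.env.SECRETS_KEY_PREVIOUS]
  .filter((k): k is string => Boolean(k))
  .map((k) => Buffer.from(k, 'hex'))

export function encryptSecret(plaintext: string): string {
  const iv = randomBytes(12) // 96-bit nonce, standard for GCM
  const cipher = createCipheriv('aes-256-gcm', keys[0], iv)
  const ciphertext = Buffer.concat([cipher.update(plaintext, 'utf8'), cipher.final()])
  return [iv, ciphertext, cipher.getAuthTag()].map((b) => b.toString('hex')).join('.')
}

export function decryptSecret(stored: string): string {
  const [iv, ciphertext, tag] = stored.split('.').map((p) => Buffer.from(p, 'hex'))
  for (const key of keys) { // try the current key, then the previous one
    try {
      const decipher = createDecipheriv('aes-256-gcm', key, iv)
      decipher.setAuthTag(tag)
      return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString('utf8')
    } catch {
      continue // auth tag mismatch: this row was encrypted with the other key
    }
  }
  throw new Error('secret cannot be decrypted with any configured key')
}
```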

What's deliberately not here

  • Multi-region. Single primary Postgres, single Redis. Adding a read replica is a small DNS swap; multi-region writes are a different product.
  • A blob store for media. Inbound attachments live in an in-process LRU cache and are served via signed proxy URLs that expire (a sketch of the signing scheme follows this list). Past v1, this becomes S3/R2.
  • A separate worker process. See above. The seams are in place; the split is deferred.
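
As referenced above, the signed proxy URLs can be sketched as an HMAC over the media id plus an expiry timestamp; the route shape, env-var name, and TTL are illustrative.

```ts
import { createHmac, timingSafeEqual } from 'node:crypto'

// Signing secret for media proxy URLs; env-var name assumed for the sketch.
const MEDIA_URL_SECRET = process.env.MEDIA_URL_SECRET ?? 'dev-only-secret'

function sign(mediaId: string, expiresAt: number): string {
  return createHmac('sha256', MEDIA_URL_SECRET).update(`${mediaId}:${expiresAt}`).digest('hex')
}

// Build a link that works without auth headers but stops working after `ttlMs`.
export function signedMediaPath(mediaId: string, ttlMs = 5 * 60_000): string {
  const expiresAt = Date.now() + ttlMs
  return `/v1/media/${mediaId}?expires=${expiresAt}&sig=${sign(mediaId, expiresAt)}`
}

// Verify on the proxy route before serving the LRU-cached bytes.
export function verifyMediaPath(mediaId: string, expiresAt: number, sig: string): boolean {
  if (Date.now() > expiresAt) return false
  const expected = Buffer.from(sign(mediaId, expiresAt), 'hex')
  const given = Buffer.from(sig, 'hex')
  return expected.length === given.length && timingSafeEqual(expected, given)
}
```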