# Architecture
WhatIsUp.dev is one Node process, one Postgres, one Redis. That's the whole runtime topology for v1, and it's a deliberate choice: every additional service is something you have to deploy, migrate, observe, and pay for.
## The three pieces
```
┌──────────────────────────────────────────────────────┐
│ Backend (Fastify)                                    │
│  • REST API – instances, messages, webhooks          │
│  • Baileys session manager (in-process sockets)      │
│  • SSE stream for QR + state events                  │
│  • BullMQ queue producer                             │
│  • Webhook delivery worker (same process for v1)     │
└────────────┬───────────────────────────┬─────────────┘
             │                           │
             ▼                           ▼
        ┌──────────┐             ┌──────────┐
        │ Postgres │             │  Redis   │
        │ (Kysely) │             │ (BullMQ) │
        └──────────┘             └──────────┘
```
- Postgres holds customers, API keys, instances, webhook endpoints, webhook deliveries (with payload bodies for replay), and an append-only audit log. Migrations run via a tiny in-house runner with an advisory lock and a `_migrations` ledger – no Prisma, no schema generators, no surprise drift.
- Redis is BullMQ's job board. Outbound webhook deliveries land there, the worker drains them with retries (exponential backoff, jitter), and the queue keeps the API path snappy – `POST /v1/messages` returns the moment the row is staged, not the moment WhatsApp acks.
- Baileys is loaded inside the Fastify process. Each connected instance owns a WebSocket to WhatsApp Web. We persist auth state to disk per instance so a restart can resume without a fresh QR scan.
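The retry schedule the worker applies between delivery attempts can be sketched as exponential backoff with full jitter. The base delay and cap below are illustrative values, not the service's actual configuration:

```typescript
// Exponential backoff with "full jitter": the ceiling doubles each
// attempt, then a uniform random draw spreads retries out so a burst
// of failed deliveries does not come back in lockstep.
// baseMs and capMs are illustrative, not the real config values.
function retryDelayMs(attempt: number, baseMs = 1_000, capMs = 15 * 60_000): number {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(Math.random() * ceiling); // uniform in [0, ceiling)
}
```

Full jitter trades predictable spacing for better spread: two deliveries that failed in the same instant will almost never retry in the same instant.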
## Process Model A vs B
We picked "Process Model A": the webhook worker runs in-process. Pros: single binary, single deploy, no cross-process visibility issues through Redis. Cons: a slow customer endpoint can tie up Node event-loop time that should be servicing API requests.
The mitigation is the per-host concurrency cap – a customer with a sluggish endpoint can only block their own deliveries, not the worker globally. When this stops being enough, splitting the worker into its own process is a one-file change because the queue, the delivery repo, and the Baileys session manager are all already isolated behind interfaces.
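A minimal sketch of such a per-host cap, assuming a simple in-memory counter keyed by destination host (the cap value and function names are illustrative, not the actual implementation):

```typescript
// Per-host concurrency gate: each destination host gets its own
// in-flight counter, so one slow endpoint only stalls its own
// deliveries. MAX_PER_HOST is an illustrative cap, not the real value.
const MAX_PER_HOST = 4;
const inFlight = new Map<string, number>();

function tryAcquire(host: string): boolean {
  const n = inFlight.get(host) ?? 0;
  if (n >= MAX_PER_HOST) return false; // over cap: leave the job queued
  inFlight.set(host, n + 1);
  return true;
}

function release(host: string): void {
  const n = (inFlight.get(host) ?? 1) - 1;
  if (n <= 0) inFlight.delete(host);
  else inFlight.set(host, n);
}
```

The worker would call `tryAcquire` before dialing and `release` in a `finally`; a failed acquire simply leaves the job on the queue for a later pass.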
## Customer / instance / API key model
```
customer
├── api_key (one or more)
├── instance
│   ├── webhook_endpoint (delivery target)
│   │   └── webhook_delivery (per-event row)
│   └── audit_event
└── audit_event
```
A customer is a billing-and-isolation unit – every row in every other table FKs back to a customer. An instance is one phone number / WhatsApp connection. An API key is scoped to a customer; it can optionally be bound to a single instance for least-privilege use cases (e.g. a marketing app that should only be able to send from one number).
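Combining the key-hashing scheme listed under trust boundaries (HMAC-SHA256 over prefix and key, with an env pepper) with instance binding, a verification sketch might look like this. The row shape and function names are assumptions, not the actual schema:

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Illustrative row shape – keys are stored only as an HMAC, so a
// database leak reveals nothing directly usable.
interface ApiKeyRow {
  customerId: string;
  instanceId: string | null; // null = customer-wide key
  keyHash: string;           // hex HMAC-SHA256 of "prefix|key"
}

function hashKey(prefix: string, key: string, pepper: string): string {
  return createHmac("sha256", pepper).update(`${prefix}|${key}`).digest("hex");
}

function keyMatches(row: ApiKeyRow, prefix: string, key: string, pepper: string): boolean {
  const candidate = Buffer.from(hashKey(prefix, key, pepper), "hex");
  const stored = Buffer.from(row.keyHash, "hex");
  // timingSafeEqual throws on length mismatch, so guard first.
  return candidate.length === stored.length && timingSafeEqual(candidate, stored);
}

// Instance-bound keys get least privilege: a request targeting
// instance X is authorized only if the key is customer-wide or bound to X.
function authorizesInstance(row: ApiKeyRow, instanceId: string): boolean {
  return row.instanceId === null || row.instanceId === instanceId;
}
```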
## Trust boundaries
| Boundary | What's enforced |
|---|---|
| API key → customer | `HMAC-SHA256(prefix\|key)` with env pepper; no plaintext stored. |
| Outbound webhook URL | Public-DNS only; loopback / link-local / RFC1918 / cloud-metadata IPs all rejected at create and delivery time (DNS-rebind defense). HTTPS-only enforced when `NODE_ENV=production`. |
| Webhook signing secret | AES-256-GCM at rest with key rotation (`SECRETS_KEY` + optional `SECRETS_KEY_PREVIOUS`). |
| Cross-customer access | Every query filters by `customerId`. No "global admin" path. |
| Audit log | Append-only `audit_events` table with `ON DELETE SET NULL` so history outlives the row it referenced. |
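The outbound-URL guard can be sketched roughly as below. This is an IPv4-only illustration of the range checks; the real guard also needs IPv6 ranges and must run against the *resolved* address at delivery time, so a DNS record cannot flip to an internal IP after the endpoint is created:

```typescript
import { isIP } from "node:net";

// Reject IPs that would let a webhook URL reach internal services:
// loopback, RFC1918 private ranges, link-local (which includes the
// 169.254.169.254 cloud-metadata address), and "this network".
// IPv4-only sketch; unknown shapes fail closed.
function isForbiddenIPv4(ip: string): boolean {
  if (isIP(ip) !== 4) return true;                  // not IPv4: fail closed
  const [a, b] = ip.split(".").map(Number);
  if (a === 127) return true;                       // loopback
  if (a === 10) return true;                        // RFC1918 10.0.0.0/8
  if (a === 172 && b >= 16 && b <= 31) return true; // RFC1918 172.16.0.0/12
  if (a === 192 && b === 168) return true;          // RFC1918 192.168.0.0/16
  if (a === 169 && b === 254) return true;          // link-local + cloud metadata
  if (a === 0) return true;                         // 0.0.0.0/8 "this network"
  return false;
}
```

Running the same predicate at create time and again on the freshly resolved address before each delivery is what closes the DNS-rebind window mentioned in the table.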
## What's deliberately not here
- Multi-region. Single primary Postgres, single Redis. Adding a read replica is a small DNS swap; multi-region writes are a different product.
- A blob store for media. Inbound attachments live in an in-process LRU cache and are served via signed proxy URLs that expire. Past v1, this becomes S3/R2.
- A separate worker process. See above. The seams are in place; the split is deferred.
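The signed, expiring proxy URLs for media can be sketched as a plain HMAC over the path plus an expiry timestamp. The parameter names (`exp`, `sig`) and the secret source here are assumptions, not the actual scheme:

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Sign path + expiry so the URL is self-authenticating and stops
// working after `expiresAtSec`. Names and layout are illustrative.
function signMediaPath(path: string, expiresAtSec: number, secret: string): string {
  const sig = createHmac("sha256", secret).update(`${path}|${expiresAtSec}`).digest("hex");
  return `${path}?exp=${expiresAtSec}&sig=${sig}`;
}

function verifyMediaUrl(
  path: string,
  expiresAtSec: number,
  sig: string,
  secret: string,
  nowSec: number,
): boolean {
  if (nowSec > expiresAtSec) return false; // expired link
  const expected = createHmac("sha256", secret).update(`${path}|${expiresAtSec}`).digest("hex");
  const a = Buffer.from(sig, "hex");
  const b = Buffer.from(expected, "hex");
  return a.length === b.length && timingSafeEqual(a, b);
}
```

Because the signature covers the expiry, a client cannot extend a link's lifetime by editing `exp`, which is what makes this shape a reasonable stand-in until media moves to S3/R2 presigned URLs.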