Architecture Decisions
The load-bearing choices behind Fabriq — what was decided, and the reasoning that makes each one hold.
Fabriq's behaviour rests on a few deliberate choices. Each is stated below with the reasoning that makes it hold; the concept and operations pages cover how they work in code.
Standalone, strictly layered module
Fabriq is its own Go module (github.com/xraph/fabriq), not a package inside a larger application. The core/ kernel, the adapters/ engine dialects, and the domain/ entity packs are separated by hard boundaries that depguard enforces in CI: core/ may not import an adapter, and adapters may not import one another. The kernel stays engine-agnostic and independently testable, and any single engine can be swapped without touching application code.
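The two boundary rules can be restated as a plain Go predicate, which makes them easy to read and to unit-test in one place. This is a sketch only. It is not depguard's configuration format, the function names are hypothetical, and adapters/postgres is an assumed adapter name used for illustration.

```go
package main

import "strings"

// module is the import-path prefix from the text; everything under it is first-party.
const module = "github.com/xraph/fabriq/"

// layer returns the top-level layer of a first-party package ("core",
// "adapters", "domain", ...) and, for adapters, the specific adapter unit
// such as "adapters/redis".
func layer(pkg string) (top, unit string) {
	rel := strings.TrimPrefix(pkg, module)
	parts := strings.Split(rel, "/")
	top, unit = parts[0], parts[0]
	if top == "adapters" && len(parts) > 1 {
		unit = "adapters/" + parts[1]
	}
	return top, unit
}

// Allowed reports whether package from may import package to under the two
// rules stated in the text. Third-party fencing (for example keeping go-redis
// inside adapters/) would be a separate rule and is not modelled here.
func Allowed(from, to string) bool {
	if !strings.HasPrefix(to, module) {
		return true
	}
	fTop, fUnit := layer(from)
	tTop, tUnit := layer(to)
	switch {
	case fTop == "core" && tTop == "adapters":
		return false // the kernel stays engine-agnostic
	case fTop == "adapters" && tTop == "adapters" && fUnit != tUnit:
		return false // adapters never import one another
	}
	return true
}
```

Subpackages of the same adapter may still import each other, since they share one adapter unit.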
Tenancy is enforced in three layers
Cross-tenant isolation is structural, not a convention reviewers must remember. Three layers, strongest first:
- Stamped transactions + RLS. Every tenant-table operation, reads included, runs in a transaction stamped with set_config('app.tenant_id', …). Row-level-security policies, enabled with FORCE so they bind the table owner too, key on that setting, so even the raw-SQL escape hatch cannot cross tenants, and an unstamped session sees zero rows. The application connects as a non-superuser role, since RLS never constrains superusers.
- Structural stamping. The command executor takes id, tenant_id, and version from context, never from the payload; channel, graph, index, and cache names are all derived from the context tenant in exactly one place.
- A backstop hook. A pre-query hook denies any pool-path query against a tenant table outright (ErrTenantHookTripped, with a metric). In this design, pool-path access to a tenant table is always a bug, so denying outright is simpler and stronger than inspecting predicates.
See Tenancy for the mechanics.
go-redis lives in adapters/redis
The fan-out plane drives Redis directly through go-redis, fenced to adapters/ by depguard. The stream contract needs capabilities beyond a generic key-value layer: approximate MAXLEN ~ trimming on XADD, XAUTOCLAIM for crashed-consumer recovery, pipelined multi-stream publishes (the event stream plus one change channel per envelope), and exclusive XRANGE reads for Last-Event-ID resume. Containing the dependency in one adapter keeps the option to swap it later.
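The publish and resume commands described above can be sketched as raw Redis argument lists, which keeps the example free of a live server. The stream name, the envelope fields, and the function names are hypothetical. With go-redis the same publishes would typically be issued inside a Pipelined call using XAddArgs with Approx set; the exclusive ( prefix on the XRANGE start requires Redis 6.2 or later.

```go
package main

import "strconv"

// eventStream is a hypothetical name for the shared event stream.
const eventStream = "fabriq:events"

// Envelope is a hypothetical shape: an ID, its per-envelope change channel, and a body.
type Envelope struct {
	ID      string
	Channel string
	Body    string
}

// PublishCommands returns the XADD argument lists a pipeline would send in
// one round trip: the event stream plus one change channel per envelope.
func PublishCommands(envs []Envelope, maxLen int64) [][]string {
	limit := strconv.FormatInt(maxLen, 10)
	cmds := make([][]string, 0, 2*len(envs))
	for _, e := range envs {
		for _, stream := range []string{eventStream, e.Channel} {
			cmds = append(cmds, []string{
				"XADD", stream,
				"MAXLEN", "~", limit, // approximate trim: Redis trims only whole internal nodes, which is cheaper
				"*", // let Redis assign the entry ID
				"id", e.ID, "body", e.Body,
			})
		}
	}
	return cmds
}

// ResumeRange returns XRANGE arguments starting strictly after the client's
// Last-Event-ID; the "(" prefix makes the start bound exclusive.
func ResumeRange(stream, lastEventID string, count int) []string {
	return []string{"XRANGE", stream, "(" + lastEventID, "+", "COUNT", strconv.Itoa(count)}
}
```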
Singleton runners use advisory-lock leadership
The outbox relay and the reconciler must have exactly one active instance across worker replicas. Each campaigns for a session-level pg_advisory_lock on a dedicated Postgres connection; Postgres lock semantics make split-brain impossible. Because the lock dies with the session, a liveness watchdog abdicates the instant the connection is gone, and a replica wins the next campaign. Lock keys are fixed and never reused: 1001 for the relay, 1002 for the reconciler.
The relay is woken by LISTEN/NOTIFY, but polling is the contract
The relay drains the outbox with FOR UPDATE SKIP LOCKED. A pg_notify issued inside the command transaction wakes it, and because Postgres delivers a notification only when the sending transaction commits, the relay can never be woken before the data the notification refers to is committed. The notification is purely a latency optimization; the interval poll is the correctness mechanism. A broken listener degrades latency, never delivery.
No RLS on the telemetry hypertable
TimescaleDB's columnstore refuses tables with row security, and compressed telemetry is the entire reason Timescale is in the stack — so the tag_readings hypertable keeps compression and drops RLS. It is protected by compensating controls instead: it is reachable only through the time-series port (which stamps tenant_id structurally and validates the series name), it is not a registry entity (so the generic relational port cannot name it), and the raw-SQL guard rejects any query touching it without a literal tenant_id. Cross-tenant isolation here is integration-tested.
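A deliberately naive sketch of the raw-SQL guard, assuming "a literal tenant_id" means an equality predicate on tenant_id appearing in the query text. GuardRawSQL and its error are hypothetical, and regular expressions stand in for whatever parsing the real guard does; this version would, for example, be fooled by a tenant_id predicate inside a SQL comment.

```go
package main

import (
	"errors"
	"regexp"
)

// ErrTelemetryNeedsTenant is a hypothetical error for an unscoped telemetry query.
var ErrTelemetryNeedsTenant = errors.New("raw SQL touching tag_readings must filter on tenant_id")

var (
	touchesTelemetry = regexp.MustCompile(`(?i)\btag_readings\b`)
	tenantPredicate  = regexp.MustCompile(`(?i)\btenant_id\s*=`)
)

// GuardRawSQL rejects any raw query that names the hypertable without a
// tenant_id equality predicate; queries that do not touch it pass through.
func GuardRawSQL(query string) error {
	if touchesTelemetry.MatchString(query) && !tenantPredicate.MatchString(query) {
		return ErrTelemetryNeedsTenant
	}
	return nil
}
```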
Horizontal scale is sharding by tenant — routing, not consensus
The tenant→shard routing layer, multi-shard Open, and the per-shard outbox relay are implemented. A catalog-backed placement directory (with controlled tenant moves), online rebalancing, and a shard-aware migrate CLI are not yet implemented; see Sharding.
When the source of truth needs to scale past a single Postgres, Fabriq shards by tenant: a tenant's entire history, event log, and outbox live on exactly one shard. Two existing invariants turn this into a routing problem rather than a distributed-systems one. Every read carries a tenant and there are no cross-tenant queries, so there is no scatter-gather. A command never spans tenants, so one tenant routes to one shard's local ACID transaction — no two-phase commit, no sagas. Sharding enters as a routing adapter behind the existing ports plus a per-shard leader loop; core/, the facade, and every call site stay unchanged. And because projections are rebuildable from Postgres, moving a tenant means copying only its rows and log — projections are rebuilt on the far side.
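A sketch of the routing step, assuming a shard count plus an optional map of pinned placements. The FNV hash-mod fallback is a stand-in chosen for illustration: the text does not say how the current routing layer places tenants, and the catalog-backed directory is not yet built. Router and ErrNoTenant are hypothetical names.

```go
package main

import (
	"errors"
	"hash/fnv"
)

// ErrNoTenant is returned when a call carries no tenant; without one there is nothing to route on.
var ErrNoTenant = errors.New("no tenant: refusing to route")

// Router maps each tenant to exactly one shard.
type Router struct {
	Shards int            // number of shards
	Pinned map[string]int // explicit placements, e.g. after a tenant move (hypothetical)
}

// ShardFor returns the single shard that owns tenantID. Every command for
// that tenant then runs as one local transaction on that shard.
func (r Router) ShardFor(tenantID string) (int, error) {
	if tenantID == "" {
		return 0, ErrNoTenant
	}
	if i, ok := r.Pinned[tenantID]; ok {
		return i, nil
	}
	if r.Shards <= 0 {
		return 0, errors.New("router has no shards")
	}
	h := fnv.New32a()
	h.Write([]byte(tenantID)) // hash.Hash writes never return an error
	return int(h.Sum32() % uint32(r.Shards)), nil
}
```

Plain hash-mod placement moves most tenants whenever the shard count changes, which is the kind of churn an explicit placement directory with controlled moves avoids.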