Senior SWE · Domain Knowledge

Payroll, GDPR & Multi-tenancy

The domain depth that separates senior candidates from generalists. Payroll-correctness is unfamiliar to most engineers. Show you understand why these patterns exist, not just that they exist.

Payroll Correctness

Payroll is the highest-stakes Personio domain. Errors here = wrong paycheck = real human pain + legal exposure. Personio engineers think about correctness, idempotency, and replayability constantly.

The 8 commandments

  1. Never use float for money. Postgres NUMERIC, Java BigDecimal, Python Decimal. Construct from strings.
  2. Idempotency keys on every write. Same input twice = same output once.
  3. Snapshot inputs at run start. Late data → new corrective run, not in-flight mutation.
  4. Immutable results. Corrections create new rows referencing old ones. No UPDATE.
  5. Versioned calculation rules. Store tax brackets with valid_from / valid_to. Pick by period.
  6. Replay-safe. Event log + snapshots → reproduce any run later, byte-for-byte.
  7. Banker's rounding (ROUND_HALF_EVEN) for tax. Quantize at the LAST step.
  8. Currency captured per result. FX rate stamped at the moment of payroll close.

Why float is dangerous — the senior signal

Float vs Decimal in payroll context
# Float — what JS / naïve Python does
>>> 0.1 + 0.2
0.30000000000000004

>>> 4825.50 * 0.215     # tax on salary
1037.4824999999998           # actually € 1037.4825

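>>> total = 0.0
>>> for _ in range(100000): total += 0.1
...
>>> total == 10000.0
False                         # naive accumulation drifts away from 10000

# Note: since Python 3.12 the built-in sum() compensates for float error,
# but plain accumulation (and most other languages) still drifts.

# A float result can land on the wrong side of a rounding boundary,
# so a payslip line ends up off by a cent. Across 100k payslips a month,
# those cents add up and reconciliation against the ledger fails.
# Personio engineers obsess over this.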

# Decimal — the right way
from decimal import Decimal, ROUND_HALF_EVEN

salary  = Decimal('4825.50')
tax_pct = Decimal('0.215')
tax     = (salary * tax_pct).quantize(Decimal('0.01'), ROUND_HALF_EVEN)
# → Decimal('1037.48')   exact, deterministic, auditable
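
Commandment 7's "quantize at the LAST step" is easier to see with numbers. The sketch below uses made-up component amounts and a hypothetical helper `round_cents`; it compares rounding each component first with rounding the total once.

```python
from decimal import Decimal, ROUND_HALF_EVEN

CENT = Decimal("0.01")

def round_cents(amount: Decimal) -> Decimal:
    """Round to whole cents using banker's rounding."""
    return amount.quantize(CENT, rounding=ROUND_HALF_EVEN)

# Hypothetical per-component amounts, each ending in exactly half a cent.
components = [Decimal("0.125"), Decimal("0.125"), Decimal("0.125")]

# Rounding every component first drops half a cent each time.
rounded_early = sum((round_cents(c) for c in components), Decimal("0"))

# Summing at full precision and rounding once keeps the total honest.
rounded_late = round_cents(sum(components, Decimal("0")))

print(rounded_early)  # 0.36
print(rounded_late)   # 0.38
```

The two-cent gap comes purely from where the rounding happens, which is why intermediate values should stay at full precision.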

Idempotency demo

In a payroll run the same task can fire multiple times (retries, network glitches). The idempotency key ensures the payment executes once and every repeat is deduplicated, which prevents double-paying.
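
A minimal in-memory sketch of that behaviour, assuming the key is (run_id, employee_id). The class `PayoutLedger` and the amount are illustrative; a real system would more likely enforce the key with a UNIQUE constraint in the database.

```python
from decimal import Decimal
from typing import Dict, Tuple

class PayoutLedger:
    """In-memory stand-in for a table with a unique idempotency key."""

    def __init__(self) -> None:
        self._results: Dict[Tuple[str, str], Decimal] = {}
        self.attempts = 0
        self.executed = 0
        self.deduped = 0

    def pay(self, run_id: str, employee_id: str, amount: Decimal) -> Decimal:
        """Execute a payout at most once per (run_id, employee_id)."""
        self.attempts += 1
        key = (run_id, employee_id)
        if key in self._results:
            # Retry of a task that already succeeded: return the stored result.
            self.deduped += 1
            return self._results[key]
        self._results[key] = amount
        self.executed += 1
        return amount

    @property
    def total_paid(self) -> Decimal:
        return sum(self._results.values(), Decimal("0"))

ledger = PayoutLedger()
for _ in range(3):  # the same task fired three times
    ledger.pay("r-2026-04-acme", "emp-42", Decimal("3788.02"))

print(ledger.attempts, ledger.executed, ledger.deduped, ledger.total_paid)
# 3 1 2 3788.02
```

In Postgres the same effect is usually achieved with a unique index on the key plus INSERT ... ON CONFLICT DO NOTHING, so the check and the write are one atomic step.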

Versioned calculation rules

Tax brackets change. A payroll for January 2026 must use January 2026's rules, even if you re-run it in 2028.

CREATE TABLE tax_rules (
  country     CHAR(2) NOT NULL,
  valid_from  DATE NOT NULL,
  valid_to    DATE,                       -- NULL = currently active
  brackets    JSONB NOT NULL,           -- [{"up_to": "12000", "rate": "0.15"}, ...] (strings, so parsing never goes through float)
  PRIMARY KEY (country, valid_from)
);

-- Pick rule for a payroll period
SELECT brackets
FROM tax_rules
WHERE country = 'DE'
  AND valid_from <= '2026-01-31'
  AND (valid_to IS NULL OR valid_to >= '2026-01-01')
ORDER BY valid_from DESC LIMIT 1;
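
The same selection can be mirrored in application code. The sketch below is an assumption-heavy illustration: `TaxRule`, `pick_rule`, and `progressive_tax` are hypothetical names, the rates are invented, bracket values are assumed to be stored as strings, the brackets are read as marginal (each slice taxed at its own rate), and the last bracket is assumed to have `up_to` set to None.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal, ROUND_HALF_EVEN
from typing import Optional

@dataclass(frozen=True)
class TaxRule:
    country: str
    valid_from: date
    valid_to: Optional[date]   # None = currently active
    brackets: list             # [{"up_to": "12000", "rate": "0.15"}, ...]

def pick_rule(rules, country, period_start, period_end):
    """Mirror the SQL: newest rule that overlaps the payroll period."""
    candidates = [
        r for r in rules
        if r.country == country
        and r.valid_from <= period_end
        and (r.valid_to is None or r.valid_to >= period_start)
    ]
    if not candidates:
        raise LookupError(f"no tax rule for {country} in {period_start}..{period_end}")
    return max(candidates, key=lambda r: r.valid_from)

def progressive_tax(income: Decimal, brackets) -> Decimal:
    """Tax each slice of income at its bracket rate; round once at the end."""
    tax, lower = Decimal("0"), Decimal("0")
    for b in brackets:
        upper = income if b["up_to"] is None else min(income, Decimal(b["up_to"]))
        if upper > lower:
            tax += (upper - lower) * Decimal(b["rate"])
            lower = upper
    return tax.quantize(Decimal("0.01"), rounding=ROUND_HALF_EVEN)

# Invented rules: the first bracket's rate changes on 2026-01-01.
rules = [
    TaxRule("DE", date(2025, 1, 1), date(2025, 12, 31),
            [{"up_to": "12000", "rate": "0.15"}, {"up_to": None, "rate": "0.30"}]),
    TaxRule("DE", date(2026, 1, 1), None,
            [{"up_to": "12000", "rate": "0.14"}, {"up_to": None, "rate": "0.30"}]),
]

rule = pick_rule(rules, "DE", date(2026, 1, 1), date(2026, 1, 31))
tax = progressive_tax(Decimal("20000"), rule.brackets)
print(tax)  # 4080.00
```

Because the rule is chosen by the payroll period rather than by today's date, re-running this for January 2026 at any later time selects the same brackets.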

Snapshot pattern for in-flight runs

Start payroll run
        │
        ▼
atomically copy current state into a frozen snapshot:
┌──────────────────────────────────────────────────┐
│ run_id: 'r-2026-04-acme'                         │
│ started_at: 2026-05-01 02:00 UTC                 │
│ snapshot {                                       │
│   employees: [...as of 2026-04-30 23:59 UTC...]  │
│   rules_version: 'DE-2026-Q2'                    │
│   fx_rates: { EUR_USD: 1.087, ... }              │
│ }                                                │
└──────────────────────────────────────────────────┘
        │
        ▼
workers run against snapshot
        │
        ▼
late edits to live employees?
  → NEW corrective run (referencing this run as parent),
    not a mutation
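
One way to sketch the snapshot step in code, assuming everything fits in memory. `Snapshot`, `PayrollRun`, `start_run`, and the corrective run id are hypothetical; a production system would more likely copy rows into snapshot tables inside one transaction.

```python
import copy
from dataclasses import dataclass
from datetime import datetime, timezone
from decimal import Decimal
from types import MappingProxyType
from typing import Mapping, Optional, Tuple

@dataclass(frozen=True)
class Snapshot:
    employees: Tuple[Mapping, ...]   # read-only copies of employee records
    rules_version: str
    fx_rates: Mapping[str, Decimal]

@dataclass(frozen=True)
class PayrollRun:
    run_id: str
    started_at: datetime
    snapshot: Snapshot
    parent_run_id: Optional[str] = None   # set on corrective runs

def start_run(run_id, live_employees, rules_version, fx_rates, parent_run_id=None):
    """Freeze the inputs so later edits to live data cannot reach this run."""
    frozen = tuple(MappingProxyType(copy.deepcopy(e)) for e in live_employees)
    snap = Snapshot(frozen, rules_version, MappingProxyType(dict(fx_rates)))
    return PayrollRun(run_id, datetime.now(timezone.utc), snap, parent_run_id)

live = [{"id": "emp-42", "salary": Decimal("4825.50")}]
run = start_run("r-2026-04-acme", live, "DE-2026-Q2", {"EUR_USD": Decimal("1.087")})

live[0]["salary"] = Decimal("5000.00")        # late edit to live data
print(run.snapshot.employees[0]["salary"])    # 4825.50, the run is unaffected

# The late edit is handled by a new run that points at the original.
fix = start_run("r-2026-04-acme-c1", live, "DE-2026-Q2",
                run.snapshot.fx_rates, parent_run_id=run.run_id)
```

The deep copy plus read-only mapping is what keeps workers from seeing, or accidentally writing, anything newer than the snapshot.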

What to say in the interview

The senior-bar answer: "I'd build the payroll engine around three invariants: (1) all monetary math in Decimal, never float; (2) idempotent on (run_id, employee_id); (3) immutable results — corrections are new rows. Inputs are snapshotted at run start so late data doesn't corrupt the in-flight calc. Rules are versioned so re-running 2026's January payroll in 2028 produces identical output."

Common pitfalls to avoid

GDPR Patterns

GDPR is not optional for Personio — it's foundational. Every senior engineer there must have an opinion on right-to-erasure, data residency, and audit logs.

The right-to-erasure decision tree

Approach | What happens | Reversible? | GDPR-compliant? | Use for
--- | --- | --- | --- | ---
Soft delete | deleted_at timestamp set. Row still exists. | Yes | ❌ alone | Internal "undo." Useful but not sufficient for GDPR erase.
Anonymize | PII fields replaced with deterministic hash or "ERASED USER." Non-PII (salary history, dates) preserved. | No | ✅ (if irreversible) | Default for HR. Preserves referential integrity in payroll/audit history.
Hard delete | Row physically removed. | No | ✅ | Free-text comments, draft data. Not for transactional records where FK matters.

The actual erasure pipeline

User requests erasure (verified identity)
        │
        ▼
ErasureRequest row created
  (request_id, subject_id, requested_at, status='pending')
        │
        ▼
Orchestrator iterates a CATALOG of all subject-data locations:
  - employees, time_off_requests, payslips, performance_reviews,
    documents (S3), audit_log, event_log, search_index, caches, backups
        │
        ▼
Per location: apply per-table erasure policy
  ├─ employees:     anonymize PII fields
  ├─ time_off:      keep (no PII besides employee_id, which is now anonymous)
  ├─ payslips:      keep (financial record, has legal retention)
  ├─ documents/S3:  hard delete PDFs older than legal retention; anonymize newer
  ├─ audit_log:     KEEP (Art. 17(3) exempts records needed for legal claims)
  ├─ search_index:  re-index with anonymized fields
  ├─ caches:        invalidate
  └─ backups:       schedule erasure for next rotation cycle
        │
        ▼
Mark ErasureRequest.status = 'completed', emit AuditLog entry
  (the audit log records the erasure but cannot be erased itself)
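
The catalog-driven orchestration above could be sketched roughly as follows. The `Policy` enum, `CATALOG`, and `run_erasure` are hypothetical names, and the handlers are stubs that only record what they were asked to do. documents/S3 is left out because its policy depends on document age, and performance_reviews and event_log are left out because no policy is stated for them.

```python
from enum import Enum
from typing import Callable, Dict

class Policy(Enum):
    ANONYMIZE = "anonymize"
    KEEP = "keep"
    REINDEX = "reindex"
    INVALIDATE = "invalidate"
    SCHEDULE = "schedule"

# Subject-data locations and their policies, taken from the pipeline above.
CATALOG: Dict[str, Policy] = {
    "employees": Policy.ANONYMIZE,
    "time_off_requests": Policy.KEEP,
    "payslips": Policy.KEEP,
    "audit_log": Policy.KEEP,
    "search_index": Policy.REINDEX,
    "caches": Policy.INVALIDATE,
    "backups": Policy.SCHEDULE,
}

def run_erasure(request: dict, handlers: Dict[Policy, Callable[[str, str], None]]) -> dict:
    """Apply each location's policy, then mark the request completed."""
    applied = {}
    for location, policy in CATALOG.items():
        if policy is not Policy.KEEP:
            handlers[policy](location, request["subject_id"])
        applied[location] = policy.value
    # Only reached if every handler succeeded; a failure leaves status 'pending'.
    request["status"] = "completed"
    return applied

# Stub handlers that just record the call, for illustration.
calls = []
handlers = {p: (lambda loc, sid, p=p: calls.append((p.value, loc, sid))) for p in Policy}

request = {"request_id": "req-9", "subject_id": "emp-42", "status": "pending"}
summary = run_erasure(request, handlers)
print(request["status"], summary["employees"], summary["audit_log"])
# completed anonymize keep
```

Keeping the catalog as data rather than scattered code makes it reviewable: a new table that holds subject data is a one-line change that a privacy review can see.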

Why anonymize beats hard delete in HR

If you hard-delete an employee, you break:

Anonymization preserves the integrity of the record while removing the personal data:

-- Before
employees: { id: emp-42, name: 'Anna Schmidt', email: 'anna@acme.com',
             salary: 4825.50, manager_id: emp-12 }

-- After anonymize
employees: { id: emp-42, name: 'ERASED USER', email: '<erased>',
             salary: 4825.50, manager_id: emp-12,
             erased_at: '2026-05-15', erased_request_id: 'req-9' }

-- Payroll history STILL VALID — emp-42 still exists as a record.
-- Auditor can verify "salary 4825.50 was paid to emp-42 in April 2026"
-- without knowing who that person was.
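
The before/after records above correspond to a field-level transformation along these lines. This is a sketch: `PII_PLACEHOLDERS` and `anonymize_employee` are illustrative names, and the set of PII fields is assumed to be just name and email.

```python
from datetime import date
from decimal import Decimal

# Assumed PII fields and their replacement values.
PII_PLACEHOLDERS = {"name": "ERASED USER", "email": "<erased>"}

def anonymize_employee(record: dict, request_id: str, erased_on: date) -> dict:
    """Return a copy with PII replaced and every other field untouched."""
    out = dict(record)
    for field, placeholder in PII_PLACEHOLDERS.items():
        if field in out:
            out[field] = placeholder
    out["erased_at"] = erased_on.isoformat()
    out["erased_request_id"] = request_id
    return out

before = {"id": "emp-42", "name": "Anna Schmidt", "email": "anna@acme.com",
          "salary": Decimal("4825.50"), "manager_id": "emp-12"}
after = anonymize_employee(before, "req-9", date(2026, 5, 15))

print(after["id"], after["name"], after["salary"])
# emp-42 ERASED USER 4825.50
```

The id, salary, and manager_id survive, so foreign keys from payslips and the audit log still resolve.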

Data residency strategies

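EU customers typically require their data to be stored in the EU: GDPR restricts transfers outside the EU/EEA, and many contracts mandate in-region storage. Three approaches: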

Strategy | How | Cost
--- | --- | ---
Region-specific deployments | Whole stack duplicated per region. Customer routed to their region's stack. | High, but bulletproof.
Sharded with region flag | Single stack; shards live in different regions. Routing layer reads data_region from tenant config. | Medium; complex but cheaper.
Tenant pinning | Tenant assigned to region at signup. Reads/writes for that tenant stay in-region. | Medium; most common.
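
Tenant pinning comes down to a lookup that runs before any query. A minimal sketch, assuming a `data_region` field on the tenant; the region names, hostnames, and `dsn_for` are placeholders, not real infrastructure.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Tenant:
    tenant_id: str
    data_region: str   # assigned once at signup

# Hypothetical region -> database map; hostnames are placeholders.
REGION_DSN = {
    "eu-central": "postgresql://db.eu-central.example.internal/app",
    "us-east": "postgresql://db.us-east.example.internal/app",
}

def dsn_for(tenant: Tenant) -> str:
    """Resolve the in-region database; fail closed on unknown regions."""
    try:
        return REGION_DSN[tenant.data_region]
    except KeyError:
        raise RuntimeError(
            f"no database configured for region {tenant.data_region!r}"
        ) from None

acme = Tenant("tenant-acme", "eu-central")
print(dsn_for(acme))
# postgresql://db.eu-central.example.internal/app
```

Failing closed matters here: falling back to a default region on a config error would silently move data across a border.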

The audit-log exemption (Art. 17(3))

GDPR allows you to retain personal data when it is needed for legal claims, regulatory compliance, or to defend rights. Audit logs typically qualify. But you must have a documented legal basis, limit access, and not use the data for marketing or other purposes. Personio's audit log includes the erasure event itself, recording a hash of the erased data as proof of compliance.

SLAs and timelines

Multi-tenancy Patterns

Personio runs 10k+ tenants on shared infrastructure. Multi-tenancy is in the DNA of every design decision.

The three models

Model | Schema | Isolation | Cost | Migration
--- | --- | --- | --- | ---
Pool | Shared, tenant_id col on every table | Row-level (Postgres RLS) | Low | Single migration, all tenants
Bridge | Schema per tenant in shared DB | Schema-level | Medium | Migration per schema (10k schemas = pain)
Silo | DB per tenant | Strongest | High | Per-DB migration; backup/restore per tenant is simple

Personio default: Pool with Postgres RLS as a safety net. Move large/regulated tenants to silo as needed.

Pool model essentials

-- Every table has tenant_id as part of the primary key
CREATE TABLE employees (
  tenant_id    UUID NOT NULL,
  employee_id  UUID NOT NULL,
  ...
  PRIMARY KEY (tenant_id, employee_id)
);

-- Postgres Row-Level Security as a safety net
ALTER TABLE employees ENABLE ROW LEVEL SECURITY;

CREATE POLICY tenant_isolation ON employees
  USING (tenant_id = current_setting('app.current_tenant')::uuid);

-- Application sets the tenant context per request.
-- SET LOCAL only applies inside a transaction and resets at COMMIT/ROLLBACK.
BEGIN;
SET LOCAL app.current_tenant = 'tenant-acme-uuid';  -- placeholder for a real UUID
SELECT * FROM employees;  -- only Acme's rows
COMMIT;

RLS is defense-in-depth, not the primary defense. The application still filters by tenant_id explicitly; RLS catches bugs where a developer forgot to filter. Belt + suspenders.
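
On the application side, setting the tenant context might look like the sketch below. It assumes a DB-API style connection with autocommit off (psycopg2 behaves this way by default); `tenant_context` is a hypothetical helper. It uses Postgres's `set_config(name, value, is_local)` because a plain SET statement cannot take a bound parameter.

```python
from contextlib import contextmanager
from uuid import UUID

@contextmanager
def tenant_context(conn, tenant_id: UUID):
    """Run a block of queries in one transaction scoped to one tenant."""
    cur = conn.cursor()
    try:
        # is_local=true scopes the setting to this transaction, like SET LOCAL.
        cur.execute(
            "SELECT set_config('app.current_tenant', %s, true)",
            (str(tenant_id),),
        )
        yield cur
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    finally:
        cur.close()

# Usage, given a real connection `conn`:
#
#     with tenant_context(conn, tenant_uuid) as cur:
#         cur.execute("SELECT * FROM employees WHERE tenant_id = %s", (str(tenant_uuid),))
```

Note that Postgres does not apply RLS policies to superusers or, by default, to the table owner; the application should connect as a separate non-owner role, or the table needs FORCE ROW LEVEL SECURITY.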

The "tenant_id everywhere" discipline

Noisy neighbor mitigation

One slow tenant can exhaust resources for everyone. Strategies:

Per-tenant rate limits

Token bucket per tenant in Redis. Slow callers get 429, don't drag others down.

Connection pool quotas

Per-tenant max DB connections via PgBouncer pools or app-level semaphores.

Query timeouts

Postgres statement_timeout set per session or role. A runaway query is cancelled at the limit (e.g. 30s) instead of holding a pooled connection indefinitely.

Dedicated worker pools

Large tenants get their own worker queue partition. Small tenants share.

Tenant move to silo

When a tenant exceeds a threshold (size, RPS, $$), migrate them to a dedicated DB.

Async heavy work

Anything over 100ms moves to a queue. Sync requests stay fast.
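
Of the strategies above, the per-tenant token bucket is the easiest to show in a few lines. The sketch below keeps buckets in a process-local dict purely for illustration; the text's version lives in Redis so that all API instances share state. `TenantRateLimiter` and the tenant ids are hypothetical.

```python
import time
from typing import Callable, Dict, Tuple

class TenantRateLimiter:
    """One token bucket per tenant: `capacity` burst, `refill_per_sec` sustained."""

    def __init__(self, capacity: float, refill_per_sec: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.clock = clock
        self._buckets: Dict[str, Tuple[float, float]] = {}  # tenant -> (tokens, last_seen)

    def allow(self, tenant_id: str) -> bool:
        """Return True if the request may proceed, False if it should get a 429."""
        now = self.clock()
        tokens, last = self._buckets.get(tenant_id, (self.capacity, now))
        # Refill for the time elapsed since the last request, capped at capacity.
        tokens = min(self.capacity, tokens + (now - last) * self.refill_per_sec)
        allowed = tokens >= 1
        if allowed:
            tokens -= 1
        self._buckets[tenant_id] = (tokens, now)
        return allowed

# A fake clock makes the behaviour deterministic for the demo.
now = [0.0]
limiter = TenantRateLimiter(capacity=2, refill_per_sec=1, clock=lambda: now[0])

burst = [limiter.allow("tenant-noisy") for _ in range(3)]
quiet = limiter.allow("tenant-quiet")      # unaffected by the noisy tenant
now[0] = 1.0
after_wait = limiter.allow("tenant-noisy")  # one token has refilled

print(burst, quiet, after_wait)
# [True, True, False] True True
```

Because each tenant has its own bucket, the noisy tenant's third request is rejected while the quiet tenant's first request goes through.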

What to say in the interview

The senior-bar answer: "For Personio's scale (10k tenants, mostly small), Pool with row-level security is the right default. We tag every table with tenant_id in the PK, set the current tenant in Postgres session context per request, and rely on RLS as a defense-in-depth check. Large or regulated tenants can be migrated to a Silo model individually. Caches, queues, and search indexes all carry tenant_id."

Audit Log Design

Personio is a system of record. Auditors will query the audit log years later. Get the design right once.

Schema

CREATE TABLE audit_log (
  id            UUID DEFAULT gen_random_uuid(),
  tenant_id     UUID NOT NULL,
  ts            TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  actor_id      UUID,            -- who did it (NULL for system)
  actor_type    VARCHAR(16),    -- user / system / integration
  resource_type VARCHAR(32) NOT NULL,  -- employee, payslip, timeoff…
  resource_id   UUID NOT NULL,
  action        VARCHAR(32) NOT NULL,  -- create, update, delete, approve…
  before_state  JSONB,
  after_state   JSONB,
  metadata      JSONB,           -- ip, user-agent, request_id
  prev_hash     BYTEA,           -- hash chain for tamper evidence
  hash          BYTEA NOT NULL,
  PRIMARY KEY (tenant_id, ts, id)
) PARTITION BY RANGE (ts);

-- Monthly partitions for fast retention management
CREATE TABLE audit_log_2026_05 PARTITION OF audit_log
  FOR VALUES FROM ('2026-05-01') TO ('2026-06-01');

-- Query: all changes to one employee, newest first
CREATE INDEX ON audit_log (tenant_id, resource_type, resource_id, ts DESC);

Tamper evidence via hash chain

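Each row's hash = sha256(prev_hash + payload). If anyone alters a past row, every subsequent hash stops matching. Auditors can verify the chain without trusting your DB, provided the latest hash is periodically anchored somewhere the DB operator cannot rewrite.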

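import hashlib
import json

def append_audit(db, tenant_id, payload):
    # Assumes `db` is a psycopg 3 connection; adapt for other drivers.
    # Appends for one tenant must be serialized (single consumer or a lock),
    # otherwise two writers can chain off the same previous row.
    with db.transaction():
        prev = db.execute(
            "SELECT hash FROM audit_log WHERE tenant_id=%s ORDER BY ts DESC LIMIT 1",
            (tenant_id,)
        ).fetchone()
        prev_hash = bytes(prev[0]) if prev else bytes(32)  # genesis: 32 zero bytes
        body = json.dumps(payload, sort_keys=True).encode()  # stable key order
        h = hashlib.sha256(prev_hash + body)
        db.execute("INSERT INTO audit_log (..., prev_hash, hash) VALUES (...)",
                   (..., prev_hash, h.digest()))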
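
The verification side, which is what an auditor would actually run, can be sketched without a database. The helper names `row_hash` and `verify_chain` are hypothetical, and rows are modelled as (payload, prev_hash, hash) tuples in insertion order.

```python
import hashlib
import json

GENESIS = bytes(32)  # same genesis value as the append path: 32 zero bytes

def row_hash(prev_hash: bytes, payload: dict) -> bytes:
    """sha256(prev_hash + canonical JSON payload)."""
    body = json.dumps(payload, sort_keys=True).encode()
    return hashlib.sha256(prev_hash + body).digest()

def verify_chain(rows) -> int:
    """Return the index of the first broken row, or -1 if the chain is intact."""
    expected_prev = GENESIS
    for i, (payload, prev_hash, stored) in enumerate(rows):
        if prev_hash != expected_prev or row_hash(prev_hash, payload) != stored:
            return i
        expected_prev = stored
    return -1

# Build a three-row chain.
rows, prev = [], GENESIS
for payload in ({"action": "create"}, {"action": "update"}, {"action": "approve"}):
    h = row_hash(prev, payload)
    rows.append((payload, prev, h))
    prev = h

intact = verify_chain(rows)

# Tamper with a past payload while leaving the stored hashes alone.
rows[1] = ({"action": "delete"}, rows[1][1], rows[1][2])
broken_at = verify_chain(rows)

print(intact, broken_at)
# -1 1
```

An attacker who can rewrite every later hash as well would still pass this check, which is why the newest hash needs to be recorded somewhere outside the database.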

Write path — never block user writes

User action → API
        │
        ▼
COMMIT transaction (user data)
        │
        ▼
ASYNCHRONOUSLY emit audit event to Kafka
        │
        ▼
Audit Consumer → INSERT into audit_log partition
        │
        ▼
Periodically: archive older partitions to S3 Parquet,
detach from hot DB, mount cold via Athena.

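Why async? Audit insert latency shouldn't impact user request latency. If the audit-log DB is down, user writes continue; the audit consumer catches up later from Kafka, as long as the outage is shorter than the topic's retention. One caveat: if the process dies between COMMIT and the emit, the event is lost, so a system of record usually writes the event to an outbox table in the same transaction and relays it to Kafka from there.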

Retention by category

Category | Retention | Why
--- | --- | ---
Payroll-adjacent | 7–10 years (DE: 10y) | Statutory: tax authority audit
HR records | 3–7 years post-employment | Labor law statute of limitations
Access logs (logins) | ~1 year | Security investigation window
System events (deploys, etc.) | 90 days hot, 1 year cold | Operational forensics

What to query for

Common mistakes to avoid

Reading this in the interview

One-liner: "Audit log is append-only, partitioned by month, hash-chained for tamper evidence, written async via Kafka so it doesn't block user requests. Retention by category — payroll-adjacent records get 10 years, access logs 1 year."