Senior SWE · Domain Knowledge

Payroll, GDPR & Multi-tenancy

The domain depth that separates senior candidates from generalists. Payroll correctness is unfamiliar territory for most engineers. Show you understand why these patterns exist, not just that they exist.

Payroll Correctness

Payroll is the highest-stakes Personio domain. Errors here = wrong paycheck = real human pain + legal exposure. Personio engineers think about correctness, idempotency, and replayability constantly.

The 8 commandments

Never use float for money. Postgres NUMERIC, Java BigDecimal, Python Decimal. Construct from strings.
Idempotency keys on every write. Same request twice = effect applied once.
Snapshot inputs at run start. Late data → new corrective run, not in-flight mutation.
Immutable results. Corrections create new rows referencing old ones. No UPDATE.
Versioned calculation rules. Store tax brackets with valid_from / valid_to. Pick by period.
Replay-safe. Event log + snapshots → reproduce any run later, byte-for-byte.
Banker's rounding (ROUND_HALF_EVEN) for tax. Quantize at the LAST step.
Currency captured per result. FX rate stamped at the moment of payroll close.
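A minimal sketch of commandment 4: result rows are frozen, and a correction is a new row that points at the row it corrects instead of an UPDATE. `PayslipResult`, `ResultLedger` and the `corrects` field are hypothetical names, and an in-memory list stands in for an append-only table.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass(frozen=True)
class PayslipResult:
    """One result row; frozen=True makes attribute assignment raise an error."""
    result_id: str
    run_id: str
    employee_id: str
    net_pay: Decimal
    corrects: Optional[str] = None  # result_id of the row this one corrects (hypothetical field)


class ResultLedger:
    """Append-only: rows are added, never updated or deleted."""

    def __init__(self) -> None:
        self._rows: list = []

    def append(self, row: PayslipResult) -> None:
        ids = {r.result_id for r in self._rows}
        if row.result_id in ids:
            raise ValueError(f"duplicate result_id {row.result_id}")
        if row.corrects is not None:
            if row.corrects not in ids:
                raise ValueError(f"unknown result {row.corrects}")
            if any(r.corrects == row.corrects for r in self._rows):
                raise ValueError(f"{row.corrects} is already corrected; correct the newest row")
        self._rows.append(row)

    def effective(self, result_id: str) -> PayslipResult:
        """Follow the correction chain forward; the original row stays for the audit trail."""
        by_id = {r.result_id: r for r in self._rows}
        corrected_by = {r.corrects: r for r in self._rows if r.corrects is not None}
        current = by_id[result_id]
        while current.result_id in corrected_by:
            current = corrected_by[current.result_id]
        return current

    def rows(self) -> tuple:
        return tuple(self._rows)


ledger = ResultLedger()
ledger.append(PayslipResult("res-1", "r-2026-04-acme", "emp-42", Decimal("3100.00")))
# A later corrective run supersedes res-1 without touching it
ledger.append(PayslipResult("res-2", "r-2026-04-acme-fix1", "emp-42", Decimal("3150.00"), corrects="res-1"))
print(ledger.effective("res-1").net_pay)  # 3150.00
```

Both rows remain queryable, so "what did we originally pay, and what did we correct it to?" is always answerable.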

Why float is dangerous — the senior signal

Float vs Decimal in payroll context

# Float — what JS / naïve Python does
>>> 0.1 + 0.2
0.30000000000000004

>>> round(2.675, 2)            # round € 2.675 to cents
2.67                           # expected 2.68: 2.675 is stored as 2.67499999...

>>> total = 0.0
>>> for _ in range(10):        # add 10 cents, ten times
...     total += 0.1
...
>>> total
0.9999999999999999             # NOT 1.0

# Repeated across 100k payslips and every pay period, these errors silently
# shift cents, and payroll totals stop reconciling with what was actually paid.
# Personio engineers obsess over this.

# Decimal — the right way
from decimal import Decimal, ROUND_HALF_EVEN

salary  = Decimal('4825.50')
tax_pct = Decimal('0.215')
tax     = (salary * tax_pct).quantize(Decimal('0.01'), ROUND_HALF_EVEN)
# → Decimal('1037.48')   exact, deterministic, auditable

Idempotency demo

Picture a payroll run where the same payment task fires multiple times (retries, network glitches). The idempotency key turns every attempt after the first into a no-op, so no one is paid twice.
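A minimal sketch of that scenario: the same payment task is submitted several times, and only the first submission per idempotency key executes. It assumes the key is `(run_id, employee_id)`, as in the interview answer further down, and uses an in-memory dict where production code would rely on a database unique constraint; `PaymentExecutor` is a hypothetical name.

```python
from decimal import Decimal


class PaymentExecutor:
    """Executes each (run_id, employee_id) payment at most once, however often it is retried."""

    def __init__(self) -> None:
        self.results: dict = {}  # idempotency key -> amount paid
        self.attempts = 0
        self.executed = 0
        self.deduped = 0

    def pay(self, run_id: str, employee_id: str, amount: Decimal) -> Decimal:
        self.attempts += 1
        key = (run_id, employee_id)
        if key in self.results:
            if self.results[key] != amount:
                # Same key with different input signals a bug upstream, never a second payment
                raise ValueError(f"conflicting amount for {key}")
            self.deduped += 1
            return self.results[key]  # replay the stored result; nothing is paid again
        # In a database this is an INSERT guarded by a UNIQUE constraint on the key,
        # which also covers concurrent retries; this dict check is single-threaded only
        self.results[key] = amount
        self.executed += 1
        return amount

    @property
    def total_paid(self) -> Decimal:
        return sum(self.results.values(), Decimal("0.00"))


executor = PaymentExecutor()
for _ in range(3):  # the same task fired three times (retries, network glitches)
    executor.pay("r-2026-04-acme", "emp-42", Decimal("3100.00"))
executor.pay("r-2026-04-acme", "emp-43", Decimal("2800.00"))
print(executor.attempts, executor.executed, executor.deduped, executor.total_paid)  # 4 2 2 5900.00
```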


Versioned calculation rules

Tax brackets change. A payroll for January 2026 must use January 2026's rules, even if you re-run it in 2028.

CREATE TABLE tax_rules (
  country     CHAR(2) NOT NULL,
  valid_from  DATE NOT NULL,
  valid_to    DATE,                       -- NULL = currently active
  brackets    JSONB NOT NULL,           -- [{up_to: 12000, rate: 0.15}, ...]
  PRIMARY KEY (country, valid_from)
);

-- Pick rule for a payroll period
SELECT brackets
FROM tax_rules
WHERE country = 'DE'
  AND valid_from <= '2026-01-31'
  AND (valid_to IS NULL OR valid_to >= '2026-01-01')
ORDER BY valid_from DESC LIMIT 1;
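The same selection logic in application code, plus a progressive-bracket calculation done in Decimal. The bracket shape follows the JSONB comment above (`up_to`, `rate`); using `up_to: None` for the open-ended top bracket is an assumption, and the rule values below are made up for illustration, not real German tax rates.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal, ROUND_HALF_EVEN
from typing import Optional


@dataclass(frozen=True)
class TaxRule:
    country: str
    valid_from: date
    valid_to: Optional[date]  # None = currently active
    brackets: list            # [{"up_to": Decimal or None, "rate": Decimal}, ...]


def pick_rule(rules, country, period_start, period_end):
    """Mirror of the SQL: the latest rule whose validity overlaps the period."""
    candidates = [
        r for r in rules
        if r.country == country
        and r.valid_from <= period_end
        and (r.valid_to is None or r.valid_to >= period_start)
    ]
    if not candidates:
        raise LookupError(f"no tax rule for {country} in {period_start}..{period_end}")
    return max(candidates, key=lambda r: r.valid_from)


def progressive_tax(income: Decimal, brackets) -> Decimal:
    """Tax each slice of income at its bracket's rate; round once, at the end."""
    tax = Decimal("0")
    lower = Decimal("0")
    for b in brackets:
        upper = b["up_to"]
        if upper is None or income <= upper:
            tax += (income - lower) * b["rate"]
            break
        tax += (upper - lower) * b["rate"]
        lower = upper
    else:
        raise ValueError("the last bracket must have up_to=None")
    return tax.quantize(Decimal("0.01"), ROUND_HALF_EVEN)


rules = [
    TaxRule("DE", date(2025, 1, 1), date(2025, 12, 31),
            [{"up_to": Decimal("12000"), "rate": Decimal("0.15")}, {"up_to": None, "rate": Decimal("0.30")}]),
    TaxRule("DE", date(2026, 1, 1), None,
            [{"up_to": Decimal("12500"), "rate": Decimal("0.14")}, {"up_to": None, "rate": Decimal("0.30")}]),
]
jan_2026 = pick_rule(rules, "DE", date(2026, 1, 1), date(2026, 1, 31))
print(jan_2026.valid_from)                                   # 2026-01-01
print(progressive_tax(Decimal("20000"), jan_2026.brackets))  # 4000.00
```

Re-running January 2026 in 2028 calls `pick_rule` with the same period and gets the same rule, which is the whole point of versioning.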

Snapshot pattern for in-flight runs

Start payroll run
  │
  ▼
atomically copy current state into a frozen snapshot:
┌──────────────────────────────────────────────────┐
│ run_id: 'r-2026-04-acme'                         │
│ started_at: 2026-05-01 02:00 UTC                 │
│ snapshot {                                       │
│   employees: [...as of 2026-04-30 23:59 UTC...]  │
│   rules_version: 'DE-2026-Q2'                    │
│   fx_rates: { EUR_USD: 1.087, ... }              │
│ }                                                │
└──────────────────────────────────────────────────┘
  │
  ▼
workers run against snapshot
  │
  ▼
late edits to live employees?
  → NEW corrective run (referencing this run as parent), not a mutation
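A minimal sketch of the freeze step: deep-copy the live state into a read-only snapshot at run start, so later edits to live data cannot leak into the run, and handle late data with a new run that names its parent. `RunSnapshot` and `start_run` are hypothetical names; a real system would take the copy inside one database transaction (for example at REPEATABLE READ isolation in Postgres, which gives a consistent snapshot) rather than deep-copying Python dicts.

```python
import copy
from dataclasses import dataclass
from datetime import datetime, timezone
from types import MappingProxyType
from typing import Mapping, Optional


@dataclass(frozen=True)
class RunSnapshot:
    run_id: str
    started_at: datetime
    employees: Mapping[str, Mapping]     # read-only views over private deep copies
    rules_version: str
    fx_rates: Mapping[str, str]          # rates kept as strings, turned into Decimal when used
    parent_run_id: Optional[str] = None  # set only on corrective runs


def _frozen_copy(d: dict) -> Mapping:
    # Deep copy so later edits to live data cannot reach the snapshot; the proxy blocks writes
    return MappingProxyType(copy.deepcopy(d))


def start_run(run_id, live_employees, rules_version, fx_rates, parent_run_id=None):
    """Copy live state once at run start; workers only ever read the returned snapshot."""
    return RunSnapshot(
        run_id=run_id,
        started_at=datetime.now(timezone.utc),
        employees=MappingProxyType({eid: _frozen_copy(e) for eid, e in live_employees.items()}),
        rules_version=rules_version,
        fx_rates=_frozen_copy(fx_rates),
        parent_run_id=parent_run_id,
    )


live = {"emp-42": {"name": "Anna Schmidt", "salary": "4825.50"}}
snap = start_run("r-2026-04-acme", live, "DE-2026-Q2", {"EUR_USD": "1.087"})

live["emp-42"]["salary"] = "5000.00"          # late edit to live data
print(snap.employees["emp-42"]["salary"])     # 4825.50: the in-flight run is unaffected

# Late data is handled by a new corrective run that points at the original
fix = start_run("r-2026-04-acme-fix1", live, "DE-2026-Q2", {"EUR_USD": "1.087"},
                parent_run_id=snap.run_id)
print(fix.employees["emp-42"]["salary"], fix.parent_run_id)  # 5000.00 r-2026-04-acme
```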

What to say in the interview

The senior-bar answer: "I'd build the payroll engine around three invariants: (1) all monetary math in Decimal, never float; (2) idempotent on (run_id, employee_id); (3) immutable results — corrections are new rows. Inputs are snapshotted at run start so late data doesn't corrupt the in-flight calc. Rules are versioned so re-running 2026's January payroll in 2028 produces identical output."

Common pitfalls to avoid

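Rounding-mode inconsistency. Languages and libraries default to different modes (Python's built-in round() rounds half to even; Excel's ROUND rounds half away from zero). Always specify the mode explicitly.
Pro-rata math. Joining mid-month? Salary × (days_worked / days_in_month). days_in_month varies from 28 to 31, so don't hardcode 30.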
Time zones in payroll period. "End of month" depends on the company's legal jurisdiction. Always store with TZ.
FX timing. Convert at the moment of period-close, not at payment-execution. Store the rate used.
Negative net pay. Deductions can exceed gross. Cap at zero with carry-forward, or fail loudly — never silently.
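A sketch of three of these pitfalls in code: pro-rata pay using the real number of days in the month (from the standard `calendar` module) with an explicit rounding mode, a side-by-side of two rounding modes, and a net-pay check that fails loudly instead of going negative. It assumes calendar-day proration counted inclusively; some payroll rules prescribe a different day count, so treat the day count as a policy input. Function names are hypothetical.

```python
import calendar
from datetime import date
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP

CENT = Decimal("0.01")


def prorated_salary(monthly: Decimal, start: date, rounding: str = ROUND_HALF_EVEN) -> Decimal:
    """Pay for an employee starting on `start`, counting calendar days inclusively."""
    days_in_month = calendar.monthrange(start.year, start.month)[1]  # 28..31, never hardcoded
    days_worked = days_in_month - start.day + 1
    # Multiply before dividing, and quantize once at the very end with an explicit mode
    return (monthly * days_worked / days_in_month).quantize(CENT, rounding=rounding)


def net_pay(gross: Decimal, deductions: Decimal) -> Decimal:
    net = gross - deductions
    if net < 0:
        # Fail loudly; the alternative is to cap at zero and record a carry-forward
        raise ValueError(f"deductions {deductions} exceed gross {gross}")
    return net


print(prorated_salary(Decimal("4825.50"), date(2026, 2, 16)))  # 2240.41 (13 of 28 days)
print(prorated_salary(Decimal("4825.50"), date(2026, 3, 16)))  # 2490.58 (16 of 31 days)

# Same amount, two rounding modes: why "always specify" matters
amount = Decimal("2.665")
print(amount.quantize(CENT, rounding=ROUND_HALF_EVEN), amount.quantize(CENT, rounding=ROUND_HALF_UP))  # 2.66 2.67
```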

GDPR Patterns

GDPR is not optional for Personio — it's foundational. Every senior engineer there must have an opinion on right-to-erasure, data residency, and audit logs.

The right-to-erasure decision tree

Approach	What happens	Reversible?	GDPR-compliant?	Use for
Soft delete	`deleted_at` timestamp set. Row still exists.	Yes	❌ alone	Internal "undo." Useful but not sufficient for GDPR erase.
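Anonymize	PII fields replaced with a fixed placeholder such as "ERASED USER" (not a deterministic hash of the original value: that is only pseudonymization, and the data stays personal data under GDPR). Non-PII (salary history, dates) preserved.	No	✓	Default for HR. Preserves referential integrity in payroll/audit history.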
Hard delete	Row physically removed.	No	✓	Free-text comments, draft data. Not for transactional records where FK matters.

The actual erasure pipeline

User requests erasure (verified identity)
  │
  ▼
ErasureRequest row created (request_id, subject_id, requested_at, status='pending')
  │
  ▼
Orchestrator iterates a CATALOG of all subject-data locations:
  - employees, time_off_requests, payslips, performance_reviews, documents (S3),
    audit_log, event_log, search_index, caches, backups
  │
  ▼
Per location: apply per-table erasure policy
  ├─ employees:     anonymize PII fields
  ├─ time_off:      keep (no PII besides employee_id, which is now anonymous)
  ├─ payslips:      keep (financial record, has legal retention)
  ├─ documents/S3:  hard delete PDFs older than legal retention; anonymize newer
  ├─ audit_log:     KEEP (Art. 17(3) exempts records needed for legal claims)
  ├─ search_index:  re-index with anonymized fields
  ├─ caches:        invalidate
  └─ backups:       schedule erasure for next rotation cycle
  │
  ▼
Mark ErasureRequest.status = 'completed', emit AuditLog entry
(the audit log records the erasure but cannot be erased itself)
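A minimal sketch of the orchestrator loop: a catalog maps each location to a policy, every location is processed, and the request is marked completed only when all of them succeed. The location names follow the pipeline above; the `Policy` enum, `run_erasure` and the handler functions are hypothetical stand-ins, and the demo handlers only record what they would do.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable


class Policy(Enum):
    ANONYMIZE = "anonymize"
    KEEP = "keep"  # legal retention / Art. 17(3)
    HARD_DELETE_OR_ANONYMIZE = "hard_delete_or_anonymize"
    REINDEX = "reindex"
    INVALIDATE = "invalidate"
    SCHEDULE = "schedule_for_rotation"


CATALOG: dict = {
    "employees": Policy.ANONYMIZE,
    "time_off": Policy.KEEP,
    "payslips": Policy.KEEP,
    "documents_s3": Policy.HARD_DELETE_OR_ANONYMIZE,
    "audit_log": Policy.KEEP,
    "search_index": Policy.REINDEX,
    "caches": Policy.INVALIDATE,
    "backups": Policy.SCHEDULE,
}


@dataclass
class ErasureRequest:
    request_id: str
    subject_id: str
    status: str = "pending"
    log: list = field(default_factory=list)


def run_erasure(req: ErasureRequest, handlers: dict) -> ErasureRequest:
    """Apply every location's policy; handlers must be idempotent, since a retry reruns them all."""
    for location, policy in CATALOG.items():
        handler: Callable[[str, str], None] = handlers[policy]
        handler(location, req.subject_id)  # an exception leaves status 'pending' for a retry
        req.log.append((location, policy.value))
    req.status = "completed"
    req.log.append(("audit_log", "erasure_recorded"))  # the erasure itself is audited
    return req


calls = []
handlers = {p: (lambda loc, sid, p=p: calls.append((loc, p.value, sid))) for p in Policy}
req = run_erasure(ErasureRequest("req-9", "emp-42"), handlers)
print(req.status, len(calls))  # completed 8
```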

Why anonymize beats hard delete in HR

If you hard-delete an employee, you break:

Payroll history (FK employee_id dangling).
Audit trail ("who approved this time off?" → null).
Manager-report tree (employees who reported to this person).
Statutory reporting (DE requires 10-year retention of payroll records).

Anonymization preserves the integrity of the record while removing the personal data:

-- Before
employees: { id: emp-42, name: 'Anna Schmidt', email: 'anna@acme.com',
             salary: 4825.50, manager_id: emp-12 }

-- After anonymize
employees: { id: emp-42, name: 'ERASED USER', email: '<erased>',
             salary: 4825.50, manager_id: emp-12,
             erased_at: '2026-05-15', erased_request_id: 'req-9' }

-- Payroll history STILL VALID — emp-42 still exists as a record.
-- Auditor can verify "salary 4825.50 was paid to emp-42 in April 2026"
-- without knowing who that person was.
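A sketch of the anonymize step that produces the "after" record above: PII fields are overwritten with fixed placeholders (not hashes of the original values, which can be reversed by guessing inputs), everything else, including `id` and `manager_id`, is kept, and the erasure is stamped on the row. The PII field list is an assumption; in practice it would come from the data catalog.

```python
from datetime import date

# Hypothetical PII field list with the placeholder each field gets
PII_PLACEHOLDERS = {"name": "ERASED USER", "email": "<erased>"}


def anonymize(record: dict, request_id: str, erased_on: date) -> dict:
    """Return a new record with PII replaced; non-PII fields are left intact."""
    out = dict(record)
    for field_name, placeholder in PII_PLACEHOLDERS.items():
        if field_name in out:
            out[field_name] = placeholder
    out["erased_at"] = erased_on.isoformat()
    out["erased_request_id"] = request_id
    return out


before = {"id": "emp-42", "name": "Anna Schmidt", "email": "anna@acme.com",
          "salary": "4825.50", "manager_id": "emp-12"}
after = anonymize(before, "req-9", date(2026, 5, 15))
print(after)
```

Because `id` survives, every payslip and audit row that references `emp-42` still resolves.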

Data residency strategies

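EU customers typically require their data to be stored in the EU. GDPR itself restricts transfers outside the EEA rather than mandating EU storage, but keeping data in-region is the simplest way to satisfy both those rules and customer contracts. Three approaches: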

Strategy	How	Cost
Region-specific deployments	Whole stack duplicated per region. Customer routed to their region's stack.	High — but bulletproof.
Sharded with region flag	Single stack; shards live in different regions. Routing layer reads `data_region` from tenant config.	Medium — complex but cheaper.
Tenant pinning	Tenant assigned to region at signup. Reads/writes for that tenant stay in-region.	Medium — most common.
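A minimal sketch of tenant pinning: the region is fixed at signup and every connection lookup goes through it, so no code path picks a database without first resolving the tenant's region. Region names and connection strings are made up, and `ResidencyRouter` is a hypothetical name; a real router would read tenant config from a control-plane store.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tenant:
    tenant_id: str
    data_region: str  # pinned at signup; normal requests never change it


REGION_DSNS = {  # hypothetical per-region databases
    "eu-central": "postgresql://db.eu-central.internal/hr",
    "us-east": "postgresql://db.us-east.internal/hr",
}


class ResidencyRouter:
    def __init__(self, tenants: dict, serving_region: str) -> None:
        self._tenants = tenants
        self._serving_region = serving_region  # region this app instance runs in

    def dsn_for(self, tenant_id: str) -> str:
        tenant = self._tenants.get(tenant_id)
        if tenant is None:
            raise LookupError(f"unknown tenant {tenant_id}")
        if tenant.data_region != self._serving_region:
            # Never reach across regions silently; redirect the request to the right stack
            raise PermissionError(f"tenant {tenant_id} is pinned to {tenant.data_region}")
        return REGION_DSNS[tenant.data_region]


router = ResidencyRouter({"acme": Tenant("acme", "eu-central")}, serving_region="eu-central")
print(router.dsn_for("acme"))  # postgresql://db.eu-central.internal/hr
```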

The audit-log exemption (Art. 17(3))

GDPR lets you retain personal data despite an erasure request when it is needed to comply with a legal obligation or to establish, exercise, or defend legal claims (Art. 17(3)(b) and (e)). Audit logs typically qualify. But you must have a documented legal basis, limit access, and not use the data for marketing or any other purpose. Personio's audit log includes the erasure event itself, recording the hash of erased data as proof of compliance.

SLAs and timelines

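Respond to the request: without undue delay, and at the latest within 1 month of receipt (Art. 12(3)).
Complete erasure: "without undue delay" (Art. 17(1)). The 1-month deadline can be extended by up to 2 further months for complex or numerous requests, but you must tell the requester, with reasons, within the first month.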
Inform third parties: if you shared data with sub-processors (payroll providers, ATS integrations), you must propagate the erasure request to them.

Multi-tenancy Patterns

Personio runs 10k+ tenants on shared infrastructure. Multi-tenancy is in the DNA of every design decision.

The three models

Model	Schema	Isolation	Cost	Migration
Pool	Shared, `tenant_id` col on every table	Row-level (Postgres RLS)	Low	Single migration, all tenants
Bridge	Schema per tenant in shared DB	Schema-level	Medium	Migration per schema (10k schemas = pain)
Silo	DB per tenant	Strongest	High	Per-DB migration; backup/restore per tenant is simple

Personio default: Pool with Postgres RLS as a safety net. Move large/regulated tenants to silo as needed.

Pool model essentials

-- Every table has tenant_id as part of the primary key
CREATE TABLE employees (
  tenant_id    UUID NOT NULL,
  employee_id  UUID NOT NULL,
  ...
  PRIMARY KEY (tenant_id, employee_id)
);

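-- Postgres Row-Level Security as a safety net
ALTER TABLE employees ENABLE ROW LEVEL SECURITY;
-- The table owner bypasses RLS unless it is forced; superusers and BYPASSRLS roles always bypass it
ALTER TABLE employees FORCE ROW LEVEL SECURITY;

CREATE POLICY tenant_isolation ON employees
  USING (tenant_id = current_setting('app.current_tenant')::uuid);

-- Application sets the tenant context per request, inside a transaction:
-- SET LOCAL has no effect outside one and resets at COMMIT/ROLLBACK.
-- With bind parameters, use SELECT set_config('app.current_tenant', $1, true) instead.
BEGIN;
SET LOCAL app.current_tenant = 'tenant-acme-uuid';
SELECT * FROM employees;  -- only Acme's rows
COMMIT;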

RLS as defense-in-depth, not primary defense. Application still filters by tenant_id explicitly. RLS catches bugs where a developer forgot to filter. Belt + suspenders.
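On the application side, one way to make the explicit filter hard to forget is to keep the current tenant in a `contextvars.ContextVar` set once per request and route queries through a helper that always supplies the `tenant_id` value itself. This sketch uses in-memory SQLite only so it runs anywhere; SQLite has no RLS, so here the helper is the only guard, and `current_tenant` / `tenant_query` are hypothetical names.

```python
import contextvars
import sqlite3

# Set once per request (e.g. in middleware); no default, so a missing tenant is an error
current_tenant = contextvars.ContextVar("current_tenant")


def tenant_query(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> list:
    """Run a query for the current tenant; the tenant_id value never comes from the caller."""
    # Crude guard for illustration: the SQL must contain the tenant filter, and it
    # must be the first placeholder (that ordering is not checked here)
    if "tenant_id = ?" not in sql:
        raise ValueError("query is missing the tenant_id filter")
    tenant = current_tenant.get()  # LookupError if no tenant was set for this request
    return conn.execute(sql, (tenant, *params)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (tenant_id TEXT, employee_id TEXT, name TEXT, "
             "PRIMARY KEY (tenant_id, employee_id))")
conn.executemany("INSERT INTO employees VALUES (?, ?, ?)",
                 [("acme", "emp-1", "Anna"), ("globex", "emp-1", "Bob")])

token = current_tenant.set("acme")
try:
    rows = tenant_query(conn, "SELECT name FROM employees WHERE tenant_id = ? AND employee_id = ?", ("emp-1",))
    print(rows)  # [('Anna',)]
finally:
    current_tenant.reset(token)
```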

The "tenant_id everywhere" discipline

Caches: never key by entity_id alone. Always (tenant_id, entity_id).
Search indexes: include tenant_id as a mandatory filter.
Queues / events: partition by tenant_id; events carry tenant in payload.
Logs / metrics: tag with tenant_id (but be careful about cardinality blowup).
Cross-tenant analytics: ETL to a separate warehouse. Never JOIN across tenants in OLTP.
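Two of these rules in code: a cache whose API only accepts `(tenant_id, entity_id)` keys, and a stable tenant-to-partition mapping for queues. The partition function uses `hashlib` rather than Python's built-in `hash()`, because `hash()` of a string is randomized per process and would send the same tenant to different partitions after a restart. `TenantCache` and `partition_for` are hypothetical names.

```python
import hashlib
from typing import Any, Optional


class TenantCache:
    """Cache keyed by (tenant_id, entity_id); there is no method that takes entity_id alone."""

    def __init__(self) -> None:
        self._data: dict = {}

    def get(self, tenant_id: str, entity_id: str) -> Optional[Any]:
        return self._data.get((tenant_id, entity_id))

    def put(self, tenant_id: str, entity_id: str, value: Any) -> None:
        self._data[(tenant_id, entity_id)] = value


def partition_for(tenant_id: str, num_partitions: int) -> int:
    """Stable across processes and restarts, unlike the built-in hash() for str."""
    digest = hashlib.sha256(tenant_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


cache = TenantCache()
cache.put("acme", "emp-1", {"name": "Anna"})
cache.put("globex", "emp-1", {"name": "Bob"})  # same entity_id, different tenant
print(cache.get("acme", "emp-1"))  # {'name': 'Anna'}
print(partition_for("acme", 12))
```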

Noisy neighbor mitigation

One slow tenant can exhaust resources for everyone. Strategies:

Per-tenant rate limits

Token bucket per tenant in Redis. A tenant that exceeds its quota gets HTTP 429 instead of dragging other tenants down.

Connection pool quotas

Per-tenant max DB connections via PgBouncer pools or app-level semaphores.

Query timeouts

Set Postgres statement_timeout per session (or per role) so a runaway query is cancelled after, say, 30s instead of holding a pooled connection indefinitely.

Dedicated worker pools

Large tenants get their own worker queue partition. Small tenants share.

Tenant move to silo

When a tenant exceeds a threshold (size, RPS, $$), migrate them to a dedicated DB.

Async heavy work

Anything over 100ms moves to a queue. Sync requests stay fast.
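A single-process sketch of the per-tenant token bucket from the rate-limit strategy above. The text puts the buckets in Redis so every API node shares them; here a dict and an injectable clock stand in for that, and `TenantRateLimiter` with its `rate`/`burst` parameters is a hypothetical name. A request that finds no token would get HTTP 429.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class _Bucket:
    tokens: float
    updated: float


class TenantRateLimiter:
    """One token bucket per tenant; an empty bucket only throttles its own tenant."""

    def __init__(self, rate: float, burst: int, clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate    # tokens refilled per second, per tenant
        self.burst = burst  # bucket capacity (largest allowed burst)
        self.clock = clock  # injectable for tests
        self._buckets: dict = {}

    def allow(self, tenant_id: str) -> bool:
        now = self.clock()
        bucket = self._buckets.setdefault(tenant_id, _Bucket(tokens=float(self.burst), updated=now))
        # Refill for the time since this tenant's last call, capped at capacity
        bucket.tokens = min(float(self.burst), bucket.tokens + (now - bucket.updated) * self.rate)
        bucket.updated = now
        if bucket.tokens >= 1:
            bucket.tokens -= 1
            return True
        return False  # caller answers HTTP 429


fake_now = [0.0]
limiter = TenantRateLimiter(rate=1.0, burst=3, clock=lambda: fake_now[0])
print([limiter.allow("noisy") for _ in range(5)])  # [True, True, True, False, False]
print(limiter.allow("quiet"))                       # True: unaffected by the noisy tenant
fake_now[0] += 2.0
print([limiter.allow("noisy") for _ in range(3)])  # [True, True, False]
```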

What to say in the interview

The senior-bar answer: "For Personio's scale (10k tenants, mostly small), Pool with row-level security is the right default. We tag every table with tenant_id in the PK, set the current tenant in Postgres session context per request, and rely on RLS as a defense-in-depth check. Large or regulated tenants can be migrated to a Silo model individually. Caches, queues, and search indexes all carry tenant_id."

Audit Log Design

Personio is a system of record. Auditors will query the audit log years later. Get the design right once.

Schema

CREATE TABLE audit_log (
  id            UUID DEFAULT gen_random_uuid(),
  tenant_id     UUID NOT NULL,
  ts            TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  actor_id      UUID,            -- who did it (NULL for system)
  actor_type    VARCHAR(16),    -- user / system / integration
  resource_type VARCHAR(32) NOT NULL,  -- employee, payslip, timeoff…
  resource_id   UUID NOT NULL,
  action        VARCHAR(32) NOT NULL,  -- create, update, delete, approve…
  before_state  JSONB,
  after_state   JSONB,
  metadata      JSONB,           -- ip, user-agent, request_id
  prev_hash     BYTEA,           -- hash chain for tamper evidence
  hash          BYTEA NOT NULL,
  PRIMARY KEY (tenant_id, ts, id)
) PARTITION BY RANGE (ts);

-- Monthly partitions for fast retention management
CREATE TABLE audit_log_2026_05 PARTITION OF audit_log
  FOR VALUES FROM ('2026-05-01') TO ('2026-06-01');

-- Query: all changes to one employee, newest first
CREATE INDEX ON audit_log (tenant_id, resource_type, resource_id, ts DESC);
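Monthly partitions have to exist before rows for that month arrive; otherwise inserts fail because no partition matches (unless a DEFAULT partition exists). They are usually created ahead of time by a scheduled job. A small sketch of the DDL generator such a job might use, following the `audit_log_YYYY_MM` naming above; `month_partition_ddl` is a hypothetical name.

```python
from datetime import date


def month_partition_ddl(year: int, month: int) -> str:
    """DDL for the audit_log partition covering one calendar month (upper bound exclusive)."""
    start = date(year, month, 1)
    end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    return (
        f"CREATE TABLE IF NOT EXISTS audit_log_{year:04d}_{month:02d} PARTITION OF audit_log\n"
        f"  FOR VALUES FROM ('{start.isoformat()}') TO ('{end.isoformat()}');"
    )


print(month_partition_ddl(2026, 5))
print(month_partition_ddl(2026, 12))
```

`IF NOT EXISTS` makes the job safe to rerun, so it can simply create the next few months on every run.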

Tamper evidence via hash chain

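Each row's hash = sha256(prev_hash + payload). If anyone alters a past row, its hash no longer verifies, and recomputing that hash breaks the link to every later row. Auditors can verify the chain independently, as long as the latest hash is periodically anchored outside the database (e.g. in write-once storage or with a third party); otherwise someone with write access could rewrite the entire tail.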

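import hashlib
import json

def append_audit(conn, tenant_id, payload):
    # conn: a psycopg 3 connection (assumed); the column list stays elided
    body = json.dumps(payload, sort_keys=True).encode()
    # Per-tenant advisory lock key (signed 64-bit): without it, two concurrent appends
    # can read the same prev_hash and fork the chain
    lock_key = int.from_bytes(hashlib.sha256(str(tenant_id).encode()).digest()[:8], "big", signed=True)
    with conn.transaction():
        conn.execute("SELECT pg_advisory_xact_lock(%s)", (lock_key,))
        prev = conn.execute(
            "SELECT hash FROM audit_log WHERE tenant_id=%s ORDER BY ts DESC LIMIT 1",
            (tenant_id,)
        ).fetchone()
        prev_hash = prev[0] if prev else bytes(32)  # genesis
        h = hashlib.sha256(prev_hash + body)
        # ts should be taken after the lock (e.g. clock_timestamp(), not NOW()) so ORDER BY ts follows the chain
        conn.execute("INSERT INTO audit_log (..., prev_hash, hash) VALUES (...)",
                     (..., prev_hash, h.digest()))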
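The auditor's side is the mirror image: walk the rows in chain order, recompute each hash from the previous one, and report the first row that does not verify. A self-contained sketch using the same construction as `append_audit` (SHA-256 over `prev_hash` plus sorted-key JSON); rows are plain dicts here rather than database rows, and `build_chain` is a hypothetical demo helper.

```python
import hashlib
import json
from typing import Optional

GENESIS = bytes(32)


def row_hash(prev_hash: bytes, payload: dict) -> bytes:
    """Same construction as append_audit: sha256(prev_hash + sorted-key JSON)."""
    return hashlib.sha256(prev_hash + json.dumps(payload, sort_keys=True).encode()).digest()


def build_chain(payloads: list) -> list:
    """Demo helper: build rows the way append_audit would."""
    rows, prev = [], GENESIS
    for p in payloads:
        h = row_hash(prev, p)
        rows.append({"payload": p, "prev_hash": prev, "hash": h})
        prev = h
    return rows


def first_broken_row(rows: list) -> Optional[int]:
    """Index of the first row whose link or hash fails to verify; None if the chain is intact."""
    prev = GENESIS
    for i, row in enumerate(rows):
        if row["prev_hash"] != prev or row["hash"] != row_hash(prev, row["payload"]):
            return i
        prev = row["hash"]
    return None


rows = build_chain([
    {"action": "create", "resource_id": "emp-42"},
    {"action": "approve", "resource_id": "payslip-7"},
    {"action": "update", "resource_id": "emp-42"},
])
print(first_broken_row(rows))            # None
rows[1]["payload"]["action"] = "reject"  # tamper with a past row
print(first_broken_row(rows))            # 1
```

If the tamperer also recomputes row 1's hash, the check fails at row 2 instead, because its stored `prev_hash` no longer matches; only a rewrite of the entire tail would pass, which is what anchoring the latest hash outside the database prevents.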

Write path — never block user writes

User action → API
  │
  ▼
COMMIT transaction (user data)
  │
  ▼
ASYNCHRONOUSLY emit audit event to Kafka
  │
  ▼
Audit Consumer → INSERT into audit_log partition
  │
  ▼
Periodically: archive older partitions to S3 as Parquet, detach them from the hot DB,
and query the cold data via Athena.

Why async? Audit insert latency shouldn't impact user request latency. If audit-log DB is down, user writes continue; audit catches up later via Kafka retention.

Retention by category

Category	Retention	Why
Payroll-adjacent	7–10 years (DE: 10y)	Statutory — tax authority audit
HR records	3–7 years post-employment	Labor law statute of limitations
Access logs (logins)	~1 year	Security investigation window
System events (deploys, etc.)	90 days hot, 1 year cold	Operational forensics

What to query for

"Show all changes to employee X" → (tenant_id, resource_type='employee', resource_id=X), ts DESC.
"Who approved payslip Y?" → (tenant_id, resource_type='payslip', resource_id=Y, action='approve').
"Compliance report: all PII access in Q1" → action='read', resource_type IN (...) partitioned scan.
"Erasure verification: hash chain integrity from 2024-01-01 to today" → recompute hashes, compare.

Common mistakes to avoid

Audit log in main DB without partitioning. Becomes a multi-billion-row monster. Partition by month from day 1.
Synchronous audit write inside user transaction. User writes slow down, fail when audit fails.
Storing full request bodies. PII explosion. Store before/after of changed fields only.
Forgetting tenant_id. Cross-tenant audit query = leak.
No tamper evidence. If audit can be edited, it's not audit — it's a log.
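For the "store before/after of changed fields only" rule, a small helper that reduces two versions of a record to just the fields that differ, so unchanged PII never lands in the audit row. It assumes flat dicts; nested documents would need a recursive version, and `changed_fields` is a hypothetical name.

```python
from typing import Any


def changed_fields(before: dict, after: dict) -> tuple:
    """Return (before_state, after_state) restricted to keys whose values changed."""
    missing: Any = object()  # sentinel, so added and removed keys count as changes
    keys = before.keys() | after.keys()
    changed = sorted(k for k in keys if before.get(k, missing) != after.get(k, missing))
    return ({k: before[k] for k in changed if k in before},
            {k: after[k] for k in changed if k in after})


before = {"name": "Anna Schmidt", "email": "anna@acme.com", "salary": "4825.50", "manager_id": "emp-12"}
after = {"name": "Anna Schmidt", "email": "anna@acme.com", "salary": "5000.00", "manager_id": "emp-12"}
print(changed_fields(before, after))  # ({'salary': '4825.50'}, {'salary': '5000.00'})
```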

Reading this in the interview

One-liner: "Audit log is append-only, partitioned by month, hash-chained for tamper evidence, written async via Kafka so it doesn't block user requests. Retention by category — payroll-adjacent records get 10 years, access logs 1 year."