The domain depth that separates senior candidates from generalists. Payroll-correctness is unfamiliar to most engineers. Show you understand why these patterns exist, not just that they exist.
Payroll is the highest-stakes Personio domain. Errors here = wrong paycheck = real human pain + legal exposure. Personio engineers think about correctness, idempotency, and replayability constantly.
NUMERIC, Java BigDecimal, Python Decimal. Construct from strings.

```python
# Float: what JS / naïve Python does
>>> 0.1 + 0.2
0.30000000000000004
>>> 4825.50 * 0.215            # tax on salary
1037.4824999999998             # actually € 1037.4825
>>> sum([0.1] * 100000) == 10000
False                          # rounding error accumulates with every addition

# Multiplied across 100k payslips, this silently shifts cents.
# Across 6 months: tens of thousands of euros of drift.
# Personio engineers obsess over this.
```

```python
# Decimal: the right way
from decimal import Decimal, ROUND_HALF_EVEN

salary = Decimal('4825.50')
tax_pct = Decimal('0.215')
tax = (salary * tax_pct).quantize(Decimal('0.01'), ROUND_HALF_EVEN)
# → Decimal('1037.48')   exact, deterministic, auditable
```
Simulate a payroll run where the same task is fired multiple times (retries, network glitches). The idempotency key prevents double-charging.
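The retry scenario above can be sketched in a few lines. This is a minimal in-memory illustration, not Personio's implementation: the `PayrollLedger` class, its method names, and the amount are hypothetical, and the key is assumed to be `(run_id, employee_id)`.

```python
from decimal import Decimal


class PayrollLedger:
    """In-memory stand-in for a payments table with a unique idempotency key."""

    def __init__(self):
        self._payments = {}  # (run_id, employee_id) -> amount already recorded

    def pay(self, run_id: str, employee_id: str, amount: Decimal) -> bool:
        """Record a payment at most once per (run_id, employee_id).

        Returns True if the payment was newly recorded, False if this call
        was a replay of a task that had already succeeded.
        """
        key = (run_id, employee_id)
        if key in self._payments:
            return False  # duplicate delivery: change nothing
        self._payments[key] = amount
        return True

    def total_paid(self) -> Decimal:
        return sum(self._payments.values(), Decimal("0"))


ledger = PayrollLedger()
# The same task is delivered three times (first attempt plus two retries).
results = [ledger.pay("run-2026-01", "emp-42", Decimal("3788.02")) for _ in range(3)]
print(results)              # [True, False, False]
print(ledger.total_paid())  # 3788.02
```

In a real database the dictionary lookup would typically be a unique constraint on the key, so that the duplicate check and the write happen atomically.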
Tax brackets change. A payroll for January 2026 must use January 2026's rules, even if you re-run it in 2028.
```sql
CREATE TABLE tax_rules (
  country    CHAR(2) NOT NULL,
  valid_from DATE    NOT NULL,
  valid_to   DATE,              -- NULL = currently active
  brackets   JSONB   NOT NULL,  -- [{up_to: 12000, rate: 0.15}, ...]
  PRIMARY KEY (country, valid_from)
);

-- Pick rule for a payroll period
SELECT brackets
FROM tax_rules
WHERE country = 'DE'
  AND valid_from <= '2026-01-31'
  AND (valid_to IS NULL OR valid_to >= '2026-01-01')
ORDER BY valid_from DESC
LIMIT 1;
```
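The same lookup can be sketched in application code, followed by applying the selected brackets. Everything here is an assumption made for illustration: the rates and thresholds are invented, the brackets are treated as marginal (each slice of income taxed at its own rate), and a final bracket with `up_to` of `None` stands for "no upper limit".

```python
from datetime import date
from decimal import Decimal, ROUND_HALF_EVEN

# Hypothetical rule versions; the numbers are made up.
TAX_RULES = [
    {"country": "DE", "valid_from": date(2025, 1, 1), "valid_to": date(2025, 12, 31),
     "brackets": [{"up_to": "12000", "rate": "0.14"}, {"up_to": None, "rate": "0.30"}]},
    {"country": "DE", "valid_from": date(2026, 1, 1), "valid_to": None,
     "brackets": [{"up_to": "12000", "rate": "0.15"}, {"up_to": None, "rate": "0.30"}]},
]


def rule_for(country, period_start, period_end):
    """Newest rule whose validity window overlaps the payroll period."""
    matches = [
        r for r in TAX_RULES
        if r["country"] == country
        and r["valid_from"] <= period_end
        and (r["valid_to"] is None or r["valid_to"] >= period_start)
    ]
    return max(matches, key=lambda r: r["valid_from"])  # ValueError if no rule applies


def progressive_tax(income, brackets):
    """Tax each slice of income at its bracket's marginal rate."""
    tax, lower = Decimal("0"), Decimal("0")
    for b in brackets:
        upper = income if b["up_to"] is None else min(income, Decimal(b["up_to"]))
        if upper > lower:
            tax += (upper - lower) * Decimal(b["rate"])
            lower = upper
    return tax.quantize(Decimal("0.01"), ROUND_HALF_EVEN)


# Re-running January 2026 at any later date still resolves to the 2026 rule.
rule_2026 = rule_for("DE", date(2026, 1, 1), date(2026, 1, 31))
tax_2026 = progressive_tax(Decimal("20000"), rule_2026["brackets"])
print(tax_2026)  # 4200.00  (12000 * 0.15 + 8000 * 0.30)
```

The payroll period, not the wall-clock time of the run, is the only input to the rule lookup, which is what makes a re-run reproducible.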
(run_id, employee_id); (3) immutable results: corrections are new rows. Inputs are snapshotted at run start so late data doesn't corrupt the in-flight calc. Rules are versioned so re-running 2026's January payroll in 2028 produces identical output."

GDPR is not optional for Personio; it is foundational. Every senior engineer there must have an opinion on right-to-erasure, data residency, and audit logs.
| Approach | What happens | Reversible? | GDPR-compliant? | Use for |
|---|---|---|---|---|
| Soft delete | deleted_at timestamp set. Row still exists. | Yes | ❌ alone | Internal "undo." Useful but not sufficient for GDPR erase. |
| Anonymize | PII fields replaced with deterministic hash or "ERASED USER." Non-PII (salary history, dates) preserved. | No | ✓ | Default for HR. Preserves referential integrity in payroll/audit history. |
| Hard delete | Row physically removed. | No | ✓ | Free-text comments, draft data. Not for transactional records where FK matters. |
If you hard-delete an employee, you break:
employee_id dangling).

Anonymization preserves the integrity of the record while removing the personal data:
```text
-- Before
employees: {
  id: emp-42, name: 'Anna Schmidt', email: 'anna@acme.com',
  salary: 4825.50, manager_id: emp-12
}

-- After anonymize
employees: {
  id: emp-42, name: 'ERASED USER', email: '<erased>',
  salary: 4825.50, manager_id: emp-12,
  erased_at: '2026-05-15', erased_request_id: 'req-9'
}

-- Payroll history STILL VALID: emp-42 still exists as a record.
-- Auditor can verify "salary 4825.50 was paid to emp-42 in April 2026"
-- without knowing who that person was.
```
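A minimal sketch of the anonymize step, assuming the record is a plain dictionary and that `name` and `email` are the only PII fields (a real schema would have more). The `anonymize` function and `PII_REPLACEMENTS` mapping are hypothetical names.

```python
from datetime import date

# Assumed PII fields and their placeholder values.
PII_REPLACEMENTS = {"name": "ERASED USER", "email": "<erased>"}


def anonymize(employee: dict, request_id: str, today: date) -> dict:
    """Return a copy with PII replaced and erasure metadata attached.

    The id and all non-PII fields are left untouched, so foreign keys
    from payroll and audit history keep resolving.
    """
    erased = dict(employee)
    erased.update(PII_REPLACEMENTS)
    erased["erased_at"] = today.isoformat()
    erased["erased_request_id"] = request_id
    return erased


before = {"id": "emp-42", "name": "Anna Schmidt", "email": "anna@acme.com",
          "salary": "4825.50", "manager_id": "emp-12"}
after = anonymize(before, "req-9", date(2026, 5, 15))
print(after["id"], after["name"], after["salary"])  # emp-42 ERASED USER 4825.50
```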
EU customers must have their data stored in EU. Three approaches:
| Strategy | How | Cost |
|---|---|---|
| Region-specific deployments | Whole stack duplicated per region. Customer routed to their region's stack. | High — but bulletproof. |
| Sharded with region flag | Single stack; shards live in different regions. Routing layer reads data_region from tenant config. | Medium — complex but cheaper. |
| Tenant pinning | Tenant assigned to region at signup. Reads/writes for that tenant stay in-region. | Medium — most common. |
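Tenant pinning reduces to a lookup that must fail closed. The sketch below assumes a static tenant-to-region map and per-region connection strings; the region names, hostnames, and the `dsn_for_tenant` helper are all hypothetical.

```python
# Hypothetical tenant -> region assignments, fixed at signup.
TENANT_REGION = {"tenant-acme": "eu-central-1", "tenant-globex": "us-east-1"}

# Hypothetical per-region database endpoints.
REGION_DSN = {
    "eu-central-1": "postgresql://db.eu-central-1.example.internal/hr",
    "us-east-1": "postgresql://db.us-east-1.example.internal/hr",
}


class UnknownTenantRegion(LookupError):
    """Raised instead of silently falling back to a default region."""


def dsn_for_tenant(tenant_id: str) -> str:
    """Resolve the database endpoint for a tenant's pinned region."""
    region = TENANT_REGION.get(tenant_id)
    if region is None or region not in REGION_DSN:
        raise UnknownTenantRegion(tenant_id)
    return REGION_DSN[region]


acme_dsn = dsn_for_tenant("tenant-acme")
print(acme_dsn)  # postgresql://db.eu-central-1.example.internal/hr
```

Raising on an unknown tenant, rather than defaulting to some region, is the design choice worth calling out: a default is exactly how EU data ends up outside the EU.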
Personio runs 10k+ tenants on shared infrastructure. Multi-tenancy is in the DNA of every design decision.
| Model | Schema | Isolation | Cost | Migration |
|---|---|---|---|---|
| Pool | Shared, tenant_id col on every table | Row-level (Postgres RLS) | Low | Single migration, all tenants |
| Bridge | Schema per tenant in shared DB | Schema-level | Medium | Migration per schema (10k schemas = pain) |
| Silo | DB per tenant | Strongest | High | Per-DB migration; backup/restore per tenant is simple |
Personio default: Pool with Postgres RLS as a safety net. Move large/regulated tenants to silo as needed.
```sql
-- Every table has tenant_id as part of the primary key
CREATE TABLE employees (
  tenant_id   UUID NOT NULL,
  employee_id UUID NOT NULL,
  ...
  PRIMARY KEY (tenant_id, employee_id)
);

-- Postgres Row-Level Security as a safety net
ALTER TABLE employees ENABLE ROW LEVEL SECURITY;

CREATE POLICY tenant_isolation ON employees
  USING (tenant_id = current_setting('app.current_tenant')::uuid);

-- Application sets the tenant context per request.
-- SET LOCAL only takes effect inside a transaction block.
BEGIN;
SET LOCAL app.current_tenant = 'tenant-acme-uuid';
SELECT * FROM employees;  -- only Acme's rows
COMMIT;
```
(tenant_id, entity_id).

One slow tenant can exhaust resources for everyone. Strategies:
Token bucket per tenant in Redis. Slow callers get 429, don't drag others down.
Per-tenant max DB connections via PgBouncer pools or app-level semaphores.
Postgres statement_timeout set per session. A 30s query won't lock the pool.
Large tenants get their own worker queue partition. Small tenants share.
When a tenant exceeds a threshold (size, RPS, $$), migrate them to a dedicated DB.
Anything over 100ms moves to a queue. Sync requests stay fast.
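Of the strategies above, the per-tenant token bucket is the easiest to sketch. This version keeps state in process memory rather than Redis and takes an injectable clock so the behavior is deterministic; the class names and parameters are illustrative assumptions.

```python
import time


class TokenBucket:
    def __init__(self, capacity: float, refill_per_sec: float, now: float):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = capacity  # start full so short bursts are allowed
        self.updated = now

    def allow(self, now: float) -> bool:
        # Refill in proportion to elapsed time, capped at capacity.
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller should answer with HTTP 429


class TenantRateLimiter:
    """One bucket per tenant, so a noisy tenant only drains its own tokens."""

    def __init__(self, capacity, refill_per_sec, clock=time.monotonic):
        self._capacity = capacity
        self._refill = refill_per_sec
        self._clock = clock
        self._buckets = {}

    def allow(self, tenant_id: str) -> bool:
        now = self._clock()
        bucket = self._buckets.get(tenant_id)
        if bucket is None:
            bucket = self._buckets[tenant_id] = TokenBucket(self._capacity, self._refill, now)
        return bucket.allow(now)


fake_now = [0.0]  # controllable clock for the demonstration
limiter = TenantRateLimiter(capacity=2, refill_per_sec=1, clock=lambda: fake_now[0])
burst = [limiter.allow("noisy") for _ in range(3)]
other = limiter.allow("quiet")
fake_now[0] = 1.0
after_refill = limiter.allow("noisy")
print(burst, other, after_refill)  # [True, True, False] True True
```

A shared store such as Redis would be needed once more than one application instance serves the same tenant; the refill arithmetic stays the same.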
tenant_id in the PK, set the current tenant in Postgres session context per request, and rely on RLS as a defense-in-depth check. Large or regulated tenants can be migrated to a Silo model individually. Caches, queues, and search indexes all carry tenant_id."

Personio is a system of record. Auditors will query the audit log years later. Get the design right once.
```sql
CREATE TABLE audit_log (
  id            UUID DEFAULT gen_random_uuid(),
  tenant_id     UUID NOT NULL,
  ts            TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  actor_id      UUID,                  -- who did it (NULL for system)
  actor_type    VARCHAR(16),           -- user / system / integration
  resource_type VARCHAR(32) NOT NULL,  -- employee, payslip, timeoff…
  resource_id   UUID NOT NULL,
  action        VARCHAR(32) NOT NULL,  -- create, update, delete, approve…
  before_state  JSONB,
  after_state   JSONB,
  metadata      JSONB,                 -- ip, user-agent, request_id
  prev_hash     BYTEA,                 -- hash chain for tamper evidence
  hash          BYTEA NOT NULL,
  PRIMARY KEY (tenant_id, ts, id)
) PARTITION BY RANGE (ts);

-- Monthly partitions for fast retention management
CREATE TABLE audit_log_2026_05 PARTITION OF audit_log
  FOR VALUES FROM ('2026-05-01') TO ('2026-06-01');

-- Query: all changes to one employee, newest first
CREATE INDEX ON audit_log (tenant_id, resource_type, resource_id, ts DESC);
```
Each row's hash = sha256(prev_hash + payload). If anyone alters a past row, all subsequent hashes break. Auditors can verify the chain without trusting your DB.
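```python
import hashlib
import json


def append_audit(tenant_id, payload):
    # `txn` and `db` are the caller's transaction and connection handles.
    # Appends for one tenant must be serialized (for example with a lock),
    # otherwise two concurrent rows could chain from the same prev_hash.
    with txn:
        prev = db.execute(
            "SELECT hash FROM audit_log WHERE tenant_id=%s ORDER BY ts DESC LIMIT 1",
            (tenant_id,)
        ).fetchone()
        # bytes(...) because some drivers return BYTEA as a memoryview.
        prev_hash = bytes(prev[0]) if prev else bytes(32)  # genesis
        h = hashlib.sha256(prev_hash + json.dumps(payload, sort_keys=True).encode())
        db.execute("INSERT INTO audit_log (..., prev_hash, hash) VALUES (...)",
                   ..., prev_hash, h.digest())
```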
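The verification side of the chain can be sketched without a database. This example keeps rows in a list and uses the same `sha256(prev_hash + payload)` rule and all-zero genesis hash; the helper names (`entry_hash`, `append`, `first_broken_index`) and sample payloads are hypothetical.

```python
import hashlib
import json

GENESIS = bytes(32)  # all-zero prev_hash for the first row


def entry_hash(prev_hash: bytes, payload: dict) -> bytes:
    body = json.dumps(payload, sort_keys=True).encode()
    return hashlib.sha256(prev_hash + body).digest()


def append(chain: list, payload: dict) -> None:
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append({"payload": payload, "prev_hash": prev_hash,
                  "hash": entry_hash(prev_hash, payload)})


def first_broken_index(chain: list):
    """Return the index of the first row that fails verification, or None."""
    expected_prev = GENESIS
    for i, row in enumerate(chain):
        links_ok = row["prev_hash"] == expected_prev
        hash_ok = row["hash"] == entry_hash(row["prev_hash"], row["payload"])
        if not (links_ok and hash_ok):
            return i
        expected_prev = row["hash"]
    return None


chain = []
append(chain, {"action": "create", "resource_id": "emp-42"})
append(chain, {"action": "update", "resource_id": "emp-42", "salary": "4825.50"})
append(chain, {"action": "approve", "resource_id": "payslip-7"})
intact = first_broken_index(chain)

chain[1]["payload"]["salary"] = "9999.00"  # tamper with a past row
broken_at = first_broken_index(chain)
print(intact, broken_at)  # None 1
```

An attacker who can rewrite every later row can also recompute every later hash, so the chain is only tamper-evident if the latest hash is periodically anchored somewhere the database owner cannot modify.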
Why async? Audit insert latency shouldn't impact user request latency. If audit-log DB is down, user writes continue; audit catches up later via Kafka retention.
| Category | Retention | Why |
|---|---|---|
| Payroll-adjacent | 7–10 years (DE: 10y) | Statutory — tax authority audit |
| HR records | 3–7 years post-employment | Labor law statute of limitations |
| Access logs (logins) | ~1 year | Security investigation window |
| System events (deploys, etc.) | 90 days hot, 1 year cold | Operational forensics |
(tenant_id, resource_type='employee', resource_id=X), ts DESC.

(tenant_id, resource_type='payslip', resource_id=Y, action='approve').

action='read', resource_type IN (...) partitioned scan.