The actual Personio take-home (their public repo), live coding round patterns, system design canonical questions, and domain deep-dives. ★ = critical · D = interactive demo.
Take-Home Coding 3 sections
Personio's actual challenge is public on GitHub. This is the silent dealbreaker round — bad README/tests kills the offer even with clean code.
If yesterday's "Buy milk" is not done, it still shows alongside today's. Both stay visible until each is marked done.
Data model (sketch)
CREATE TABLE reminders (
  id UUID PRIMARY KEY,
  employee_id UUID NOT NULL,
  text VARCHAR(512) NOT NULL,
  start_date DATE NOT NULL,
  send_at_time TIME NULL, -- optional, 5-min precision
  recur_freq VARCHAR(10) NULL, -- DAILY/WEEKLY/MONTHLY/YEARLY/NULL
  recur_step INT DEFAULT 1,
  timezone VARCHAR(64) NOT NULL, -- e.g. 'Europe/Berlin'
  created_at TIMESTAMPTZ DEFAULT NOW()
);
CREATE TABLE reminder_occurrences (
  reminder_id UUID REFERENCES reminders(id) ON DELETE CASCADE,
  occurs_on DATE NOT NULL, -- specific occurrence
  done_at TIMESTAMPTZ NULL,
  email_sent_at TIMESTAMPTZ NULL, -- idempotency anchor
  PRIMARY KEY (reminder_id, occurs_on)
);
Why a separate reminder_occurrences table? Because "marked done" and "email sent" are per-occurrence state. Don't bake recurrence expansion into queries — materialize occurrences lazily as you encounter them. (Discuss alternative: expand-on-read for sparse recurrences.)
Architecture (clean architecture, matches their scaffold)
Separation of I/O from logic. Domain has no DB or SMTP. Use cases take repos as deps.
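Timezone handling. Store instants (created_at, done_at, email_sent_at) in UTC; keep the reminder's wall-clock time with the employee's ZoneId and convert per occurrence. Test for DST transitions.
Idempotent email. Use the email_sent_at column as the claim. Send loop: UPDATE … WHERE email_sent_at IS NULL RETURNING …, which is an atomic claim.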
Tests for the core rule. Recurrence expansion, timezone edge cases (DST spring-forward), "marked done before send" path.
1-page README. Assumptions · scope · how to run · what you'd do next.
Invite critique. List 2–3 things you'd change in the follow-up. Coachability signal.
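The atomic-claim idea for idempotent email can be sketched as below. This is a minimal illustration, not the scaffold's code: it uses Python's built-in sqlite3 instead of Postgres, so it checks `rowcount` rather than using `RETURNING`, and the function name `claim_occurrence` and the in-memory table are hypothetical.

```python
import sqlite3
from datetime import datetime, timezone

def claim_occurrence(conn, reminder_id, occurs_on):
    """Return True only for the caller that wins the claim for this occurrence."""
    now = datetime.now(timezone.utc).isoformat()
    cur = conn.execute(
        "UPDATE reminder_occurrences SET email_sent_at = ? "
        "WHERE reminder_id = ? AND occurs_on = ? "
        "AND email_sent_at IS NULL AND done_at IS NULL",  # skip if already sent or done
        (now, reminder_id, occurs_on),
    )
    conn.commit()
    return cur.rowcount == 1  # exactly one row changed means we own the send

# Hypothetical in-memory table mirroring the sketch schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE reminder_occurrences ("
    " reminder_id TEXT, occurs_on TEXT, done_at TEXT, email_sent_at TEXT,"
    " PRIMARY KEY (reminder_id, occurs_on))"
)
conn.execute(
    "INSERT INTO reminder_occurrences (reminder_id, occurs_on) VALUES ('r1', '2024-03-31')"
)
first = claim_occurrence(conn, "r1", "2024-03-31")
second = claim_occurrence(conn, "r1", "2024-03-31")
```

Only the first call wins the claim; a second worker (or a retry) sees zero changed rows and skips the send.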
Common mistakes
Using LocalDateTime instead of ZonedDateTime / OffsetDateTime.
Materializing all future occurrences up front (an open-ended recurrence never terminates).
Using a boolean email_sent column instead of a timestamp — can't tell when.
One giant service class with controller + DB + email all mixed.
No tests for the "past reminder not yet done" case (it's in the spec).
The 80/20 of this challenge: if you spend 50% on architecture/tests/README and 50% on features, you pass. Most candidates spend 95% on features. Don't be those candidates.
2
Frontend: Candidate Table (ATS view)
Medium · React / FE
If you're interviewing for frontend, expect a live-coding round building a candidate table: name, position, application date, status, email, years of experience, age.
Features they typically ask for
Render a table from given JSON.
Filter (text search on name + position).
Sort (clickable column headers, asc/desc).
Pagination or virtualized scroll for 1000+ rows.
Status updates inline (dropdown that PATCHes the server).
What graders look for
Component decomposition. Don't put filter + sort + pagination all in one component.
State location. Lift only what needs to be shared. URL params for filter/sort = bonus.
Accessibility. Semantic <table>, keyboard-navigable sort headers, ARIA labels on actions.
Error/empty states. Empty filtered result, loading skeleton, fetch error.
Tests. At least one for sort, one for filter.
3
README template (steal this)
★ Critical · Communication
The README is graded harder than the code. A junior should clone, run, and review your repo in 15 minutes.
# Reminder Service
## What this does (1 line)
A backend service that lets employees create reminders that
appear on their dashboard and optionally email them.
## Assumptions
- All times stored in UTC; displayed in employee's timezone.
- Recurrence is expanded lazily (on read) — see DECISIONS.md.
- Email delivery is at-least-once with idempotency via
`email_sent_at` claim. SMTP failures retry up to 3 times.
## What's NOT in scope
- Authentication (assumes upstream API gateway handles this).
- Multi-tenant isolation (single-tenant for this exercise).
- Push notifications (only email).
## How to run
```bash
docker-compose up postgres
./gradlew bootRun
curl localhost:8080/reminders -d '...'
```
## How to test
```bash
./gradlew test
```
## Trade-offs I made
1. Picked **lazy expansion** over materialized occurrences.
Pros: no future-occurrence storage. Cons: more complex
"next due" query.
2. Used a single `reminders` + `reminder_occurrences` schema
instead of separate `recurring_reminders`. Simpler joins,
one source of truth for done state.
3. Email loop polls every 1 minute, not event-driven. Fine for
this scale; would switch to per-reminder scheduled job at
10x scale.
## What I'd do next
- Add a `next_due_at` cached column to avoid scanning all
reminders every minute.
- Add an event bus for done/email-sent so other services
(audit log) can subscribe.
- Add retry-with-backoff for SMTP failures.
- More tests for DST edge cases.
## Tech choices
- Kotlin + Spring Boot: matches the provided scaffold, and I'm fluent with the stack.
- PostgreSQL — relational fits the recurrence/done model.
- JOOQ — type-safe SQL, no JPA magic.
Why this works: it pre-empts every question the interviewer would ask. They open the README, see assumptions, trade-offs, what you'd do next — they're already nodding before they read the code.
Coding Round 7 problems · 7 demos
Live coding via CoderPad. LeetCode-medium difficulty but framed in HR/payroll language. Data structure choice signals seniority.
1
Date range overlap (time-off conflicts)
Likely · Medium · Sweep / Sort
Problem
Given a list of time-off requests [(start, end), …] for a team, find all overlapping pairs (potential staffing conflicts).
Approach (O(n log n + k) sweep, where k is the number of overlapping pairs)
def find_overlaps(requests):
    # requests: list of (id, start, end) tuples; end dates are inclusive
    events = []
    for req_id, start, end in requests:
        events.append((start, 0, req_id))  # 0 = start
        events.append((end, 1, req_id))    # 1 = end (sorts after a same-day start)
    events.sort()
    active = set()
    overlaps = []
    for day, kind, req_id in events:
        if kind == 0:  # start: overlaps every request still active
            for other in active:
                overlaps.append((other, req_id))
            active.add(req_id)
        else:  # end
            active.discard(req_id)
    return overlaps
Follow-ups
Half-day requests. Switch to hour-precision events. Same sweep.
Approval status filter. Only count status='approved' against the limit.
Team capacity rule. "Max 2 people off at once" → reject when len(active) >= 2.
Holidays as conflicts. Add public holidays as everyone-occupied events.
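The team capacity follow-up reuses the same sweep. The sketch below assumes `(id, start, end)` tuples with inclusive end dates; the function name `first_capacity_breach` and the `max_off` parameter are hypothetical.

```python
from datetime import date

def first_capacity_breach(requests, max_off=2):
    """Return the first day on which more than `max_off` people are off, else None.

    requests: list of (id, start, end) tuples with inclusive end dates.
    """
    events = []
    for req_id, start, end in requests:
        events.append((start, 0, req_id))  # 0 = start
        events.append((end, 1, req_id))    # 1 = end, sorts after a same-day start
    events.sort()
    active = set()
    for day, kind, req_id in events:
        if kind == 0:
            active.add(req_id)
            if len(active) > max_off:
                return day  # this request would push the team over capacity
        else:
            active.discard(req_id)
    return None

team = [
    ("a", date(2024, 7, 1), date(2024, 7, 5)),
    ("b", date(2024, 7, 3), date(2024, 7, 8)),
    ("c", date(2024, 7, 4), date(2024, 7, 6)),
]
breach = first_capacity_breach(team, max_off=2)
```

In an interview, returning the breaching day (rather than a bare boolean) makes the rejection message easy to explain to the requester.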
2
Recurring event expansion (matches take-home)
★ Critical · Medium · Date arithmetic
Problem
Given a recurrence rule (frequency + interval + start date), return all occurrences within a time window. This is the core of their reminder challenge — and very likely a live-coding follow-up.
Code
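from datetime import date
from dateutil.relativedelta import relativedelta

def expand(start, freq, interval, window_end):
    if interval < 1:
        raise ValueError("interval must be >= 1")  # otherwise the loop never ends
    steps = {
        'DAILY': relativedelta(days=interval),
        'WEEKLY': relativedelta(weeks=interval),
        'MONTHLY': relativedelta(months=interval),
        'YEARLY': relativedelta(years=interval),
    }
    step = steps[freq]
    occurrences = []
    n = 0
    cur = start
    while cur <= window_end:
        occurrences.append(cur)
        n += 1
        # Offset from start, not from cur: Jan 31 monthly gives Feb 29, then Mar 31,
        # instead of drifting to the 29th forever.
        cur = start + step * n
    return occurrences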
Gotchas Personio will probe
MONTHLY edge case: start on Jan 31 → next MONTHLY occurrence is Feb 28/29 (not "Mar 3"). Use a date library that handles this (e.g., dateutil.relativedelta in Python, java.time.LocalDate.plusMonths in Java).
DST in WEEKLY/DAILY: if the recurrence has a time, the same wall-clock time can land on different UTC instants. Store wall time + ZoneId, compute UTC at expansion.
Unbounded expansion: always require a window bound. Don't return an infinite generator without a stop condition.
Memoization for read-heavy: cache materialized occurrences in a table once you've expanded them.
Library shoutout: mention RFC 5545 (iCalendar), the spec for recurrence rules. Tools like dateutil.rrule (Python) or biweekly (Java) implement it. If asked "how would you scale this?" → "I'd adopt iCalendar RRULE strings and a battle-tested expander."
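To make the MONTHLY gotcha concrete, here is a dependency-free sketch of month arithmetic with end-of-month clamping. The helper names `add_months` and `expand_monthly` are hypothetical, and the clamping policy (use the last day of a shorter month) is an assumption that mirrors the "Feb 28/29" behaviour described above.

```python
import calendar
from datetime import date

def add_months(d, months):
    """Shift `d` by `months`, clamping the day to the target month's length."""
    index = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(index, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return date(year, month0 + 1, min(d.day, last_day))

def expand_monthly(start, interval, window_end):
    """Monthly occurrences from `start` through `window_end`, inclusive."""
    if interval < 1:
        raise ValueError("interval must be >= 1")
    occurrences = []
    n = 0
    cur = start
    while cur <= window_end:
        occurrences.append(cur)
        n += 1
        # Always offset from the original start so a clamped day (Feb 29)
        # does not become the new anchor for later months.
        cur = add_months(start, n * interval)
    return occurrences

result = expand_monthly(date(2024, 1, 31), 1, date(2024, 4, 30))
```

Computing each occurrence from the original start date is the design point worth saying out loud: adding one month to the previous occurrence would turn Jan 31 into Feb 29 and then Mar 29.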
3
Org chart traversal
Likely · Medium · Tree / BFS
Problem
Each employee has a manager_id. Implement:
chain_of_command(employee) — list of managers up to CEO
all_reports(employee) — all direct + indirect reports
common_manager(a, b) — lowest common ancestor in the org tree
Code
from collections import deque

def chain_of_command(emp):
    # Managers from emp's direct manager up to the CEO
    chain = []
    cur = emp
    while cur.manager:
        cur = cur.manager
        chain.append(cur)
    return chain

def all_reports(emp):
    # BFS over direct_reports; deque gives O(1) pops from the left
    result = []
    queue = deque([emp])
    while queue:
        e = queue.popleft()
        for r in e.direct_reports:
            result.append(r)
            queue.append(r)
    return result

def common_manager(a, b):
    # LCA via set membership on a's chain
    chain_a = set(chain_of_command(a) + [a])
    cur = b
    while cur:
        if cur in chain_a:
            return cur
        cur = cur.manager
    return None  # different trees
SQL recursive CTE. For a database-backed query: WITH RECURSIVE … UNION ALL …. Mention you'd index on manager_id.
Performance for "all reports at scale." Materialized path or nested-set model. Trade-offs: write-heavy hurts these.
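The recursive CTE can be tried end to end with Python's built-in sqlite3, which supports `WITH RECURSIVE`. This is a sketch under assumptions: the `employees` table, its sample rows, and the `all_reports` helper are hypothetical, and a Postgres version would differ only in parameter syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, manager_id INTEGER)")
conn.execute("CREATE INDEX idx_employees_manager ON employees (manager_id)")  # speeds up the join below
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(1, "CEO", None), (2, "VP Eng", 1), (3, "VP Sales", 1), (4, "Engineer", 2), (5, "Intern", 4)],
)

ALL_REPORTS_SQL = """
WITH RECURSIVE reports(id, name, depth) AS (
    SELECT id, name, 1 FROM employees WHERE manager_id = :root
    UNION ALL
    SELECT e.id, e.name, r.depth + 1
    FROM employees e JOIN reports r ON e.manager_id = r.id
)
SELECT id, name, depth FROM reports ORDER BY depth, id
"""

def all_reports(conn, root_id):
    """Direct and indirect reports of `root_id`, with their depth below it."""
    return conn.execute(ALL_REPORTS_SQL, {"root": root_id}).fetchall()

vp_reports = all_reports(conn, 2)
```

The anchor member selects direct reports and the recursive member joins each level back to `employees` on `manager_id`, which is why that column is the one to index.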
4
RBAC permission resolver
Likely · Medium · Sets / Graphs
Problem
A user has roles. Roles have permissions. Roles can inherit from other roles. Implement can(user, action, resource).
Approach
from collections import deque

def effective_permissions(user, role_graph, role_perms):
    # BFS through role inheritance; role_graph maps role -> parent roles
    visited = set()
    queue = deque(user.roles)
    perms = set()
    while queue:
        role = queue.popleft()
        if role in visited:
            continue
        visited.add(role)
        perms |= role_perms.get(role, set())
        queue.extend(role_graph.get(role, []))  # parent roles
    return perms

def can(user, action, resource_type, role_graph, role_perms):
    perms = effective_permissions(user, role_graph, role_perms)
    return ((action, resource_type) in perms
            or ('*', resource_type) in perms
            or ('*', '*') in perms)
Senior-level follow-ups
Resource-level perms: "user can edit only HER team's records" → policy includes a filter, not just allow/deny.
Negative permissions: "deny overrides allow" semantics. Walk both lists.
Caching: compute effective_permissions(user) once at login; invalidate on role change.
Query-time vs check-time: for list endpoints, include the user's effective ACL set in the SQL WHERE. Don't fetch + filter.
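The "deny overrides allow" follow-up can be sketched as a small variation of the check. This assumes permissions are `(action, resource_type)` pairs with `'*'` wildcards, as in the approach above; the function name `can_with_denies` and the two-set representation are hypothetical.

```python
def can_with_denies(action, resource_type, allows, denies):
    """Deny-overrides-allow check over sets of (action, resource_type) pairs."""
    def matches(perms):
        return (
            (action, resource_type) in perms
            or ("*", resource_type) in perms
            or ("*", "*") in perms
        )
    if matches(denies):
        return False  # an explicit deny wins, even over a wildcard allow
    return matches(allows)

allows = {("*", "employee")}
denies = {("delete", "employee")}
```

Checking the deny list first keeps the rule easy to state: a deny anywhere in the inherited roles blocks the action regardless of how broad the allow is.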
5
Decimal money arithmetic (NEVER use float)
★ Critical · Easy concept · Numeric Correctness
Why this matters at Personio
Payroll. Floating-point arithmetic silently drops cents in ways that compound over thousands of payslips. Using float for money is a senior-level red flag.
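A short illustration of the problem, using only the standard library. The variable names are hypothetical; the values are the classic binary-float case rather than real payroll figures.

```python
from decimal import Decimal

float_sum = 0.1 + 0.2                           # binary floats cannot represent 0.1 or 0.2 exactly
decimal_sum = Decimal("0.1") + Decimal("0.2")   # exact decimal arithmetic

float_matches = (float_sum == 0.3)                   # False
decimal_matches = (decimal_sum == Decimal("0.3"))    # True

# Constructing a Decimal from a float inherits the float's error.
from_float_matches = (Decimal(0.1) == Decimal("0.1"))  # False
```

The last line is why the advice is to construct `Decimal` from strings: converting a float first bakes the representation error into the exact value.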
The right way
# Python
from decimal import Decimal, ROUND_HALF_EVEN, getcontext

getcontext().prec = 28
salary = Decimal('4825.50')
tax_rate = Decimal('0.215')
tax = (salary * tax_rate).quantize(Decimal('0.01'), rounding=ROUND_HALF_EVEN)
net = salary - tax

# Always:
# 1. Construct from STRING, not float. Decimal(0.1) ≠ Decimal('0.1').
# 2. Quantize at the LAST step, not intermediates.
# 3. Use Banker's rounding (ROUND_HALF_EVEN) for tax/payroll.
# 4. Store as DECIMAL(precision, scale) in DB, never FLOAT.

class Money:
    def __init__(self, amount: Decimal, currency: str):
        self.amount = amount
        self.currency = currency

    def __add__(self, other):
        assert self.currency == other.currency, "can't add EUR + USD"
        return Money(self.amount + other.amount, self.currency)

    # Multiplication only by a scalar (Decimal), never by another Money
    def __mul__(self, scalar: Decimal):
        return Money((self.amount * scalar).quantize(Decimal('0.01')),
                     self.currency)
Drop this in casually during code review: "I'd wrap money values in a Money type to make currency mismatches a compile error." Instant senior signal in HR-tech.
6
Timezone-aware scheduler (reminder edge cases)
★ Critical · Medium · Time / DST
Problem
Schedule "14:05 Europe/Berlin" reliably even during DST transitions. The take-home spec says emails are time-zone aware — they'll quiz you on this.
The right pattern
Store wall time + zone (e.g., 14:05 + Europe/Berlin), not a UTC instant. Otherwise DST changes shift the user's intended time.
Resolve to UTC at expansion time, per occurrence.
Handle the "gap" and "overlap":
Spring forward (clock jumps 2→3 AM): if reminder is 2:30, it doesn't exist that day. Pick policy: skip, or shift to 3:00.
Fall back (clock repeats 2→3 AM): if reminder is 2:30, it happens twice. Pick policy: first occurrence wins.
Use the IANA time zone database (tzdata) through your date library; never hardcode offsets. UTC offsets change (Russia and Brazil have both changed theirs).
from zoneinfo import ZoneInfo
from datetime import datetime, time, date

def resolve_to_utc(occurrence_date: date, wall_time: time, zone: str):
    local = datetime.combine(occurrence_date, wall_time, tzinfo=ZoneInfo(zone))
    return local.astimezone(ZoneInfo('UTC'))
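The gap and overlap policies can be made explicit with a round-trip check. This is a sketch under assumptions: the function name `resolve_with_policy` and the `on_gap` parameter are hypothetical, the "shift" policy here moves the time forward by the length of the gap (02:30 becomes 03:30, not 03:00), and it relies on the system's tz database being available to `zoneinfo`.

```python
from datetime import date, datetime, time, timezone
from zoneinfo import ZoneInfo

def resolve_with_policy(occurrence_date, wall_time, zone, on_gap="skip"):
    """Resolve a wall-clock time to UTC; returns None when a gap is skipped."""
    tz = ZoneInfo(zone)
    # fold=0 selects the first of two instants when a wall time repeats (fall back).
    local = datetime.combine(occurrence_date, wall_time, tzinfo=tz).replace(fold=0)
    utc = local.astimezone(timezone.utc)
    # A wall time inside a spring-forward gap does not survive a round trip.
    round_trip = utc.astimezone(tz)
    if round_trip.replace(tzinfo=None) != local.replace(tzinfo=None):
        if on_gap == "skip":
            return None
        # "shift": keep the instant computed with the pre-transition offset,
        # which lands after the gap in local time.
    return utc

normal = resolve_with_policy(date(2024, 7, 1), time(14, 5), "Europe/Berlin")
gap_skipped = resolve_with_policy(date(2024, 3, 31), time(2, 30), "Europe/Berlin")
gap_shifted = resolve_with_policy(date(2024, 3, 31), time(2, 30), "Europe/Berlin", on_gap="shift")
overlap_first = resolve_with_policy(date(2024, 10, 27), time(2, 30), "Europe/Berlin")
```

Whichever policy is chosen, the point to make in the interview is that it is a product decision that belongs in the README, not an accident of the date library.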
7
Approval workflow state machine
Medium · State Machine
Problem
Time-off request flows: draft → submitted → manager_approved → hr_approved → done. Plus side-states: rejected, cancelled. Implement state transitions with validation.
Explicit transition table beats giant if/else. Easy to audit, easy to render visually for HR users.
Audit log per transition — for GDPR / compliance.
Idempotency: if "manager_approve" is called twice, second call is a no-op (same final state) — not an error.
Authorization at transition: manager_approve requires actor ∈ {manager_of(req.employee)}.
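The points above can be sketched as an explicit transition table. The action names other than `manager_approve` (for example `submit`, `complete`, `cancel`) and the exact set of allowed edges are assumptions for illustration, as are `apply_action` and `InvalidTransition`; authorization checks are left out.

```python
TRANSITIONS = {
    ("draft", "submit"): "submitted",
    ("submitted", "manager_approve"): "manager_approved",
    ("manager_approved", "hr_approve"): "hr_approved",
    ("hr_approved", "complete"): "done",
    ("submitted", "reject"): "rejected",
    ("manager_approved", "reject"): "rejected",
    ("draft", "cancel"): "cancelled",
    ("submitted", "cancel"): "cancelled",
    ("manager_approved", "cancel"): "cancelled",
    ("hr_approved", "cancel"): "cancelled",
}

class InvalidTransition(Exception):
    pass

def apply_action(state, action, audit_log):
    """Return the new state; append one audit entry per real transition."""
    target = TRANSITIONS.get((state, action))
    if target is None:
        # Idempotent replay: if this action is what produced the current state,
        # treat the repeat as a no-op instead of an error.
        if action in {a for (_, a), t in TRANSITIONS.items() if t == state}:
            return state
        raise InvalidTransition(f"{action!r} not allowed from {state!r}")
    audit_log.append((state, action, target))
    return target

log = []
state = apply_action("draft", "submit", log)
state = apply_action(state, "manager_approve", log)
state_after_repeat = apply_action(state, "manager_approve", log)  # second call is a no-op
```

Because the table is plain data, the same dictionary can drive validation, the audit trail, and a rendered diagram for HR users.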
System Design 6 problems
Personio system design is not FAANG-scale. They want correctness + compliance + multi-tenancy. Lead with constraints (GDPR, tenant isolation, payroll correctness), not topology.
The 7-step framework, always: Clarify → Data model → APIs/flows → Storage → Scale → Reliability → Security. Verbalize it. Score 2× for leading with compliance + multi-tenancy before drawing boxes.
1
Payroll run engine ⭐⭐ (THE canonical question)
★ Critical · Hard · Workflow / Correctness
Prompt
Design a system to run monthly payroll for 10,000 European companies. It must be: idempotent, auditable, tolerant of late data, and accurate to the cent.
Clarifying questions (always ask first)
What's the time SLA — same day? Within an hour?
How are calculation rules versioned? (Tax law changes mid-year.)
Can a payroll be re-run? Edited after-the-fact? Locked?
Multiple currencies?
Are we generating PDFs (payslips) or just numbers?
Architecture
┌─────────────────┐
│ Pre-flight check│ validates all employees have required data,
│ (idempotent) │ no missing tax IDs, etc. → emits PreflightOK event
└────────┬────────┘
▼
┌──────────────────────────────────────────────────┐
│ Payroll Run Orchestrator │
│ - generates run_id (UUID, idempotency anchor) │
│ - snapshots calc_rules_version + employee_data │
│ - emits PayrollRunStarted (event log) │
└────────┬─────────────────────────────────────────┘
│ partitions by tenant + employee batch
▼
┌──────────────────────────────────────────────────┐
│ Calc Worker Pool (per partition, stateless) │
│ - reads frozen snapshot │
│ - applies versioned rules │
│ - writes results (idempotent on (run_id, emp_id))│
└────────┬─────────────────────────────────────────┘
▼
┌──────────────────────────────────────────────────┐
│ Results store (immutable, append-only) │
│ - never overwritten; corrections = new run │
└────────┬─────────────────────────────────────────┘
▼
┌──────────────────────────────────────────────────┐
│ PDF generator + audit log + payouts integration │
└──────────────────────────────────────────────────┘
Key design decisions (verbalize these)
Idempotency anchor: run_id. Same run re-executed = same output (or skip if already done).
Snapshot at start. Freeze employee data + calc rules version. Late edits don't corrupt the in-flight run.
Immutable results. Errors → new "corrective run" referencing the old. Never UPDATE a payroll record. Auditors thank you.
Decimal arithmetic everywhere. Postgres NUMERIC(15,4); Kotlin BigDecimal. Document it.
Versioned rules. Tax brackets stored as tax_rules(country, valid_from, valid_to, rule_json). Calculator picks by date.
Late data: if a salary change arrives after the run, you have two choices: corrective run (preferred), or block until cutoff. Discuss the trade-off.
Per-tenant partitioning. No cross-tenant rows. Each calc worker only ever sees one tenant at a time.
Replayable: from the event log + snapshots, you can reproduce any run bit-for-bit, months later, for audit.
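The versioned-rules decision can be sketched as a date-based lookup. Everything here is illustrative: `TaxRule`, `rule_for`, the single `rate` field standing in for `rule_json`, and the rates themselves are made up, not real tax values.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal
from typing import Optional

@dataclass(frozen=True)
class TaxRule:
    country: str
    valid_from: date
    valid_to: Optional[date]  # None means open-ended
    rate: Decimal             # stand-in for rule_json

def rule_for(rules, country, on):
    """Pick the single rule valid for `country` on date `on` (bounds inclusive)."""
    matches = [
        r for r in rules
        if r.country == country
        and r.valid_from <= on
        and (r.valid_to is None or on <= r.valid_to)
    ]
    if len(matches) != 1:
        # Zero or overlapping rules is a data error; fail loudly rather than guess.
        raise LookupError(f"expected one rule for {country} on {on}, found {len(matches)}")
    return matches[0]

# Made-up rates for illustration only.
RULES = [
    TaxRule("DE", date(2023, 1, 1), date(2023, 12, 31), Decimal("0.20")),
    TaxRule("DE", date(2024, 1, 1), None, Decimal("0.21")),
]
december_rule = rule_for(RULES, "DE", date(2023, 12, 31))
january_rule = rule_for(RULES, "DE", date(2024, 1, 1))
```

Raising on zero or multiple matches is a deliberate choice in this sketch: a payroll run should stop on ambiguous rule data instead of silently picking one.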
Failure modes to discuss
Worker dies mid-employee: partial writes are OK because results table has (run_id, emp_id) as primary key. Next worker retries.
Calc rule changes mid-run: snapshot prevents this. Run uses the version captured at start.
Bank payout fails: separate phase from calculation. Calculation succeeds independently; payout is retried via DLQ.
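The worker-retry case can be sketched with a primary key on (run_id, emp_id). This uses Python's built-in sqlite3 and `INSERT OR IGNORE` as a stand-in for Postgres `ON CONFLICT DO NOTHING`; the table name `payroll_results` and the helper `write_result` are hypothetical.

```python
import sqlite3
from decimal import Decimal

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE payroll_results ("
    " run_id TEXT NOT NULL, emp_id TEXT NOT NULL, net_pay TEXT NOT NULL,"
    " PRIMARY KEY (run_id, emp_id))"
)

def write_result(conn, run_id, emp_id, net_pay):
    """Insert once per (run_id, emp_id); a retry after a crash is a no-op."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO payroll_results (run_id, emp_id, net_pay) VALUES (?, ?, ?)",
        (run_id, emp_id, str(net_pay)),  # store the decimal as text, never as a float
    )
    conn.commit()
    return cur.rowcount == 1  # True only when this call actually wrote the row

first_write = write_result(conn, "run-1", "emp-42", Decimal("3788.02"))
retry_write = write_result(conn, "run-1", "emp-42", Decimal("3788.02"))
row_count = conn.execute("SELECT COUNT(*) FROM payroll_results").fetchone()[0]
```

A replacement worker can therefore replay the whole batch without checking what the dead worker already wrote.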
Don't get cute. They want PostgreSQL + Kafka + workers, not Cassandra + Spark + a service mesh. Boring + auditable wins.
2
Multi-tenant employee directory with RBAC
Likely · Hard · Multi-tenant
Prompt
Design an employee directory: 10k tenants × ~500 employees each. Sub-100ms reads. Permissions vary per user.
Multi-tenancy model: pick one and defend
Model | Isolation | Cost | Use case
Pool (shared DB, tenant_id col) | Row-level via Postgres RLS | Low | Most Personio data; the default choice.
Bridge (schema per tenant) | Schema-level | Medium | Tenants with heavy customization.
Silo (DB per tenant) | Hardest | High | Enterprise tier; regulated; large tenants.
Schema highlights
CREATE TABLE employees (
  tenant_id UUID NOT NULL,
  employee_id UUID NOT NULL,
  email TEXT,
  full_name TEXT,
  manager_id UUID,
  data_region VARCHAR(8) NOT NULL, -- 'EU', 'UK', etc.
  PRIMARY KEY (tenant_id, employee_id)
);
CREATE INDEX ON employees (tenant_id, manager_id); -- org tree
-- Postgres Row-Level Security: every query auto-filters by tenant
ALTER TABLE employees ENABLE ROW LEVEL SECURITY; -- policies are ignored until RLS is enabled
CREATE POLICY tenant_isolation ON employees
  USING (tenant_id = current_setting('app.current_tenant')::uuid);
What to discuss
RLS as a safety net, not the primary defense. App still filters explicitly.
Tenant-aware cache keys. Never key by just employee_id; always (tenant_id, employee_id).
Noisy neighbor: per-tenant rate limits + connection pool quotas.
Data residency: EU customers must have data in EU. Either route by region (region-specific deployments) or shard with explicit data_region col + policy.
RBAC at query time: include user's effective permission set in the WHERE, not post-fetch filter.
3
GDPR right-to-erasure flow
★ Critical · Hard · Compliance
Prompt
An employee leaves a company; ~7 years later they invoke their right to be forgotten. Implement.
The decision tree (have this memorized)
Approach | Reversibility | GDPR-compliant | Use for
Soft delete (deleted_at flag) | Reversible | ❌ alone (data still exists) | Internal undo. Not GDPR erase.
Anonymize (PII → tombstone) | One-way | ✓ | Most fields. Preserves referential integrity for audit/payroll history.
Hard delete | Irreversible | ✓ | Only when legal hold doesn't apply. Risk: breaks referential integrity in payroll/audit tables.
Architecture
User requests erasure ──▶ Erasure Request Service
│
├─▶ Verifies identity, logs request
▼
Erasure Orchestrator
│ enumerates all subject-data locations
▼ (catalog-driven)
┌─────────────────┼──────────────────┬────────────────┐
▼ ▼ ▼ ▼
Postgres employees S3 documents Kafka event log Backup snapshots
anonymize PII delete + tomb redact + reindex schedule erase
Audit log writes: "erased at 2026-05-20, by request #X, hash of data: H"
(audit log itself is exempt — Art. 17(3))
What to nail
Anonymize, don't hard-delete — preserves referential integrity (payroll history needs employee_id) while removing PII.
Replace PII fields with deterministic hashes or "DELETED USER" + a tombstone ID. Salaries/dates kept for audit.
Catalog of subject data. You can only erase what you know about. Have a registry of every table/column containing PII.
Audit log itself is exempt. GDPR Art. 17(3) — log the erasure event, including a non-reversible hash for proof.
Backups. Schedule erasure for next backup-rotation cycle (you can't reach into a 30-day-old snapshot easily).
SLAs. GDPR requires action "without undue delay"; Art. 12(3) caps the response at one month from the request, extendable by two further months for complex cases.
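The anonymize-plus-audit step can be sketched for a single record. The field catalog `PII_FIELDS`, the `anonymize` helper, and the audit entry shape are hypothetical; a real system would drive the field list from the subject-data catalog rather than a hardcoded set.

```python
import hashlib
import json

PII_FIELDS = {"full_name", "email"}  # hypothetical PII catalog entry for one table

def anonymize(record, request_id):
    """Return (anonymized_record, audit_entry) without mutating the input."""
    erased = {k: record[k] for k in sorted(record.keys() & PII_FIELDS)}
    # One-way hash of the erased values, kept only as proof that erasure happened.
    proof = hashlib.sha256(json.dumps(erased, sort_keys=True).encode("utf-8")).hexdigest()
    cleaned = dict(record)
    for field in erased:
        cleaned[field] = "DELETED USER"
    audit_entry = {
        "request_id": request_id,
        "employee_id": record["employee_id"],  # kept so payroll history still joins
        "erased_data_hash": proof,
    }
    return cleaned, audit_entry

original = {"employee_id": "e-1", "full_name": "Ada Example", "email": "ada@example.com", "salary": "4825.50"}
cleaned, audit = anonymize(original, request_id="req-7")
```

The non-PII columns (here `salary` and `employee_id`) survive untouched, which is what keeps payroll and audit history referentially intact.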
4
Document generator (payslips at scale)
Medium · Async / PDF
Prompt
Generate 500k payslips / contracts a month. PDFs go to S3 with signed URLs. Each is templated per tenant.
Pipeline
Payroll run completes
│
▼
DocumentRequested event (Kafka, partitioned by tenant_id)
│
▼
Renderer worker pool (stateless, autoscaled)
- loads tenant template (cached, versioned)
- merges data, renders PDF (headless Chrome / wkhtmltopdf / iText)
- writes to S3: s3://docs/{tenant}/{run}/{employee}.pdf
- emits DocumentReady event
│
▼
Notification service → email link / dashboard badge
(Signed S3 URLs, 7d expiry, RBAC at download endpoint)
Things to bring up
Template versioning: document references which template version was used (for auditability). Old payslips re-render identically.
Idempotency: dedup key (run_id, employee_id, doc_type). Re-emitted event = same S3 key = no double work.
Renderer cost: headless Chrome is slow. Pool warm browsers, batch by tenant template.
Localization: templates per (tenant, locale, doc_type). Match employee preferred language.
Retention: contracts vs payslips have different legal retention periods (7y vs 10y in DE). Lifecycle policies on S3 buckets.
5
Time-off approval workflow service
Medium · Workflow
Prompt
Employee submits time off → manager approves → HR approves → calendar synced → payroll informed → email confirmation. Design.
State machine, not event sourcing for this. The state space is small; auditors want to query "current state of X."
Side effects after commit. Persist new state THEN emit events. Otherwise you can email a confirmation that doesn't reflect reality.
Conflict detection at submit time. See the date-overlap problem above.
Cancellation: always allowed from any non-final state. Cascades to downstream (un-sync calendar, reverse payroll adjustment).
Accrual math: separate read model. Source of truth = "approved time-off entries" + "policy."
6
Reminder service at scale (extend the take-home)
Medium · Scheduling
Why this might come up
If they liked your take-home, the follow-up is "now make it serve 10M users across 4 EU timezones." Tests your scaling intuition without throwing away your domain model.
Scaling moves
Cache next_due_at. Don't scan all reminders every minute. Each reminder has a cached "next occurrence" → indexed.
Dispatch via Redis ZSET (sorted set scored by UTC fire timestamp). ZRANGEBYSCORE for due tasks. Sub-millisecond.
Lease-based worker claims for at-least-once delivery (see monday.com worker-pool problem if interested).
Partition by tenant_id — one slow tenant doesn't starve others.
Idempotency key per occurrence: (reminder_id, occurs_on), so SMTP retries don't double-send.
Email provider with bulk APIs (SES, Sendgrid) batched.
Decimal not float. Postgres NUMERIC, Java BigDecimal, Python Decimal. Money type wrapper.
Versioned rules. Tax brackets stored with valid_from/valid_to. Calculator picks by employee's payroll period.
Idempotency: (run_id, employee_id) as PK on results.
Snapshot inputs. Freeze employee data + rules at run start. Late edits → new corrective run.
Immutable results. No UPDATEs. Corrections = new rows referencing the old.
Replay safety. From event log + snapshots, reproduce any run bit-for-bit later.
Rounding: Banker's rounding (ROUND_HALF_EVEN) for tax. Quantize at the LAST step.
Currency conversion: rate captured at payroll-period close, stored per result.
2
Multi-tenancy patterns
Pool (shared schema, tenant_id col) — Personio's default. Lowest cost. Use Postgres RLS as safety net.
Bridge (schema per tenant) — when tenants need custom fields. Migration becomes a nightmare with 10k schemas.
Silo (DB per tenant) — only for enterprise/regulated. Highest cost, simplest backup/restore per tenant.
Tenant-aware caches: never key by entity ID alone — always (tenant_id, entity_id).
Noisy neighbor: per-tenant rate limits, connection pool quotas, query timeouts.
Cross-tenant analytics: ETL to separate warehouse. Never JOIN across tenants in OLTP.
3
Audit log design
Append-only. No UPDATEs, no DELETEs. Use partitioning by month for hot/cold tiering.
Async write path. Kafka → ClickHouse or BigQuery. Never block user writes on audit insert.
Tamper evidence: hash chain — each entry includes hash of previous + current payload.
What to log: who, when, action, resource, before-state, after-state. PII is OK in audit logs (exempt from erasure under Art. 17(3) if you have legal basis).
Retention by category: payroll-adjacent logs 7–10y; access logs typically 1y; security events sometimes longer.
Querying: "show all changes to employee X" — index on (tenant_id, resource_type, resource_id, ts DESC).
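The tamper-evidence idea can be sketched as an in-memory hash chain. The entry layout, the all-zero genesis value, and the helpers `append_entry` and `verify` are assumptions for illustration; a production log would persist entries and anchor the latest hash somewhere external.

```python
import hashlib
import json

GENESIS = "0" * 64  # hypothetical starting value for the first entry's prev_hash

def _entry_hash(prev_hash, payload):
    body = json.dumps(payload, sort_keys=True)  # canonical serialization
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()

def append_entry(log, payload):
    """Append a payload whose hash covers the previous entry's hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"prev_hash": prev_hash, "payload": payload,
                "hash": _entry_hash(prev_hash, payload)})

def verify(log):
    """Recompute the chain; any edited, removed, or reordered entry breaks it."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != _entry_hash(prev_hash, entry["payload"]):
            return False
        prev_hash = entry["hash"]
    return True

audit_log = []
append_entry(audit_log, {"who": "u1", "action": "update", "resource": "employee/42"})
append_entry(audit_log, {"who": "u2", "action": "delete", "resource": "document/7"})
intact = verify(audit_log)
audit_log[0]["payload"]["action"] = "read"  # simulate tampering with history
tampered_ok = verify(audit_log)
```

Because each hash depends on its predecessor, changing one old entry invalidates that entry and every check after it.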