1418 tagged practice problems for data engineering interviews. SQL, Python, schema design, pipeline architecture. Each problem links to a runnable browser sandbox.
SQL · Python · Schema design · Pipeline architecture · Companion repos
| Section | Count | Browse |
|---|---|---|
| SQL | 854 | datadriven.io/sql-interview-questions |
| Python | 388 | datadriven.io/python-interview-questions |
| Schema design | 56 | datadriven.io/data-modeling-interview-questions |
| Pipeline architecture | 120 | datadriven.io/data-pipeline-interview-questions |
| Total | 1418 | |
Every problem runs in a browser sandbox with the schema preloaded. No local setup. Each question is tagged with difficulty, what it tests, and the common trap.
Topics: joins, aggregation, window functions, filtering, dates, conditional aggregation, CTEs, performance reasoning. Topic browser at datadriven.io/sql-interview-questions.
| Problem | Difficulty | Tests | Trap |
|---|---|---|---|
| 10 Lowest Uptime Services | Easy | TOP N with ties | LIMIT 10 drops tied rows |
| 2FA Confirmation Rate | Easy | Conditional aggregation | Divide by zero |
| 2nd Most Common Content Type | Easy | Tie breaking | LIMIT 1 OFFSET 1 ignores ties |
| 30 Day Page View Counts | Easy | Date filtering | Timezone boundaries |
| 7 Day Onboarding Conversion | Medium | Funnel analysis | Anchoring on the wrong event |
| 7 Check Rolling Average | Medium | Rolling window | ROWS vs RANGE when days are missing |
| Active Users by Month | Hard | Cohort logic | Double counting users active in multiple months |
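Two of the traps in the table, sketched in Python with the standard library `sqlite3` module (window functions need SQLite 3.25 or later). The `services` and `twofa_events` tables and their rows are hypothetical, invented for illustration, and are not the sandbox schemas. `RANK()` keeps every row tied at the cutoff, where `LIMIT` cuts through the tie arbitrarily. `NULLIF` guards the denominator of a conditional aggregation.

```python
import sqlite3

# Hypothetical tables and rows, invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE services (name TEXT, uptime REAL);
INSERT INTO services VALUES
  ('a', 95.0), ('b', 97.5), ('c', 97.5), ('d', 99.9);

CREATE TABLE twofa_events (day TEXT, event TEXT);
INSERT INTO twofa_events VALUES
  ('2024-01-01', 'sent'), ('2024-01-01', 'sent'),
  ('2024-01-01', 'confirmed'), ('2024-01-02', 'confirmed');
""")

# Trap 1: LIMIT 2 returns exactly two rows, so one of the tied
# services ('b' or 'c') is silently dropped.
naive_top2 = conn.execute(
    "SELECT name FROM services ORDER BY uptime LIMIT 2"
).fetchall()

# Fix: rank first, then filter on the rank so every tied row survives.
top2_with_ties = conn.execute("""
    SELECT name FROM (
        SELECT name, RANK() OVER (ORDER BY uptime) AS rnk
        FROM services
    )
    WHERE rnk <= 2
    ORDER BY name
""").fetchall()

# Trap 2: a day with confirmations but no sends has a zero denominator.
# NULLIF turns that 0 into NULL, so the rate is NULL instead of an error
# (SQLite already returns NULL for x / 0; PostgreSQL raises an error).
# Multiplying by 1.0 avoids integer division.
rates = conn.execute("""
    SELECT day,
           1.0 * SUM(CASE WHEN event = 'confirmed' THEN 1 ELSE 0 END)
               / NULLIF(SUM(CASE WHEN event = 'sent' THEN 1 ELSE 0 END), 0)
               AS confirmation_rate
    FROM twofa_events
    GROUP BY day
    ORDER BY day
""").fetchall()

print(len(naive_top2))   # 2
print(top2_with_ties)    # [('a',), ('b',), ('c',)]
print(rates)             # [('2024-01-01', 0.5), ('2024-01-02', None)]
```

Use `DENSE_RANK()` instead of `RANK()` when the question asks for the top N distinct values rather than the top N rows.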
Window functions appear in most senior DE SQL screens. Timed practice at datadriven.io/sql-window-functions-practice.
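A minimal sketch of the ROWS vs RANGE trap from the rolling average row, again using Python's `sqlite3` with a hypothetical `daily_views` table. RANGE with a numeric offset needs SQLite 3.28 or later. When days are missing, a ROWS frame of 6 preceding rows reaches back past the 7 day window; a RANGE frame over the day number does not.

```python
import sqlite3

# Hypothetical daily page views with a gap between Jan 2 and Jan 10.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE daily_views (day TEXT, views INTEGER);
INSERT INTO daily_views VALUES
  ('2024-01-01', 10), ('2024-01-02', 20), ('2024-01-10', 30);
""")

result = conn.execute("""
    SELECT day,
           -- ROWS counts physical rows: the 6 previous rows, however old.
           SUM(views) OVER (
               ORDER BY julianday(day)
               ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
           ) AS rows_sum,
           -- RANGE uses the ordering key: only days within 6 days of this one.
           SUM(views) OVER (
               ORDER BY julianday(day)
               RANGE BETWEEN 6 PRECEDING AND CURRENT ROW
           ) AS range_sum
    FROM daily_views
    ORDER BY day
""").fetchall()

for day, rows_sum, range_sum in result:
    # Dividing by 7.0 treats missing days as zero views; whether that is
    # correct depends on how the question defines the average.
    print(day, rows_sum, range_sum, round(range_sum / 7.0, 2))
```

For the Jan 10 row, `rows_sum` is 60 because Jan 1 and Jan 2 leak into the frame, while `range_sum` is 30.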
DE Python questions test data manipulation, not LeetCode puzzles. Common patterns: chunking, sessionization, hash partitioning, interval merging, dedup with tie breaking, streaming aggregation, retries with backoff, schema evolution. Browse at datadriven.io/python-interview-questions.
| Problem | Difficulty | Pattern |
|---|---|---|
| Batch Records | Easy | Chunking iterables |
| Column Sum | Easy | Dict aggregation |
| Activity Time Ledger | Medium | Interval merging |
| Batch Partitioner | Medium | Hash bucketing |
| Batch With Metadata | Medium | Stateful iteration |
| Caesar Shift Check | Hard | String transforms |
| Character Occurrence Map | Hard | Counting tradeoffs |
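Two of the patterns above as plain Python: chunking an iterable without loading it all into memory (Batch Records) and merging intervals (Activity Time Ledger). The function names are hypothetical, not the sandbox signatures, and the sketch assumes touching intervals such as (5, 8) and (8, 10) should merge; some problem statements treat them as separate.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple, TypeVar

T = TypeVar("T")


def batch_records(records: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield lists of at most `size` items, pulling lazily from the input."""
    if size < 1:
        raise ValueError("size must be >= 1")
    it = iter(records)
    while True:
        chunk = list(islice(it, size))  # take up to `size` items
        if not chunk:
            return  # input exhausted
        yield chunk


def merge_intervals(intervals: Iterable[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Merge overlapping or touching (start, end) intervals."""
    merged: List[Tuple[int, int]] = []
    for start, end in sorted(intervals):  # sort by start, then end
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous interval: extend it.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


print(list(batch_records(range(7), 3)))                     # [[0, 1, 2], [3, 4, 5], [6]]
print(merge_intervals([(5, 8), (1, 3), (2, 4), (8, 10)]))  # [(1, 4), (5, 10)]
```

On Python 3.12 and later, `itertools.batched(records, size)` yields the same chunks as tuples.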
Senior loops are won here. Interviewers reward picking the right grain for fact tables, defending the choice of SCD type, and validating the schema with sample queries. Browse at datadriven.io/data-modeling-interview-questions.
| Problem | Tests |
|---|---|
| A/B Experiment Assignment Schema | SCD type 2, sticky bucketing |
| Customer Address History | Effective dates, history preservation |
| Insurance Claims Lifecycle | State machine modeling |
| Clickstream and Session Schema | Sessionization, late events |
| E Commerce Supply Chain Tracking | Multi entity tracking |
| Loan Management Schema | Bridge tables, party roles |
| Cloud File Storage Metadata Schema | Recursive hierarchies |
| Financial Trading Warehouse | Time series, late arriving facts |
| Content Engagement Data Model | Fact table grain |
| B2B Invoicing Data Model | Many to many with attributes |
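A minimal in-memory sketch of SCD type 2 in the spirit of Customer Address History: a change closes the current version and appends a new one, and a point-in-time lookup plays the role of the validating sample query. The `AddressVersion` record, the function names, and the half-open `[valid_from, valid_to)` convention are assumptions for illustration; a warehouse would do the same with an UPDATE plus INSERT or a MERGE.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class AddressVersion:
    customer_id: int
    address: str
    valid_from: date             # inclusive
    valid_to: Optional[date]     # exclusive; None marks the current version


def apply_scd2(history: List[AddressVersion], customer_id: int,
               new_address: str, as_of: date) -> List[AddressVersion]:
    """Return a new history with the change recorded as an SCD type 2 version."""
    out: List[AddressVersion] = []
    needs_new_version = True
    for row in history:
        if row.customer_id == customer_id and row.valid_to is None:
            if row.address == new_address:
                needs_new_version = False   # unchanged attribute: no new version
                out.append(row)
            else:
                out.append(replace(row, valid_to=as_of))  # close the old version
        else:
            out.append(row)
    if needs_new_version:
        out.append(AddressVersion(customer_id, new_address, as_of, None))
    return out


def address_as_of(history: List[AddressVersion], customer_id: int,
                  on: date) -> Optional[str]:
    """Point-in-time lookup over the version history."""
    for row in history:
        if (row.customer_id == customer_id and row.valid_from <= on
                and (row.valid_to is None or on < row.valid_to)):
            return row.address
    return None


h = apply_scd2([], 1, "12 Oak St", date(2024, 1, 1))
h = apply_scd2(h, 1, "9 Elm Ave", date(2024, 6, 1))
print(address_as_of(h, 1, date(2024, 3, 15)))  # 12 Oak St
print(address_as_of(h, 1, date(2024, 6, 1)))   # 9 Elm Ave
```

The half-open convention means a lookup on the change date itself returns exactly one version, which is the kind of edge a sample query should check.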
End to end design questions. Use the eight beat framework on every one. Browse at datadriven.io/data-pipeline-interview-questions.
| Case study | Focus |
|---|---|
| Card Transaction Streaming Pipeline | Real time, exactly once |
| Cellular Connectivity and App Log Data Warehouse | High cardinality |
| AWS Pipeline Auto Scaling for Variable Volume | Cost optimization |
| Connected Vehicle Telemetry Pipeline | High volume IoT |
| Capital Markets Intraday Risk Pipeline | Regulatory lineage |
| Database Replication and Schema Normalization Pipeline | CDC |
| Cost Optimized Clickstream Data Lake | Storage tradeoffs |
| Databricks Pipeline with Spark Performance Optimization | Spark internals |
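The card transaction case lists exactly once. One common approach is to accept at least once delivery and make the sink idempotent, ignoring redelivered events by a unique key so replays do not double count. The sketch below is a toy with hypothetical names (`IdempotentBalanceSink`, transaction tuples); in a real system the seen key check and the balance update must commit atomically, which this in-memory version does not model.

```python
from typing import Dict, Iterable, Set, Tuple


class IdempotentBalanceSink:
    """Hypothetical in-memory sink that applies each transaction_id once."""

    def __init__(self) -> None:
        self.seen: Set[str] = set()
        self.balances: Dict[str, int] = {}

    def apply(self, events: Iterable[Tuple[str, str, int]]) -> None:
        for txn_id, card_id, amount_cents in events:
            if txn_id in self.seen:
                continue  # redelivered event: skip so totals are not double counted
            self.seen.add(txn_id)
            self.balances[card_id] = self.balances.get(card_id, 0) + amount_cents


sink = IdempotentBalanceSink()
batch = [("t1", "card-A", 500), ("t2", "card-A", 250), ("t3", "card-B", 100)]
sink.apply(batch)
sink.apply(batch)  # simulated redelivery after a consumer restart
print(sink.balances)  # {'card-A': 750, 'card-B': 100}
```

The seen set grows without bound here; a production design would bound it, for example by expiring keys older than the maximum redelivery window.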
Target about 100 medium and 25 hard problems, distributed across the four sections. Past that, returns diminish. Below that, gaps remain.
- data-engineering-interview-handbook. The flagship handbook with chapter by chapter coverage.
- data-engineer-interview-handbook. 7 day sprint version.
- awesome-data-engineering-interviews. The DataDriven 75 focused subset.
- awesome-data-engineering-interview. Curated resource list.
- system-design-for-data-engineers. 120 long form pipeline case studies.
- data-engineer-interview-prep. 8 week structured practice schedule.
- data-engineering-cheatsheet. One page recall reference.
To contribute, open an issue with the question text, schema, expected output, what it tests, and the common trap. Submissions are reviewed and added with attribution.
Licensed under CC BY-SA 4.0. Sandboxes are hosted at datadriven.io.