Data Engineering Interview Questions

1418 tagged practice problems for data engineering interviews, covering SQL, Python, schema design, and pipeline architecture. Each problem links to a runnable browser sandbox.

SQL · Python · Schema design · Pipeline architecture · Companion repos

Section	Count	Browse
SQL	854	datadriven.io/sql-interview-questions
Python	388	datadriven.io/python-interview-questions
Schema design	56	datadriven.io/data-modeling-interview-questions
Pipeline architecture	120	datadriven.io/data-pipeline-interview-questions
Total	1418

Every problem runs in a browser sandbox with the schema preloaded. No local setup. Each question is tagged with difficulty, what it tests, and the common trap.

SQL (854 problems)

Topics: joins, aggregation, window functions, filtering, dates, conditional aggregation, CTEs, performance reasoning. Topic browser at datadriven.io/sql-interview-questions.

Top problems to know cold

Problem	Difficulty	Tests	Trap
10 Lowest Uptime Services	Easy	TOP N with ties	`LIMIT 10` drops tied rows
2FA Confirmation Rate	Easy	Conditional aggregation	Divide by zero
2nd Most Common Content Type	Easy	Tie breaking	`LIMIT 1 OFFSET 1` ignores ties
30 Day Page View Counts	Easy	Date filtering	Timezone boundaries
7 Day Onboarding Conversion	Medium	Funnel analysis	Anchoring on the wrong event
7 Check Rolling Average	Medium	Rolling window	`ROWS` vs `RANGE` when days are missing
Active Users by Month	Hard	Cohort logic	Double counting users active in multiple months
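
The tie traps in the table come down to one thing: `LIMIT` cuts at a row count, not at a value, so tied rows are dropped arbitrarily. Below is a minimal sketch using Python's built-in `sqlite3` module (window functions need SQLite 3.25 or newer). The `services` table and its values are made up for illustration and are not taken from the problem set.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE services (name TEXT, uptime REAL)")
conn.executemany(
    "INSERT INTO services VALUES (?, ?)",
    [("a", 90.0), ("b", 91.0), ("c", 91.0), ("d", 99.0)],
)

# Naive "2 lowest": LIMIT stops after two rows, so one of the tied
# services (b or c) is silently dropped.
naive = conn.execute(
    "SELECT name FROM services ORDER BY uptime LIMIT 2"
).fetchall()

# Tie-aware: rank by uptime and keep every row whose rank is within N.
with_ties = conn.execute(
    """
    SELECT name
    FROM (
        SELECT name, RANK() OVER (ORDER BY uptime) AS rnk
        FROM services
    )
    WHERE rnk <= 2
    ORDER BY name
    """
).fetchall()

print(len(naive))                     # 2
print([row[0] for row in with_ties])  # ['a', 'b', 'c']
```

Whether `RANK` or `DENSE_RANK` is the right choice depends on how the question defines "top N", so it is worth asking the interviewer before writing the query.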

Window functions drill

Window functions appear in most senior DE SQL screens. Timed practice at datadriven.io/sql-window-functions-practice.
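
One frame trap worth drilling is `ROWS` versus `RANGE` when days are missing. The sketch below again uses `sqlite3` (`RANGE` with a numeric offset needs SQLite 3.28 or newer). The `daily` table is hypothetical, and days are stored as plain integers to keep the example engine neutral; with real date columns the offset syntax varies by database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily (day INTEGER, value REAL)")
# Days 3 and 4 are missing on purpose.
conn.executemany(
    "INSERT INTO daily VALUES (?, ?)",
    [(1, 10.0), (2, 20.0), (5, 30.0)],
)

rows = conn.execute(
    """
    SELECT
        day,
        -- ROWS counts physical rows: "the previous two rows", whatever their day.
        AVG(value) OVER (
            ORDER BY day ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
        ) AS avg_rows,
        -- RANGE counts by value: "days within 2 of this one".
        AVG(value) OVER (
            ORDER BY day RANGE BETWEEN 2 PRECEDING AND CURRENT ROW
        ) AS avg_range
    FROM daily
    ORDER BY day
    """
).fetchall()

for day, avg_rows, avg_range in rows:
    print(day, avg_rows, avg_range)
# 1 10.0 10.0
# 2 15.0 15.0
# 5 20.0 30.0
```

On day 5 the `ROWS` frame reaches back to days 1 and 2 even though they fall outside a three-day window, while the `RANGE` frame sees only day 5.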

Python (388 problems)

DE Python is data manipulation, not LeetCode. Common patterns: chunking, sessionization, hash partitioning, interval merging, dedup with tie breaking, streaming aggregation, retries with backoff, schema evolution. Browse at datadriven.io/python-interview-questions.
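
As one illustration of these patterns, here is a minimal sessionization sketch. It assumes events arrive as integer timestamps in seconds and that a session ends after 30 minutes of inactivity; the function name and the default gap are illustrative choices, not part of any specific problem.

```python
from typing import Iterable


def sessionize(timestamps: Iterable[int], gap: int = 1800) -> list[list[int]]:
    """Group event timestamps (seconds) into sessions split by inactivity gaps."""
    sessions: list[list[int]] = []
    for ts in sorted(timestamps):
        # Extend the current session while the gap since its last event
        # is within the threshold; otherwise start a new session.
        if sessions and ts - sessions[-1][-1] <= gap:
            sessions[-1].append(ts)
        else:
            sessions.append([ts])
    return sessions


print(sessionize([0, 600, 5000, 5100, 9000]))
# [[0, 600], [5000, 5100], [9000]]
```

Sorting first makes the function tolerant of out-of-order input, at the cost of holding all events in memory; a streaming variant would need a stated bound on lateness.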

Top problems

Problem	Difficulty	Pattern
Batch Records	Easy	Chunking iterables
Column Sum	Easy	Dict aggregation
Activity Time Ledger	Medium	Interval merging
Batch Partitioner	Medium	Hash bucketing
Batch With Metadata	Medium	Stateful iteration
Caesar Shift Check	Hard	String transforms
Character Occurrence Map	Hard	Counting tradeoffs
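
Three of the patterns in the table (chunking, interval merging, hash bucketing) fit in a few lines each. The sketch below is one possible shape for them, not the reference solution to any listed problem; the function names are hypothetical.

```python
import hashlib
from itertools import islice
from typing import Iterable, Iterator


def chunked(items: Iterable, size: int) -> Iterator[list]:
    """Yield lists of up to `size` items without materializing the input."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(items)
    while chunk := list(islice(it, size)):
        yield chunk


def merge_intervals(intervals: Iterable[tuple[int, int]]) -> list[tuple[int, int]]:
    """Merge overlapping or touching (start, end) intervals."""
    merged: list[tuple[int, int]] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous interval: extend its end.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def bucket_for(key: str, num_buckets: int) -> int:
    """Stable bucket assignment across processes and runs.

    The built-in hash() is salted per process for strings, so it is
    unsuitable for partitioning data that must land in the same bucket
    on every run.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets


print(list(chunked(range(7), 3)))
# [[0, 1, 2], [3, 4, 5], [6]]
print(merge_intervals([(1, 3), (2, 6), (8, 10), (10, 12)]))
# [(1, 6), (8, 12)]
```

Whether touching intervals such as `(8, 10)` and `(10, 12)` should merge is a question to raise with the interviewer; this sketch merges them.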

Schema design (56 problems)

Senior loops are won here. Interviewers reward picking the right grain for fact tables, defending an SCD type, and validating the schema with sample queries. Browse at datadriven.io/data-modeling-interview-questions.

Top problems

Problem	Tests
A/B Experiment Assignment Schema	SCD type 2, sticky bucketing
Customer Address History	Effective dates, history preservation
Insurance Claims Lifecycle	State machine modeling
Clickstream and Session Schema	Sessionization, late events
E Commerce Supply Chain Tracking	Multi entity tracking
Loan Management Schema	Bridge tables, party roles
Cloud File Storage Metadata Schema	Recursive hierarchies
Financial Trading Warehouse	Time series, late arriving facts
Content Engagement Data Model	Fact table grain
B2B Invoicing Data Model	Many to many with attributes
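
To make "validate the schema with sample queries" concrete, here is a minimal sketch of an effective-dated history table with an as-of lookup, in the spirit of the Customer Address History problem. The table name, columns, and the half-open `[valid_from, valid_to)` convention are assumptions for illustration, not the problem's expected answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE customer_address (
        customer_id INTEGER NOT NULL,
        address     TEXT    NOT NULL,
        valid_from  TEXT    NOT NULL,  -- inclusive, ISO 8601 date
        valid_to    TEXT               -- exclusive; NULL marks the current row
    )
    """
)
conn.executemany(
    "INSERT INTO customer_address VALUES (?, ?, ?, ?)",
    [
        (1, "12 Oak St", "2023-01-01", "2024-03-15"),
        (1, "9 Elm Ave", "2024-03-15", None),
    ],
)


def address_as_of(customer_id: int, as_of: str):
    """Return the address in effect on `as_of` (ISO date), or None."""
    row = conn.execute(
        """
        SELECT address
        FROM customer_address
        WHERE customer_id = ?
          AND valid_from <= ?
          AND (valid_to IS NULL OR ? < valid_to)
        """,
        (customer_id, as_of, as_of),
    ).fetchone()
    return row[0] if row else None


print(address_as_of(1, "2023-06-01"))  # 12 Oak St
print(address_as_of(1, "2024-03-15"))  # 9 Elm Ave
```

The half-open interval means the change date itself belongs to the new row, so no date matches two rows and no date falls in a gap.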

Pipeline architecture (120 problems)

End to end design questions. Use the eight beat framework on every one. Browse at datadriven.io/data-pipeline-interview-questions.

Top case studies

Case study	Domain
Card Transaction Streaming Pipeline	Real time, exactly once
Cellular Connectivity and App Log Data Warehouse	High cardinality
AWS Pipeline Auto Scaling for Variable Volume	Cost optimization
Connected Vehicle Telemetry Pipeline	High volume IoT
Capital Markets Intraday Risk Pipeline	Regulatory lineage
Database Replication and Schema Normalization Pipeline	CDC
Cost Optimized Clickstream Data Lake	Storage tradeoffs
Databricks Pipeline with Spark Performance Optimization	Spark internals
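
Several of these case studies hinge on "exactly once" semantics. One common way to approximate it is at-least-once delivery combined with an idempotent sink keyed on a unique event ID. The sketch below shows that idea with `sqlite3` standing in for the sink; the `card_txn` table and `write_batch` helper are hypothetical, and a production design would also need to cover offsets, ordering, and failure recovery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE card_txn (txn_id TEXT PRIMARY KEY, amount_cents INTEGER NOT NULL)"
)


def write_batch(events):
    """Idempotent sink: a replayed txn_id is ignored instead of duplicated."""
    with conn:  # one transaction per batch; rolls back if an error is raised
        conn.executemany(
            "INSERT OR IGNORE INTO card_txn (txn_id, amount_cents) VALUES (?, ?)",
            events,
        )


batch = [("t1", 500), ("t2", 1250)]
write_batch(batch)
write_batch(batch)                       # simulated redelivery after a restart
write_batch([("t2", 1250), ("t3", 75)])  # partial overlap with earlier data

total = conn.execute(
    "SELECT COUNT(*), SUM(amount_cents) FROM card_txn"
).fetchone()
print(total)  # (3, 1825)
```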

How many problems to be ready

Aim for about 100 medium and 25 hard problems, spread across the four sections. Past that, returns diminish; below it, gaps remain.

Companion repos

data-engineering-interview-handbook. The flagship handbook with chapter by chapter coverage.
data-engineer-interview-handbook. 7 day sprint version.
awesome-data-engineering-interviews. The DataDriven 75 focused subset.
awesome-data-engineering-interview. Curated resource list.
system-design-for-data-engineers. 120 long form pipeline case studies.
data-engineer-interview-prep. 8 week structured practice schedule.
data-engineering-cheatsheet. One page recall reference.

Contributing

Open an issue with: question text, schema, expected output, what it tests, the common trap. Submissions are reviewed and added with attribution.

License

CC BY-SA 4.0. Sandboxes hosted at datadriven.io.
