Data Platform Engineer
I work on production data systems for financial platforms, where migration, validation, governance, and reliability meet real operational constraints.
This site collects field notes from that boundary: correctness, runtime, data ownership, and system design in practice.
FEATURED CASE STUDY
Stabilizing a Data Pipeline Migration Under Changing Conditions
Long validation cycles, output mismatches, and limited production visibility made the migration less about moving code and more about making each mismatch explainable.
Read →
Engineering Themes
Where these field notes focus.
01 Runtime Reliability
Debugging production workflows where correctness, reruns, and operational cost interact.
02 Data Contracts
Turning implicit assumptions in support files, schemas, and runtime artifacts into visible contracts.
03 Spark Execution Boundaries
Understanding where Python, Spark-native execution, skew, and materialization costs meet.
04 Migration & Modernization
Moving legacy workflows toward cloud-native, version-controlled, and observable data systems.
Notes
Recent notes on production data movement, Spark migration boundaries, and rewrite validation.
From Green Runs to Trustworthy Data Movement
A production data pipeline could run green, but green did not always mean data had moved.
Engineering Note · Spark Rewrite & Validation
When Final Output Diff Is Not Rewrite Diff
Freezing input boundaries to separate rewrite behavior from upstream identity drift.
Engineering Note · Spark Migration Semantics
Shape Parity Is Not Semantic Parity
How null-safe arithmetic exposed the difference between matching row counts and matching meaning.