Tag Index
Tag: Spark
Spark execution behavior, workload shape, and runtime diagnostics.
- When Final Output Diff Is Not Rewrite Diff
Freezing input boundaries to separate rewrite behavior from upstream identity drift.
- Shape Parity Is Not Semantic Parity
When base group counts stay stable but derived totals drift, downstream filters can collapse the output without a failed job.
- When Python UDF Becomes the Memory Boundary
Why grouped Python logic works until workload shape stops being bounded.
- When Support Data Becomes Runtime Infrastructure
How a correct output hid repeated work in a production data pipeline.