Engineering Note · Runtime Infrastructure
When Support Data Becomes Runtime Infrastructure
How source-of-truth drift in a support file turned one logical slice into repeated execution.
Details are generalized and sanitized to preserve confidentiality while keeping the engineering lesson accurate.
A duplicate support-data row did not duplicate the output. It turned one logical slice into multiple physical executions.
The file looked like support data: a small participant lookup maintained outside the main transactional dataset. In practice, it behaved as a workload-control input.
I ran into this pattern while investigating a scoped rerun in a production batch pipeline I had inherited. The immediate question was narrow: a large slice had previously hit a memory boundary in a grouped Python aggregation, and I added execution-path logging to confirm which aggregation path the rerun was using.
The log was meant to answer one question. Instead, it showed repeated processing of the same logical slice.
Source-of-truth drift
At first, the behavior looked like a retry, a logging artifact, or a path-resolution issue. The confusing part was that the reviewed source copy of the support file showed one row for the slice, while the runtime job behaved as if multiple rows existed.
The explanation was source-of-truth drift. The job was not reading the copy I first inspected. It was reading a runtime support artifact in object storage. That runtime artifact still contained multiple rows for the same logical slice, while the reviewed copy had already been cleaned up.
That changed the investigation. The issue was not simply duplicate support data. The deeper issue was that a runtime support artifact could drift from the reviewed source copy, and the pipeline had no runtime contract enforcing the intended logical unit of work.
Output-idempotent, compute-non-idempotent
The runtime artifact controlled execution. Each physical row became one unit of work in the pipeline's processing loop. If the same logical slice appeared multiple times, the pipeline read, validated, parsed, transformed, and wrote the same slice multiple times.
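A minimal sketch of that failure mode (illustrative only, not the original pipeline; slice names and fields are invented). The execution loop treats each physical support-data row as one unit of work, so a duplicated logical slice is processed once per physical row:

```python
# Hypothetical support rows: two physical rows describe the same logical slice.
support_rows = [
    {"slice_id": "S1", "region": "east"},
    {"slice_id": "S1", "region": None},   # duplicate of the same logical slice
    {"slice_id": "S2", "region": "west"},
]

executions = []

def process_slice(slice_id):
    """Stand-in for the real read -> validate -> parse -> transform -> write."""
    executions.append(slice_id)

for row in support_rows:              # unit of work = physical row
    process_slice(row["slice_id"])

print(executions)                     # ['S1', 'S1', 'S2'] -- S1 ran twice
```

Because the write is partition-scoped, the second S1 execution overwrites the first with identical output, which is why the defect never surfaced as wrong data.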
The output stayed correct because the write was partition-scoped. But the real defect was earlier: the pipeline was scheduling work from an unvalidated runtime artifact.
The output was idempotent, but the compute was not: the pipeline re-paid the cost of reading, validating, transforming, and writing the same slice more than once.
The functional unit of work
The duplicate participant rows were subtle because they were not all byte-identical. Some descriptive fields differed or were missing. But for this transform, those fields were not part of the execution semantics. The pipeline only consumed a small functional key. From the pipeline's point of view, the rows represented the same execution unit.
That meant the inherited execution loop was using a physical support-data row as the unit of work, while the logical unit should have been defined by the functional key consumed by the transform.
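One way to make that definition concrete (field names are hypothetical): project each row onto the functional key the transform actually consumes, and treat rows with equal keys as the same execution unit even when descriptive fields differ or are missing:

```python
# Fields the transform actually consumes (hypothetical names).
FUNCTIONAL_FIELDS = ("slice_id", "partition")

def functional_key(row):
    """Project a physical row onto the fields that define the execution unit."""
    return tuple(row.get(f) for f in FUNCTIONAL_FIELDS)

row_a = {"slice_id": "S1", "partition": "2024-01", "owner": "team-a"}
row_b = {"slice_id": "S1", "partition": "2024-01", "owner": None}  # descriptive field differs

# Different physical rows, same logical unit of work:
assert functional_key(row_a) == functional_key(row_b)
```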
Once a support file determines which slices run, how many times they run, which partitions are written, and how much data is scanned, it is no longer passive reference data. It has become a workload-control input.
Runtime infrastructure
A workload-control input needs stronger guarantees than "the file exists and has the expected columns." It needs uniqueness rules, conflict detection, runtime validation, and traceable deployment.
The architectural response was to move the support file into the same release boundary as the pipeline code. Instead of relying on an independently maintained runtime copy in object storage, the pipeline would read a deployed local artifact produced from the reviewed source.
The important change was not the storage location itself. It was the release boundary: the runtime input became version-controlled, reviewable, and deployed together with the code that consumed it.
The immediate fix was deliberately small: deduplicate functionally equivalent rows, log the duplicate count, and fail fast if duplicates conflicted on fields actually consumed by the transform.
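A sketch of that validation step, under the same assumptions as above (field names are hypothetical, and the grouping key stands in for the real functional key). Functionally equivalent duplicates are collapsed and counted; duplicates that disagree on a consumed field abort scheduling:

```python
from collections import defaultdict

# Fields the transform actually reads (hypothetical names).
CONSUMED_FIELDS = ("partition", "source_path")

def schedule_units(rows, key_field="slice_id"):
    """Deduplicate functionally equivalent rows; fail fast on conflicts."""
    by_key = defaultdict(list)
    for row in rows:
        by_key[row[key_field]].append(row)

    units, duplicates = [], 0
    for key, group in by_key.items():
        # Duplicates are only safe if they agree on every consumed field.
        consumed = {tuple(r.get(f) for f in CONSUMED_FIELDS) for r in group}
        if len(consumed) > 1:
            raise ValueError(f"conflicting duplicates for slice {key!r}: {consumed}")
        duplicates += len(group) - 1
        units.append(group[0])

    if duplicates:
        print(f"deduplicated {duplicates} functionally equivalent row(s)")
    return units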
Reusable pattern
The reusable pattern is to treat execution-control support data as runtime infrastructure: define the logical unit of work by the functional key the transform consumes, validate uniqueness before scheduling work, fail fast on conflicting duplicates, and deploy runtime support artifacts from reviewed source artifacts.
Takeaway
Not all production defects appear as wrong outputs. Some appear as unexpected repetition, excessive runtime, memory pressure, or operational cost.
In this case, correctness hid the defect. Observability exposed the inherited contract gap. The deeper issue was not the duplicate rows themselves. It was that a support file had become part of the execution plan before it had a runtime contract.