Month Index
Published: 2026-05
Engineering notes published in 2026-05.
- Designing Multi-Tenant dbt Models Without Forking Everything
A practical pattern for keeping shared dbt transformation logic centralized while allowing tenant-specific extensions without full project forks or long-term model drift.
- From Green Runs to Trustworthy Data Movement
A production data pipeline could run green, but green did not always mean data had moved. This note traces how clearer summaries, reason codes, and fail-fast ingestion checks turned a partially migrated workflow into a more trustworthy production system.
- When Final Output Diff Is Not Rewrite Diff
Freezing input boundaries to separate rewrite behavior from upstream identity drift.
- Shape Parity Is Not Semantic Parity
When base group counts stay stable but derived totals drift, downstream filters can collapse the output without a failed job.
- Stabilizing a Data Pipeline Migration Under Changing Conditions
How validation throughput, scoped reruns, failure handling, metadata semantics, and observability shaped a production pipeline migration.
- What the OSI Model Taught Me About System Boundaries
How a networking model became a way to reason about debugging, ownership, and failure boundaries in production systems.
- When Python UDF Becomes the Memory Boundary
Why grouped Python logic works until the workload shape stops being bounded.
- When Support Data Becomes Runtime Infrastructure
How a correct output hid repeated work in a production data pipeline.
- Fail Fast in Ingestion, Rerun Narrowly After Fix
Why strict required-input failures and narrow reruns should be designed together.
- File Path Metadata in Lakehouse Pipelines
Use ingestion-source metadata for lineage instead of runtime context assumptions.
- When Scheduled Parameters Override Interactive Defaults
A short checklist for debugging parameter precedence between interactive and scheduled runs.
- Inspecting Legacy SQL Before Migration
Using parsing and dependency mapping to make legacy SQL easier to review before a Databricks migration.
- Rerun Scope Is Part of the Data Contract
Why rerun scope, support-data snapshots, and processing windows must be explicit in batch workflows.
- Treating Access Control as a Reviewable Workflow
How request planning, review, audit records, and human approval improve access-control operations.