Engineering Notes | Xiaoyu Yan


Field notes from production systems, migration work, and graduate study.

Engineering Notes

Production data systems, migration reliability, validation, and observability.

Engineering Note · Data Platform Architecture

Designing Multi-Tenant dbt Models Without Forking Everything

A practical pattern for keeping shared dbt transformation logic centralized while allowing tenant-specific extensions without full project forks or long-term model drift.

2026-05 · Data Platform · Data Governance · Data Contracts · Metadata

Engineering Note · Data Movement Reliability

From Green Runs to Trustworthy Data Movement

A production data pipeline could run green, but green did not always mean data had moved. This note traces how clearer summaries, reason codes, and fail-fast ingestion checks turned a partially migrated workflow into a more trustworthy production system.

2026-05 · Data Systems · Reliability · Migration · Observability

Case Study · Migration & Reliability

Stabilizing a Data Pipeline Migration Under Changing Conditions

How validation throughput, scoped reruns, failure handling, metadata semantics, and observability shaped a production pipeline migration.

2026-05 · Data Platform · Databricks · Data Migration · Pipeline Reliability · Observability

Engineering Note · Migration & Validation

Shape Parity Is Not Semantic Parity

When base group counts stay stable but derived totals drift, downstream filters can collapse the output without a failed job.

2026-05 · Data Reliability · Databricks · Spark · Data Migration · Data Validation · Systems Thinking

Engineering Note · Spark Rewrite & Validation

When Final Output Diff Is Not Rewrite Diff

Freezing input boundaries to separate rewrite behavior from upstream identity drift.

2026-05 · Data Reliability · Spark · Data Migration · Data Validation · Systems Thinking

Engineering Note · Spark Execution Boundary

When Python UDF Becomes the Memory Boundary

Why grouped Python logic works until workload shape stops being bounded.

2026-05 · Spark · Data Reliability · Pipeline Reliability

Engineering Note · Runtime Infrastructure

When Support Data Becomes Runtime Infrastructure

How a correct output hid repeated work in a production data pipeline.

2026-05 · Data Infrastructure · Spark · Data Reliability · Data Contracts

Engineering Note · Migration & Reliability

Rerun Scope Is Part of the Data Contract

Why rerun scope, support-data snapshots, and processing windows must be explicit in batch workflows.

2026-05 · Data Contracts · Batch Pipelines · Rerun Scope · Data Reliability

Engineering Note · Governance & Access Workflows

Treating Access Control as a Reviewable Workflow

How request planning, review, audit records, and human approval improve access-control operations.

2026-05 · Data Governance · Access Control · Auditability · Workflow Automation

Engineering Note · Developer Tooling

Inspecting Legacy SQL Before Migration

Using parsing and dependency mapping to make legacy SQL easier to review before Databricks migration.

2026-05 · SQL Modernization · Databricks · CTEs · Dependency Graph · Developer Tooling

Engineering Note · SQL Modernization · Validation

Making SQL Migration Review Deterministic Before Rewriting Anything

Before AI rewrites legacy SQL, make the review inspectable: rely on deterministic payloads, graphs, and readiness cues rather than assumption-heavy diffs alone.

2026-05 · SQL Modernization · Data Migration · Developer Tooling · Data Validation · AI-Assisted Engineering

Study Notes

Coursework reframed through systems thinking.

Study Note · Network Management

What the OSI Model Taught Me About System Boundaries

How a networking model became a way to reason about debugging, ownership, and failure boundaries in production systems.

2026-05 · Network Management · Systems Thinking · Data Reliability

Debug Archive

Short debugging notes and checklists.

Debug Note · Migration & Reliability

Fail Fast in Ingestion, Rerun Narrowly After Fix

Why strict required-input failures and narrow reruns should be designed together.

2026-05 · Ingestion · Failure Handling · Production Debugging · Pipeline Reliability

Debug Note · Governance & Access Workflows

File Path Metadata in Lakehouse Pipelines

Use ingestion-source metadata for lineage instead of runtime context assumptions.

2026-05 · Lakehouse · Metadata · Auto Loader · Data Lineage

Debug Note · Developer Tooling

When Scheduled Parameters Override Interactive Defaults

A short checklist for debugging parameter precedence between interactive and scheduled runs.

2026-05 · Databricks Jobs · Workflow Parameters · Debugging · Runtime Configuration