Projects | Xiaoyu Yan

Projects & Case Studies

Selected field notes from production data systems: migration risk, validation boundaries, runtime behavior, and operational reliability.

Selected Work

Case notes and project directions.

DATA PIPELINE RELIABILITY

Stabilizing a Data Pipeline Migration Under Changing Conditions

How validation throughput, scoped reruns, failure handling, metadata semantics, and observability shaped a production pipeline migration.

production reliability · rerun semantics · cloud workflows

Read →

RUNTIME ARTIFACT GOVERNANCE

Runtime Support Artifact Governance

A support file looked like reference data, but it controlled execution. I traced repeated processing to drift between reviewed source artifacts and runtime artifacts, then reframed the file as workload-control infrastructure requiring validation and release-boundary ownership.

data contracts · runtime artifacts · workload control

Read →

SPARK EXECUTION BOUNDARY

Memory-Bound Python Aggregation

A grouped Python aggregation worked for normal slices but failed on a large skewed workload. The investigation clarified when Python UDFs are practical, when group cardinality becomes the memory boundary, and why a feature-flagged Spark-native fallback can be safer than a universal rewrite.

Spark · Python UDFs · workload skew

Read →

MULTI-TENANT TRANSFORMATIONS

Multi-Tenant dbt Architecture

A multi-tenant transformation system needs a careful boundary between shared models and tenant-specific overrides. The architecture focused on preserving a common transformation layer while allowing controlled customization where business logic diverged.

dbt · multi-tenant design · platform maintainability

Read →

Applied Side Projects

Small systems used to test ideas in workflow design and diagnostics.

SQL MIGRATION · REVIEW HARNESS

AgenticSqlConverter

A local-first SQL migration review harness for making legacy query structure inspectable before rewrite automation.

It parses SQL, extracts CTE structure, builds dependency graphs, runs parse-readiness checks, and surfaces deterministic migration-risk signals for review.
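As a rough illustration of the CTE extraction and dependency-graph step described above, here is a minimal sketch. It is not AgenticSqlConverter's actual implementation: the function names `extract_ctes` and `cte_dependencies` and the sample query are hypothetical. It assumes a single top-level, non-recursive `WITH` clause and ignores parentheses inside string literals or comments, which a real parser would need to handle.

```python
import re

# Hypothetical sample query, used only for illustration.
SAMPLE_SQL = """
WITH orders AS (
    SELECT id, customer_id, COALESCE(total, 0) AS total FROM raw_orders
),
big_orders AS (
    SELECT * FROM orders WHERE total > 100
),
summary AS (
    SELECT o.customer_id, COUNT(b.id) AS n_big
    FROM orders o LEFT JOIN big_orders b ON o.id = b.id
    GROUP BY o.customer_id
)
SELECT * FROM summary
"""


def extract_ctes(sql: str) -> dict[str, str]:
    """Return {cte_name: body_sql} for a top-level WITH clause, in source order."""
    ctes: dict[str, str] = {}
    m = re.match(r"\s*WITH\s+", sql, flags=re.IGNORECASE)
    if not m:
        return ctes
    pos = m.end()
    while True:
        head = re.match(r"\s*(\w+)\s+AS\s*\(", sql[pos:], flags=re.IGNORECASE)
        if not head:
            break
        name = head.group(1)
        start = pos + head.end()  # first character after the opening paren
        depth, i = 1, start
        # Walk forward until the opening paren is balanced.
        while i < len(sql) and depth:
            if sql[i] == "(":
                depth += 1
            elif sql[i] == ")":
                depth -= 1
            i += 1
        ctes[name] = sql[start:i - 1].strip()
        pos = i
        comma = re.match(r"\s*,", sql[pos:])
        if not comma:
            break  # no further CTE definitions
        pos += comma.end()
    return ctes


def cte_dependencies(ctes: dict[str, str]) -> dict[str, list[str]]:
    """Map each CTE to the other CTEs whose names appear in its body."""
    graph: dict[str, list[str]] = {}
    for name, body in ctes.items():
        graph[name] = sorted(
            other
            for other in ctes
            if other != name
            and re.search(rf"\b{re.escape(other)}\b", body, flags=re.IGNORECASE)
        )
    return graph


if __name__ == "__main__":
    for cte, deps in cte_dependencies(extract_ctes(SAMPLE_SQL)).items():
        print(f"{cte} <- {deps}")
```

Name matching on word boundaries is deliberately crude: it keeps the output deterministic and easy to review, but it can report a false dependency when a column or alias shares a CTE's name.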

SQL migration · CTE graphing · validation checks

Browse AgenticSqlConverter →

SIDE PROJECT · BIRDING TRAVEL · NATURE DATA

BirdingBuddy / Field Notes

Production web application for destination-aware birding planning.

BirdingBuddy connects recent observations, habitat context, seasonal signals, and recommendation logic to help birders plan where and when to look.

eBird data · seasonality · decision support · trip planning

Visit BirdingBuddy →