Engineering Note · SQL Migration & Validation

Making SQL Migration Review Deterministic Before Rewriting Anything

Before AI rewrites legacy SQL, migration review needs to answer a simpler question first: what exactly are we looking at?

This note stays general so the reasoning holds without depending on the particulars of any one system.

Opening

Legacy SQL may look readable while hiding dependency shape, dialect assumptions, and target-readiness risks. Once parsing, graph structure, rewrite readiness, and risk judgment collapse into one opaque step, reviewers lose the ability to compare evidence. They argue from memory, preference, or one-off diffs instead of rerunnable artifacts.

Core thesis: Before AI rewrites SQL, make the source structure inspectable and reproducible.

Problem

Assistive rewriting does not remove the trust problem. A generated query can look plausible while carrying hidden assumptions from prompts, examples, or incomplete context. Reviewers still need deterministic context: parse health, CTE structure, dependency graph, and readiness cues they can rerun.

Principle

Make the migration surface inspectable before rewrite automation.

Method

  • Freeze the input boundary before interpreting outcomes.
  • Parse first; failed or partial parses should be visible.
  • Extract structure: CTEs, dependencies, cyclic hints, and parse-readiness notes.
  • Classify deterministic risks with explicit rule sets.
  • Keep “Can we explain this SQL?” separate from “Should we rewrite it?”
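The first and fourth steps above can be sketched in a few lines. This is a minimal illustration, not AgenticSqlConverter's code. The rule IDs, patterns, and function names are hypothetical. For brevity, the rules match raw text with regular expressions, whereas a real rule set would run against a parsed tree such as the one SQLGlot produces. The fingerprint freezes the input boundary by hashing every path and its contents in a canonical order, so two runs that report the same fingerprint saw the same input.

```python
import hashlib
import json
import re
from dataclasses import dataclass

# Hypothetical rule set: (stable rule ID, pattern, reviewer note).
# Text regexes are a simplification; production rules should inspect a parse tree.
RULES = [
    ("R001_SELECT_STAR", re.compile(r"\bselect\s+\*", re.IGNORECASE),
     "SELECT * hides the column contract"),
    ("R002_NVL", re.compile(r"\bnvl\s*\(", re.IGNORECASE),
     "NVL is Oracle-specific; COALESCE is the portable form"),
    ("R003_PG_CAST", re.compile(r"::\s*\w+"),
     "'::' casts are PostgreSQL-specific; CAST(... AS ...) is standard"),
    ("R004_TOP", re.compile(r"\bselect\s+top\s+\d+", re.IGNORECASE),
     "TOP n is T-SQL; other dialects use LIMIT or FETCH FIRST"),
]


@dataclass(frozen=True)
class Finding:
    rule_id: str
    line: int
    note: str


def fingerprint(sql_by_path: dict[str, str]) -> str:
    """Freeze the input boundary: SHA-256 over paths and contents, keys sorted."""
    canonical = json.dumps(sql_by_path, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def classify(sql: str) -> list[Finding]:
    """Apply every rule to every line; sort so output never depends on rule order."""
    findings = [
        Finding(rule_id, lineno, note)
        for lineno, text in enumerate(sql.splitlines(), start=1)
        for rule_id, pattern, note in RULES
        if pattern.search(text)
    ]
    return sorted(findings, key=lambda f: (f.line, f.rule_id))


inputs = {"report.sql": "SELECT NVL(a, 0) FROM t\nSELECT * FROM u"}
print(fingerprint(inputs)[:12])
print([f.rule_id for f in classify(inputs["report.sql"])])
# ['R002_NVL', 'R001_SELECT_STAR']
```

Because matching is line-based, a construct split across lines (for example, `SELECT` and `*` on separate lines) slips through. This is one more reason to run rules on a parse tree rather than on text.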

AgenticSqlConverter

AgenticSqlConverter is a validation-first SQL migration toolkit. It currently implements a narrow, offline slice of that workflow: SQLGlot parsing, CTE extraction, lightweight dependency graphing, parse-readiness checks, deterministic readiness signals, and reproducible JSON/Markdown summaries.
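CTE extraction, dependency graphing, and cyclic hints can be illustrated together. The sketch below assumes the CTE-to-reference mapping has already been extracted from a parse (the toolkit's parsing step uses SQLGlot). The function name and output keys are hypothetical. It uses Kahn's topological sort with a min-heap so that ties break alphabetically and the order is identical on every run. Any CTE that never becomes ready is reported as a cyclic hint rather than an error, because a self-reference can be a legitimate WITH RECURSIVE CTE.

```python
import heapq


def dependency_summary(ctes: dict[str, set[str]]) -> dict:
    """Build a stable dependency summary from a CTE -> referenced-names map.

    Hypothetical helper: assumes a parser has already extracted, for each CTE,
    the names its body reads from (other CTEs or base tables).
    """
    names = set(ctes)
    # Anything referenced that is not a CTE is treated as a base table.
    base_tables = sorted({ref for refs in ctes.values() for ref in refs} - names)

    # indegree[n] = number of CTEs that n depends on; dependents[n] = CTEs reading n.
    indegree = {n: len(ctes[n] & names) for n in names}
    dependents: dict[str, list[str]] = {n: [] for n in names}
    for n in names:
        for ref in ctes[n] & names:
            dependents[ref].append(n)

    # Kahn's algorithm; the min-heap breaks ties alphabetically, so order is stable.
    ready = [n for n, d in indegree.items() if d == 0]
    heapq.heapify(ready)
    order = []
    while ready:
        n = heapq.heappop(ready)
        order.append(n)
        for m in dependents[n]:
            indegree[m] -= 1
            if indegree[m] == 0:
                heapq.heappush(ready, m)

    # CTEs never emitted sit on or downstream of a cycle. A self-reference may be a
    # legitimate WITH RECURSIVE CTE, so these are hints for review, not errors.
    cyclic_hints = sorted(names - set(order))
    return {"order": order, "base_tables": base_tables, "cyclic_hints": cyclic_hints}


ctes = {
    "orders_clean": {"orders"},
    "daily": {"orders_clean"},
    "report": {"daily", "customers"},
}
print(dependency_summary(ctes))
# {'order': ['orders_clean', 'daily', 'report'], 'base_tables': ['customers', 'orders'], 'cyclic_hints': []}
```

Identifier normalization (case, quoting, schema qualification) is left to the parser in this sketch. The graph only compares names exactly as given.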

The bundled examples/migration_case/ demo shows the workflow on synthetic SQL fixtures. It is not a turnkey dialect converter; rewrite functionality belongs in later layers with separate tests, contracts, and review boundaries.
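One way to keep that boundary explicit is to ask the method's two questions in separate functions. The first asks whether we can explain this SQL. The second asks whether we should rewrite it, and it refuses to answer until the first passes. The report shape, field names, and the caller-supplied set of blocking rules are all hypothetical. They only illustrate how readiness signals can stay deterministic.

```python
from dataclasses import dataclass, field


@dataclass
class InspectionReport:
    # Hypothetical report shape; field names are illustrative only.
    parse_ok: bool
    unparsed_statements: list[str] = field(default_factory=list)
    cyclic_hints: list[str] = field(default_factory=list)
    risk_rule_ids: list[str] = field(default_factory=list)


def can_explain(report: InspectionReport) -> bool:
    """Question 1: is the structural picture of the source complete?"""
    return report.parse_ok and not report.unparsed_statements


def readiness_blockers(report: InspectionReport, blocking_rules: frozenset[str]) -> list[str]:
    """Question 2, asked only once question 1 passes: what still blocks a rewrite?

    Output order is deterministic (cyclic hints first, then rules, each sorted).
    An empty list means no known blockers, not proof that a rewrite is correct.
    """
    if not can_explain(report):
        return ["source not fully explained: fix parsing first"]
    blockers = [f"cyclic hint: {name}" for name in sorted(report.cyclic_hints)]
    blockers += [f"risk rule: {rid}" for rid in sorted(set(report.risk_rule_ids) & blocking_rules)]
    return blockers


report = InspectionReport(
    parse_ok=True,
    cyclic_hints=["b", "a"],
    risk_rule_ids=["R002_NVL", "R001_SELECT_STAR"],
)
print(readiness_blockers(report, frozenset({"R002_NVL"})))
# ['cyclic hint: a', 'cyclic hint: b', 'risk rule: R002_NVL']
```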

Why deterministic artifacts matter

Deterministic artifacts make review repeatable. They let teams compare structural payloads, run snapshot checks in CI, and discuss migration risk from stable evidence rather than screenshots or memory.
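A snapshot check of this kind needs two pieces. The first is a canonical serialization, so that equal payloads produce byte-identical JSON. The second is a readable diff for when they do not match. A minimal standard-library sketch follows; the function names are hypothetical. Note that `sort_keys` orders dictionary keys only. Any lists in the payload (CTE names, findings) must therefore be sorted upstream, and sets converted to sorted lists, before serialization.

```python
import difflib
import json


def canonical_json(payload: dict) -> str:
    """Sorted keys and fixed indentation: equal payloads serialize to equal text."""
    return json.dumps(payload, sort_keys=True, indent=2, ensure_ascii=False) + "\n"


def snapshot_diff(payload: dict, snapshot_text: str) -> list[str]:
    """Unified diff from a stored snapshot to the current payload; [] means unchanged."""
    return list(
        difflib.unified_diff(
            snapshot_text.splitlines(keepends=True),
            canonical_json(payload).splitlines(keepends=True),
            fromfile="snapshot",
            tofile="current",
        )
    )


stored = canonical_json({"ctes": ["daily", "orders_clean"], "parse_ok": True})
current = {"parse_ok": True, "ctes": ["daily", "orders_clean", "report"]}
print("".join(snapshot_diff(current, stored)))
```

In CI, a non-empty diff would fail the job, and the diff itself would become the review artifact. Updating the snapshot is then a deliberate, reviewable commit rather than a silent change.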

They also create a safer boundary for AI-assisted work. Models can propose transformations, but reviewers still need to know what changed, which assumptions moved, and whether the target surface is ready. Target readiness itself splits into separate validation layers: names, schemas, referenced tables, and compatibility.
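It is easier to see what changed when a proposed rewrite is compared structurally rather than line by line. The sketch below, with hypothetical names, compares two CTE-to-reference maps, for example one extracted from the legacy query and one from a model's proposal. It reports added or removed CTEs, changed dependencies, and base tables that appeared or disappeared. A new base table in a rewrite is exactly the kind of moved assumption that a textual diff can bury.

```python
def structural_diff(before: dict[str, set[str]], after: dict[str, set[str]]) -> dict:
    """Compare two CTE -> referenced-names maps; every list is sorted for stable output."""

    def base_tables(ctes: dict[str, set[str]]) -> set[str]:
        # References that are not CTEs in the same query are treated as base tables.
        return {ref for refs in ctes.values() for ref in refs} - set(ctes)

    shared = sorted(set(before) & set(after))
    return {
        "ctes_added": sorted(set(after) - set(before)),
        "ctes_removed": sorted(set(before) - set(after)),
        "deps_changed": {
            name: {
                "added": sorted(after[name] - before[name]),
                "removed": sorted(before[name] - after[name]),
            }
            for name in shared
            if before[name] != after[name]
        },
        "base_tables_added": sorted(base_tables(after) - base_tables(before)),
        "base_tables_removed": sorted(base_tables(before) - base_tables(after)),
    }


legacy = {"clean": {"orders"}, "report": {"clean"}}
proposed = {"clean": {"orders", "refunds"}, "report": {"clean"}, "extra": {"clean"}}
diff = structural_diff(legacy, proposed)
print(diff["base_tables_added"])  # ['refunds']
```

A renamed CTE shows up as one removal plus one addition. Deciding whether the two are the same logic is left to the reviewer.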

Boundaries

This note does not claim:

  • end-to-end cross-dialect conversion,
  • live analytic validation,
  • semantic equivalence proof,
  • or autonomous migration approval.

Those can be future layers. The first useful layer is deterministic inspectability.

Closing

AI can help propose transformations, but migration trust starts with deterministic evidence. Before asking tooling to rewrite SQL wholesale, expose baseline structure and readiness cues clearly enough for humans and automated checks to reason about.

Related project: AgenticSqlConverter (https://github.com/korsakowii/AgenticSqlConverter)
