Why Oxidant exists
There is no mature tool for translating a TypeScript codebase to Rust. Naive file-by-file LLM prompting produces code that compiles in isolation but fails at integration — type boundaries collide, ownership is invented rather than derived, and algorithms get silently simplified.
Oxidant's approach is borrowed from the academic TS→Rust literature (ORBIT, ENCRUST, SACTOR): extract a complete dependency graph first, translate in topological order so every dependency is already converted when a node is processed, verify each snippet with the real Rust compiler before accepting it, and separate correctness from idiomaticity into distinct phases.
The first test corpus is msagl-js — Microsoft's Automatic Graph Layout library in TypeScript. Before Oxidant existed, this translation was done by hand: 236 commits of painstaking manual work, with a human playing the role of the agentic harness. Oxidant automates that process.
The four-phase pipeline
The pipeline is sequential at the top level. Each phase produces artifacts consumed by the next. Phase B's internal loop is highly iterative — a single node may be attempted multiple times at escalating model tiers before being accepted or queued for human review.
Analysis & Preparation
ts-morph AST extraction → idiom detection → topological sort → Haiku tier classification → compilable Rust skeleton with todo!() stubs.
Translation Loop
LangGraph state graph processes nodes in topological order. Each node is translated by Claude Code, verified with cargo check, and retried at higher model tiers on failure.
Idiomatic Refinement
Runs cargo clippy --all-targets -W clippy::pedantic. Mechanical warnings are auto-fixed. Structural and human-judgment warnings are reported for review.
Integration & Verification
Full cargo build --release. Integration errors (type boundary mismatches between assembled modules) are isolated and flagged for re-translation.
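Phase C's split between mechanical and judgment warnings can be expressed as a lint allow-list consulted while walking clippy's output. A minimal sketch in the pipeline's Python (the lint names in MECHANICAL are placeholders, not Oxidant's actual list):

```python
# Bucket clippy lints into auto-fixable vs. human-review piles.
# MECHANICAL membership is illustrative, not Oxidant's real list.
MECHANICAL = {
    "clippy::redundant_clone",
    "clippy::needless_return",
    "clippy::single_char_pattern",
}

def bucket(lints: list[str]) -> tuple[list[str], list[str]]:
    auto_fix = [l for l in lints if l in MECHANICAL]
    review = [l for l in lints if l not in MECHANICAL]
    return auto_fix, review

auto, review = bucket(["clippy::needless_return", "clippy::too_many_arguments"])
```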
The conversion manifest
conversion_manifest.json is the central artifact shared by all phases.
Every translatable unit in the source codebase — class, method, function, interface,
enum, type alias — becomes a node in the manifest with full metadata.
| Field | Type | Description |
|---|---|---|
| node_id | string | Unique key: module__file__ClassName__methodName |
| source_file | string | Relative TypeScript path, e.g. modules/drawing/src/color.ts |
| source_text | string | Full TypeScript source of this node (up to 2000 chars sent to Claude) |
| node_kind | enum | class \| method \| free_function \| constructor \| enum \| interface \| type_alias |
| type_dependencies | string[] | Node IDs this node references by type — forms the dependency graph |
| call_dependencies | string[] | Node IDs this node calls at runtime |
| topological_order | int | Processing priority — lower is processed first |
| cyclomatic_complexity | int | Number of independent paths through the code |
| idioms_needed | string[] | TS patterns requiring special Rust handling (see Idiom Detection) |
| tier | enum | haiku \| sonnet \| opus: which Claude model to use for translation |
| status | enum | not_started \| in_progress \| converted \| human_review |
| snippet_path | string? | Path to the saved .rs snippet once converted |
| attempt_count | int | Number of translation attempts made |
| last_error | string? | Last cargo check error for debugging and retry context |
Phase A: Analysis pipeline
A1 — AST Extraction (ts-morph)
ts-morph was chosen over raw TypeScript compiler API or Babel specifically
for its cross-file type resolution. When a method in GeomGraph
accepts a parameter of type GeomNode from another module, ts-morph resolves
that type fully and records it as a dependency edge in the manifest. This is what makes
topological ordering accurate.
Every translatable unit is extracted with its full source text, parameter types, return
type, and both type-level and runtime call dependencies. The output is a single
conversion_manifest.json.
A2 — Idiom Detection (ts-morph)
A second ts-morph pass scans each node's AST for 14 patterns known to require
non-trivial Rust translation. Each detected idiom is stored in idioms_needed[]
on the node and used in Phase B to inject relevant translation guidance into the prompt.
A3 — Topological Sort
Nodes are sorted so that when a node is translated, every node it depends on has already been translated. The resulting Rust signatures of dependencies are available to the agent, eliminating guesswork about what the Rust API looks like. This is the single biggest driver of translation quality.
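The sort itself can be done with Kahn's algorithm over type_dependencies. A minimal sketch, with hypothetical node IDs:

```python
from collections import deque

def topo_order(deps: dict[str, list[str]]) -> list[str]:
    # deps maps node_id -> node_ids it depends on.
    # A node becomes ready once all of its dependencies are emitted.
    pending = {n: set(d) for n, d in deps.items()}
    ready = deque(sorted(n for n, d in pending.items() if not d))
    order: list[str] = []
    while ready:
        n = ready.popleft()
        order.append(n)
        for m, d in pending.items():
            if n in d:
                d.remove(n)
                if not d:
                    ready.append(m)
    if len(order) != len(deps):
        raise ValueError("dependency cycle: break it before Phase B")
    return order

order = topo_order({
    "geom__GeomGraph": ["geom__GeomNode"],
    "geom__GeomNode": ["core__Point"],
    "core__Point": [],
})
```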
A4 — Tier Classification
Each node is assigned a translation tier — haiku, sonnet, or opus — based on cyclomatic complexity, idiom count, and node kind. Simpler nodes use cheaper, faster models. The tier is also the escalation path: if Haiku fails after 3 attempts, the node is re-tried at Sonnet, then Opus.
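A heuristic classifier over those three inputs might look like this sketch (the score formula and thresholds are illustrative, not Oxidant's actual cutoffs):

```python
def classify_tier(complexity: int, idiom_count: int, node_kind: str) -> str:
    # Thresholds and weights are illustrative assumptions.
    if node_kind in ("interface", "type_alias", "enum"):
        return "haiku"  # pure declarations carry little logic
    score = complexity + 2 * idiom_count
    if score <= 5:
        return "haiku"
    if score <= 15:
        return "sonnet"
    return "opus"

tier = classify_tier(complexity=12, idiom_count=3, node_kind="method")
```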
A5 — Rust Skeleton Generation
A Python script generates a complete, compilable Rust project. Every module exists.
Every struct has its fields. Every function has a todo!("OXIDANT: not yet translated — {node_id}")
body. The skeleton must pass cargo build before Phase B begins.
The count of remaining todo!() macros is the primary progress metric.
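Counting the remaining stubs is a short scan over the skeleton tree. A sketch, assuming the todo!() message format shown above:

```python
import re
from pathlib import Path

# Matches the skeleton's stub marker; node_id suffix is ignored.
TODO_RE = re.compile(r'todo!\("OXIDANT: not yet translated')

def remaining_todos(src_root: Path) -> int:
    # Progress metric: untranslated stubs left across all .rs files.
    return sum(
        len(TODO_RE.findall(p.read_text(encoding="utf-8")))
        for p in src_root.rglob("*.rs")
    )
```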
Class hierarchy handling
msagl-js has 101 classes that use extends, forming 22 distinct
parent-child hierarchies. The skeleton generator classifies each hierarchy before emitting
any Rust, because the correct Rust representation depends on why the hierarchy
exists — not just that it exists.
Classification is hardcoded in KNOWN_HIERARCHIES inside analysis/hierarchy.py,
validated against a manual Rust port of msagl-js at Routers/msagl-rust.
It is not a heuristic — every hierarchy was checked by hand.
Category A — Discriminated unions → pub enum
These hierarchies have subclasses that add 0–3 unique fields each. TypeScript code
dispatches on them using instanceof checks. In Rust, these must
be enums — a flat struct has no runtime discriminant to match on.
Classified as enum:
SweepEvent, VertexEvent, BasicVertexEvent,
BasicReflectionEvent, Layer, OptimalPacking
The skeleton emits a single pub enum in the base class's .rs file.
Each child class's fields become named fields of a variant. Child class .rs
files emit no struct — they are folded into the parent enum.
TypeScript (11 separate classes)
```typescript
class SweepEvent { }
class AxisCoordinateEvent extends SweepEvent {
  site: Point;
}
class ConeClosureEvent extends SweepEvent {
  coneToCLose: Cone;
  site: Point;
}
class VertexEvent extends SweepEvent { }
class OpenVertexEvent extends VertexEvent {
  vertex: PolylinePoint;
}
```
Rust skeleton (one enum)
```rust
pub enum SweepEvent {
    AxisCoordinateEvent {
        site: Rc<RefCell<crate::point::Point>>,
    },
    ConeClosureEvent {
        cone_to_close: Rc<RefCell<crate::cone::Cone>>,
        site: Rc<RefCell<crate::point::Point>>,
    },
    VertexEvent(crate::vertex_event::VertexEvent),
    // ...
}
```
Sub-hierarchies that are themselves enum bases (e.g. VertexEvent is a child of
SweepEvent but also has its own children) are emitted as their own
pub enum in their module and referenced as a tuple variant from the parent enum.
When method signatures in other modules reference a now-folded child type (e.g.
PortObstacleEvent), a type redirect table
(_enum_child_redirect) maps that name to the parent enum type
(crate::sweep_event::SweepEvent) so cross-module method signatures
still compile.
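The redirect table amounts to a name-to-path map consulted when emitting signatures. A sketch (only the PortObstacleEvent entry comes from the text; the second entry is illustrative):

```python
# Maps folded child-class names to the parent enum's crate path.
# PortObstacleEvent is the example from the text; OpenVertexEvent
# is an illustrative guess at another generated entry.
_enum_child_redirect = {
    "PortObstacleEvent": "crate::sweep_event::SweepEvent",
    "OpenVertexEvent": "crate::vertex_event::VertexEvent",
}

def redirect_type(rust_type: str) -> str:
    # Called when a cross-module signature mentions a now-folded child.
    return _enum_child_redirect.get(rust_type, rust_type)

sig_type = redirect_type("PortObstacleEvent")
```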
Category B — Behavior hierarchies → struct composition
These hierarchies have subclasses that are large independent classes sharing a base.
Each subclass gets its own pub struct with pub base: ParentType
as the first field. When agents convert methods that call super.method(),
they write self.base.method().
Classified as struct:
Algorithm (24 subclasses), Attribute (4),
SegmentBase, LineSweeperBase, GeomObject,
BasicGraphOnEdges, Entity, Port,
DrawingObject, SvgViewerObject, ObstacleSide,
BasicObstacleSide, ConeSide, VisibilityEdge,
KdNode, Packing
TypeScript
```typescript
class SplineRouter extends Algorithm {
  continueOnOverlaps: boolean;
  obstacleCalculator: ShapeObstacleCalculator;
  run() {
    super.run();
    // ...
  }
}
```
Rust skeleton
```rust
pub struct SplineRouter {
    // base field always first, cross-module path
    pub base: crate::algorithm::Algorithm,
    pub continue_on_overlaps: bool,
    pub obstacle_calculator: Rc<RefCell<
        crate::shape_obstacle_calculator::ShapeObstacleCalculator>>,
}
// agent writes: self.base.run();
```
External parents
If a class extends a type that is not in the manifest corpus (e.g. browser built-ins
like EventSource), the skeleton emits a comment rather than a field:
```rust
pub struct MyClass {
    // NOTE: extends EventSource (external — not in corpus)
    pub some_field: String,
}
```
Why this matters: agents have no memory between invocations. Without a hierarchy-aware skeleton, every agent in a family like SweepEvent independently invents its own representation — producing inconsistent, incompatible Rust. The scaffold locks in the right structure before any agent touches the code.
Idiom detection & translation guidance
TypeScript has patterns with no direct Rust equivalent. Oxidant detects these statically in Phase A and injects targeted translation guidance into Phase B prompts. Each node only receives guidance for the idioms present in its own source — no noise, no irrelevant context.
Idiom guidance lives in idiom_dictionary.md — a versioned markdown file with
one section per idiom. Sections are keyed to idiom names so context.py can
look them up at translation time. Adding a new idiom pattern means adding one entry to
detect_idioms.ts and one section to the dictionary.
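A lookup like context.py's can be sketched as a small markdown section parser (the `## <idiom_name>` heading format is an assumption about the dictionary's layout):

```python
def parse_idiom_dictionary(md: str) -> dict[str, str]:
    # Assumes one "## <idiom_name>" heading per section; the real
    # dictionary's heading convention may differ.
    sections: dict[str, str] = {}
    key, lines = None, []
    for line in md.splitlines():
        if line.startswith("## "):
            if key:
                sections[key] = "\n".join(lines).strip()
            key, lines = line[3:].strip(), []
        elif key:
            lines.append(line)
    if key:
        sections[key] = "\n".join(lines).strip()
    return sections

guide = parse_idiom_dictionary(
    "## null_undefined\nMap `T | null` to `Option<T>`.\n"
    "## mutable_shared_state\nUse `Rc<RefCell<T>>`.\n"
)
```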
Example: null_undefined → Option<T>
```typescript
// TypeScript
const node = graph.findNode(id);
if (node == null) return null;
return node.label ?? "untitled";
```

```rust
// Rust (translated output)
let node = graph.borrow().find_node(id);
if node.is_none() { return None; }
let node = node.unwrap();
return Some(node.borrow().label
    .clone()
    .unwrap_or_else(|| "untitled".to_string()));
```
Example: mutable_shared_state → Rc<RefCell<T>>
```typescript
// TypeScript
class GeomGraph {
  boundingBox: Rectangle;
  constructor(graph: Graph) {
    this.boundingBox = new Rectangle();
  }
}
```

```rust
// Rust (skeleton pattern)
pub struct GeomGraph {
    pub base: GeomNode,
    pub bounding_box: Rectangle,
}
// Shared ref: Rc<RefCell<GeomGraph>>
// Mutation: node.borrow_mut().bounding_box = r;
```
Phase B: The translation loop
Phase B is a LangGraph StateGraph that iterates over the manifest
in topological order. Each iteration picks one node, builds a context-rich prompt, calls
Claude Code as a subprocess, verifies the output, and either accepts it or retries.
Prompt construction
Every prompt is assembled fresh for each attempt by context.py:
- The full TypeScript source text of the node
- The Rust function signature from the skeleton (not invented — read from disk)
- Rust snippets of already-converted dependencies (the actual translated code they call into)
- Relevant idiom translation guidance from idiom_dictionary.md
- On retries: the exact cargo check error with line numbers
Subscription auth — no API key
Phase B calls claude --print --output-format json <prompt> as a subprocess,
intentionally stripping ANTHROPIC_API_KEY from the environment before invoking.
This forces Claude Code to use the user's Max subscription rather than billing to the API account.
The model used is determined by the node's tier, not overridden at the subprocess call.
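A sketch of the invocation (the flags come from the text; the env-scrubbing helper name is ours, and error handling is simplified):

```python
import json
import os
import subprocess

def scrubbed_env() -> dict[str, str]:
    # Drop the API key so Claude Code falls back to subscription auth.
    return {k: v for k, v in os.environ.items() if k != "ANTHROPIC_API_KEY"}

def call_claude(prompt: str) -> dict:
    # claude --print --output-format json <prompt>, per the text.
    result = subprocess.run(
        ["claude", "--print", "--output-format", "json", prompt],
        env=scrubbed_env(), capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)
```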
Tier escalation
Each node starts at its assigned tier (haiku / sonnet / opus). If the snippet fails
verification and retries are exhausted, the node is re-tried at the next tier.
A sonnet-tier node that fails 4 attempts is escalated to opus for 5 more attempts.
If opus is exhausted, the node goes to review_queue.json.
Verification: three layers
Every translated snippet passes three checks before being accepted, ordered cheapest-first.
The first failure short-circuits — no point running cargo check if the
snippet still contains todo!().
Why cargo check instead of a Rust AST?
The TypeScript side uses ts-morph (full compiler-backed AST). The Rust side uses
the compiler directly — no Rust AST library like syn.
This is intentional: cargo check performs full type checking including
borrow checker analysis, and the skeleton means every other unconverted function is
still present as a todo!() stub. The new snippet is type-checked against
the real skeleton API, not a mock. Errors include exact line numbers and types —
far more useful than anything a static AST check could produce.
The inject-and-restore pattern
For each cargo check verification, the snippet is injected into the
skeleton at the exact todo!() marker for that node, cargo check
runs on the whole project, then the original todo!() is restored — even if
the check fails. The skeleton is always left in a compilable state after verification.
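The pattern reduces to a try/finally around the check; a sketch with the cargo run injected as a callable so the restore logic stands alone:

```python
from pathlib import Path
from typing import Callable

def check_with_injection(
    skeleton_file: Path,
    marker: str,       # the exact todo!() text for this node
    snippet: str,
    run_cargo_check: Callable[[], bool],
) -> bool:
    # Swap the stub for the candidate snippet, type-check the whole
    # project, then restore the stub regardless of the outcome.
    original = skeleton_file.read_text(encoding="utf-8")
    try:
        skeleton_file.write_text(original.replace(marker, snippet), encoding="utf-8")
        return run_cargo_check()
    finally:
        skeleton_file.write_text(original, encoding="utf-8")
```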
CLI reference
```sh
# Full Phase A (extract AST, detect idioms, sort, classify, generate skeleton)
oxidant phase-a
# Phase A without API tier classification (uses heuristic rules instead)
oxidant phase-a --heuristic-tiers
# Reclassify tiers on an existing manifest (no re-running Phase A)
oxidant classify-tiers --heuristic
# Phase B — translate all nodes
oxidant phase-b
# Phase B smoke test — stop after N nodes
oxidant phase-b --max-nodes 5
# Phase B dry run — print the first node's prompt, no API calls
oxidant phase-b --dry-run
# Phase C — Clippy refinement pass
oxidant phase-c
# Phase D — full build + integration error isolation
oxidant phase-d
oxidant phase-d --manifest conversion_manifest.json
```