Ark Evaluation Packet

RTL Historical Fix Replay Executive Summary v0.1

Buyer-Safe Claim

Ark historical-fix replay evaluates whether ranked RTL review targets overlap with regions that were later modified in public RTL repair commits. This is repair-region overlap evidence, not bug detection, not formal signoff, and not a claim that Ark would have found the original issue independently.

Result Snapshot

Dataset: HWE-bench public RTL repair smoke set, HDL-FixBench-shaped.

Metric                             Result
Public repair cases analyzed       15
Projects covered                   Ibex, CVA6, OpenTitan
Cases with ranked Ark targets      13
Top-3 exact-signal overlap         12 / 13
Top-1 blind-structural overlap     12 / 13
Random baseline mean top-3 rate    0.3219
Median review-compression ratio    349.3333x
Median case runtime                0.2013 s
Max case runtime                   3.7657 s
Misses / no-target cases           2

Interpretation

In this initial smoke set of public historical repairs, Ark usually produced ranked review targets that overlapped identifiers in the eventual repair region: top-3 exact-signal overlap held in 12 of the 13 cases that had ranked targets. The exact-signal top-3 result is compared against a deterministic random-signal baseline over the same candidate designs, whose mean top-3 rate was 0.3219.
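A minimal sketch of how an exact-signal top-3 metric and a seeded random-signal baseline could be computed is shown below. The case record fields (ranked, candidates, repaired), the function names, and the sampling-based baseline are hypothetical and chosen for illustration; this is not Ark's actual scoring code.

```python
import random
from statistics import mean

# Hypothetical case record, not Ark's actual schema:
#   {"ranked": [signal, ...],      # Ark targets, best first
#    "candidates": [signal, ...],  # all signals in the candidate design
#    "repaired": {signal, ...}}    # identifiers touched by the repair commit


def topk_hit(ranked, repaired, k=3):
    """True if any of the first k ranked signals is a repaired identifier."""
    return any(sig in repaired for sig in ranked[:k])


def random_topk_rate(candidates, repaired, k=3, trials=1000, seed=0):
    """Seeded (hence deterministic) estimate of how often k signals drawn
    uniformly from the candidate pool include a repaired identifier."""
    rng = random.Random(seed)
    pool = sorted(candidates)  # fixed order so the seed alone fixes the draws
    k = min(k, len(pool))
    hits = sum(
        any(sig in repaired for sig in rng.sample(pool, k))
        for _ in range(trials)
    )
    return hits / trials


def evaluate(cases, k=3):
    """Return (hits, cases_with_targets, mean_random_rate)."""
    # Cases with no ranked targets are excluded from the denominator.
    with_targets = [c for c in cases if c["ranked"]]
    if not with_targets:
        return 0, 0, 0.0
    hits = sum(topk_hit(c["ranked"], c["repaired"], k) for c in with_targets)
    baseline = mean(
        random_topk_rate(c["candidates"], c["repaired"], k) for c in with_targets
    )
    return hits, len(with_targets), baseline
```

In this sketch, cases without ranked targets are excluded from both the numerator and the denominator, consistent with the summary reporting 12 / 13 rather than 12 / 15. Fixing the seed makes the "random" baseline reproducible from run to run.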

The blind-structural check removes repaired identifiers from the target tokens and asks whether the remaining neighborhood of each ranked target still carries structural context. This helps guard against the objection that the result is merely an echo of identifier names from the diff.
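The blind-structural check can be sketched in the same way. This sketch assumes, hypothetically, that each ranked target carries a file, a line span, and a set of neighborhood tokens, and that a top-1 hit requires both leftover non-repaired tokens and a line-span overlap with a repair hunk. The Target and Hunk types and the overlap rule are illustrative assumptions; Ark's actual definition of structural context may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    # Hypothetical shape of a ranked review target.
    file: str
    start_line: int
    end_line: int
    tokens: frozenset  # identifiers in the target's neighborhood


@dataclass(frozen=True)
class Hunk:
    # Hypothetical line range modified by the later repair commit.
    file: str
    start_line: int
    end_line: int


def blind_tokens(target, repaired_ids):
    """Target tokens with every repaired identifier removed."""
    return target.tokens - repaired_ids


def spans_overlap(target, hunk):
    """True if the target's line span intersects the hunk in the same file."""
    return (target.file == hunk.file
            and target.start_line <= hunk.end_line
            and hunk.start_line <= target.end_line)


def blind_structural_hit(ranked, hunks, repaired_ids, k=1):
    """True if a top-k target keeps non-repaired context tokens
    and its location overlaps a repair hunk."""
    for target in ranked[:k]:
        if blind_tokens(target, repaired_ids) and any(
            spans_overlap(target, h) for h in hunks
        ):
            return True
    return False
```

Stripping the repaired identifiers first means a target cannot score only because it names the signal the fix touched; it must still sit in the repaired region with some other surrounding context.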

Misses

Two cases produced no ranked targets. These are treated as useful diligence signals, not hidden failures.

Call-Safe Language

We have started retrospective validation on public RTL repair history. In a 15-case HWE-bench public repair smoke set across Ibex, CVA6, and OpenTitan, Ark produced ranked targets in 13 cases. Top-3 exact-signal overlap with repaired identifiers was 12/13 (about 0.92), compared with a random baseline mean top-3 rate of 0.322. Median review compression was about 349x. We treat this as repair-region overlap evidence, not bug detection or signoff. In this local smoke run, median case runtime was about 0.2 seconds.

Boundaries

review.