Ark Evaluation Packet

RTL Historical Fix Replay Executive Summary v0.1

Buyer-Safe Claim

Ark historical-fix replay evaluates whether ranked RTL review targets overlap with regions that were later modified in public RTL repair commits. This is repair-region overlap evidence, not bug detection, not formal signoff, and not a claim that Ark would have found the original issue independently.

Dataset: HWE-bench public RTL repair smoke set, HDL-FixBench-shaped.

Metric	Result
Public repair cases analyzed	15
Projects covered	Ibex, CVA6, OpenTitan
Cases with ranked Ark targets	13
Top-3 exact-signal overlap	12 / 13
Top-1 blind-structural overlap	12 / 13
Random baseline mean top-3 rate	0.3219
Median review-compression ratio	349.3333x
Median case runtime	0.2013s
Max case runtime	3.7657s
Misses / no-target cases	2
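The table rows can be read as simple aggregates over per-case results. The sketch below shows one way such a roll-up could be computed. It assumes a hypothetical per-case record: `CaseResult` and its fields are illustrative names, not Ark's actual output schema. The review-compression ratio is taken as a per-case input because this packet does not define how it is computed. Median runtime is taken over all analyzed cases, which is also an assumption.

```python
from dataclasses import dataclass
from statistics import median
from typing import Optional


@dataclass
class CaseResult:
    # Hypothetical per-case record; field names are illustrative only.
    case_id: str
    project: str
    has_targets: bool              # did Ark produce any ranked targets?
    top3_exact_hit: bool           # exact-signal overlap within the top 3
    top1_blind_hit: bool           # blind-structural overlap at rank 1
    compression: Optional[float]   # review-compression ratio (definition assumed external)
    runtime_s: float               # wall-clock runtime for the case, in seconds


def summarize(cases: list[CaseResult]) -> dict:
    """Roll per-case results up into table-style summary metrics."""
    ranked = [c for c in cases if c.has_targets]
    return {
        "cases_analyzed": len(cases),
        "projects": sorted({c.project for c in cases}),
        "cases_with_targets": len(ranked),
        # Overlap rates are reported over cases that produced ranked targets.
        "top3_exact": f"{sum(c.top3_exact_hit for c in ranked)} / {len(ranked)}",
        "top1_blind": f"{sum(c.top1_blind_hit for c in ranked)} / {len(ranked)}",
        "median_compression": median(
            c.compression for c in ranked if c.compression is not None
        ),
        # Assumption: runtime statistics cover every analyzed case.
        "median_runtime_s": median(c.runtime_s for c in cases),
        "max_runtime_s": max(c.runtime_s for c in cases),
        "no_target_cases": len(cases) - len(ranked),
    }
```

Reporting overlap as "hits / cases with targets" keeps no-target cases visible as their own row rather than silently counting them as overlap misses.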

In this initial public historical repair smoke set, Ark usually produced ranked review targets that overlapped identifiers in the eventual repair region (top-3 exact-signal overlap in 12 of the 13 cases with ranked targets). The exact-signal top-3 result is compared against a deterministic random-signal baseline over the same candidate designs.
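Here is a minimal sketch of the top-3 exact-signal comparison. It assumes each case supplies three inputs: Ark's ranked signal names, the full candidate signal set for the design, and the identifiers touched by the repair commit. The function names are hypothetical. The seeded sampling shown here is one plausible reading of a "deterministic random-signal baseline", not necessarily the procedure used to produce the 0.3219 figure.

```python
import random
from statistics import mean


def top_k_exact_hit(ranked_targets, repaired_ids, k=3):
    """True if any of the first k ranked signal names exactly matches a repaired identifier."""
    return any(t in repaired_ids for t in ranked_targets[:k])


def random_top_k_rate(candidates, repaired_ids, k=3, trials=1000, seed=0):
    """Fraction of seeded random k-signal draws that hit a repaired identifier.

    Hypothetical baseline: draws k distinct signals uniformly from the
    design's candidate set, repeated `trials` times with a fixed seed.
    """
    rng = random.Random(seed)      # fixed seed keeps the baseline deterministic
    pool = sorted(candidates)      # sort so draws do not depend on set iteration order
    k = min(k, len(pool))
    hits = sum(
        top_k_exact_hit(rng.sample(pool, k), repaired_ids, k) for _ in range(trials)
    )
    return hits / trials


def mean_random_baseline(cases, k=3, trials=1000, seed=0):
    """Mean per-case random top-k rate; `cases` is a list of (candidates, repaired_ids)."""
    return mean(random_top_k_rate(c, r, k, trials, seed) for c, r in cases)
```

Fixing the seed makes the baseline reproducible across runs, and sorting the candidate pool first removes any dependence on set iteration order. For a uniform draw, the same per-case rate could also be computed exactly as 1 - C(n - m, k) / C(n, k), where n is the number of candidate signals and m is the number of repaired identifiers among them.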

The blind-structural check removes repaired identifiers from target tokens and asks whether the remaining ranked-target neighborhood still carries structural context. This helps guard against the objection that the result merely echoes identifier names from the repair diff.
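The blind-structural check can be expressed under a few stated assumptions. Each ranked target is represented as a set of tokens, for example its own name plus neighboring signals. The repair region comes with a set of structurally nearby identifiers. Both representations are hypothetical, as are the names `blind_tokens` and `top1_blind_structural_hit`. The packet does not specify Ark's actual criterion.

```python
def blind_tokens(target_tokens, repaired_ids):
    """Drop repaired identifiers from a target's token set (the 'blinding' step)."""
    return {t for t in target_tokens if t not in repaired_ids}


def top1_blind_structural_hit(ranked_target_tokens, repaired_ids, repair_context):
    """Hypothetical top-1 blind-structural check.

    ranked_target_tokens: list of token sets, one per ranked target, best first.
    repaired_ids: identifiers modified by the repair commit.
    repair_context: identifiers assumed to sit structurally near the repair
        (for example, other signals in the same always block).
    """
    if not ranked_target_tokens:
        return False  # no ranked targets, so nothing to check
    remaining = blind_tokens(ranked_target_tokens[0], repaired_ids)
    # Strip repaired names from the context too, so they cannot re-enter the match.
    context = set(repair_context) - set(repaired_ids)
    return bool(remaining & context)
```

Because repaired identifiers are removed from both sides before intersecting, a target whose only link to the repair is the repaired name itself scores as a miss.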

Two cases produced no ranked targets. These are treated as useful diligence signals, not hidden failures.


We have started retrospective validation on public RTL repair history. In a 15-case HWE-bench-shaped smoke set across Ibex, CVA6, and OpenTitan, Ark produced ranked targets in 13 cases. Top-3 exact-signal overlap with repaired identifiers was 12/13, compared with a random baseline mean top-3 rate of 0.322. Median review compression was about 349x. We treat this as repair-region overlap evidence, not bug detection or signoff. In this local smoke run, median case runtime was about 0.2 seconds.
