# Guava CFR Classwise Parity Plan

## Summary

This plan defines a class-by-class execution strategy for narrowing the differences between Rust/IPND output and CFR output on `external_jars/guava-33.5.0-jre.jar`.

Use the current report as the queue source:

This changes the working strategy from "largest family first" to "lowest-diff class first," so each small class becomes a fast closed loop and repeated fixes can compound upward.

## Implementation Changes

### Queue generation

Generate the working queue with:

```bash
awk -F '\t' 'NR>1 && $2=="both" && $5=="0" {print $7 "\t" $1 "\t" $8}' \
  tmp/compare-guava-33.5.0-jre-quality/reports/per_class_enriched.tsv | sort -n
```

Regenerate this queue after every merged fix, not only at tranche boundaries. A single general fix (e.g. modifier ordering) can close many classes across all tranches at once, so investigating classes serially against a stale queue wastes effort. The cost is a single `awk | sort`, which is negligible.
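Because one fix can close classes anywhere in the queue, it can help to snapshot the queue before regenerating it and list what dropped out. The sketch below assumes the queue command's output is saved to a file (the file names and the function name `closed_since` are hypothetical) and relies only on the class FQN sitting in column 2 of that output.

```bash
# Hypothetical helper: print classes present in an old queue snapshot but
# absent from a regenerated queue. Queue lines are assumed to look like the
# queue command's output: <diff>\t<class_fqn>\t<extra>.
closed_since() {
  # $1 = old queue snapshot, $2 = regenerated queue
  # The new queue is read first (ARGV[1]), so an empty new queue still works;
  # old entries whose class no longer appears are then printed in old order.
  awk -F '\t' '
    FILENAME == ARGV[1] { still_open[$2] = 1; next }
    $2 != "" && !($2 in still_open) { print $2 }
  ' "$2" "$1"
}

# Usage (file names are illustrative):
#   cp queue.tsv queue.prev.tsv
#   <queue command above> > queue.tsv
#   closed_since queue.prev.tsv queue.tsv
```

A class can also leave the queue by no longer appearing as `both` in the report, so each listed class is worth a quick check against the fresh report before it is counted as closed.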

### Classify before fixing (triage pass)

Before implementing anything, do a triage pass over the current tranche and record each class's delta classification in:

`tmp/compare-guava-33.5.0-jre-quality/reports/triage.tsv`

Columns: `class_fqn`, `normalized_diff_lines`, `family`, `notes`.

`family` is one of:

This surfaces the dominant family in the low-diff tranche and may justify reordering within a tranche to batch by family: one general fix can then close N classes at once, instead of N classes being investigated serially.
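To see which family dominates, the `family` column of `triage.tsv` can be tallied directly. A minimal sketch, assuming the file is tab-separated with a header row in the column order given above; `family_counts` is a hypothetical name.

```bash
# Hypothetical helper: count triage.tsv rows per family, most common first.
# Assumes a tab-separated file with a header row and columns
# class_fqn, normalized_diff_lines, family, notes (family = column 3).
family_counts() {
  awk -F '\t' '
    NR > 1 && $3 != "" { count[$3]++ }
    END { for (f in count) print count[f] "\t" f }
  ' "$1" | sort -t "$(printf '\t')" -k1,1nr -k2,2
}

# Usage: family_counts tmp/compare-guava-33.5.0-jre-quality/reports/triage.tsv
```

The top line names the family most worth batching; ties are broken alphabetically so the output is stable between runs.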

### First low-diff tranche

Start with the first low-diff tranche, in this exact order. Confirm that this list is exhaustive for `normalized_diff_lines <= 10` against the current report; if it is only a sample, regenerate it from the queue command and replace this list.

(Count: 13. Update this number if the tranche is regenerated.)
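One way to confirm the tranche is exhaustive is to filter the saved queue by the `<= 10` threshold and compare its line count with the count above. This sketch assumes the queue output is saved to a file whose first column is `normalized_diff_lines` (inferred from the queue command printing the report's `$7` first); `tranche_list` is a hypothetical helper.

```bash
# Hypothetical helper: print saved-queue entries whose first column (assumed
# to be normalized_diff_lines, as emitted by the queue command) is at or
# below a threshold, defaulting to 10.
tranche_list() {
  # $1 = saved queue file, $2 = optional threshold
  awk -F '\t' -v max="${2:-10}" '$1 != "" && ($1 + 0) <= (max + 0)' "$1"
}

# Usage: compare this number with the documented tranche count.
#   tranche_list queue.tsv 10 | wc -l
```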

### Per-class workflow

For each class:

### Per-class time budget and skip rule

To prevent rabbit holes:

Record skipped classes in:

`tmp/compare-guava-33.5.0-jre-quality/reports/skipped.tsv`

Columns: `class_fqn`, `normalized_diff_lines`, `family`, `reason`, `date`.

Skipped classes are revisited after later general fixes land, since they often close for free.
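Appending to `skipped.tsv` and later finding skipped classes that have left the queue can both be scripted. A minimal sketch, assuming `skipped.tsv` has no header row (a `class_fqn` header line is ignored if present) and the queue is saved to a file as produced by the queue command; `skip_class` and `revisit_skipped` are hypothetical names.

```bash
# Hypothetical helper: append one skipped.tsv row, stamping today's date.
# Column order follows the plan: class_fqn, normalized_diff_lines, family,
# reason, date.
skip_class() {
  # $1 = skipped.tsv, $2 = class FQN, $3 = diff lines, $4 = family, $5 = reason
  printf '%s\t%s\t%s\t%s\t%s\n' "$2" "$3" "$4" "$5" "$(date +%Y-%m-%d)" >> "$1"
}

# Hypothetical helper: list skipped classes that are no longer in a freshly
# regenerated queue file (<diff>\t<class_fqn>\t...). Such a class either now
# matches or no longer appears as "both" in the report, so re-check it.
revisit_skipped() {
  # $1 = skipped.tsv, $2 = regenerated queue file
  awk -F '\t' '
    FILENAME == ARGV[1] { queued[$2] = 1; next }
    $1 != "" && $1 != "class_fqn" && !($1 in queued) { print $1 }
  ' "$2" "$1"
}

# Usage (queue file name is illustrative):
#   skip_class tmp/compare-guava-33.5.0-jre-quality/reports/skipped.tsv \
#     com.example.Foo 7 some-family "time budget exceeded"
#   revisit_skipped tmp/compare-guava-33.5.0-jre-quality/reports/skipped.tsv queue.tsv
```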

### Tranches

Regenerate the queue after every merged fix and continue from the new lowest-diff non-match. Tranche boundaries are reporting checkpoints, not gating barriers.
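Picking the new lowest-diff non-match while honoring the skip list can be scripted the same way. A sketch, assuming a saved queue file that is already sorted by the queue command and an existing (possibly empty) `skipped.tsv`; `next_class` is a hypothetical name.

```bash
# Hypothetical helper: print the first (lowest-diff) entry of a saved, sorted
# queue whose class is not listed in skipped.tsv. skipped.tsv must exist,
# but may be empty.
next_class() {
  # $1 = saved queue file, $2 = skipped.tsv
  awk -F '\t' '
    FILENAME == ARGV[1] { skipped[$1] = 1; next }
    $2 != "" && !($2 in skipped) { print; exit }
  ' "$2" "$1"
}

# Usage: next_class queue.tsv tmp/compare-guava-33.5.0-jre-quality/reports/skipped.tsv
```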

## Validation Plan

### Primary gate

Run the full Guava parity gate after each material fix or small batch:

```bash
IPND_REUSE_CFR=1 bash scripts/run_source_cfr_parity.sh external_jars/guava-33.5.0-jre.jar
```

`IPND_REUSE_CFR=1` reuses the previously captured CFR output to skip re-decompilation. Set it to `0` whenever a fix touches CFR-output ingestion (the parser, the normalizer, or anything else that consumes CFR text), since stale CFR output would mask both improvements and regressions.
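The reuse decision can be derived from the list of changed files. This sketch is illustrative only: `CFR_INGEST_PATTERN` is a placeholder regular expression, not this repository's real layout, and `reuse_cfr_flag` is a hypothetical helper. It prints `0` when any changed path matches the pattern and `1` otherwise.

```bash
# Placeholder pattern (assumption): replace with the paths that actually hold
# the CFR-output parser/normalizer in this repository.
CFR_INGEST_PATTERN='cfr_(parse|normaliz)'

# Hypothetical helper: read changed file paths on stdin and print the
# IPND_REUSE_CFR value to use.
reuse_cfr_flag() {
  if grep -Eq "$CFR_INGEST_PATTERN"; then
    echo 0  # ingestion touched: re-run CFR so stale output cannot mask changes
  else
    echo 1  # ingestion untouched: captured CFR output is safe to reuse
  fi
}

# Usage:
#   IPND_REUSE_CFR="$(git diff --name-only HEAD~1 | reuse_cfr_flag)" \
#     bash scripts/run_source_cfr_parity.sh external_jars/guava-33.5.0-jre.jar
```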

### Metrics tracked after every tranche

The last two guard against "fix" commits that hold `normalized_match` flat while inflating raw diffs on currently clean classes.
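A snapshot of the headline normalized numbers can be taken straight from the report. The column meanings here are inferred from the queue command (`$2 == "both"` for classes both tools emitted, `$5 == "0"` for a normalized mismatch, `$7` as normalized diff lines) and should be checked against the report header; `report_metrics` is a hypothetical name.

```bash
# Hypothetical helper: summarize a per_class_enriched.tsv report.
# Column meanings are inferred from the queue command, not confirmed:
#   $2 == "both"  class emitted by both CFR and IPND
#   $5 == "0"     normalized mismatch (anything else counts as a match)
#   $7            normalized diff lines
report_metrics() {
  awk -F '\t' '
    NR > 1 && $2 == "both" {
      both++
      if ($5 != "0") matched++
      diff += $7
    }
    END {
      printf "both=%d normalized_match=%d normalized_diff_lines=%d\n", both, matched, diff
    }
  ' "$1"
}

# Usage: report_metrics tmp/compare-guava-33.5.0-jre-quality/reports/per_class_enriched.tsv
```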

### Acceptance criteria for tranche 1

### Secondary corpus

Run the secondary corpus after tranche 1, not only at the end:

```bash
IPND_REUSE_CFR=1 bash scripts/run_source_cfr_parity.sh external_jars/commons-lang3-3.14.0.jar
```

Rationale: a fix that helps Guava but regresses commons-lang3 should be caught after 13 classes, not after 80+. Track the same metrics; require no regression on commons-lang3 to advance to the next tranche.
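The "no regression" rule can be checked mechanically by comparing a baseline report with a candidate report: any class that matched after normalization before, and does not now, is a regression. A sketch under the same inferred column meanings as the report summary (`$1` class FQN, `$2` presence, `$5` match flag); `no_regression` is a hypothetical name, and it covers only normalized matches, not raw diff counts.

```bash
# Hypothetical helper: exit non-zero if any class that matched after
# normalization in the baseline report does not match in the candidate.
no_regression() {
  # $1 = baseline report, $2 = candidate report (must be different paths)
  awk -F '\t' '
    FNR == 1 { next }  # skip the header row of each file
    FILENAME == ARGV[1] { if ($2 == "both" && $5 != "0") was[$1] = 1; next }
    $2 == "both" && $5 != "0" { now[$1] = 1 }
    END {
      for (c in was)
        if (!(c in now)) { print "regressed: " c; bad = 1 }
      exit (bad ? 1 : 0)
    }
  ' "$1" "$2"
}

# Usage (paths are illustrative):
#   no_regression baseline/per_class_enriched.tsv per_class_enriched.tsv || echo "do not advance"
```

Classes missing from the candidate report entirely are also reported, since they never reach the `now` set.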

### Documented exception schema

"Accepted CFR-only artifact" must be defined concretely, or it drifts into "mentioned somewhere in a commit message." Maintain:

`tmp/compare-guava-33.5.0-jre-quality/reports/accepted_cfr_artifacts.md`

Each entry uses this schema:

````markdown
### <class FQN>

- Date: YYYY-MM-DD
- Family: <one of the family labels>
- Category: <e.g. CFR-only comment, CFR ternary preference, CFR cast-style preference>
- Diff snippet:
  ```diff
  <minimal CFR-vs-IPND diff that remains>
  ```
````

Entries without a revisit trigger are not acceptable.
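Schema conformance is easy to check mechanically. A sketch, assuming entries start with a `### ` heading and fields are top-level `- Label:` lines as in the schema; `check_artifacts` and `REQUIRED_FIELDS` are hypothetical names. The schema does not spell out the revisit-trigger label, so it is left out of `REQUIRED_FIELDS` here and would need to be added once its exact label is fixed.

```bash
# Hypothetical list of required field labels; extend it with the
# revisit-trigger label once that label is fixed.
REQUIRED_FIELDS='Date|Family|Category|Diff snippet'

# Hypothetical helper: report every "### <class FQN>" entry that lacks one of
# the required "- Label:" lines; exits non-zero if any entry is incomplete.
check_artifacts() {
  awk -v fields="$REQUIRED_FIELDS" '
    function flush(   i) {
      if (entry != "")
        for (i = 1; i <= nf; i++)
          if (!(want[i] in seen)) { print entry ": missing " want[i]; bad = 1 }
      split("", seen)  # clear the labels seen for the previous entry
    }
    BEGIN { nf = split(fields, want, "|") }
    /^### / { flush(); entry = substr($0, 5); next }
    /^- [^:]+:/ { seen[substr($0, 3, index($0, ":") - 3)] = 1 }
    END { flush(); exit (bad ? 1 : 0) }
  ' "$1"
}

# Usage: check_artifacts tmp/compare-guava-33.5.0-jre-quality/reports/accepted_cfr_artifacts.md
```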

## Public API / Interface Changes

No public API changes are planned.

Expected implementation areas are internal only:

- source/declaration rendering
- anonymous and synthetic helper reconstruction
- generic/type reconstruction
- body rewrite canonicalization when needed by a low-diff class

## Assumptions

- "CRF" in the request means CFR.
- The current Guava report is authoritative for a given tranche; **a fresh full rerun is mandatory at tranche boundaries**, and recommended after every merged fix. The queue must not be consumed against a stale report across a tranche boundary.
- The preferred strategy is class-by-class closure from smallest normalized diff upward, even if a larger class has more total impact. Within a tranche, however, batching by `family` is allowed and encouraged when it lets one general fix close several classes at once.
- "Identical or really close" means normalized equality first, raw equality where practical, and documented exceptions only for CFR-specific non-semantic output, recorded under the schema above.
- Tests for emitter changes live at the IR/emitter level. Golden-file tests keyed on specific Guava classes are not a substitute and are explicitly disallowed.