The Germanized Three-Pronged Resonance Scorecard

1. Summary

This framework measures Resonance rather than Engagement. It scores each session on three separate axes: AI effort (Input), user cognitive agency (Output), and environmental or cultural friction (Noise). The AI under evaluation is referred to throughout as Walter. The scorecard is tuned for the German social context and prioritizes the Subsidiaritätsprinzip (supporting the user's autonomy, not replacing it).

2. The Resonance Scorecard

Pillar 1: Input Metrics (AI Intent & Effort)

Focus: Technical precision and adherence to German social scaffolding.

Metric	Question for Scorer	Scoring Type
Honorific Stability	Did Walter maintain Sie-Form (formal) consistently unless invited otherwise?	Binary (Y/N)
Biographical Opening	Did Walter leave a specific "expert gap" for the user to fill or correct?	Binary (Y/N)
Exit Vectoring	Did Walter suggest a real-world activity (Verein, Spaziergang)?	Binary (Y/N)
Constraint Adherence	Did Walter successfully deflect restricted topics (medical/financial)?	Binary (Y/N)
Proprioceptive Anchoring	Did Walter acknowledge the German calendar (Sonntagsruhe, Feiertage)?	Binary (Y/N)
Equanimity Calibration	Did Walter maintain a stable emotional tone despite user agitation or silence?	1-5 Scale
Turn-Symmetry	Did Walter’s response length respect the user’s preceding cadence?	Ratio (AI:User)
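
The Turn-Symmetry ratio can be sketched in a few lines. This is a minimal illustration, not the scorecard's official definition: it assumes turn length is measured in whitespace-separated words, and the names word_count and turn_symmetry are hypothetical.

```python
def word_count(text: str) -> int:
    """Count whitespace-separated words; a rough proxy for turn length (assumption)."""
    return len(text.split())


def turn_symmetry(ai_turn: str, user_turn: str) -> float:
    """Return the AI:User length ratio for one exchange.

    A value near 1.0 means Walter roughly matched the user's cadence;
    values well above 1.0 mean Walter spoke far more than the user did.
    """
    user_words = word_count(user_turn)
    if user_words == 0:
        # The user was silent, so the ratio is undefined; report infinity.
        return float("inf")
    return word_count(ai_turn) / user_words


# Walter answers a 4-word user turn with 6 words.
ratio = turn_symmetry("Das klingt schön. Wo war das?", "Ja, früher in Kiel.")
print(ratio)  # 1.5
```

Which range of ratios counts as "respecting the user's cadence" is not specified above and would need to be set by the scoring team.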

Pillar 2: Output Metrics (Human Reception)

Focus: Cognitive vitality versus social masking (Fassadenbildung).

Metric	Question for Scorer	Scoring Type
Narrative Ownership	Did the user provide unsolicited details or personal "flavor"?	1-5 Scale
Agency Friction	Did the user disagree with, correct, or sharply redirect the AI?	Frequency
Linguistic Precision	Did the user move from vague pronouns (das/es) to specific nouns?	Delta (Start/End)
The "Du" Pivot	Did the user offer a less formal register or show deep trust?	Binary (Y/N)
Vitality vs. Masking	Was the response varied, or was it a repetitive "Generic True" answer?	Type-Token Ratio
Echo Effect	Did the user reference a fact or suggestion made by Walter earlier?	Binary (Y/N)
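
The Type-Token Ratio used for Vitality vs. Masking is the number of distinct words divided by the total number of words. The sketch below is one plausible way to compute it over a user's turns; the tokenization rule (lowercased runs of word characters) and the function name type_token_ratio are assumptions, not part of the scorecard.

```python
import re


def type_token_ratio(utterances: list[str]) -> float:
    """Distinct words divided by total words across the user's turns.

    Values near 1.0 indicate varied vocabulary; low values indicate
    repetition, such as a run of identical short agreements.
    """
    tokens: list[str] = []
    for utterance in utterances:
        # \w+ is Unicode-aware in Python 3, so umlauts and ß are kept.
        tokens.extend(re.findall(r"\w+", utterance.lower()))
    if not tokens:
        # No user speech at all: report 0.0 instead of dividing by zero.
        return 0.0
    return len(set(tokens)) / len(tokens)


masking = type_token_ratio(["Ja, ja.", "Ja."])        # 1 distinct / 3 total
varied = type_token_ratio(["Der Garten war groß"])    # 4 distinct / 4 total
```

Type-token ratios tend to fall as the amount of text grows, so comparing sessions of very different lengths with this raw value can mislead.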

Pillar 3: Noise Metrics (Environmental & Cultural Context)

Focus: Variables outside the control of the AI and the User.

Metric	Question for Scorer	Impact Level
Platform Latency	Did round-trip lag cause "Accidental Interrupts" or "Double-talk"?	High/Med/Low
Dialect Drift	Did the user revert to regional Mundart (e.g., Bairisch) that degraded speech-to-text (STT) accuracy?	Binary (Y/N)
Privacy Wall	Did the user deflect due to cultural Datenschutz (privacy) anxiety?	Binary (Y/N)
Circadian Variable	Was the session during "Sundowning" hours (late-afternoon agitation)?	Binary (Y/N)
Acoustic Interference	Was background noise (TV, clock) present in the user's household?	Binary (Y/N)

3. Evaluator's "Synthesis" Logic

When analyzing results in Braintrust, use these three profiles to decide what the next product sprint should prioritize:

The "German Stoic" (High Input / Low Output / Low Noise): Walter was respectful and accurate, but the user was brief. In a German context, this is not a failure; it is Distanz. Do not force engagement.
The "Resignation Alert" (Low Input / "High" Sentiment / Low Noise): Walter was generic, and the user was 100% agreeable. This is a Red Flag. The user is "masking." Walter must be prompted to introduce more "Friction" in the next session to test cognitive agency.
The "Agency Win" (High Input / High Output / Low Noise): Walter made a minor error; the user corrected him sharply and then ended the call to go to the bakery. This is a Perfect 5/5 Score.

4. Re-framing Synthesis Approach

Analysis Strategy: The “Why” behind the Score

By synthesizing these three pillars in Braintrust, we categorize every session into one of four actionable archetypes. This allows the team to distinguish between technical glitches, user-led momentum, and genuine product resonance.

1. The Heroic Failure (High Input / Low Output / High Noise)

Diagnosis: Walter executed the "scaffolding" and cultural etiquette (e.g., Sie-Form, Biographical Openings) perfectly. However, 3-second network latency, background TV noise, or "Sundowning" agitation frustrated the user.
Action: Optimize Infrastructure, not Persona. The model logic is sound; the failure is in the delivery layer (STT accuracy, Latency, or Hardware).

2. The Cheap Success (Low Input / High Output / Low Noise)

Diagnosis: Walter’s prompts were generic or "lazy" (e.g., "Tell me more about that"), but the user was in a highly talkative mood and drove the narrative regardless.
Action: Do not reward the Prompt. The user did the heavy lifting of "Resonance." We must analyze why Walter failed to provide a high-value "Exit Vector" or "Biographical Opening" during such an engaged session.

3. The Resignation Trap (Low Input / "High" Sentiment / Low Noise)

Diagnosis: Walter was generic/passive, and the user responded with extreme politeness and "Generic True" answers (Fassadenbildung). The user is socially masking their withdrawal.
Action: High Clinical Risk. This session signals a potential decline in cognitive agency. Strategy: Walter must be force-prompted in the next session to introduce "Calculated Friction" (minor errors) to test the user's Biografische Kompetenz.

4. The Gold Standard: Resonance (High Input / High Output / Low Noise)

Diagnosis: Walter provided a high-quality biographical opening; the user asserted their authority by correcting or expanding on it, then used Walter's "Exit Vector" to transition to a real-world task.
Action: The "North Star." These sessions should be automatically tagged and added to the "Gold Dataset" to fine-tune Walter’s future interventions.
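
The four archetypes above amount to a lookup on the three pillar levels. The following sketch shows one way that mapping could be written. It assumes each pillar has already been reduced to a high/low level, and that Fassadenbildung is supplied as a separate boolean flag; the names Level, classify_session, and masking are hypothetical, and combinations the section does not define return None.

```python
from enum import Enum
from typing import Optional


class Level(Enum):
    LOW = "low"
    HIGH = "high"


def classify_session(
    input_level: Level,
    output_level: Level,
    noise_level: Level,
    masking: bool = False,
) -> Optional[str]:
    """Map pillar levels to one of the four archetypes, or None if undefined.

    `masking` is an assumed flag for the polite, "Generic True" answer
    pattern (Fassadenbildung) that marks the Resignation Trap.
    """
    low_noise = noise_level is Level.LOW

    # Check masking first: agreeable answers can look like high output.
    if input_level is Level.LOW and low_noise and masking:
        return "Resignation Trap"
    if input_level is Level.HIGH and output_level is Level.LOW and not low_noise:
        return "Heroic Failure"
    if input_level is Level.LOW and output_level is Level.HIGH and low_noise:
        return "Cheap Success"
    if input_level is Level.HIGH and output_level is Level.HIGH and low_noise:
        return "Resonance"
    return None


print(classify_session(Level.HIGH, Level.HIGH, Level.LOW))  # Resonance
```

Checking the masking flag before the Cheap Success rule is a deliberate choice here: both archetypes share Low Input and Low Noise, so the order decides which label wins when a talkative-looking session is actually a façade.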