This framework measures Resonance rather than Engagement. It scores each session by separating three components: AI effort (Input), user cognitive agency (Output), and environmental or cultural friction (Noise). It is tuned specifically for the German social context and prioritizes the Subsidiaritätsprinzip (the principle of subsidiarity: supporting autonomy, not replacing it).
Focus: Technical precision and adherence to German social scaffolding.
| Metric | Question for Scorer | Scoring |
|---|---|---|
| Honorific Stability | Did Walter maintain Sie-Form (formal) consistently unless invited otherwise? | Binary (Y/N) |
| Biographical Opening | Did Walter leave a specific "expert gap" for the user to fill or correct? | Binary (Y/N) |
| Exit Vectoring | Did Walter suggest a real-world activity (Verein, Spaziergang)? | Binary (Y/N) |
| Constraint Adherence | Did Walter successfully deflect restricted topics (medical/financial)? | Binary (Y/N) |
| Proprioceptive Anchoring | Did Walter acknowledge the German calendar (Sonntagsruhe, Feiertage)? | Binary (Y/N) |
| Equanimity Calibration | Did Walter maintain a stable emotional tone despite user agitation or silence? | 1-5 Scale |
| Turn-Symmetry | Did Walter’s response length respect the user’s preceding cadence? | Ratio (AI:User) |
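The Turn-Symmetry ratio in the table above could be computed roughly as follows. This is a minimal sketch, not the framework's actual scorer: the `turn_symmetry` function, the `(speaker, text)` transcript format, and the choice of whitespace-separated word counts as the length measure are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple


def turn_symmetry(turns: List[Tuple[str, str]]) -> Optional[float]:
    """Mean AI:User word-count ratio over a session (hypothetical helper).

    `turns` is an ordered list of (speaker, text) pairs, where speaker is
    "ai" or "user". Each AI turn is compared with the most recent user turn.
    """
    ratios = []
    prev_user_words = None
    for speaker, text in turns:
        n_words = len(text.split())
        if speaker == "user":
            prev_user_words = n_words
        elif speaker == "ai" and prev_user_words:
            # Skip AI turns with no preceding (non-empty) user turn.
            ratios.append(n_words / prev_user_words)
    if not ratios:
        return None
    return sum(ratios) / len(ratios)
```

A value near 1.0 would suggest the AI is matching the user's cadence, while a much larger value would suggest it is out-talking the user. Any threshold for what counts as "respecting" the cadence would have to be set by the team.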
Focus: Cognitive vitality versus social masking (Fassadenbildung).
| Metric | Question for Scorer | Scoring |
|---|---|---|
| Narrative Ownership | Did the user provide unsolicited details or personal "flavor"? | 1-5 Scale |
| Agency Friction | Did the user disagree with, correct, or sharply redirect the AI? | Frequency |
| Linguistic Precision | Did the user move from vague pronouns (das/es) to specific nouns? | Delta (Start/End) |
| The "Du" Pivot | Did the user offer a less formal register or show deep trust? | Binary (Y/N) |
| Vitality vs. Masking | Was the response varied, or was it a repetitive "Generic True" answer? | Type-Token Ratio |
| Echo Effect | Did the user reference a fact or suggestion made by Walter earlier? | Binary (Y/N) |
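For the Vitality vs. Masking row, a type-token ratio (distinct words divided by total words) can be sketched as below. The `type_token_ratio` name and the simple regex tokenizer are assumptions for illustration. Type-token ratio is also known to fall as texts get longer, so scores are only comparable across user turns of similar length.

```python
import re


def type_token_ratio(text: str) -> float:
    """Distinct words / total words for one user response (hypothetical helper).

    Lowercases the text and tokenizes on word characters; in Python 3 the
    `\\w` class is Unicode-aware for str patterns, so umlauts and ß are kept.
    """
    tokens = re.findall(r"\w+", text.lower())
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)
```

A repetitive answer such as "Ja, ja, das stimmt. Ja." has five tokens but only three distinct words, giving 0.6. A more varied response would score closer to 1.0.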
Focus: Variables outside the control of the AI and the User.
| Metric | Question for Scorer | Impact Level |
|---|---|---|
| Platform Latency | Did round-trip lag cause "Accidental Interrupts" or "Double-talk"? | High/Med/Low |
| Dialect Drift | Did the user revert to regional Mundart (e.g., Bairisch) that degraded STT? | Binary (Y/N) |
| Privacy Wall | Did the user deflect due to cultural Datenschutz (privacy) anxiety? | Binary (Y/N) |
| Circadian Variable | Was the session during "Sundowning" hours (late-afternoon agitation)? | Binary (Y/N) |
| Acoustic Interference | Was background noise (TV, clock) present in the German household? | Binary (Y/N) |
When analyzing the results in Braintrust, use these three profiles to determine the next product sprint:
By synthesizing these three pillars in Braintrust, we categorize every session into one of four actionable archetypes. This allows the team to distinguish between technical glitches, user-led momentum, and genuine product resonance.