🧪 HYPOTHESES TRACKER

Philosophy: Every insight starts from a hypothesis. Each hypothesis must be PROVEN or REJECTED with data. Rule: NEVER accept a hypothesis without statistical evidence.
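The rule above does not say which test counts as statistical evidence. For per-user metrics such as Recall@10, one reasonable choice (assumed here, not taken from this tracker) is a paired bootstrap. It resamples users, recomputes the mean candidate-minus-baseline difference each time, and accepts an uplift only if the whole confidence interval lies above zero. This is a minimal sketch. The function names, the 95% level and the 2000 resamples are illustrative.

```python
import random
from statistics import fmean


def paired_bootstrap_ci(baseline, candidate, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean per-user difference (candidate - baseline).

    baseline[i] and candidate[i] must be the metric (e.g. Recall@10) of the same user i.
    Returns (observed_mean_diff, ci_low, ci_high).
    """
    if not baseline or len(baseline) != len(candidate):
        raise ValueError("need two non-empty per-user score lists of equal length")
    diffs = [c - b for b, c in zip(baseline, candidate)]
    rng = random.Random(seed)  # fixed seed so a verdict is reproducible
    n = len(diffs)
    # Resample users with replacement and record the mean difference of each resample.
    means = sorted(fmean(rng.choices(diffs, k=n)) for _ in range(n_resamples))
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return fmean(diffs), lo, hi


def is_significant_uplift(baseline, candidate, **kwargs):
    """Assumed decision rule: evidence of uplift only if the CI lower bound is above 0."""
    _, lo, _ = paired_bootstrap_ci(baseline, candidate, **kwargs)
    return lo > 0
```

Pairing by user matters because cold and warm users have very different baseline recall. Comparing the two systems user by user removes that spread from the noise.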


📊 DASHBOARD

Status                          Count
⬜ PENDING (not yet tested)         7
🔄 TESTING (being verified)         0
✅ VERIFIED (proven)               16
❌ REJECTED                         8
🔀 MODIFIED (revised)               2
TOTAL                              33
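Hand-maintained counts can drift from the lists below when entries are added or moved. This is a minimal sketch that derives the dashboard from the entries themselves. It assumes each entry is stored as a hypothetical (hypothesis_id, status) pair. It also reports IDs that are used more than once, so that one ID does not end up naming two different hypotheses.

```python
from collections import Counter

# Status names mirror the dashboard rows above.
STATUSES = ["PENDING", "TESTING", "VERIFIED", "REJECTED", "MODIFIED"]


def dashboard(entries):
    """Count entries per status and add a TOTAL row; unknown statuses are an error."""
    counts = Counter(status for _, status in entries)
    unknown = set(counts) - set(STATUSES)
    if unknown:
        raise ValueError(f"unknown status: {sorted(unknown)}")
    table = {status: counts.get(status, 0) for status in STATUSES}
    table["TOTAL"] = sum(table.values())
    return table


def duplicate_ids(entries):
    """Return hypothesis IDs that appear in more than one entry, sorted."""
    ids = Counter(hypothesis_id for hypothesis_id, _ in entries)
    return sorted(hid for hid, n in ids.items() if n > 1)
```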

⬜ PENDING HYPOTHESES (awaiting verification)

[H-009] — Raising γ (fairness weight) to 0.25 reduces the agent gap without hurting Recall

[H-010] — Increasing the ALS half-life from 7d to 30d improves Recall@10

[H-011] — Long-tail novelty injection raises Coverage from 3.71% to 8% while Recall drops by less than 1%

[H-012] — Removing require_login=True in ColdStartProfiler raises cold-user coverage to 40%+

[H-019] — Filtering pageview noise (dwell > 30s) before ALS training improves als_view quality

[H-027] — Time-weighted ALS (exponential recency) will improve warm recall on clean eval

[H-029] — Non-login pageview preferences (city+cat only) improve blind user recall WITHOUT touching ALS
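Several pending hypotheses hinge on small formulas. The sketch below gives one plausible reading of each, and each reading is an assumption. H-010 and H-027 are read as exponential half-life decay. H-019 is read as a dwell threshold applied after converting the raw column from milliseconds to seconds (per H-003). H-011 is read as catalog coverage. All function names are hypothetical, and the tracker does not say how the pipeline actually implements these.

```python
def recency_weight(age_days, half_life_days):
    """Exponential decay (H-010/H-027): weight halves every half_life_days; age 0 -> 1.0."""
    return 0.5 ** (age_days / half_life_days)


def keep_pageview(dwell_time_sec_raw, min_seconds=30.0):
    """H-019 noise filter. Per H-003 the dwell_time_sec column actually holds
    milliseconds, so convert to seconds before comparing with the threshold."""
    return dwell_time_sec_raw / 1000.0 > min_seconds


def catalog_coverage(recommendations, catalog_size):
    """H-011 (assumed definition): share of the catalog appearing in at least one
    user's recommendation list. recommendations maps user_id -> list of item_ids."""
    if catalog_size <= 0:
        raise ValueError("catalog_size must be positive")
    recommended = set()
    for items in recommendations.values():
        recommended.update(items)
    return len(recommended) / catalog_size
```

With a 7-day half-life, a 30-day-old interaction keeps about 5% of its weight (0.5 ** (30/7) ≈ 0.051). With a 30-day half-life it keeps half. That is the size of the shift H-010 would test.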


✅ VERIFIED HYPOTHESES

[H-020] — Adding PCI lead pairs to ALS training improves Recall@10 ✅ VERIFIED

[H-021] — PCI preferences improve cold-start SegPop matching ✅ VERIFIED

[H-022] — PCI purchased=True items weighted 3x in ALS improves warm recall ✅ VERIFIED

[H-028] — A single LightGBM ranker destroys cold user recall, requiring segmented inference ✅ VERIFIED

[H-025] — Retraining ALS+SegPop on split-clean data will drop blind recall to ~0.01-0.02 ✅ VERIFIED

[H-026] — PCI prefs will show relative uplift even with clean retrain ✅ VERIFIED

[H-030] — ALS1024 + cascade-direct beats hybrid/segmented production baseline ✅ VERIFIED

[H-013] — IntentRecommender strongly increases Recall for Cold-start/Warm-start users ✅ VERIFIED

[H-016] — Hard cascade slot-competition restricts Recall@200 ceiling ✅ VERIFIED

[H-017] — Round-robin interleave is inferior to sequential priority for candidate generation ✅ VERIFIED

[H-018] — Disabling als_view improves Recall@200 ✅ VERIFIED

[H-014] — adview_count correlates with contact probability up to a point ✅ VERIFIED

[H-015] — Users have high category stickiness ✅ VERIFIED

[H-002] — 64% of test users are cold-start ✅ VERIFIED

[H-003] — dwell_time_sec is in milliseconds ✅ VERIFIED

[H-023] — Warm users contribute ~0.10 recall ✅ VERIFIED


❌ REJECTED HYPOTHESES

[H-001] — project_id nullity correlates with non-apartment categories ❌ REJECTED

[H-020] — LightGBM reranker on cascade k=200 improves top-10 ❌ REJECTED

[H-021] — Intra-segment offset diversity improves cold user score ❌ REJECTED

[H-022] — PV-first cascade improves warm user Recall@10 ❌ REJECTED

[H-024] — Category-proportional blind allocation beats global demand fallback ❌ REJECTED

[H-031] — Snapshot demand fallback improves public leaderboard ❌ REJECTED

[H-032] — ALS1536 + recency/time-decay branch beats ALS1024 v17 ❌ REJECTED

[H-033] — Conservative v17/v18 slot blend can safely improve tail ranks ❌ REJECTED
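H-016 and H-017 (verified) and H-022 (rejected) all concern how candidate sources are merged into one list before Recall@K is measured. Below is a toy sketch of the two merge orders. It assumes each source is already a ranked list of item IDs and uses the plain Recall@K definition |top-K ∩ relevant| / |relevant|. It shows the mechanics only and does not reproduce the tracker's results. The function names are hypothetical.

```python
def sequential_priority(sources, k):
    """Exhaust each source in priority order, skipping duplicates, until k items."""
    out, seen = [], set()
    for source in sources:
        for item in source:
            if item not in seen:
                seen.add(item)
                out.append(item)
                if len(out) == k:
                    return out
    return out


def round_robin(sources, k):
    """Take the next unseen item from each source in turn until k items or all exhausted."""
    out, seen = [], set()
    iters = [iter(source) for source in sources]
    while iters and len(out) < k:
        still_active = []
        for it in iters:
            for item in it:
                if item not in seen:
                    seen.add(item)
                    out.append(item)
                    break
            else:
                continue  # source exhausted: drop it from later rounds
            still_active.append(it)
            if len(out) == k:
                break
        iters = still_active
    return out


def recall_at_k(ranked, relevant, k):
    """Plain Recall@K: fraction of relevant items found in the top k of ranked."""
    if not relevant:
        raise ValueError("relevant set must be non-empty")
    return len(set(ranked[:k]) & set(relevant)) / len(relevant)
```

In a toy case with sources [["a", "b", "c"], ["x", "y"]], relevant item "c" and k=4, sequential priority returns a, b, c, x (Recall@4 = 1.0). Round-robin returns a, x, b, y and pushes "c" out (Recall@4 = 0.0). When the first source is the stronger one, interleaving spends its slots on the weaker source.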


🔀 MODIFIED HYPOTHESES

[H-004] → [H-004-M] — other_interaction IS a positive signal (is_contact=1) 🔀 MODIFIED

[H-023F] → [H-023F-M] — Pure freshness is not enough; snapshot demand freshness is the useful variant 🔀 MODIFIED
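Here is one possible reading of the two modified hypotheses. Every name except other_interaction and is_contact is assumed. Under H-004-M, the positive label is widened so that other_interaction counts as a contact. Under H-023F-M, "newest items" is replaced by items ranked by recent demand at the snapshot time, counting only interactions that happened before the snapshot so the list cannot see the future. The 7-day window is illustrative.

```python
from collections import Counter
from datetime import datetime, timedelta

# Hypothetical event-type names; only other_interaction comes from the tracker.
POSITIVE_EVENTS = {"contact", "other_interaction"}


def is_contact(event_type):
    """H-004-M: label other_interaction as positive (is_contact=1)."""
    return int(event_type in POSITIVE_EVENTS)


def snapshot_demand(events, snapshot, window_days=7, top_n=10):
    """H-023F-M (assumed reading): rank items by interaction count in the
    window [snapshot - window_days, snapshot). events: (item_id, datetime) pairs."""
    start = snapshot - timedelta(days=window_days)
    counts = Counter(item for item, ts in events if start <= ts < snapshot)
    return [item for item, _ in counts.most_common(top_n)]
```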