Project: Real Estate Recommendation System
Date: 2026-05-21
Best protected submission: outputs/submission_1024.zip
Best public leaderboard: 0.2116
Final production philosophy: strong retrieval first, segment-aware fallback second, no unproven reranking.
The final useful solution is not a single clever model. It is a carefully constrained recommendation system built from the data's own behavior:
Raw marketplace logs
-> positive signal cleaning
-> compact user/item caches
-> high-density ContactALS retrieval
-> intent and co-contact fallback sources
-> segment popularity last resort
-> direct top-10 cascade
-> strict submission validation
The winning public artifact is:
outputs/submission_1024.zip
publicScore = 0.2116
rank = Top5 at time of submission
The decisive turn was moving from a fragile "train a big reranker over everything" mindset to a simpler marketplace philosophy:
In this task, the most valuable object is not a feature table.
It is a reliable ordered candidate list, produced by the strongest available signal for each user.
The best leaderboard result came from ALS1024 + cascade-direct, not from LightGBM hybrid, snapshot blind fallback, or ensembling:
| Attempt | Public LB | Lesson |
|---|---|---|
| v14 clean cascade baseline | 0.0344 | Warm users work, cold/blind mostly dead |
| v17 ALS1024 cascade-direct | 0.2116 | Strong CF retrieval + direct cascade is the best known path |
| snapshot blind fallback | 0.0003 | Offline blind gains did not transfer |
| v18 ALS1536/time-decay branch | 0.2108 | Bigger ALS was not better than v17 |
| v19 v17-top9/v18-slot10 blend | 0.1974 | Even v17 rank10 carries useful signal |
The protected final answer is therefore:
Use v17 artifact: outputs/submission_1024.zip
Do not replace it with v18, snapshot, hybrid, or slot-blend variants.
At the beginning, the problem looked like a standard recommender competition:
That plan was directionally reasonable but incomplete. Real estate recommendation is not like recommending movies or music. A user is not trying to consume many similar items forever. A user is searching inside a constrained marketplace:
city -> district -> property type -> budget -> listing freshness -> seller/contact behavior
The EDA changed the design in four major ways.
One of the strongest findings was:
85.5% of ground-truth items are completely new to the user.
That means a pure replay strategy cannot win. The model should not only repeat viewed/contacted listings. It must discover plausible new listings that are close to the user's intent.
Design consequence:
Use replay as a precise fallback, but make the main engine a retrieval model
that can discover new relevant items.
This is why ContactALS became central. ALS can recommend items that the user has never touched, based on similar users' contact behavior.
EDA found:
91.9% of GT items match user's preferred city.
72.2% of GT items match user's preferred category.
city + category combined explain a large part of relevance.
In real estate, location is not a weak metadata feature. It is the frame in which almost all user intent lives.
Design consequence:
Every cold-start fallback must preserve city/category intent.
Every ranker feature or segment fallback must understand pref_city and pref_cat.
This is why the pipeline builds cold_user_prefs.parquet, why SegmentPopularity is keyed by city/category/district, and why a city-name bug was catastrophic.
The public test distribution exposed a severe user segmentation problem:
Test users: 161,568
Warm users: 54,502 (33.7%)
Cold+prefs: 12,191 (7.5%)
Blind users: 94,875 (58.7%)
A majority of test users had no usable training contact history. But the offline experiments also showed:
Segment popularity ceiling for truly blind users: around 1.6% Recall@10
Clean truly-blind recall: near zero in realistic split-clean eval
Design consequence:
Do not over-invest in blind popularity hacks.
Protect warm-user ALS quality first.
Use segment fallback only as a necessary last resort.
This is why the snapshot blind fallback looked attractive offline but was dangerous in production. The public LB later confirmed that it had to be rejected:
outputs/submission_snapshot_blind.zip -> 0.0003
Several experiments sounded good but failed:
| Idea | Result | Why it failed |
|---|---|---|
| Unified LightGBM reranker | v11 = 0.0048 | Trained/deployed distribution mismatch and cold-start overfit |
| Offset diversity | v12 = 0.0050 | Pushed users away from the most relevant popular items |
| Remove is_login globally | v13 = 0.0140 | Added noisy device-level/non-login interactions into ALS |
| Snapshot blind fallback | 0.0003 | Item-side demand did not transfer to public LB |
| ALS1536 branch | 0.2108 | Bigger capacity plus bundled changes did not beat ALS1024 |
| Tail-slot blend | 0.1974 | v17 top-10 ordering was already strong |
The final design became more conservative:
If a change does not preserve the exact retrieval distribution that produced LB gain,
it must be treated as risky until proven.
The final solution is built around seven principles.
The competition description's statement about other_interaction was misleading. The data showed:
other_interaction has is_contact = 1
pageview has is_contact = 0
So other_interaction is a positive signal. It must be included in positive events.
Implementation:
positive_events = [
    "view_phone",
    "contact_chat",
    "contact_zalo",
    "contact_sms",
    "other_interaction",
]
Philosophy:
When the competition description and the ground-truth flag disagree,
trust the label-generating data.
The raw tables are too large for casual joins:
fact_user_events: 161,731,336 rows
fact_listing_snapshot: 19,762,167 rows
fact_post_contact_interactions: 25,486,445 rows
dim_listing: 3,107,114 rows
The system therefore turns raw events into compact caches:
| Cache | Purpose |
|---|---|
| .cache/contact_pairs.parquet | login positive contacts with city/category/count/last_date |
| .cache/als_contact_pairs.parquet | user-item contact matrix for ALS |
| .cache/als_weighted_contact.parquet | weighted ALS matrix, real contacts > soft contacts |
| .cache/als_pageview_pairs.parquet | pageview matrix, available but not used in final top-10 |
| .cache/session_items.parquet | session co-occurrence substrate |
| .cache/cold_user_prefs.parquet | city/category preferences for non-warm users |
| .cache/snapshot_stats.parquet | item-side recent views/contacts/trends |
Philosophy:
The model should not fight the raw data scale every time it trains.
Make the expensive facts small, typed, and reusable.
The strongest leaderboard jump did not come from adding more feature columns. It came from upgrading the retrieval backbone:
v14: 0.0344
v17: 0.2116
relative gain: 6.15x
The main change was a high-capacity ContactALS model and a direct cascade path.
Philosophy:
At top-10, a beautiful ranker cannot rescue a bad candidate pool.
First retrieve the right 10-200 items; only then worry about reranking.
Removing the is_login filter made the matrix bigger, but worse:
Clean login baseline: 0.034
No is_login filter: 0.014
relative drop: -59%
Non-login identifiers behave differently from logged-in accounts, so adding them globally diluted the collaborative structure.
Philosophy:
More rows are useful only if they preserve the identity semantics of the user_id.
Noisy scale is not signal.
Round-robin and diversity-heavy policies were worse. Budget-based sequential union won because it respects source strength:
Budget-based sequential union Recall@200: 0.3177
Round-robin interleave: worse
Offset diversity: LB collapse
Philosophy:
Sources are not equal. Let the strongest available signal speak first.
Fallbacks should fill holes, not compete equally with the best source.
Cold-start matters because many test users are cold or blind. But the ceiling for truly blind popularity is very low.
Philosophy:
Cold-start support is mandatory for valid submissions,
but the score is won by preserving the high-confidence warm path.
This explains why final production chooses cascade-direct for all users instead of a unified LightGBM reranker.
The v19 blend kept v17 ranks 1-9 and changed only rank10. It still dropped:
v17: 0.2116
v19: 0.1974
delta: -0.0142
Philosophy:
Do not treat the tail item as disposable.
The whole v17 top-10 ordering contains learned signal.
fact_user_events
This is the main behavioral table. It contains pageviews and contact-like positive events.
Used for:
Important filters:
For ALS/contact training:
is_login == "login"
is_contact == 1
For positive event definitions:
event_type in [
    view_phone,
    contact_chat,
    contact_zalo,
    contact_sms,
    other_interaction
]
For pageview-derived preferences:
event_type == pageview
Important correction:
dwell_time_sec is actually milliseconds.
3 seconds means raw threshold 3000ms.
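The filters above can be sketched as row-level predicates. This is a minimal illustration, not the project's preprocessing code: the function names, the dict-per-event representation, and the inclusive 3-second bound on pageviews are assumptions made for the example.

```python
# Sketch of the row-level filters described above.
# Assumption: each event is a dict keyed by the column names used in this report.

POSITIVE_EVENTS = {
    "view_phone",
    "contact_chat",
    "contact_zalo",
    "contact_sms",
    "other_interaction",
}

# dwell_time_sec actually stores milliseconds, so 3 seconds is 3000 raw units.
MIN_DWELL_RAW = 3_000


def is_als_training_row(event: dict) -> bool:
    """Login-only contact rows, as used for ALS/contact training."""
    return event["is_login"] == "login" and event["is_contact"] == 1


def is_positive_event(event: dict) -> bool:
    """Membership test against the positive event definition."""
    return event["event_type"] in POSITIVE_EVENTS


def is_meaningful_pageview(event: dict) -> bool:
    """Pageview with at least 3 seconds of dwell (inclusive bound is assumed)."""
    return (
        event["event_type"] == "pageview"
        and event["dwell_time_sec"] >= MIN_DWELL_RAW
    )
```

Comparing against 3 instead of 3000 would let almost every pageview through, which is why the unit correction matters.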
dim_listing
This is the item metadata table.
Used for:
Important fields:
item_id
city_name
district_name
category
price
seller_id / seller fields
listing quality fields
Important lesson:
Null metadata is not always bad data.
Many real estate fields are naturally sparse by property type.
fact_post_contact_interactions / PCI
PCI was discovered as a major hidden signal source.
Key findings:
10,654 blind test users had PCI data.
644,732 new lead pairs were not in ALS training data.
Cold+PCI prefs achieved about 30x uplift in clean eval:
cold+prefs: 0.0612
cold-no-prefs: 0.0020
Used for:
Caution:
PCI helps when it increases useful density.
Blind/cold PCI prefs are valuable, but broad uncontrolled merging can change matrix semantics.
fact_listing_snapshot
Snapshot contains item-side recent demand:
views_24h
contacts_24h
recent trend signals
active item signals
It was useful as a feature source in experiments, but the public LB rejected using snapshot demand as the final blind fallback:
snapshot blind fallback public LB = 0.0003
Final interpretation:
Snapshot is a useful diagnostic and possible feature source,
but it should not replace the protected v17 cascade path.
test_users.parquet
The required prediction universe:
161,568 users
10 rows per user
1,615,680 total submission rows
Test users drive:
The preprocessing module is the bridge between raw logs and modeling.
Entry point:
DataPreprocessor.process_and_cache(lf, snapshot_path)
It runs the following transformations.
Output:
.cache/contact_pairs.parquet
Logic:
filter is_login == login
filter event_type in positive_events
group by user_id, item_id, city_name, category
aggregate count and last_date
Purpose:
Why it matters:
This cache preserves city/category with the interaction,
so downstream fallback can stay location-aware.
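The contact-pair logic above (filter, group, aggregate) can be sketched in plain Python. The real pipeline works on a lazy frame passed to `DataPreprocessor.process_and_cache`; the function below is only an illustration, and the `event_date` field name and the dict-per-event input are assumptions.

```python
from datetime import date

POSITIVE_EVENTS = {
    "view_phone",
    "contact_chat",
    "contact_zalo",
    "contact_sms",
    "other_interaction",
}


def build_contact_pairs(events):
    """Aggregate login positive events per (user, item, city, category).

    Each event is assumed to be a dict with user_id, item_id, city_name,
    category, is_login, event_type and a hypothetical event_date field.
    """
    agg = {}
    for e in events:
        if e["is_login"] != "login" or e["event_type"] not in POSITIVE_EVENTS:
            continue
        key = (e["user_id"], e["item_id"], e["city_name"], e["category"])
        count, last_date = agg.get(key, (0, e["event_date"]))
        agg[key] = (count + 1, max(last_date, e["event_date"]))
    return [
        {
            "user_id": user_id,
            "item_id": item_id,
            "city_name": city_name,
            "category": category,
            "count": count,
            "last_date": last_date,
        }
        for (user_id, item_id, city_name, category), (count, last_date) in agg.items()
    ]
```

Keeping city_name and category in the group key is what lets later fallbacks read location intent straight from the cache without another join.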
Output:
.cache/als_contact_pairs.parquet
Logic:
filter is_login == login
filter is_contact == 1
group by user_id, item_id
score = count
Purpose:
Sparse implicit-feedback matrix for ContactALS.
Output:
.cache/als_weighted_contact.parquet
Logic:
real contacts:
view_phone, contact_chat, contact_zalo, contact_sms -> weight 3
soft positive:
other_interaction -> weight 1
group by user_id, item_id
score = sum(weight)
Purpose:
Give stronger weight to high-intent contact actions,
while keeping other_interaction as a valid but softer positive.
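A minimal sketch of the weighting rule above, assuming events arrive as `(user_id, item_id, event_type)` tuples. The tuple layout and the function name are illustrative; only the weights 3 and 1 come from the report.

```python
from collections import defaultdict

# Weights from the logic above: real contacts count 3, soft positives count 1.
EVENT_WEIGHTS = {
    "view_phone": 3,
    "contact_chat": 3,
    "contact_zalo": 3,
    "contact_sms": 3,
    "other_interaction": 1,
}


def weighted_contact_scores(events):
    """Return {(user_id, item_id): score} with score = sum of event weights."""
    scores = defaultdict(int)
    for user_id, item_id, event_type in events:
        weight = EVENT_WEIGHTS.get(event_type)
        if weight is None:
            continue  # pageviews and other non-positive events carry no weight
        scores[(user_id, item_id)] += weight
    return dict(scores)
```

For example, one view_phone plus one other_interaction on the same listing gives a score of 4, while a pageview alone produces no entry in the matrix.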
Output:
.cache/als_pageview_pairs.parquet
Logic:
filter is_login == login
filter event_type == pageview
group by user_id, item_id
aggregate view_count and avg_dwell
Purpose:
Final decision:
ViewALS is disabled in final cascade budgets because it diluted the candidate pool.
Output:
.cache/session_items.parquet
Logic:
filter login events with session_id
group item_ids by session_id
keep sessions with 2 <= n_items <= 30
Purpose:
Support session-level co-occurrence ideas and diagnostics.
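The session logic above can be sketched as follows. This is an illustrative version that assumes n_items counts distinct items per session and that events are dicts; neither detail is confirmed by the report.

```python
def build_session_items(events, min_items=2, max_items=30):
    """Group item_ids by session_id and keep sessions of a plausible size.

    Assumption: n_items is the number of distinct items in the session.
    """
    sessions = {}
    for e in events:
        if e["is_login"] != "login" or not e.get("session_id"):
            continue
        sessions.setdefault(e["session_id"], set()).add(e["item_id"])
    return {
        session_id: sorted(items)
        for session_id, items in sessions.items()
        if min_items <= len(items) <= max_items
    }
```

The lower bound drops sessions that carry no co-occurrence information, and the upper bound drops very long sessions, which are more likely to be crawlers or unfocused browsing.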
Output:
.cache/cold_user_prefs.parquet
Logic:
identify warm users from positive contacts
for non-warm users, aggregate pageview city/category modes
pref_city = mode(city_name)
pref_cat = mode(category)
Purpose:
Convert users without contact history into users with at least city/category intent.
Important warning:
The current working code includes the H-029 extension that allows non-login
pageviews for preference extraction. That is different from removing is_login
globally from ALS. Global non-login ALS was already rejected.
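The preference extraction above amounts to taking the mode of city and category over a non-warm user's pageviews. A minimal sketch, assuming pageviews arrive as `(user_id, city_name, category)` tuples and that ties are broken by first occurrence (the report does not specify tie-breaking):

```python
from collections import Counter, defaultdict


def build_cold_user_prefs(pageviews, warm_users):
    """Return {user_id: {"pref_city": ..., "pref_cat": ...}} for non-warm users."""
    city_counts = defaultdict(Counter)
    cat_counts = defaultdict(Counter)
    for user_id, city_name, category in pageviews:
        if user_id in warm_users:
            continue  # warm users get preferences from contact history instead
        city_counts[user_id][city_name] += 1
        cat_counts[user_id][category] += 1
    return {
        user_id: {
            "pref_city": city_counts[user_id].most_common(1)[0][0],
            "pref_cat": cat_counts[user_id].most_common(1)[0][0],
        }
        for user_id in city_counts
    }
```

Users with no pageviews at all never appear in the output, which is exactly the truly blind segment that falls through to the last-resort fallback.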
Output:
.cache/snapshot_stats.parquet
Features:
item_avg_views_7d
item_avg_contacts_7d
item_conversion_rate
item_trend_score
item_is_active
Final decision:
Useful for analysis and hybrid feature parity,
but snapshot blind fallback is not part of the protected v17 final.
This section maps each major insight to its algorithmic consequence.
other_interaction Is Positive
Insight:
other_interaction has is_contact=1.
Feature/model consequence:
Add other_interaction to positive_events.
Use it in contact pairs and ALS.
Weight it lower than direct contacts in weighted ALS.
Why:
It contains positive behavioral information, but it is weaker than a phone/chat/Zalo/SMS contact.
Insight:
91.9% city match.
72.2% category match.
Feature/model consequence:
Build pref_city and pref_cat for every user possible.
Use pref_city/pref_cat in SegmentPopularity.
Use city_match and cat_match in ranker experiments.
Build RecentCC by (city, category).
Why:
Real estate intent is geographically anchored.
No fallback should ignore geography if any user signal exists.
Insight:
85.5% GT items are new to user.
Feature/model consequence:
Do not rely on pure replay.
Use ContactALS, IntentRecommender, CoContact, UserKNN, SellerExpansion.
Why:
The model must generalize from past behavior to unseen listings.
Insight:
Warm: 33.7%
Cold+prefs: 7.5%
Blind: 58.7%
Feature/model consequence:
Every user must have 10 valid recommendations.
Build a fallback chain ending in SegmentPopularity.
Use PCI/cold prefs to rescue any cold user possible.
Why:
A pure ALS solution leaves too many users uncovered.
Insight:
Segment popularity ceiling for blind users is around 1.6% Recall@10.
Feature/model consequence:
Use SegPop as last resort, not as the main scorer.
Protect ALS-first warm recommendations.
Why:
Blind users have no user-side intent. Popularity can keep submissions valid,
but cannot carry a top solution alone.
Insight:
Removing is_login globally dropped LB from 0.034 to 0.014.
Feature/model consequence:
Keep login-only ALS/contact matrix.
Do not train collaborative embeddings on mixed identity semantics.
Only consider non-login pageviews for isolated city/category preference extraction.
Why:
Collaborative filtering needs stable user identity.
Anonymous/device-like IDs can pollute the matrix.
Insight:
ALS1024 cascade-direct reached 0.2116.
ALS1536 branch reached 0.2108.
Feature/model consequence:
Use ALS1024 as protected production baseline.
Do not assume larger factors improve public LB.
Why:
Capacity helps only up to a point: the 1536-factor branch did not beat 1024 on the public LB. The only proven high-score artifact is 1024.
Insight:
LightGBM hybrid destroyed cold recall and failed public LB in earlier submissions.
Feature/model consequence:
Final inference_mode = cascade.
Do not route final production through unified LightGBM.
Why:
The ranker can overfit warm dense features and mis-handle cold sparse candidates.
The final system is an ensemble at the retrieval level, not at the score-blending level.
Role:
Main warm-user retrieval engine.
Input:
user_id, item_id, score
Where score comes from:
weighted positive contacts:
real contacts weight 3
other_interaction weight 1
optional PCI lead weights in experimental branches
Final protected model:
ALS factors: 1024
iterations: 30
regularization: 0.01
artifact: outputs/models/als/
model size: about 5.8GB
user_factors: (810,411, 1024)
item_factors: (696,252, 1024)
Why it works:
It converts sparse contact histories into dense user/item embeddings,
allowing discovery of unseen listings.
Why it is first in the final cascade:
For warm users, ALS recommendations are the highest-confidence top-10 source.
Role:
Intent matching from pageviews to similar current listings.
Core idea:
If a user browses listings in a district/category/price region,
recommend other valid listings in that intent bucket.
Feature basis:
pageview item_id -> listing metadata -> district/category/price intent
Why it exists:
Pageviews are weaker than contacts, but they expose search intent,
especially for users without contacts.
Role:
Replay recently viewed items as a precise but narrow signal.
Window:
14 days
Why it is not the main source:
85.5% of GT items are new to the user, so pure replay cannot dominate.
Role:
Item-to-item expansion from recent contact history.
Core idea:
Users who contacted item A also contacted item B.
If current user contacted A, recommend B.
Window:
30 days
Why it exists:
Real estate shoppers often compare listings in the same micro-market.
Co-contact captures that local comparison behavior.
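The co-contact idea can be illustrated with a small item-to-item co-occurrence counter. This sketch ignores the 30-day window and any normalization the real source may apply; the function names and the dict-of-sets input are assumptions.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_co_contact(user_items):
    """user_items: {user_id: set of contacted item_ids} -> co[a][b] = shared users."""
    co = defaultdict(Counter)
    for items in user_items.values():
        # Every ordered pair of items contacted by the same user counts once.
        for a, b in permutations(sorted(items), 2):
            co[a][b] += 1
    return co


def recommend_co_contact(co, history, k=10):
    """Score unseen items by how often they were co-contacted with the history."""
    scores = Counter()
    for a in history:
        for b, n in co.get(a, {}).items():
            if b not in history:
                scores[b] += n
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:k]]
```

Pair counting is quadratic in the number of items per user, so a production version would need to cap very long histories.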
Role:
Neighbor-based collaborative fallback.
Core idea:
Find users who overlap on contacted items, then recommend their other items.
Why it exists:
It is a simpler local CF signal that can complement ALS.
Role:
Recommend other listings from sellers the user has interacted with.
Why it exists:
Real estate sellers often list similar properties in the same area or category.
Risk:
Seller affinity is useful as fallback, but weaker than user/item CF.
Role:
Recent popular contacts by (city, category).
Window:
7 days
Why it exists:
Real estate inventory is time-sensitive.
Recent demand in the same city/category is a better fallback than old global popularity.
Role:
Last resort fallback.
Cascade levels:
(city, category, district)
-> (city, category)
-> city
-> category
-> global
Pool sizes:
global_k = 500
segment_k = 500
cc_k = 500
ccd_k = 100
Why it exists:
Every test user must receive 10 valid unique items.
When all personalized sources fail, SegPop guarantees coverage.
Why it is last:
Popularity alone has a low ceiling for truly blind users.
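The cascade levels above can be sketched as a walk from the most specific segment to the global pool. Whether the real SegmentPopularity back-fills across levels or stops at the first non-empty one is not stated in this report; the version below back-fills, and the key layout of `pools` is hypothetical.

```python
def segpop_recommend(pools, pref_city, pref_cat, pref_district, k=10):
    """Fill k slots from the most specific available segment down to global.

    pools maps a level key to a popularity-ordered item list, for example
    ("ccd", city, cat, district), ("cc", city, cat), ("city", city),
    ("cat", cat) and ("global",). This key layout is an assumption.
    """
    keys = []
    if pref_city and pref_cat and pref_district:
        keys.append(("ccd", pref_city, pref_cat, pref_district))
    if pref_city and pref_cat:
        keys.append(("cc", pref_city, pref_cat))
    if pref_city:
        keys.append(("city", pref_city))
    if pref_cat:
        keys.append(("cat", pref_cat))
    keys.append(("global",))

    recs, seen = [], set()
    for key in keys:
        for item in pools.get(key, []):
            if item not in seen:
                seen.add(item)
                recs.append(item)
                if len(recs) == k:
                    return recs
    return recs
```

A truly blind user has no preferences, so only the global pool applies, which is consistent with the low ceiling reported for that segment.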
The cascade is the core serving algorithm.
It is not a boosting model, not a LightGBM cascade, and not a ranker by itself. It is a deterministic priority-based slot filler.
Final mode:
inference_mode = cascade
k = 10
Current top-10 source order:
1. ALS
2. Intent
3. CoContact
4. PageviewReplay
5. UserKNN
6. SellerExpansion
7. RecentCC
8. SegmentPopularity
Important nuance:
Budgets are caps, not guaranteed allocations.
The cascade stops as soon as it has 10 unique valid items.
So for a warm user with good ALS coverage:
ALS may fill all 10 slots.
No lower source is needed.
For a cold user:
ALS returns nothing or little.
Intent/pageview/recent_cc/segpop fill the remaining slots.
Pseudo-code:
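def cascade_generate(user, k=10):
    seen = set()
    recs = []
    # Walk sources in priority order; stronger sources fill slots first.
    for source in source_order_top10:
        budget = budget_top10[source]
        if budget <= 0:
            continue  # budget 0 disables the source (for example als_view)
        candidates = source.recommend(user, budget)
        for item in candidates:
            if item not in seen and item in valid_items:
                recs.append(item)
                seen.add(item)
                # Stop as soon as k unique valid items are collected.
                if len(recs) == k:
                    return recs
    return recs[:k]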
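To make the slot-filling behavior concrete, here is a self-contained toy run of the same idea. The two-source setup, the item names, and the explicit `sources`/`budgets`/`valid_items` parameters are illustrative assumptions; the production generator is `CascadeCandidateGenerator`.

```python
def cascade_generate(user, sources, budgets, valid_items, k=10):
    """Priority-ordered slot filler: earlier sources fill slots first."""
    seen, recs = set(), []
    for name, recommend in sources:
        budget = budgets.get(name, 0)
        if budget <= 0:
            continue  # a zero budget disables the source entirely
        for item in recommend(user, budget):
            if item not in seen and item in valid_items:
                seen.add(item)
                recs.append(item)
                if len(recs) == k:
                    return recs
    return recs


# Toy sources: ALS knows only the warm user, SegPop knows everyone.
ALS_RECS = {"warm_user": ["a", "b", "c"]}
SEGPOP_RECS = ["x", "a", "y", "z"]

sources = [
    ("als", lambda user, n: ALS_RECS.get(user, [])[:n]),
    ("als_view", lambda user, n: ["never_used"]),
    ("segpop", lambda user, n: SEGPOP_RECS[:n]),
]
budgets = {"als": 10, "als_view": 0, "segpop": 10}
valid_items = {"a", "b", "c", "x", "y"}

warm_recs = cascade_generate("warm_user", sources, budgets, valid_items, k=4)
blind_recs = cascade_generate("blind_user", sources, budgets, valid_items, k=4)
```

In this toy run the warm user gets the three ALS items first and only one fallback item, while the blind user is served entirely by the fallback, with the invalid item z filtered out.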
Round-robin assumes sources are equally trustworthy. EDA rejected that assumption.
Sequential priority works because:
ALS is stronger for warm users.
Intent/PV are useful only when user behavior supports them.
SegPop is a fallback, not a peer to ALS.
The design gives each source a role:
| Source type | Role |
|---|---|
| ALS | Primary high-confidence retrieval |
| Intent/PV | Behavioral intent recovery |
| CoContact/UserKNN/Seller | Collaborative/local expansion |
| RecentCC/SegPop | Coverage and cold fallback |
Hybrid mode exists:
Cascade k=200 -> feature engineering -> LightGBM LambdaRank -> top10
But final production does not use it because:
Earlier LightGBM submissions collapsed on LB.
Unified ranker overfit warm feature density.
Cold-start candidates lack many behavioral features.
Segmented hybrid was still risky.
Direct cascade preserves the retrieval distribution that the leaderboard rewarded.
The training pipeline has two possible personalities:
The protected best solution uses the first personality.
For offline evaluation, contacts are split by time:
train_contacts = contacts with last_date <= split_date
val_contacts = contacts with last_date > split_date
Why this matters:
If ALS/SegPop are trained on full data including validation period,
offline blind recall is inflated.
Confirmed leak:
Blind recall dropped from 0.1654 to 0.0004 after split-clean retraining.
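The time split above can be sketched as a single pass over the contact rows. This is a minimal illustration assuming each contact is a dict with a `last_date` field; the important part is that every model used in offline evaluation is then fit on the train side only.

```python
from datetime import date


def split_contacts(contacts, split_date):
    """Split contact rows by time: train <= split_date < validation."""
    train_contacts = [c for c in contacts if c["last_date"] <= split_date]
    val_contacts = [c for c in contacts if c["last_date"] > split_date]
    return train_contacts, val_contacts
```

Fitting ALS or SegPop on the full contact set and then scoring on `val_contacts` is the leak described above: validation-period contacts end up inside the popularity lists and embeddings.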
Input:
train_contacts
valid_items from dim_listing
listing metadata
Output:
outputs/models/segpop.pkl
Function:
Build popular item lists by city/category/district hierarchy.
Critical warning:
Training pipeline can overwrite segpop.pkl.
The protected production state must use the recency-aware SegPop artifact.
Input:
ALS user-item pairs
Best protected configuration:
factors = 1024
iterations = 30
regularization = 0.01
GPU = enabled
Output:
outputs/models/als/
Why 1024:
v17 ALS1024 achieved 0.2116.
v18 ALS1536 achieved 0.2108.
Therefore 1024 is the best proven capacity.
ViewALS trains collaborative filtering on pageview pairs.
It was disabled because:
als_view diluted the candidate pool.
Disabling it improved Recall@200 by about 5.4%.
It also creates memory pressure.
Final config behavior:
als_view budget = 0
do not load stale als_view artifact
Ranker features include:
source flags
ALS scores
user behavior stats
item contact/view stats
item quality metadata
city/category/price match features
snapshot stats
seller affinity
recent history
But final mode skips this path:
inference_mode = cascade
Reason:
The public leaderboard repeatedly punished ranker/hybrid variants.
The final inference pipeline does the following.
Inputs:
test_users.parquet
dim_listing
.cache/contact_pairs.parquet
.cache/cold_user_prefs.parquet
outputs/models/als/
outputs/models/segpop.pkl
It also fits or loads runtime candidate sources:
PageviewReplay
CoContact
RecentCC
IntentRecommender
UserKNN
SellerExpansion
Preference priority:
1. Contact history preferences for warm users
2. cold_user_prefs for users without contacts
3. no prefs for truly blind users
Fields:
pref_city
pref_cat
Usage:
RecentCC and SegPop use these preferences for location/category-aware fallback.
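The preference priority above can be expressed as a small lookup. The function name, the two input dictionaries, and the returned `segment` label are assumptions made for illustration.

```python
def resolve_prefs(user_id, contact_prefs, cold_user_prefs):
    """Pick pref_city/pref_cat by priority: contacts, then cold prefs, then none."""
    if user_id in contact_prefs:
        return {**contact_prefs[user_id], "segment": "warm"}
    if user_id in cold_user_prefs:
        return {**cold_user_prefs[user_id], "segment": "cold_prefs"}
    # Truly blind: no user-side intent is available.
    return {"pref_city": None, "pref_cat": None, "segment": "blind"}
```

Returning explicit None values for blind users keeps downstream fallback code free of missing-key checks.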
For every test user:
call CascadeCandidateGenerator.generate_batch(...)
request k = 10
validate item_id in valid_items
deduplicate per user
The result is a dictionary:
user_id -> [item_1, item_2, ..., item_10]
Required format:
ID,user_id,rank,item_id
Required shape:
161,568 users * 10 ranks = 1,615,680 rows
rank in 1..10
no duplicate item per user
item_id must exist in dim_listing
the most frequent rank-1 item must not cover more than 10% of users
zip/gz size <= 100MB
v17 validation:
Rows: 1,615,680
Users: 161,568
Unique items: 62,947
Rank-1 top item: 9,948 users
Zip size: 41.37MB
Validator: PASS
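The shape rules above can be checked with a short validator. This is a sketch under stated assumptions, not the project's validator: rows are `(user_id, rank, item_id)` tuples, and the ID column and the 100MB archive size check are left out.

```python
from collections import Counter


def validate_submission(rows, test_users, valid_items, k=10, max_rank1_share=0.10):
    """Return a list of human-readable errors; an empty list means PASS."""
    errors = []
    if len(rows) != len(test_users) * k:
        errors.append(f"expected {len(test_users) * k} rows, got {len(rows)}")

    per_user = {}
    for user_id, rank, item_id in rows:
        if item_id not in valid_items:
            errors.append(f"item {item_id} not in dim_listing")
        per_user.setdefault(user_id, []).append((rank, item_id))

    unknown = set(per_user) - set(test_users)
    if unknown:
        errors.append(f"{len(unknown)} users not in test_users")

    for user_id in test_users:
        entries = per_user.get(user_id, [])
        if sorted(rank for rank, _ in entries) != list(range(1, k + 1)):
            errors.append(f"user {user_id}: ranks are not exactly 1..{k}")
        items = [item for _, item in entries]
        if len(set(items)) != len(items):
            errors.append(f"user {user_id}: duplicate item")

    # Concentration rule: the most frequent rank-1 item is capped by user share.
    rank1 = Counter(item for _, rank, item in rows if rank == 1)
    if rank1:
        top_item, top_count = rank1.most_common(1)[0]
        if top_count > max_rank1_share * len(test_users):
            errors.append(f"rank-1 item {top_item} covers {top_count} users")
    return errors
```

With the v17 numbers, the cap is 10% of 161,568 users, so the reported 9,948 users on the most frequent rank-1 item stays under the limit.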
RAW TABLES
  fact_user_events                -> DataPreprocessor
  dim_listing                     -> valid item universe
  fact_post_contact_interactions  -> PCILoader
        |
        v
  compact caches
        |
        v
  ContactALS 1024 + SegmentPopularity
        |
        v
  CascadeCandidateGenerator
    sources: ALS | Intent | CoContact | PV | UserKNN | Seller | RecentCC
    combined by sequential union
        |
        v
  Top-10 direct list
        |
        v
  submission_1024.zip
        |
        v
  Public LB = 0.2116
Warm users are the segment with the richest signal. ContactALS uses their positive interactions to retrieve new items with similar collaborative structure.
Why this matters:
Warm users already explained most of the early LB score.
Improving warm retrieval created the largest jump from 0.0344 to 0.2116.
Unified LightGBM tried to use dense warm features for everyone. That harmed cold users.
The final cascade avoids that:
If a source has no signal for a user, it simply contributes nothing.
The next source fills the slots.
This is safer than forcing every candidate through one global scoring function.
Even if a user has no ALS recommendations:
Intent -> PV -> CoContact -> UserKNN -> Seller -> RecentCC -> SegPop
will eventually produce valid items.
This matters because submission failure is not just low recall. Invalid shape, duplicate items, or missing users would kill the run.
The model is not just mathematical. It encodes marketplace facts:
location matters
category matters
recent demand matters
seller context can matter
positive contact behavior is stronger than browsing
anonymous browsing is noisy for CF
That is the difference between a generic recommender and a real estate recommender.
Rejected evidence:
v11 = 0.0048
Problems:
trained on one candidate distribution, inferred on another
overfit warm behavioral features
cold-start candidates were feature-sparse
Lesson:
Do not use a global ranker unless its training candidates exactly match inference candidates
and segment-level behavior is validated.
Rejected evidence:
v12 = 0.0050
Problem:
Diversity pushed users away from the most relevant popular items.
Lesson:
In cold-start fallback, the top popular items are often top for a reason.
Do not diversify blindly.
is_login
Rejected evidence:
v13 = 0.0140
previous clean baseline about 0.034
relative drop about -59%
Problem:
Non-login IDs do not behave like stable account IDs.
They polluted ALS density and lowered embedding quality.
Lesson:
Keep login-only collaborative training.
Rejected evidence:
public LB = 0.0003
Problem:
Offline blind uplift did not transfer.
Item-side demand was not enough to solve user-side missing intent.
Lesson:
Do not let blind fallback experiments override the strong warm retrieval path.
Rejected evidence:
v18 = 0.2108
v17 = 0.2116
Problem:
The branch bundled multiple changes and did not beat the protected baseline.
Lesson:
ALS1024 remains the best proven capacity.
Future factor/time-decay work must isolate one variable at a time.
Rejected evidence:
v19 = 0.1974
Policy:
keep v17 ranks 1-9
replace rank10 with first unique v18 item
Problem:
Even this tiny replacement damaged the list.
Lesson:
Treat v17 top-10 as an ordered object, not as nine good items plus one disposable slot.
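For reference, the rejected v19 policy can be written in a few lines. This sketch assumes both inputs are ordered top-10 lists for the same user; the function name is hypothetical.

```python
def tail_slot_blend(v17_top10, v18_top10):
    """Rejected v19 policy: keep v17 ranks 1-9, fill rank 10 from v18."""
    head = list(v17_top10[:9])
    for item in v18_top10:
        if item not in head:
            return head + [item]  # first v18 item not already in the head
    return list(v17_top10[:10])  # no unique v18 item: keep v17 unchanged
```

The change touches at most one slot per user, which is what makes the drop from 0.2116 to 0.1974 notable.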
The project learned that offline evaluation can be misleading unless it matches test conditions.
Problem:
val_users = users with validation contacts
This selects users who are active in the validation period, which over-represents warm users.
Test reality:
Test is much colder and more blind.
If ALS/SegPop are trained on full data, validation-period contacts leak into item popularity and embeddings.
Evidence:
Blind recall before clean retrain: 0.1654
Blind recall after clean retrain: 0.0004
Lesson:
Any offline claim must specify whether models were split-clean retrained.
Even good clean eval cannot fully simulate public LB. The snapshot fallback is the clearest example:
offline looked directionally useful
public LB = 0.0003
Final rule:
Use offline eval to reject bad ideas cheaply.
Use LB only for isolated, high-confidence changes.
Protect the best proven artifact.
The best known file is:
outputs/submission_1024.zip
It is already validated and scored:
publicScore = 0.2116
At the time of this report, the code/config had later experiment settings such as:
als_factors = 1536
als_time_decay_half_life = 30.0
pci_merge_mode = test_only
non-login pageview preference extraction
These settings are not the protected v17 proof by themselves.
Important distinction:
Best artifact: outputs/submission_1024.zip
Current code: may include post-v17 experiments
If reproducing v17 exactly, restore/freeze the v17-compatible configuration before retraining or packaging.
The official format requires uppercase ID:
ID,user_id,rank,item_id
This matters. A validator that expects a lowercase id column would be wrong for this competition.
For each test user:
The algorithm is simple by design:
The first source that knows something reliable about the user gets priority.
The fallback source only acts when the stronger source is silent.
Future work should be conservative and isolated.
Before any final submission:
submit outputs/submission_1024.zip directly
or restore exact v17 config and regenerate only if necessary
Do not bundle:
factors + time decay + PCI mode + cold prefs
Test one at a time:
ALS768 vs ALS1024
ALS1024 no decay vs ALS1024 decay
ALS1024 existing_only PCI vs ALS1024 no PCI
If LightGBM returns:
train separate rankers by segment
warm ranker only for warm users
cold ranker only for cold-with-pref users
never route truly blind users through warm feature assumptions
The only promising cold path is not better popularity. It is finding actual user-side signal:
PCI prefs
login pageviews
safe non-login pref extraction
query/session-derived intent
device/account mapping if valid
Required before trusting any future idea:
test-aligned user mix
split-clean model retraining
segment-level recall
artifact-level submission validation
comparison against v17, not old v14
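Segment-level recall, one of the requirements above, can be computed with a small helper. The competition's exact metric definition is not restated in this report, so the per-user Recall@K averaged within each segment below is an assumption, as are the function names.

```python
def recall_at_k(recs, truth, k=10):
    """Share of a user's ground-truth items found in the top-k list."""
    if not truth:
        return 0.0
    return len(set(recs[:k]) & set(truth)) / len(truth)


def segment_recall(recs_by_user, truth_by_user, segment_by_user, k=10):
    """Mean per-user Recall@k, reported separately for each user segment."""
    totals, counts = {}, {}
    for user_id, truth in truth_by_user.items():
        segment = segment_by_user[user_id]
        score = recall_at_k(recs_by_user.get(user_id, []), truth, k)
        totals[segment] = totals.get(segment, 0.0) + score
        counts[segment] = counts.get(segment, 0) + 1
    return {segment: totals[segment] / counts[segment] for segment in totals}
```

Reporting warm, cold+prefs, and blind separately prevents a warm-heavy validation set from hiding a collapse in the colder segments that dominate the test set.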
The final system is built on a restrained idea:
Use the strongest personalized retrieval signal when it exists.
Use intent-aware fallback when it does not.
Use popularity only to guarantee coverage.
Do not let a complex reranker or clever ensemble disturb a proven top-10 list.
The project's central lesson is that recommendation quality came less from adding layers and more from respecting the structure of the marketplace:
real estate is local,
contact is stronger than browsing,
identity quality matters more than row count,
cold-start has a hard ceiling without user intent,
and the final top-10 order is precious.
That is why the protected final answer remains:
outputs/submission_1024.zip
public LB: 0.2116
algorithm: ALS1024 + direct priority cascade