Visual companion to KustoEngineV11ShardsMetadataQueryFlow.md. Paste any block into
the Mermaid Live Editor or render in a Mermaid-aware viewer.
In V11 a table's shards are no longer a single persisted set. The truth is persisted shard groups (Rust storage) PLUS an in-memory delta of pending attach/detach/drop ops. A periodic drain (~15 min) folds delta into new groups and bumps the version, so between drains the delta holds real, query-visible rows. Everything must be read through SchemaManager, which wraps each DB so reads see baseline+delta. The whole mode is off unless EnableShardsMetadataDelta is set.
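The baseline-plus-delta model above can be illustrated with a toy Python sketch. It is not the engine's code: `TableShards`, `ShardGroup`, and every method name are hypothetical, and the real delta tracks attach/detach/drop operations with more detail than the two sets used here.

```python
from dataclasses import dataclass, field


@dataclass
class ShardGroup:
    group_id: int
    shard_ids: list[str]


@dataclass
class TableShards:
    """Hypothetical model: persisted groups plus an in-memory delta."""
    groups: list[ShardGroup] = field(default_factory=list)  # persisted baseline
    attached: set[str] = field(default_factory=set)         # pending attach ops
    removed: set[str] = field(default_factory=set)          # pending detach/drop ops
    version: int = 0

    def storage_only(self) -> set[str]:
        """Raw view: persisted groups only, delta never consulted."""
        return {s for g in self.groups for s in g.shard_ids}

    def combined(self) -> set[str]:
        """Query view: baseline plus pending attaches, minus pending removals."""
        return (self.storage_only() | self.attached) - self.removed

    def drain(self) -> None:
        """Fold the delta into the persisted groups and bump the version."""
        survivors = [
            ShardGroup(g.group_id, [s for s in g.shard_ids if s not in self.removed])
            for g in self.groups
        ]
        survivors = [g for g in survivors if g.shard_ids]
        fresh = sorted(self.attached - self.removed)
        if fresh:
            next_id = max((g.group_id for g in self.groups), default=0) + 1
            survivors.append(ShardGroup(next_id, fresh))
        self.groups = survivors
        self.attached, self.removed = set(), set()
        self.version += 1
```

Between drains, `storage_only()` and `combined()` disagree whenever the delta is non-empty. After `drain()` they agree again, until the next pending operation arrives.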
IShardsMap Surface: Delta-Aware vs Storage-Only
IShardsMap already splits its methods into two camps. The combined-view reads (GetTableExtentsMetadata, the ordered variant, and the stats-only GetTableShardsMapSummary) merge storage with delta and are the only sanctioned query reads. The raw methods (GetTableShardGroups, GetShardsMetadataRaw) return storage state with delta never consulted: fine for internal metadata work, wrong for query results. The take bug is simply a query consumer calling into the wrong camp.
take Fast Path: the 8-Node Bypass Chain
This is how T | take K actually reaches storage today. The planner pre-materializes shard groups (SetShardGroups), then CreateTrivialLimiterStrategy consumes them. The chain flows planner → prefilter → CreateShardsMetadataFilters → GetTableShardGroups → ShardsMetadataStorage.GetShardGroups. Every hop is storage-only: the in-memory delta is never touched anywhere along this path, which is the structural source of the stale results.
Zooming into CreateTrivialLimiterStrategy: it picks the "latest" group by ArgMin(Age), and if K fits that group's TotalRowCount it returns just its newest extent; otherwise it walks all groups by age until the budget is met. The trap: ArgMin(Age) ranks persistence age, so the "newest" group was committed at the last drain — by definition older than anything still in delta. Both return paths are persisted-only, so delta rows are silently dropped.
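The selection logic described above can be mimicked in a few lines of Python to show where delta rows disappear. This is a sketch under assumptions, not the real C# method: `Extent`, `Group`, and `trivial_limiter` are hypothetical names, and the ordering of extents inside a group (newest first) is assumed.

```python
from dataclasses import dataclass


@dataclass
class Extent:
    name: str
    rows: int


@dataclass
class Group:
    age: int               # persistence age; smaller means persisted more recently
    extents: list[Extent]  # assumed ordered newest first

    @property
    def total_row_count(self) -> int:
        return sum(e.rows for e in self.extents)


def trivial_limiter(groups: list[Group], k: int) -> list[Extent]:
    """Toy version of the storage-only fast path for `take k`."""
    latest = min(groups, key=lambda g: g.age)  # ArgMin(Age)
    if k <= latest.total_row_count:
        return [latest.extents[0]]             # only the newest persisted extent
    picked, budget = [], k
    for g in sorted(groups, key=lambda g: g.age):  # walk groups by age
        picked.extend(g.extents)
        budget -= g.total_row_count
        if budget <= 0:
            break
    return picked
```

Note that the function never receives the delta at all. Whatever extents are pending in memory cannot appear in either return path, no matter how recent they are.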
The bug is not one line — it's two reinforcing flaws. (1) A false V10 invariant: the fast path assumes "having shard groups means having the table", so every signal it trusts (ArgMin(Age), TotalRowCount, the group walk) is storage-only. (2) A filter-contract violation: it asks for one group via ShardGroupIdsFilter, but delta shards have no group id, and the framework's SafeHasBoundedShardIdsFilter only inspects shard ids — so the filter falls through silently and delta is never surfaced.
IsDeltaEnabled
The fix routes V11+delta away from shard-group state entirely. Step 1: only materialize shard groups when delta is OFF. Step 2: CreateTrivialLimiterStrategy branches on IsDeltaEnabled; the existing fast path stays untouched where it's correct, and V11+delta takes a new path. Step 3: the new path streams the delta-aware ordered iterator (delta first, then newest groups), accumulating rows and cold counts, breaking when the budget is met. Step 4: two Ensure guards throw if shard-group access or a ShardGroupIdsFilter ever reappears in delta mode.
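Steps 3 and 4 can be sketched in Python as follows. Everything here is illustrative: `ExtentMeta`, `ensure`, and `take_delta_aware` are hypothetical names, "cold count" is assumed to mean the number of cold extents selected, and the untouched V10 branch is omitted.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class ExtentMeta:
    name: str
    rows: int
    is_cold: bool = False


def ensure(condition: bool, message: str) -> None:
    """Stand-in for a runtime guard that throws on a broken invariant."""
    if not condition:
        raise RuntimeError(message)


def take_delta_aware(
    ordered: Iterable[ExtentMeta],  # assumed order: delta first, then newest groups
    k: int,
    *,
    shard_groups: Optional[list] = None,
    group_ids_filter: Optional[list] = None,
) -> tuple[list[ExtentMeta], int, int]:
    # Guards: shard-group state must never reach the delta-mode path.
    ensure(shard_groups is None, "shard groups accessed in delta mode")
    ensure(group_ids_filter is None, "ShardGroupIdsFilter used in delta mode")

    picked, rows, cold = [], 0, 0
    for extent in ordered:
        picked.append(extent)
        rows += extent.rows
        cold += 1 if extent.is_cold else 0
        if rows >= k:  # budget met: stop pulling from the iterator
            break
    return picked, rows, cold
```

Because the loop consumes an iterator and breaks as soon as the row budget is covered, nothing behind the last selected extent is pulled. Passing a generator as `ordered` makes that laziness observable.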
A natural worry: does the new loop scan more? No. The native cache loads whole shard groups on a miss, ignoring both the filter and maximumShardCount; those limits only trim the later C# iteration. The dominant cost is therefore how many groups are touched, not how many rows. A lazy iterator with a caller-side break already stops loading once the budget is met, so the redesign is performance-equivalent without needing a MaximumRowCount push-down into native storage.
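The cost argument can be checked with a small Python model that counts group loads. It is a sketch only: `GroupCache`, `iter_extent_rows`, and `take_rows` are hypothetical, and the cache is reduced to the one property the paragraph relies on, namely that a miss loads the whole group regardless of any filter.

```python
from typing import Iterator


class GroupCache:
    """Toy cache: a miss loads the entire group, with no filter applied."""

    def __init__(self, storage: dict[int, list[int]]):
        self.storage = storage  # group id -> row counts of its extents
        self.cache: dict[int, list[int]] = {}
        self.loads = 0          # number of whole-group loads (the dominant cost)

    def get(self, group_id: int) -> list[int]:
        if group_id not in self.cache:
            self.loads += 1
            self.cache[group_id] = list(self.storage[group_id])
        return self.cache[group_id]


def iter_extent_rows(cache: GroupCache, newest_first: list[int]) -> Iterator[int]:
    """Lazily yield extent row counts; a group is loaded only when reached."""
    for group_id in newest_first:
        yield from cache.get(group_id)


def take_rows(cache: GroupCache, newest_first: list[int], k: int) -> int:
    total = 0
    for rows in iter_extent_rows(cache, newest_first):
        total += rows
        if total >= k:  # caller-side break: later groups are never loaded
            break
    return total
```

With three groups in storage and a budget that the newest group covers, `loads` stays at 1. A larger budget raises it one group at a time, which is the same granularity a row-count push-down would see.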
Three designs were weighed. A (recommended) is delta-aware by construction with no new invariants and runtime guards against regressions. B (rejected) gates on the summary, but the summary is stats-only and a correct B just converges to A. C (deferred) synthesizes a fake delta "group", which forces inventing Age/Id/RowCount semantics and re-introduces the very category error A removes — only worth it if a non-take consumer later needs per-group reasoning.