Segmented Fixed Correctness — GLYPH v0.2

Branch:

Purpose:

Validate segmented retrieval after fixing FM sentinel semantics.

Root cause fixed:

The FM-index must be built over the corpus with a real 0x00 sentinel byte appended.
Previous builds used a synthetic BWT sentinel without appending a real sentinel to the indexed corpus.
This mismatch shifted FM intervals and caused undercounting.
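
A minimal sketch of the invariant above, not the GLYPH implementation. It builds a toy FM-index over corpus + 0x00 and checks backward-search counts against a brute-force scan. All names (toy_suffix_array, toy_bwt, toy_fm_tables, toy_count) are hypothetical, the suffix sort is naive, and the occurrence table is uncompressed; it assumes the corpus contains no 0x00 byte.

```python
def toy_suffix_array(text: bytes) -> list[int]:
    # Naive suffix sort; fine for a toy input, far too slow for real corpora.
    return sorted(range(len(text)), key=lambda i: text[i:])

def toy_bwt(text: bytes, sa: list[int]) -> bytes:
    # For i == 0, text[-1] is the real sentinel, so the BWT holds it exactly once.
    return bytes(text[i - 1] for i in sa)

def toy_fm_tables(bwt: bytes):
    # c[s] = number of symbols in the text strictly smaller than s.
    counts = [0] * 256
    for b in bwt:
        counts[b] += 1
    c = [0] * 256
    total = 0
    for s in range(256):
        c[s] = total
        total += counts[s]
    # occ[s][i] = occurrences of s in bwt[:i], only for symbols that appear.
    occ = {s: [0] * (len(bwt) + 1) for s in set(bwt)}
    for i, b in enumerate(bwt):
        for s, row in occ.items():
            row[i + 1] = row[i] + (1 if s == b else 0)
    return c, occ

def toy_count(pattern: bytes, c, occ, n: int) -> int:
    # Backward search over the half-open suffix-array interval [lo, hi).
    lo, hi = 0, n
    for s in reversed(pattern):
        if s not in occ:
            return 0
        lo = c[s] + occ[s][lo]
        hi = c[s] + occ[s][hi]
        if lo >= hi:
            return 0
    return hi - lo

corpus = b"abracadabra"
text = corpus + b"\x00"  # the real sentinel is part of the indexed text
sa = toy_suffix_array(text)
c, occ = toy_fm_tables(toy_bwt(text, sa))

def brute_count(pattern: bytes) -> int:
    # Ground truth: overlapping occurrences found by direct scan.
    return sum(corpus.startswith(pattern, i) for i in range(len(corpus)))
```

Because the suffix array, BWT, and C table are all derived from the same sentinel-terminated text, the interval arithmetic in toy_count lines up with the suffix ranks. Adding a sentinel only at the BWT stage breaks that agreement.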

Fixed build flow:

raw corpus
-> tools/prepare_sentinel_corpus_v1.py
-> build_sa_u32
-> build_bwt
-> build_fm
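
The first step of the flow can be sketched as follows. This is an assumption about what a sentinel-preparation step might do, not the contents of tools/prepare_sentinel_corpus_v1.py, whose behavior is not shown here. The function name append_sentinel and the decision to reject corpora that already contain 0x00 are both hypothetical.

```python
SENTINEL = b"\x00"

def append_sentinel(raw: bytes) -> bytes:
    # Assumption: the sentinel must be unique and smaller than every corpus byte,
    # so a corpus that already contains 0x00 is rejected rather than silently indexed.
    if SENTINEL in raw:
        raise ValueError("corpus already contains 0x00; sentinel would not be unique")
    return raw + SENTINEL

prepared = append_sentinel(b"abracadabra")
```

Every later stage would then consume the prepared bytes, so the suffix array, BWT, and FM tables all agree on the text length and on the sentinel position.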

Dataset:

Manifest:

Pattern:

Ground truth:

GLYPH segmented result:

Conclusion:

Segmented retrieval correctness is restored when shards are built through the sentinel-safe indexing pipeline.