SHARD BOUNDARY SEMANTICS

GLYPH v0.x currently supports segmented retrieval by splitting corpora into independently indexed shards.

This improves:

However, segmented retrieval introduces important semantic constraints.

Core invariant

Each shard is indexed independently.

FM retrieval operates only within a single shard boundary.

GLYPH v0.x does NOT currently perform:

Patterns spanning shard boundaries may be missed.

Example:

shard0 ends with:

blk_000

shard1 begins with:

123\n

Query:

blk_000123

Expected global-corpus count:

Current segmented result:

because the pattern crosses a shard boundary.
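The example above can be reproduced with a minimal sketch. It assumes a toy corpus made of only the two fragments shown, and it uses a plain substring scan as a stand-in for the per-shard index lookup. The names count_occurrences, segmented_count, and global_count are hypothetical and are not part of GLYPH.

```python
def count_occurrences(text, pattern):
    """Count occurrences of pattern in text, including overlapping ones."""
    count, pos = 0, text.find(pattern)
    while pos != -1:
        count += 1
        pos = text.find(pattern, pos + 1)
    return count


def segmented_count(shards, pattern):
    """Search each shard on its own and sum the per-shard counts."""
    return sum(count_occurrences(shard, pattern) for shard in shards)


def global_count(shards, pattern):
    """Search the logical corpus, i.e. the shards concatenated in order."""
    return count_occurrences("".join(shards), pattern)


# Toy corpus (assumption): shard0 ends with "blk_000", shard1 begins with "123\n".
shards = ["blk_000", "123\n"]
query = "blk_000123"

print("global corpus:", global_count(shards, query))   # 1
print("segmented:", segmented_count(shards, query))    # 0
```

For this toy corpus the logical corpus contains the query once, while neither shard contains it on its own.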

This behavior is currently:

It is NOT currently treated as a bug.

Segmented retrieval correctness depends on whether retrieval semantics are defined as:

A: exact retrieval within independent shards

or:

B: exact retrieval over the logical global corpus

GLYPH v0.x currently implements A.

It does not yet implement B.
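The gap between A and B can be made concrete by classifying every match in the logical corpus as either shard-local or boundary-crossing. The sketch below is illustrative only: it scans the concatenated text directly, which a segmented index does not do, and find_all and classify_matches are hypothetical names. Under definition A only the shard-local matches are visible. Definition B would also require the crossing ones.

```python
from bisect import bisect_right
from itertools import accumulate


def find_all(text, pattern):
    """Return start offsets of every (possibly overlapping) occurrence."""
    starts, pos = [], text.find(pattern)
    while pos != -1:
        starts.append(pos)
        pos = text.find(pattern, pos + 1)
    return starts


def classify_matches(shards, pattern):
    """Split global match offsets into (shard_local, crossing)."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    # ends[i] is the global offset one past the last character of shard i.
    ends = list(accumulate(len(shard) for shard in shards))
    shard_local, crossing = [], []
    for start in find_all("".join(shards), pattern):
        last = start + len(pattern) - 1
        # A match is shard-local when its first and last character
        # fall inside the same shard.
        same_shard = bisect_right(ends, start) == bisect_right(ends, last)
        (shard_local if same_shard else crossing).append(start)
    return shard_local, crossing


# Hypothetical corpus: one occurrence straddles the boundary, one does not.
shards = ["id=blk_000", "123\nblk_000123\n"]
shard_local, crossing = classify_matches(shards, "blk_000123")
print("visible under A:", shard_local)
print("additionally required by B:", crossing)
```

The count under B equals the count under A plus the number of crossing matches, so the two definitions agree exactly when no match straddles a boundary.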

Future versions may support boundary-safe retrieval via:

None are currently implemented.

Segmented retrieval should currently be treated as:

exact retrieval within independently indexed shard regions

not as globally complete substring retrieval.

Future regression tests should explicitly include:

This prevents accidental semantic drift.
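One way to pin the current semantics is a regression case built from the boundary example in this document. The sketch below is an assumption about what such a test could look like, not an existing GLYPH test. shard_local_count is a hypothetical stand-in for the real segmented query path and should be replaced by it.

```python
def shard_local_count(shards, pattern):
    """Stand-in for segmented retrieval: each shard is searched on its own."""
    total = 0
    for shard in shards:
        pos = shard.find(pattern)
        while pos != -1:
            total += 1
            pos = shard.find(pattern, pos + 1)
    return total


# The boundary example from this document.
BOUNDARY_CASE = (["blk_000", "123\n"], "blk_000123")


def test_boundary_crossing_pattern_is_not_counted():
    shards, query = BOUNDARY_CASE
    # Semantics A: the match spans shard0/shard1, so it is not reported.
    assert shard_local_count(shards, query) == 0


def test_same_pattern_inside_one_shard_is_counted():
    # Control case: identical text, but no boundary inside the match.
    assert shard_local_count(["blk_000123\n"], "blk_000123") == 1


def test_logical_corpus_contains_the_pattern():
    shards, query = BOUNDARY_CASE
    # Documents that the miss is a boundary effect, not absent data.
    assert query in "".join(shards)
```

If a later version implements B, the first test fails loudly and has to be changed on purpose, which makes the semantic change explicit.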

Segmented retrieval correctness must be defined explicitly.

Silent incompleteness is more dangerous than explicit constraints.