Known Limitations — GLYPH v0.x
Sentinel limitation
Current FM-index builds append a literal 0x00 byte to the corpus as the end-of-text sentinel:
corpus + 0x00
As a consequence:
- input corpora must not contain 0x00 bytes
- arbitrary raw-byte corpora are not yet fully supported
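Until this restriction is lifted, a corpus can be screened before a build is attempted. The following is a minimal sketch of such a pre-flight check; `find_first_nul` and `check_corpus` are hypothetical helper names used for illustration and are not part of GLYPH.

```python
def find_first_nul(data: bytes) -> int:
    """Return the offset of the first 0x00 byte, or -1 if there is none."""
    return data.find(b"\x00")


def check_corpus(data: bytes) -> None:
    """Raise ValueError if the corpus would collide with the 0x00 sentinel."""
    offset = find_first_nul(data)
    if offset != -1:
        raise ValueError(f"corpus contains 0x00 at byte offset {offset}")
```

Reporting the offset of the first offending byte makes it easier to decide whether the corpus can be cleaned or has to be excluded.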
Possible future solutions:
- a 257-symbol alphabet (the 256 byte values plus a dedicated sentinel symbol), or
- an explicit out-of-band sentinel representation
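One way to picture the 257-symbol option: shift every byte up by one so that symbol 0 is free to act as the sentinel. This is only an illustrative sketch of the idea, not GLYPH's planned encoding; `SENTINEL`, `encode_symbols`, and `decode_symbols` are assumed names.

```python
SENTINEL = 0  # reserved symbol; real bytes occupy symbols 1..256


def encode_symbols(data: bytes) -> list:
    """Map each byte b to symbol b + 1 and append the sentinel symbol."""
    return [b + 1 for b in data] + [SENTINEL]


def decode_symbols(symbols: list) -> bytes:
    """Invert encode_symbols: drop the trailing sentinel and shift back."""
    if not symbols or symbols[-1] != SENTINEL:
        raise ValueError("symbol sequence does not end with the sentinel")
    return bytes(s - 1 for s in symbols[:-1])
```

With this mapping a corpus byte 0x00 becomes symbol 1, so it can no longer be confused with the sentinel. The trade-off is that symbols no longer fit in a single byte.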
Static corpus assumption
GLYPH v0.x assumes immutable corpora.
Incremental index mutation is not yet implemented.
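Because an index is only valid for the exact bytes it was built from, callers may want to detect a corpus that changed after the build. A minimal sketch, assuming the caller stores a SHA-256 digest alongside the index (GLYPH is not known to do this itself; `corpus_digest` and `index_is_current` are hypothetical names):

```python
import hashlib


def corpus_digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of the corpus bytes."""
    return hashlib.sha256(data).hexdigest()


def index_is_current(data: bytes, digest_at_build: str) -> bool:
    """True if the corpus still matches the digest recorded at build time."""
    return corpus_digest(data) == digest_at_build
```

If the check fails, the only option in v0.x is a full rebuild, since incremental updates are not available.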
Exact retrieval only
GLYPH currently supports deterministic exact byte retrieval.
Not implemented:
- fuzzy matching
- ranking
- semantic retrieval
- regex engine
- approximate nearest neighbor search
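To make "exact byte retrieval" concrete, here is a toy FM-index with backward search that returns every offset at which a byte pattern occurs. It is a teaching sketch only: the suffix sort is naive and the rank query is a linear scan, so it does not reflect GLYPH's actual construction or data layout. The names `build_fm`, `occ`, and `locate` are assumptions.

```python
def build_fm(text: bytes):
    """Build a toy FM-index: suffix array, BWT, and the C table."""
    if 0 in text:
        raise ValueError("corpus must not contain 0x00")
    s = text + b"\x00"  # append the sentinel
    sa = sorted(range(len(s)), key=lambda i: s[i:])  # naive suffix sort
    bwt = bytes(s[i - 1] for i in sa)  # i == 0 wraps to the sentinel
    counts = [0] * 256
    for c in s:
        counts[c] += 1
    # c_table[c] = number of symbols in s strictly smaller than c
    c_table, total = [0] * 256, 0
    for c in range(256):
        c_table[c] = total
        total += counts[c]
    return sa, bwt, c_table


def occ(bwt: bytes, c: int, i: int) -> int:
    """Count occurrences of byte c in bwt[:i] (linear scan, toy only)."""
    return bwt[:i].count(c)


def locate(text: bytes, pattern: bytes) -> list:
    """Return sorted offsets of all exact occurrences of pattern in text."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    sa, bwt, c_table = build_fm(text)
    lo, hi = 0, len(bwt)
    for c in reversed(pattern):  # backward search, last byte first
        lo = c_table[c] + occ(bwt, c, lo)
        hi = c_table[c] + occ(bwt, c, hi)
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])
```

For example, `locate(b"abracadabra", b"abra")` yields `[0, 7]`. A match is either present byte for byte or absent; there is no scoring, tolerance, or ordering beyond the offsets themselves.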
Build-time cost
FM-index construction is still expensive for large corpora.
Large datasets require substantial:
- RAM
- disk bandwidth
- build time
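As a rough illustration of why RAM becomes a concern, the sketch below estimates the size of a plain suffix array over the corpus plus sentinel. It assumes construction goes through an uncompressed suffix array with fixed-width offsets, which is a common approach but is not confirmed for GLYPH; `suffix_array_bytes` is a hypothetical helper.

```python
def suffix_array_bytes(corpus_len: int) -> int:
    """Estimate the size in bytes of a plain suffix array for a corpus."""
    n = corpus_len + 1  # one extra position for the sentinel
    width = 4 if n <= 2**32 else 8  # 32-bit offsets while they still fit
    return n * width
```

Under these assumptions a 1 GiB corpus needs roughly 4 GiB for the suffix array alone, before counting the corpus itself or any auxiliary structures.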
API stability
Index formats and manifest semantics may still evolve during v0.x development.
Backward compatibility is not yet guaranteed.
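Given that formats may change between releases, a consumer can fail fast instead of misreading an index written by a different version. The sketch below assumes a JSON manifest with a `format_version` field; both the field name and the version string are hypothetical and are not taken from GLYPH's actual manifest.

```python
import json

SUPPORTED_FORMAT = "0.1"  # hypothetical version this reader was written for


def check_manifest(manifest_text: str, supported: str = SUPPORTED_FORMAT) -> dict:
    """Parse a manifest and refuse any format version other than the supported one."""
    manifest = json.loads(manifest_text)
    found = manifest.get("format_version")  # hypothetical key
    if found != supported:
        raise RuntimeError(
            f"unsupported index format {found!r}; expected {supported!r}"
        )
    return manifest
```

Pinning to an exact version rather than a range is deliberately conservative while no compatibility guarantees exist.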