GLYPH Roadmap
v0.1 — Current
Current state:
- exact byte-level retrieval
- SA32u
- BWT
- FM-index
- persistent FM query backend
- validated on 1 GB and 4 GB corpora
- HDFS benchmark published
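For readers unfamiliar with the SA / BWT / FM-index stack listed above, the following is a minimal sketch of how those pieces fit together for exact byte-level counting and locating. It is not GLYPH's implementation: the class name `FMIndex`, the naive suffix sort, the uncompressed occurrence table, and the `\x00` sentinel are all assumptions made for illustration, and the code only suits toy inputs.

```python
def build_suffix_array(data: bytes) -> list[int]:
    # Naive construction by sorting suffixes; illustration only.
    return sorted(range(len(data)), key=lambda i: data[i:])


def build_bwt(data: bytes, sa: list[int]) -> bytes:
    # BWT[i] is the byte that precedes suffix sa[i], wrapping at position 0.
    return bytes(data[i - 1] for i in sa)


class FMIndex:
    """Toy FM-index (hypothetical name) with an uncompressed occurrence table."""

    def __init__(self, text: bytes):
        if b"\x00" in text:
            raise ValueError("this sketch reserves byte 0 as the sentinel")
        self.data = text + b"\x00"
        self.sa = build_suffix_array(self.data)
        self.bwt = build_bwt(self.data, self.sa)

        # C[c]: number of bytes in data that are strictly smaller than c.
        counts = [0] * 256
        for b in self.data:
            counts[b] += 1
        self.C = [0] * 256
        total = 0
        for c in range(256):
            self.C[c] = total
            total += counts[c]

        # occ[c][i]: number of occurrences of byte c in bwt[:i].
        self.occ = {}
        for c in set(self.bwt):
            row = [0] * (len(self.bwt) + 1)
            for i, b in enumerate(self.bwt):
                row[i + 1] = row[i] + (b == c)
            self.occ[c] = row

    def _range(self, pattern: bytes) -> tuple[int, int]:
        # Backward search: narrow the suffix-array interval one byte at a
        # time, reading the pattern from its last byte to its first.
        lo, hi = 0, len(self.bwt)
        for c in reversed(pattern):
            row = self.occ.get(c)
            if row is None:
                return 0, 0
            lo = self.C[c] + row[lo]
            hi = self.C[c] + row[hi]
            if lo >= hi:
                return 0, 0
        return lo, hi

    def count(self, pattern: bytes) -> int:
        lo, hi = self._range(pattern)
        return hi - lo

    def locate(self, pattern: bytes) -> list[int]:
        lo, hi = self._range(pattern)
        return sorted(self.sa[lo:hi])
```

With this sketch, `FMIndex(b"abracadabra").count(b"abra")` returns 2 and `locate(b"abra")` returns `[0, 7]`. The full suffix array and occurrence table are kept in memory here, which is the kind of overhead the tradeoffs below refer to.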
Known tradeoffs:
- very high RAM overhead
- static corpora only
- no regex / ranking / fuzzy search
- no segmented retrieval yet
v0.2 — Segmented Retrieval
Goal:
Break the 4GB single-shard ceiling.
Planned work:
- segmented corpus layout
- shard routing
- merged shortlist retrieval
- shard-local SA/BWT/FM
- deterministic cross-shard query merge
Target:
- 50GB+ static corpora
- bounded RAM per shard
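One possible shape for the segmented layout and the deterministic cross-shard merge is sketched below. Everything here is an assumption for illustration: the function names, the fixed `shard_size`, the trailing `overlap` bytes used to catch matches that straddle a shard boundary, and the use of `bytes.find` as a stand-in for a shard-local FM query.

```python
def split_into_shards(corpus: bytes, shard_size: int, overlap: int):
    """Return (base_offset, shard_bytes) pairs.

    Each shard carries up to `overlap` extra trailing bytes from its
    successor so that a match crossing the boundary is still visible.
    """
    return [
        (base, corpus[base:base + shard_size + overlap])
        for base in range(0, len(corpus), shard_size)
    ]


def search_shard(shard: bytes, pattern: bytes) -> list[int]:
    # Stand-in for a shard-local index query: all match start offsets.
    hits = []
    pos = shard.find(pattern)
    while pos != -1:
        hits.append(pos)
        pos = shard.find(pattern, pos + 1)
    return hits


def search_all(shards, pattern: bytes, shard_size: int) -> list[int]:
    """Query every shard and merge the hits into global offsets."""
    merged = []
    for base, shard in shards:
        for local in search_shard(shard, pattern):
            # A hit belongs to the shard in which it starts. Hits that
            # start inside the overlap region are reported by the next
            # shard instead, so nothing is counted twice.
            if local < shard_size:
                merged.append(base + local)
    # Sorting by global offset makes the result independent of the
    # order in which shards happen to answer.
    merged.sort()
    return merged
```

In this sketch the overlap must be at least the longest supported pattern length minus one, otherwise boundary-crossing matches are missed. For example, with `corpus = b"abracadabra" * 3`, `shard_size=8`, and `overlap=3`, `search_all(split_into_shards(corpus, 8, 3), b"abra", 8)` gives the same offsets as a scan of the unsplit corpus.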
v0.3 — RAM Reduction
Goal:
Reduce persistent memory overhead.
Research directions:
- mmap-based access
- compressed FM structures
- sampled SA
- lazy loading
- MADV_RANDOM / paging experiments
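As a rough illustration of the sampled SA direction, the sketch below keeps only every `sample_rate`-th text position of the suffix array and recovers the rest by walking the LF-mapping. The function names and data layout are hypothetical, and the per-row `rank` list is itself as large as the text, so this shows the lookup logic only, not an actual memory saving; a real design would pair it with a compact rank structure.

```python
def build_sampled_index(text: bytes, sample_rate: int):
    """Build BWT, C table, per-row ranks, and a sparse SA sample map."""
    if b"\x00" in text:
        raise ValueError("this sketch reserves byte 0 as the sentinel")
    data = text + b"\x00"
    sa = sorted(range(len(data)), key=lambda i: data[i:])  # naive, demo only
    bwt = bytes(data[i - 1] for i in sa)

    # C[c]: number of bytes in data strictly smaller than c.
    counts = [0] * 256
    for b in data:
        counts[b] += 1
    C = [0] * 256
    total = 0
    for c in range(256):
        C[c] = total
        total += counts[c]

    # rank[i]: occurrences of bwt[i] within bwt[:i].
    seen = [0] * 256
    rank = []
    for b in bwt:
        rank.append(seen[b])
        seen[b] += 1

    # Keep SA entries only for text positions divisible by sample_rate.
    # Position 0 is always kept, so every walk below terminates.
    samples = {row: pos for row, pos in enumerate(sa) if pos % sample_rate == 0}
    return bwt, C, rank, samples


def locate_row(row: int, bwt: bytes, C: list[int], rank: list[int], samples: dict) -> int:
    """Recover the text position of a suffix-array row from sparse samples."""
    steps = 0
    while row not in samples:
        # LF-mapping: jump to the row of the suffix starting one byte earlier.
        row = C[bwt[row]] + rank[row]
        steps += 1
    return samples[row] + steps
```

Each lookup takes at most `sample_rate - 1` LF steps, so the sampling rate trades memory for locate latency.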
Target:
- lower RAM amplification
- larger corpora on commodity hardware
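One way the mmap and `MADV_RANDOM` experiments might look is sketched below: a 32-bit suffix array stored on disk is mapped read-only and entries are decoded on demand, so the OS pages data in as it is touched rather than the process holding the whole array. The file layout (little-endian `uint32` values) and the names `write_u32_array` and `MappedU32Array` are assumptions for illustration. `mmap.madvise` and `mmap.MADV_RANDOM` exist only on some platforms and Python versions, so the hint is guarded.

```python
import mmap
import struct


def write_u32_array(path: str, values) -> None:
    # Store values as consecutive little-endian unsigned 32-bit integers.
    with open(path, "wb") as f:
        for v in values:
            f.write(struct.pack("<I", v))


class MappedU32Array:
    """Read-only view over a uint32 file, decoded lazily per access."""

    def __init__(self, path: str):
        self._file = open(path, "rb")
        self._map = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)
        # Suffix-array lookups jump around the file, so hint that
        # read-ahead is unlikely to help (where the platform supports it).
        if hasattr(self._map, "madvise") and hasattr(mmap, "MADV_RANDOM"):
            self._map.madvise(mmap.MADV_RANDOM)

    def __len__(self) -> int:
        return len(self._map) // 4

    def __getitem__(self, i: int) -> int:
        if not 0 <= i < len(self):
            raise IndexError(i)
        return struct.unpack_from("<I", self._map, i * 4)[0]

    def close(self) -> None:
        self._map.close()
        self._file.close()
```

Whether this reduces resident memory in practice depends on access patterns and page-cache pressure, which is what the paging experiments would need to measure.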
Non-goals
GLYPH is not:
- a search relevance engine
- a vector database
- a replacement for Elasticsearch
- a fuzzy matcher
- a regex engine
GLYPH explores a different tradeoff:
persistent byte-level indexed retrieval over static corpora.