EXACT VERIFICATION LAYER
Modern retrieval systems increasingly rely on probabilistic pipelines.
Typical stacks now include:
- embeddings
- semantic retrieval
- reranking
- contextual chunking
- agent orchestration
- adaptive retrieval loops
These systems are often powerful and useful, but they also introduce sources of uncertainty:
- approximate matches
- embedding drift
- ranking instability
- changing retrieval semantics
- inconsistent provenance
- retrieval non-reproducibility
GLYPH explores a complementary direction.
Core idea
GLYPH investigates whether deterministic exact retrieval can function as a stable verification substrate beneath probabilistic systems.
Instead of replacing semantic retrieval, GLYPH focuses on:
- exact byte presence
- deterministic retrieval behavior
- stable byte offsets
- reproducible retrieval semantics
- exact provenance anchors
Goal:
probabilistic systems retrieve candidates;
deterministic systems verify exact presence.
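As a minimal illustration of "exact byte presence" and "stable byte offsets", the sketch below scans an in-memory corpus with Python's built-in `bytes.find`. It is not GLYPH's implementation or API; the function name `find_all_offsets` and the sample corpus are assumptions made up for this example.

```python
from typing import List


def find_all_offsets(corpus: bytes, pattern: bytes) -> List[int]:
    """Return every byte offset at which `pattern` occurs in `corpus`.

    Overlapping matches are included. The result depends only on the two
    byte strings, so repeated calls on the same inputs give the same list.
    """
    if not pattern:
        raise ValueError("pattern must be non-empty")
    offsets = []
    start = corpus.find(pattern)
    while start != -1:
        offsets.append(start)
        # Advance by one byte so overlapping occurrences are not skipped.
        start = corpus.find(pattern, start + 1)
    return offsets


# Hypothetical sample corpus, for illustration only.
corpus = b"error: disk full\nwarn: disk slow\nerror: disk full\n"
offsets = find_all_offsets(corpus, b"disk full")
print(offsets)  # [7, 40]
```

A linear scan like this is only practical for small inputs; an index structure serves the same query without rescanning the corpus, but the contract is the same: identical bytes in, identical offsets out.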
Verification vs interpretation
Probabilistic systems optimize for:
- semantic usefulness
- approximate intent matching
- contextual relevance
Verification systems optimize for:
- exact presence
- reproducibility
- deterministic observability
- stable references
These are different infrastructure roles.
GLYPH focuses on the second role.
Possible retrieval architecture
One possible future pipeline:
LLM
↓
semantic retrieval
↓
reranker
↓
GLYPH exact verifier
↓
exact byte offsets
↓
ground-truth confirmation
In this model:
- semantic systems generate candidate regions
- GLYPH verifies exact byte-level existence
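The division of labor in this model can be sketched in a few lines. Everything here is hypothetical: `Candidate`, `Verified`, and `verify_candidates` are names invented for the example, the candidates stand in for the output of a semantic retriever and reranker, and `bytes.find` stands in for an exact verifier.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Candidate:
    """A passage proposed by a (hypothetical) upstream semantic system."""
    text: bytes    # bytes the upstream system claims are in the corpus
    score: float   # relevance score; not used by verification


@dataclass(frozen=True)
class Verified:
    candidate: Candidate
    offset: Optional[int]  # first exact byte offset, or None if absent

    @property
    def present(self) -> bool:
        return self.offset is not None


def verify_candidates(corpus: bytes, candidates: List[Candidate]) -> List[Verified]:
    """Attach an exact byte offset (or None) to each candidate."""
    results = []
    for cand in candidates:
        # An empty claim is treated as absent rather than matching at 0.
        pos = corpus.find(cand.text) if cand.text else -1
        results.append(Verified(cand, pos if pos != -1 else None))
    return results


corpus = b"The index is rebuilt nightly. Queries are read-only."
candidates = [
    Candidate(b"rebuilt nightly", 0.91),
    Candidate(b"rebuilt hourly", 0.88),  # plausible, but not in the corpus
]
results = verify_candidates(corpus, candidates)
for r in results:
    print(r.candidate.text, r.present, r.offset)
```

The second candidate scores almost as high as the first, yet only the first is confirmed, at byte offset 13. The verifier says nothing about which passage is more relevant or correct; it only reports whether the bytes exist.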
Why this may matter
As retrieval systems become more probabilistic, infrastructure may require stronger deterministic anchors.
Domains where such anchors may be useful:
- audit systems
- infrastructure observability
- legal/compliance workflows
- forensic analysis
- binary corpus verification
- reproducible AI retrieval pipelines
- exact provenance tracking
GLYPH explores whether exact deterministic retrieval can provide such anchors.
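One way to picture a "deterministic anchor" is a small record that pins a span to a specific corpus: a digest of the corpus, a byte offset, a length, and a digest of the span. The sketch below is an assumption about what such a record could look like, not a format defined by GLYPH; `Anchor`, `make_anchor`, and `check_anchor` are hypothetical names.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Anchor:
    corpus_sha256: str  # hex digest of the entire corpus
    offset: int         # byte offset of the span within the corpus
    length: int         # span length in bytes
    span_sha256: str    # hex digest of the span itself


def make_anchor(corpus: bytes, offset: int, length: int) -> Anchor:
    """Create an anchor for corpus[offset:offset + length]."""
    if offset < 0 or length <= 0 or offset + length > len(corpus):
        raise ValueError("span is outside the corpus")
    span = corpus[offset:offset + length]
    return Anchor(
        corpus_sha256=hashlib.sha256(corpus).hexdigest(),
        offset=offset,
        length=length,
        span_sha256=hashlib.sha256(span).hexdigest(),
    )


def check_anchor(corpus: bytes, anchor: Anchor) -> bool:
    """Return True only if the corpus and the anchored span are unchanged."""
    if hashlib.sha256(corpus).hexdigest() != anchor.corpus_sha256:
        return False
    span = corpus[anchor.offset:anchor.offset + anchor.length]
    return (
        len(span) == anchor.length
        and hashlib.sha256(span).hexdigest() == anchor.span_sha256
    )


corpus = b"2024-01-01 user=alice action=login\n"
anchor = make_anchor(corpus, 11, 10)           # the bytes b"user=alice"
print(check_anchor(corpus, anchor))            # True
print(check_anchor(corpus + b"\n", anchor))    # False: the corpus changed
```

Hashing the whole corpus ties the offset to one exact version of the data, which matches the static-corpus assumption: once the corpus changes, previously issued anchors no longer check out.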
Important boundaries
GLYPH does NOT:
- prove semantic truth
- validate reasoning
- solve hallucinations
- guarantee factual correctness
- replace semantic retrieval systems
GLYPH only verifies exact byte-level presence within indexed static corpora.
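For a static corpus stored as a file, a presence check can run against a read-only memory map instead of loading the file into a Python object. The following sketch uses only the standard-library `mmap` module and performs a plain scan with no index; the function name `first_offset_in_file` is an assumption for this example and not part of GLYPH.

```python
import mmap
from typing import Optional


def first_offset_in_file(path: str, pattern: bytes) -> Optional[int]:
    """Return the first byte offset of `pattern` in the file, or None.

    The file is mapped read-only, so the check cannot modify the corpus.
    Note: mmap raises ValueError for an empty file.
    """
    if not pattern:
        raise ValueError("pattern must be non-empty")
    with open(path, "rb") as f:
        # Length 0 asks mmap to map the whole file.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            pos = mm.find(pattern)
    return pos if pos != -1 else None
```

Because the match is over raw bytes, it behaves the same for text and binary content; no decoding or normalization happens before the comparison.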
Current research areas
Current exploration includes:
- FM-index infrastructure
- suffix-array retrieval
- mmap retrieval behavior
- deterministic substring search
- exact byte-offset recovery
- retrieval reproducibility
- static-corpus verification semantics
- sentinel-safe index construction
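To make "suffix-array retrieval" and "exact byte-offset recovery" concrete, here is a small sketch of deterministic substring search over a suffix array. It is illustrative only and does not reflect GLYPH's internals: the construction is the naive sort-all-suffixes approach (far too slow and memory-hungry for real corpora), and the function names are assumptions. Because it compares byte slices directly, it needs no sentinel byte.

```python
from typing import List


def build_suffix_array(data: bytes) -> List[int]:
    """Naive construction: sort suffix start positions lexicographically."""
    return sorted(range(len(data)), key=lambda i: data[i:])


def find_offsets(data: bytes, sa: List[int], pattern: bytes) -> List[int]:
    """Return all byte offsets of `pattern` in `data`, in ascending order."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    m = len(pattern)

    # Lower bound: first suffix whose m-byte prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if data[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo

    # Upper bound: first suffix whose m-byte prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if data[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    # Every suffix in sa[first:lo] starts with the pattern.
    return sorted(sa[first:lo])


data = b"banana"
sa = build_suffix_array(data)
print(sa)                               # [5, 3, 1, 0, 4, 2]
print(find_offsets(data, sa, b"ana"))   # [1, 3]
```

All suffixes that begin with the pattern sit in one contiguous block of the sorted array, so two binary searches recover every occurrence. An FM-index answers the same kind of query from a compressed representation.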
Experimental status
GLYPH is currently experimental infrastructure research.
Known limitations include:
- high RAM overhead
- evolving APIs
- incomplete correctness coverage
- limited operational hardening
- static-corpus assumptions
The project should not currently be treated as production infrastructure.
Core principle
GLYPH explores a simple question:
can exact deterministic retrieval remain stable
beneath increasingly probabilistic systems?