GLYPH Exact Retrieval Layer — SPEC v0.1

1. Purpose

Deterministic exact retrieval from raw bytes.

query → anchors → FM → chunk_ids
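
The pipeline above can be sketched end to end as follows. This is a toy illustration, not the GLYPH implementation: a naive substring scan stands in for the FM stage (presumably an FM-index lookup), the function name `retrieve` and the small default window sizes are hypothetical, and intersecting the per-anchor chunk sets is an assumption about how anchor hits combine into chunk_ids.

```python
def retrieve(query: bytes, chunks: dict[int, bytes],
             frag_len: int = 4, window_step: int = 4) -> list[int]:
    """Toy pipeline: query -> anchors -> lookup -> chunk_ids."""
    # query -> anchors: fixed-length windows taken at a fixed stride.
    anchors = [query[i:i + frag_len]
               for i in range(0, len(query) - frag_len + 1, window_step)]
    if not anchors:
        return []
    # anchors -> lookup: a naive scan stands in for the FM stage here.
    per_anchor = [{cid for cid, data in chunks.items() if anchor in data}
                  for anchor in anchors]
    # lookup -> chunk_ids: keep chunks containing every anchor (assumed rule).
    matched = set.intersection(*per_anchor)
    # Sorting keeps the output independent of set iteration order.
    return sorted(matched)
```

Returning a sorted list is one simple way to keep results deterministic regardless of how the intermediate sets are built.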


2. Outcomes (Contract)

Outcome               Meaning
EMPTY_QUERY           query_len == 0
TOO_SHORT             query_len < frag_len
QUERY_TOO_LONG        query_len > max_query_bytes
NON_SELECTIVE         no anchors passed entropy filter
NO_HIT                anchors selected but FM returned 0
EXACT_UNIQUE          exactly one chunk matched
EXACT_MULTI           multiple chunks matched
INVALID_PARTIAL_HIT   some anchors hit, others failed
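
A minimal sketch of how the contract above could be evaluated. The function `classify` and its parameters are hypothetical, "1MB" is read as 1 MiB, and the final fallback (all anchors hit but no chunk matched) is an assumption, since the contract does not name that case. Note that the order of the checks matters: EMPTY_QUERY has to be tested before TOO_SHORT, because a zero-length query also satisfies query_len < frag_len.

```python
from enum import Enum


class Outcome(Enum):
    EMPTY_QUERY = "EMPTY_QUERY"
    TOO_SHORT = "TOO_SHORT"
    QUERY_TOO_LONG = "QUERY_TOO_LONG"
    NON_SELECTIVE = "NON_SELECTIVE"
    NO_HIT = "NO_HIT"
    EXACT_UNIQUE = "EXACT_UNIQUE"
    EXACT_MULTI = "EXACT_MULTI"
    INVALID_PARTIAL_HIT = "INVALID_PARTIAL_HIT"


def classify(query_len: int, anchor_hits: list[int], matched_chunks: int,
             frag_len: int = 48,
             max_query_bytes: int = 1_048_576) -> Outcome:
    """Map query size, per-anchor hit counts and chunk count to an outcome."""
    # Length checks come first; the empty case precedes the short case.
    if query_len == 0:
        return Outcome.EMPTY_QUERY
    if query_len < frag_len:
        return Outcome.TOO_SHORT
    if query_len > max_query_bytes:
        return Outcome.QUERY_TOO_LONG
    # An empty hit list means no anchor survived the entropy filter.
    if not anchor_hits:
        return Outcome.NON_SELECTIVE
    if all(h == 0 for h in anchor_hits):
        return Outcome.NO_HIT
    if any(h == 0 for h in anchor_hits):
        return Outcome.INVALID_PARTIAL_HIT
    if matched_chunks == 1:
        return Outcome.EXACT_UNIQUE
    if matched_chunks > 1:
        return Outcome.EXACT_MULTI
    # Assumption: every anchor hit somewhere, but no single chunk matched.
    return Outcome.NO_HIT
```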

3. Result Format

Fields:

  • shortlist_size
  • total_count
  • truncated
  • shortlist_top
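
One possible reading of these fields, sketched as a record. The spec does not define their semantics, so everything here is an assumption: `total_count` is taken as the number of matched chunks before the limit is applied, `shortlist_size` as the number returned, `truncated` as whether the limit cut the list, and `shortlist_top` as the first chunk id in the shortlist. `Result` and `build_result` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Result:
    shortlist_size: int            # entries returned after applying the limit
    total_count: int               # distinct matches before the limit
    truncated: bool                # True if the limit dropped any matches
    shortlist_top: Optional[int]   # first chunk id, or None if no matches


def build_result(chunk_ids: Iterable[int], limit: int = 100) -> Result:
    # Deduplicate and sort so the same input always gives the same record.
    ordered = sorted(set(chunk_ids))
    shortlist = ordered[:limit]
    return Result(
        shortlist_size=len(shortlist),
        total_count=len(ordered),
        truncated=len(ordered) > limit,
        shortlist_top=shortlist[0] if shortlist else None,
    )
```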


4. Explain Mode

Fields:

  • accepted_anchors
  • dropped_by_entropy
  • selected_anchors
  • min_selected_sa_hits
  • max_selected_sa_hits
  • anchors_with_zero_hits
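
A sketch of how these counters might be derived. It rests on several assumptions not stated in the spec: the entropy filter is Shannon entropy in bits per byte, every field is an integer count, selection takes the first pick_k accepted anchors in query order, and `sa_hits` is a hypothetical callable returning the hit count for one anchor.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Sequence


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte."""
    if not data:
        return 0.0
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in Counter(data).values())


@dataclass(frozen=True)
class Explain:
    accepted_anchors: int
    dropped_by_entropy: int
    selected_anchors: int
    min_selected_sa_hits: int
    max_selected_sa_hits: int
    anchors_with_zero_hits: int


def build_explain(windows: Sequence[bytes], sa_hits: Callable[[bytes], int],
                  entropy_min: float = 2.0, pick_k: int = 3) -> Explain:
    # Windows below the entropy threshold are dropped before any lookup.
    accepted = [w for w in windows if shannon_entropy(w) >= entropy_min]
    # Assumed selection rule: first pick_k accepted anchors, in query order.
    selected = accepted[:pick_k]
    hits = [sa_hits(a) for a in selected]
    return Explain(
        accepted_anchors=len(accepted),
        dropped_by_entropy=len(windows) - len(accepted),
        selected_anchors=len(selected),
        min_selected_sa_hits=min(hits, default=0),
        max_selected_sa_hits=max(hits, default=0),
        anchors_with_zero_hits=sum(1 for h in hits if h == 0),
    )
```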


5. Limits / Safety

Param                     Default
frag_len                  48
window_step               64
max_windows               64
pick_k                    3
entropy_min               2.0
non_selective_threshold   16
max_query_bytes           1 MB
limit                     100
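
The defaults above could be carried in a single immutable configuration object, sketched below. The class name `Limits` is hypothetical, "1 MB" is read as 1 MiB (1,048,576 bytes), and the validation rules are sanity checks added for illustration rather than requirements taken from the spec.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limits:
    frag_len: int = 48
    window_step: int = 64
    max_windows: int = 64
    pick_k: int = 3
    entropy_min: float = 2.0
    non_selective_threshold: int = 16
    max_query_bytes: int = 1_048_576  # "1 MB" interpreted as 1 MiB (assumption)
    limit: int = 100

    def __post_init__(self) -> None:
        # Illustrative checks only: all size-like parameters must be positive.
        for name in ("frag_len", "window_step", "max_windows",
                     "pick_k", "max_query_bytes", "limit"):
            if getattr(self, name) <= 0:
                raise ValueError(f"{name} must be positive")
        if self.entropy_min < 0:
            raise ValueError("entropy_min must be non-negative")
        # Selecting more anchors than windows examined cannot be satisfied.
        if self.pick_k > self.max_windows:
            raise ValueError("pick_k must not exceed max_windows")
```

Freezing the dataclass means a query cannot change its limits midway, which fits the determinism goal of the layer.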

6. Guarantees

  • deterministic results
  • no false positives
  • bounded FM calls (≤ max_windows per query)
  • explainable decisions
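
The bound on FM calls can be enforced mechanically with a call budget, as in the sketch below. `BudgetedLookup` is a hypothetical wrapper and the wrapped `lookup` callable stands in for whatever performs the FM query; failing loudly when the budget is exhausted is a design choice made for this example, not behavior described in the spec.

```python
from typing import Callable


class BudgetedLookup:
    """Wrap a lookup callable and refuse calls beyond a fixed budget."""

    def __init__(self, lookup: Callable[[bytes], int],
                 max_calls: int = 64) -> None:
        self._lookup = lookup
        self.max_calls = max_calls
        self.calls = 0

    def __call__(self, anchor: bytes) -> int:
        # Check the budget before touching the index, so the bound is strict.
        if self.calls >= self.max_calls:
            raise RuntimeError("FM call budget exhausted")
        self.calls += 1
        return self._lookup(anchor)
```

Creating one wrapper per query, with max_calls set to max_windows, turns the guarantee into something a test can assert on directly via the `calls` counter.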

7. Known Limitations (v0.1)

  • no FULL_EXACT outcome (distance constraints are not checked)
  • no scoring for EXACT_MULTI
  • no index versioning
  • no concurrency guarantees
  • no internal timeout control

8. Test Coverage

Covered:

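  • empty query (EMPTY_QUERY)
  • short query (TOO_SHORT)
  • long query (QUERY_TOO_LONG)
  • low entropy (NON_SELECTIVE)
  • absent data (NO_HIT)
  • exact unique (EXACT_UNIQUE)
  • exact multi (EXACT_MULTI)
  • partial corruption (INVALID_PARTIAL_HIT)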

9. Fuzz Coverage

rare_anchor_fuzz_suite_v1.py

Covers:

  • adversarial queries
  • mutation
  • binary noise
  • boundary cases
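
For illustration only, the categories above could be generated along these lines. This is not the content of rare_anchor_fuzz_suite_v1.py, which is not reproduced here; `mutate` and `fuzz_cases` are hypothetical helpers, and the choice of single bit flips and a fixed seed is an assumption made so that fuzz runs are reproducible.

```python
import random


def mutate(data: bytes, rng: random.Random, flips: int = 1) -> bytes:
    """Flip `flips` randomly chosen bits in a copy of `data`."""
    buf = bytearray(data)
    for _ in range(flips):
        if not buf:
            break
        pos = rng.randrange(len(buf))
        buf[pos] ^= 1 << rng.randrange(8)
    return bytes(buf)


def fuzz_cases(seed_query: bytes, seed: int = 0, count: int = 8) -> list[bytes]:
    """Build a reproducible list of fuzz inputs from one seed query."""
    rng = random.Random(seed)
    # Boundary cases: empty input, a single byte, and the unmodified query.
    cases = [b"", seed_query[:1], seed_query]
    # Mutation: near-miss variants of the seed query.
    cases += [mutate(seed_query, rng) for _ in range(count)]
    # Binary noise: random bytes of the same length as the seed query.
    cases += [bytes(rng.randrange(256) for _ in range(len(seed_query)))
              for _ in range(count)]
    return cases
```

Seeding the generator keeps any failing case replayable, which matters when the property under test is determinism.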

10. Future (v0.2)

  • FULL_EXACT (distance constraint)
  • max_hits hard cap
  • index versioning
  • concurrency model
  • internal timeout