GLYPH Query Tiers
Overview#
GLYPH has three explicit query tiers. Each tier adds guarantees and costs. Tiers are not interchangeable — choose based on use case.
Tier 1 — Fast count#
Tool: persistent FM backend (query_fm_server_v1)
What it does:
- FM backward search only
- returns count = r - l from FM interval
What it does NOT do:
- no manifest verification
- no corpus integrity check
- no offset recovery
- no locate
Latency (HDFS 1GB, warm): p50: ~0.010 ms p99: ~0.015 ms
When to use:
- repeated exact queries over a trusted static corpus
- when count is sufficient
- when index is known-good and corpus is unchanged
Risk:
- manifest verification depends on the correctness of manifest.json
- locate is not included
- artifact checksum is not embedded inside fm.bin yet
Tier 2 — Verified query#
Tool: tools/query_verified_v1.py
What it does:
- manifest verification before query
- corpus sha256 check
- sentinel value check
- artifact existence check
- FM count query
- fail-fast on any mismatch
What it does NOT do:
- no locate (offset recovery)
- no per-query artifact checksum (manifest check only)
Latency (HDFS 1GB, warm): ~19 ms end-to-end (Python startup + manifest verification + subprocess)
When to use:
- CLI use where integrity matters
- when corpus may have changed between queries
- when artifact provenance must be confirmed before result
Risk:
- manifest check uses sha256 of corpus prefix (64KB)
- not a full corpus hash on every query
- locate not included
Tier 3 — Strict verified (planned)#
Status: not yet implemented.
Intended behavior:
- all Tier 2 checks
- full corpus sha256 (not prefix only)
- artifact checksum inside FM binary header
- verified locate: count == len(offsets), corpus[o:o+len(p)] == p
- explicit NotFound signal distinct from count=0
When to use:
- provenance audit
- forensic / compliance use
- when exact byte offsets must be verified against corpus
Latency: not yet measured. Expected higher than Tier 2.
Tier comparison#
| Property | Tier 1 | Tier 2 | Tier 3 |
|---|---|---|---|
| FM count | ✓ | ✓ | ✓ |
| Manifest check | — | ✓ | ✓ |
| Full corpus hash | — | — | ✓ |
| Artifact checksum | — | — | ✓ |
| Locate + verify | — | — | ✓ |
| Explicit NotFound | — | — | ✓ |
| Latency | ~0.010 ms | ~19 ms | TBD |
Design principle#
Each tier must fail-fast on its own guarantees. No tier silently falls back to a weaker tier. A Tier 2 query that fails manifest check must not proceed to FM count. A Tier 3 query that fails artifact checksum must not proceed to locate.
Relationship to artifact protocol#
Tier 1 → requires valid FM artifact (magic + version) Tier 2 → requires valid manifest.json + corpus Tier 3 → requires Tier 2 + artifact checksum inside fm.bin