GLYPH Query Tiers

Overview

GLYPH defines three explicit query tiers: two implemented and one planned. Each successive tier adds guarantees and cost. Tiers are not interchangeable; choose one based on the use case.


Tier 1 — Fast count

Tool: persistent FM backend (query_fm_server_v1)

What it does:

  • FM backward search only
  • returns count = r - l from FM interval
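As an illustration of the count path, here is a minimal sketch of FM backward search in Python. It is not the query_fm_server_v1 implementation: the naive suffix sort, the dense occurrence table, and the names build_fm_index and fm_count are assumptions chosen for readability. The interval is half-open, [l, r), so the count is r - l.

```python
def build_fm_index(text, sentinel="\x00"):
    """Build a toy FM index (C table + occurrence table) for `text`.

    Assumes `sentinel` does not occur in `text` and sorts below every
    other character. The naive suffix sort is only suitable for small inputs.
    """
    s = text + sentinel
    sa = sorted(range(len(s)), key=lambda i: s[i:])
    bwt = [s[i - 1] for i in sa]  # i == 0 wraps around to the sentinel

    alphabet = sorted(set(s))
    # C[c] = number of characters in s that are strictly smaller than c
    C, total = {}, 0
    for c in alphabet:
        C[c] = total
        total += s.count(c)

    # occ[c][i] = number of occurrences of c in bwt[:i]
    occ = {c: [0] * (len(s) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (1 if ch == c else 0)
    return C, occ, len(s)


def fm_count(index, pattern):
    """Return the number of occurrences of a non-empty `pattern`."""
    C, occ, n = index
    l, r = 0, n  # half-open interval [l, r) over the sorted suffixes
    for ch in reversed(pattern):
        if ch not in C:
            return 0
        l = C[ch] + occ[ch][l]
        r = C[ch] + occ[ch][r]
        if l >= r:
            return 0
    return r - l


index = build_fm_index("abracadabra")
print(fm_count(index, "abra"))
```

Note that the search never touches the corpus or the manifest, which is why this tier is fast and why it offers no integrity guarantees.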

What it does NOT do:

  • no manifest verification
  • no corpus integrity check
  • no offset recovery
  • no locate

Latency (HDFS 1GB, warm): p50 ~0.010 ms, p99 ~0.015 ms

When to use:

  • repeated exact queries over a trusted static corpus
  • when count is sufficient
  • when index is known-good and corpus is unchanged

Risk:

  • manifest.json is not verified at this tier, so a stale or mismatched index or corpus goes undetected
  • locate is not included
  • artifact checksum is not embedded inside fm.bin yet

Tier 2 — Verified query

Tool: tools/query_verified_v1.py

What it does:

  • manifest verification before query
  • corpus sha256 check
  • sentinel value check
  • artifact existence check
  • FM count query
  • fail-fast on any mismatch
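The checks above can be sketched roughly as follows. This is not the code in tools/query_verified_v1.py: the manifest field names (corpus, corpus_prefix_sha256, sentinel, artifacts), the expected sentinel value, and the VerificationError type are all hypothetical, since the manifest schema is not described here. Only the 64KB prefix size is taken from the Risk notes for this tier.

```python
import hashlib
import json
from pathlib import Path

PREFIX_BYTES = 64 * 1024   # corpus prefix size mentioned under Risk
EXPECTED_SENTINEL = 0      # hypothetical; the real sentinel value is not stated


class VerificationError(Exception):
    """Raised on the first failed check so the caller never runs the query."""


def verify_manifest(manifest_path):
    """Run the pre-query checks and return the manifest, or raise."""
    manifest_path = Path(manifest_path)
    base = manifest_path.parent
    manifest = json.loads(manifest_path.read_text())

    # Corpus sha256 check (prefix only, per the Risk notes)
    corpus = base / manifest["corpus"]
    if not corpus.is_file():
        raise VerificationError(f"missing corpus: {corpus.name}")
    with open(corpus, "rb") as f:
        digest = hashlib.sha256(f.read(PREFIX_BYTES)).hexdigest()
    if digest != manifest["corpus_prefix_sha256"]:
        raise VerificationError("corpus sha256 mismatch")

    # Sentinel value check
    if manifest["sentinel"] != EXPECTED_SENTINEL:
        raise VerificationError("sentinel mismatch")

    # Artifact existence check
    for name in manifest["artifacts"]:
        if not (base / name).is_file():
            raise VerificationError(f"missing artifact: {name}")

    return manifest
```

A caller would invoke verify_manifest first and start the FM count query only if it returns without raising.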

What it does NOT do:

  • no locate (offset recovery)
  • no per-query artifact checksum (manifest check only)

Latency (HDFS 1GB, warm): ~19 ms end-to-end (Python startup + manifest verification + subprocess)

When to use:

  • CLI use where integrity matters
  • when corpus may have changed between queries
  • when artifact provenance must be confirmed before result

Risk:

  • manifest check hashes only the first 64KB of the corpus (sha256), so changes past that prefix are not detected
  • the full corpus is not hashed on any query
  • locate not included

Tier 3 — Strict verified (planned)

Status: not yet implemented.

Intended behavior:

  • all Tier 2 checks
  • full corpus sha256 (not prefix only)
  • artifact checksum inside FM binary header
  • verified locate: count == len(offsets), corpus[o:o+len(p)] == p
  • explicit NotFound signal distinct from count=0
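Since Tier 3 is not implemented, the following is only a sketch of what the verified-locate step and the explicit NotFound signal could look like. The names verify_locate, Found, NotFound, and LocateVerificationError are hypothetical. The two conditions checked are the ones listed above: count == len(offsets) and corpus[o:o+len(p)] == p for every offset o.

```python
from dataclasses import dataclass


class LocateVerificationError(Exception):
    """Raised when locate output disagrees with the count or the corpus."""


@dataclass(frozen=True)
class Found:
    offsets: tuple


@dataclass(frozen=True)
class NotFound:
    pattern: bytes


def verify_locate(corpus, pattern, count, offsets):
    """Check locate output against the corpus; return Found or NotFound."""
    if count != len(offsets):
        raise LocateVerificationError(
            f"count {count} != number of offsets {len(offsets)}"
        )
    for o in offsets:
        # Reject negative offsets explicitly: Python slicing would wrap them.
        if o < 0 or corpus[o:o + len(pattern)] != pattern:
            raise LocateVerificationError(f"offset {o} does not match pattern")
    if count == 0:
        # A verified absence, reported as its own type rather than a bare 0.
        return NotFound(pattern)
    return Found(tuple(sorted(offsets)))


corpus = b"abracadabra"
print(verify_locate(corpus, b"abra", 2, [7, 0]))
print(verify_locate(corpus, b"xyz", 0, []))
```

Returning a distinct NotFound type lets a caller tell a verified absence apart from a count of 0 produced by a path that skipped verification.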

When to use:

  • provenance audit
  • forensic / compliance use
  • when exact byte offsets must be verified against corpus

Latency: not yet measured. Expected higher than Tier 2.


Tier comparison

Property            Tier 1             Tier 2             Tier 3 (planned)
FM count            yes                yes                yes
Manifest check      no                 yes                yes
Full corpus hash    no                 no (prefix only)   yes
Artifact checksum   no                 no                 yes
Locate + verify     no                 no                 yes
Explicit NotFound   no                 no                 yes
Latency             ~0.010 ms (p50)    ~19 ms             TBD

Design principle

Each tier must fail-fast on its own guarantees. No tier silently falls back to a weaker tier. A Tier 2 query that fails manifest check must not proceed to FM count. A Tier 3 query that fails artifact checksum must not proceed to locate.
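One way to express this rule in code is a small runner that evaluates a tier's checks in order and calls the query only when every check has passed. This is a sketch under assumed names (run_tier, TierCheckFailed); it does not describe how GLYPH's tools are actually structured.

```python
class TierCheckFailed(Exception):
    """Carries the name of the first check that failed."""


def run_tier(checks, query):
    """Run (name, check) pairs in order; call `query` only if all pass.

    There is deliberately no fallback branch: a failed check raises,
    and the query is never executed at a weaker guarantee level.
    """
    for name, check in checks:
        if not check():
            raise TierCheckFailed(name)
    return query()


# Usage: the manifest check fails, so the count query never runs.
calls = []
try:
    run_tier(
        [("manifest", lambda: False)],
        lambda: calls.append("fm_count"),
    )
except TierCheckFailed as exc:
    print(f"aborted: {exc} check failed, query calls = {calls}")
```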


Relationship to artifact protocol

  • Tier 1 → requires valid FM artifact (magic + version)
  • Tier 2 → requires valid manifest.json + corpus
  • Tier 3 → requires Tier 2 + artifact checksum inside fm.bin
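For the Tier 1 requirement, a magic + version check might look like the sketch below. The header layout is entirely hypothetical: the 4-byte magic b"GLFM", the little-endian 32-bit version field, and the supported version number are placeholders, because the actual fm.bin format is not specified here.

```python
import struct

MAGIC = b"GLFM"              # hypothetical magic bytes
SUPPORTED_VERSION = 1        # hypothetical version number
HEADER = struct.Struct("<4sI")  # assumed layout: 4-byte magic, uint32 version


class ArtifactError(Exception):
    """Raised when the artifact header is truncated or unrecognized."""


def check_fm_header(data):
    """Validate the assumed header at the start of `data`; return the version."""
    if len(data) < HEADER.size:
        raise ArtifactError("truncated header")
    magic, version = HEADER.unpack_from(data)
    if magic != MAGIC:
        raise ArtifactError(f"bad magic: {magic!r}")
    if version != SUPPORTED_VERSION:
        raise ArtifactError(f"unsupported version: {version}")
    return version


print(check_fm_header(HEADER.pack(MAGIC, 1) + b"payload"))
```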