SA Migration Status

Status: active migration bridge Date: 2026-05-17

Current State#

Legacy pipeline:

raw corpus
    -> raw uint32 sa.bin
    -> build_bwt
    -> build_fm

Current production compatibility:

  • build_bwt expects raw uint32 SA
  • existing pipeline remains unchanged
  • no runtime migration yet

Completed#

SA Container Specification#

File:

docs/specs/SA_CONTAINER_V1.md

Defined:

  • magic
  • version
  • entry width
  • corpus size
  • endian flag
  • reserved flags
  • payload layout

SA Container Writer#

File:

tools/write_sa_container_v1.py

Capabilities:

  • wraps raw SA into GLYPHSA1 container
  • validates:
    • empty file
    • entry width
    • divisibility
    • corpus/entry mismatch

SA Container Reader#

File:

tools/read_sa_container_v1.py

Capabilities:

  • validates container header
  • validates:
    • magic
    • version
    • entry_width
    • file_size
  • prints header metadata
  • fail-fast on corruption

Regression Coverage#

Green tests:

  • FM correctness tests
  • locate tests
  • manifest integrity tests
  • verified query tests
  • SA container writer tests
  • SA container reader tests

Total:

38 tests green

Architectural Boundary Reached#

Before:

SA = anonymous binary blob

Now:

SA = versioned artifact contract

This enables future compatibility layers.

Next Planned Step#

Container-aware build_bwt:

  • accept raw SA32
  • OR GLYPHSA1 container

without breaking existing indexes.

Future Path#

raw SA32 -> GLYPHSA1 entry_width=4 -> container-aware readers -> GLYPHSA1 entry_width=8 -> SA64 migration