HDFS 1GB Persistent FM Benchmark — p50/p95/p99

Benchmark#

Tool:

benchmarks/persistent_fm_v1.py

Mode:

persistent_cpp_backend

Corpus:

bench_1gb/HDFS_1GB.log

Artifacts:

bench_1gb/out/hdfs_1gb.fm.bin
bench_1gb/out/hdfs_1gb.bwt.bin

Query set:

bench_1gb/queries.txt

Query count:

100

Warm runs:

10

Warm measurements:

1000

Result#

Backend startup / index load:

7576.852983 ms

Cold query batch:

min:   0.012595 ms
p50:   0.013735 ms
p95:   0.017700 ms
p99:   0.021775 ms
max:   0.048982 ms
mean:  0.014639 ms

Warm query batch:

min:   0.008186 ms
p50:   0.009578 ms
p95:   0.010399 ms
p99:   0.010468 ms
max:   0.013484 ms
mean:  0.009799 ms

Sample response:

715386381 715386399 18

Interpretation#

This benchmark measures persistent C++ FM backend latency with the index already loaded into a long-lived process.

It does not include:

  • Python startup per query
  • manifest verification
  • HTTP overhead
  • OS-level cold-cache drop
  • reboot-level cold mmap behavior

The result shows a stable warm latency envelope on the benchmark machine:

warm p99 / p50 ≈ 1.09

This is substantially more informative than a single average latency number.


Machine#

See:

benchmarks/MACHINE_SPEC.md