HDFS 1GB Persistent FM Benchmark — controlled cold-cache runs

Benchmark#

Tool:

benchmarks/persistent_fm_v1.py

Mode:

persistent_cpp_backend

Corpus:

bench_1gb/HDFS_1GB.log

Artifacts:

bench_1gb/out/hdfs_1gb.fm.bin
bench_1gb/out/hdfs_1gb.bwt.bin

Query set:

bench_1gb/queries.txt

Query count:

100

Warm runs per cold-cache run:

3

Cache reset before each run:

sync
echo 3 | sudo tee /proc/sys/vm/drop_caches

Run 1#

Backend startup / index load:

7333.136344 ms

Cold query batch:

min:   0.012651 ms
p50:   0.013659 ms
p95:   0.018244 ms
p99:   0.022719 ms
max:   0.052441 ms
mean:  0.014355 ms

Warm query batch:

min:   0.009328 ms
p50:   0.012513 ms
p95:   0.012763 ms
p99:   0.014712 ms
max:   0.047950 ms
mean:  0.012111 ms

Run 2#

Backend startup / index load:

7374.850554 ms

Cold query batch:

min:   0.012714 ms
p50:   0.015465 ms
p95:   0.019207 ms
p99:   0.023550 ms
max:   0.049943 ms
mean:  0.015518 ms

Warm query batch:

min:   0.009647 ms
p50:   0.012375 ms
p95:   0.012612 ms
p99:   0.012736 ms
max:   0.015037 ms
mean:  0.012031 ms

Interpretation#

Controlled cold-cache runs show that the dominant cold cost is backend startup / index load, not individual FM query execution.

After the persistent backend is started, query latency remains stable.

Observed startup/load range:

~7.33–7.37 s

Observed warm p50 range after cache reset:

~0.0124–0.0125 ms

Observed warm p99 range after cache reset:

~0.0127–0.0147 ms

This measurement does not represent reboot-level cold storage behavior. It represents Linux page-cache drop followed by persistent backend startup.


Machine#

See:

benchmarks/MACHINE_SPEC.md