GLYPH Benchmark Machine Specification

All benchmark results in this repository were produced on the following machine.

Numbers from different hardware are not directly comparable.

Hardware#

Component	Value
CPU	AMD EPYC 4344P 8-Core Processor
Logical CPUs	16
Physical cores	8
Threads per core	2
Sockets	1
RAM	125 GiB
Storage	2 × NVMe disks, 894.3 GiB each
Storage rotation	non-rotational (`ROTA=0`)
NIC	not relevant for local benchmarks

Software#

Component	Value
OS / kernel	Linux ace-core 5.15.0-163-generic x86_64
Kernel build	#173-Ubuntu SMP Tue Oct 14 17:51:00 UTC 2025
Python	Python 3.10.12
GCC	gcc 11.4.0
CMake	cmake 3.22.1
libsais	vendored (`third_party/libsais`)

Memory state at measurement time#

Metric	Value
Total RAM	125 GiB
Used RAM	39 GiB
Free RAM	1.5 GiB
Buff/cache	84 GiB
Available RAM	84 GiB
Swap	1.0 GiB total / 1.0 GiB used

Interpretation:

The system had a large active page cache during benchmark work.

Warm-query results should be interpreted as warm-cache measurements.

Page cache state#

Benchmark type	Cache state
cold query	first measured query in current benchmark process
warm query	index accessed at least once before measurement

Current benchmark scripts do not yet perform OS-level cache dropping.

Linux cache-drop procedure for future controlled cold measurements:

sync && echo 3 | sudo tee /proc/sys/vm/drop_caches
This requires root access and should be documented whenever used.
 
⸻
 
Why this matters
 
Benchmark numbers without a machine specification are not reproducible.
 
Common failure modes:
 
* comparing warm RAM numbers from a high-RAM server against a laptop
* comparing page-cache-warm measurements against cold storage measurements
* mixing operational wrapper latency with raw persistent backend latency
* omitting CPU, RAM, kernel, and Python version
 
GLYPH benchmark numbers must be interpreted together with this document.
 
## Cache residency note
 
HDFS 1GB FM artifact:
 
    bench_1gb/out/hdfs_1gb.fm.bin
    size: 8.1 GiB
 
CPU L3 cache:
 
    32 MiB
 
Conclusion:
 
    The full FM artifact is not L3-cache-resident.
 
Therefore stable warm-query p50/p95/p99 behavior should not be explained
as full-index cache residency.
 
More likely contributors:
 
- OS page cache residency
- mmap-backed access
- small number of touched pages per query
- simple persistent backend path
- low orchestration overhead in single-backend mode
 
## Perf observation: persistent FM backend
 
Command:
 
    perf stat -e page-faults,cache-misses,cache-references \
      python3 benchmarks/persistent_fm_v1.py \
      --fm bench_1gb/out/hdfs_1gb.fm.bin \
      --bwt bench_1gb/out/hdfs_1gb.bwt.bin \
      --queries-file bench_1gb/queries.txt \
      --warm-runs 3
 
Result:
 
    startup_ms:       ~3704 ms
    warm p50:         ~0.014 ms
    warm p99:         ~0.015 ms
    page-faults:      ~2.36M
    cache-misses:     ~52M
    cache-references: ~1.67B
 
Interpretation:
 
    perf stat covers the full benchmark process, including backend startup,
    mmap loading, and warm queries.
 
    Therefore page-faults mostly describe startup/load behavior, not
    individual warm-query behavior.
 
    Warm query latency remains stable after startup.
 
Current model:
 
    The full FM artifact is not L3-cache-resident.
    Stable warm behavior is likely caused by OS page cache residency,
    mmap-backed access, small touched working set per query, and a simple
    persistent backend path.