HDFS 1GB Benchmark — GLYPH v0.1
Dataset:
- HDFS.log truncated to 1GB
- source: Loghub / HDFS dataset
Query set:
- 100 unique
blk_*IDs extracted from the 1GB corpus
Baseline:
time while read -r q; do
grep -F "$q" bench_1gb/HDFS_1GB.log > /dev/null
done < bench_1gb/queries.txt
Result:
- grep -F repeated scan: 11.516 sec
GLYPH index:
- SA32u built
- BWT built
- FM-index built
- persistent backend: query_fm_server_v1
Index artifacts:
- corpus: 1.0GB
- SA32u: 4.0GB
- BWT: 1.0GB
- FM: 8.1GB
Persistent FM benchmark:
- queries: 100
- total_time_sec: 0.001673
- avg_ms_per_query: 0.016727
Interpretation:
GLYPH is not a replacement for grep on one-off ad-hoc scans.
GLYPH is designed for repeated exact queries over pre-indexed static corpora. In that mode, it avoids rescanning the corpus and uses a persistent FM-index backend.