HDFS Benchmark — GLYPH v0.1

Dataset:

  • HDFS.log
  • size: 1.5GB
  • source: Loghub / HDFS dataset

Query set:

  • 100 unique blk_* IDs extracted from the dataset

Baseline:

time while read -r q; do
  grep -F "$q" HDFS.log > /dev/null
done < hdfs_queries.txt

Result:

  • grep -F loop: 14.385 sec

GLYPH index:

  • SA32u built
  • BWT built
  • FM-index built
  • persistent backend: query_fm_server_v1

Persistent FM benchmark:

  • queries: 100
  • total_time_sec: 0.001526
  • avg_ms_per_query: 0.015261

Interpretation:

GLYPH is not faster than grep for a single ad-hoc scan.

GLYPH is designed for repeated exact queries over pre-indexed static data. In that mode, it avoids rescanning the corpus and uses the FM-index backend directly.