# Benchmarks

The Python benchmark helpers live under `python/benchmarks/`.

Compare reader throughput:

```powershell
python python/benchmarks/compare_readers.py path/to/input.csv
```

Compare the streaming EDA example against pandas with the pyarrow CSV engine:

```powershell
python python/benchmarks/compare_eda.py path/to/input.csv
```

Compare a streaming `fastpycsv` filter against pyarrow on a Used Cars-style `region == "el paso"` workload:

```powershell
python python/benchmarks/compare_filter.py path/to/vehicles.csv
```

Compare selected-column CSV-to-NumPy materialization against pyarrow:

```powershell
python python/benchmarks/compare_numpy_materialization.py path/to/vehicles.csv
```

Compare Python object materialization against pyarrow and Polars:

```powershell
python python/benchmarks/compare_python_materialization.py path/to/vehicles.csv
```

This compares materialization of the full CSV and of a first+last-column subset, for both row-oriented `list[dict]` outputs and column-oriented `dict[str, list]` outputs.

The NumPy materialization benchmark includes pyarrow's direct CSV reader, pyarrow's Dataset/Scanner path, pyarrow's streaming CSV reader, and Polars lazy scanning. The Dataset row is included because it is pyarrow's closest API for parallel predicate/projection work, but it is also more configuration-sensitive than `pyarrow.csv.read_csv()`: without explicit column types, pyarrow Dataset may infer a narrow integer type from early rows and later fail on wider values. The benchmark pins obvious numeric columns such as `price`, `year`, and `odometer` so that it measures CSV/filter throughput rather than that crash path.

The helpers run each workload in a fresh Python process and report wall time, throughput, and peak resident-set or (on Windows) working-set memory.

Use the same Python version that built `fastpycsv`; for example, a `cp310` extension will not import under Python 3.14.
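For orientation, a few hedged sketches follow. They illustrate the workloads and the measurement approach; they are not the benchmark scripts' actual code, and any paths, types, or helper names they introduce are placeholders. First, the pyarrow reference side of the `region == "el paso"` filter workload looks roughly like this (the `fastpycsv` streaming side depends on its own API and is not shown):

```python
# Sketch of the pyarrow reference side of the filter workload.
# The path is a placeholder; the benchmark script's wiring may differ.
import pyarrow.dataset as ds

dataset = ds.dataset("path/to/vehicles.csv", format="csv")

# Push the predicate down into the scan rather than filtering in Python.
matches = dataset.to_table(filter=ds.field("region") == "el paso")
print(matches.num_rows)
```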
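The two output shapes compared by the Python materialization benchmark correspond, on the pyarrow baseline, to `Table.to_pylist()` and `Table.to_pydict()`; a minimal sketch (the `fastpycsv` equivalents are not shown here):

```python
# Sketch of the row- and column-oriented output shapes on the pyarrow baseline.
import pyarrow.csv as pacsv

table = pacsv.read_csv("path/to/vehicles.csv")

rows = table.to_pylist()     # row-oriented: list[dict], one dict per row
columns = table.to_pydict()  # column-oriented: dict[str, list], one list per column
```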
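The column-type pinning described above looks roughly like the following with pyarrow's Dataset API. The exact types and option plumbing here are assumptions; treat this as a sketch of the idea, not the benchmark's code:

```python
# Sketch: pin numeric column types so Dataset scanning cannot infer a
# narrow integer type from early rows and then fail on wider values later.
import pyarrow as pa
import pyarrow.csv as pacsv
import pyarrow.dataset as ds

convert = pacsv.ConvertOptions(
    column_types={
        "price": pa.int64(),
        "year": pa.int64(),
        "odometer": pa.float64(),  # assumed types; the benchmark's choices may differ
    }
)
csv_format = ds.CsvFileFormat(convert_options=convert)
dataset = ds.dataset("path/to/vehicles.csv", format=csv_format)
table = dataset.to_table(columns=["price", "year", "odometer"])
```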
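The fresh-process measurement can be approximated by spawning a child interpreter and polling it with `psutil`. This is a sketch of the idea only: polling can miss short-lived memory peaks, `psutil` is a third-party dependency, and the actual helpers may sample OS counters differently. The `run_fresh` helper and the workload snippet are hypothetical:

```python
# Sketch: time a workload in a fresh interpreter and approximate peak memory
# by polling. Polling can undercount brief peaks; the real helpers may differ.
import sys
import time

import psutil  # third-party: pip install psutil


def run_fresh(snippet: str) -> tuple[float, int]:
    """Run `snippet` in a fresh Python process; return (wall seconds, peak RSS bytes)."""
    start = time.perf_counter()
    proc = psutil.Popen([sys.executable, "-c", snippet])
    peak = 0
    while proc.poll() is None:  # still running: sample its resident set
        try:
            peak = max(peak, proc.memory_info().rss)
        except psutil.NoSuchProcess:
            break  # process exited between poll() and memory_info()
        time.sleep(0.01)
    return time.perf_counter() - start, peak


elapsed, peak = run_fresh("import fastpycsv")  # hypothetical workload snippet
print(f"{elapsed:.3f}s, peak {peak / 2**20:.1f} MiB")
```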
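A quick way to confirm the running interpreter matches the wheel's `cp` tag:

```python
# Print the running interpreter's cp tag; it must match the built wheel
# (e.g. cp310 pairs with Python 3.10).
import sys
print(f"cp{sys.version_info.major}{sys.version_info.minor}")
```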