fastpycsv¶
fastpycsv is a fast Python CSV toolkit backed by Vince’s CSV Parser. It is built
for ETL-style workflows where users need to scan large CSV files, filter rows,
extract a few columns, materialize NumPy arrays, or write cleaned CSV output
without dragging every workflow through a heavyweight DataFrame engine.
The package is named fastpycsv; it does not replace or shadow Python’s standard
library csv module.
For the native C++ API, see the Vince’s CSV Parser documentation.
Main API¶
Most users should start with these functions:
import fastpycsv

# Lazy, list-like row objects over the native parser.
for row in fastpycsv.reader("vehicles.csv"):
    if row["region"] == "el paso":
        print(row["price"])

# Materialized Python row objects when another API needs them.
reader = fastpycsv.reader("vehicles.csv")
list_rows = reader.lists(["id", "price"]).all()
tuple_rows = fastpycsv.reader("vehicles.csv").tuples(["id", "price"]).all()
dict_rows = fastpycsv.reader("vehicles.csv").dicts(["id", "price"]).all()

# Column-oriented NumPy arrays for pandas/scientific workflows.
arrays = fastpycsv.read_numpy("vehicles.csv", columns=["price", "year", "odometer"])

# Bounded-memory NumPy batches for large files.
for batch in fastpycsv.read_numpy_batches("vehicles.csv", columns=["price", "year"]):
    consume(batch)

# Native CSV output from lazy rows or ordinary Python iterables.
fastpycsv.write_csv(
    "subset.csv",
    (row for row in fastpycsv.reader("vehicles.csv") if row["region"] == "el paso"),
    fieldnames=["id", "price", "year", "region"],
)
Why fastpycsv?¶
Fast lazy rows: iterate massive CSV files without eagerly building Python lists or dictionaries for every row.
Pythonic filtering: ordinary Python predicates work directly on lazy rows; native predicates are available for common comparisons.
NumPy export: selected columns can be materialized into NumPy arrays without routing through pandas or pyarrow first.
Streaming output: write_csv() lets read/filter/project/write pipelines stay bounded-memory.
Robust CSV parsing: embedded newlines and quoted fields are handled by the same parser core as the C++ library.
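The bounded-memory claim rests on a simple pattern: only one batch of rows is ever held in memory at a time. As a rough illustration of that batching pattern (a sketch, not the library's implementation, which yields NumPy arrays rather than lists), any row iterator can be chunked with `itertools.islice`:

```python
from itertools import islice

def iter_batches(rows, batch_size=50000):
    """Yield lists of up to batch_size rows from any row iterator.

    Illustrative sketch of the bounded-memory batching pattern behind
    read_numpy_batches(): the source is consumed lazily, and only the
    current batch is materialized.
    """
    it = iter(rows)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch
```

The default of 50000 mirrors the `batch_size` parameter in the `read_numpy_batches()` signature.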
- Quickstart
- API Reference
  - Primary API
    - fastpycsv.reader(csvfile, dialect="excel", **fmtparams)
    - Lazy Row Objects
    - Materialized Row Iterators
    - fastpycsv.read_numpy(path, columns=None, *, cast=True, predicate=None, **fmtparams)
    - fastpycsv.read_numpy_batches(path, columns=None, *, predicate=None, cast=True, batch_size=50000, schema="sample", **fmtparams)
    - fastpycsv.write_csv(csvfile, rows, **options)
  - Low-Level Types
- NumPy and pandas
- Type Casting
- Benchmarks
Build These Docs¶
From the repository root:
python -m pip install -r python/docs/requirements.txt
python -m sphinx -b html python/docs docs/html/python
The generated HTML lands in docs/html/python, matching the GitHub Pages
layout used by the documentation workflow.