Quickstart

fastpycsv is built around four operations:

  • reader() for lazy row iteration

  • read_numpy() for eager selected-column NumPy export

  • read_numpy_batches() for bounded-memory NumPy export

  • write_csv() for streaming CSV output

Read Rows

fastpycsv.reader() returns lazy, list-like row objects. By default, the first row is consumed as column names, so ordinary ETL code can use string indexing without building a dictionary for every row.

import fastpycsv

for row in fastpycsv.reader("vehicles.csv"):
    if row["region"] == "el paso":
        print(row["price"])

Rows can be indexed by position or column name:

row = next(fastpycsv.reader("vehicles.csv"))

row[0]          # first field, by position
row["price"]    # a field by column name
len(row)        # number of fields in the row

Use explicit materialization only when the downstream API needs normal Python objects:

reader = fastpycsv.reader("vehicles.csv")

rows = reader.lists(["id", "price", "year"]).all()

For bounded memory, stream materialized batches:

for rows in fastpycsv.reader("vehicles.csv").dicts(["id", "price"]).chunks(50_000):
    send_to_api(rows)
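The .chunks(n) call above is the standard bounded-memory batching idiom. As a stdlib-only sketch of the same pattern (batched here is a hypothetical helper, not part of fastpycsv):

```python
from itertools import islice

def batched(iterable, size):
    """Yield successive lists of at most `size` items.

    Illustration of the batching pattern only; fastpycsv does this
    natively via .chunks(n).
    """
    it = iter(iterable)
    while batch := list(islice(it, size)):
        yield batch

list(batched(range(5), 2))  # [[0, 1], [2, 3], [4]]
```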

Export NumPy Arrays

Use read_numpy() when the target is pandas, NumPy, or another column-oriented consumer:

import pandas as pd

arrays = fastpycsv.read_numpy("vehicles.csv", columns=["price", "year", "odometer"])
frame = pd.DataFrame(arrays)

Use read_numpy_batches() when the file is large enough that peak memory matters:

for arrays in fastpycsv.read_numpy_batches(
    "vehicles.csv",
    columns=["price", "year", "odometer"],
    schema="sample",
):
    process(arrays)
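Each batch is dict-like, mapping column names to NumPy arrays (the same shape read_numpy() feeds to pd.DataFrame above). If whole columns are needed afterwards, batches can be stitched together with np.concatenate; a sketch using stand-in dicts in place of real batches:

```python
import numpy as np

# Stand-ins shaped like batches from read_numpy_batches():
batches = [
    {"price": np.array([9000, 8500]), "year": np.array([2012, 2015])},
    {"price": np.array([7200]), "year": np.array([2018])},
]

# Concatenate per column to recover full arrays.
columns = {
    name: np.concatenate([batch[name] for batch in batches])
    for name in batches[0]
}
```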

Filter With Native Predicates

Python predicates are fine for flexible business logic. For common comparisons, native predicates avoid invoking a Python callback for every row:

predicate = fastpycsv.all_of(
    fastpycsv.equal("manufacturer", "ford", case_sensitive=False),
    fastpycsv.less("price", 10_000),
)

arrays = fastpycsv.read_numpy(
    "vehicles.csv",
    columns=["region", "price", "year", "odometer"],
    predicate=predicate,
)
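For intuition, the combined predicate above matches rows the way this pure-Python check does (an illustration of the described semantics on a dict-shaped row, not fastpycsv's native implementation):

```python
def matches(row):
    # Mirrors all_of(equal("manufacturer", "ford", case_sensitive=False),
    #                less("price", 10_000)).
    return (
        row["manufacturer"].lower() == "ford"  # case-insensitive equal()
        and float(row["price"]) < 10_000       # numeric less()
    )

matches({"manufacturer": "Ford", "price": "9000"})   # True
matches({"manufacturer": "Ford", "price": "12000"})  # False
```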

Chaining reader.filter(...) combines native predicates with all_of() by default. Use append=False when a later filter should replace the earlier one.
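The append semantics can be sketched in plain Python (combine is a hypothetical helper, shown only to illustrate how append=True versus append=False is described to behave):

```python
def combine(current, new, append=True):
    """append=True ANDs the predicates (all_of-style); append=False replaces."""
    if current is None or not append:
        return new
    return lambda row: current(row) and new(row)

cheap = lambda row: row["price"] < 10_000
ford = lambda row: row["manufacturer"] == "ford"

both = combine(cheap, ford)                      # price AND manufacturer
only_ford = combine(cheap, ford, append=False)   # manufacturer check alone
```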

Write CSV Output

write_csv() accepts a destination (a path or a text file-like object) and an iterable of rows: lazy reader rows, dictionaries, lists, tuples, or other Python iterables. Fields are stringified before writing; None becomes an empty CSV field.

reader = fastpycsv.reader("vehicles.csv")

fastpycsv.write_csv(
    "cheap_el_paso_fords.csv",
    (row for row in reader if row["region"] == "el paso" and row["manufacturer"] == "ford"),
    fieldnames=["id", "price", "year", "region"],
)

with open("cheap_el_paso_fords.csv", "w", newline="", encoding="utf-8") as out:
    fastpycsv.write_csv(out, [["id", "price"], [1, 9000]], write_header=False)
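The stringification rule can be demonstrated with the stdlib csv module (this mirrors the behavior described above; it is not how fastpycsv is implemented):

```python
import csv
import io

row = [1, None, "el paso"]
buf = io.StringIO()
# None becomes an empty field; everything else is stringified.
csv.writer(buf).writerow("" if field is None else str(field) for field in row)
buf.getvalue()  # '1,,el paso\r\n'
```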

Installation And Local Builds

Install from the repository root while developing:

python -m pip install -e .

Or build the native extension directly:

cmake -S . -B build/fastpycsv -DBUILD_PYTHON=ON -DCMAKE_BUILD_TYPE=Release
cmake --build build/fastpycsv --target fastpycsv --config Release

For an existing top-level build, the fastpycsv target bootstraps its own Python build tree:

cmake --build build/x64-Release --target fastpycsv --config Release