# NumPy and pandas

`fastpycsv` has a native column export path for workflows that need arrays rather than row objects. This is the right API when the next step is pandas, NumPy, scientific code, or a model input pipeline.

Use `fastpycsv.read_numpy(path, columns=None, *, cast=True, predicate=None)` for eager column arrays:

```python
import fastpycsv
import pandas as pd

arrays = fastpycsv.read_numpy("data.csv")
frame = pd.DataFrame(arrays)
```

`read_numpy()` returns a dictionary keyed by column name. Column behavior:

- String columns use NumPy 2.x `StringDType`.
- Non-null integer, float, and boolean columns use dense NumPy arrays.
- Nullable numeric and boolean columns widen to `float64` with `NaN`.
- Object arrays are intentionally avoided.

Selected-column reads keep the Python handoff smaller:

```python
arrays = fastpycsv.read_numpy("vehicles.csv", columns=["price", "year", "odometer"])
```

Native predicates can filter rows before the arrays are materialized:

```python
predicate = fastpycsv.all_of(
    fastpycsv.equal("region", "el paso", case_sensitive=False),
    fastpycsv.less("price", 10_000),
)
arrays = fastpycsv.read_numpy(
    "vehicles.csv",
    columns=["price", "year", "odometer"],
    predicate=predicate,
)
```

Use `fastpycsv.read_numpy_batches()` for streaming dictionaries of NumPy arrays:

```python
for arrays in fastpycsv.read_numpy_batches(
    "vehicles.csv",
    columns=["price", "year"],
    schema="sample",
):
    consume(arrays)
```

Batch schema modes trade dtype stability against streaming cost:

- `schema="sample"` is the default. It infers dtypes from the first bounded batch and then streams once with that schema.
- `schema="global"` pre-scans the file to keep inferred dtypes stable across all batches, matching `read_numpy()` behavior.
- `schema="batch"` infers each emitted batch independently for true one-pass, bounded-memory streaming.

`cast=False` returns string-only batches and skips schema inference. Explicit `dtypes={column: dtype}` overrides are a planned follow-up.
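The `float64` widening rule above can be illustrated with plain NumPy and pandas, without `fastpycsv` installed. The dictionary here is a hand-built stand-in for a `read_numpy()` result; the column names are illustrative only:

```python
import numpy as np
import pandas as pd

# Hand-built stand-in for a read_numpy() result: a dict of column arrays.
# A column containing a missing integer widens to float64 with NaN,
# because dense NumPy integer arrays have no null representation.
arrays = {
    "price": np.array([9500.0, np.nan, 7200.0], dtype=np.float64),  # nullable -> float64
    "year": np.array([2014, 2018, 2011], dtype=np.int64),           # non-null -> dense int64
}

frame = pd.DataFrame(arrays)
print(frame.dtypes["price"])         # float64
print(frame["price"].isna().sum())   # 1 missing value survives as NaN
```

Widening keeps the handoff object-array-free: pandas accepts the `float64` column directly, and the missing value round-trips as `NaN` rather than `None`.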
The native export path batches rows through `DataFrame` column views and `DataFrameExecutor`. Remaining fixed costs are usually NumPy string-array construction for string-heavy data and pandas materialization after the arrays have been built.
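Downstream code that streams batches and still wants whole-file columns typically concatenates per-batch arrays. A minimal sketch with hand-built batches standing in for what `read_numpy_batches()` would yield under a stable schema (`fastpycsv` not required):

```python
import numpy as np

# Hand-built stand-ins for two streamed batches, each a dict of column arrays
# with consistent dtypes, as a stable-schema batch stream would produce.
batches = [
    {"price": np.array([9500.0, 7200.0]), "year": np.array([2014, 2011])},
    {"price": np.array([8100.0]), "year": np.array([2018])},
]

# Concatenate column-wise to recover full-file arrays,
# matching the shape of an eager read_numpy() result.
columns = {
    name: np.concatenate([batch[name] for batch in batches])
    for name in batches[0]
}
print(columns["price"])  # [9500. 7200. 8100.]
```

This pattern only works when dtypes are stable across batches; with per-batch inference, mixed dtypes would force a cast (or an object array) at concatenation time, which is the trade-off the schema modes above exist to manage.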