Vince's CSV Parser
Main class for parsing CSVs from files and in-memory sources.
#include <csv_reader.hpp>
Classes | |
| class | iterator |
| An input iterator capable of handling large files. | |
Public Member Functions | |
| CSVReader (const CSVReader &)=delete | |
| Not copyable. | |
| CSVReader & | operator= (const CSVReader &)=delete |
| Not copyable. | |
| CSVReader (CSVReader &&other) noexcept | |
| Move constructor. | |
| CSVReader & | operator= (CSVReader &&other) noexcept |
| Move assignment. | |
Constructors | |
Constructors for iterating over large files and parsing in-memory sources. | |
| CSVReader (csv::string_view filename, const CSVFormat &format=CSVFormat::guess_csv()) | |
| Construct CSVReader from filename. | |
| template<typename TStream , csv::enable_if_t< std::is_base_of< std::istream, TStream >::value, int > = 0> | |
| CSVReader (TStream &source, CSVFormat format=CSVFormat::guess_csv()) | |
| Construct CSVReader from std::istream. | |
| CSVReader (std::unique_ptr< std::istream > source, const CSVFormat &format=CSVFormat::guess_csv()) | |
| Construct CSVReader from an owned std::istream. | |
Retrieving CSV Rows | |
| bool | read_row (CSVRow &row) |
| Retrieve the next CSV row, returning true while more rows are available. | |
| bool | read_chunk (std::vector< CSVRow > &out, size_t max_rows) |
| Read up to max_rows rows into a caller-owned batch buffer. | |
| iterator | begin () |
| Return an iterator to the first row in the reader. | |
| CSV_CONST iterator | end () const noexcept |
| A placeholder for the imaginary past-the-end row in a CSV. | |
| bool | eof () const noexcept |
| Returns true if we have reached end of file. | |
CSV Metadata | |
| CSVFormat | get_format () const |
| Return the resolved parsing format for this CSV source. | |
| const std::vector< std::string > & | get_col_names () const |
| Return the active column names in CSV order. | |
| internals::ConstColNamesPtr | col_names_ptr () const noexcept |
| Internal accessor for preserving resolved column-name lookup policy across helper types. | |
| int | index_of (csv::string_view col_name) const |
| Return the index of col_name, or csv::CSV_NOT_FOUND if absent. | |
CSV Metadata: Attributes | |
| CONSTEXPR bool | empty () const noexcept |
| Returns true if the file or stream contains no valid CSV rows (the header is not counted). | |
| CONSTEXPR size_t | n_rows () const noexcept |
| Retrieves the number of rows that have been read so far. | |
| bool | utf8_bom () const noexcept |
| Whether the CSV was prefixed with a UTF-8 byte order mark (BOM). | |
| internals::SpeculativeParseDiagnostics | speculative_diagnostics () const noexcept |
| Return speculative-parsing counters for filename-backed readers. | |
| size_t | parse_worker_count () const noexcept |
| Return the number of parser worker threads used by the active parser. | |
Protected Member Functions | |
| void | set_col_names (const std::vector< std::string > &) |
| Sets this reader's column names and associated data. | |
Worker Reading Functions | |
Functions that actively drive parser execution and produce rows. | |
| bool | read_csv (size_t bytes=internals::CSV_CHUNK_SIZE_DEFAULT) |
| Read a chunk of CSV data. | |
Protected Attributes | |
CSV Settings | |
| CSVFormat | _format |
Parser State | |
| internals::ColNamesPtr | col_names = std::make_shared<internals::ColNames>() |
| Pointer to an object containing column information. | |
| std::unique_ptr< internals::parser::CSVParserDriverBase > | parser = nullptr |
| Helper class which actually does the parsing. | |
| std::unique_ptr< RowCollection > | records {new RowCollection(100)} |
| Queue of parsed CSV rows. | |
| std::unique_ptr< std::istream > | owned_stream = nullptr |
| Optional owned stream, used by two paths: (1) the Emscripten filename-constructor fallback to stream parsing, and (2) the opt-in ownership constructor taking std::unique_ptr<std::istream>. | |
| size_t | n_cols = 0 |
| The number of columns in this CSV. | |
| size_t | _n_rows = 0 |
| How many rows (minus header) have been read so far. | |
Main class for parsing CSVs from files and in-memory sources.
The length of every row is checked against the number of column names for consistency.
Streaming semantics: CSVReader is a single-pass streaming reader. Every read operation, whether read_row() or the iterator interface, permanently removes rows from the internal queue. Rows consumed by one interface are not visible to another. There is no rewind or seek.
Ownership and sharing: CSVReader is non-copyable and move-enabled. It manages live parsing state (worker thread, internal queue, and optional owned stream), so ownership transfer should be explicit. To share or transfer a reader, wrap it in a std::unique_ptr<CSVReader>.
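The ownership rule above can be sketched as follows. This is a minimal illustration under stated assumptions, not library code: FakeReader is a hypothetical stand-in that copies only the documented non-copyable, movable, single-pass shape of CSVReader, and open_reader and drain are illustrative names. With the real class, the same pattern would presumably apply to a std::unique_ptr<CSVReader>.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <memory>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for csv::CSVReader (NOT the library class):
// non-copyable, movable, and single-pass.
class FakeReader {
public:
    explicit FakeReader(std::deque<std::string> rows) : rows_(std::move(rows)) {}
    FakeReader(const FakeReader&) = delete;
    FakeReader& operator=(const FakeReader&) = delete;
    FakeReader(FakeReader&&) = default;
    FakeReader& operator=(FakeReader&&) = default;

    // Mirrors the documented read_row() contract: true while rows remain.
    bool read_row(std::string& row) {
        if (rows_.empty()) return false;
        row = std::move(rows_.front());
        rows_.pop_front();  // consumed rows are gone for good
        return true;
    }

private:
    std::deque<std::string> rows_;
};

static_assert(!std::is_copy_constructible<FakeReader>::value,
              "the stand-in is intentionally non-copyable");

// Factory: the caller receives sole ownership of the reader.
std::unique_ptr<FakeReader> open_reader() {
    std::deque<std::string> rows{"a,b", "1,2", "3,4"};
    return std::unique_ptr<FakeReader>(new FakeReader(std::move(rows)));
}

// Sink: taking the pointer by value makes the ownership transfer explicit.
std::size_t drain(std::unique_ptr<FakeReader> reader) {
    assert(reader != nullptr);
    std::size_t count = 0;
    std::string row;
    while (reader->read_row(row)) ++count;
    return count;
}
```

A caller would write drain(std::move(reader)); the moved-from pointer is then null, so accidental reuse of a handed-off reader is easy to detect.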
Definition at line 49 of file csv_reader.hpp.
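| csv::CSVReader::CSVReader | ( | csv::string_view | filename, |
| const CSVFormat & | format = CSVFormat::guess_csv() | ||
| ) |
inline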
Construct CSVReader from filename.
Native builds use CODE PATH 1 of 2: MmapParser with mio for maximum performance. Emscripten builds fall back to the stream-based implementation because mmap is unavailable.
During construction, parser installation performs an initial synchronous metadata read so delimiter and header information are resolved before user reads begin.
Definition at line 122 of file csv_reader.hpp.
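template<typename TStream , csv::enable_if_t< std::is_base_of< std::istream, TStream >::value, int > = 0>
| csv::CSVReader::CSVReader | ( | TStream & | source, |
| CSVFormat | format = CSVFormat::guess_csv() | ||
| ) |
inline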
Construct CSVReader from std::istream.
Uses StreamParser. On native builds this is CODE PATH 2 of 2 and remains independent from the filename-based mmap path. On Emscripten, the filename constructor also funnels through this implementation.
Template Parameters
| TStream | An input stream type derived from std::istream |
Definition at line 156 of file csv_reader.hpp.
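| csv::CSVReader::CSVReader | ( | std::unique_ptr< std::istream > | source, |
| const CSVFormat & | format = CSVFormat::guess_csv() | ||
| ) |
inline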
Construct CSVReader from an owned std::istream.
This is an opt-in safety switch for stream lifetime management. CSVReader takes ownership and guarantees the stream outlives parsing.
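The lifetime guarantee described above can be illustrated with a small sketch. OwningLineSource, next_line, and make_source are hypothetical names invented for this example; the class is a stand-in modelled on the documented std::unique_ptr<std::istream> constructor, not the library's implementation. It shows why ownership matters: the stream is created inside a factory function, yet remains valid after the factory returns.

```cpp
#include <cassert>
#include <istream>
#include <memory>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical stand-in for a reader that owns its input stream.
class OwningLineSource {
public:
    explicit OwningLineSource(std::unique_ptr<std::istream> source)
        : source_(std::move(source)) {
        assert(source_ != nullptr);
    }

    // Returns true and fills `line` while input remains.
    bool next_line(std::string& line) {
        return static_cast<bool>(std::getline(*source_, line));
    }

private:
    std::unique_ptr<std::istream> source_;  // kept alive as long as the reader
};

// The stream is created here, but it outlives this function because
// ownership moves into the returned object.
OwningLineSource make_source(const std::string& text) {
    std::unique_ptr<std::istream> stream(new std::istringstream(text));
    return OwningLineSource(std::move(stream));
}
```

Had the stand-in stored a reference to a stream local to make_source instead, that reference would dangle as soon as the factory returned.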
Definition at line 167 of file csv_reader.hpp.
| csv::CSVReader::CSVReader | ( | CSVReader && | other | ) |
inline noexcept
Move constructor.
Required so C++11 builds can return CSVReader by value from helpers like csv::parse()/csvparse_unsafe(), where copy elision is not guaranteed.
Any active read scheduler on the source is joined before moving parser state to avoid a thread continuing to run against the source object's address.
Definition at line 191 of file csv_reader.hpp.
|
inline |
Definition at line 213 of file csv_reader.hpp.
| CSVReader::iterator csv::CSVReader::begin | ( | ) |
Return an iterator to the first row in the reader.
Definition at line 9 of file csv_reader_iterator.cpp.
| internals::ConstColNamesPtr csv::CSVReader::col_names_ptr | ( | ) | const
inline noexcept
Internal accessor for preserving resolved column-name lookup policy across helper types.
Definition at line 284 of file csv_reader.hpp.
| CONSTEXPR bool csv::CSVReader::empty | ( | ) | const
inline noexcept
Returns true if the file or stream contains no valid CSV rows (the header is not counted).
Definition at line 302 of file csv_reader.hpp.
| CSV_CONST CSVReader::iterator csv::CSVReader::end | ( | ) | const
noexcept
A placeholder for the imaginary past-the-end row in a CSV.
Dereferencing this iterator is undefined behavior.
Definition at line 20 of file csv_reader_iterator.cpp.
| bool csv::CSVReader::eof | ( | ) | const
inline noexcept
Returns true if we have reached end of file.
Definition at line 265 of file csv_reader.hpp.
| const std::vector< std::string > & csv::CSVReader::get_col_names | ( | ) | const
inline
Return the active column names in CSV order.
Definition at line 278 of file csv_reader.hpp.
| CSVFormat csv::CSVReader::get_format | ( | ) | const |
Return the resolved parsing format for this CSV source.
The returned format reflects delimiter/header inference and the active column names after construction.
Definition at line 62 of file csv_reader.cpp.
| int csv::CSVReader::index_of | ( | csv::string_view | col_name | ) | const
inline
Return the index of col_name, or csv::CSV_NOT_FOUND if absent.
Definition at line 289 of file csv_reader.hpp.
| CONSTEXPR size_t csv::CSVReader::n_rows | ( | ) | const
inline noexcept
Retrieves the number of rows that have been read so far.
Definition at line 305 of file csv_reader.hpp.
| CSVReader & csv::CSVReader::operator= | ( | CSVReader && | other | ) |
inline noexcept
Move assignment.
Joins active workers on both sides before transferring parser state.
Definition at line 201 of file csv_reader.hpp.
| size_t csv::CSVReader::parse_worker_count | ( | ) | const
inline noexcept
Return the number of parser worker threads used by the active parser.
Definition at line 317 of file csv_reader.hpp.
| bool csv::CSVReader::read_chunk | ( | std::vector< CSVRow > & | out, |
| size_t | max_rows | ||
| ) |
Read up to max_rows rows into a caller-owned batch buffer.
This is the easiest way to process a CSV in bounded batches without materializing the entire file. Each call clears out, then appends up to max_rows newly parsed rows in stream order.
Parameters
| [out] | out | Destination batch buffer. Existing contents are discarded. |
| [in] | max_rows | Maximum number of rows to place into out. |
Returns
true if this call produced any rows, including a final partial chunk; false only after end-of-stream is reached and no rows were produced.
std::vector<CSVRow> is intentionally the only supported container: it matches the public batch-consumption pattern better than the internal deque-based producer/consumer queue.
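The bounded-batch pattern described above can be sketched as a generic loop. The helper batch_sizes and the stand-in FakeChunkReader are hypothetical names introduced for illustration; only the read_chunk signature and its clear-then-fill and return-value rules are taken from this page. The loop is written as a template so that, in principle, it accepts any reader exposing that signature.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <utility>
#include <vector>

// Generic batch loop for any reader exposing
//   bool read_chunk(std::vector<Row>& out, std::size_t max_rows)
// Returns the size of each batch, in the order the batches were produced.
template <typename Row, typename Reader>
std::vector<std::size_t> batch_sizes(Reader& reader, std::size_t max_rows) {
    assert(max_rows > 0);
    std::vector<Row> batch;  // reused across calls; read_chunk clears it
    std::vector<std::size_t> sizes;
    while (reader.read_chunk(batch, max_rows)) {
        sizes.push_back(batch.size());  // process `batch` here
    }
    return sizes;
}

// Hypothetical stand-in reader (NOT csv::CSVReader), used only to
// exercise the loop with the documented semantics.
struct FakeChunkReader {
    std::deque<std::string> pending;

    bool read_chunk(std::vector<std::string>& out, std::size_t max_rows) {
        out.clear();  // existing contents are discarded
        while (out.size() < max_rows && !pending.empty()) {
            out.push_back(std::move(pending.front()));
            pending.pop_front();
        }
        return !out.empty();  // false only when nothing was produced
    }
};
```

With five pending rows and max_rows set to 2, the stand-in yields batches of 2, 2, and 1 rows before returning false, which matches the rule that a final partial chunk still returns true.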
Definition at line 211 of file csv_reader.cpp.
| bool csv::CSVReader::read_row | ( | CSVRow & | row | ) |
Retrieve the next CSV row, returning true while more rows are available.
This is the lowest-level row-consumption API on CSVReader. Each successful call overwrites row with the next parsed record in stream order.
Returns
true if a row was produced, false after end-of-stream is reached.
Input is read in chunks of csv::internals::CSV_CHUNK_SIZE_DEFAULT bytes at a time by default.
See also CSVRow and CSVField.
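A row-at-a-time loop over read_row() might look like the sketch below. The helper for_each_row and the stand-in FakeRowReader are hypothetical names introduced for illustration, and the stand-in treats each input line as one row without doing any real CSV parsing. Only the read_row contract (overwrite the row, return true while rows are produced) comes from this page.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Generic loop for any reader exposing  bool read_row(Row& row).
// Calls `visit` once per row and returns the number of rows seen.
template <typename Row, typename Reader, typename Visit>
std::size_t for_each_row(Reader& reader, Visit visit) {
    Row row;  // reused: each successful call overwrites it
    std::size_t count = 0;
    while (reader.read_row(row)) {
        visit(row);
        ++count;
    }
    return count;
}

// Hypothetical stand-in reader (NOT csv::CSVReader): one line = one row.
class FakeRowReader {
public:
    explicit FakeRowReader(const std::string& text) : input_(text) {}

    bool read_row(std::string& row) {
        return static_cast<bool>(std::getline(input_, row));
    }

private:
    std::istringstream input_;
};
```

Because the row object is reused, a caller that needs to keep a row beyond the current iteration should copy it inside the visitor.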
Definition at line 199 of file csv_reader.cpp.
| internals::SpeculativeParseDiagnostics csv::CSVReader::speculative_diagnostics | ( | ) | const
inline noexcept
Return speculative-parsing counters for filename-backed readers.
Definition at line 311 of file csv_reader.hpp.
| bool csv::CSVReader::utf8_bom | ( | ) | const
inline noexcept
Whether the CSV was prefixed with a UTF-8 byte order mark (BOM).
Definition at line 308 of file csv_reader.hpp.