Vince's CSV Parser
Main class for parsing CSVs from files and in-memory sources.
#include <csv_reader.hpp>
Classes | |
| class | iterator |
| An input iterator capable of handling large files. | |
Public Member Functions | |
| CSVReader (const CSVReader &)=delete | |
| Not copyable. | |
| CSVReader & | operator= (const CSVReader &)=delete |
| Not copyable. | |
| CSVReader (CSVReader &&other) noexcept | |
| Move constructor. | |
| CSVReader & | operator= (CSVReader &&other) noexcept |
| Move assignment. | |
Constructors | |
Constructors for iterating over large files and parsing in-memory sources. | |
| CSVReader (csv::string_view filename, const CSVFormat &format=CSVFormat::guess_csv()) | |
| Construct CSVReader from filename. | |
| template<typename TStream , csv::enable_if_t< std::is_base_of< std::istream, TStream >::value, int > = 0> | |
| CSVReader (TStream &source, CSVFormat format=CSVFormat::guess_csv()) | |
| Construct CSVReader from std::istream. | |
| CSVReader (std::unique_ptr< std::istream > source, const CSVFormat &format=CSVFormat::guess_csv()) | |
| Construct CSVReader from an owned std::istream. | |
Retrieving CSV Rows | |
| bool | read_row (CSVRow &row) |
| Retrieve rows as CSVRow objects, returning true if more rows are available. | |
| iterator | begin () |
| Return an iterator to the first row in the reader. | |
| CSV_CONST iterator | end () const noexcept |
| A placeholder for the imaginary past-the-end row in a CSV. | |
| bool | eof () const noexcept |
| Returns true if we have reached end of file. | |
CSV Metadata | |
| CSVFormat | get_format () const |
| Return the format of the original raw CSV. | |
| std::vector< std::string > | get_col_names () const |
| Return the CSV's column names as a vector of strings. | |
| int | index_of (csv::string_view col_name) const |
| Return the index of the column name if found or csv::CSV_NOT_FOUND otherwise. | |
CSV Metadata: Attributes | |
| CONSTEXPR bool | empty () const noexcept |
| Whether or not the file or stream contains valid CSV rows, not including the header. | |
| CONSTEXPR size_t | n_rows () const noexcept |
| Retrieves the number of rows that have been read so far. | |
| bool | utf8_bom () const noexcept |
| Whether or not the CSV was prefixed with a UTF-8 BOM. | |
Protected Member Functions | |
| void | set_col_names (const std::vector< std::string > &) |
| Sets this reader's column names and associated data. | |
Multi-Threaded File Reading Functions | |
| bool | read_csv (size_t bytes=internals::CSV_CHUNK_SIZE_DEFAULT) |
| Read a chunk of CSV data. | |
Protected Attributes | |
CSV Settings | |
| CSVFormat | _format |
Parser State | |
| internals::ColNamesPtr | col_names = std::make_shared<internals::ColNames>() |
| Pointer to an object containing column information. | |
| std::unique_ptr< internals::IBasicCSVParser > | parser = nullptr |
| Helper class which actually does the parsing. | |
| std::unique_ptr< RowCollection > | records {new RowCollection(100)} |
| Queue of parsed CSV rows. | |
| std::unique_ptr< std::istream > | owned_stream = nullptr |
| Optional owned stream used by two paths: (1) the Emscripten filename-constructor fallback to stream parsing, and (2) the opt-in ownership constructor taking std::unique_ptr<std::istream>. | |
| size_t | n_cols = 0 |
| The number of columns in this CSV. | |
| size_t | _n_rows = 0 |
| How many rows (minus header) have been read so far. | |
Main class for parsing CSVs from files and in-memory sources.
All rows are compared to the column names for length consistency.
Streaming semantics: CSVReader is a single-pass streaming reader. Every read operation, whether through read_row() or the iterator interface, permanently removes rows from the internal queue. Rows consumed by one interface are not visible to another. There is no rewind or seek.
Ownership and sharing: CSVReader is non-copyable and move-enabled. It manages live parsing state (worker thread, internal queue, and optional owned stream), so ownership transfer should be explicit. To share or transfer a reader, wrap it in a std::unique_ptr<CSVReader>:
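A minimal sketch of that ownership pattern follows. `Reader`, `consume`, `transfer_demo`, and the file name `data.csv` are hypothetical stand-ins so the snippet compiles without the library header; with the library, the pointer type would be std::unique_ptr<CSVReader>.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for CSVReader: non-copyable and movable.
struct Reader {
    explicit Reader(std::string src) : source(std::move(src)) {}
    Reader(const Reader&) = delete;
    Reader& operator=(const Reader&) = delete;
    Reader(Reader&&) noexcept = default;
    Reader& operator=(Reader&&) noexcept = default;
    std::string source;  // illustrative payload only
};

// Takes sole ownership of the reader passed in.
std::string consume(std::unique_ptr<Reader> reader) {
    return reader->source;
}

// Hands a heap-allocated reader off explicitly with std::move.
// After the move, the caller's pointer is null.
bool transfer_demo() {
    std::unique_ptr<Reader> reader(new Reader("data.csv"));
    std::string seen = consume(std::move(reader));
    return reader == nullptr && seen == "data.csv";
}
```

Passing the std::unique_ptr by value makes the hand-off visible at the call site, which is the explicit ownership transfer the paragraph above recommends.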
Definition at line 61 of file csv_reader.hpp.
|
inline |
Construct CSVReader from filename.
Native builds use CODE PATH 1 of 2: MmapParser with mio for maximum performance. Emscripten builds fall back to the stream-based implementation because mmap is unavailable.
During construction, parser installation performs an initial synchronous metadata read so delimiter and header information are resolved before user reads begin.
Definition at line 134 of file csv_reader.hpp.
|
inline |
Construct CSVReader from std::istream.
Uses StreamParser. On native builds this is CODE PATH 2 of 2 and remains independent from the filename-based mmap path. On Emscripten, the filename constructor also funnels through this implementation.
| TStream | An input stream deriving from std::istream |
Definition at line 171 of file csv_reader.hpp.
|
inline |
Construct CSVReader from an owned std::istream.
This is an opt-in safety switch for stream lifetime management. CSVReader takes ownership and guarantees the stream outlives parsing.
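The lifetime guarantee can be illustrated with a small sketch. `OwningLineReader`, `next_line`, and `make_reader` are hypothetical names, not part of the library; the class only mirrors the idea of a reader that holds a std::unique_ptr<std::istream> so the stream cannot be destroyed while it is still being read.

```cpp
#include <istream>
#include <memory>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical stand-in for a reader that owns its input stream.
class OwningLineReader {
public:
    explicit OwningLineReader(std::unique_ptr<std::istream> source)
        : owned_stream_(std::move(source)) {}

    // Reads the next line; returns false once the stream is exhausted.
    bool next_line(std::string& out) {
        return static_cast<bool>(std::getline(*owned_stream_, out));
    }

private:
    std::unique_ptr<std::istream> owned_stream_;
};

// The stream is created locally but stays alive because the reader owns it.
OwningLineReader make_reader(const std::string& text) {
    std::unique_ptr<std::istream> stream(new std::istringstream(text));
    return OwningLineReader(std::move(stream));
}
```

Had the reader stored only a reference to a stream local to `make_reader`, that reference would dangle once the function returned; owning the stream removes that hazard.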
Definition at line 180 of file csv_reader.hpp.
|
inline noexcept |
Move constructor.
Required so C++11 builds can return CSVReader by value from helpers like csv::parse()/csv::parse_unsafe(), where copy elision is not guaranteed.
Any active worker on the source is joined before moving parser state to avoid a thread continuing to run against the source object's address.
Definition at line 201 of file csv_reader.hpp.
|
inline |
Definition at line 255 of file csv_reader.hpp.
| CSVReader::iterator csv::CSVReader::begin | ( | ) |
Return an iterator to the first row in the reader.
Definition at line 9 of file csv_reader_iterator.cpp.
|
inline noexcept |
Whether or not the file or stream contains valid CSV rows, not including the header.
Definition at line 284 of file csv_reader.hpp.
|
noexcept |
A placeholder for the imaginary past-the-end row in a CSV.
Dereferencing this iterator is undefined behavior.
Definition at line 20 of file csv_reader_iterator.cpp.
|
inline noexcept |
Returns true if we have reached end of file.
Definition at line 266 of file csv_reader.hpp.
| std::vector< std::string > csv::CSVReader::get_col_names | ( | ) | const |
Return the CSV's column names as a vector of strings.
Definition at line 38 of file csv_reader.cpp.
| CSVFormat csv::CSVReader::get_format | ( | ) | const |
Return the format of the original raw CSV.
Definition at line 25 of file csv_reader.cpp.
| int csv::CSVReader::index_of | ( | csv::string_view | col_name | ) | const |
Return the index of the column name if found or csv::CSV_NOT_FOUND otherwise.
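The sentinel-return convention can be sketched as below. This is an illustration only, not the library's implementation: `NOT_FOUND` (assumed here to be -1) stands in for csv::CSV_NOT_FOUND, and the free functions `index_of` and `has_column` are hypothetical helpers operating on a plain vector of names.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Assumed sentinel value, playing the role of csv::CSV_NOT_FOUND.
constexpr int NOT_FOUND = -1;

// Linear search over the column names; returns the position of the
// first match, or NOT_FOUND when the name is absent.
int index_of(const std::vector<std::string>& col_names,
             const std::string& col_name) {
    for (std::size_t i = 0; i < col_names.size(); ++i) {
        if (col_names[i] == col_name) {
            return static_cast<int>(i);
        }
    }
    return NOT_FOUND;
}

// Callers compare against the sentinel before using the result as an index.
bool has_column(const std::vector<std::string>& col_names,
                const std::string& col_name) {
    return index_of(col_names, col_name) != NOT_FOUND;
}
```

Because the return type is a signed int, the result should be checked against the sentinel before it is converted to an unsigned index.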
Definition at line 46 of file csv_reader.cpp.
|
inline noexcept |
Retrieves the number of rows that have been read so far.
Definition at line 287 of file csv_reader.hpp.
Move assignment.
Joins active workers on both sides before transferring parser state.
Definition at line 226 of file csv_reader.hpp.
| bool csv::CSVReader::read_row | ( | CSVRow & | row | ) |
Retrieve rows as CSVRow objects, returning true if more rows are available.
Example:
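A minimal sketch of the usual read loop, written generically so it compiles without the library header. `LineReader` and `count_rows` are hypothetical; the template works with any type exposing a `bool read_row(Row&)` member, so with the library `Reader` would presumably be CSVReader and `Row` would be CSVRow.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical stand-in exposing the same read_row shape as CSVReader.
// Each "row" here is simply one line of text.
struct LineReader {
    explicit LineReader(const std::string& text) : in(text) {}
    bool read_row(std::string& row) {
        return static_cast<bool>(std::getline(in, row));
    }
    std::istringstream in;
};

// Drains a reader by calling read_row until it returns false,
// and reports how many rows were consumed.
template <typename Reader, typename Row>
std::size_t count_rows(Reader& reader) {
    Row row;
    std::size_t n = 0;
    while (reader.read_row(row)) {
        ++n;  // process `row` here
    }
    return n;
}
```

Calling `count_rows` a second time on the same reader yields zero, since the rows have already been consumed; this matches the single-pass behavior described for CSVReader.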
Definition at line 130 of file csv_reader.cpp.
|
inline noexcept |
Whether or not the CSV was prefixed with a UTF-8 BOM.
Definition at line 290 of file csv_reader.hpp.