Vince's CSV Parser
csv::CSVReader Class Reference

Main class for parsing CSVs from files and in-memory sources.

#include <csv_reader.hpp>

Classes

class  iterator
 An input iterator capable of handling large files.
 

Public Member Functions

 CSVReader (const CSVReader &)=delete
 Not copyable.
 
CSVReader & operator= (const CSVReader &)=delete
 Not copyable.
 
 CSVReader (CSVReader &&other) noexcept
 Move constructor.
 
CSVReader & operator= (CSVReader &&other) noexcept
 Move assignment.
 
Constructors

Constructors for iterating over large files and parsing in-memory sources.

 CSVReader (csv::string_view filename, const CSVFormat &format=CSVFormat::guess_csv())
 Construct CSVReader from filename.
 
template<typename TStream , csv::enable_if_t< std::is_base_of< std::istream, TStream >::value, int > = 0>
 CSVReader (TStream &source, CSVFormat format=CSVFormat::guess_csv())
 Construct CSVReader from std::istream.
 
 CSVReader (std::unique_ptr< std::istream > source, const CSVFormat &format=CSVFormat::guess_csv())
 Construct CSVReader from an owned std::istream.
 
Retrieving CSV Rows
bool read_row (CSVRow &row)
 Retrieve the next CSV row, returning true while more rows are available.
 
bool read_chunk (std::vector< CSVRow > &out, size_t max_rows)
 Read up to max_rows rows into a caller-owned batch buffer.
 
iterator begin ()
 Return an iterator to the first row in the reader.
 
CSV_CONST iterator end () const noexcept
 A placeholder for the imaginary past-the-end row in a CSV.
 
bool eof () const noexcept
 Returns true if we have reached end of file.
 
CSV Metadata
CSVFormat get_format () const
 Return the resolved parsing format for this CSV source.
 
const std::vector< std::string > & get_col_names () const
 Return the active column names in CSV order.
 
internals::ConstColNamesPtr col_names_ptr () const noexcept
 Internal accessor for preserving resolved column-name lookup policy across helper types.
 
int index_of (csv::string_view col_name) const
 Return the index of col_name, or csv::CSV_NOT_FOUND if absent.
 
CSV Metadata: Attributes
CONSTEXPR bool empty () const noexcept
 Whether or not the file or stream contains valid CSV rows, not including the header.
 
CONSTEXPR size_t n_rows () const noexcept
 Retrieves the number of rows that have been read so far.
 
bool utf8_bom () const noexcept
 Whether or not the CSV was prefixed with a UTF-8 BOM.
 
internals::SpeculativeParseDiagnostics speculative_diagnostics () const noexcept
 Return speculative-parsing counters for filename-backed readers.
 
size_t parse_worker_count () const noexcept
 Return the number of parser worker threads used by the active parser.
 

Protected Member Functions

void set_col_names (const std::vector< std::string > &)
 Sets this reader's column names and associated data.
 
Worker Reading Functions

Functions that actively drive parser execution and produce rows.

bool read_csv (size_t bytes=internals::CSV_CHUNK_SIZE_DEFAULT)
 Read a chunk of CSV data.
 

Protected Attributes

CSV Settings
CSVFormat _format
 
Parser State
internals::ColNamesPtr col_names = std::make_shared<internals::ColNames>()
 Pointer to an object containing column information.
 
std::unique_ptr< internals::parser::CSVParserDriverBase > parser = nullptr
 Helper class which actually does the parsing.
 
std::unique_ptr< RowCollection > records {new RowCollection(100)}
 Queue of parsed CSV rows.
 
std::unique_ptr< std::istream > owned_stream = nullptr
 Optional owned stream, used by two paths: (1) the Emscripten filename-constructor fallback to stream parsing, and (2) the opt-in ownership constructor taking std::unique_ptr<std::istream>.
 
size_t n_cols = 0
 The number of columns in this CSV.
 
size_t _n_rows = 0
 How many rows (minus header) have been read so far.
 

Detailed Description

Main class for parsing CSVs from files and in-memory sources.

All rows are compared to the column names for length consistency:

  • By default, rows that are too short or too long are dropped
  • Custom behavior can be defined by overriding bad_row_handler in a subclass

Streaming semantics: CSVReader is a single-pass streaming reader. Every read operation (read_row(), read_chunk(), and the iterator interface) permanently removes rows from the internal queue. Rows consumed through one interface are not visible to another. There is no rewind or seek.
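The single-pass behavior can be illustrated with a small self-contained model. This is only a sketch: MiniReader and rows_left_after_one_read() are hypothetical stand-ins built on the standard library, not part of the csv namespace, and they model rows as plain strings rather than CSVRow objects.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <utility>

// Hypothetical stand-in for a single-pass reader; NOT csv::CSVReader.
class MiniReader {
public:
    explicit MiniReader(std::deque<std::string> rows) : queue_(std::move(rows)) {}

    // Pops the next row into `row`; returns false once the queue is drained.
    bool read_row(std::string& row) {
        if (queue_.empty()) return false;
        row = std::move(queue_.front());
        queue_.pop_front();
        return true;
    }

private:
    std::deque<std::string> queue_;
};

// Consumes one row, then drains the rest in a loop and returns how many
// rows that loop saw. The first row is never seen again.
inline std::size_t rows_left_after_one_read() {
    MiniReader reader(std::deque<std::string>{"a", "b", "c"});
    std::string row;
    const bool got = reader.read_row(row);  // consumes "a" permanently
    assert(got && row == "a");
    (void)got;  // silence unused-variable warnings when NDEBUG is defined

    std::size_t remaining = 0;
    while (reader.read_row(row)) ++remaining;  // sees only "b" and "c"
    return remaining;
}
```

In this model, a second consumer attached to the same reader sees only the rows the first consumer left behind, which matches the rule that mixing read interfaces on one reader splits the rows between them.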

Ownership and sharing: CSVReader is non-copyable and move-enabled. It manages live parsing state (worker thread, internal queue, and optional owned stream), so ownership transfer should be explicit. To share or transfer a reader, wrap it in a std::unique_ptr<CSVReader>:

auto reader = std::make_unique<csv::CSVReader>("data.csv");
process(std::move(reader)); // transfer ownership

Definition at line 49 of file csv_reader.hpp.

Constructor & Destructor Documentation

◆ CSVReader() [1/4]

csv::CSVReader::CSVReader ( csv::string_view  filename,
const CSVFormat &  format = CSVFormat::guess_csv() 
)
inline

Construct CSVReader from filename.

Native builds use CODE PATH 1 of 2: MmapParser with mio for maximum performance. Emscripten builds fall back to the stream-based implementation because mmap is unavailable.

During construction, parser installation performs an initial synchronous metadata read so delimiter and header information are resolved before user reads begin.

Note
On native builds, bugs can exist in this path independently of the stream path.
When writing tests that validate I/O behavior, test both filename and stream constructors.
See also
StreamParser for the stream-based alternative.

Definition at line 122 of file csv_reader.hpp.

◆ CSVReader() [2/4]

template<typename TStream , csv::enable_if_t< std::is_base_of< std::istream, TStream >::value, int > = 0>
csv::CSVReader::CSVReader ( TStream &  source,
CSVFormat  format = CSVFormat::guess_csv() 
)
inline

Construct CSVReader from std::istream.

Uses StreamParser. On native builds this is CODE PATH 2 of 2 and remains independent from the filename-based mmap path. On Emscripten, the filename constructor also funnels through this implementation.

Template Parameters
TStream  An input stream deriving from std::istream
Note
Delimiter/header guessing is still available by default via CSVFormat::guess_csv(). For deterministic parsing of known dialects, pass an explicit CSVFormat.
On native builds, tests that validate I/O behavior should cover both constructors.
See also
MmapParser for the memory-mapped alternative

Definition at line 156 of file csv_reader.hpp.

◆ CSVReader() [3/4]

csv::CSVReader::CSVReader ( std::unique_ptr< std::istream >  source,
const CSVFormat &  format = CSVFormat::guess_csv() 
)
inline

Construct CSVReader from an owned std::istream.

This is an opt-in safety switch for stream lifetime management. CSVReader takes ownership and guarantees the stream outlives parsing.
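The lifetime guarantee can be sketched with a self-contained model. OwningLineReader and make_reader() below are hypothetical names used only for illustration; they are not part of the library and read plain lines instead of parsing CSV.

```cpp
#include <cassert>
#include <istream>
#include <memory>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical stand-in that owns its input stream; NOT csv::CSVReader.
class OwningLineReader {
public:
    explicit OwningLineReader(std::unique_ptr<std::istream> source)
        : source_(std::move(source)) {
        assert(source_ != nullptr);  // a null stream is a caller bug in this sketch
    }

    // Reads the next line; returns false at end of stream.
    bool next(std::string& line) {
        return static_cast<bool>(std::getline(*source_, line));
    }

private:
    std::unique_ptr<std::istream> source_;  // lives as long as the reader
};

// The stream is created locally, yet stays valid after return because
// ownership moves into the reader.
inline OwningLineReader make_reader(const std::string& text) {
    std::unique_ptr<std::istream> stream(new std::istringstream(text));
    return OwningLineReader(std::move(stream));
}
```

A reader that held only a reference to a stream local to make_reader() would dangle once the function returned. Passing a std::unique_ptr avoids that by tying the stream's lifetime to the reader's.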

Definition at line 167 of file csv_reader.hpp.

◆ CSVReader() [4/4]

csv::CSVReader::CSVReader ( CSVReader &&  other)
inline noexcept

Move constructor.

Required so C++11 builds can return CSVReader by value from helpers like csv::parse()/csvparse_unsafe(), where copy elision is not guaranteed.

Any active read scheduler on the source is joined before moving parser state to avoid a thread continuing to run against the source object's address.
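The join-before-move rule can be sketched as follows. Worker is a hypothetical stand-in, not the library's class: its background thread captures the source object's address, so the move constructor waits for that thread before taking the data.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in; NOT the csv::CSVReader implementation.
class Worker {
public:
    Worker() = default;
    Worker(const Worker&) = delete;
    Worker& operator=(const Worker&) = delete;

    // Join the source's thread first: its lambda captured the source's
    // `this`, so it must finish before the state changes owner.
    Worker(Worker&& other) noexcept {
        other.join();
        assert(!other.thread_.joinable());
        data_ = std::move(other.data_);
    }

    ~Worker() { join(); }

    // Starts a background task that writes into this object's buffer.
    void start(int n) {
        join();
        thread_ = std::thread([this, n] {
            for (int i = 0; i < n; ++i) data_.push_back(i);
        });
    }

    // Waits for the background task, if one is running.
    void join() {
        if (thread_.joinable()) thread_.join();
    }

    // Number of items produced so far, after waiting for the task.
    std::size_t size() {
        join();
        return data_.size();
    }

private:
    std::vector<int> data_;
    std::thread thread_;
};
```

Without the join, the thread started on the source would keep appending to the source's buffer after its contents had been moved out, which is the hazard the paragraph above describes.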

Definition at line 191 of file csv_reader.hpp.

◆ ~CSVReader()

csv::CSVReader::~CSVReader ( )
inline

Definition at line 213 of file csv_reader.hpp.

Member Function Documentation

◆ begin()

CSVReader::iterator csv::CSVReader::begin ( )

Return an iterator to the first row in the reader.

Definition at line 9 of file csv_reader_iterator.cpp.

◆ col_names_ptr()

internals::ConstColNamesPtr csv::CSVReader::col_names_ptr ( ) const
inline noexcept

Internal accessor for preserving resolved column-name lookup policy across helper types.

Definition at line 284 of file csv_reader.hpp.

◆ empty()

CONSTEXPR bool csv::CSVReader::empty ( ) const
inline noexcept

Whether or not the file or stream contains valid CSV rows, not including the header.

Note
Gives an accurate answer regardless of when it is called.

Definition at line 302 of file csv_reader.hpp.

◆ end()

CSV_CONST CSVReader::iterator csv::CSVReader::end ( ) const
noexcept

A placeholder for the imaginary past-the-end row in a CSV.

Attempting to dereference this iterator is undefined.

Definition at line 20 of file csv_reader_iterator.cpp.

◆ eof()

bool csv::CSVReader::eof ( ) const
inline noexcept

Returns true if we have reached end of file.

Definition at line 265 of file csv_reader.hpp.

◆ get_col_names()

const std::vector< std::string > & csv::CSVReader::get_col_names ( ) const
inline

Return the active column names in CSV order.

Definition at line 278 of file csv_reader.hpp.

◆ get_format()

CSVFormat csv::CSVReader::get_format ( ) const

Return the resolved parsing format for this CSV source.

The returned format reflects delimiter/header inference and the active column names after construction.

Definition at line 62 of file csv_reader.cpp.

◆ index_of()

int csv::CSVReader::index_of ( csv::string_view  col_name) const
inline

Return the index of col_name, or csv::CSV_NOT_FOUND if absent.

Definition at line 289 of file csv_reader.hpp.

◆ n_rows()

CONSTEXPR size_t csv::CSVReader::n_rows ( ) const
inline noexcept

Retrieves the number of rows that have been read so far.

Definition at line 305 of file csv_reader.hpp.

◆ operator=()

CSVReader & csv::CSVReader::operator= ( CSVReader &&  other)
inline noexcept

Move assignment.

Joins active workers on both sides before transferring parser state.

Definition at line 201 of file csv_reader.hpp.

◆ parse_worker_count()

size_t csv::CSVReader::parse_worker_count ( ) const
inline noexcept

Return the number of parser worker threads used by the active parser.

Definition at line 317 of file csv_reader.hpp.

◆ read_chunk()

bool csv::CSVReader::read_chunk ( std::vector< CSVRow > &  out,
size_t  max_rows 
)

Read up to max_rows rows into a caller-owned batch buffer.

This is the easiest way to process a CSV in bounded batches without materializing the entire file. Each call clears out, then appends up to max_rows newly parsed rows in stream order.

Returns
true if this call produced any rows. Returns false only after end-of-stream is reached and no rows were produced. A final partial chunk still returns true.
Parameters
[out] out  Destination batch buffer. Existing contents are discarded.
[in] max_rows  Maximum number of rows to place into out.
Note
Like read_row(), this permanently consumes rows from the stream.
std::vector<CSVRow> is intentionally the only supported container: it matches the public batch-consumption pattern better than the internal deque-based producer/consumer queue.

Example:

std::vector<CSVRow> chunk;
REQUIRE(reader.read_chunk(chunk, 2));
REQUIRE(chunk.size() == 2);
REQUIRE(chunk[0]["id"].get<std::string>() == "1");
REQUIRE(chunk[0]["name"].get<std::string>() == "Alice");
REQUIRE(chunk[0]["value"].get<std::string>() == "10");
REQUIRE(chunk[1]["id"].get<std::string>() == "2");
REQUIRE(chunk[1]["name"].get<std::string>() == "Bob");
REQUIRE(chunk[1]["value"].get<std::string>() == "20");
REQUIRE(reader.read_chunk(chunk, 2));
REQUIRE(chunk.size() == 2);
REQUIRE(chunk[0]["id"].get<std::string>() == "3");
REQUIRE(chunk[0]["name"].get<std::string>() == "Carol");
REQUIRE(chunk[0]["value"].get<std::string>() == "30");
REQUIRE(chunk[1]["id"].get<std::string>() == "4");
REQUIRE(chunk[1]["name"].get<std::string>() == "Dave");
REQUIRE(chunk[1]["value"].get<std::string>() == "40");
REQUIRE(reader.read_chunk(chunk, 2));
REQUIRE(chunk.size() == 1);
REQUIRE(chunk[0]["id"].get<std::string>() == "5");
REQUIRE(chunk[0]["name"].get<std::string>() == "Eve");
REQUIRE(chunk[0]["value"].get<std::string>() == "50");
REQUIRE_FALSE(reader.read_chunk(chunk, 2));
REQUIRE(chunk.empty());
REQUIRE_FALSE(reader.read_chunk(chunk, 2));
REQUIRE(chunk.empty());
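
The batch contract exercised above (clear the buffer, append up to max_rows rows, return true if any row was produced) can also be modeled without the library. read_chunk_model() and batch_sizes() are hypothetical stand-ins operating on strings. They sketch the documented behavior and say nothing about how the parser is actually implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of the documented read_chunk() contract over a queue
// of already-parsed rows; NOT the library's implementation.
inline bool read_chunk_model(std::deque<std::string>& pending,
                             std::vector<std::string>& out,
                             std::size_t max_rows) {
    out.clear();  // existing contents are discarded on every call
    while (out.size() < max_rows && !pending.empty()) {
        out.push_back(std::move(pending.front()));
        pending.pop_front();
    }
    assert(out.size() <= max_rows);
    return !out.empty();  // a final partial chunk still returns true
}

// Drains `pending` in batches and returns the size of each batch.
inline std::vector<std::size_t> batch_sizes(std::deque<std::string> pending,
                                            std::size_t max_rows) {
    std::vector<std::size_t> sizes;
    std::vector<std::string> chunk;
    while (read_chunk_model(pending, chunk, max_rows)) {
        sizes.push_back(chunk.size());
    }
    return sizes;
}
```

With five pending rows and max_rows set to 2, the model yields batches of 2, 2, and 1 before returning false, mirroring the sequence in the example above.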

Definition at line 211 of file csv_reader.cpp.

◆ read_row()

bool csv::CSVReader::read_row ( CSVRow &  row)

Retrieve the next CSV row, returning true while more rows are available.

This is the lowest-level row-consumption API on CSVReader. Each successful call overwrites row with the next parsed record in stream order.

Returns
true if a row was produced, false after end-of-stream is reached.
Note
This permanently consumes rows from the stream.
Performance Notes

Example:

Definition at line 199 of file csv_reader.cpp.

◆ speculative_diagnostics()

internals::SpeculativeParseDiagnostics csv::CSVReader::speculative_diagnostics ( ) const
inline noexcept

Return speculative-parsing counters for filename-backed readers.

Definition at line 311 of file csv_reader.hpp.

◆ utf8_bom()

bool csv::CSVReader::utf8_bom ( ) const
inline noexcept

Whether or not the CSV was prefixed with a UTF-8 BOM.

Definition at line 308 of file csv_reader.hpp.


The documentation for this class was generated from the following files: