Vince's CSV Parser
csv::CSVReader Class Reference

Main class for parsing CSVs from files and in-memory sources.

#include <csv_reader.hpp>

Classes

class  iterator
 An input iterator capable of handling large files.
 

Public Member Functions

 CSVReader (const CSVReader &)=delete
 Not copyable.
 
CSVReader & operator= (const CSVReader &)=delete
 Not copyable.
 
 CSVReader (CSVReader &&other) noexcept
 Move constructor.
 
CSVReader & operator= (CSVReader &&other) noexcept
 Move assignment.
 
Constructors

Constructors for iterating over large files and parsing in-memory sources.

 CSVReader (csv::string_view filename, const CSVFormat &format=CSVFormat::guess_csv())
 Construct CSVReader from filename.
 
template<typename TStream , csv::enable_if_t< std::is_base_of< std::istream, TStream >::value, int > = 0>
 CSVReader (TStream &source, CSVFormat format=CSVFormat::guess_csv())
 Construct CSVReader from std::istream.
 
 CSVReader (std::unique_ptr< std::istream > source, const CSVFormat &format=CSVFormat::guess_csv())
 Construct CSVReader from an owned std::istream.
 
Retrieving CSV Rows
bool read_row (CSVRow &row)
 Retrieve rows as CSVRow objects, returning true if more rows are available.
 
iterator begin ()
 Return an iterator to the first row in the reader.
 
CSV_CONST iterator end () const noexcept
 A placeholder for the imaginary past-the-end row in a CSV.
 
bool eof () const noexcept
 Returns true if we have reached end of file.
 
CSV Metadata
CSVFormat get_format () const
 Return the format of the original raw CSV.
 
std::vector< std::string > get_col_names () const
 Return the CSV's column names as a vector of strings.
 
int index_of (csv::string_view col_name) const
 Return the index of the column name if found or csv::CSV_NOT_FOUND otherwise.
 
CSV Metadata: Attributes
CONSTEXPR bool empty () const noexcept
 Whether or not the file or stream contains valid CSV rows, not including the header.
 
CONSTEXPR size_t n_rows () const noexcept
 Retrieves the number of rows that have been read so far.
 
bool utf8_bom () const noexcept
 Whether or not the CSV was prefixed with a UTF-8 BOM.
 

Protected Member Functions

void set_col_names (const std::vector< std::string > &)
 Sets this reader's column names and associated data.
 
Multi-Threaded File Reading Functions
bool read_csv (size_t bytes=internals::CSV_CHUNK_SIZE_DEFAULT)
 Read a chunk of CSV data.
 

Protected Attributes

CSV Settings
CSVFormat _format
 
Parser State
internals::ColNamesPtr col_names = std::make_shared<internals::ColNames>()
 Pointer to an object containing column information.
 
std::unique_ptr< internals::IBasicCSVParser > parser = nullptr
 Helper class which actually does the parsing.
 
std::unique_ptr< RowCollection > records {new RowCollection(100)}
 Queue of parsed CSV rows.
 
std::unique_ptr< std::istream > owned_stream = nullptr
 Optional owned stream used by two paths: (1) the Emscripten filename-constructor fallback to stream parsing, and (2) the opt-in ownership constructor taking std::unique_ptr<std::istream>.
 
size_t n_cols = 0
 The number of columns in this CSV.
 
size_t _n_rows = 0
 How many rows (minus header) have been read so far.
 

Detailed Description

Main class for parsing CSVs from files and in-memory sources.

All rows are compared to the column names for length consistency:

  • By default, rows that are too short or too long are dropped
  • Custom behavior can be defined by overriding bad_row_handler in a subclass
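
The length check can be pictured with a small standard-library sketch. It is not the library's implementation: `Row` and `keep_consistent_rows` are illustrative names that only model the default drop-on-mismatch policy described above.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a parsed row: just a list of field strings.
using Row = std::vector<std::string>;

// Keep only rows whose field count matches the number of column names.
// Rows that are too short or too long are dropped, mirroring the default
// policy described above (a model, not csv::CSVReader's code).
std::vector<Row> keep_consistent_rows(const Row& col_names,
                                      const std::vector<Row>& rows) {
    std::vector<Row> kept;
    for (const Row& row : rows) {
        if (row.size() == col_names.size()) {
            kept.push_back(row);
        }
    }
    return kept;
}
```

A subclass that overrides bad_row_handler would replace the silent drop in this model with its own reaction, such as logging or collecting the rejected rows.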

Streaming semantics: CSVReader is a single-pass streaming reader. Every read operation, whether read_row() or the iterator interface, pulls rows permanently from the internal queue. Rows consumed by one interface are not visible to another. There is no rewind or seek.
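
The single-pass behavior can be illustrated with a toy queue-backed reader. This is a sketch under stated assumptions: `ToyReader` and its `drain()` helper are hypothetical, hold plain strings instead of CSVRow objects, and do no parsing. They only show that a row taken through one interface is gone for every other interface.

```cpp
#include <deque>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy reader that models single-pass consumption only.
// It is not csv::CSVReader; rows are plain strings held in a queue.
class ToyReader {
public:
    explicit ToyReader(const std::vector<std::string>& rows)
        : queue_(rows.begin(), rows.end()) {}

    // Pops the next row into `out`; returns false once the queue is empty.
    bool read_row(std::string& out) {
        if (queue_.empty()) return false;
        out = std::move(queue_.front());
        queue_.pop_front();
        return true;
    }

    // Drains whatever is left, the way a loop over the reader would.
    std::vector<std::string> drain() {
        std::vector<std::string> rest;
        std::string row;
        while (read_row(row)) rest.push_back(row);
        return rest;
    }

private:
    std::deque<std::string> queue_;
};
```

In this model, calling read_row() once and then drain() yields only the rows after the first, and a second drain() yields nothing, because there is no rewind.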

Ownership and sharing: CSVReader is non-copyable and move-enabled. It manages live parsing state (worker thread, internal queue, and optional owned stream), so ownership transfer should be explicit. To share or transfer a reader, wrap it in a std::unique_ptr<CSVReader>:

auto reader = std::make_unique<csv::CSVReader>("data.csv");
process(std::move(reader)); // transfer ownership

Definition at line 61 of file csv_reader.hpp.

Constructor & Destructor Documentation

◆ CSVReader() [1/4]

csv::CSVReader::CSVReader ( csv::string_view  filename,
const CSVFormat &  format = CSVFormat::guess_csv() 
)
inline

Construct CSVReader from filename.

Native builds use CODE PATH 1 of 2: MmapParser with mio for maximum performance. Emscripten builds fall back to the stream-based implementation because mmap is unavailable.

During construction, parser installation performs an initial synchronous metadata read so delimiter and header information are resolved before user reads begin.

Note
On native builds, bugs can exist in this path independently of the stream path.
When writing tests that validate I/O behavior, test both filename and stream constructors.
See also
StreamParser for the stream-based alternative.

Definition at line 134 of file csv_reader.hpp.

◆ CSVReader() [2/4]

template<typename TStream , csv::enable_if_t< std::is_base_of< std::istream, TStream >::value, int > = 0>
csv::CSVReader::CSVReader ( TStream &  source,
CSVFormat  format = CSVFormat::guess_csv() 
)
inline

Construct CSVReader from std::istream.

Uses StreamParser. On native builds this is CODE PATH 2 of 2 and remains independent from the filename-based mmap path. On Emscripten, the filename constructor also funnels through this implementation.

Template Parameters
TStream  An input stream deriving from std::istream
Note
Delimiter/header guessing is still available by default via CSVFormat::guess_csv(). For deterministic parsing of known dialects, pass an explicit CSVFormat.
On native builds, tests that validate I/O behavior should cover both constructors.
See also
MmapParser for the memory-mapped alternative.

Definition at line 171 of file csv_reader.hpp.

◆ CSVReader() [3/4]

csv::CSVReader::CSVReader ( std::unique_ptr< std::istream >  source,
const CSVFormat &  format = CSVFormat::guess_csv() 
)
inline

Construct CSVReader from an owned std::istream.

This is an opt-in safety switch for stream lifetime management. CSVReader takes ownership and guarantees the stream outlives parsing.
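
The lifetime problem this constructor addresses can be sketched with the standard library alone. `OwningLineReader` and `make_reader` below are hypothetical names that model the ownership idea with line-based reading; they are not part of the library and do not parse CSV.

```cpp
#include <istream>
#include <memory>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical line reader that owns its input stream. It models the
// ownership idea only and is not csv::CSVReader.
class OwningLineReader {
public:
    explicit OwningLineReader(std::unique_ptr<std::istream> source)
        : source_(std::move(source)) {}

    // Reads the next line; returns false at end of input.
    bool next(std::string& line) {
        return static_cast<bool>(std::getline(*source_, line));
    }

private:
    std::unique_ptr<std::istream> source_;  // lives as long as the reader
};

// The stream is created inside the factory, yet stays valid after return
// because ownership moves into the reader. A reader that only held a
// reference to a local stream would be left dangling here.
OwningLineReader make_reader(const std::string& text) {
    std::unique_ptr<std::istream> stream(new std::istringstream(text));
    return OwningLineReader(std::move(stream));
}
```

The reference-taking stream constructor leaves the caller responsible for keeping the stream alive, which is why returning a reader from a helper like this is the case where owning the stream helps.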

Definition at line 180 of file csv_reader.hpp.

◆ CSVReader() [4/4]

csv::CSVReader::CSVReader ( CSVReader &&  other)
inline noexcept

Move constructor.

Required so C++11 builds can return CSVReader by value from helpers like csv::parse()/csvparse_unsafe(), where copy elision is not guaranteed.

Any active worker on the source is joined before moving parser state to avoid a thread continuing to run against the source object's address.
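
The join-before-move rule can be sketched as follows. `BackgroundFiller` is a hypothetical class, not the library's code: its worker thread captures the object's own address, so the move constructor waits for the source's worker to finish before taking its state.

```cpp
#include <cstddef>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical class that fills a buffer on a background thread. It models
// the "join before moving" rule only and is not csv::CSVReader.
class BackgroundFiller {
public:
    BackgroundFiller() = default;
    BackgroundFiller(const BackgroundFiller&) = delete;
    BackgroundFiller& operator=(const BackgroundFiller&) = delete;

    // Join the source's worker first: its lambda captured the source's
    // `this`, so it must finish before the source's state is moved away.
    BackgroundFiller(BackgroundFiller&& other) noexcept {
        other.join();
        data_ = std::move(other.data_);
    }

    ~BackgroundFiller() { join(); }

    // Starts a worker that writes into this object's buffer.
    void start(int count) {
        join();
        worker_ = std::thread([this, count] {
            for (int i = 0; i < count; ++i) data_.push_back(i);
        });
    }

    void join() {
        if (worker_.joinable()) worker_.join();
    }

    // Waits for any worker, then reports how many items were produced.
    std::size_t size() {
        join();
        return data_.size();
    }

private:
    std::vector<int> data_;
    std::thread worker_;
};
```

Without the join, the worker could keep writing through a pointer to the moved-from object while the destination already holds the buffer.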

Definition at line 201 of file csv_reader.hpp.

◆ ~CSVReader()

csv::CSVReader::~CSVReader ( )
inline

Definition at line 255 of file csv_reader.hpp.

Member Function Documentation

◆ begin()

CSVReader::iterator csv::CSVReader::begin ( )

Return an iterator to the first row in the reader.

Definition at line 9 of file csv_reader_iterator.cpp.

◆ empty()

CONSTEXPR bool csv::CSVReader::empty ( ) const
inline noexcept

Whether or not the file or stream contains valid CSV rows, not including the header.

Note
Gives an accurate answer regardless of when it is called.

Definition at line 284 of file csv_reader.hpp.

◆ end()

CSV_CONST CSVReader::iterator csv::CSVReader::end ( ) const
noexcept

A placeholder for the imaginary past-the-end row in a CSV.

Attempting to dereference this iterator is undefined.

Definition at line 20 of file csv_reader_iterator.cpp.

◆ eof()

bool csv::CSVReader::eof ( ) const
inline noexcept

Returns true if we have reached end of file.

Definition at line 266 of file csv_reader.hpp.

◆ get_col_names()

std::vector< std::string > csv::CSVReader::get_col_names ( ) const

Return the CSV's column names as a vector of strings.

Definition at line 38 of file csv_reader.cpp.

◆ get_format()

CSVFormat csv::CSVReader::get_format ( ) const

Return the format of the original raw CSV.

Definition at line 25 of file csv_reader.cpp.

◆ index_of()

int csv::CSVReader::index_of ( csv::string_view  col_name) const

Return the index of the column name if found or csv::CSV_NOT_FOUND otherwise.

Definition at line 46 of file csv_reader.cpp.

◆ n_rows()

CONSTEXPR size_t csv::CSVReader::n_rows ( ) const
inline noexcept

Retrieves the number of rows that have been read so far.

Definition at line 287 of file csv_reader.hpp.

◆ operator=()

CSVReader & csv::CSVReader::operator= ( CSVReader &&  other)
inline noexcept

Move assignment.

Joins active workers on both sides before transferring parser state.

Definition at line 226 of file csv_reader.hpp.

◆ read_row()

bool csv::CSVReader::read_row ( CSVRow &  row)

Retrieve rows as CSVRow objects, returning true if more rows are available.

Performance Notes
See also
CSVRow, CSVField

Example:

Definition at line 130 of file csv_reader.cpp.

◆ utf8_bom()

bool csv::CSVReader::utf8_bom ( ) const
inline noexcept

Whether or not the CSV was prefixed with a UTF-8 BOM.

Definition at line 290 of file csv_reader.hpp.


The documentation for this class was generated from the following files: