Vince's CSV Parser
csv::CSVReader Class Reference

Main class for parsing CSVs from files and in-memory sources.

#include <csv_reader.hpp>

Classes

class  iterator
 An input iterator capable of handling large files.
 

Public Member Functions

 CSVReader (const CSVReader &)=delete
 Not copyable.
 
 CSVReader (CSVReader &&)=delete
 Not movable: contains std::mutex.
 
CSVReader & operator= (const CSVReader &)=delete
 Not copyable.
 
CSVReader & operator= (CSVReader &&)=delete
 Not movable: contains std::mutex.
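Because std::mutex can be neither copied nor moved, any class that stores one by value loses its implicit copy and move operations, which is why these four members are deleted. The sketch below uses a hypothetical stand-in (GuardedReader is not part of the library) to show the consequence and a common workaround: hold the object behind std::unique_ptr, so ownership moves while the object itself stays put. Applying the same idea to CSVReader (e.g. std::make_unique<csv::CSVReader>(...)) is an assumption, not something this page documents.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for a reader that guards a row queue with a mutex.
// It is NOT csv::CSVReader; it only reproduces the copy/move restrictions.
class GuardedReader {
public:
    GuardedReader() = default;

    // std::mutex is neither copyable nor movable, so the containing class
    // deletes its copy and move operations explicitly.
    GuardedReader(const GuardedReader&) = delete;
    GuardedReader(GuardedReader&&) = delete;
    GuardedReader& operator=(const GuardedReader&) = delete;
    GuardedReader& operator=(GuardedReader&&) = delete;

    void push(std::string row) {
        std::lock_guard<std::mutex> lock(mtx_);
        rows_.push_back(std::move(row));
    }

    std::size_t size() {
        std::lock_guard<std::mutex> lock(mtx_);
        return rows_.size();
    }

private:
    std::mutex mtx_;
    std::vector<std::string> rows_;
};

// Ownership can still be transferred by holding the object behind a pointer.
std::unique_ptr<GuardedReader> make_reader() {
    auto reader = std::make_unique<GuardedReader>();
    reader->push("a,b,c");
    return reader;  // moves the unique_ptr, not the GuardedReader itself
}
```

The trade-off is one heap allocation per reader, which is usually negligible next to the cost of parsing a file.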
 
Constructors

Constructors for iterating over large files and parsing in-memory sources.

 CSVReader (csv::string_view filename, CSVFormat format=CSVFormat::guess_csv())
 Construct CSVReader from filename using memory-mapped I/O.
 
template<typename TStream , csv::enable_if_t< std::is_base_of< std::istream, TStream >::value, int > = 0>
 CSVReader (TStream &source, CSVFormat format=CSVFormat::guess_csv())
 Construct CSVReader from std::istream.
 
Retrieving CSV Rows
bool read_row (CSVRow &row)
 Retrieve rows as CSVRow objects, returning true if more rows are available.
 
iterator begin ()
 Return an iterator to the first row in the reader.
 
CSV_CONST iterator end () const noexcept
 A placeholder for the imaginary past-the-end row in a CSV.
 
bool eof () const noexcept
 Returns true if we have reached end of file.
 
CSV Metadata
CSVFormat get_format () const
 Return the format of the original raw CSV.
 
std::vector< std::string > get_col_names () const
 Return the CSV's column names as a vector of strings.
 
int index_of (csv::string_view col_name) const
 Return the index of the column name if found or csv::CSV_NOT_FOUND otherwise.
 
CSV Metadata: Attributes
CONSTEXPR bool empty () const noexcept
 Whether or not the file or stream contains valid CSV rows, not including the header.
 
CONSTEXPR size_t n_rows () const noexcept
 Retrieves the number of rows that have been read so far.
 
bool utf8_bom () const noexcept
 Whether or not the CSV was prefixed with a UTF-8 BOM.
 

Protected Member Functions

void set_col_names (const std::vector< std::string > &)
 Sets this reader's column names and associated data.
 
Multi-Threaded File Reading Functions
bool read_csv (size_t bytes=internals::ITERATION_CHUNK_SIZE)
 Read a chunk of CSV data.
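read_csv() reads the input in chunks of up to `bytes` bytes (internals::ITERATION_CHUNK_SIZE by default). A chunk boundary will usually fall in the middle of a row, so the partial row has to be carried into the next chunk. The hypothetical ChunkedLines class below sketches that boundary handling for plain newline-terminated lines; it does not handle quoted fields containing newlines, runs on a single thread, and is not the library's parser.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical chunked line reader: pulls at most `bytes` characters per call
// and only hands back complete lines, carrying any partial trailing line over
// to the next call. Precondition: bytes > 0.
class ChunkedLines {
public:
    explicit ChunkedLines(std::istream& in) : in_(in) {}

    // Returns false once the stream is exhausted and nothing is left over.
    bool read_chunk(std::vector<std::string>& lines, std::size_t bytes) {
        lines.clear();
        std::string buf(bytes, '\0');
        in_.read(&buf[0], static_cast<std::streamsize>(bytes));
        buf.resize(static_cast<std::size_t>(in_.gcount()));  // may be a short read
        std::string data = carry_ + buf;
        carry_.clear();

        std::size_t start = 0;
        for (std::size_t nl = data.find('\n'); nl != std::string::npos;
             nl = data.find('\n', start)) {
            lines.push_back(data.substr(start, nl - start));
            start = nl + 1;
        }
        std::string rest = data.substr(start);
        if (in_) {
            carry_ = rest;          // more input may complete this line
        } else if (!rest.empty()) {
            lines.push_back(rest);  // final line without a trailing newline
        }
        return !lines.empty() || !carry_.empty();
    }

private:
    std::istream& in_;
    std::string carry_;
};
```

A small chunk size makes the carry-over path easy to exercise, since most chunks end mid-line.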
 

Protected Attributes

CSV Settings
CSVFormat _format
 
Parser State
internals::ColNamesPtr col_names = std::make_shared<internals::ColNames>()
 Pointer to an object containing column information.
 
std::unique_ptr< internals::IBasicCSVParser > parser = nullptr
 Helper class which actually does the parsing.
 
std::unique_ptr< RowCollection > records {new RowCollection(100)}
 Queue of parsed CSV rows.
 
size_t n_cols = 0
 The number of columns in this CSV.
 
size_t _n_rows = 0
 How many rows (minus header) have been read so far.
 

Detailed Description

Main class for parsing CSVs from files and in-memory sources.

All rows are compared to the column names for length consistency:

  • By default, rows that are too short or too long are dropped
  • Custom behavior can be defined by overriding bad_row_handler in a subclass

Definition at line 77 of file csv_reader.hpp.
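The default policy can be pictured as a filter on field counts. The sketch below is hypothetical: keep_consistent_rows and its on_bad callback are illustrative names, not library API. The callback plays a role loosely similar to overriding bad_row_handler, whose actual signature is not shown on this page.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

using Row = std::vector<std::string>;

// Hypothetical filter mirroring the documented default: rows whose field count
// differs from the number of column names are dropped. If on_bad is set, it is
// called for every dropped row (an illustrative hook, not the library's API).
std::vector<Row> keep_consistent_rows(const Row& col_names,
                                      const std::vector<Row>& rows,
                                      const std::function<void(const Row&)>& on_bad = nullptr) {
    std::vector<Row> kept;
    for (const Row& row : rows) {
        if (row.size() == col_names.size()) {
            kept.push_back(row);  // length matches the header: keep it
        } else if (on_bad) {
            on_bad(row);          // too short or too long: report it
        }                         // otherwise: silently drop it
    }
    return kept;
}
```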

Constructor & Destructor Documentation

◆ CSVReader() [1/2]

csv::CSVReader::CSVReader ( csv::string_view  filename,
CSVFormat  format = CSVFormat::guess_csv() 
)

Construct CSVReader from filename using memory-mapped I/O.

Reads an arbitrarily large CSV file using memory-mapped IO.

CODE PATH 1 of 2: Uses MmapParser with the mio library for maximum performance. This is fundamentally different from the stream-based constructor below.

Note
Bugs can exist in this path independently of the stream path (and vice versa).
When writing tests that validate I/O behavior, BOTH paths must be tested.
See also
StreamParser for the alternative implementation

Details: Reads the first block of a CSV file synchronously to get information such as column names and delimiting character.

Parameters
[in] filename  Path to CSV file
[in] format  Format of the CSV file

The delimiter and header row are guessed from this first block.

Definition at line 167 of file csv_reader.cpp.
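The constructor's first step is to inspect a leading block of the file and infer the delimiter. This page does not describe the library's actual heuristic, so the sketch below swaps in a deliberately simple one (guess_delimiter is a hypothetical name): for each candidate delimiter, count the lines whose field count matches the first line's and exceeds one, then keep the best-scoring candidate. Quoting is ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Count how many fields `line` splits into for a given delimiter
// (quoted delimiters are not special-cased in this simplified sketch).
std::size_t count_fields(const std::string& line, char delim) {
    std::size_t n = 1;
    for (char c : line) {
        if (c == delim) ++n;
    }
    return n;
}

// Hypothetical heuristic, NOT the library's algorithm. Precondition: candidates
// is non-empty. Falls back to the first candidate if none scores above zero.
char guess_delimiter(const std::string& head, const std::vector<char>& candidates) {
    char best = candidates.front();
    std::size_t best_score = 0;
    for (char delim : candidates) {
        std::istringstream in(head);
        std::string line;
        std::size_t expected = 0, score = 0;
        while (std::getline(in, line)) {
            if (line.empty()) continue;
            std::size_t n = count_fields(line, delim);
            if (expected == 0) expected = n;  // first non-empty line sets the width
            if (n == expected && n > 1) ++score;
        }
        if (score > best_score) {
            best_score = score;
            best = delim;
        }
    }
    return best;
}
```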

◆ CSVReader() [2/2]

template<typename TStream , csv::enable_if_t< std::is_base_of< std::istream, TStream >::value, int > = 0>
csv::CSVReader::CSVReader ( TStream &  source,
CSVFormat  format = CSVFormat::guess_csv() 
)
inline

Construct CSVReader from std::istream.

CODE PATH 2 of 2: Uses StreamParser with different internal implementation than the memory-mapped constructor above. Issue #281 was specific to THIS path only.

Template Parameters
TStream  An input stream deriving from std::istream
Note
CSV format guessing works differently here; the dialect must be specified manually.
When writing tests that validate I/O behavior, BOTH paths must be tested.
See also
MmapParser for the memory-mapped alternative

Definition at line 183 of file csv_reader.hpp.
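The template constraint on this constructor is ordinary SFINAE: the stream overload is only viable when TStream derives from std::istream, so a string literal or std::string falls through to the filename overload. A minimal sketch of that dispatch, using a hypothetical SourceTag class with std::enable_if_t and std::string_view (the library's csv::enable_if_t and csv::string_view aliases are assumed to behave equivalently):

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <string_view>
#include <type_traits>

// Hypothetical class showing how an enable_if constraint separates a
// "filename" constructor from a "stream" constructor. Not the library code.
class SourceTag {
public:
    enum class Path { File, Stream };

    // Path 1: arguments convertible to std::string_view are treated as filenames.
    explicit SourceTag(std::string_view filename)
        : path_(Path::File), name_(filename) {}

    // Path 2: only participates in overload resolution when TStream derives
    // from std::istream (std::ifstream, std::istringstream, ...).
    template <typename TStream,
              std::enable_if_t<std::is_base_of<std::istream, TStream>::value, int> = 0>
    explicit SourceTag(TStream& source) : path_(Path::Stream) {
        std::getline(source, first_line_);  // e.g. peek at the header line
    }

    Path path() const { return path_; }
    const std::string& name() const { return name_; }
    const std::string& first_line() const { return first_line_; }

private:
    Path path_;
    std::string name_;
    std::string first_line_;
};
```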

◆ ~CSVReader()

csv::CSVReader::~CSVReader ( )
inline

Definition at line 215 of file csv_reader.hpp.

Member Function Documentation

◆ begin()

CSVReader::iterator csv::CSVReader::begin ( )

Return an iterator to the first row in the reader.

Definition at line 9 of file csv_reader_iterator.cpp.

◆ empty()

CONSTEXPR bool csv::CSVReader::empty ( ) const
inline noexcept

Whether or not the file or stream contains valid CSV rows, not including the header.

Note
Gives an accurate answer regardless of when it is called.

Definition at line 246 of file csv_reader.hpp.

◆ end()

CSV_CONST CSVReader::iterator csv::CSVReader::end ( ) const
noexcept

A placeholder for the imaginary past-the-end row in a CSV.

Do not dereference this iterator; it does not refer to a valid row.

Definition at line 20 of file csv_reader_iterator.cpp.
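One common way to implement such a placeholder is a sentinel iterator: the end iterator holds no source at all, and a live iterator turns into it once the source is exhausted. The sketch below uses hypothetical names (LineSource treats each line as one row) and implements only what a range-based for loop needs; CSVReader::iterator's internals may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <sstream>
#include <string>

// Hypothetical row source with a read_row()-style API (one "row" per line).
class LineSource {
public:
    explicit LineSource(std::istream& in) : in_(in) {}
    bool read_row(std::string& row) { return static_cast<bool>(std::getline(in_, row)); }

    // Minimal single-pass iterator: a null source pointer marks past-the-end.
    class iterator {
    public:
        using iterator_category = std::input_iterator_tag;
        using value_type = std::string;
        using difference_type = std::ptrdiff_t;
        using pointer = const std::string*;
        using reference = const std::string&;

        iterator() = default;  // the past-the-end sentinel
        explicit iterator(LineSource* src) : src_(src) { ++*this; }

        reference operator*() const { return row_; }  // meaningless on end()
        pointer operator->() const { return &row_; }

        iterator& operator++() {
            if (src_ && !src_->read_row(row_)) src_ = nullptr;  // exhausted: become end()
            return *this;
        }

        bool operator==(const iterator& other) const { return src_ == other.src_; }
        bool operator!=(const iterator& other) const { return !(*this == other); }

    private:
        LineSource* src_ = nullptr;
        std::string row_;
    };

    iterator begin() { return iterator(this); }
    iterator end() const noexcept { return iterator(); }

private:
    std::istream& in_;
};
```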

◆ eof()

bool csv::CSVReader::eof ( ) const
inline noexcept

Returns true if we have reached end of file.

Definition at line 228 of file csv_reader.hpp.

◆ get_col_names()

std::vector< std::string > csv::CSVReader::get_col_names ( ) const

Return the CSV's column names as a vector of strings.

Definition at line 207 of file csv_reader.cpp.

◆ get_format()

CSVFormat csv::CSVReader::get_format ( ) const

Return the format of the original raw CSV.

Definition at line 194 of file csv_reader.cpp.

◆ index_of()

int csv::CSVReader::index_of ( csv::string_view  col_name) const

Return the index of the column name if found or csv::CSV_NOT_FOUND otherwise.

Definition at line 218 of file csv_reader.cpp.
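Because index_of() reports a missing column with a sentinel rather than throwing, callers should compare the result against csv::CSV_NOT_FOUND before using it as an index. A hypothetical sketch of that contract over a plain vector of names; the sentinel value -1 is an assumption made for illustration, and the linear scan is chosen for brevity, not because the library uses one.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Assumption: the "not found" sentinel is -1 here, for illustration only.
constexpr int kNotFound = -1;

// Hypothetical lookup mirroring the documented contract of index_of():
// return the position of col_name, or the sentinel if it is absent.
int index_of(const std::vector<std::string>& col_names, std::string_view col_name) {
    for (std::size_t i = 0; i < col_names.size(); ++i) {
        if (col_names[i] == col_name) {
            return static_cast<int>(i);
        }
    }
    return kNotFound;
}
```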

◆ n_rows()

CONSTEXPR size_t csv::CSVReader::n_rows ( ) const
inline noexcept

Retrieves the number of rows that have been read so far.

Definition at line 249 of file csv_reader.hpp.

◆ read_row()

bool csv::CSVReader::read_row ( CSVRow & row)

Retrieve rows as CSVRow objects, returning true if more rows are available.

Performance Notes
Parameters
[out] row  The variable where the parsed row will be stored
See also
CSVRow, CSVField

Example:

Definition at line 310 of file csv_reader.cpp.
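read_row() is typically driven by a while loop that stops when it returns false. The hypothetical MiniReader below mirrors that contract with a naive comma split (no quote handling, and std::getline drops a trailing empty field), along with a running count in the spirit of n_rows(). It is a sketch, not the library's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical minimal reader exposing a read_row()-style interface.
// The first line is treated as the header.
class MiniReader {
public:
    explicit MiniReader(std::istream& in) : in_(in) {
        std::string header;
        if (std::getline(in_, header)) col_names_ = split(header);
    }

    // Fills `row` and returns true if a row was read; false at end of input.
    bool read_row(std::vector<std::string>& row) {
        std::string line;
        if (!std::getline(in_, line)) return false;
        row = split(line);
        ++n_rows_;  // rows read so far, header excluded
        return true;
    }

    std::size_t n_rows() const noexcept { return n_rows_; }
    const std::vector<std::string>& col_names() const { return col_names_; }

private:
    static std::vector<std::string> split(const std::string& line) {
        std::vector<std::string> out;
        std::string field;
        std::istringstream ss(line);
        while (std::getline(ss, field, ',')) out.push_back(field);
        return out;
    }

    std::istream& in_;
    std::vector<std::string> col_names_;
    std::size_t n_rows_ = 0;
};
```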

◆ utf8_bom()

bool csv::CSVReader::utf8_bom ( ) const
inline noexcept

Whether or not the CSV was prefixed with a UTF-8 BOM.

Definition at line 252 of file csv_reader.hpp.
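A UTF-8 byte order mark is the three-byte sequence EF BB BF at the very start of the data. If it is not detected and skipped, it would typically end up glued to the first column name. A minimal detection sketch (has_utf8_bom and strip_utf8_bom are hypothetical helpers, not library functions):

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Returns true if `data` starts with the UTF-8 byte order mark EF BB BF.
bool has_utf8_bom(std::string_view data) {
    return data.size() >= 3 &&
           static_cast<unsigned char>(data[0]) == 0xEF &&
           static_cast<unsigned char>(data[1]) == 0xBB &&
           static_cast<unsigned char>(data[2]) == 0xBF;
}

// Strips the BOM (if present) so the first column name is not polluted by it.
std::string_view strip_utf8_bom(std::string_view data) {
    return has_utf8_bom(data) ? data.substr(3) : data;
}
```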


The documentation for this class was generated from the following files: