Vince's CSV Parser
csv Namespace Reference

The all-encompassing namespace.

Classes

class  CSVField
 Data type representing individual CSV values.
 
struct  CSVFileInfo
 Returned by get_file_info()
 
class  CSVFormat
 Stores information about how to parse a CSV file.
 
struct  CSVGuessResult
 Stores the inferred format of a CSV file.
 
class  CSVReader
 Main class for parsing CSVs from files and in-memory sources.
 
class  CSVRow
 Data structure for representing CSV rows.
 
class  CSVStat
 Class for calculating statistics from CSV files and in-memory sources.
 
class  DataFrame
 
class  DataFrameOptions
 Allows configuration of DataFrame behavior.
 
class  DataFrameRow
 Proxy class that wraps a CSVRow and intercepts field access to check for edits.
 
class  DelimWriter
 Class for writing delimiter separated values files.
 

Typedefs

using RowCollection = internals::ThreadSafeDeque< CSVRow >
 Standard type for storing collection of rows.
 
using string_view = nonstd::string_view
 The string_view class used by this library.
 
template<bool B, class T = void>
using enable_if_t = typename std::enable_if< B, T >::type
 
template<typename F , typename... Args>
using invoke_result_t = typename std::result_of< F(Args...)>::type
 

Enumerations

enum class  VariableColumnPolicy { THROW = -1 , IGNORE_ROW = 0 , KEEP = 1 , KEEP_NON_EMPTY = 2 }
 Determines how to handle rows that are shorter or longer than the majority.
 
enum class  ColumnNamePolicy { EXACT = 0 , CASE_INSENSITIVE = 1 }
 Determines how column name lookups are performed.
 
enum class  DataType {
  UNKNOWN = -1 , CSV_NULL , CSV_STRING , CSV_INT8 ,
  CSV_INT16 , CSV_INT32 , CSV_INT64 , CSV_BIGINT ,
  CSV_DOUBLE
}
 Enumerates the different CSV field types that are recognized by this library.
 

Functions

CSVGuessResult guess_format (csv::string_view filename, const std::vector< char > &delims={ ',', '|', '\t', ';', '^', '~' })
 Guess the delimiter, header row, and mode column count of a CSV file.
 
 CSVRow::operator std::vector< std::string > () const
 
 CSV_NON_NULL (2) CSVRow
 
template<>
std::string CSVField::get< std::string > ()
 Retrieve this field's original string.
 
template<>
CONSTEXPR_14 csv::string_view CSVField::get< csv::string_view > ()
 Retrieve a view over this field's string.
 
template<>
bool CSVField::try_get< std::string > (std::string &out) noexcept
 Non-throwing retrieval of field as std::string.
 
template<>
CONSTEXPR_14 bool CSVField::try_get< csv::string_view > (csv::string_view &out) noexcept
 Non-throwing retrieval of field as csv::string_view.
 
template<class OutputStream , char Delim, char Quote, bool Flush>
DelimWriter< OutputStream, Delim, Quote, Flush > & operator<< (DelimWriter< OutputStream, Delim, Quote, Flush > &writer, const CSVRow &row)
 
template<class OutputStream , char Delim, char Quote, bool Flush, typename KeyType >
DelimWriter< OutputStream, Delim, Quote, Flush > & operator<< (DelimWriter< OutputStream, Delim, Quote, Flush > &writer, const DataFrameRow< KeyType > &row)
 Overload for writing a DataFrameRow (respects sparse overlay edits).
 
Utility Functions
std::unordered_map< std::string, DataType > csv_data_types (const std::string &filename)
 Useful for uploading CSV files to SQL databases.
 
CSVFileInfo get_file_info (const std::string &filename)
 Get basic information about a CSV file.
 
std::vector< std::string > get_col_names (csv::string_view filename, const CSVFormat &format=CSVFormat::guess_csv())
 Get the column names of a CSV file using just the first 500KB.
 
long long get_col_pos (csv::string_view filename, csv::string_view col_name, const CSVFormat &format=CSVFormat::guess_csv())
 Return the position of a column in a CSV file, or CSV_NOT_FOUND if the column is absent.
 
Shorthand Parsing Functions

Convenience functions for parsing small strings

CSVReader parse (csv::string_view in, const CSVFormat &format=CSVFormat::guess_csv())
 Parse CSV from a string view, copying the input into an owned buffer.
 
CSVReader parse_unsafe (csv::string_view in, CSVFormat format=CSVFormat::guess_csv())
 Parse CSV from an in-memory view with zero copy.
 
CSVReader parse_no_header (csv::string_view in)
 Parses a CSV string with no headers.
 
CSVReader operator""_csv (const char *in, size_t n)
 Parse an RFC 4180 CSV string.
 
CSVReader operator""_csv_no_header (const char *in, size_t n)
 A shorthand for csv::parse_no_header().
 

Variables

constexpr int CSV_NOT_FOUND = -1
 Integer indicating a requested column wasn't found.
 
constexpr unsigned CHAR_OFFSET = std::numeric_limits<char>::is_signed ? 128 : 0
 Offset to convert char into array index.
 

CSV Writing

template<class OutputStream , bool Flush = true>
using CSVWriter = DelimWriter< OutputStream, ',', '"', Flush>
 An alias for csv::DelimWriter for writing standard CSV files.
 
template<class OutputStream , bool Flush = true>
using TSVWriter = DelimWriter< OutputStream, '\t', '"', Flush>
 Class for writing tab-separated values files.
 
template<class OutputStream >
CSVWriter< OutputStream > make_csv_writer (OutputStream &out, bool quote_minimal=true)
 Return a csv::CSVWriter over the output stream.
 
template<class OutputStream >
CSVWriter< OutputStream, false > make_csv_writer_buffered (OutputStream &out, bool quote_minimal=true)
 Return a buffered csv::CSVWriter over the output stream (does not auto flush)
 
template<class OutputStream >
TSVWriter< OutputStream > make_tsv_writer (OutputStream &out, bool quote_minimal=true)
 Return a csv::TSVWriter over the output stream.
 
template<class OutputStream >
TSVWriter< OutputStream, false > make_tsv_writer_buffered (OutputStream &out, bool quote_minimal=true)
 Return a buffered csv::TSVWriter over the output stream (does not auto flush)
 

Detailed Description

The all-encompassing namespace.

Typedef Documentation

◆ CSVWriter

template<class OutputStream , bool Flush = true>
using csv::CSVWriter = typedef DelimWriter<OutputStream, ',', '"', Flush>

An alias for csv::DelimWriter for writing standard CSV files.

See also
csv::DelimWriter::operator<<()
Note
Use csv::make_csv_writer() to instantiate this class over an actual output stream.

Definition at line 509 of file csv_writer.hpp.

◆ enable_if_t

template<bool B, class T = void>
using csv::enable_if_t = typedef typename std::enable_if<B, T>::type

Definition at line 158 of file common.hpp.

◆ invoke_result_t

template<typename F , typename... Args>
using csv::invoke_result_t = typedef typename std::result_of<F(Args...)>::type

Definition at line 169 of file common.hpp.

◆ RowCollection

Standard type for storing collection of rows.

Definition at line 168 of file basic_csv_parser.hpp.

◆ string_view

The string_view class used by this library.

Definition at line 135 of file common.hpp.

◆ TSVWriter

template<class OutputStream , bool Flush = true>
using csv::TSVWriter = typedef DelimWriter<OutputStream, '\t', '"', Flush>

Class for writing tab-separated values files.

See also
csv::DelimWriter::write_row()
csv::DelimWriter::operator<<()
Note
Use csv::make_tsv_writer() to instantiate this class over an actual output stream.

Definition at line 520 of file csv_writer.hpp.

Enumeration Type Documentation

◆ ColumnNamePolicy

enum class csv::ColumnNamePolicy
strong

Determines how column name lookups are performed.

Enumerator
EXACT 

Case-sensitive match (default)

CASE_INSENSITIVE 

Case-insensitive match.

Definition at line 29 of file csv_format.hpp.
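
To illustrate the difference between the two policies, here is a minimal standalone sketch of a column lookup. It does not use the library: find_column, to_lower_ascii and kNotFound are hypothetical names, and ASCII-only case folding is an assumption. How the library itself normalizes names is not stated on this page.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sentinel, mirroring the idea of csv::CSV_NOT_FOUND.
constexpr int kNotFound = -1;

// Lower-case an ASCII string (assumption: no locale-aware folding).
inline std::string to_lower_ascii(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Return the index of `wanted` in `names`, or kNotFound.
// case_insensitive == false corresponds to an EXACT-style lookup,
// case_insensitive == true to a CASE_INSENSITIVE-style lookup.
inline int find_column(const std::vector<std::string>& names,
                       const std::string& wanted,
                       bool case_insensitive) {
    const std::string key = case_insensitive ? to_lower_ascii(wanted) : wanted;
    for (std::size_t i = 0; i < names.size(); ++i) {
        const std::string candidate =
            case_insensitive ? to_lower_ascii(names[i]) : names[i];
        if (candidate == key) return static_cast<int>(i);
    }
    return kNotFound;
}
```

With the column names {"Name", "Age"}, looking up "age" fails under the exact rule and succeeds under the case-insensitive one.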

◆ DataType

enum class csv::DataType
strong

Enumerates the different CSV field types that are recognized by this library.

Note
Overflowing integers will be stored and classified as doubles.
Unlike previous releases, integer enums here are platform agnostic.
Enumerator
CSV_NULL 

Empty string.

CSV_STRING 

Non-numeric string.

CSV_INT8 

8-bit integer

CSV_INT16 

16-bit integer (short on MSVC/GCC)

CSV_INT32 

32-bit integer (int on MSVC/GCC)

CSV_INT64 

64-bit integer (long long on MSVC/GCC)

CSV_BIGINT 

Value too big to fit in a 64-bit int.

CSV_DOUBLE 

Floating point value.

Definition at line 20 of file data_type.hpp.
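
The integer enumerators suggest that a value is classified by the narrowest fixed-width type that can hold it. The sketch below shows that idea in isolation. IntWidth, fits and smallest_int_width are hypothetical names, and the code is an assumption about the approach, not the library's implementation.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical stand-in for the CSV_INT8 .. CSV_INT64 enumerators.
enum class IntWidth { Int8, Int16, Int32, Int64 };

// True if v lies within the range of the fixed-width type T.
template <typename T>
constexpr bool fits(long long v) {
    return v >= static_cast<long long>(std::numeric_limits<T>::min()) &&
           v <= static_cast<long long>(std::numeric_limits<T>::max());
}

// Pick the narrowest fixed-width type that can represent v.
constexpr IntWidth smallest_int_width(long long v) {
    return fits<std::int8_t>(v)    ? IntWidth::Int8
           : fits<std::int16_t>(v) ? IntWidth::Int16
           : fits<std::int32_t>(v) ? IntWidth::Int32
                                   : IntWidth::Int64;
}
```

Because the bounds come from the fixed-width types in <cstdint>, the classification does not depend on the platform's sizes for short, int or long.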

◆ VariableColumnPolicy

enum class csv::VariableColumnPolicy
strong

Determines how to handle rows that are shorter or longer than the majority.

Definition at line 21 of file csv_format.hpp.

Function Documentation

◆ csv_data_types()

std::unordered_map< std::string, DataType > csv::csv_data_types ( const std::string &  filename)

Useful for uploading CSV files to SQL databases.

Return a data type for each column such that every value in a column can be converted to the corresponding data type without data loss.

Definition at line 228 of file csv_stat.cpp.

◆ CSV_NON_NULL()

csv::CSV_NON_NULL ( )

Definition at line 159 of file csv_row.cpp.

◆ CSVField::get< csv::string_view >()

template<>
CONSTEXPR_14 csv::string_view csv::CSVField::get< csv::string_view > ( )

Retrieve a view over this field's string.

Warning
This string_view is only guaranteed to be valid as long as this CSVRow is still alive.

Definition at line 465 of file csv_row.hpp.

◆ CSVField::get< std::string >()

template<>
std::string csv::CSVField::get< std::string > ( )
inline

Retrieve this field's original string.

Definition at line 455 of file csv_row.hpp.

◆ CSVField::try_get< csv::string_view >()

template<>
CONSTEXPR_14 bool csv::CSVField::try_get< csv::string_view > ( csv::string_view &  out)
noexcept

Non-throwing retrieval of field as csv::string_view.

Definition at line 487 of file csv_row.hpp.

◆ CSVField::try_get< std::string >()

template<>
bool csv::CSVField::try_get< std::string > ( std::string &  out)
inline noexcept

Non-throwing retrieval of field as std::string.

Definition at line 480 of file csv_row.hpp.

◆ CSVRow::operator std::vector< std::string >()

csv::CSVRow::operator std::vector< std::string > ( ) const

Definition at line 58 of file csv_row.cpp.

◆ get_col_names()

std::vector< std::string > csv::get_col_names ( csv::string_view  filename,
const CSVFormat &  format = CSVFormat::guess_csv() 
)
inline

Get the column names of a CSV file using just the first 500KB.

Definition at line 109 of file csv_utility.hpp.

◆ get_col_pos()

long long csv::get_col_pos ( csv::string_view  filename,
csv::string_view  col_name,
const CSVFormat &  format = CSVFormat::guess_csv() 
)
inline

Return the position of a column in a CSV file, or CSV_NOT_FOUND if the column is absent.

Definition at line 117 of file csv_utility.hpp.

◆ get_file_info()

CSVFileInfo csv::get_file_info ( const std::string &  filename)
inline

Get basic information about a CSV file.

#include "csv.hpp"
#include <iostream>

int main(int argc, char** argv) {
    using namespace csv;

    // Require a file path on the command line
    if (argc < 2) {
        std::cout << "Usage: " << argv[0] << " [file]" << std::endl;
        return 1;
    }

    std::string file = argv[1];
    auto info = get_file_info(file);

    // Print the column names as a comma-separated list
    std::cout << file << std::endl << "Columns: ";
    for (size_t i = 0; i < info.col_names.size(); i++) {
        if (i) std::cout << ", ";
        std::cout << info.col_names[i];
    }

    std::cout << std::endl
        << "Dimensions: " << info.n_rows << " rows x " << info.n_cols << " columns" << std::endl
        << "Delimiter: " << info.delim << std::endl;
    return 0;
}

Definition at line 94 of file csv_utility.hpp.

◆ guess_format()

CSVGuessResult csv::guess_format ( csv::string_view  filename,
const std::vector< char > &  delims = { ',', '|', '\t', ';', '^', '~' } 
)
inline

Guess the delimiter, header row, and mode column count of a CSV file.

**Heuristic:** For each candidate delimiter, calculate a score based on
the most common row length (mode). The delimiter with the highest score wins.

**Header Detection:**
- If the first row has at least as many columns as the mode, it is treated as the header
- Otherwise, the first row with the mode length is treated as the header

This approach handles:
- Headers with trailing delimiters or optional columns (wider than data rows)
- Comment lines before the actual header (first row shorter than mode)
- Standard CSVs where first row is the header
Note
Score = (row_length × count_of_rows_with_that_length)
Also returns inferred mode-width column count (CSVGuessResult::n_cols)

Definition at line 161 of file basic_csv_parser.hpp.
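
The scoring and header rules above can be sketched in standalone C++. This is an illustration under stated assumptions, not the library's code: GuessSketch, count_fields and guess_sketch are hypothetical names, the input is already split into lines, and fields are counted by naive splitting, so delimiters inside quoted fields are not handled.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical result type, loosely modeled on the fields described above.
struct GuessSketch {
    char delim;
    std::size_t header_row;
    std::size_t n_cols;
};

// Count fields by naive splitting (assumption: no quoted delimiters).
inline std::size_t count_fields(const std::string& line, char delim) {
    return static_cast<std::size_t>(std::count(line.begin(), line.end(), delim)) + 1;
}

inline GuessSketch guess_sketch(const std::vector<std::string>& lines,
                                const std::vector<char>& delims) {
    GuessSketch best{delims.empty() ? ',' : delims.front(), 0, 0};
    std::size_t best_score = 0;

    for (char d : delims) {
        // Histogram: row length -> number of rows with that length
        std::map<std::size_t, std::size_t> freq;
        for (const auto& line : lines) ++freq[count_fields(line, d)];

        // Mode: the most common row length for this delimiter
        std::size_t mode_len = 0, mode_count = 0;
        for (const auto& kv : freq) {
            if (kv.second > mode_count) {
                mode_len = kv.first;
                mode_count = kv.second;
            }
        }

        // Score = row_length x count_of_rows_with_that_length
        const std::size_t score = mode_len * mode_count;
        if (score <= best_score) continue;

        // Header: row 0 if it is at least as wide as the mode,
        // otherwise the first row whose length equals the mode.
        std::size_t header = 0;
        if (!lines.empty() && count_fields(lines[0], d) < mode_len) {
            for (std::size_t i = 0; i < lines.size(); ++i) {
                if (count_fields(lines[i], d) == mode_len) {
                    header = i;
                    break;
                }
            }
        }
        best_score = score;
        best = {d, header, mode_len};
    }
    return best;
}
```

For the lines "# comment", "a|b|c", "1|2|3", "4|5|6" and the candidates ',' and '|', the pipe scores 3 × 3 = 9 against 1 × 4 = 4 for the comma, and the comment line is skipped because it is narrower than the mode.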

◆ make_csv_writer()

template<class OutputStream >
CSVWriter< OutputStream > csv::make_csv_writer ( OutputStream &  out,
bool  quote_minimal = true 
)
inline

Return a csv::CSVWriter over the output stream.

Definition at line 524 of file csv_writer.hpp.
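
The quote_minimal parameter is not described further on this page. A common reading, assumed here, is that a field is quoted only when it contains the delimiter, the quote character, or a line break, with embedded quotes doubled as in RFC 4180. The standalone sketch below shows that rule; quote_field is a hypothetical helper and not part of the library.

```cpp
#include <string>

// Quote a single field following RFC 4180 conventions.
// With quote_minimal == true the field is quoted only when needed;
// with quote_minimal == false it is always quoted.
inline std::string quote_field(const std::string& field,
                               char delim = ',',
                               char quote = '"',
                               bool quote_minimal = true) {
    bool needs_quotes = !quote_minimal;
    for (char c : field) {
        if (c == delim || c == quote || c == '\r' || c == '\n') {
            needs_quotes = true;
            break;
        }
    }
    if (!needs_quotes) return field;

    std::string out(1, quote);
    for (char c : field) {
        if (c == quote) out += quote;  // double each embedded quote
        out += c;
    }
    out += quote;
    return out;
}
```

Under this rule the field 234,345 is written as "234,345", while a plain field such as 456 is written unchanged unless quote_minimal is false.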

◆ make_csv_writer_buffered()

template<class OutputStream >
CSVWriter< OutputStream, false > csv::make_csv_writer_buffered ( OutputStream &  out,
bool  quote_minimal = true 
)
inline

Return a buffered csv::CSVWriter over the output stream (does not auto flush)

Definition at line 530 of file csv_writer.hpp.
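
The buffered factories differ from the regular ones only in the Flush template argument. The standalone sketch below shows how a compile-time boolean can switch per-row flushing on and off. RowWriterSketch and CountingStream are hypothetical types written for this illustration, and fields are written without quoting to keep the example short.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical output stream that records how often it is flushed.
struct CountingStream {
    std::string data;
    int flushes = 0;
    CountingStream& operator<<(const std::string& s) {
        data += s;
        return *this;
    }
    void flush() { ++flushes; }
};

// Minimal writer: Flush == true flushes after every row,
// Flush == false leaves flushing to the caller.
template <class OutputStream, bool Flush = true>
class RowWriterSketch {
public:
    explicit RowWriterSketch(OutputStream& out) : out_(out) {}

    void write_row(const std::vector<std::string>& fields) {
        for (std::size_t i = 0; i < fields.size(); ++i) {
            if (i) out_ << std::string(",");
            out_ << fields[i];
        }
        out_ << std::string("\n");
        if (Flush) out_.flush();
    }

    // Explicit flush for the buffered variant.
    void flush() { out_.flush(); }

private:
    OutputStream& out_;
};
```

Both variants produce the same bytes. The buffered one trades immediate visibility of each row for fewer flush calls, which usually matters when writing many rows to a file.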

◆ make_tsv_writer()

template<class OutputStream >
TSVWriter< OutputStream > csv::make_tsv_writer ( OutputStream &  out,
bool  quote_minimal = true 
)
inline

Return a csv::TSVWriter over the output stream.

Definition at line 536 of file csv_writer.hpp.

◆ make_tsv_writer_buffered()

template<class OutputStream >
TSVWriter< OutputStream, false > csv::make_tsv_writer_buffered ( OutputStream &  out,
bool  quote_minimal = true 
)
inline

Return a buffered csv::TSVWriter over the output stream (does not auto flush)

Definition at line 542 of file csv_writer.hpp.

◆ operator""_csv()

CSVReader csv::operator""_csv ( const char *  in,
size_t  n 
)
inline

Parse an RFC 4180 CSV string.

String literals have static storage duration, so the zero-copy path is safe here.

Example
TEST_CASE( "Test Escaped Comma", "[read_csv_comma]" ) {
    auto rows = "A,B,C\r\n" // Header row
                "123,\"234,345\",456\r\n"
                "1,2,3\r\n"
                "1,2,3"_csv;

    CSVRow row;
    rows.read_row(row);
    REQUIRE( vector<string>(row) ==
        vector<string>({"123", "234,345", "456"}));
}

Definition at line 71 of file csv_utility.hpp.
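
The note about static storage duration can be illustrated with a standalone user-defined literal that returns a non-owning view. The suffix _csv_view and the namespace literal_sketch are hypothetical and exist only for this sketch, which assumes C++17 for std::string_view.

```cpp
#include <cstddef>
#include <string_view>

namespace literal_sketch {

// Returns a view over the literal's characters without copying.
// This is safe because a string literal lives for the whole program,
// so the view can never outlive its backing storage.
constexpr std::string_view operator""_csv_view(const char* in, std::size_t n) {
    return std::string_view(in, n);
}

}  // namespace literal_sketch
```

Adjacent string literals are concatenated before the suffix is applied, which is why a multi-line literal followed by a single suffix, as in the example above, yields one input string.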

◆ operator""_csv_no_header()

CSVReader csv::operator""_csv_no_header ( const char *  in,
size_t  n 
)
inline

A shorthand for csv::parse_no_header().

String literals have static storage duration, so the zero-copy path is safe here.

Definition at line 80 of file csv_utility.hpp.

◆ operator<<() [1/2]

template<class OutputStream , char Delim, char Quote, bool Flush>
DelimWriter< OutputStream, Delim, Quote, Flush > & csv::operator<< ( DelimWriter< OutputStream, Delim, Quote, Flush > &  writer,
const CSVRow &  row 
)

Definition at line 1 of file csv_writer_extensions.hpp.

◆ operator<<() [2/2]

template<class OutputStream , char Delim, char Quote, bool Flush, typename KeyType >
DelimWriter< OutputStream, Delim, Quote, Flush > & csv::operator<< ( DelimWriter< OutputStream, Delim, Quote, Flush > &  writer,
const DataFrameRow< KeyType > &  row 
)

Overload for writing a DataFrameRow (respects sparse overlay edits).

Definition at line 1 of file csv_writer_extensions.hpp.

◆ parse()

CSVReader csv::parse ( csv::string_view  in,
const CSVFormat &  format = CSVFormat::guess_csv() 
)
inline

Parse CSV from a string view, copying the input into an owned buffer.

Safe for any string_view regardless of the caller's ownership of the underlying memory.

Example
TEST_CASE( "Test Escaped Quote", "[read_csv_quote]" ) {
    // Per RFC 4180, escaped quotes should be doubled up
    auto csv_string = GENERATE(as<std::string> {},
        (
            "A,B,C\r\n" // Header row
            "123,\"234\"\"345\",456\r\n"
            "123,\"234\"345\",456\r\n" // Unescaped single quote (not strictly valid)
            "123,\"234\"345\",\"456\"" // Quoted field at the end
            "123, \"234\"345\",\"456\"" // Quoted field w/ leading whitespace
        ),
        (
            "\"A\",\"B\",\"C\"\r\n" // Header row
            "123,\"234\"\"345\",456\r\n"
            "123,\"234\"345\",456\r\n" // Unescaped single quote (not strictly valid)
            "123,\"234\"345\",\"456\"" // Quoted field at the end
            "123,\"234\"345\",\"456\"" // Quoted field w/ leading whitespace
        )
    );

    SECTION("Escaped Quote") {
        auto rows = parse(csv_string);
        REQUIRE(rows.get_col_names() == vector<string>({ "A", "B", "C" }));

        // Expected Results: Double " is an escape for a single "
        vector<string> correct_row = { "123", "234\"345", "456" };
        for (auto& row : rows) {
            REQUIRE(vector<string>(row) == correct_row);
        }
    }
}

Definition at line 37 of file csv_utility.hpp.

◆ parse_no_header()

CSVReader csv::parse_no_header ( csv::string_view  in)
inline

Parses a CSV string with no headers.

Definition at line 57 of file csv_utility.hpp.

◆ parse_unsafe()

CSVReader csv::parse_unsafe ( csv::string_view  in,
CSVFormat  format = CSVFormat::guess_csv() 
)
inline

Parse CSV from an in-memory view with zero copy.

WARNING: Non-owning path. The caller must ensure in's backing memory remains valid and immutable while the reader may request additional rows from the source stream.

Rows already obtained from the reader remain valid, but unread rows still depend on the source view staying alive.

Definition at line 51 of file csv_utility.hpp.
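
The difference between the copying and the zero-copy paths comes down to who owns the bytes. The standalone sketch below contrasts the two. OwningSource and ViewSource are hypothetical types, not library classes, and std::string_view (C++17) stands in for csv::string_view.

```cpp
#include <string>
#include <string_view>

// Copies the input, so later changes to the caller's buffer are invisible.
struct OwningSource {
    std::string buffer;
    explicit OwningSource(std::string_view in) : buffer(in) {}
    std::string_view data() const { return buffer; }
};

// Keeps only a view: the caller's buffer must stay alive and unchanged
// for as long as data() may still be read.
struct ViewSource {
    std::string_view view;
    explicit ViewSource(std::string_view in) : view(in) {}
    std::string_view data() const { return view; }
};
```

If the caller overwrites the first byte of the backing string after construction, OwningSource still reports the original byte while ViewSource reports the new one. Destroying or reallocating the backing string would leave the view dangling, which is the hazard the warning above describes.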

Variable Documentation

◆ CHAR_OFFSET

constexpr unsigned csv::CHAR_OFFSET = std::numeric_limits<char>::is_signed ? 128 : 0
constexpr

Offset to convert char into array index.

Definition at line 299 of file common.hpp.
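
An offset like this is typically needed when a char is used to index a 256-entry lookup table, because a signed char can be negative. The standalone sketch below shows the pattern. kCharOffsetSketch, char_to_index and make_lookup are hypothetical names, and the page does not say how the library uses the constant internally.

```cpp
#include <array>
#include <cstddef>
#include <limits>
#include <string>

// Same expression as csv::CHAR_OFFSET, redefined so the sketch is standalone.
constexpr unsigned kCharOffsetSketch = std::numeric_limits<char>::is_signed ? 128 : 0;

// Map any char value to an index in [0, 255].
// Signed char: -128..127 becomes 0..255. Unsigned char: 0..255 stays as is.
inline std::size_t char_to_index(char c) {
    return static_cast<std::size_t>(static_cast<int>(c) +
                                    static_cast<int>(kCharOffsetSketch));
}

// Build a table that flags every character listed in `special`.
inline std::array<bool, 256> make_lookup(const std::string& special) {
    std::array<bool, 256> table{};
    for (char c : special) table[char_to_index(c)] = true;
    return table;
}
```

A table built with make_lookup(",\"\n") answers whether a character needs special handling with a single array read.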

◆ CSV_NOT_FOUND

constexpr int csv::CSV_NOT_FOUND = -1
constexpr

Integer indicating a requested column wasn't found.

Definition at line 296 of file common.hpp.