Vince's CSV Parser
basic_csv_parser.hpp File Reference

Contains the main CSV parsing algorithm and various utility functions.

#include <algorithm>
#include <array>
#include <fstream>
#include <memory>
#include <unordered_map>
#include <unordered_set>
#include <vector>
#include "../external/mio.hpp"
#include "basic_csv_parser_simd.hpp"
#include "col_names.hpp"
#include "common.hpp"
#include "csv_format.hpp"
#include "csv_row.hpp"
#include "row_deque.hpp"


Classes

struct  csv::internals::GuessScore
 
struct  csv::internals::ResolvedFormat
 
class  csv::internals::IBasicCSVParser
 Abstract base class which provides CSV parsing logic.
 
class  csv::internals::StreamParser< TStream >
 A class for parsing CSV data from any std::istream, including non-seekable sources such as pipes and decompression filters.
 
class  csv::internals::MmapParser
 Parser for memory-mapped files.
 

Namespaces

namespace  csv
 The all encompassing namespace.
 

Typedefs

using csv::RowCollection = internals::ThreadSafeDeque< CSVRow >
 Standard type for storing collection of rows.
 

Functions

template<typename OutArray , typename T = typename OutArray::type>
CSV_CONST CONSTEXPR_17 OutArray csv::internals::arrayToDefault (T &&value)
 Helper constexpr function to initialize an array with all the elements set to value.
 
GuessScore csv::internals::calculate_score (csv::string_view head, const CSVFormat &format)
 
CSVGuessResult csv::internals::guess_format (csv::string_view head, const std::vector< char > &delims={ ',', '|', '\t', ';', '^', '~' })
 Guess the delimiter used by a delimiter-separated values file.
 
CSV_CONST CONSTEXPR_17 ParseFlagMap csv::internals::make_parse_flags (char delimiter)
 Create a vector v in which, for a character with ASCII number i, v[i + 128] labels that character according to the CSVReader::ParseFlags enum.
 
CSV_CONST CONSTEXPR_17 ParseFlagMap csv::internals::make_parse_flags (char delimiter, char quote_char)
 Create a vector v in which, for a character with ASCII number i, v[i + 128] labels that character according to the CSVReader::ParseFlags enum.
 
char csv::internals::infer_char_for_flag (const ParseFlagMap &parse_flags, ParseFlags target, char fallback) noexcept
 
char csv::internals::infer_delimiter (const ParseFlagMap &parse_flags) noexcept
 
char csv::internals::infer_quote_char (const ParseFlagMap &parse_flags, char fallback='"') noexcept
 
CSV_CONST CONSTEXPR_17 WhitespaceMap csv::internals::make_ws_flags (const char *ws_chars, size_t n_chars)
 Create a vector v in which, for a character c with ASCII number i, v[i + 128] is true if c is a whitespace character.
 
WhitespaceMap csv::internals::make_ws_flags (const std::vector< char > &flags)
 
template<typename TStream , csv::enable_if_t< std::is_base_of< std::istream, TStream >::value, int > = 0>
std::string csv::internals::get_csv_head_stream (TStream &source)
 Read the first 500KB from a non-seekable stream source.
 
std::pair< std::string, size_t > csv::internals::get_csv_head_mmap (csv::string_view filename)
 Read the first 500KB from a filename using mmap.
 
std::string csv::internals::get_csv_head (csv::string_view filename)
 Compatibility shim that reads the head through a stream on Emscripten and through mmap otherwise.
 
CSVGuessResult csv::guess_format (csv::string_view filename, const std::vector< char > &delims={ ',', '|', '\t', ';', '^', '~' })
 Guess the delimiter, header row, and mode column count of a CSV file.
 

Variables

constexpr const int csv::internals::UNINITIALIZED_FIELD = -1
 

Detailed Description

Contains the main CSV parsing algorithm and various utility functions.

Definition in file basic_csv_parser.hpp.

Function Documentation

◆ arrayToDefault()

template<typename OutArray , typename T = typename OutArray::type>
CSV_CONST CONSTEXPR_17 OutArray csv::internals::arrayToDefault ( T &&  value)

Helper constexpr function to initialize an array with all the elements set to value.
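
As a rough illustration of the idea, the sketch below fills a std::array with one value. The name filled_array and the use of OutArray::value_type are assumptions made for this example and are not the library's code; constexpr is left out so the sketch compiles under older standards.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the documented helper: returns an OutArray with
// every element set to `value`. Assumes OutArray is a std::array-like type.
template <typename OutArray, typename T = typename OutArray::value_type>
OutArray filled_array(const T& value) {
    OutArray out{};
    for (std::size_t i = 0; i < out.size(); ++i) {
        out[i] = value;
    }
    return out;
}

// Usage: a lookup table whose slots all start at the same default.
inline void example_filled_array() {
    const auto table = filled_array<std::array<int, 8>>(-1);
    assert(table.front() == -1 && table.back() == -1);
}
```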

Definition at line 31 of file basic_csv_parser.hpp.

◆ calculate_score()

GuessScore csv::internals::calculate_score ( csv::string_view  head,
const CSVFormat &  format 
)

Definition at line 8 of file basic_csv_parser_guessing.cpp.

◆ get_csv_head()

std::string csv::internals::get_csv_head ( csv::string_view  filename)

Compatibility shim that reads the head through a stream on Emscripten and through mmap otherwise.

Definition at line 37 of file basic_csv_parser.cpp.

◆ get_csv_head_mmap()

std::pair< std::string, size_t > csv::internals::get_csv_head_mmap ( csv::string_view  filename)

Read the first 500KB from a filename using mmap.

Also returns the total file size so callers avoid a second mmap open.

Definition at line 24 of file basic_csv_parser.cpp.

◆ get_csv_head_stream()

template<typename TStream , csv::enable_if_t< std::is_base_of< std::istream, TStream >::value, int > = 0>
std::string csv::internals::get_csv_head_stream ( TStream &  source)

Read the first 500KB from a non-seekable stream source.
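
A minimal sketch of reading a bounded head from a stream without seeking is shown below. The helper name read_head and the exact byte limit (500 * 1024) are assumptions for illustration; the library's implementation may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: read at most `limit` bytes from the front of a stream.
// It never calls seekg/tellg, so it also works on pipes and filter streams.
inline std::string read_head(std::istream& source,
                             std::size_t limit = 500 * 1024) {
    std::string head(limit, '\0');
    source.read(&head[0], static_cast<std::streamsize>(limit));
    // gcount() is the number of bytes the last unformatted read extracted;
    // it is smaller than `limit` when the stream ends early.
    head.resize(static_cast<std::size_t>(source.gcount()));
    return head;
}

// Usage: a short input is returned whole.
inline void example_read_head() {
    std::istringstream in("a,b\n1,2\n");
    assert(read_head(in) == "a,b\n1,2\n");
}
```

Because the bytes cannot be re-read from a non-seekable source, a caller of such a helper has to keep the returned head and hand it to whatever parses the rest of the stream.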

Definition at line 118 of file basic_csv_parser.hpp.

◆ guess_format()

CSVGuessResult csv::internals::guess_format ( csv::string_view  head,
const std::vector< char > &  delims = { ',', '|', '\t', ';', '^', '~' } 
)

Guess the delimiter used by a delimiter-separated values file.

For each candidate delimiter, find the most common row length (the mode) and how many rows have that length. The delimiter with the highest score (row_length * count) wins.

Header detection: if the first row has at least as many columns as the mode, use row 0. Otherwise, use the first row whose length equals the mode.
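
The scoring and header rules above can be sketched as follows. This is a simplified illustration, not the library's implementation: GuessSketch, row_lengths, and guess_sketch are hypothetical names, rows are split naively with no quote handling, and ties are broken in favor of the earlier delimiter and the smaller row length.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

struct GuessSketch {      // hypothetical result type
    char delim;
    std::size_t header_row;
    std::size_t n_cols;
};

// Column count of each non-empty line under `delim` (quotes are ignored).
inline std::vector<std::size_t> row_lengths(const std::string& head, char delim) {
    std::vector<std::size_t> lengths;
    std::size_t cols = 1;
    bool has_content = false;
    for (char c : head) {
        if (c == '\n') {
            if (has_content) lengths.push_back(cols);
            cols = 1;
            has_content = false;
        } else if (c != '\r') {
            has_content = true;
            if (c == delim) ++cols;
        }
    }
    if (has_content) lengths.push_back(cols);
    return lengths;
}

inline GuessSketch guess_sketch(
    const std::string& head,
    const std::vector<char>& delims = {',', '|', '\t', ';', '^', '~'}) {
    GuessSketch best{',', 0, 0};
    std::size_t best_score = 0;
    for (char d : delims) {
        const std::vector<std::size_t> lengths = row_lengths(head, d);
        if (lengths.empty()) continue;

        // Find the mode row length and how often it occurs.
        std::map<std::size_t, std::size_t> freq;
        for (std::size_t len : lengths) ++freq[len];
        std::size_t mode = 0, count = 0;
        for (const auto& kv : freq) {
            if (kv.second > count) { mode = kv.first; count = kv.second; }
        }

        const std::size_t score = mode * count;  // row_length * count
        if (score > best_score) {
            // Header: row 0 if it is at least as wide as the mode,
            // otherwise the first row whose length equals the mode.
            std::size_t header = 0;
            if (lengths[0] < mode) {
                while (lengths[header] != mode) ++header;
            }
            best_score = score;
            best = {d, header, mode};
        }
    }
    return best;
}

// Usage: a title line above a pipe-separated table is skipped as header.
inline void example_guess_sketch() {
    const GuessSketch g = guess_sketch("report\nx|y\n1|2\n3|4\n");
    assert(g.delim == '|' && g.header_row == 1 && g.n_cols == 2);
}
```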

Definition at line 69 of file basic_csv_parser_guessing.cpp.

◆ infer_char_for_flag()

char csv::internals::infer_char_for_flag ( const ParseFlagMap &  parse_flags,
ParseFlags  target,
char  fallback 
)
inline noexcept

Definition at line 75 of file basic_csv_parser.hpp.

◆ infer_delimiter()

char csv::internals::infer_delimiter ( const ParseFlagMap &  parse_flags)
inline noexcept

Definition at line 89 of file basic_csv_parser.hpp.

◆ infer_quote_char()

char csv::internals::infer_quote_char ( const ParseFlagMap &  parse_flags,
char  fallback = '"' 
)
inline noexcept

Definition at line 95 of file basic_csv_parser.hpp.

◆ make_parse_flags() [1/2]

CSV_CONST CONSTEXPR_17 ParseFlagMap csv::internals::make_parse_flags ( char  delimiter)

Create a vector v in which, for a character with ASCII number i, v[i + 128] labels that character according to the CSVReader::ParseFlags enum.

Definition at line 57 of file basic_csv_parser.hpp.

◆ make_parse_flags() [2/2]

CSV_CONST CONSTEXPR_17 ParseFlagMap csv::internals::make_parse_flags ( char  delimiter,
char  quote_char 
)

Create a vector v in which, for a character with ASCII number i, v[i + 128] labels that character according to the CSVReader::ParseFlags enum.
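
To illustrate the + 128 offset, here is a minimal sketch of a 256-entry lookup table indexed by a signed character value. The Flag enum, FlagMap alias, and function names are hypothetical; the real ParseFlags enum may have different members, and labeling '\r' and '\n' as newlines is an assumption of this example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical labels standing in for the library's ParseFlags enum.
enum class Flag : unsigned char { NotSpecial, Quote, Delimiter, Newline };

using FlagMap = std::array<Flag, 256>;

// Shift a char from the signed range [-128, 127] into the index range [0, 255].
inline std::size_t flag_index(char c) {
    return static_cast<std::size_t>(
        static_cast<int>(static_cast<signed char>(c)) + 128);
}

inline FlagMap make_flags(char delimiter, char quote_char) {
    FlagMap flags;
    flags.fill(Flag::NotSpecial);
    flags[flag_index(delimiter)] = Flag::Delimiter;
    flags[flag_index(quote_char)] = Flag::Quote;
    flags[flag_index('\r')] = Flag::Newline;  // assumption: both line-ending
    flags[flag_index('\n')] = Flag::Newline;  // characters count as newlines
    return flags;
}

// Usage: classify a character with a single array lookup.
inline void example_make_flags() {
    const FlagMap flags = make_flags('\t', '"');
    assert(flags[flag_index('\t')] == Flag::Delimiter);
    assert(flags[flag_index('x')] == Flag::NotSpecial);
}
```

The offset lets the parser classify each byte with one array access, including bytes that are negative when char is signed.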

Definition at line 69 of file basic_csv_parser.hpp.

◆ make_ws_flags() [1/2]

CSV_CONST CONSTEXPR_17 WhitespaceMap csv::internals::make_ws_flags ( const char *  ws_chars,
size_t  n_chars 
)

Create a vector v in which, for a character c with ASCII number i, v[i + 128] is true if c is a whitespace character.
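
A boolean table of this shape might be built and used as in the sketch below. WsMap, ws_index, make_ws, and trim are hypothetical names for this example, and using the table to trim field edges is an assumption about how such a map could be applied.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>

using WsMap = std::array<bool, 256>;  // hypothetical stand-in for WhitespaceMap

// Shift a char from the signed range [-128, 127] into the index range [0, 255].
inline std::size_t ws_index(char c) {
    return static_cast<std::size_t>(
        static_cast<int>(static_cast<signed char>(c)) + 128);
}

inline WsMap make_ws(const char* ws_chars, std::size_t n_chars) {
    WsMap flags{};  // value-initialized, so every entry starts as false
    for (std::size_t i = 0; i < n_chars; ++i) {
        flags[ws_index(ws_chars[i])] = true;
    }
    return flags;
}

// Strip leading and trailing characters that the table marks as whitespace.
inline std::string trim(const std::string& s, const WsMap& ws) {
    std::size_t begin = 0, end = s.size();
    while (begin < end && ws[ws_index(s[begin])]) ++begin;
    while (end > begin && ws[ws_index(s[end - 1])]) --end;
    return s.substr(begin, end - begin);
}

// Usage: only the characters passed to make_ws are treated as whitespace.
inline void example_make_ws() {
    const WsMap ws = make_ws(" \t", 2);
    assert(trim("  a b\t", ws) == "a b");
}
```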

Definition at line 103 of file basic_csv_parser.hpp.

◆ make_ws_flags() [2/2]

WhitespaceMap csv::internals::make_ws_flags ( const std::vector< char > &  flags)
inline

Definition at line 111 of file basic_csv_parser.hpp.

Variable Documentation

◆ UNINITIALIZED_FIELD

constexpr const int csv::internals::UNINITIALIZED_FIELD = -1
constexpr

Definition at line 26 of file basic_csv_parser.hpp.