Vince's CSV Parser
csv::internals Namespace Reference

Stuff that is generally not of interest to end-users. More...

Classes

struct  ColNames
 A data structure for handling column name information. More...
 
class  CSVFieldList
 A class used for efficiently storing RawCSVField objects and expanding as necessary. More...
 
struct  GuessScore
 
class  IBasicCSVParser
 Abstract base class which provides CSV parsing logic. More...
 
class  is_equality_comparable
 
class  is_hashable
 
class  MmapParser
 Parser for memory-mapped files. More...
 
struct  RawCSVData
 A class for storing raw CSV data and associated metadata. More...
 
struct  RawCSVField
 A barebones class used for describing CSV fields. More...
 
class  StreamParser
 A class for parsing CSV data from a std::stringstream or a std::ifstream. More...
 
class  ThreadSafeDeque
 A std::deque wrapper which allows multiple read and write threads to concurrently access it along with providing read threads the ability to wait for the deque to become populated. More...
 

Typedefs

using ColNamesPtr = std::shared_ptr< ColNames >
 
using ParseFlagMap = std::array< ParseFlags, 256 >
 An array which maps ASCII chars to a parsing flag.
 
using WhitespaceMap = std::array< bool, 256 >
 An array which maps ASCII chars to a flag indicating if it is whitespace.
 
using RawCSVDataPtr = std::shared_ptr< RawCSVData >
 

Enumerations

enum class  ParseFlags {
  QUOTE_ESCAPE_QUOTE = 0 , QUOTE = 2 | 1 , NOT_SPECIAL = 4 , DELIMITER = 4 | 2 ,
  NEWLINE = 4 | 2 | 1
}
 An enum used for describing the significance of each character with respect to CSV parsing. More...
 

Functions

size_t get_file_size (csv::string_view filename)
 
std::string get_csv_head (csv::string_view filename)
 
std::string get_csv_head (csv::string_view filename, size_t file_size)
 Read the first 500KB of a CSV file.
 
template<typename OutArray , typename T = typename OutArray::type>
CSV_CONST CONSTEXPR_17 OutArray arrayToDefault (T &&value)
 Helper constexpr function to initialize an array with all the elements set to value.
 
CSV_CONST CONSTEXPR_17 ParseFlagMap make_parse_flags (char delimiter)
 Create an array v in which, for a character with ASCII number i, v[i + 128] labels that character according to the ParseFlags enum.
 
CSV_CONST CONSTEXPR_17 ParseFlagMap make_parse_flags (char delimiter, char quote_char)
 Create an array v in which, for a character with ASCII number i, v[i + 128] labels that character according to the ParseFlags enum.
 
CSV_CONST CONSTEXPR_17 WhitespaceMap make_ws_flags (const char *ws_chars, size_t n_chars)
 Create an array v in which, for a character c with ASCII number i, v[i + 128] is true if c is a whitespace character.
 
WhitespaceMap make_ws_flags (const std::vector< char > &flags)
 
template<typename TStream , csv::enable_if_t< std::is_base_of< std::istream, TStream >::value, int > = 0>
std::string get_csv_head (TStream &source)
 
template<typename T >
bool is_equal (T a, T b, T epsilon=0.001)
 
constexpr ParseFlags quote_escape_flag (ParseFlags flag, bool quote_escape) noexcept
 Transform the ParseFlags given the context of whether or not the current field is quote escaped.
 
 STATIC_ASSERT (ParseFlags::DELIMITER< ParseFlags::NEWLINE)
 
 STATIC_ASSERT (quote_escape_flag(ParseFlags::NOT_SPECIAL, false)==ParseFlags::NOT_SPECIAL)
 Optimizations for reducing branching in parsing loop.
 
 STATIC_ASSERT (quote_escape_flag(ParseFlags::QUOTE, false)==ParseFlags::QUOTE)
 
 STATIC_ASSERT (quote_escape_flag(ParseFlags::DELIMITER, false)==ParseFlags::DELIMITER)
 
 STATIC_ASSERT (quote_escape_flag(ParseFlags::NEWLINE, false)==ParseFlags::NEWLINE)
 
 STATIC_ASSERT (quote_escape_flag(ParseFlags::NOT_SPECIAL, true)==ParseFlags::NOT_SPECIAL)
 
 STATIC_ASSERT (quote_escape_flag(ParseFlags::QUOTE, true)==ParseFlags::QUOTE_ESCAPE_QUOTE)
 
 STATIC_ASSERT (quote_escape_flag(ParseFlags::DELIMITER, true)==ParseFlags::NOT_SPECIAL)
 
 STATIC_ASSERT (quote_escape_flag(ParseFlags::NEWLINE, true)==ParseFlags::NOT_SPECIAL)
 
std::string format_row (const std::vector< std::string > &row, csv::string_view delim)
 
std::vector< std::string > _get_col_names (csv::string_view head, CSVFormat format)
 Return a CSV's column names.
 
GuessScore calculate_score (csv::string_view head, const CSVFormat &format)
 
CSVGuessResult _guess_format (csv::string_view head, const std::vector< char > &delims)
 Guess the delimiter used by a delimiter-separated values file.
 
std::string json_escape_string (csv::string_view s) noexcept
 
template<typename T = int>
T csv_abs (T x)
 Calculate the absolute value of a number.
 
template<>
int csv_abs (int x)
 
template<>
long int csv_abs (long int x)
 
template<>
long long int csv_abs (long long int x)
 
template<>
float csv_abs (float x)
 
template<>
double csv_abs (double x)
 
template<>
long double csv_abs (long double x)
 
template<typename T , csv::enable_if_t< std::is_arithmetic< T >::value, int > = 0>
int num_digits (T x)
 Calculate the number of digits in a number.
 
template<typename T , csv::enable_if_t< std::is_unsigned< T >::value, int > = 0>
std::string to_string (T value)
 to_string() for unsigned integers
 
template<typename T >
CSV_CONST CONSTEXPR_14 long double pow10 (const T &n) noexcept
 Compute 10 to the power of n.
 
template<>
CSV_CONST CONSTEXPR_14 long double pow10 (const unsigned &n) noexcept
 Compute 10 to the power of n.
 
template<size_t Bytes>
CONSTEXPR_14 long double get_int_max ()
 Given a byte size, return the largest number that can be stored in an integer of that size.
 
template<size_t Bytes>
CONSTEXPR_14 long double get_uint_max ()
 Given a byte size, return the largest number that can be stored in an unsigned integer of that size.
 
CSV_PRIVATE CONSTEXPR_14 DataType _process_potential_exponential (csv::string_view exponential_part, const long double &coeff, long double *const out)
 Given the start of what is possibly the exponential part of a number written in scientific notation, parse the exponent.
 
CSV_PRIVATE CSV_PURE CONSTEXPR_14 DataType _determine_integral_type (const long double &number) noexcept
 Given the absolute value of an integer, determine what numeric type it fits in.
 
CONSTEXPR_14 DataType data_type (csv::string_view in, long double *const out, const char decimalSymbol)
 Distinguishes numeric from other text values.
 
template<typename T >
bool try_parse_hex (csv::string_view sv, T &parsedValue)
 

Variables

constexpr const int UNINITIALIZED_FIELD = -1
 
const int PAGE_SIZE = 4096
 Size of a memory page in bytes.
 
constexpr size_t ITERATION_CHUNK_SIZE = 10000000
 Chunk size for lazy-loading large CSV files.
 
CONSTEXPR_VALUE_14 long double CSV_INT8_MAX = get_int_max<1>()
 Largest number that can be stored in an 8-bit integer.
 
CONSTEXPR_VALUE_14 long double CSV_INT16_MAX = get_int_max<2>()
 Largest number that can be stored in a 16-bit integer.
 
CONSTEXPR_VALUE_14 long double CSV_INT32_MAX = get_int_max<4>()
 Largest number that can be stored in a 32-bit integer.
 
CONSTEXPR_VALUE_14 long double CSV_INT64_MAX = get_int_max<8>()
 Largest number that can be stored in a 64-bit integer.
 
CONSTEXPR_VALUE_14 long double CSV_UINT8_MAX = get_uint_max<1>()
 Largest number that can be stored in an 8-bit unsigned integer.
 
CONSTEXPR_VALUE_14 long double CSV_UINT16_MAX = get_uint_max<2>()
 Largest number that can be stored in a 16-bit unsigned integer.
 
CONSTEXPR_VALUE_14 long double CSV_UINT32_MAX = get_uint_max<4>()
 Largest number that can be stored in a 32-bit unsigned integer.
 
CONSTEXPR_VALUE_14 long double CSV_UINT64_MAX = get_uint_max<8>()
 Largest number that can be stored in a 64-bit unsigned integer.
 

Detailed Description

Stuff that is generally not of interest to end-users.

Typedef Documentation

◆ ColNamesPtr

using csv::internals::ColNamesPtr = std::shared_ptr<ColNames>

Definition at line 13 of file col_names.hpp.

◆ ParseFlagMap

An array which maps ASCII chars to a parsing flag.

Definition at line 239 of file common.hpp.

◆ RawCSVDataPtr

using csv::internals::RawCSVDataPtr = std::shared_ptr<RawCSVData>

Definition at line 171 of file raw_csv_data.hpp.

◆ WhitespaceMap

An array which maps ASCII chars to a flag indicating if it is whitespace.

Definition at line 242 of file common.hpp.

Enumeration Type Documentation

◆ ParseFlags

An enum used for describing the significance of each character with respect to CSV parsing.

See also
quote_escape_flag
Enumerator
QUOTE_ESCAPE_QUOTE 

A quote inside or terminating a quote_escaped field.

QUOTE 

Characters which may signify a quote escape.

NOT_SPECIAL 

Characters with no special meaning or escaped delimiters and newlines.

DELIMITER 

Characters which signify a new field.

NEWLINE 

Characters which signify a new row.

Definition at line 205 of file common.hpp.

Function Documentation

◆ _determine_integral_type()

CSV_PRIVATE CSV_PURE CONSTEXPR_14 DataType csv::internals::_determine_integral_type ( const long double &  number)
noexcept

Given the absolute value of an integer, determine what numeric type it fits in.

Definition at line 200 of file data_type.hpp.
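
As a rough illustration of the idea (not the library's code), the sketch below picks the narrowest signed type whose maximum is at least the given absolute value. The IntKind enum and the function name are hypothetical; the library's DataType values are not listed on this page.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical enum for this sketch only.
enum class IntKind { Int8, Int16, Int32, Int64, TooBig };

// Return the narrowest signed integer kind whose maximum is >= abs_value.
// Because only the absolute value is seen, 128 maps to Int16 even though
// -128 would fit in 8 bits.
inline IntKind smallest_int_kind_sketch(long double abs_value) {
    assert(abs_value >= 0);  // the caller is expected to pass an absolute value
    if (abs_value <= std::numeric_limits<std::int8_t>::max())  return IntKind::Int8;
    if (abs_value <= std::numeric_limits<std::int16_t>::max()) return IntKind::Int16;
    if (abs_value <= std::numeric_limits<std::int32_t>::max()) return IntKind::Int32;
    if (abs_value <= static_cast<long double>(std::numeric_limits<std::int64_t>::max()))
        return IntKind::Int64;
    return IntKind::TooBig;
}
```

For example, 40000 exceeds the 16-bit maximum of 32767, so the sketch reports IntKind::Int32.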

◆ _get_col_names()

std::vector< std::string > csv::internals::_get_col_names ( csv::string_view  head,
CSVFormat  format 
)

Return a CSV's column names.

Parameters
[in]  head    Leading portion of the CSV data
[in]  format  Format of the CSV file

Definition at line 28 of file csv_reader.cpp.

◆ _guess_format()

CSVGuessResult csv::internals::_guess_format ( csv::string_view  head,
const std::vector< char > &  delims 
)

Guess the delimiter used by a delimiter-separated values file.

For each delimiter, find out which row length was most common (mode). The delimiter with the highest score (row_length × count) wins.

Header detection: If the first row has at least as many columns as the mode, use row 0. Otherwise, use the first row whose length equals the mode.

See csv::guess_format() public API documentation for detailed heuristic explanation.

Definition at line 103 of file csv_reader.cpp.
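
The scoring rule described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the library's implementation: rows are split naively on '\n', quoting is ignored, and the names mode_score_sketch and guess_delimiter_sketch are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Score one candidate delimiter: (most common row length) x (rows with that length).
inline std::size_t mode_score_sketch(const std::string& head, char delim) {
    std::map<std::size_t, std::size_t> counts;  // row length -> number of rows
    std::istringstream lines(head);
    std::string line;
    while (std::getline(lines, line)) {
        if (line.empty()) continue;
        std::size_t fields = 1;
        for (char c : line) {
            if (c == delim) ++fields;
        }
        ++counts[fields];
    }
    std::size_t mode_len = 0, mode_count = 0;
    for (const auto& entry : counts) {
        // On ties, the longer row length wins (the map iterates in ascending order).
        if (entry.second >= mode_count) {
            mode_len = entry.first;
            mode_count = entry.second;
        }
    }
    return mode_len * mode_count;
}

// Return the candidate with the highest score; ties go to the earlier candidate.
inline char guess_delimiter_sketch(const std::string& head, const std::vector<char>& delims) {
    assert(!delims.empty());
    char best = delims.front();
    std::size_t best_score = 0;
    for (char d : delims) {
        const std::size_t score = mode_score_sketch(head, d);
        if (score > best_score) {
            best = d;
            best_score = score;
        }
    }
    return best;
}
```

With three rows of three semicolon-separated fields, ';' scores 3 × 3 = 9, while a delimiter that never occurs scores 1 × 3 = 3, so ';' is chosen.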

◆ _process_potential_exponential()

CSV_PRIVATE CONSTEXPR_14 DataType csv::internals::_process_potential_exponential ( csv::string_view  exponential_part,
const long double &  coeff,
long double *const  out 
)

Given the start of what is possibly the exponential part of a number written in scientific notation, parse the exponent.

Definition at line 180 of file data_type.hpp.

◆ arrayToDefault()

template<typename OutArray , typename T = typename OutArray::type>
CSV_CONST CONSTEXPR_17 OutArray csv::internals::arrayToDefault ( T &&  value)

Helper constexpr function to initialize an array with all the elements set to value.

Definition at line 29 of file basic_csv_parser.hpp.

◆ calculate_score()

GuessScore csv::internals::calculate_score ( csv::string_view  head,
const CSVFormat &  format 
)

Definition at line 41 of file csv_reader.cpp.

◆ csv_abs() [1/7]

template<>
double csv::internals::csv_abs ( double  x)
inline

Definition at line 51 of file csv_writer.hpp.

◆ csv_abs() [2/7]

template<>
float csv::internals::csv_abs ( float  x)
inline

Definition at line 46 of file csv_writer.hpp.

◆ csv_abs() [3/7]

template<>
int csv::internals::csv_abs ( int  x)
inline

Definition at line 31 of file csv_writer.hpp.

◆ csv_abs() [4/7]

template<>
long double csv::internals::csv_abs ( long double  x)
inline

Definition at line 56 of file csv_writer.hpp.

◆ csv_abs() [5/7]

template<>
long int csv::internals::csv_abs ( long int  x)
inline

Definition at line 36 of file csv_writer.hpp.

◆ csv_abs() [6/7]

template<>
long long int csv::internals::csv_abs ( long long int  x)
inline

Definition at line 41 of file csv_writer.hpp.

◆ csv_abs() [7/7]

template<typename T = int>
T csv::internals::csv_abs ( T  x)
inline

Calculate the absolute value of a number.

Definition at line 26 of file csv_writer.hpp.

◆ data_type()

CONSTEXPR_14 DataType csv::internals::data_type ( csv::string_view  in,
long double *const  out,
const char  decimalSymbol 
)

Distinguishes numeric from other text values.

Used by various type casting functions, like csv_parser::CSVReader::read_row()

Rules

  • Leading and trailing whitespace ("padding") ignored
  • A string of just whitespace is NULL
Parameters
[in]   in             String value to be examined
[out]  out            Pointer to long double where results of numeric parsing get stored
[in]   decimalSymbol  The character separating the integral and decimal parts; defaults to '.' if omitted

Definition at line 230 of file data_type.hpp.
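
The two rules listed above can be illustrated with a much-reduced classifier. This is only a sketch under assumptions: the Kind enum and classify_sketch are hypothetical, padding is taken to mean spaces and tabs, an empty string is treated like a whitespace-only one, and exponents and hexadecimal input are ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical result type for this sketch only.
enum class Kind { Null, Numeric, Text };

inline Kind classify_sketch(const std::string& in, char decimal_symbol = '.') {
    // Rule: leading and trailing padding is ignored.
    const std::size_t first = in.find_first_not_of(" \t");
    if (first == std::string::npos) return Kind::Null;  // Rule: only whitespace is NULL
    const std::size_t last = in.find_last_not_of(" \t");

    std::size_t i = first;
    if (in[i] == '-' || in[i] == '+') ++i;  // optional sign

    bool seen_digit = false, seen_decimal = false;
    for (; i <= last; ++i) {
        const char c = in[i];
        if (c >= '0' && c <= '9') {
            seen_digit = true;
        } else if (c == decimal_symbol && !seen_decimal) {
            seen_decimal = true;  // at most one decimal separator
        } else {
            return Kind::Text;    // anything else makes the value text
        }
    }
    return seen_digit ? Kind::Numeric : Kind::Text;
}
```

Passing ',' as decimal_symbol lets a value such as "1,5" classify as numeric, mirroring the decimalSymbol parameter documented above.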

◆ format_row()

std::string csv::internals::format_row ( const std::vector< std::string > &  row,
csv::string_view  delim 
)

Print a CSV row

Definition at line 9 of file csv_reader.cpp.

◆ get_csv_head() [1/3]

std::string csv::internals::get_csv_head ( csv::string_view  filename)

Definition at line 16 of file basic_csv_parser.cpp.

◆ get_csv_head() [2/3]

std::string csv::internals::get_csv_head ( csv::string_view  filename,
size_t  file_size 
)

Read the first 500KB of a CSV file.

Definition at line 20 of file basic_csv_parser.cpp.
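
A minimal sketch of reading a bounded prefix from a stream is shown below. It is not the library's code: read_head_sketch and head_of_string_sketch are hypothetical names, and the default of 500000 bytes is an assumed reading of "500KB".

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Read at most max_bytes from the stream and return exactly what was read.
inline std::string read_head_sketch(std::istream& source, std::size_t max_bytes = 500000) {
    std::string head(max_bytes, '\0');
    source.read(&head[0], static_cast<std::streamsize>(max_bytes));
    // gcount() reports how many bytes the last read actually delivered,
    // which is smaller than max_bytes for short inputs.
    head.resize(static_cast<std::size_t>(source.gcount()));
    return head;
}

// Convenience wrapper for trying the sketch on in-memory text.
inline std::string head_of_string_sketch(const std::string& text, std::size_t max_bytes) {
    std::istringstream in(text);
    return read_head_sketch(in, max_bytes);
}
```

Inputs shorter than the limit come back whole, and longer inputs are cut at the limit, possibly in the middle of a row.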

◆ get_csv_head() [3/3]

template<typename TStream , csv::enable_if_t< std::is_base_of< std::istream, TStream >::value, int > = 0>
std::string csv::internals::get_csv_head ( TStream &  source)

Definition at line 194 of file basic_csv_parser.hpp.

◆ get_file_size()

size_t csv::internals::get_file_size ( csv::string_view  filename)

Definition at line 7 of file basic_csv_parser.cpp.

◆ get_int_max()

template<size_t Bytes>
CONSTEXPR_14 long double csv::internals::get_int_max ( )

Given a byte size, return the largest number that can be stored in an integer of that size.

Note: Provides a platform-agnostic way of mapping names like "long int" to byte sizes

Definition at line 105 of file data_type.hpp.

◆ get_uint_max()

template<size_t Bytes>
CONSTEXPR_14 long double csv::internals::get_uint_max ( )

Given a byte size, return the largest number that can be stored in an unsigned integer of that size.

Definition at line 130 of file data_type.hpp.
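
The relationship between byte size and maximum value can be sketched arithmetically: a signed integer of N bytes tops out at 2^(8N - 1) - 1 and an unsigned one at 2^(8N) - 1. The functions below are hypothetical and may differ from how the library derives these values.

```cpp
#include <cassert>
#include <cstddef>

// Compute 2^bits as a long double by repeated doubling.
inline long double pow2_sketch(unsigned bits) {
    long double result = 1.0L;
    for (unsigned i = 0; i < bits; ++i) result *= 2.0L;
    return result;
}

// Largest value of a signed integer occupying Bytes bytes: 2^(8*Bytes - 1) - 1.
template <std::size_t Bytes>
long double int_max_sketch() {
    return pow2_sketch(static_cast<unsigned>(8 * Bytes - 1)) - 1.0L;
}

// Largest value of an unsigned integer occupying Bytes bytes: 2^(8*Bytes) - 1.
// Note: for Bytes == 8 the result is exact only where long double has at
// least a 64-bit mantissa.
template <std::size_t Bytes>
long double uint_max_sketch() {
    return pow2_sketch(static_cast<unsigned>(8 * Bytes)) - 1.0L;
}
```

For instance, int_max_sketch<2>() evaluates to 32767 and uint_max_sketch<1>() to 255.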

◆ is_equal()

template<typename T >
bool csv::internals::is_equal ( T  a,
T  b,
T  epsilon = 0.001 
)
inline

Returns true if two floating point values are about the same

Definition at line 193 of file common.hpp.
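
An approximate comparison of this kind is commonly written as an absolute-difference test against a tolerance. The sketch below assumes that approach; is_equal_sketch is a hypothetical name and the library's exact comparison may differ.

```cpp
#include <cassert>
#include <cmath>

// True when a and b differ by less than epsilon.
template <typename T>
bool is_equal_sketch(T a, T b, T epsilon = static_cast<T>(0.001)) {
    return std::abs(a - b) < epsilon;
}
```

With the default tolerance, 1.0 and 1.0005 compare as equal while 1.0 and 1.01 do not.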

◆ json_escape_string()

std::string csv::internals::json_escape_string ( csv::string_view  s)
noexcept

Definition at line 88 of file csv_row_json.cpp.

◆ make_parse_flags() [1/2]

CSV_CONST CONSTEXPR_17 ParseFlagMap csv::internals::make_parse_flags ( char  delimiter)

Create an array v in which, for a character with ASCII number i, v[i + 128] labels that character according to the ParseFlags enum.

Definition at line 41 of file basic_csv_parser.hpp.

◆ make_parse_flags() [2/2]

CSV_CONST CONSTEXPR_17 ParseFlagMap csv::internals::make_parse_flags ( char  delimiter,
char  quote_char 
)

Create an array v in which, for a character with ASCII number i, v[i + 128] labels that character according to the ParseFlags enum.

Definition at line 53 of file basic_csv_parser.hpp.
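
The i + 128 offset exists because a signed char ranges from -128 to 127, so adding 128 yields a valid index into a 256-entry array. The sketch below illustrates building and querying such a table. The function names are hypothetical, the enum values are copied from this page, and treating '\r' and '\n' as the newline characters is an assumption.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

enum class ParseFlags {
    QUOTE_ESCAPE_QUOTE = 0, QUOTE = 2 | 1, NOT_SPECIAL = 4,
    DELIMITER = 4 | 2, NEWLINE = 4 | 2 | 1
};
using ParseFlagMap = std::array<ParseFlags, 256>;

// Build a lookup table with one entry per possible char value.
inline ParseFlagMap make_parse_flags_sketch(char delimiter, char quote_char) {
    ParseFlagMap table{};
    for (int i = -128; i < 128; ++i) {
        const char ch = static_cast<char>(i);
        ParseFlags flag = ParseFlags::NOT_SPECIAL;
        if (ch == delimiter) flag = ParseFlags::DELIMITER;
        else if (ch == quote_char) flag = ParseFlags::QUOTE;
        else if (ch == '\r' || ch == '\n') flag = ParseFlags::NEWLINE;  // assumed newline set
        table[static_cast<std::size_t>(i + 128)] = flag;
    }
    return table;
}

// Look up a character; going through signed char keeps the index in
// [0, 255] even on platforms where plain char is unsigned.
inline ParseFlags lookup_flag_sketch(const ParseFlagMap& table, char ch) {
    return table[static_cast<std::size_t>(static_cast<signed char>(ch) + 128)];
}
```

Classifying a character then costs a single array read instead of a chain of comparisons in the parsing loop.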

◆ make_ws_flags() [1/2]

CSV_CONST CONSTEXPR_17 WhitespaceMap csv::internals::make_ws_flags ( const char *  ws_chars,
size_t  n_chars 
)

Create an array v in which, for a character c with ASCII number i, v[i + 128] is true if c is a whitespace character.

Definition at line 63 of file basic_csv_parser.hpp.

◆ make_ws_flags() [2/2]

WhitespaceMap csv::internals::make_ws_flags ( const std::vector< char > &  flags)
inline

Definition at line 71 of file basic_csv_parser.hpp.

◆ num_digits()

template<typename T , csv::enable_if_t< std::is_arithmetic< T >::value, int > = 0>
int csv::internals::num_digits ( T  x)

Calculate the number of digits in a number.

Definition at line 67 of file csv_writer.hpp.

◆ pow10() [1/2]

template<typename T >
CSV_CONST CONSTEXPR_14 long double csv::internals::pow10 ( const T &  n)
noexcept

Compute 10 to the power of n.

Definition at line 40 of file data_type.hpp.

◆ pow10() [2/2]

template<>
CSV_CONST CONSTEXPR_14 long double csv::internals::pow10 ( const unsigned &  n)
noexcept

Compute 10 to the power of n.

Definition at line 57 of file data_type.hpp.
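
For integer exponents, a power of ten can be computed by repeated multiplication. The sketch below assumes that approach and handles negative exponents by multiplying by 0.1; pow10_sketch is a hypothetical name and not necessarily how the library computes the result.

```cpp
#include <cassert>

// Compute 10^n for an integer n by repeated multiplication.
inline long double pow10_sketch(int n) noexcept {
    const long double factor = n > 0 ? 10.0L : 0.1L;
    const int steps = n > 0 ? n : -n;
    long double result = 1.0L;
    for (int i = 0; i < steps; ++i) result *= factor;
    return result;
}
```

Positive exponents give exact results for small n, while negative exponents accumulate the rounding error inherent in representing 0.1 in binary.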

◆ quote_escape_flag()

constexpr ParseFlags csv::internals::quote_escape_flag ( ParseFlags  flag,
bool  quote_escape 
)
constexpr noexcept

Transform the ParseFlags given the context of whether or not the current field is quote escaped.

Definition at line 215 of file common.hpp.

◆ STATIC_ASSERT()

Optimizations for reducing branching in parsing loop.

Idea: The meaning of all non-quote characters changes depending on whether or not the parser is in a quote-escaped mode (0 or 1)
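
The enum values appear to be chosen so that this context switch can be done with bit arithmetic instead of a branch: QUOTE occupies the two low bits, and clearing those bits turns QUOTE into QUOTE_ESCAPE_QUOTE and both DELIMITER and NEWLINE into NOT_SPECIAL. The sketch below is one branch-free function that satisfies the assertions listed on this page. The name quote_escape_flag_sketch is hypothetical and the library's own definition may be written differently.

```cpp
#include <cassert>

// Values copied from the ParseFlags enumeration on this page.
enum class ParseFlags {
    QUOTE_ESCAPE_QUOTE = 0,   // 0b000
    QUOTE = 2 | 1,            // 0b011
    NOT_SPECIAL = 4,          // 0b100
    DELIMITER = 4 | 2,        // 0b110
    NEWLINE = 4 | 2 | 1       // 0b111
};

// If quote_escape is true, the mask is ~0b011 and the two low bits are cleared.
// If it is false, the mask is ~0 and the flag passes through unchanged.
constexpr ParseFlags quote_escape_flag_sketch(ParseFlags flag, bool quote_escape) noexcept {
    return static_cast<ParseFlags>(
        static_cast<int>(flag) & ~(static_cast<int>(ParseFlags::QUOTE) * quote_escape));
}

// Same checks as the STATIC_ASSERT entries for the quote-escaped case.
static_assert(quote_escape_flag_sketch(ParseFlags::NOT_SPECIAL, true) == ParseFlags::NOT_SPECIAL, "");
static_assert(quote_escape_flag_sketch(ParseFlags::QUOTE, true) == ParseFlags::QUOTE_ESCAPE_QUOTE, "");
static_assert(quote_escape_flag_sketch(ParseFlags::DELIMITER, true) == ParseFlags::NOT_SPECIAL, "");
static_assert(quote_escape_flag_sketch(ParseFlags::NEWLINE, true) == ParseFlags::NOT_SPECIAL, "");
```

Because the boolean is folded into the mask by multiplication, the parsing loop does not need a separate conditional for the quoted state.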

◆ to_string()

template<typename T , csv::enable_if_t< std::is_unsigned< T >::value, int > = 0>
std::string csv::internals::to_string ( T  value)
inline

to_string() for unsigned integers

to_string() for floating point numbers

to_string() for signed integers

Definition at line 84 of file csv_writer.hpp.

◆ try_parse_hex()

template<typename T >
bool csv::internals::try_parse_hex ( csv::string_view  sv,
T &  parsedValue 
)

Definition at line 14 of file parse_hex.hpp.

Variable Documentation

◆ CSV_INT16_MAX

CONSTEXPR_VALUE_14 long double csv::internals::CSV_INT16_MAX = get_int_max<2>()

Largest number that can be stored in a 16-bit integer.

Definition at line 155 of file data_type.hpp.

◆ CSV_INT32_MAX

CONSTEXPR_VALUE_14 long double csv::internals::CSV_INT32_MAX = get_int_max<4>()

Largest number that can be stored in a 32-bit integer.

Definition at line 158 of file data_type.hpp.

◆ CSV_INT64_MAX

CONSTEXPR_VALUE_14 long double csv::internals::CSV_INT64_MAX = get_int_max<8>()

Largest number that can be stored in a 64-bit integer.

Definition at line 161 of file data_type.hpp.

◆ CSV_INT8_MAX

CONSTEXPR_VALUE_14 long double csv::internals::CSV_INT8_MAX = get_int_max<1>()

Largest number that can be stored in an 8-bit integer.

Definition at line 152 of file data_type.hpp.

◆ CSV_UINT16_MAX

CONSTEXPR_VALUE_14 long double csv::internals::CSV_UINT16_MAX = get_uint_max<2>()

Largest number that can be stored in a 16-bit unsigned integer.

Definition at line 167 of file data_type.hpp.

◆ CSV_UINT32_MAX

CONSTEXPR_VALUE_14 long double csv::internals::CSV_UINT32_MAX = get_uint_max<4>()

Largest number that can be stored in a 32-bit unsigned integer.

Definition at line 170 of file data_type.hpp.

◆ CSV_UINT64_MAX

CONSTEXPR_VALUE_14 long double csv::internals::CSV_UINT64_MAX = get_uint_max<8>()

Largest number that can be stored in a 64-bit unsigned integer.

Definition at line 173 of file data_type.hpp.

◆ CSV_UINT8_MAX

CONSTEXPR_VALUE_14 long double csv::internals::CSV_UINT8_MAX = get_uint_max<1>()

Largest number that can be stored in an 8-bit unsigned integer.

Definition at line 164 of file data_type.hpp.

◆ ITERATION_CHUNK_SIZE

constexpr size_t csv::internals::ITERATION_CHUNK_SIZE = 10000000
constexpr

Chunk size for lazy-loading large CSV files.

The worker thread reads this many bytes at a time (10MB).

CRITICAL INVARIANT: Field boundaries at chunk transitions must be preserved. Bug #280 was caused by fields spanning chunk boundaries being corrupted.

Note
Tests must write >10MB of data to cross chunk boundaries
See also
basic_csv_parser.cpp MmapParser::next() for chunk transition logic

Definition at line 190 of file common.hpp.
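
One general way to keep rows intact across chunk boundaries is to carry the unfinished tail of each chunk into the next one. The sketch below illustrates that idea on an in-memory string. It is not the logic in MmapParser::next(): rows_from_chunks_sketch is a hypothetical name, and quoted newlines are ignored for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Split data into rows while consuming it chunk_size bytes at a time.
inline std::vector<std::string> rows_from_chunks_sketch(const std::string& data,
                                                        std::size_t chunk_size) {
    assert(chunk_size > 0);
    std::vector<std::string> rows;
    std::string carry;  // unfinished row left over from earlier chunks
    for (std::size_t pos = 0; pos < data.size(); pos += chunk_size) {
        carry += data.substr(pos, chunk_size);  // substr clamps at the end of data
        std::size_t start = 0;
        std::size_t nl = 0;
        while ((nl = carry.find('\n', start)) != std::string::npos) {
            rows.push_back(carry.substr(start, nl - start));
            start = nl + 1;
        }
        carry.erase(0, start);  // keep only the incomplete tail for the next chunk
    }
    if (!carry.empty()) rows.push_back(carry);  // final row without a trailing newline
    return rows;
}
```

A property worth testing with this approach is that the resulting rows are identical for any chunk size, including sizes smaller than a single field.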

◆ PAGE_SIZE

const int csv::internals::PAGE_SIZE = 4096

Size of a memory page in bytes.

Used by csv::internals::CSVFieldList when allocating blocks.

Definition at line 177 of file common.hpp.

◆ UNINITIALIZED_FIELD

constexpr const int csv::internals::UNINITIALIZED_FIELD = -1
constexpr

Definition at line 24 of file basic_csv_parser.hpp.