Vince's CSV Parser
Stuff that is generally not of interest to end-users.
Classes | |
| struct | ColNames |
| A data structure for handling column name information. | |
| class | CSVFieldList |
| A class used for efficiently storing RawCSVField objects and expanding as necessary. | |
| struct | GuessScore |
| class | IBasicCSVParser |
| Abstract base class which provides CSV parsing logic. | |
| class | is_equality_comparable |
| class | is_hashable |
| class | MmapParser |
| Parser for memory-mapped files. | |
| struct | RawCSVData |
| A class for storing raw CSV data and associated metadata. | |
| struct | RawCSVField |
| A barebones class used for describing CSV fields. | |
| class | StreamParser |
| A class for parsing CSV data from a std::stringstream or a std::ifstream. | |
| class | ThreadSafeDeque |
| A std::deque wrapper which allows multiple read and write threads to access it concurrently and lets read threads wait for the deque to become populated. | |
Typedefs | |
| using | ColNamesPtr = std::shared_ptr< ColNames > |
| using | ParseFlagMap = std::array< ParseFlags, 256 > |
| An array which maps ASCII chars to a parsing flag. | |
| using | WhitespaceMap = std::array< bool, 256 > |
| An array which maps ASCII chars to a flag indicating if it is whitespace. | |
| using | RawCSVDataPtr = std::shared_ptr< RawCSVData > |
Enumerations | |
| enum class | ParseFlags { QUOTE_ESCAPE_QUOTE = 0 , QUOTE = 2 | 1 , NOT_SPECIAL = 4 , DELIMITER = 4 | 2 , NEWLINE = 4 | 2 | 1 } |
| An enum used for describing the significance of each character with respect to CSV parsing. | |
Functions | |
| size_t | get_file_size (csv::string_view filename) |
| std::string | get_csv_head (csv::string_view filename) |
| std::string | get_csv_head (csv::string_view filename, size_t file_size) |
| Read the first 500KB of a CSV file. | |
| template<typename OutArray , typename T = typename OutArray::type> | |
| CSV_CONST CONSTEXPR_17 OutArray | arrayToDefault (T &&value) |
| Helper constexpr function to initialize an array with all the elements set to value. | |
| CSV_CONST CONSTEXPR_17 ParseFlagMap | make_parse_flags (char delimiter) |
| Create an array v in which each index i + 128 corresponds to the character with ASCII code i, and v[i + 128] labels that character according to the ParseFlags enum. | |
| CSV_CONST CONSTEXPR_17 ParseFlagMap | make_parse_flags (char delimiter, char quote_char) |
| Create an array v in which each index i + 128 corresponds to the character with ASCII code i, and v[i + 128] labels that character according to the ParseFlags enum. | |
| CSV_CONST CONSTEXPR_17 WhitespaceMap | make_ws_flags (const char *ws_chars, size_t n_chars) |
| Create an array v in which each index i + 128 corresponds to the character c with ASCII code i, and v[i + 128] is true if c is a whitespace character. | |
| WhitespaceMap | make_ws_flags (const std::vector< char > &flags) |
| template<typename TStream , csv::enable_if_t< std::is_base_of< std::istream, TStream >::value, int > = 0> | |
| std::string | get_csv_head (TStream &source) |
| template<typename T > | |
| bool | is_equal (T a, T b, T epsilon=0.001) |
| constexpr ParseFlags | quote_escape_flag (ParseFlags flag, bool quote_escape) noexcept |
| Transform the ParseFlags given the context of whether or not the current field is quote escaped. | |
| STATIC_ASSERT (ParseFlags::DELIMITER< ParseFlags::NEWLINE) | |
| STATIC_ASSERT (quote_escape_flag(ParseFlags::NOT_SPECIAL, false)==ParseFlags::NOT_SPECIAL) | |
| Optimizations for reducing branching in parsing loop. | |
| STATIC_ASSERT (quote_escape_flag(ParseFlags::QUOTE, false)==ParseFlags::QUOTE) | |
| STATIC_ASSERT (quote_escape_flag(ParseFlags::DELIMITER, false)==ParseFlags::DELIMITER) | |
| STATIC_ASSERT (quote_escape_flag(ParseFlags::NEWLINE, false)==ParseFlags::NEWLINE) | |
| STATIC_ASSERT (quote_escape_flag(ParseFlags::NOT_SPECIAL, true)==ParseFlags::NOT_SPECIAL) | |
| STATIC_ASSERT (quote_escape_flag(ParseFlags::QUOTE, true)==ParseFlags::QUOTE_ESCAPE_QUOTE) | |
| STATIC_ASSERT (quote_escape_flag(ParseFlags::DELIMITER, true)==ParseFlags::NOT_SPECIAL) | |
| STATIC_ASSERT (quote_escape_flag(ParseFlags::NEWLINE, true)==ParseFlags::NOT_SPECIAL) | |
| std::string | format_row (const std::vector< std::string > &row, csv::string_view delim) |
| std::vector< std::string > | _get_col_names (csv::string_view head, CSVFormat format) |
| Return a CSV's column names. | |
| GuessScore | calculate_score (csv::string_view head, const CSVFormat &format) |
| CSVGuessResult | _guess_format (csv::string_view head, const std::vector< char > &delims) |
| Guess the delimiter used by a delimiter-separated values file. | |
| std::string | json_escape_string (csv::string_view s) noexcept |
| template<typename T = int> | |
| T | csv_abs (T x) |
| Calculate the absolute value of a number. | |
| template<> | |
| int | csv_abs (int x) |
| template<> | |
| long int | csv_abs (long int x) |
| template<> | |
| long long int | csv_abs (long long int x) |
| template<> | |
| float | csv_abs (float x) |
| template<> | |
| double | csv_abs (double x) |
| template<> | |
| long double | csv_abs (long double x) |
| template<typename T , csv::enable_if_t< std::is_arithmetic< T >::value, int > = 0> | |
| int | num_digits (T x) |
| Calculate the number of digits in a number. | |
| template<typename T , csv::enable_if_t< std::is_unsigned< T >::value, int > = 0> | |
| std::string | to_string (T value) |
| to_string() for unsigned integers | |
| template<typename T > | |
| CSV_CONST CONSTEXPR_14 long double | pow10 (const T &n) noexcept |
| Compute 10 to the power of n. | |
| template<> | |
| CSV_CONST CONSTEXPR_14 long double | pow10 (const unsigned &n) noexcept |
| Compute 10 to the power of n. | |
| template<size_t Bytes> | |
| CONSTEXPR_14 long double | get_int_max () |
| Given a byte size, return the largest number that can be stored in an integer of that size. | |
| template<size_t Bytes> | |
| CONSTEXPR_14 long double | get_uint_max () |
| Given a byte size, return the largest number that can be stored in an unsigned integer of that size. | |
| CSV_PRIVATE CONSTEXPR_14 DataType | _process_potential_exponential (csv::string_view exponential_part, const long double &coeff, long double *const out) |
| Given the exponential part of a number that is (possibly) written in scientific notation, parse the exponent. | |
| CSV_PRIVATE CSV_PURE CONSTEXPR_14 DataType | _determine_integral_type (const long double &number) noexcept |
| Given the absolute value of an integer, determine what numeric type it fits in. | |
| CONSTEXPR_14 DataType | data_type (csv::string_view in, long double *const out, const char decimalSymbol) |
| Distinguishes numeric from other text values. | |
| template<typename T > | |
| bool | try_parse_hex (csv::string_view sv, T &parsedValue) |
Variables | |
| constexpr const int | UNINITIALIZED_FIELD = -1 |
| const int | PAGE_SIZE = 4096 |
| Size of a memory page in bytes. | |
| constexpr size_t | ITERATION_CHUNK_SIZE = 10000000 |
| Chunk size for lazy-loading large CSV files. | |
| CONSTEXPR_VALUE_14 long double | CSV_INT8_MAX = get_int_max<1>() |
| Largest number that can be stored in an 8-bit integer. | |
| CONSTEXPR_VALUE_14 long double | CSV_INT16_MAX = get_int_max<2>() |
| Largest number that can be stored in a 16-bit integer. | |
| CONSTEXPR_VALUE_14 long double | CSV_INT32_MAX = get_int_max<4>() |
| Largest number that can be stored in a 32-bit integer. | |
| CONSTEXPR_VALUE_14 long double | CSV_INT64_MAX = get_int_max<8>() |
| Largest number that can be stored in a 64-bit integer. | |
| CONSTEXPR_VALUE_14 long double | CSV_UINT8_MAX = get_uint_max<1>() |
| Largest number that can be stored in an 8-bit unsigned integer. | |
| CONSTEXPR_VALUE_14 long double | CSV_UINT16_MAX = get_uint_max<2>() |
| Largest number that can be stored in a 16-bit unsigned integer. | |
| CONSTEXPR_VALUE_14 long double | CSV_UINT32_MAX = get_uint_max<4>() |
| Largest number that can be stored in a 32-bit unsigned integer. | |
| CONSTEXPR_VALUE_14 long double | CSV_UINT64_MAX = get_uint_max<8>() |
| Largest number that can be stored in a 64-bit unsigned integer. | |
Definition at line 13 of file col_names.hpp.
| using csv::internals::ParseFlagMap = typedef std::array<ParseFlags, 256> |
An array which maps ASCII chars to a parsing flag.
Definition at line 239 of file common.hpp.
| using csv::internals::RawCSVDataPtr = typedef std::shared_ptr<RawCSVData> |
Definition at line 171 of file raw_csv_data.hpp.
| using csv::internals::WhitespaceMap = typedef std::array<bool, 256> |
An array which maps ASCII chars to a flag indicating if it is whitespace.
Definition at line 242 of file common.hpp.
| enum class csv::internals::ParseFlags | strong |
An enum used for describing the significance of each character with respect to CSV parsing.
Definition at line 205 of file common.hpp.
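| CSV_PRIVATE CSV_PURE CONSTEXPR_14 DataType csv::internals::_determine_integral_type | ( | const long double & | number | ) | noexcept |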
Given the absolute value of an integer, determine what numeric type it fits in.
Definition at line 200 of file data_type.hpp.
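The classification described above can be sketched as a chain of range checks. This is only an illustration: the `IntKind` enum and the function name are hypothetical stand-ins (this page does not list the members of `DataType`), and the limits come from `<cstdint>` rather than from the `CSV_INT*_MAX` constants.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical result type; the real function returns a DataType value.
enum class IntKind { Int8, Int16, Int32, Int64, TooLarge };

// Given the absolute value of an integer, report the smallest signed
// integer width that can hold it.
inline IntKind determine_integral_type_sketch(long double abs_value) noexcept {
    if (abs_value <= std::numeric_limits<std::int8_t>::max())  return IntKind::Int8;
    if (abs_value <= std::numeric_limits<std::int16_t>::max()) return IntKind::Int16;
    if (abs_value <= std::numeric_limits<std::int32_t>::max()) return IntKind::Int32;
    // Note: where long double has fewer than 64 mantissa bits, this last
    // limit is rounded, so values right at the boundary may be misclassified.
    if (abs_value <= static_cast<long double>(std::numeric_limits<std::int64_t>::max()))
        return IntKind::Int64;
    return IntKind::TooLarge;
}

// Usage example.
inline void integral_type_demo() {
    assert(determine_integral_type_sketch(100.0L) == IntKind::Int8);
    assert(determine_integral_type_sketch(128.0L) == IntKind::Int16);
}
```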
| std::vector< std::string > csv::internals::_get_col_names | ( | csv::string_view | head, |
| CSVFormat | format | ||
| ) |
Return a CSV's column names.
| [in] | head | Leading portion of the CSV data |
| [in] | format | Format of the CSV file |
Definition at line 28 of file csv_reader.cpp.
| CSVGuessResult csv::internals::_guess_format | ( | csv::string_view | head, |
| const std::vector< char > & | delims | ||
| ) |
Guess the delimiter used by a delimiter-separated values file.
For each delimiter, find the most common row length (the mode). The delimiter with the highest score (row_length × count) wins.
Header detection: if the first row has at least as many columns as the mode, use row 0. Otherwise, use the first row whose length equals the mode.
See csv::guess_format() public API documentation for detailed heuristic explanation.
Definition at line 103 of file csv_reader.cpp.
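The scoring rule above (mode row length × number of rows with that length) can be sketched as follows. This is a simplified illustration, not the library's implementation: it splits rows on '\n', ignores quoting, and skips header detection. `DelimScore`, `score_delimiter`, and `guess_delimiter_sketch` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

struct DelimScore {
    char delim;
    std::size_t mode_length;  // most common number of columns
    std::size_t score;        // mode_length * number of rows with that length
};

// Score one candidate delimiter against the head of a file.
inline DelimScore score_delimiter(const std::string& head, char delim) {
    std::map<std::size_t, std::size_t> counts;  // row length -> number of rows
    std::istringstream in(head);
    std::string line;
    while (std::getline(in, line)) {
        if (line.empty()) continue;
        const std::size_t cols =
            static_cast<std::size_t>(std::count(line.begin(), line.end(), delim)) + 1;
        ++counts[cols];
    }
    DelimScore result{delim, 0, 0};
    std::size_t best_count = 0;
    for (const auto& kv : counts) {
        if (kv.second > best_count) {  // kv.first is the new mode
            best_count = kv.second;
            result.mode_length = kv.first;
            result.score = kv.first * kv.second;
        }
    }
    return result;
}

// Return the candidate with the highest score; the first candidate wins ties.
inline char guess_delimiter_sketch(const std::string& head, const std::vector<char>& delims) {
    DelimScore best{'\0', 0, 0};
    for (char d : delims) {
        const DelimScore s = score_delimiter(head, d);
        if (s.score > best.score) best = s;
    }
    return best.delim;
}
```

With the head "a,b,c\n1,2,3\n4,5,6\n", a comma yields three rows of length 3 (score 9), while any other candidate yields three rows of length 1 (score 3), so the comma wins.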
| CSV_PRIVATE CONSTEXPR_14 DataType csv::internals::_process_potential_exponential | ( | csv::string_view | exponential_part, |
| const long double & | coeff, | ||
| long double *const | out | ||
| ) |
Given the exponential part of a number that is (possibly) written in scientific notation, parse the exponent.
Definition at line 180 of file data_type.hpp.
| CSV_CONST CONSTEXPR_17 OutArray csv::internals::arrayToDefault | ( | T && | value | ) |
Helper constexpr function to initialize an array with all the elements set to value.
Definition at line 29 of file basic_csv_parser.hpp.
| GuessScore csv::internals::calculate_score | ( | csv::string_view | head, |
| const CSVFormat & | format | ||
| ) |
Definition at line 41 of file csv_reader.cpp.
Definition at line 51 of file csv_writer.hpp.
Definition at line 46 of file csv_writer.hpp.
Definition at line 31 of file csv_writer.hpp.
Definition at line 56 of file csv_writer.hpp.
Definition at line 36 of file csv_writer.hpp.
Definition at line 41 of file csv_writer.hpp.
Calculate the absolute value of a number.
Definition at line 26 of file csv_writer.hpp.
| CONSTEXPR_14 DataType csv::internals::data_type | ( | csv::string_view | in, |
| long double *const | out, | ||
| const char | decimalSymbol | ||
| ) |
Distinguishes numeric from other text values.
Used by various type casting functions, like csv_parser::CSVReader::read_row()
| [in] | in | String value to be examined |
| [out] | out | Pointer to long double where results of numeric parsing get stored |
| [in] | decimalSymbol | the character separating integral and decimal part, defaults to '.' if omitted |
Definition at line 230 of file data_type.hpp.
| std::string csv::internals::format_row | ( | const std::vector< std::string > & | row, |
| csv::string_view | delim | ||
| ) |
Format a CSV row as a string.
Definition at line 9 of file csv_reader.cpp.
| std::string csv::internals::get_csv_head | ( | csv::string_view | filename | ) |
Definition at line 16 of file basic_csv_parser.cpp.
| std::string csv::internals::get_csv_head | ( | csv::string_view | filename, |
| size_t | file_size | ||
| ) |
Read the first 500KB of a CSV file.
Definition at line 20 of file basic_csv_parser.cpp.
| std::string csv::internals::get_csv_head | ( | TStream & | source | ) |
Definition at line 194 of file basic_csv_parser.hpp.
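The idea behind the get_csv_head() overloads, reading only a bounded prefix of the input, can be sketched for a stream as shown below. The function name and the `max_bytes` parameter are hypothetical; the documented overload uses a fixed limit of 500KB.

```cpp
#include <cassert>
#include <cstddef>
#include <ios>
#include <istream>
#include <sstream>
#include <string>

// Read at most max_bytes from the stream and return what was actually read.
inline std::string read_head_sketch(std::istream& source, std::size_t max_bytes) {
    std::string head(max_bytes, '\0');
    source.read(&head[0], static_cast<std::streamsize>(max_bytes));
    // gcount() reports how many bytes the last read() extracted, which is
    // smaller than max_bytes when the input is shorter than the limit.
    head.resize(static_cast<std::size_t>(source.gcount()));
    return head;
}

// Usage example.
inline void read_head_demo() {
    std::istringstream in("a,b,c\n1,2,3\n");
    assert(read_head_sketch(in, 5) == "a,b,c");
}
```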
| size_t csv::internals::get_file_size | ( | csv::string_view | filename | ) |
Definition at line 7 of file basic_csv_parser.cpp.
Given a byte size, return the largest number that can be stored in an integer of that size.
Note: Provides a platform-agnostic way of mapping names like "long int" to byte sizes.
Definition at line 105 of file data_type.hpp.
Given a byte size, return the largest number that can be stored in an unsigned integer of that size.
Definition at line 130 of file data_type.hpp.
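A minimal sketch of how such limits can be computed from a byte count, assuming C++14 constexpr loops. The function names are hypothetical, and the library's own implementation may differ.

```cpp
#include <cassert>
#include <cstddef>

// Largest value of a signed integer with the given byte size: 2^(8*Bytes - 1) - 1.
template <std::size_t Bytes>
constexpr long double int_max_sketch() {
    long double result = 1;
    for (std::size_t bit = 0; bit + 1 < Bytes * 8; ++bit) result *= 2;
    return result - 1;
}

// Largest value of an unsigned integer with the given byte size: 2^(8*Bytes) - 1.
template <std::size_t Bytes>
constexpr long double uint_max_sketch() {
    long double result = 1;
    for (std::size_t bit = 0; bit < Bytes * 8; ++bit) result *= 2;
    return result - 1;
}

static_assert(int_max_sketch<1>() == 127, "8-bit signed maximum");
static_assert(uint_max_sketch<1>() == 255, "8-bit unsigned maximum");
```

Because the result is a long double, the 8-byte limits are exact only on platforms where long double has at least 64 bits of mantissa.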
Returns true if two floating point values are about the same.
Definition at line 193 of file common.hpp.
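An approximate comparison of this kind is usually an absolute-difference test. The sketch below follows the signature shown in the function list (`epsilon` defaults to 0.001), but the body is an assumption and the name is hypothetical.

```cpp
#include <cassert>
#include <cmath>

// True when a and b differ by less than epsilon.
template <typename T>
bool is_equal_sketch(T a, T b, T epsilon = static_cast<T>(0.001)) {
    return std::abs(a - b) < epsilon;
}

// Usage example: rounding error in 0.1 + 0.2 is far below the tolerance.
inline void is_equal_demo() {
    assert(is_equal_sketch(0.1 + 0.2, 0.3));
}
```

An absolute tolerance is simple but scale-dependent, so it suits values of moderate magnitude better than very large or very small ones.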
| std::string csv::internals::json_escape_string | ( | csv::string_view | s | ) | noexcept |
Definition at line 88 of file csv_row_json.cpp.
| CSV_CONST CONSTEXPR_17 ParseFlagMap csv::internals::make_parse_flags | ( | char | delimiter | ) |
Create an array v in which each index i + 128 corresponds to the character with ASCII code i, and v[i + 128] labels that character according to the ParseFlags enum.
Definition at line 41 of file basic_csv_parser.hpp.
| CSV_CONST CONSTEXPR_17 ParseFlagMap csv::internals::make_parse_flags | ( | char | delimiter, |
| char | quote_char | ||
| ) |
Create an array v in which each index i + 128 corresponds to the character with ASCII code i, and v[i + 128] labels that character according to the ParseFlags enum.
Definition at line 53 of file basic_csv_parser.hpp.
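The i + 128 offset exists because char may be signed, so character codes can range from -128 to 127. The following sketch shows one way to build and query such a table. It assumes that '\r' and '\n' are the newline characters; the enum values are copied from this page, and the function names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

enum class ParseFlags {
    QUOTE_ESCAPE_QUOTE = 0,
    QUOTE = 2 | 1,
    NOT_SPECIAL = 4,
    DELIMITER = 4 | 2,
    NEWLINE = 4 | 2 | 1
};
using ParseFlagMap = std::array<ParseFlags, 256>;

// Map a (possibly negative) char code into the range [0, 255].
inline std::size_t flag_index(char c) {
    return static_cast<std::size_t>(static_cast<signed char>(c) + 128);
}

inline ParseFlagMap make_parse_flags_sketch(char delimiter, char quote_char) {
    ParseFlagMap flags;
    flags.fill(ParseFlags::NOT_SPECIAL);  // default for every character
    flags[flag_index(delimiter)] = ParseFlags::DELIMITER;
    flags[flag_index('\r')] = ParseFlags::NEWLINE;
    flags[flag_index('\n')] = ParseFlags::NEWLINE;
    flags[flag_index(quote_char)] = ParseFlags::QUOTE;
    return flags;
}

// Look up the flag for one character with a single array access.
inline ParseFlags classify(const ParseFlagMap& flags, char c) {
    return flags[flag_index(c)];
}

// Usage example.
inline void parse_flags_demo() {
    const ParseFlagMap flags = make_parse_flags_sketch(',', '"');
    assert(classify(flags, ',') == ParseFlags::DELIMITER);
}
```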
| CSV_CONST CONSTEXPR_17 WhitespaceMap csv::internals::make_ws_flags | ( | const char * | ws_chars, |
| size_t | n_chars | ||
| ) |
Create an array v in which each index i + 128 corresponds to the character c with ASCII code i, and v[i + 128] is true if c is a whitespace character.
Definition at line 63 of file basic_csv_parser.hpp.
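A whitespace table of this shape makes trimming a matter of array lookups. The sketch below is an assumption about how such a map could be built and used; `make_ws_flags_sketch` and `trim_sketch` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>

using WhitespaceMap = std::array<bool, 256>;

// Mark each of the n_chars characters in ws_chars as whitespace.
inline WhitespaceMap make_ws_flags_sketch(const char* ws_chars, std::size_t n_chars) {
    WhitespaceMap flags;
    flags.fill(false);
    for (std::size_t j = 0; j < n_chars; ++j) {
        // Shift by 128 so that negative char codes still index into the array.
        flags[static_cast<std::size_t>(static_cast<signed char>(ws_chars[j]) + 128)] = true;
    }
    return flags;
}

// Strip leading and trailing characters that the map marks as whitespace.
inline std::string trim_sketch(const std::string& s, const WhitespaceMap& ws) {
    auto is_ws = [&ws](char c) {
        return ws[static_cast<std::size_t>(static_cast<signed char>(c) + 128)];
    };
    std::size_t begin = 0;
    std::size_t end = s.size();
    while (begin < end && is_ws(s[begin])) ++begin;
    while (end > begin && is_ws(s[end - 1])) --end;
    return s.substr(begin, end - begin);
}

// Usage example.
inline void ws_flags_demo() {
    const WhitespaceMap ws = make_ws_flags_sketch(" \t", 2);
    assert(trim_sketch("  a b\t", ws) == "a b");
}
```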
| WhitespaceMap csv::internals::make_ws_flags | ( | const std::vector< char > & | flags | ) | inline |
Definition at line 71 of file basic_csv_parser.hpp.
Calculate the number of digits in a number.
Definition at line 67 of file csv_writer.hpp.
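One common way to count digits is to divide by 10 until the value drops below 1. The sketch below takes that approach and counts only the digits of the integral part; whether the library treats zero or fractional values the same way is an assumption, and the name is hypothetical.

```cpp
#include <cassert>
#include <type_traits>

// Count the digits in the integral part of x (the sign is ignored).
template <typename T,
          typename std::enable_if<std::is_arithmetic<T>::value, int>::type = 0>
int num_digits_sketch(T x) {
    long double v = static_cast<long double>(x);
    if (v < 0) v = -v;
    int digits = 0;
    while (v >= 1) {
        v /= 10;
        ++digits;
    }
    return digits;  // 0 for values whose magnitude is below 1
}

// Usage example.
inline void num_digits_demo() {
    assert(num_digits_sketch(12345) == 5);
}
```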
Compute 10 to the power of n.
Definition at line 40 of file data_type.hpp.
Compute 10 to the power of n.
Definition at line 57 of file data_type.hpp.
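| constexpr ParseFlags csv::internals::quote_escape_flag | ( | ParseFlags | flag, | bool | quote_escape | ) | constexpr noexcept |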
Transform the ParseFlags given the context of whether or not the current field is quote escaped.
Definition at line 215 of file common.hpp.
| csv::internals::STATIC_ASSERT | ( | quote_escape_flag(ParseFlags::NOT_SPECIAL, false) == ParseFlags::NOT_SPECIAL | ) |
Optimizations for reducing branching in parsing loop.
Idea: The meaning of all non-quote characters changes depending on whether or not the parser is in a quote-escaped mode (0 or 1)
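The enum values are chosen so that the transformation can be done with bit arithmetic instead of branches. The sketch below is one implementation that satisfies every STATIC_ASSERT listed on this page; it is offered as an illustration of the technique and may not match the library's code exactly.

```cpp
#include <cassert>

// Values copied from the ParseFlags enum listed on this page.
enum class ParseFlags {
    QUOTE_ESCAPE_QUOTE = 0,
    QUOTE = 2 | 1,
    NOT_SPECIAL = 4,
    DELIMITER = 4 | 2,
    NEWLINE = 4 | 2 | 1
};

// Inside a quoted field, clear the two low bits of the flag:
//   QUOTE (3)     -> QUOTE_ESCAPE_QUOTE (0)
//   DELIMITER (6) -> NOT_SPECIAL (4)
//   NEWLINE (7)   -> NOT_SPECIAL (4)
// Outside a quoted field the mask is all ones, so the flag is unchanged.
constexpr ParseFlags quote_escape_flag(ParseFlags flag, bool quote_escape) noexcept {
    return static_cast<ParseFlags>(
        static_cast<int>(flag) &
        ~(static_cast<int>(ParseFlags::QUOTE) * (quote_escape ? 1 : 0)));
}

static_assert(quote_escape_flag(ParseFlags::DELIMITER, true) == ParseFlags::NOT_SPECIAL,
              "a delimiter inside quotes is ordinary text");
```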
| std::string csv::internals::to_string | ( | T | value | ) | inline |
to_string() for unsigned integers
to_string() for floating point numbers
to_string() for signed integers
Definition at line 84 of file csv_writer.hpp.
| bool csv::internals::try_parse_hex | ( | csv::string_view | sv, |
| T & | parsedValue | ||
| ) |
Definition at line 14 of file parse_hex.hpp.
| CONSTEXPR_VALUE_14 long double csv::internals::CSV_INT16_MAX = get_int_max<2>() |
Largest number that can be stored in a 16-bit integer.
Definition at line 155 of file data_type.hpp.
| CONSTEXPR_VALUE_14 long double csv::internals::CSV_INT32_MAX = get_int_max<4>() |
Largest number that can be stored in a 32-bit integer.
Definition at line 158 of file data_type.hpp.
| CONSTEXPR_VALUE_14 long double csv::internals::CSV_INT64_MAX = get_int_max<8>() |
Largest number that can be stored in a 64-bit integer.
Definition at line 161 of file data_type.hpp.
| CONSTEXPR_VALUE_14 long double csv::internals::CSV_INT8_MAX = get_int_max<1>() |
Largest number that can be stored in an 8-bit integer.
Definition at line 152 of file data_type.hpp.
| CONSTEXPR_VALUE_14 long double csv::internals::CSV_UINT16_MAX = get_uint_max<2>() |
Largest number that can be stored in a 16-bit unsigned integer.
Definition at line 167 of file data_type.hpp.
| CONSTEXPR_VALUE_14 long double csv::internals::CSV_UINT32_MAX = get_uint_max<4>() |
Largest number that can be stored in a 32-bit unsigned integer.
Definition at line 170 of file data_type.hpp.
| CONSTEXPR_VALUE_14 long double csv::internals::CSV_UINT64_MAX = get_uint_max<8>() |
Largest number that can be stored in a 64-bit unsigned integer.
Definition at line 173 of file data_type.hpp.
| CONSTEXPR_VALUE_14 long double csv::internals::CSV_UINT8_MAX = get_uint_max<1>() |
Largest number that can be stored in an 8-bit unsigned integer.
Definition at line 164 of file data_type.hpp.
Chunk size for lazy-loading large CSV files.
The worker thread reads this many bytes at a time (10MB).
CRITICAL INVARIANT: Field boundaries at chunk transitions must be preserved. Bug #280 was caused by fields spanning chunk boundaries being corrupted.
Definition at line 190 of file common.hpp.
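One way to keep rows intact across chunk transitions is to carry the unfinished tail of each chunk into the next one. The sketch below illustrates that technique on a plain stream. It is not the library's implementation: it treats every '\n' as a row terminator and ignores quoted newlines, and `read_rows_in_chunks` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <ios>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Read the stream chunk_size bytes at a time and return complete rows.
inline std::vector<std::string> read_rows_in_chunks(std::istream& in, std::size_t chunk_size) {
    std::vector<std::string> rows;
    std::string carry;  // unfinished row left over from earlier chunks
    std::string chunk(chunk_size, '\0');
    while (in) {
        in.read(&chunk[0], static_cast<std::streamsize>(chunk_size));
        const std::size_t got = static_cast<std::size_t>(in.gcount());
        if (got == 0) break;
        carry.append(chunk, 0, got);
        // Emit every complete row; keep the remainder for the next chunk.
        std::size_t start = 0;
        std::size_t newline = 0;
        while ((newline = carry.find('\n', start)) != std::string::npos) {
            rows.push_back(carry.substr(start, newline - start));
            start = newline + 1;
        }
        carry.erase(0, start);
    }
    if (!carry.empty()) rows.push_back(carry);  // last row without a trailing newline
    return rows;
}

// Usage example: a 4-byte chunk splits "long,field" across three reads,
// yet the row is returned in one piece.
inline void chunk_demo() {
    std::istringstream in("a,b\nlong,field\nx,y");
    const std::vector<std::string> rows = read_rows_in_chunks(in, 4);
    assert(rows.size() == 3 && rows[1] == "long,field");
}
```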
Size of a memory page in bytes.
Used by csv::internals::CSVFieldList when allocating blocks.
Definition at line 177 of file common.hpp.
Definition at line 24 of file basic_csv_parser.hpp.