Vince's CSV Parser
Contains the main CSV parsing algorithm and various utility functions.
#include <algorithm>
#include <array>
#include <fstream>
#include <memory>
#include <unordered_map>
#include <unordered_set>
#include <vector>
#include "../external/mio.hpp"
#include "basic_csv_parser_simd.hpp"
#include "col_names.hpp"
#include "common.hpp"
#include "csv_format.hpp"
#include "csv_row.hpp"
#include "row_deque.hpp"
Classes | |
| struct | csv::internals::GuessScore |
| struct | csv::internals::ResolvedFormat |
| class | csv::internals::IBasicCSVParser |
| Abstract base class which provides CSV parsing logic. | |
| class | csv::internals::StreamParser< TStream > |
| A class for parsing CSV data from any std::istream, including non-seekable sources such as pipes and decompression filters. | |
| class | csv::internals::MmapParser |
| Parser for memory-mapped files. | |
Namespaces | |
| namespace | csv |
| The all-encompassing namespace. | |
Typedefs | |
| using | csv::RowCollection = internals::ThreadSafeDeque< CSVRow > |
| Standard type for storing a collection of rows. | |
Functions | |
| template<typename OutArray , typename T = typename OutArray::type> | |
| CSV_CONST CONSTEXPR_17 OutArray | csv::internals::arrayToDefault (T &&value) |
| Helper constexpr function to initialize an array with all the elements set to value. | |
| GuessScore | csv::internals::calculate_score (csv::string_view head, const CSVFormat &format) |
| CSVGuessResult | csv::internals::guess_format (csv::string_view head, const std::vector< char > &delims={ ',', '|', '\t', ';', '^', '~' }) |
| Guess the delimiter used by a delimiter-separated values file. | |
| CSV_CONST CONSTEXPR_17 ParseFlagMap | csv::internals::make_parse_flags (char delimiter) |
| Create a vector v where each index i corresponds to the ASCII number for a character, and v[i + 128] labels that character according to the CSVReader::ParseFlags enum. | |
| CSV_CONST CONSTEXPR_17 ParseFlagMap | csv::internals::make_parse_flags (char delimiter, char quote_char) |
| Create a vector v where each index i corresponds to the ASCII number for a character, and v[i + 128] labels that character according to the CSVReader::ParseFlags enum. | |
| char | csv::internals::infer_char_for_flag (const ParseFlagMap &parse_flags, ParseFlags target, char fallback) noexcept |
| char | csv::internals::infer_delimiter (const ParseFlagMap &parse_flags) noexcept |
| char | csv::internals::infer_quote_char (const ParseFlagMap &parse_flags, char fallback='"') noexcept |
| CSV_CONST CONSTEXPR_17 WhitespaceMap | csv::internals::make_ws_flags (const char *ws_chars, size_t n_chars) |
| Create a vector v where each index i corresponds to the ASCII number for a character c, and v[i + 128] is true if c is a whitespace character. | |
| WhitespaceMap | csv::internals::make_ws_flags (const std::vector< char > &flags) |
| template<typename TStream , csv::enable_if_t< std::is_base_of< std::istream, TStream >::value, int > = 0> | |
| std::string | csv::internals::get_csv_head_stream (TStream &source) |
| Read the first 500 KB from a non-seekable stream source. | |
| std::pair< std::string, size_t > | csv::internals::get_csv_head_mmap (csv::string_view filename) |
| Read the first 500 KB of the named file using mmap. | |
| std::string | csv::internals::get_csv_head (csv::string_view filename) |
| Compatibility shim selecting stream on Emscripten and mmap otherwise. | |
| CSVGuessResult | csv::guess_format (csv::string_view filename, const std::vector< char > &delims={ ',', '|', '\t', ';', '^', '~' }) |
| Guess the delimiter, header row, and mode column count of a CSV file. | |
Variables | |
| constexpr const int | csv::internals::UNINITIALIZED_FIELD = -1 |
Definition in file basic_csv_parser.hpp.
| CSV_CONST CONSTEXPR_17 OutArray csv::internals::arrayToDefault | ( | T && | value | ) |
Helper constexpr function to initialize an array with all the elements set to value.
Definition at line 31 of file basic_csv_parser.hpp.
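To make the idea concrete, here is a minimal sketch of a constexpr fill helper of this kind. The name `filled_array` and its template parameters are hypothetical and chosen for illustration; this is not the library's implementation of `arrayToDefault`, and it assumes C++17 so that `std::array::operator[]` can be used in a constant expression.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for a helper like arrayToDefault: returns a
// std::array of N elements, each equal to `value`.
template <typename T, std::size_t N>
constexpr std::array<T, N> filled_array(const T& value) {
    std::array<T, N> out{};  // value-initialize first; required in constexpr code
    for (std::size_t i = 0; i < N; ++i) {
        out[i] = value;      // constexpr-assignable since C++17
    }
    return out;
}

// Built at compile time: a 256-entry table in which every slot holds -1.
constexpr auto kAllMinusOne = filled_array<int, 256>(-1);
static_assert(kAllMinusOne[0] == -1 && kAllMinusOne[255] == -1,
              "every slot holds the default value");
```

Because the table is a constant expression, a lookup array built this way costs nothing at run time.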
| GuessScore csv::internals::calculate_score | ( | csv::string_view | head, |
| const CSVFormat & | format | ||
| ) |
Definition at line 8 of file basic_csv_parser_guessing.cpp.
| std::string csv::internals::get_csv_head | ( | csv::string_view | filename | ) |
Compatibility shim selecting stream on Emscripten and mmap otherwise.
Definition at line 37 of file basic_csv_parser.cpp.
| std::pair< std::string, size_t > csv::internals::get_csv_head_mmap | ( | csv::string_view | filename | ) |
Read the first 500 KB of the named file using mmap.
Also returns the total file size so callers avoid a second mmap open.
Definition at line 24 of file basic_csv_parser.cpp.
| std::string csv::internals::get_csv_head_stream | ( | TStream & | source | ) |
Read the first 500 KB from a non-seekable stream source.
Definition at line 118 of file basic_csv_parser.hpp.
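Reading a bounded head from a non-seekable stream can be sketched as follows. The helper names `read_head` and `read_head_from_text` and the `max_bytes` parameter are hypothetical; this is an illustration of the technique under those assumptions, not the body of `get_csv_head_stream`.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: read at most `max_bytes` from the stream's current
// position. It never calls seekg/tellg, so it also works on pipes.
inline std::string read_head(std::istream& source, std::size_t max_bytes) {
    assert(max_bytes > 0);
    std::string head(max_bytes, '\0');
    source.read(&head[0], static_cast<std::streamsize>(max_bytes));
    // gcount() is the number of bytes the last read delivered; it is smaller
    // than the request when the stream ends early.
    head.resize(static_cast<std::size_t>(source.gcount()));
    return head;
}

// Convenience wrapper for in-memory text, handy for trying the helper out.
inline std::string read_head_from_text(const std::string& text, std::size_t max_bytes) {
    std::istringstream in(text);
    return read_head(in, max_bytes);
}
```

Note that the bytes consumed here cannot be re-read from a non-seekable source, so a caller that goes on to parse the stream has to keep the returned head and process it before reading further.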
| CSVGuessResult csv::internals::guess_format | ( | csv::string_view | head, |
| const std::vector< char > & | delims = { ',', '|', '\t', ';', '^', '~' } | ||
| ) |
Guess the delimiter used by a delimiter-separated values file.
For each candidate delimiter, find the most common row length (the mode). The delimiter with the highest score (row_length * count) wins.
Header detection: if the first row has at least as many columns as the mode, use row 0. Otherwise, use the first row whose length equals the mode.
Definition at line 69 of file basic_csv_parser_guessing.cpp.
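The scoring and header rules described above can be sketched as follows. This is a simplified illustration, not the library's code: the `DelimGuess` struct and all function names are hypothetical, quoted fields are ignored, empty lines are skipped (so row indices count non-empty rows), and ties between equally common row lengths go to the shorter length.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

struct DelimGuess {          // hypothetical result type for this sketch
    char delim;
    std::size_t mode_cols;   // most common number of columns
    std::size_t header_row;  // index of the row chosen as header
    std::size_t score;       // mode_cols * number of rows with that width
};

// Column count of every non-empty row for one delimiter (quotes not handled).
inline std::vector<std::size_t> row_widths(const std::string& head, char delim) {
    std::vector<std::size_t> widths;
    std::size_t cols = 1;
    bool has_data = false;
    for (char c : head) {
        if (c == '\n') {
            if (has_data) widths.push_back(cols);
            cols = 1;
            has_data = false;
        } else if (c != '\r') {
            has_data = true;
            if (c == delim) ++cols;
        }
    }
    if (has_data) widths.push_back(cols);  // last row without trailing newline
    return widths;
}

inline DelimGuess score_one(const std::string& head, char delim) {
    const std::vector<std::size_t> widths = row_widths(head, delim);
    std::map<std::size_t, std::size_t> freq;  // width -> number of rows
    for (std::size_t w : widths) ++freq[w];

    DelimGuess g{delim, 0, 0, 0};
    std::size_t mode_count = 0;
    for (const auto& kv : freq) {
        if (kv.second > mode_count) {
            g.mode_cols = kv.first;
            mode_count = kv.second;
        }
    }
    g.score = g.mode_cols * mode_count;

    // Header rule: keep row 0 if it is at least as wide as the mode,
    // otherwise take the first row whose width equals the mode.
    if (!widths.empty() && widths[0] < g.mode_cols) {
        for (std::size_t i = 0; i < widths.size(); ++i) {
            if (widths[i] == g.mode_cols) {
                g.header_row = i;
                break;
            }
        }
    }
    return g;
}

inline DelimGuess guess_delim(const std::string& head, const std::vector<char>& delims) {
    assert(!delims.empty());
    DelimGuess best = score_one(head, delims.front());
    for (std::size_t i = 1; i < delims.size(); ++i) {
        const DelimGuess g = score_one(head, delims[i]);
        if (g.score > best.score) best = g;  // earlier candidates win ties
    }
    return best;
}
```

For example, with the head `"title\na,b\n1,2\n3,4\n"` and candidates `,` and `;`, the comma yields row widths 1, 2, 2, 2, so the mode is 2 with a score of 6, and the header is row 1 because row 0 is narrower than the mode.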
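| char csv::internals::infer_char_for_flag | ( | const ParseFlagMap & | parse_flags, |
| ParseFlags | target, | ||
| char | fallback | ||
| ) | inline noexcept |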
Definition at line 75 of file basic_csv_parser.hpp.
| char csv::internals::infer_delimiter | ( | const ParseFlagMap & | parse_flags | ) | inline noexcept |
Definition at line 89 of file basic_csv_parser.hpp.
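| char csv::internals::infer_quote_char | ( | const ParseFlagMap & | parse_flags, |
| char | fallback = '"' | ||
| ) | inline noexcept |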
Definition at line 95 of file basic_csv_parser.hpp.
| CSV_CONST CONSTEXPR_17 ParseFlagMap csv::internals::make_parse_flags | ( | char | delimiter | ) |
Create a vector v where each index i corresponds to the ASCII number for a character, and v[i + 128] labels that character according to the CSVReader::ParseFlags enum.
Definition at line 57 of file basic_csv_parser.hpp.
| CSV_CONST CONSTEXPR_17 ParseFlagMap csv::internals::make_parse_flags | ( | char | delimiter, |
| char | quote_char | ||
| ) |
Create a vector v where each index i corresponds to the ASCII number for a character, and v[i + 128] labels that character according to the CSVReader::ParseFlags enum.
Definition at line 69 of file basic_csv_parser.hpp.
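The `+ 128` offset exists because `char` may be signed, so character values can range from -128 to 127 while table slots must be non-negative. A minimal sketch of such a table follows, together with a reverse lookup in the spirit of `infer_char_for_flag`. The `Flag` enum, `FlagMap` alias, and function names are hypothetical; the library's `ParseFlags` and `ParseFlagMap` may differ. C++17 is assumed.

```cpp
#include <array>
#include <cstddef>

// Hypothetical flag set for this sketch only.
enum class Flag : unsigned char { NotSpecial, Delimiter, Quote, Newline };

using FlagMap = std::array<Flag, 256>;

// Map a char to a slot in [0, 255]. Going through signed char keeps the
// result in range even on platforms where plain char is unsigned.
constexpr std::size_t slot(char c) {
    return static_cast<std::size_t>(static_cast<signed char>(c) + 128);
}

constexpr FlagMap make_flags(char delimiter, char quote_char) {
    FlagMap flags{};  // every slot starts as Flag::NotSpecial (value 0)
    flags[slot(delimiter)] = Flag::Delimiter;
    flags[slot(quote_char)] = Flag::Quote;
    flags[slot('\r')] = Flag::Newline;
    flags[slot('\n')] = Flag::Newline;
    return flags;
}

// Reverse lookup: return the first character (scanning from -128 upward)
// whose slot carries `target`, or `fallback` if no slot does.
constexpr char find_char_for(const FlagMap& flags, Flag target, char fallback) {
    for (int i = -128; i < 128; ++i) {
        if (flags[static_cast<std::size_t>(i + 128)] == target) {
            return static_cast<char>(i);
        }
    }
    return fallback;
}
```

With a table like this, classifying a byte during parsing is a single array read, `flags[slot(c)]`, with no branching on the delimiter or quote character.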
| CSV_CONST CONSTEXPR_17 WhitespaceMap csv::internals::make_ws_flags | ( | const char * | ws_chars, |
| size_t | n_chars | ||
| ) |
Create a vector v where each index i corresponds to the ASCII number for a character c, and v[i + 128] is true if c is a whitespace character.
Definition at line 103 of file basic_csv_parser.hpp.
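One plausible use of a whitespace table like this is trimming fields with a constant-time membership test per character. The sketch below illustrates that idea under stated assumptions: `WsMap`, `make_ws_table`, and `trim` are hypothetical names, and the trimming routine is an illustration rather than something this page documents.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>

using WsMap = std::array<bool, 256>;  // hypothetical stand-in for WhitespaceMap

// Build a table in which table[c + 128] is true for each whitespace char c.
inline WsMap make_ws_table(const char* ws_chars, std::size_t n_chars) {
    assert(ws_chars != nullptr || n_chars == 0);
    WsMap table{};  // all false
    for (std::size_t i = 0; i < n_chars; ++i) {
        table[static_cast<std::size_t>(static_cast<signed char>(ws_chars[i]) + 128)] = true;
    }
    return table;
}

// Strip leading and trailing characters that the table marks as whitespace.
inline std::string trim(const std::string& field, const WsMap& ws) {
    auto is_ws = [&ws](char c) {
        return ws[static_cast<std::size_t>(static_cast<signed char>(c) + 128)];
    };
    std::size_t begin = 0;
    std::size_t end = field.size();
    while (begin < end && is_ws(field[begin])) ++begin;
    while (end > begin && is_ws(field[end - 1])) --end;
    return field.substr(begin, end - begin);
}
```

Interior whitespace is left untouched, so a field such as `"  a b\t"` trims to `"a b"` when space and tab are the configured whitespace characters.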
| WhitespaceMap csv::internals::make_ws_flags | ( | const std::vector< char > & | flags | ) |
Definition at line 111 of file basic_csv_parser.hpp.
| constexpr const int csv::internals::UNINITIALIZED_FIELD = -1 | constexpr |
Definition at line 26 of file basic_csv_parser.hpp.