Vince's CSV Parser
The all-encompassing namespace.
Classes | |
| class | CSVField |
| Data type representing individual CSV values. | |
| struct | CSVFileInfo |
| Returned by get_file_info(). | |
| class | CSVFormat |
| Stores information about how to parse a CSV file. | |
| struct | CSVGuessResult |
| Stores the inferred format of a CSV file. | |
| class | CSVReader |
| Main class for parsing CSVs from files and in-memory sources. | |
| class | CSVRow |
| Data structure for representing CSV rows. | |
| class | CSVStat |
| Class for calculating statistics from CSV files and in-memory sources. | |
| class | DataFrame |
| class | DataFrameOptions |
| Allows configuration of DataFrame behavior. | |
| class | DataFrameRow |
| Proxy class that wraps a CSVRow and intercepts field access to check for edits. | |
| class | DelimWriter |
| Class for writing delimiter separated values files. | |
Typedefs | |
| using | RowCollection = internals::ThreadSafeDeque< CSVRow > |
| Standard type for storing collection of rows. | |
| using | string_view = nonstd::string_view |
| The string_view class used by this library. | |
| template<bool B, class T = void> | |
| using | enable_if_t = typename std::enable_if< B, T >::type |
| template<typename F , typename... Args> | |
| using | invoke_result_t = typename std::result_of< F(Args...)>::type |
Enumerations | |
| enum class | VariableColumnPolicy { THROW = -1 , IGNORE_ROW = 0 , KEEP = 1 , KEEP_NON_EMPTY = 2 } |
| Determines how to handle rows that are shorter or longer than the majority. | |
| enum class | ColumnNamePolicy { EXACT = 0 , CASE_INSENSITIVE = 1 } |
| Determines how column name lookups are performed. | |
| enum class | DataType { UNKNOWN = -1 , CSV_NULL , CSV_STRING , CSV_INT8 , CSV_INT16 , CSV_INT32 , CSV_INT64 , CSV_BIGINT , CSV_DOUBLE } |
| Enumerates the different CSV field types that are recognized by this library. | |
Functions | |
| CSVGuessResult | guess_format (csv::string_view filename, const std::vector< char > &delims={ ',', '|', '\t', ';', '^', '~' }) |
| Guess the delimiter, header row, and mode column count of a CSV file. | |
| CSVRow::operator std::vector< std::string > () const | |
| CSV_NON_NULL (2) CSVRow | |
| template<> | |
| std::string | CSVField::get< std::string > () |
| Retrieve this field's original string. | |
| template<> | |
| CONSTEXPR_14 csv::string_view | CSVField::get< csv::string_view > () |
| Retrieve a view over this field's string. | |
| template<> | |
| bool | CSVField::try_get< std::string > (std::string &out) noexcept |
| Non-throwing retrieval of field as std::string. | |
| template<> | |
| CONSTEXPR_14 bool | CSVField::try_get< csv::string_view > (csv::string_view &out) noexcept |
| Non-throwing retrieval of field as csv::string_view. | |
| template<class OutputStream , char Delim, char Quote, bool Flush> | |
| DelimWriter< OutputStream, Delim, Quote, Flush > & | operator<< (DelimWriter< OutputStream, Delim, Quote, Flush > &writer, const CSVRow &row) |
| template<class OutputStream , char Delim, char Quote, bool Flush, typename KeyType > | |
| DelimWriter< OutputStream, Delim, Quote, Flush > & | operator<< (DelimWriter< OutputStream, Delim, Quote, Flush > &writer, const DataFrameRow< KeyType > &row) |
| Overload for writing a DataFrameRow (respects sparse overlay edits). | |
Utility Functions | |
| std::unordered_map< std::string, DataType > | csv_data_types (const std::string &filename) |
| Useful for uploading CSV files to SQL databases. | |
| CSVFileInfo | get_file_info (const std::string &filename) |
| Get basic information about a CSV file. | |
| std::vector< std::string > | get_col_names (csv::string_view filename, const CSVFormat &format=CSVFormat::guess_csv()) |
| Get the column names of a CSV file using just the first 500KB. | |
| long long | get_col_pos (csv::string_view filename, csv::string_view col_name, const CSVFormat &format=CSVFormat::guess_csv()) |
| Find the position of a column in a CSV file, or return CSV_NOT_FOUND if the column does not exist. | |
Shorthand Parsing Functions | |
Convenience functions for parsing small strings | |
| CSVReader | parse (csv::string_view in, const CSVFormat &format=CSVFormat::guess_csv()) |
| Parse CSV from a string view, copying the input into an owned buffer. | |
| CSVReader | parse_unsafe (csv::string_view in, CSVFormat format=CSVFormat::guess_csv()) |
| Parse CSV from an in-memory view with zero copy. | |
| CSVReader | parse_no_header (csv::string_view in) |
| Parses a CSV string with no headers. | |
| CSVReader | operator""_csv (const char *in, size_t n) |
| Parse an RFC 4180 CSV string. | |
| CSVReader | operator""_csv_no_header (const char *in, size_t n) |
| A shorthand for csv::parse_no_header(). | |
Variables | |
| constexpr int | CSV_NOT_FOUND = -1 |
| Integer indicating a requested column wasn't found. | |
| constexpr unsigned | CHAR_OFFSET = std::numeric_limits<char>::is_signed ? 128 : 0 |
| Offset to convert char into array index. | |
CSV Writing | |
| template<class OutputStream , bool Flush = true> | |
| using | CSVWriter = DelimWriter< OutputStream, ',', '"', Flush> |
| An alias for csv::DelimWriter for writing standard CSV files. | |
| template<class OutputStream , bool Flush = true> | |
| using | TSVWriter = DelimWriter< OutputStream, '\t', '"', Flush> |
| Class for writing tab-separated values files. | |
| template<class OutputStream > | |
| CSVWriter< OutputStream > | make_csv_writer (OutputStream &out, bool quote_minimal=true) |
| Return a csv::CSVWriter over the output stream. | |
| template<class OutputStream > | |
| CSVWriter< OutputStream, false > | make_csv_writer_buffered (OutputStream &out, bool quote_minimal=true) |
| Return a buffered csv::CSVWriter over the output stream (does not auto flush) | |
| template<class OutputStream > | |
| TSVWriter< OutputStream > | make_tsv_writer (OutputStream &out, bool quote_minimal=true) |
| Return a csv::TSVWriter over the output stream. | |
| template<class OutputStream > | |
| TSVWriter< OutputStream, false > | make_tsv_writer_buffered (OutputStream &out, bool quote_minimal=true) |
| Return a buffered csv::TSVWriter over the output stream (does not auto flush) | |
| using csv::CSVWriter = typedef DelimWriter<OutputStream, ',', '"', Flush> |
An alias for csv::DelimWriter for writing standard CSV files.
Use csv::make_csv_writer() to instantiate this class over an actual output stream.
Definition at line 509 of file csv_writer.hpp.
| using csv::enable_if_t = typedef typename std::enable_if<B, T>::type |
Definition at line 158 of file common.hpp.
| using csv::invoke_result_t = typedef typename std::result_of<F(Args...)>::type |
Definition at line 169 of file common.hpp.
| using csv::RowCollection = typedef internals::ThreadSafeDeque<CSVRow> |
Standard type for storing collection of rows.
Definition at line 168 of file basic_csv_parser.hpp.
| using csv::string_view = typedef nonstd::string_view |
The string_view class used by this library.
Definition at line 135 of file common.hpp.
| using csv::TSVWriter = typedef DelimWriter<OutputStream, '\t', '"', Flush> |
Class for writing tab-separated values files.
Use csv::make_tsv_writer() to instantiate this class over an actual output stream.
Definition at line 520 of file csv_writer.hpp.
| enum class csv::ColumnNamePolicy |
Determines how column name lookups are performed.
| Enumerator | |
|---|---|
| EXACT | Case-sensitive match (default) |
| CASE_INSENSITIVE | Case-insensitive match. |
Definition at line 29 of file csv_format.hpp.
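As an illustration of the two csv::ColumnNamePolicy lookup modes, the following self-contained sketch searches a list of column names. It does not call the library: ColumnNamePolicy and CSV_NOT_FOUND are local mirrors of the names listed on this page, and to_lower() and find_column() are hypothetical helpers. The case folding is ASCII-only, which is an assumption; the library's own folding rules are not described here.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Local mirrors of names listed on this page (illustration only).
enum class ColumnNamePolicy { EXACT = 0, CASE_INSENSITIVE = 1 };
constexpr int CSV_NOT_FOUND = -1;

// Hypothetical helper: ASCII-only lowercase copy of a string.
std::string to_lower(std::string s) {
    for (char& c : s) {
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    }
    return s;
}

// Hypothetical helper: position of `wanted` in `names`, or CSV_NOT_FOUND.
long long find_column(const std::vector<std::string>& names,
                      const std::string& wanted,
                      ColumnNamePolicy policy) {
    for (std::size_t i = 0; i < names.size(); ++i) {
        const bool match = (policy == ColumnNamePolicy::EXACT)
            ? names[i] == wanted                        // case-sensitive
            : to_lower(names[i]) == to_lower(wanted);   // case-insensitive
        if (match) return static_cast<long long>(i);
    }
    return CSV_NOT_FOUND;
}
```

With the names {"Id", "Name"}, looking up "name" yields CSV_NOT_FOUND under EXACT and 1 under CASE_INSENSITIVE.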
| enum class csv::DataType |
Enumerates the different CSV field types that are recognized by this library.
Definition at line 20 of file data_type.hpp.
| enum class csv::VariableColumnPolicy |
Determines how to handle rows that are shorter or longer than the majority.
Definition at line 21 of file csv_format.hpp.
| std::unordered_map< std::string, DataType > csv::csv_data_types | ( | const std::string & | filename | ) |
Useful for uploading CSV files to SQL databases.
Return a data type for each column such that every value in a column can be converted to the corresponding data type without data loss.
Definition at line 228 of file csv_stat.cpp.
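The idea of choosing one type per column that can hold every value can be sketched as follows. This is not the library's implementation: DataType is a local mirror of the enumerators listed on this page, common_type() is a hypothetical helper, and the promotion rules (nulls are ignored, any string forces CSV_STRING, numeric types widen in enumerator order) are assumptions made for illustration.

```cpp
#include <vector>

// Local mirror of the enumerators listed for csv::DataType (illustration only).
enum class DataType {
    UNKNOWN = -1, CSV_NULL, CSV_STRING, CSV_INT8, CSV_INT16,
    CSV_INT32, CSV_INT64, CSV_BIGINT, CSV_DOUBLE
};

// Hypothetical helper: one type able to represent every observed value.
// Assumed rules: UNKNOWN and CSV_NULL are skipped, a single CSV_STRING
// forces CSV_STRING, and numeric types widen in enumerator order.
DataType common_type(const std::vector<DataType>& observed) {
    DataType result = DataType::CSV_NULL;
    for (DataType t : observed) {
        if (t == DataType::UNKNOWN || t == DataType::CSV_NULL) continue;
        if (t == DataType::CSV_STRING) return DataType::CSV_STRING;
        if (static_cast<int>(t) > static_cast<int>(result)) result = t;
    }
    return result;
}
```

Under these assumed rules, a column containing CSV_INT8, CSV_INT32, and CSV_NULL values maps to CSV_INT32, while any column containing a CSV_STRING value maps to CSV_STRING.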
| csv::CSV_NON_NULL | ( | 2 | ) |
Definition at line 159 of file csv_row.cpp.
| CONSTEXPR_14 csv::string_view csv::CSVField::get< csv::string_view > | ( | ) |
Retrieve a view over this field's string.
Definition at line 465 of file csv_row.hpp.
| std::string csv::CSVField::get< std::string > | ( | ) | inline |
Retrieve this field's original string.
Definition at line 455 of file csv_row.hpp.
| CONSTEXPR_14 bool csv::CSVField::try_get< csv::string_view > | ( | csv::string_view & | out | ) | noexcept |
Non-throwing retrieval of field as csv::string_view.
Definition at line 487 of file csv_row.hpp.
| bool csv::CSVField::try_get< std::string > | ( | std::string & | out | ) | inline, noexcept |
Non-throwing retrieval of field as std::string.
Definition at line 480 of file csv_row.hpp.
| csv::CSVRow::operator std::vector< std::string > | ( | ) | const |
Definition at line 58 of file csv_row.cpp.
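| std::vector< std::string > csv::get_col_names | ( | csv::string_view filename, const CSVFormat & format = CSVFormat::guess_csv() | ) | inline |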
Get the column names of a CSV file using just the first 500KB.
Definition at line 109 of file csv_utility.hpp.
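| long long csv::get_col_pos | ( | csv::string_view filename, csv::string_view col_name, const CSVFormat & format = CSVFormat::guess_csv() | ) | inline |
Find the position of a column in a CSV file, or return CSV_NOT_FOUND if the column does not exist.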
Definition at line 117 of file csv_utility.hpp.
| CSVFileInfo csv::get_file_info | ( | const std::string & | filename | ) | inline |
Get basic information about a CSV file.
Definition at line 94 of file csv_utility.hpp.
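| CSVGuessResult csv::guess_format | ( | csv::string_view filename, const std::vector< char > & delims = { ',', '|', '\t', ';', '^', '~' } | ) | inline |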
Guess the delimiter, header row, and mode column count of a CSV file.
**Heuristic:** For each candidate delimiter, calculate a score based on the most common row length (the mode). The delimiter with the highest score wins.
**Header Detection:**
- If the first row has at least as many columns as the mode, it is treated as the header.
- Otherwise, the first row whose length equals the mode is treated as the header.
This approach handles:
- Headers with trailing delimiters or optional columns (wider than data rows)
- Comment lines before the actual header (first row shorter than the mode)
- Standard CSVs where the first row is the header
Definition at line 161 of file basic_csv_parser.hpp.
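The header-detection rules described for guess_format can be sketched on a plain list of row lengths. This is a minimal illustration, not the library's code: mode_row_length() and guess_header_row() are hypothetical helpers, the delimiter scoring step is left out because its formula is not given here, and breaking ties in favor of the smaller length is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical helper: most common row length (the mode).
// Assumption: ties are broken in favor of the smaller length.
std::size_t mode_row_length(const std::vector<std::size_t>& row_lengths) {
    assert(!row_lengths.empty());
    std::map<std::size_t, std::size_t> counts;  // length -> occurrences
    for (std::size_t len : row_lengths) ++counts[len];

    std::size_t mode = 0, best = 0;
    for (const auto& kv : counts) {  // ascending by length
        if (kv.second > best) { best = kv.second; mode = kv.first; }
    }
    return mode;
}

// Hypothetical helper: index of the header row under the two rules above.
std::size_t guess_header_row(const std::vector<std::size_t>& row_lengths) {
    const std::size_t mode = mode_row_length(row_lengths);
    // Rule 1: a first row at least as wide as the mode is the header.
    if (row_lengths.front() >= mode) return 0;
    // Rule 2: otherwise, the first row whose length equals the mode.
    for (std::size_t i = 0; i < row_lengths.size(); ++i) {
        if (row_lengths[i] == mode) return i;
    }
    return 0;  // not reached: the mode always occurs at least once
}
```

For row lengths {1, 1, 3, 3, 3, 3} (two short comment lines followed by three-column rows), the mode is 3 and guess_header_row() returns 2.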
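| CSVWriter< OutputStream > csv::make_csv_writer | ( | OutputStream & out, bool quote_minimal = true | ) | inline |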
Return a csv::CSVWriter over the output stream.
Definition at line 524 of file csv_writer.hpp.
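| CSVWriter< OutputStream, false > csv::make_csv_writer_buffered | ( | OutputStream & out, bool quote_minimal = true | ) | inline |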
Return a buffered csv::CSVWriter over the output stream (does not auto flush)
Definition at line 530 of file csv_writer.hpp.
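| TSVWriter< OutputStream > csv::make_tsv_writer | ( | OutputStream & out, bool quote_minimal = true | ) | inline |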
Return a csv::TSVWriter over the output stream.
Definition at line 536 of file csv_writer.hpp.
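| TSVWriter< OutputStream, false > csv::make_tsv_writer_buffered | ( | OutputStream & out, bool quote_minimal = true | ) | inline |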
Return a buffered csv::TSVWriter over the output stream (does not auto flush)
Definition at line 542 of file csv_writer.hpp.
| CSVReader csv::operator""_csv | ( | const char * in, size_t n | ) | inline |
Parse an RFC 4180 CSV string.
String literals have static storage duration, so the zero-copy path is safe here.
Definition at line 71 of file csv_utility.hpp.
| CSVReader csv::operator""_csv_no_header | ( | const char * in, size_t n | ) | inline |
A shorthand for csv::parse_no_header().
String literals have static storage duration, so the zero-copy path is safe here.
Definition at line 80 of file csv_utility.hpp.
| DelimWriter< OutputStream, Delim, Quote, Flush > & csv::operator<< | ( | DelimWriter< OutputStream, Delim, Quote, Flush > & | writer, |
| const CSVRow & | row | ||
| ) |
Definition at line 1 of file csv_writer_extensions.hpp.
| DelimWriter< OutputStream, Delim, Quote, Flush > & csv::operator<< | ( | DelimWriter< OutputStream, Delim, Quote, Flush > & | writer, |
| const DataFrameRow< KeyType > & | row | ||
| ) |
Overload for writing a DataFrameRow (respects sparse overlay edits).
Definition at line 1 of file csv_writer_extensions.hpp.
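| CSVReader csv::parse | ( | csv::string_view in, const CSVFormat & format = CSVFormat::guess_csv() | ) | inline |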
Parse CSV from a string view, copying the input into an owned buffer.
Safe for any string_view regardless of the caller's ownership of the underlying memory.
Definition at line 37 of file csv_utility.hpp.
| CSVReader csv::parse_no_header | ( | csv::string_view | in | ) | inline |
Parses a CSV string with no headers.
Definition at line 57 of file csv_utility.hpp.
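| CSVReader csv::parse_unsafe | ( | csv::string_view in, CSVFormat format = CSVFormat::guess_csv() | ) | inline |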
Parse CSV from an in-memory view with zero copy.
WARNING: Non-owning path. The caller must ensure that the memory backing the in argument remains valid and immutable for as long as the reader may request additional rows from the source stream.
Rows already obtained from the reader remain valid, but unread rows still depend on the source view staying alive.
Definition at line 51 of file csv_utility.hpp.
| constexpr unsigned csv::CHAR_OFFSET = std::numeric_limits<char>::is_signed ? 128 : 0 |
Offset to convert char into array index.
Definition at line 299 of file common.hpp.
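The following sketch shows one way an offset like csv::CHAR_OFFSET can turn any char into a valid index for a 256-entry lookup table, whether char is signed or unsigned on the target platform. The constant is redefined locally with the same expression listed on this page; char_index(), make_delim_table(), and is_delim() are hypothetical helpers, and an 8-bit char is assumed.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <limits>

// Same expression as listed for csv::CHAR_OFFSET, redefined locally.
constexpr unsigned CHAR_OFFSET = std::numeric_limits<char>::is_signed ? 128 : 0;

// Hypothetical helper: map a char to an index in [0, 255].
// Signed char spans [-128, 127], so adding 128 shifts it to [0, 255];
// unsigned char already spans [0, 255] and the offset is 0.
constexpr std::size_t char_index(char c) {
    return static_cast<std::size_t>(static_cast<int>(c) +
                                    static_cast<int>(CHAR_OFFSET));
}

// Hypothetical helper: table with a single delimiter character flagged.
std::array<bool, 256> make_delim_table(char delim) {
    std::array<bool, 256> table{};  // all entries start as false
    table[char_index(delim)] = true;
    return table;
}

// Hypothetical helper: constant-time delimiter check via the table.
bool is_delim(const std::array<bool, 256>& table, char c) {
    assert(char_index(c) < table.size());  // guaranteed by the offset
    return table[char_index(c)];
}
```

The design choice is that a single addition replaces a branch on the sign of the character, so bytes above 0x7F never produce a negative index.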
| constexpr int csv::CSV_NOT_FOUND = -1 |
Integer indicating a requested column wasn't found.
Definition at line 296 of file common.hpp.