Vince's CSV Parser
The all-encompassing namespace.
Classes | |
| class | CSVField |
| Data type representing individual CSV values. | |
| struct | CSVFileInfo |
| Returned by get_file_info(). | |
| class | CSVFormat |
| Stores information about how to parse a CSV file. | |
| struct | CSVGuessResult |
| Stores the inferred format of a CSV file. | |
| class | CSVReader |
| Main class for parsing CSVs from files and in-memory sources. | |
| class | CSVRow |
| Data structure for representing CSV rows. | |
| class | DataFrame |
| class | DataFrameCell |
| class | DataFrameColumn |
| Lightweight non-owning view over one DataFrame column. | |
| class | DataFrameExecutor |
| Persistent execution backend for batch-oriented DataFrame column work. | |
| class | DataFrameOptions |
| Allows configuration of DataFrame behavior. | |
| class | DataFrameRow |
| Proxy class that wraps a CSVRow and intercepts field access to check for edits. | |
| class | DelimWriter |
| Class for writing delimiter separated values files. More... | |
| struct | is_invocable_returning |
| struct | is_invocable_returning_impl |
| struct | is_invocable_returning_impl< F, ReturnType, void_t< invoke_result_t< F, Args... > >, Args... > |
| struct | RowOverlay |
| struct | RowOverlaySlot |
Typedefs | |
| using | string_view = std::string_view |
| The string_view class used by this library. | |
| template<bool B, class T = void> | |
| using | enable_if_t = typename std::enable_if< B, T >::type |
| template<typename F , typename... Args> | |
| using | invoke_result_t = typename std::invoke_result< F, Args... >::type |
| template<typename... Ts> | |
| using | void_t = void |
Enumerations | |
| enum class | VariableColumnPolicy { THROW = -1 , IGNORE_ROW = 0 , KEEP = 1 , KEEP_NON_EMPTY = 2 } |
| Determines how to handle rows that are shorter or longer than the majority. | |
| enum class | ColumnNamePolicy { EXACT = 0 , CASE_INSENSITIVE = 1 } |
| Determines how column name lookups are performed. | |
| enum class | CSVConversionError { None = 0 , NotANumber , Overflow , FloatToInt , NegativeToUnsigned } |
| Non-throwing CSVField conversion result. | |
| enum class | DataType { UNKNOWN = classify_scalar::scalar_invalid , CSV_NULL = classify_scalar::scalar_null , CSV_STRING = classify_scalar::scalar_string , CSV_BOOL = classify_scalar::scalar_bool , CSV_INT8 = classify_scalar::scalar_int8 , CSV_INT16 = classify_scalar::scalar_int16 , CSV_INT32 = classify_scalar::scalar_int32 , CSV_INT64 = classify_scalar::scalar_int64 , CSV_BIGINT = classify_scalar::scalar_bigint , CSV_DOUBLE = classify_scalar::scalar_float , CSV_TIMESTAMP = classify_scalar::scalar_timestamp , scalar_custom_begin = classify_scalar::scalar_custom_begin - 1 } |
| Enumerates the different CSV field types recognized by this library. | |
Functions | |
| CSVRow::operator std::vector< std::string > () const | |
| CSV_NON_NULL (2) CSVRow | |
| const char * | csv_conversion_error_message (CSVConversionError error) noexcept |
| Return a stable human-readable description for a CSVConversionError. | |
| template<> | |
| std::string | CSVField::get< std::string > () |
| Retrieve this field's original string. | |
| template<> | |
| CONSTEXPR_14 csv::string_view | CSVField::get< csv::string_view > () |
| Retrieve a view over this field's string. | |
| template<> | |
| bool | CSVField::try_get< std::string > (std::string &out) noexcept |
| Non-throwing retrieval of field as std::string. | |
| template<> | |
| CONSTEXPR_14 bool | CSVField::try_get< csv::string_view > (csv::string_view &out) noexcept |
| Non-throwing retrieval of field as csv::string_view. | |
Utility Functions | |
| std::unordered_map< std::string, DataType > | csv_data_types (CSVReader &reader) |
| Infer SQL-friendly column data types from an existing CSVReader. | |
| template<typename... ReaderArgs, csv::enable_if_t< std::is_constructible< CSVReader, ReaderArgs... >::value, int > = 0> | |
| std::unordered_map< std::string, DataType > | csv_data_types (ReaderArgs &&... reader_args) |
| Infer SQL-friendly column data types from any CSVReader constructor input. | |
| template<typename State , typename Fn > | |
| void | chunk_parallel_apply (CSVReader &reader, DataFrameExecutor &executor, std::vector< State > &states, Fn &&fn, size_t chunk_size=50000) |
| Apply a per-column batch function over a CSVReader using a reusable executor. | |
| template<typename State , typename Fn > | |
| void | chunk_parallel_apply (CSVReader &reader, std::vector< State > &states, Fn &&fn, size_t chunk_size=50000) |
| Apply a per-column batch function over a CSVReader with a temporary executor. | |
| CSVFileInfo | get_file_info (const std::string &filename) |
| Get basic information about a CSV file. | |
| std::vector< std::string > | get_col_names (csv::string_view filename, const CSVFormat &format=CSVFormat::guess_csv()) |
| Get the column names of a CSV file using just the first 500KB. | |
| long long | get_col_pos (csv::string_view filename, csv::string_view col_name, const CSVFormat &format=CSVFormat::guess_csv()) |
| Find the position of a column in a CSV file, or return CSV_NOT_FOUND if the column does not exist. | |
Shorthand Parsing Functions | |
Convenience functions for parsing small strings | |
| CSVReader | parse (csv::string_view in, const CSVFormat &format=CSVFormat::guess_csv()) |
| Parse CSV from a string view, copying the input into an owned buffer. | |
| CSVReader | parse_unsafe (csv::string_view in, CSVFormat format=CSVFormat::guess_csv()) |
| Parse CSV from an in-memory view with zero copy. | |
| CSVReader | parse_no_header (csv::string_view in) |
| Parses a CSV string with no headers. | |
| CSVReader | operator""_csv (const char *in, size_t n) |
| Parse an RFC 4180 CSV string. | |
| CSVReader | operator""_csv_no_header (const char *in, size_t n) |
| A shorthand for csv::parse_no_header(). | |
Variables | |
| constexpr int | CSV_NOT_FOUND = -1 |
| Integer indicating a requested column wasn't found. | |
| constexpr unsigned | CHAR_OFFSET = std::numeric_limits<char>::is_signed ? 128 : 0 |
| Offset to convert char into array index. | |
CSV Writing | |
| template<class OutputStream > | |
| using | CSVWriter = DelimWriter< OutputStream, ',', '"'> |
| An alias for csv::DelimWriter for writing standard CSV files. | |
| template<class OutputStream > | |
| using | TSVWriter = DelimWriter< OutputStream, '\t', '"'> |
| Class for writing tab-separated values files. | |
| template<class OutputStream > | |
| CSVWriter< OutputStream > | make_csv_writer (OutputStream &out, bool quote_minimal=true) |
| Return a csv::CSVWriter over the output stream. | |
| template<class OutputStream > | |
| TSVWriter< OutputStream > | make_tsv_writer (OutputStream &out, bool quote_minimal=true) |
| Return a csv::TSVWriter over the output stream. | |
| using csv::CSVWriter = typedef DelimWriter<OutputStream, ',', '"'> |
An alias for csv::DelimWriter for writing standard CSV files.
Use csv::make_csv_writer() to instantiate this class over an actual output stream.
Definition at line 664 of file csv_writer.hpp.
| using csv::enable_if_t = typedef typename std::enable_if<B, T>::type |
Definition at line 202 of file common.hpp.
| using csv::invoke_result_t = typedef typename std::invoke_result<F, Args...>::type |
Definition at line 215 of file common.hpp.
| using csv::string_view = typedef std::string_view |
The string_view class used by this library.
Definition at line 174 of file common.hpp.
| using csv::TSVWriter = typedef DelimWriter<OutputStream, '\t', '"'> |
Class for writing tab-separated values files.
Use csv::make_tsv_writer() to instantiate this class over an actual output stream.
Definition at line 675 of file csv_writer.hpp.
| using csv::void_t = typedef void |
Definition at line 222 of file common.hpp.
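The aliases enable_if_t, invoke_result_t, and void_t are the usual building blocks of the detection idiom, and the is_invocable_returning_impl specialization listed under Classes has that shape. The following is a minimal sketch of how such a trait can be assembled, not the library's own code. Assumptions: everything lives in a hypothetical `sketch` namespace, the standard C++17 `std::void_t` and `std::invoke_result_t` stand in for the library aliases, and the use of `std::is_same` for the return-type check is a guess.

```cpp
#include <type_traits>

namespace sketch {
    // Primary template: selected when F cannot be invoked with Args...
    template<typename F, typename ReturnType, typename = void, typename... Args>
    struct is_invocable_returning_impl : std::false_type {};

    // Specialization: only viable when std::invoke_result_t<F, Args...> is well-formed.
    // It then checks the result type against ReturnType (std::is_same is an assumption).
    template<typename F, typename ReturnType, typename... Args>
    struct is_invocable_returning_impl<
        F, ReturnType, std::void_t<std::invoke_result_t<F, Args...>>, Args...>
        : std::is_same<std::invoke_result_t<F, Args...>, ReturnType> {};

    // Public trait: pins the detection slot to void so the specialization can match.
    template<typename F, typename ReturnType, typename... Args>
    struct is_invocable_returning
        : is_invocable_returning_impl<F, ReturnType, void, Args...> {};
}

// Hypothetical function used only to exercise the trait.
inline int twice(int x) { return 2 * x; }

// Callable with one int and returns int.
static_assert(sketch::is_invocable_returning<decltype(&twice), int, int>::value,
              "twice(int) returns int");
// Wrong return type.
static_assert(!sketch::is_invocable_returning<decltype(&twice), void, int>::value,
              "twice(int) does not return void");
// Wrong arity: the invoke_result_t substitution fails and the primary template is used.
static_assert(!sketch::is_invocable_returning<decltype(&twice), int, int, int>::value,
              "twice cannot be called with two ints");
```

A trait of this kind is typically combined with enable_if_t to accept only callbacks with the expected signature.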
| enum class csv::ColumnNamePolicy | strong |
Determines how column name lookups are performed.
| Enumerator | |
|---|---|
| EXACT | Case-sensitive match (default) |
| CASE_INSENSITIVE | Case-insensitive match. |
Definition at line 34 of file csv_format.hpp.
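To make the two lookup modes concrete, here is a small self-contained sketch of a name lookup that honors either policy and returns -1 (the documented value of CSV_NOT_FOUND) when nothing matches. It does not show how the library resolves names internally: `sketch::NamePolicy`, `sketch::NOT_FOUND`, `to_lower`, and `find_column` are hypothetical names, and ASCII-only lowercasing is an assumption.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

namespace sketch {
    // Illustrative mirror of the enumerators listed for csv::ColumnNamePolicy.
    enum class NamePolicy { EXACT = 0, CASE_INSENSITIVE = 1 };

    // Same sentinel value as the documented csv::CSV_NOT_FOUND.
    constexpr int NOT_FOUND = -1;

    // Lowercase a copy of the input (ASCII only, by assumption).
    inline std::string to_lower(std::string_view s) {
        std::string out(s);
        std::transform(out.begin(), out.end(), out.begin(),
            [](unsigned char ch) { return static_cast<char>(std::tolower(ch)); });
        return out;
    }

    // Return the zero-based position of `wanted` in `names`, or NOT_FOUND.
    inline long long find_column(const std::vector<std::string>& names,
                                 std::string_view wanted,
                                 NamePolicy policy) {
        for (std::size_t i = 0; i < names.size(); ++i) {
            const bool match = (policy == NamePolicy::EXACT)
                ? (names[i] == wanted)
                : (to_lower(names[i]) == to_lower(wanted));
            if (match) return static_cast<long long>(i);
        }
        return NOT_FOUND;
    }
}
```

With names `{"Name", "Age"}`, looking up `"age"` fails under EXACT and resolves to position 1 under CASE_INSENSITIVE.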
| enum class csv::CSVConversionError | strong |
Non-throwing CSVField conversion result.
Returned by CSVField::as() inside std::expected, and used internally by CSVField::get() and CSVField::try_get() to keep throwing and non-throwing conversions on the same rules.
Definition at line 72 of file csv_row.hpp.
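One common way to keep throwing and non-throwing conversions on the same rules is to put every rule in a non-throwing core that returns an error code, and to let the throwing entry point wrap it. The sketch below illustrates that layering for an unsigned 32-bit target. It is not the library's conversion code: the `sketch` namespace, `try_get_uint32`, `get_uint32`, and the exact classification rules are assumptions, and only the enumerator names are taken from the listing above.

```cpp
#include <charconv>
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <string_view>
#include <system_error>

namespace sketch {
    // Illustrative copy of the enumerator names listed for csv::CSVConversionError.
    enum class ConversionError { None = 0, NotANumber, Overflow, FloatToInt, NegativeToUnsigned };

    // Non-throwing core: every conversion rule lives here.
    inline ConversionError try_get_uint32(std::string_view text, std::uint32_t& out) noexcept {
        if (text.empty()) return ConversionError::NotANumber;

        long long value = 0;
        const char* first = text.data();
        const char* last = first + text.size();
        const auto res = std::from_chars(first, last, value);

        if (res.ec == std::errc::invalid_argument) return ConversionError::NotANumber;
        if (res.ec == std::errc::result_out_of_range) return ConversionError::Overflow;

        if (res.ptr != last) {
            // Leftover input: a '.' followed only by digits is treated as a float,
            // anything else as not a number.
            if (*res.ptr != '.') return ConversionError::NotANumber;
            for (const char* p = res.ptr + 1; p != last; ++p) {
                if (*p < '0' || *p > '9') return ConversionError::NotANumber;
            }
            return ConversionError::FloatToInt;
        }

        if (value < 0) return ConversionError::NegativeToUnsigned;
        if (value > std::numeric_limits<std::uint32_t>::max()) return ConversionError::Overflow;

        out = static_cast<std::uint32_t>(value);
        return ConversionError::None;
    }

    // Throwing wrapper: no rules of its own, it only turns the error code into an exception.
    inline std::uint32_t get_uint32(std::string_view text) {
        std::uint32_t out = 0;
        if (try_get_uint32(text, out) != ConversionError::None) {
            throw std::runtime_error("conversion failed");
        }
        return out;
    }
}
```

Because the wrapper adds no logic, any input rejected by `try_get_uint32` is rejected by `get_uint32` for the same reason.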
| enum class csv::DataType | strong |
Enumerates the different CSV field types recognized by this library.
Definition at line 14 of file data_type.hpp.
| enum class csv::VariableColumnPolicy | strong |
Determines how to handle rows that are shorter or longer than the majority.
Definition at line 26 of file csv_format.hpp.
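| template<typename State , typename Fn > | |
| void csv::chunk_parallel_apply (CSVReader &reader, DataFrameExecutor &executor, std::vector< State > &states, Fn &&fn, size_t chunk_size=50000) | inline |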
Apply a per-column batch function over a CSVReader using a reusable executor.
Reads the source in chunks, promotes each chunk into a temporary DataFrame, and applies fn(column, states[column.index()]).
Callbacks may treat each batch DataFrame as read-mostly, and sparse overlay cell edits are synchronized at row granularity. If you need more involved batch orchestration, use CSVReader::read_chunk() and construct a batch-scoped DataFrame yourself.
| Exceptions | |
|---|---|
| std::invalid_argument | if chunk_size == 0 |
Definition at line 139 of file csv_utility.hpp.
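The control flow described above (read a chunk, visit each column of the chunk, call fn with that column's persistent state) can be sketched without the library. This is a sequential illustration only: it does not use worker threads, a DataFrame, or a CSVReader. `sketch::ColumnBatch` and `sketch::chunk_apply` are hypothetical names, and the input is a table already held in memory.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

namespace sketch {
    // Hypothetical stand-in for a column view: its index plus the values of one chunk.
    struct ColumnBatch {
        std::size_t idx;
        std::vector<std::string> values;
        std::size_t index() const { return idx; }
    };

    // Sequential sketch: walks `rows` in chunks of `chunk_size` and calls
    // fn(column, states[column.index()]) once per column per chunk.
    template<typename State, typename Fn>
    void chunk_apply(const std::vector<std::vector<std::string>>& rows,
                     std::vector<State>& states, Fn&& fn, std::size_t chunk_size) {
        if (chunk_size == 0) throw std::invalid_argument("chunk_size must be > 0");

        for (std::size_t start = 0; start < rows.size(); start += chunk_size) {
            const std::size_t stop = std::min(rows.size(), start + chunk_size);
            for (std::size_t c = 0; c < states.size(); ++c) {
                ColumnBatch column{c, {}};
                // Gather this column's values for the current chunk; short rows are skipped.
                for (std::size_t r = start; r < stop; ++r) {
                    if (c < rows[r].size()) column.values.push_back(rows[r][c]);
                }
                fn(column, states[column.index()]);
            }
        }
    }
}
```

Each state is touched only by calls for its own column, which is what makes per-column work straightforward to distribute across workers in a parallel version.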
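| template<typename State , typename Fn > | |
| void csv::chunk_parallel_apply (CSVReader &reader, std::vector< State > &states, Fn &&fn, size_t chunk_size=50000) | inline |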
Apply a per-column batch function over a CSVReader with a temporary executor.
This is the convenience overload for the common case where callers do not need to reuse worker threads across multiple reader pipelines.
Definition at line 163 of file csv_utility.hpp.
| const char * csv::csv_conversion_error_message (CSVConversionError error) | inline noexcept |
Return a stable human-readable description for a CSVConversionError.
Definition at line 102 of file csv_row.hpp.
| std::unordered_map< std::string, DataType > csv::csv_data_types (CSVReader &reader) |
Infer SQL-friendly column data types from an existing CSVReader.
This consumes rows from reader using the chunked ETL path and returns one inferred DataType per column name.
Definition at line 5 of file csv_utility.cpp.
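| template<typename... ReaderArgs, csv::enable_if_t< std::is_constructible< CSVReader, ReaderArgs... >::value, int > = 0> | |
| std::unordered_map< std::string, DataType > csv::csv_data_types (ReaderArgs &&... reader_args) | inline |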
Infer SQL-friendly column data types from any CSVReader constructor input.
This convenience overload forwards its arguments directly to CSVReader, so it supports filenames, std::istream sources, owned streams, and custom CSVFormat combinations without additional wrapper code.
Definition at line 121 of file csv_utility.hpp.
| csv::CSV_NON_NULL (2) |
Definition at line 207 of file csv_row.cpp.
| CONSTEXPR_14 csv::string_view csv::CSVField::get< csv::string_view > () |
Retrieve a view over this field's string.
Definition at line 767 of file csv_row.hpp.
| template<> | |
| std::string csv::CSVField::get< std::string > () | inline |
Retrieve this field's original string.
Definition at line 757 of file csv_row.hpp.
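| template<> | |
| CONSTEXPR_14 bool csv::CSVField::try_get< csv::string_view > (csv::string_view &out) | noexcept |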
Non-throwing retrieval of field as csv::string_view.
Definition at line 789 of file csv_row.hpp.
| template<> | |
| bool csv::CSVField::try_get< std::string > (std::string &out) | inline noexcept |
Non-throwing retrieval of field as std::string.
Definition at line 782 of file csv_row.hpp.
| csv::CSVRow::operator std::vector< std::string > () const |
Definition at line 58 of file csv_row.cpp.
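| std::vector< std::string > csv::get_col_names (csv::string_view filename, const CSVFormat &format=CSVFormat::guess_csv()) | inline |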
Get the column names of a CSV file using just the first 500KB.
Definition at line 197 of file csv_utility.hpp.
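| long long csv::get_col_pos (csv::string_view filename, csv::string_view col_name, const CSVFormat &format=CSVFormat::guess_csv()) | inline |
Find the position of a column in a CSV file, or return CSV_NOT_FOUND if the column does not exist.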
Definition at line 205 of file csv_utility.hpp.
| CSVFileInfo csv::get_file_info (const std::string &filename) | inline |
Get basic information about a CSV file.
Definition at line 176 of file csv_utility.hpp.
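| template<class OutputStream > | |
| CSVWriter< OutputStream > csv::make_csv_writer (OutputStream &out, bool quote_minimal=true) | inline |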
Return a csv::CSVWriter over the output stream.
Definition at line 679 of file csv_writer.hpp.
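| template<class OutputStream > | |
| TSVWriter< OutputStream > csv::make_tsv_writer (OutputStream &out, bool quote_minimal=true) | inline |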
Return a csv::TSVWriter over the output stream.
Definition at line 685 of file csv_writer.hpp.
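Both factory functions take a quote_minimal flag. As an illustration of what minimal quoting usually means under RFC 4180, the sketch below quotes a field only when it contains the delimiter, the quote character, or a line break, and doubles embedded quote characters. This is not DelimWriter's implementation: `sketch::quote_field` and `sketch::write_row` are hypothetical helpers, and the `'\n'` row terminator is an assumption.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

namespace sketch {
    // Quote `field` if required (or always, when quote_minimal is false).
    // Embedded quote characters are escaped by doubling them.
    inline std::string quote_field(const std::string& field, char delim, char quote,
                                   bool quote_minimal) {
        const bool needs_quotes =
            field.find(delim) != std::string::npos ||
            field.find(quote) != std::string::npos ||
            field.find('\n') != std::string::npos ||
            field.find('\r') != std::string::npos;
        if (quote_minimal && !needs_quotes) return field;

        std::string out(1, quote);
        for (char ch : field) {
            if (ch == quote) out += quote;  // double the embedded quote
            out += ch;
        }
        out += quote;
        return out;
    }

    // Write one row to any stream that supports operator<<.
    template<class OutputStream>
    void write_row(OutputStream& out, const std::vector<std::string>& row,
                   char delim = ',', char quote = '"', bool quote_minimal = true) {
        for (std::size_t i = 0; i < row.size(); ++i) {
            if (i > 0) out << delim;
            out << quote_field(row[i], delim, quote, quote_minimal);
        }
        out << '\n';  // row terminator chosen for this sketch
    }
}
```

Passing `'\t'` as the delimiter gives the tab-separated variant, mirroring how CSVWriter and TSVWriter differ only in their DelimWriter template arguments.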
| CSVReader csv::operator""_csv (const char *in, size_t n) | inline |
Parse an RFC 4180 CSV string.
String literals have static storage duration, so the zero-copy path is safe here.
Definition at line 76 of file csv_utility.hpp.
| CSVReader csv::operator""_csv_no_header (const char *in, size_t n) | inline |
A shorthand for csv::parse_no_header().
String literals have static storage duration, so the zero-copy path is safe here.
Definition at line 85 of file csv_utility.hpp.
| CSVReader csv::parse (csv::string_view in, const CSVFormat &format=CSVFormat::guess_csv()) | inline |
Parse CSV from a string view, copying the input into an owned buffer.
Safe for any string_view regardless of the caller's ownership of the underlying memory.
Definition at line 42 of file csv_utility.hpp.
| CSVReader csv::parse_no_header (csv::string_view in) | inline |
Parses a CSV string with no headers.
Definition at line 62 of file csv_utility.hpp.
| CSVReader csv::parse_unsafe (csv::string_view in, CSVFormat format=CSVFormat::guess_csv()) | inline |
Parse CSV from an in-memory view with zero copy.
WARNING: Non-owning path. The caller must ensure in's backing memory remains valid and immutable while the reader may request additional rows from the source stream.
Rows already obtained from the reader remain valid, but unread rows still depend on the source view staying alive.
Definition at line 56 of file csv_utility.hpp.
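The difference between the copying path of parse() and the zero-copy path of parse_unsafe() comes down to who owns the bytes. The sketch below shows that distinction with two hypothetical wrapper types. It is not how CSVReader stores its input: `sketch::OwnedSource`, `sketch::ViewSource`, and `load_owned` are invented for illustration.

```cpp
#include <string>
#include <string_view>

namespace sketch {
    // Owning source: copies the bytes, so the caller's buffer may be destroyed afterwards.
    class OwnedSource {
    public:
        explicit OwnedSource(std::string_view in) : buffer_(in) {}
        std::string_view text() const { return buffer_; }
    private:
        std::string buffer_;
    };

    // Non-owning source: stores only a pointer and a length.
    // Valid only while the viewed memory stays alive and unchanged.
    class ViewSource {
    public:
        explicit ViewSource(std::string_view in) : view_(in) {}
        std::string_view text() const { return view_; }
    private:
        std::string_view view_;
    };
}

// Safe: the bytes are copied before `temp` is destroyed at the end of the function.
inline sketch::OwnedSource load_owned() {
    std::string temp = "a,b\n1,2\n";
    return sketch::OwnedSource(temp);
}

// Returning a sketch::ViewSource built from `temp` in the same way would leave it
// pointing at freed memory. A view over a string literal is fine, because literals
// have static storage duration.
```

This is also why the literal operators can take the zero-copy path safely, while arbitrary string_view input goes through the copying overload by default.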
| constexpr unsigned csv::CHAR_OFFSET = std::numeric_limits<char>::is_signed ? 128 : 0 | constexpr |
Offset to convert char into array index.
Definition at line 482 of file common.hpp.
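The offset exists because plain char may be signed, in which case its values run from -128 to 127 and cannot index an array directly. Adding 128 shifts that range to 0..255. The sketch below reuses the documented expression in a hypothetical 256-entry lookup table. `sketch::char_index` and `sketch::make_special_table` are illustrative names rather than library functions, and an 8-bit char is assumed.

```cpp
#include <array>
#include <cstddef>
#include <limits>

namespace sketch {
    // Same expression as the documented csv::CHAR_OFFSET.
    constexpr unsigned CHAR_OFFSET = std::numeric_limits<char>::is_signed ? 128 : 0;

    // Map any char value to an index in [0, 255] (assumes 8-bit char).
    constexpr std::size_t char_index(char ch) {
        return static_cast<std::size_t>(static_cast<int>(ch) + static_cast<int>(CHAR_OFFSET));
    }

    // Hypothetical flag table marking characters a parser would treat specially.
    inline std::array<bool, 256> make_special_table(char delim, char quote) {
        std::array<bool, 256> table{};  // value-initialized: all false
        table[char_index(delim)] = true;
        table[char_index(quote)] = true;
        table[char_index('\n')] = true;
        table[char_index('\r')] = true;
        return table;
    }
}
```

A table like this turns per-character classification into a single array read, including for bytes above 0x7F that are negative on platforms where char is signed.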
| constexpr int csv::CSV_NOT_FOUND = -1 | constexpr |
Integer indicating a requested column wasn't found.
Definition at line 479 of file common.hpp.