Vince's CSV Parser

Scalar Conversion Reference

CSVField conversions use the same scalar classification policy as classify_scalar, a separately maintained scalar classification library with its own test suite, configurable scalar grammars, and benchmarking notes.

Use classify_scalar directly if you want csv-parser's scalar behavior outside CSV parsing, or if you want to define a related classification policy for another data format.

Conversion APIs

| API | Result | Failure |
| --- | --- | --- |
| field.get<T>() | Returns T | Throws std::runtime_error |
| field.try_get<T>(out) | Assigns out, returns true | Returns false and leaves out unchanged unless documented otherwise |
| `std::optional<T> value = field` | Produces std::optional<T> | Produces std::nullopt |
| field.as<T>() | Produces std::expected<T, CSVConversionError> | Produces std::unexpected(CSVConversionError) |
| field.try_parse_hex<T>(out) | Assigns an integral T, returns true | Returns false |
| field.try_parse_decimal(out, decimal_symbol) | Assigns long double, returns true | Returns false |
| field.try_parse_timestamp<T>(out) | Assigns a Unix timestamp, duration, or time point, returns true | Returns false |

std::optional conversions require C++17. std::expected conversions require C++23 and a standard library that provides std::expected. csv::CSVField::try_parse_timestamp() for uint64_t and std::chrono timestamp targets is available in all supported C++ versions.

csv::CSVField::as() reports conversion failures with csv::CSVConversionError. The enum values are:

| CSVConversionError | Meaning |
| --- | --- |
| None | Conversion succeeded. |
| NotANumber | The field is not compatible with the requested target type. |
| Overflow | The parsed value does not fit in the requested target type. |
| FloatToInt | A floating point field was requested as an integral type. |
| NegativeToUnsigned | A negative value was requested as an unsigned type. |

Use csv::csv_conversion_error_message() to convert a CSVConversionError to a stable human-readable message.
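
To make the failure categories concrete, here is a minimal standalone sketch of how decimal text could be mapped onto them. It does not use csv-parser: the enum only mirrors the table above, ErrorKind and classify_integral_conversion are hypothetical names, and the library may order its checks differently.

```cpp
#include <cassert>
#include <charconv>
#include <cstdlib>
#include <limits>
#include <string>
#include <string_view>
#include <system_error>
#include <type_traits>

// Mirrors the documented error categories; not the library's own enum.
enum class ErrorKind { None, NotANumber, Overflow, FloatToInt, NegativeToUnsigned };

// Hypothetical helper: converts decimal text to an integral T and reports why
// a conversion failed. `out` is written only on success.
// Limitation of this sketch: values are parsed through long long, so unsigned
// 64-bit values above LLONG_MAX are reported as Overflow.
template <typename T>
ErrorKind classify_integral_conversion(std::string_view text, T& out) {
    static_assert(std::is_integral<T>::value && !std::is_same<T, bool>::value,
                  "integral, non-bool targets only");
    if (text.empty()) return ErrorKind::NotANumber;

    // std::from_chars rejects a leading '+', so drop it when a digit follows.
    if (text.size() > 1 && text[0] == '+' && text[1] >= '0' && text[1] <= '9')
        text.remove_prefix(1);

    const char* first = text.data();
    const char* last = first + text.size();
    long long wide = 0;
    const auto res = std::from_chars(first, last, wide);

    if (res.ptr == last) {  // the whole field is an integer literal
        if (res.ec == std::errc::result_out_of_range) return ErrorKind::Overflow;
        assert(res.ec == std::errc{});
        if constexpr (std::is_unsigned<T>::value) {
            if (wide < 0) return ErrorKind::NegativeToUnsigned;
            if (static_cast<unsigned long long>(wide) > std::numeric_limits<T>::max())
                return ErrorKind::Overflow;
        } else {
            if (wide < std::numeric_limits<T>::min() || wide > std::numeric_limits<T>::max())
                return ErrorKind::Overflow;
        }
        out = static_cast<T>(wide);
        return ErrorKind::None;
    }

    // Not an integer literal: decide between "a float" and "not a number".
    // The character filter keeps strtold from accepting "inf", "nan", or hex floats.
    if (text.find_first_not_of("0123456789+-.eE") != std::string_view::npos)
        return ErrorKind::NotANumber;
    const std::string copy(text);  // strtold needs a NUL-terminated string
    char* end = nullptr;
    (void)std::strtold(copy.c_str(), &end);
    return end == copy.c_str() + copy.size() ? ErrorKind::FloatToInt
                                             : ErrorKind::NotANumber;
}
```

With this sketch, "2019" converts to an unsigned int, "2019" into a signed char reports Overflow, "2.718" into an int reports FloatToInt, and "-1" into an unsigned int reports NegativeToUnsigned, which matches the categories in the table.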

Classification Policy

data_type() exposes csv-parser's scalar classification policy. It recognizes empty values, strings, signed integer widths, big integers, hexadecimal integers, floating point values, booleans, and timestamps.

| Classified value | DataType result | Notes |
| --- | --- | --- |
| Empty field | CSV_NULL | Empty csv::string_view values are treated as null fields. |
| Non-scalar text | CSV_STRING | Strings such as phone numbers stay strings instead of being partly parsed. |
| Integer | CSV_INT8, CSV_INT16, CSV_INT32, or CSV_INT64 | The smallest signed width that can hold the value is used. |
| Integer outside int64_t | CSV_BIGINT | The value remains numeric for schema inference but is not narrowed into long long. |
| Hex integer | Integer DataType | data_type() and csv::CSVField::get() require the 0x prefix for hex classification. |
| Floating point | CSV_DOUBLE | Includes scientific notation. |
| Boolean | CSV_BOOL | true and false, case-insensitive. |
| Timestamp | CSV_TIMESTAMP | ISO 8601-style date/time strings. |

REQUIRE(data_type(csv::string_view()) == DataType::CSV_NULL);
REQUIRE(data_type("") == DataType::CSV_NULL);
REQUIRE(data_type("not-a-number") == DataType::CSV_STRING);
REQUIRE(data_type("510-123-4567") == DataType::CSV_STRING);
REQUIRE(data_type("127") == DataType::CSV_INT8);
REQUIRE(data_type("128") == DataType::CSV_INT16);
REQUIRE(data_type("32768") == DataType::CSV_INT32);
REQUIRE(data_type("2147483648") == DataType::CSV_INT64);
std::string too_big = std::to_string((std::numeric_limits<std::int64_t>::max)());
too_big.push_back('1');
REQUIRE(data_type(too_big) == DataType::CSV_BIGINT);
REQUIRE(data_type("0x10") == DataType::CSV_INT8);
REQUIRE(data_type("3.14") == DataType::CSV_DOUBLE);
REQUIRE(data_type("1E-06") == DataType::CSV_DOUBLE);
REQUIRE(data_type("true") == DataType::CSV_BOOL);
REQUIRE(data_type("false") == DataType::CSV_BOOL);
REQUIRE(data_type("2024-01-31T23:59:58Z") == DataType::CSV_TIMESTAMP);
CSVField true_field("true");
CSVField false_field("false");
CSVField timestamp_field("2024-01-31T23:59:58Z");
REQUIRE(true_field.type() == DataType::CSV_BOOL);
REQUIRE(false_field.type() == DataType::CSV_BOOL);
REQUIRE(timestamp_field.type() == DataType::CSV_TIMESTAMP);
REQUIRE(true_field.get<bool>());
REQUIRE_FALSE(false_field.get<bool>());
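
The "smallest signed width" rule can be sketched independently of the library. The following standalone example picks a width for a decimal integer string; Width and smallest_signed_width are hypothetical names, and csv-parser's own boundary handling (for negative values in particular) is not confirmed by this page.

```cpp
#include <charconv>
#include <cstdint>
#include <limits>
#include <string_view>
#include <system_error>

// Hypothetical result type for this sketch; csv-parser uses DataType instead.
enum class Width { Int8, Int16, Int32, Int64, BigInt, NotInteger };

// True when v is representable in the signed type T.
template <typename T>
constexpr bool fits(long long v) {
    return v >= std::numeric_limits<T>::min() && v <= std::numeric_limits<T>::max();
}

// Returns the narrowest signed width that holds the decimal integer in `text`.
inline Width smallest_signed_width(std::string_view text) {
    if (text.empty()) return Width::NotInteger;
    const char* first = text.data();
    const char* last = first + text.size();
    long long value = 0;
    const auto res = std::from_chars(first, last, value);
    // Anything that is not consumed entirely is not a plain integer,
    // e.g. "510-123-4567" stops at the first '-'.
    if (res.ptr != last) return Width::NotInteger;
    // All digits, but too large for long long.
    if (res.ec == std::errc::result_out_of_range) return Width::BigInt;
    if (fits<std::int8_t>(value)) return Width::Int8;
    if (fits<std::int16_t>(value)) return Width::Int16;
    if (fits<std::int32_t>(value)) return Width::Int32;
    return Width::Int64;
}
```

The boundaries line up with the assertions above: 127 is the largest Int8 value, 128 needs Int16, 32768 needs Int32, and 2147483648 needs Int64.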

Integers and Hex

Integral conversions are range-checked. Overflow, float-to-int conversion, and negative-to-unsigned conversion are rejected rather than left to C++'s native cast behavior.

csv::CSVField::get() accepts hexadecimal integers when the field uses the 0x prefix. csv::CSVField::try_parse_hex() accepts hexadecimal values with or without the prefix and rejects values outside the target type's range.

Both paths use classify_scalar's built-in ASCII whitespace trimming before classification or parsing.

REQUIRE(CSVField("0x10").get<long long>() == 16);
REQUIRE(CSVField("-69").get<long long>() == -69);
REQUIRE(CSVField("2018").get<long long>() == 2018);
REQUIRE(internals::is_equal(CSVField("0.15").get<long double>(), 0.15L));
REQUIRE(internals::is_equal(CSVField("-1.5E3").get<long double>(), -1500.0L));
long long value = 0;
SECTION("Valid Hex Values") {
std::unordered_map<std::string, long long> test_cases = {
{" A ", 10},
{"0A", 10},
{"0B", 11},
{"0C", 12},
{"0D", 13},
{"0E", 14},
{"0F", 15},
{"0x10", 16},
{"FF", 255},
{"B00B5", 721077},
{"D3ADB33F", 3551376191},
{" D3ADB33F ", 3551376191}
};
for (auto& _case : test_cases) {
REQUIRE(CSVField(_case.first).try_parse_hex(value));
REQUIRE(value == _case.second);
}
}
SECTION("Invalid Values") {
std::vector<std::string> invalid_test_cases = {
"", " ", "carneasda", "carne asada", "0fg"
};
for (auto& _case : invalid_test_cases) {
REQUIRE(CSVField(_case).try_parse_hex(value) == false);
}
}
SECTION("Reject Values Outside Target Type Range") {
unsigned char byte_value = 0;
REQUIRE(CSVField("FF").try_parse_hex(byte_value));
REQUIRE(byte_value == 255);
REQUIRE_FALSE(CSVField("100").try_parse_hex(byte_value));
signed char signed_byte_value = 0;
REQUIRE(CSVField("7F").try_parse_hex(signed_byte_value));
REQUIRE(signed_byte_value == 127);
REQUIRE_FALSE(CSVField("80").try_parse_hex(signed_byte_value));
unsigned int unsigned_value = 0;
REQUIRE_FALSE(CSVField("-1").try_parse_hex(unsigned_value));
}
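
For readers who want the same behavior outside the library, the following is a minimal standalone sketch of hex parsing with whitespace trimming, an optional 0x prefix, and a range check against the target type. parse_hex_sketch is a hypothetical name, and the code is built on std::from_chars rather than on csv-parser or classify_scalar.

```cpp
#include <charconv>
#include <string_view>
#include <system_error>
#include <type_traits>

// Hypothetical helper: parses hexadecimal text into an integral T.
// `out` is written only on success.
template <typename T>
bool parse_hex_sketch(std::string_view text, T& out) {
    static_assert(std::is_integral<T>::value, "integral targets only");

    // Trim ASCII whitespace on both sides.
    const char* ws = " \t\n\r\f\v";
    const auto begin = text.find_first_not_of(ws);
    if (begin == std::string_view::npos) return false;  // empty or all whitespace
    text = text.substr(begin, text.find_last_not_of(ws) - begin + 1);

    // Accept an optional 0x / 0X prefix.
    if (text.size() >= 2 && text[0] == '0' && (text[1] == 'x' || text[1] == 'X'))
        text.remove_prefix(2);
    if (text.empty()) return false;

    T parsed{};
    const char* last = text.data() + text.size();
    const auto res = std::from_chars(text.data(), last, parsed, 16);
    // Reject out-of-range values and any trailing non-hex characters.
    if (res.ec != std::errc{} || res.ptr != last) return false;
    out = parsed;
    return true;
}
```

std::from_chars with base 16 does not recognize a 0x prefix, which is why the sketch strips it manually. Because the range check comes from the target type, "FF" fits an unsigned char while "100" does not.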

Floats

Floating point conversions support decimal values and scientific notation. Converting a floating point field to an integral type is rejected. Loss of floating point precision is not currently checked.

CSVField euler("2.718");
REQUIRE(euler.get<>() == "2.718");
REQUIRE(euler.get<csv::string_view>() == "2.718");
REQUIRE(euler.get<float>() == 2.718f);
REQUIRE(euler.get<double>() == 2.718);
REQUIRE(internals::is_equal(euler.get<long double>(), 2.718L));
float float_out = 0;
REQUIRE(euler.try_get(float_out));
REQUIRE(float_out == Catch::Approx(2.718f));
double double_out = 0;
REQUIRE(euler.try_get(double_out));
REQUIRE(double_out == Catch::Approx(2.718));
long double long_double_out = 0;
REQUIRE(euler.try_get(long_double_out));
REQUIRE(long_double_out == Catch::Approx(2.718l));
int int_out = 0;
REQUIRE_FALSE(euler.try_get(int_out));

Scientific notation

Scientific notation is classified as CSV_DOUBLE and can be materialized through csv::CSVField::get() or csv::CSVField::try_get() with a floating point target. Malformed scientific notation is classified as CSV_STRING.

Supported E-notation may use e or E; the exponent sign is optional, and leading zeroes in the exponent are accepted. Whitespace may surround the field, but not split the exponent marker from its exponent.

REQUIRE(data_type("1E-06") == DataType::CSV_DOUBLE);
REQUIRE(internals::is_equal(CSVField("1E-06").get<long double>(), 0.000001L));
REQUIRE(internals::is_equal(CSVField("-1.5E3").get<long double>(), -1500.0L));
REQUIRE(internals::is_equal(CSVField("+1.5e+003").get<long double>(), 1500.0L));
REQUIRE(data_type("1E -06") == DataType::CSV_STRING);
REQUIRE(data_type("1.5e") == DataType::CSV_STRING);
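
The E-notation rules above can be written down as a small hand-rolled recognizer. This is a standalone sketch with hypothetical names (skip_digits, is_decimal_or_scientific); the exact grammar in classify_scalar may differ, for example in how it treats forms like ".5" or "5.".

```cpp
#include <cstddef>
#include <string_view>

// Advances `i` past consecutive ASCII digits and returns how many were seen.
inline std::size_t skip_digits(std::string_view s, std::size_t& i) {
    const std::size_t start = i;
    while (i < s.size() && s[i] >= '0' && s[i] <= '9') ++i;
    return i - start;
}

// Recognizes [sign] digits [. digits] [(e|E) [sign] digits], with optional
// surrounding ASCII whitespace. Plain integers match too; separating integers
// from doubles would be a later step.
inline bool is_decimal_or_scientific(std::string_view s) {
    const char* ws = " \t\n\r\f\v";
    const auto begin = s.find_first_not_of(ws);
    if (begin == std::string_view::npos) return false;
    s = s.substr(begin, s.find_last_not_of(ws) - begin + 1);

    std::size_t i = 0;
    if (s[i] == '+' || s[i] == '-') ++i;

    const std::size_t int_digits = skip_digits(s, i);
    std::size_t frac_digits = 0;
    if (i < s.size() && s[i] == '.') {
        ++i;
        frac_digits = skip_digits(s, i);
    }
    if (int_digits + frac_digits == 0) return false;  // no mantissa digits

    if (i < s.size() && (s[i] == 'e' || s[i] == 'E')) {
        ++i;
        if (i < s.size() && (s[i] == '+' || s[i] == '-')) ++i;
        // The exponent needs at least one digit directly after the marker or
        // sign, so "1.5e" and "1E -06" are both rejected here.
        if (skip_digits(s, i) == 0) return false;
    }
    return i == s.size();  // nothing may follow
}
```

Leading zeroes in the exponent pass naturally because the exponent is just a run of digits, and whitespace inside the literal fails because no rule consumes it.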

Decimal separators

csv::CSVField::try_parse_decimal() exists for CSV files that use a decimal separator other than a period (.). This is commonly needed for comma-decimal values such as 3,14. It produces a long double and keeps the normal field classification visible on the CSVField.

SECTION("Test try_parse_decimal() with non-numeric value") {
long double output = 0;
std::string input = "stroustrup";
CSVField testField(input);
REQUIRE(testField.try_parse_decimal(output, ',') == false);
REQUIRE(testField.type() == DataType::CSV_STRING);
}
SECTION("Test try_parse_decimal() with integer value") {
long double output = 0;
std::string input = "2024";
CSVField testField(input);
REQUIRE(testField.try_parse_decimal(output, ',') == true);
REQUIRE(testField.type() == DataType::CSV_INT16);
REQUIRE(internals::is_equal(output, 2024.0l));
}
SECTION("Test try_parse_decimal() with various valid values") {
std::string input;
long double output = 0;
long double expected = 0;
std::tie(input, expected) =
GENERATE(table<std::string, long double>(
csv_test::FLOAT_TEST_CASES));
// Replace '.' with ','
std::replace(input.begin(), input.end(), '.', ',');
CSVField testField(input);
REQUIRE(testField.try_parse_decimal(output, ',') == true);
REQUIRE(testField.type() == DataType::CSV_DOUBLE);
REQUIRE(internals::is_equal(output, expected));
}
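
One way to picture the separator handling is to normalize the field and then parse it as an ordinary decimal. The sketch below is standalone and hypothetical (parse_decimal_with_symbol is not a csv-parser function); it assumes the default "C" locale, since std::strtold is locale-dependent, and it is not how the library is known to implement try_parse_decimal().

```cpp
#include <algorithm>
#include <cstdlib>
#include <string>
#include <string_view>

// Hypothetical helper: parses text whose decimal separator is `decimal_symbol`.
// `out` is written only on success.
inline bool parse_decimal_with_symbol(std::string_view text, char decimal_symbol,
                                      long double& out) {
    // Allow only characters of a plain or scientific decimal. This also keeps
    // strtold from accepting "inf", "nan", hex floats, or leading whitespace,
    // and rejects '.' when a different separator was requested.
    for (const char c : text) {
        const bool ok = (c >= '0' && c <= '9') || c == '+' || c == '-' ||
                        c == 'e' || c == 'E' || c == decimal_symbol;
        if (!ok) return false;
    }
    if (text.empty()) return false;

    // Swap the custom separator for '.', which strtold expects in the "C" locale.
    std::string normalized(text);
    std::replace(normalized.begin(), normalized.end(), decimal_symbol, '.');

    char* end = nullptr;
    const long double value = std::strtold(normalized.c_str(), &end);
    if (end != normalized.c_str() + normalized.size()) return false;  // trailing junk
    out = value;
    return true;
}
```

Called with ',' as the separator, "3,14" yields roughly 3.14 and "2024" yields 2024, while "stroustrup" and "3.14" are rejected and leave the output untouched.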

Booleans

Boolean conversion is deliberately narrow. true and false are accepted case-insensitively. Numeric values such as 1 are not implicitly converted to true.

SECTION("Numeric fields are not implicitly booleans") {
bool out = false;
REQUIRE_FALSE(CSVField("1").try_get(out));
}
SECTION("Boolean literals parse as booleans") {
bool out = false;
REQUIRE(CSVField("true").try_get(out));
REQUIRE(out);
out = true;
REQUIRE(CSVField("false").try_get(out));
REQUIRE_FALSE(out);
}
SECTION("Other string fields are not implicitly booleans") {
bool out = false;
REQUIRE_FALSE(CSVField("truthy").try_get(out));
}
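
The narrow boolean rule is small enough to sketch in full. The following standalone example (parse_bool_sketch is a hypothetical name, not a csv-parser function) accepts only the two literals, ignoring ASCII case, and treats everything else as a failed conversion.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: accepts "true"/"false" in any ASCII case.
// `out` is written only on success.
inline bool parse_bool_sketch(std::string_view text, bool& out) {
    // Compares `a` against an all-lowercase literal, ignoring ASCII case.
    const auto equals_ignore_case = [](std::string_view a, std::string_view lower) {
        if (a.size() != lower.size()) return false;
        for (std::size_t i = 0; i < a.size(); ++i) {
            char c = a[i];
            if (c >= 'A' && c <= 'Z') c = static_cast<char>(c - 'A' + 'a');
            if (c != lower[i]) return false;
        }
        return true;
    };

    if (equals_ignore_case(text, "true")) { out = true; return true; }
    if (equals_ignore_case(text, "false")) { out = false; return true; }
    return false;  // "1", "0", "truthy", "yes", ... are not booleans here
}
```

Keeping "1" and "0" out of the boolean set means a numeric column is never silently reinterpreted as flags; callers who want that mapping can apply it explicitly after an integer conversion.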

Timestamps

Timestamp classification supports ISO 8601-style timestamps such as 1970-01-02T00:00:00.123Z. For a uint64_t output, csv::CSVField::try_parse_timestamp() writes Unix time in milliseconds. Callers can also convert to std::chrono::duration and std::chrono::system_clock::time_point targets.

Integer fields can be used with csv::CSVField::try_parse_timestamp(), which lets callers coerce Unix millisecond values into chrono targets explicitly.

CSVField field("1970-01-02T00:00:00.123Z");
REQUIRE(field.type() == DataType::CSV_TIMESTAMP);
std::uint64_t milliseconds = 0;
REQUIRE(field.try_parse_timestamp(milliseconds));
REQUIRE(milliseconds == 86400123);
unsigned long long milliseconds_ull = 0;
REQUIRE(field.try_parse_timestamp(milliseconds_ull));
REQUIRE(milliseconds_ull == 86400123ULL);
std::chrono::milliseconds duration_ms(0);
REQUIRE(field.try_get(duration_ms));
REQUIRE(duration_ms == std::chrono::milliseconds(86400123));
std::chrono::seconds duration_s(0);
REQUIRE(field.try_get(duration_s));
REQUIRE(duration_s == std::chrono::seconds(86400));
std::chrono::system_clock::time_point time_point;
REQUIRE(field.try_get(time_point));
REQUIRE(time_point.time_since_epoch() == std::chrono::milliseconds(86400123));
CSVField integer_timestamp("86400123");
std::chrono::seconds coerced_seconds(0);
REQUIRE(integer_timestamp.try_parse_timestamp(coerced_seconds));
REQUIRE(coerced_seconds == std::chrono::seconds(86400));
std::uint64_t unchanged = 123;
REQUIRE_FALSE(CSVField("not-a-timestamp").try_parse_timestamp(unchanged));
REQUIRE(unchanged == 123);
unchanged = 123;
REQUIRE_FALSE(CSVField("-1").try_parse_timestamp(unchanged));
REQUIRE(unchanged == 123);
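
The value 86400123 in the example is one full day (86,400,000 ms) plus 123 ms after the Unix epoch. The standalone sketch below reproduces that arithmetic from already-split UTC date and time components; it uses Howard Hinnant's well-known days_from_civil algorithm, unix_millis_utc is a hypothetical name, and none of this is taken from csv-parser's implementation.

```cpp
#include <chrono>
#include <cstdint>

// Days between 1970-01-01 and the given proleptic Gregorian date
// (Howard Hinnant's days_from_civil algorithm).
constexpr std::int64_t days_from_civil(std::int64_t y, unsigned m, unsigned d) {
    y -= m <= 2;  // treat January and February as months of the previous year
    const std::int64_t era = (y >= 0 ? y : y - 399) / 400;
    const unsigned yoe = static_cast<unsigned>(y - era * 400);            // [0, 399]
    const unsigned doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1; // [0, 365]
    const unsigned doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;           // [0, 146096]
    return era * 146097 + static_cast<std::int64_t>(doe) - 719468;
}

// Hypothetical helper: Unix time in milliseconds for a UTC date and time.
inline std::chrono::milliseconds unix_millis_utc(std::int64_t year, unsigned month,
                                                 unsigned day, unsigned hour,
                                                 unsigned minute, unsigned second,
                                                 unsigned millisecond) {
    using namespace std::chrono;
    return hours(24 * days_from_civil(year, month, day)) + hours(hour) +
           minutes(minute) + seconds(second) + milliseconds(millisecond);
}
```

For 1970-01-02T00:00:00.123Z this gives std::chrono::milliseconds(86400123). Converting that to whole seconds with std::chrono::duration_cast truncates toward zero, which is consistent with the std::chrono::seconds(86400) results asserted above.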

std::optional and std::expected

The std::optional conversion operator is a concise wrapper over csv::CSVField::try_get(). csv::CSVField::as() is the structured-error alternative for callers who need to distinguish not-a-number, overflow, float-to-int, and negative-to-unsigned failures.

std::optional<std::uint32_t> number = CSVField("2019");
REQUIRE(number);
REQUIRE(*number == 2019);
std::optional<std::uint32_t> not_number = CSVField("applesauce");
REQUIRE_FALSE(not_number);
std::optional<std::uint32_t> negative_unsigned = CSVField("-1");
REQUIRE_FALSE(negative_unsigned);
std::optional<bool> truth = CSVField("true");
REQUIRE(truth);
REQUIRE(*truth);
std::optional<bool> numeric_bool = CSVField("1");
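REQUIRE_FALSE(numeric_bool);

// Separate snippet: std::expected via as<T>() (C++23). Variable names are reused from the std::optional snippet above.
auto number = CSVField("2019").as<std::uint32_t>();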
REQUIRE(number);
REQUIRE(*number == 2019);
auto not_number = CSVField("applesauce").as<std::uint32_t>();
REQUIRE_FALSE(not_number);
REQUIRE(not_number.error() == CSVConversionError::NotANumber);
auto overflow = CSVField("2019").as<signed char>();
REQUIRE_FALSE(overflow);
REQUIRE(overflow.error() == CSVConversionError::Overflow);
auto float_to_int = CSVField("2.718").as<int>();
REQUIRE_FALSE(float_to_int);
REQUIRE(float_to_int.error() == CSVConversionError::FloatToInt);
auto negative_to_unsigned = CSVField("-1").as<std::uint32_t>();
REQUIRE_FALSE(negative_to_unsigned);
REQUIRE(negative_to_unsigned.error() == CSVConversionError::NegativeToUnsigned);
REQUIRE(std::string(csv_conversion_error_message(negative_to_unsigned.error())) == csv::internals::ERROR_NEG_TO_UNSIGNED);
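
The "wrapper over try_get()" relationship can be illustrated with a toy type. FieldSketch below is hypothetical and handles only integral targets through std::from_chars; it shows the shape of the pattern (a templated conversion operator that forwards to a try-style function), not csv-parser's actual CSVField.

```cpp
#include <charconv>
#include <cstdint>
#include <optional>
#include <string_view>
#include <system_error>

// Hypothetical stand-in for a field; supports integral targets only.
struct FieldSketch {
    std::string_view text;

    // Try-style conversion: writes `out` and returns true only on success.
    template <typename T>
    bool try_get(T& out) const {
        if (text.empty()) return false;
        T parsed{};
        const char* last = text.data() + text.size();
        const auto res = std::from_chars(text.data(), last, parsed);
        if (res.ec != std::errc{} || res.ptr != last) return false;
        out = parsed;
        return true;
    }

    // Conversion operator: the same check as try_get(), with every failure
    // collapsed into std::nullopt.
    template <typename T>
    operator std::optional<T>() const {
        T value{};
        if (try_get(value)) return value;
        return std::nullopt;
    }
};
```

With this sketch, `std::optional<std::uint32_t> n = FieldSketch{"2019"};` holds 2019, while "applesauce" and "-1" both produce an empty optional. That loss of detail is the trade-off the text describes: the optional form is concise, and a structured-error form is needed when the reason for the failure matters.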