Scalar Conversion Reference
CSVField conversions use the same scalar classification policy as classify_scalar, a separately maintained scalar classification library with its own test suite, configurable scalar grammars, and benchmarking notes.
Use classify_scalar directly if you want csv-parser's scalar behavior outside CSV parsing, or if you want to define a related classification policy for another data format.
Conversion APIs
std::optional conversions require C++17. std::expected conversions require C++23 and a standard library that provides std::expected. csv::CSVField::try_parse_timestamp() for uint64_t and std::chrono timestamp targets is available in all supported C++ versions.
csv::CSVField::as() reports conversion failures with csv::CSVConversionError. The enum values are:
| CSVConversionError | Meaning |
| None | Conversion succeeded. |
| NotANumber | The field is not compatible with the requested target type. |
| Overflow | The parsed value does not fit in the requested target type. |
| FloatToInt | A floating point field was requested as an integral type. |
| NegativeToUnsigned | A negative value was requested as an unsigned type. |
Use csv::csv_conversion_error_message() to convert a CSVConversionError to a stable human-readable message.
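The failure categories above can be illustrated with a minimal standalone sketch. It does not use csv-parser and is not its implementation: `ConversionError` and `to_integer` are hypothetical names, and the sketch handles only decimal text that fits in long long.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdlib>
#include <limits>
#include <string>
#include <type_traits>

// Hypothetical stand-in for csv::CSVConversionError, with the same categories.
enum class ConversionError { None, NotANumber, Overflow, FloatToInt, NegativeToUnsigned };

// Illustrative only: converts decimal text to an integral T and reports why it failed.
// Values beyond long long's range are reported as Overflow.
template <typename T>
ConversionError to_integer(const std::string& text, T& out) {
    static_assert(std::is_integral<T>::value, "T must be an integral type");

    // Restrict the alphabet so strtold below cannot accept "inf", "nan", or hex floats.
    if (text.empty() || text.find_first_not_of("0123456789+-.eE") != std::string::npos) {
        return ConversionError::NotANumber;
    }

    errno = 0;
    char* end = nullptr;
    const long long value = std::strtoll(text.c_str(), &end, 10);
    if (end == text.c_str() || *end != '\0') {
        // Not a plain integer: distinguish a floating point literal from other text.
        char* float_end = nullptr;
        (void)std::strtold(text.c_str(), &float_end);
        const bool is_float = float_end != text.c_str() && *float_end == '\0';
        return is_float ? ConversionError::FloatToInt : ConversionError::NotANumber;
    }
    if (errno == ERANGE) return ConversionError::Overflow;

    if (value < 0) {
        if (std::is_unsigned<T>::value) return ConversionError::NegativeToUnsigned;
        if (value < static_cast<long long>(std::numeric_limits<T>::min())) {
            return ConversionError::Overflow;
        }
    } else if (static_cast<unsigned long long>(value) >
               static_cast<unsigned long long>(std::numeric_limits<T>::max())) {
        return ConversionError::Overflow;
    }
    out = static_cast<T>(value);
    return ConversionError::None;
}

// Usage: one input per failure category.
void conversion_error_examples() {
    unsigned int u = 0;
    assert(to_integer("2019", u) == ConversionError::None && u == 2019u);
    assert(to_integer("-1", u) == ConversionError::NegativeToUnsigned);
    signed char c = 0;
    assert(to_integer("2019", c) == ConversionError::Overflow);
    int i = 0;
    assert(to_integer("2.718", i) == ConversionError::FloatToInt);
    assert(to_integer("applesauce", i) == ConversionError::NotANumber);
}
```

Returning an enum rather than a bool is what lets a caller tell a malformed field apart from a well-formed value that simply does not fit the target type.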
Classification Policy
data_type() exposes csv-parser's scalar classification policy. It recognizes empty values, strings, signed integer widths, big integers, hexadecimal integers, floating point values, booleans, and timestamps.
| Classified value | DataType result | Notes |
| Empty field | CSV_NULL | Empty csv::string_view values are treated as null fields. |
| Non-scalar text | CSV_STRING | Strings such as phone numbers stay strings instead of being partly parsed. |
| Integer | CSV_INT8, CSV_INT16, CSV_INT32, or CSV_INT64 | The smallest signed width that can hold the value is used. |
| Integer outside int64_t | CSV_BIGINT | The value remains numeric for schema inference but is not narrowed into long long. |
| Hex integer | Integer DataType | data_type() and csv::CSVField::get() require the 0x prefix for hex classification. |
| Floating point | CSV_DOUBLE | Includes scientific notation. |
| Boolean | CSV_BOOL | true and false, case-insensitive. |
| Timestamp | CSV_TIMESTAMP | ISO 8601-style date/time strings. |
REQUIRE(data_type("") == DataType::CSV_NULL);
REQUIRE(data_type("not-a-number") == DataType::CSV_STRING);
REQUIRE(data_type("510-123-4567") == DataType::CSV_STRING);
REQUIRE(data_type("127") == DataType::CSV_INT8);
REQUIRE(data_type("128") == DataType::CSV_INT16);
REQUIRE(data_type("32768") == DataType::CSV_INT32);
REQUIRE(data_type("2147483648") == DataType::CSV_INT64);
std::string too_big = std::to_string((std::numeric_limits<std::int64_t>::max)());
too_big.push_back('1');
REQUIRE(data_type(too_big) == DataType::CSV_BIGINT);
REQUIRE(data_type("0x10") == DataType::CSV_INT8);
REQUIRE(data_type("3.14") == DataType::CSV_DOUBLE);
REQUIRE(data_type("1E-06") == DataType::CSV_DOUBLE);
REQUIRE(data_type("true") == DataType::CSV_BOOL);
REQUIRE(data_type("false") == DataType::CSV_BOOL);
REQUIRE(data_type("2024-01-31T23:59:58Z") == DataType::CSV_TIMESTAMP);
CSVField true_field("true");
CSVField false_field("false");
CSVField timestamp_field("2024-01-31T23:59:58Z");
REQUIRE(true_field.type() == DataType::CSV_BOOL);
REQUIRE(false_field.type() == DataType::CSV_BOOL);
REQUIRE(timestamp_field.type() == DataType::CSV_TIMESTAMP);
REQUIRE(true_field.get<bool>());
REQUIRE_FALSE(false_field.get<bool>());
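The "smallest signed width" rule from the table can be sketched without the library. This is an assumption-laden illustration, not csv-parser's code: `IntClass` and `classify_integer` are hypothetical names, only an optional leading '-' is accepted, and long long is assumed to be 64 bits wide.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>
#include <cstdlib>
#include <string>

// Hypothetical result type; the library reports these through its DataType enum.
enum class IntClass { NotInteger, Int8, Int16, Int32, Int64, BigInt };

// Illustrative only: classifies decimal integer text by the narrowest signed width.
IntClass classify_integer(const std::string& text) {
    const std::size_t start = (!text.empty() && text[0] == '-') ? 1 : 0;
    if (start == text.size()) return IntClass::NotInteger;  // "" or "-"
    for (std::size_t i = start; i < text.size(); ++i) {
        if (text[i] < '0' || text[i] > '9') return IntClass::NotInteger;
    }

    errno = 0;
    const long long value = std::strtoll(text.c_str(), nullptr, 10);
    if (errno == ERANGE) return IntClass::BigInt;  // all digits, but wider than 64 bits

    if (value >= INT8_MIN && value <= INT8_MAX) return IntClass::Int8;
    if (value >= INT16_MIN && value <= INT16_MAX) return IntClass::Int16;
    if (value >= INT32_MIN && value <= INT32_MAX) return IntClass::Int32;
    return IntClass::Int64;
}

// Usage: the same boundary values as the REQUIRE lines above.
void classify_integer_examples() {
    assert(classify_integer("127") == IntClass::Int8);
    assert(classify_integer("128") == IntClass::Int16);
    assert(classify_integer("32768") == IntClass::Int32);
    assert(classify_integer("2147483648") == IntClass::Int64);
    assert(classify_integer("92233720368547758071") == IntClass::BigInt);
    assert(classify_integer("510-123-4567") == IntClass::NotInteger);
}
```

Checking that every character is a digit before parsing is what keeps inputs such as phone numbers from being partly consumed as integers.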
Integers and Hex
Integral conversions preserve range checks. Overflow, float-to-int conversion, and negative-to-unsigned conversion are rejected instead of relying on C++'s native cast behavior.
csv::CSVField::get() accepts hexadecimal integers when the field uses the 0x prefix. csv::CSVField::try_parse_hex() accepts hexadecimal values with or without the prefix and rejects values outside the target type's range.
Both paths use classify_scalar's built-in ASCII whitespace trimming before classification or parsing.
REQUIRE(CSVField("0x10").get<long long>() == 16);
REQUIRE(CSVField("-69").get<long long>() == -69);
REQUIRE(CSVField("2018").get<long long>() == 2018);
REQUIRE(internals::is_equal(CSVField("0.15").get<long double>(), 0.15L));
REQUIRE(internals::is_equal(CSVField("-1.5E3").get<long double>(), -1500.0L));
long long value = 0;
SECTION("Valid Hex Values") {
std::unordered_map<std::string, long long> test_cases = {
{" A ", 10},
{"0A", 10},
{"0B", 11},
{"0C", 12},
{"0D", 13},
{"0E", 14},
{"0F", 15},
{"0x10", 16},
{"FF", 255},
{"B00B5", 721077},
{"D3ADB33F", 3551376191},
{" D3ADB33F ", 3551376191}
};
for (auto& _case : test_cases) {
REQUIRE(CSVField(_case.first).try_parse_hex(value));
REQUIRE(value == _case.second);
}
}
SECTION("Invalid Values") {
std::vector<std::string> invalid_test_cases = {
"", " ", "carneasda", "carne asada", "0fg"
};
for (auto& _case : invalid_test_cases) {
REQUIRE(CSVField(_case).try_parse_hex(value) == false);
}
}
SECTION("Reject Values Outside Target Type Range") {
unsigned char byte_value = 0;
REQUIRE(CSVField("FF").try_parse_hex(byte_value));
REQUIRE(byte_value == 255);
REQUIRE_FALSE(CSVField("100").try_parse_hex(byte_value));
signed char signed_byte_value = 0;
REQUIRE(CSVField("7F").try_parse_hex(signed_byte_value));
REQUIRE(signed_byte_value == 127);
REQUIRE_FALSE(CSVField("80").try_parse_hex(signed_byte_value));
unsigned int unsigned_value = 0;
REQUIRE_FALSE(CSVField("-1").try_parse_hex(unsigned_value));
}
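A range-checked hex parser with an optional prefix and whitespace trimming might look roughly like the following. This is a sketch under stated assumptions rather than the library's implementation: `parse_hex` is a hypothetical name, only a lowercase 0x prefix is recognized, and the whitespace set is space, tab, CR, and LF.

```cpp
#include <cassert>
#include <limits>
#include <string>
#include <type_traits>

// Illustrative only: parses hex text (with or without "0x") into an integral T.
// Returns false, leaving `out` untouched, on bad digits or out-of-range values.
template <typename T>
bool parse_hex(const std::string& text, T& out) {
    static_assert(std::is_integral<T>::value, "T must be an integral type");

    const char* whitespace = " \t\r\n";
    const std::size_t first = text.find_first_not_of(whitespace);
    if (first == std::string::npos) return false;  // empty or all whitespace
    const std::size_t last = text.find_last_not_of(whitespace);

    std::size_t i = first;
    if (last - first + 1 >= 2 && text[i] == '0' && text[i + 1] == 'x') i += 2;
    if (i > last) return false;  // a bare "0x" has no digits

    const unsigned long long max =
        static_cast<unsigned long long>(std::numeric_limits<T>::max());
    unsigned long long value = 0;
    for (; i <= last; ++i) {
        const char c = text[i];
        unsigned long long digit = 0;
        if (c >= '0' && c <= '9') digit = static_cast<unsigned long long>(c - '0');
        else if (c >= 'a' && c <= 'f') digit = static_cast<unsigned long long>(c - 'a') + 10;
        else if (c >= 'A' && c <= 'F') digit = static_cast<unsigned long long>(c - 'A') + 10;
        else return false;

        // Reject before value * 16 + digit can exceed the target type's maximum.
        if (digit > max || value > (max - digit) / 16) return false;
        value = value * 16 + digit;
    }
    out = static_cast<T>(value);
    return true;
}

// Usage: mirrors a few of the test cases above.
void parse_hex_examples() {
    long long wide = 0;
    assert(parse_hex(" A ", wide) && wide == 10);
    assert(parse_hex("0x10", wide) && wide == 16);
    assert(!parse_hex("0fg", wide));
    unsigned char byte_value = 0;
    assert(parse_hex("FF", byte_value) && byte_value == 255);
    assert(!parse_hex("100", byte_value));
    signed char signed_byte = 0;
    assert(parse_hex("7F", signed_byte) && signed_byte == 127);
    assert(!parse_hex("80", signed_byte));
}
```

Comparing against the target maximum before each multiplication avoids relying on wraparound to detect overflow.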
Floats
Floating point conversions support decimal values and scientific notation. Converting a floating point field to an integral type is rejected. Loss of floating point precision is not currently checked.
CSVField euler("2.718");
REQUIRE(euler.get<>() == "2.718");
REQUIRE(euler.get<float>() == 2.718f);
REQUIRE(euler.get<double>() == 2.718);
REQUIRE(internals::is_equal(euler.get<long double>(), 2.718L));
float float_out = 0;
REQUIRE(euler.try_get(float_out));
REQUIRE(float_out == Catch::Approx(2.718f));
double double_out = 0;
REQUIRE(euler.try_get(double_out));
REQUIRE(double_out == Catch::Approx(2.718));
long double long_double_out = 0;
REQUIRE(euler.try_get(long_double_out));
REQUIRE(long_double_out == Catch::Approx(2.718l));
int int_out = 0;
REQUIRE_FALSE(euler.try_get(int_out));
Scientific notation
Scientific notation is classified as CSV_DOUBLE and can be materialized through csv::CSVField::get() or csv::CSVField::try_get() with a floating point target. Malformed scientific notation is classified as CSV_STRING.
Supported E-notation may use e or E; the exponent sign is optional, and leading zeroes in the exponent are accepted. Whitespace may surround the field, but it may not separate the exponent marker from the exponent.
REQUIRE(data_type("1E-06") == DataType::CSV_DOUBLE);
REQUIRE(internals::is_equal(CSVField("1E-06").get<long double>(), 0.000001L));
REQUIRE(internals::is_equal(CSVField("-1.5E3").get<long double>(), -1500.0L));
REQUIRE(internals::is_equal(CSVField("+1.5e+003").get<long double>(), 1500.0L));
REQUIRE(data_type("1E -06") == DataType::CSV_STRING);
REQUIRE(data_type("1.5e") == DataType::CSV_STRING);
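The E-notation rules described above can be expressed as a small hand-written recognizer. This is only a sketch of that grammar as read from this page, not classify_scalar's actual code; `is_e_notation` is a hypothetical name, and it returns true only for values that include an exponent.

```cpp
#include <cassert>
#include <string>

// Illustrative only: accepts [ws] [sign] digits [. digits] (e|E) [sign] digits [ws].
bool is_e_notation(const std::string& text) {
    const char* whitespace = " \t\r\n";
    const std::size_t first = text.find_first_not_of(whitespace);
    if (first == std::string::npos) return false;
    const std::size_t end = text.find_last_not_of(whitespace) + 1;

    auto is_digit = [](char c) { return c >= '0' && c <= '9'; };

    std::size_t i = first;
    if (text[i] == '+' || text[i] == '-') ++i;

    // Mantissa: at least one digit in total, around an optional decimal point.
    std::size_t mantissa_digits = 0;
    while (i < end && is_digit(text[i])) { ++i; ++mantissa_digits; }
    if (i < end && text[i] == '.') {
        ++i;
        while (i < end && is_digit(text[i])) { ++i; ++mantissa_digits; }
    }
    if (mantissa_digits == 0) return false;

    // Exponent marker, optional sign, then at least one digit with nothing after it.
    if (i == end || (text[i] != 'e' && text[i] != 'E')) return false;
    ++i;
    if (i < end && (text[i] == '+' || text[i] == '-')) ++i;
    std::size_t exponent_digits = 0;
    while (i < end && is_digit(text[i])) { ++i; ++exponent_digits; }
    return exponent_digits > 0 && i == end;
}

// Usage: the accepted and rejected forms shown above.
void e_notation_examples() {
    assert(is_e_notation("1E-06"));
    assert(is_e_notation("-1.5E3"));
    assert(is_e_notation("+1.5e+003"));
    assert(is_e_notation(" 1e5 "));   // surrounding whitespace is fine
    assert(!is_e_notation("1E -06")); // whitespace splits marker and exponent
    assert(!is_e_notation("1.5e"));   // exponent digits are missing
}
```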
Decimal separators
csv::CSVField::try_parse_decimal() exists for CSV files that use a decimal separator other than a period (.). This is commonly needed for comma-decimal values such as 3,14. It produces a long double and keeps the normal field classification visible on the CSVField.
SECTION("Test try_parse_decimal() with non-numeric value") {
long double output = 0;
std::string input = "stroustrup";
CSVField testField(input);
REQUIRE(testField.try_parse_decimal(output, ',') == false);
REQUIRE(testField.type() == DataType::CSV_STRING);
}
SECTION("Test try_parse_decimal() with integer value") {
long double output = 0;
std::string input = "2024";
CSVField testField(input);
REQUIRE(testField.try_parse_decimal(output, ',') == true);
REQUIRE(testField.type() == DataType::CSV_INT16);
REQUIRE(internals::is_equal(output, 2024.0l));
}
SECTION("Test try_parse_decimal() with various valid values") {
std::string input;
long double output = 0;
long double expected = 0;
std::tie(input, expected) =
GENERATE(table<std::string, long double>(
csv_test::FLOAT_TEST_CASES));
std::replace(input.begin(), input.end(), '.', ',');
CSVField testField(input);
REQUIRE(testField.try_parse_decimal(output, ',') == true);
REQUIRE(testField.type() == DataType::CSV_DOUBLE);
REQUIRE(internals::is_equal(output, expected));
}
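One simple way to support a custom decimal separator is to normalize it to '.' and then parse as usual. The sketch below takes that approach; it is an assumption about a possible technique, not a description of how try_parse_decimal() works internally. `parse_decimal` is a hypothetical name, and std::strtold is assumed to run under the default "C" locale.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>
#include <string>

// Illustrative only: parses text whose decimal separator is `separator`.
// Returns false, leaving `out` untouched, when the text is not numeric.
bool parse_decimal(std::string text, char separator, long double& out) {
    if (text.empty()) return false;

    // Normalize the separator so the standard parser can be reused.
    std::replace(text.begin(), text.end(), separator, '.');

    // Restrict the alphabet so strtold cannot accept "inf", "nan", or hex floats.
    if (text.find_first_not_of("0123456789+-.eE") != std::string::npos) return false;

    char* end = nullptr;
    const long double value = std::strtold(text.c_str(), &end);
    if (end == text.c_str() || *end != '\0') return false;  // trailing junk

    out = value;
    return true;
}

// Usage: comma-decimal inputs with exactly representable results.
void parse_decimal_examples() {
    long double out = 0;
    assert(parse_decimal("2,5", ',', out) && out == 2.5L);
    assert(parse_decimal("2024", ',', out) && out == 2024.0L);
    assert(parse_decimal("-1,5E3", ',', out) && out == -1500.0L);
    assert(!parse_decimal("stroustrup", ',', out));
}
```

Because the text is taken by value, the caller's string is never modified by the separator replacement.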
Booleans
Boolean conversion is deliberately narrow. true and false are accepted case-insensitively. Numeric values such as 1 are not implicitly converted to true.
SECTION("Numeric fields are not implicitly booleans") {
bool out = false;
REQUIRE_FALSE(CSVField("1").try_get(out));
}
SECTION("Boolean literals parse as booleans") {
bool out = false;
REQUIRE(CSVField("true").try_get(out));
REQUIRE(out);
out = true;
REQUIRE(CSVField("false").try_get(out));
REQUIRE_FALSE(out);
}
SECTION("Other string fields are not implicitly booleans") {
bool out = false;
REQUIRE_FALSE(CSVField("truthy").try_get(out));
}
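The narrow boolean policy can be sketched in a few lines of C++17. `parse_bool_literal` is a hypothetical helper written for illustration, not part of csv-parser; it assumes ASCII input and accepts only the two literals named above.

```cpp
#include <cassert>
#include <optional>
#include <string_view>

// Illustrative only: returns a value for "true"/"false" in any letter case,
// and std::nullopt for everything else, including numeric text such as "1".
std::optional<bool> parse_bool_literal(std::string_view text) {
    auto equals_ignore_case = [](std::string_view input, std::string_view lowercase) {
        if (input.size() != lowercase.size()) return false;
        for (std::size_t i = 0; i < input.size(); ++i) {
            char c = input[i];
            if (c >= 'A' && c <= 'Z') c = static_cast<char>(c - 'A' + 'a');
            if (c != lowercase[i]) return false;
        }
        return true;
    };

    if (equals_ignore_case(text, "true")) return true;
    if (equals_ignore_case(text, "false")) return false;
    return std::nullopt;
}

// Usage: note has_value() and value() are checked separately, because an
// optional holding false is still "engaged".
void boolean_literal_examples() {
    assert(parse_bool_literal("true").has_value() && parse_bool_literal("true").value());
    assert(parse_bool_literal("TRUE").has_value() && parse_bool_literal("TRUE").value());
    assert(parse_bool_literal("false").has_value() && !parse_bool_literal("false").value());
    assert(!parse_bool_literal("1").has_value());
    assert(!parse_bool_literal("truthy").has_value());
}
```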
Timestamps
Timestamp classification supports ISO 8601-style timestamps such as 1970-01-02T00:00:00.123Z. csv::CSVField::try_parse_timestamp() returns Unix time in milliseconds for uint64_t. Users can also convert to std::chrono::duration and std::chrono::system_clock::time_point.
Integer fields can be used with csv::CSVField::try_parse_timestamp(), which lets callers coerce Unix millisecond values into chrono targets explicitly.
CSVField field("1970-01-02T00:00:00.123Z");
REQUIRE(field.type() == DataType::CSV_TIMESTAMP);
std::uint64_t milliseconds = 0;
REQUIRE(field.try_parse_timestamp(milliseconds));
REQUIRE(milliseconds == 86400123);
unsigned long long milliseconds_ull = 0;
REQUIRE(field.try_parse_timestamp(milliseconds_ull));
REQUIRE(milliseconds_ull == 86400123ULL);
std::chrono::milliseconds duration_ms(0);
REQUIRE(field.try_get(duration_ms));
REQUIRE(duration_ms == std::chrono::milliseconds(86400123));
std::chrono::seconds duration_s(0);
REQUIRE(field.try_get(duration_s));
REQUIRE(duration_s == std::chrono::seconds(86400));
std::chrono::system_clock::time_point time_point;
REQUIRE(field.try_get(time_point));
REQUIRE(time_point.time_since_epoch() == std::chrono::milliseconds(86400123));
CSVField integer_timestamp("86400123");
std::chrono::seconds coerced_seconds(0);
REQUIRE(integer_timestamp.try_parse_timestamp(coerced_seconds));
REQUIRE(coerced_seconds == std::chrono::seconds(86400));
std::uint64_t unchanged = 123;
REQUIRE_FALSE(CSVField("not-a-timestamp").try_parse_timestamp(unchanged));
REQUIRE(unchanged == 123);
unchanged = 123;
REQUIRE_FALSE(CSVField("-1").try_parse_timestamp(unchanged));
REQUIRE(unchanged == 123);
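To show where a value such as 86400123 comes from, the following standalone sketch converts a fixed-layout UTC timestamp to Unix milliseconds. It is not the library's parser: `iso8601_to_unix_ms` and `read_digits` are hypothetical names, only the layouts YYYY-MM-DDTHH:MM:SSZ and YYYY-MM-DDTHH:MM:SS.mmmZ are accepted, and days are not validated per month. The day count uses Howard Hinnant's well-known days_from_civil algorithm.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Days since 1970-01-01 in the proleptic Gregorian calendar (days_from_civil).
std::int64_t days_from_civil(std::int64_t y, unsigned m, unsigned d) {
    y -= m <= 2;
    const std::int64_t era = (y >= 0 ? y : y - 399) / 400;
    const unsigned yoe = static_cast<unsigned>(y - era * 400);
    const unsigned doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1;
    const unsigned doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
    return era * 146097 + static_cast<std::int64_t>(doe) - 719468;
}

// Reads `count` ASCII digits starting at `pos`; returns -1 if any is not a digit.
int read_digits(const std::string& s, std::size_t pos, std::size_t count) {
    if (pos + count > s.size()) return -1;
    int value = 0;
    for (std::size_t i = pos; i < pos + count; ++i) {
        if (s[i] < '0' || s[i] > '9') return -1;
        value = value * 10 + (s[i] - '0');
    }
    return value;
}

// Illustrative only: returns false and leaves `out` untouched on failure.
bool iso8601_to_unix_ms(const std::string& s, std::uint64_t& out) {
    if (s.size() != 20 && s.size() != 24) return false;
    if (s[4] != '-' || s[7] != '-' || s[10] != 'T' || s[13] != ':' || s[16] != ':' ||
        s.back() != 'Z') {
        return false;
    }
    const int year = read_digits(s, 0, 4), month = read_digits(s, 5, 2);
    const int day = read_digits(s, 8, 2), hour = read_digits(s, 11, 2);
    const int minute = read_digits(s, 14, 2), second = read_digits(s, 17, 2);
    if (year < 0 || month < 1 || month > 12 || day < 1 || day > 31 || hour < 0 ||
        hour > 23 || minute < 0 || minute > 59 || second < 0 || second > 59) {
        return false;
    }
    int millis = 0;
    if (s.size() == 24) {
        if (s[19] != '.') return false;
        millis = read_digits(s, 20, 3);
        if (millis < 0) return false;
    }
    const std::int64_t days = days_from_civil(year, static_cast<unsigned>(month),
                                              static_cast<unsigned>(day));
    if (days < 0) return false;  // dates before 1970 do not fit an unsigned result
    out = static_cast<std::uint64_t>(
        (((days * 24 + hour) * 60 + minute) * 60 + second) * 1000 + millis);
    return true;
}

// Usage: one day plus 123 ms after the epoch, and a rejected input.
void timestamp_examples() {
    std::uint64_t ms = 0;
    assert(iso8601_to_unix_ms("1970-01-02T00:00:00.123Z", ms) && ms == 86400123u);
    std::uint64_t unchanged = 123;
    assert(!iso8601_to_unix_ms("not-a-timestamp", unchanged) && unchanged == 123u);
}
```

The first example works out as 1 day × 86,400,000 ms + 123 ms = 86,400,123 ms, matching the value in the snippet above.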
std::optional and std::expected
The std::optional conversion operator is a concise wrapper over csv::CSVField::try_get(). csv::CSVField::as() is the structured-error alternative for callers who need to distinguish not-a-number, overflow, float-to-int, and negative-to-unsigned failures.
std::optional<std::uint32_t> number = CSVField("2019");
REQUIRE(number);
REQUIRE(*number == 2019);
std::optional<std::uint32_t> not_number = CSVField("applesauce");
REQUIRE_FALSE(not_number);
std::optional<std::uint32_t> negative_unsigned = CSVField("-1");
REQUIRE_FALSE(negative_unsigned);
std::optional<bool> truth = CSVField("true");
REQUIRE(truth);
REQUIRE(*truth);
std::optional<bool> numeric_bool = CSVField("1");
REQUIRE_FALSE(numeric_bool);
The structured-error form, through csv::CSVField::as():
auto number = CSVField("2019").as<std::uint32_t>();
REQUIRE(number);
REQUIRE(*number == 2019);
auto not_number = CSVField("applesauce").as<std::uint32_t>();
REQUIRE_FALSE(not_number);
REQUIRE(not_number.error() == CSVConversionError::NotANumber);
auto overflow = CSVField("2019").as<signed char>();
REQUIRE_FALSE(overflow);
REQUIRE(overflow.error() == CSVConversionError::Overflow);
auto float_to_int = CSVField("2.718").as<int>();
REQUIRE_FALSE(float_to_int);
REQUIRE(float_to_int.error() == CSVConversionError::FloatToInt);
auto negative_to_unsigned = CSVField("-1").as<std::uint32_t>();
REQUIRE_FALSE(negative_to_unsigned);
REQUIRE(negative_to_unsigned.error() == CSVConversionError::NegativeToUnsigned);
REQUIRE(std::string(csv_conversion_error_message(negative_to_unsigned.error())) == csv::internals::ERROR_NEG_TO_UNSIGNED);
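The relationship between a try_get-style function and an optional-returning wrapper can be sketched independently of the library. Both names below, `try_get_int` and `get_optional`, are hypothetical, and the parsing relies on the C++17 std::from_chars integer overloads rather than on csv-parser's own logic.

```cpp
#include <cassert>
#include <charconv>
#include <optional>
#include <string_view>
#include <system_error>

// Illustrative only: strict integer parse; `out` is written only on success.
template <typename T>
bool try_get_int(std::string_view text, T& out) {
    T value{};
    const char* first = text.data();
    const char* last = first + text.size();
    const auto result = std::from_chars(first, last, value);
    // Fail on bad syntax, out-of-range values, or unconsumed trailing characters.
    if (result.ec != std::errc{} || result.ptr != last) return false;
    out = value;
    return true;
}

// The optional form is a thin wrapper: success becomes a value, failure becomes nullopt.
template <typename T>
std::optional<T> get_optional(std::string_view text) {
    T value{};
    if (!try_get_int(text, value)) return std::nullopt;
    return value;
}

// Usage: the same inputs as the std::optional snippet above.
void optional_wrapper_examples() {
    assert(get_optional<unsigned int>("2019").value() == 2019u);
    assert(!get_optional<unsigned int>("applesauce").has_value());
    assert(!get_optional<unsigned int>("-1").has_value());
    assert(!get_optional<signed char>("2019").has_value());
    assert(!get_optional<int>("2.718").has_value());
}
```

The wrapper discards the reason for failure, which is the trade-off that a structured-error return type is meant to address.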