Scalar Conversion Reference

CSVField conversions use the same scalar classification policy as classify_scalar, a separately maintained scalar classification library with its own test suite, configurable scalar grammars, and benchmarking notes.

Use classify_scalar directly if you want csv-parser's scalar behavior outside CSV parsing, or if you want to define a related classification policy for another data format.

Conversion APIs

API	Result	Failure
`field.get<T>()`	Returns `T`	Throws `std::runtime_error`
`field.try_get<T>(out)`	Assigns `out`, returns `true`	Returns `false` and leaves `out` unchanged unless documented otherwise
`std::optional<T> value = field`	Produces `std::optional<T>`	Produces `std::nullopt`
`field.as<T>()`	Produces `std::expected<T, CSVConversionError>`	Produces `std::unexpected(CSVConversionError)`
`field.try_parse_hex<T>(out)`	Assigns an integral `T`, returns `true`	Returns `false`
`field.try_parse_decimal(out, decimal_symbol)`	Assigns `long double`, returns `true`	Returns `false`
`field.try_parse_timestamp<T>(out)`	Assigns a Unix timestamp, duration, or time point, returns `true`	Returns `false`
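
The rows above describe one conversion exposed through different failure styles. The sketch below shows one way such layering can work for a plain int target. It assumes hypothetical free functions try_get_int() and get_int(), which are not part of csv-parser. The throwing form is built on the non-throwing form, and the out-parameter is assigned only after the whole string parses:

```cpp
#include <charconv>
#include <stdexcept>
#include <string>
#include <string_view>
#include <system_error>

// Hypothetical try_get-style conversion: returns false and leaves `out`
// untouched unless the whole of `text` is a valid int.
inline bool try_get_int(std::string_view text, int& out) {
    int parsed = 0;
    const char* end = text.data() + text.size();
    auto [ptr, ec] = std::from_chars(text.data(), end, parsed);
    if (ec != std::errc() || ptr != end) return false;
    out = parsed;  // assign only after a full, successful parse
    return true;
}

// Hypothetical get-style conversion: same rules, but failure throws.
inline int get_int(std::string_view text) {
    int out = 0;
    if (!try_get_int(text, out)) {
        throw std::runtime_error("cannot convert \"" + std::string(text) + "\" to int");
    }
    return out;
}
```

Building the throwing form on the non-throwing one keeps both paths in agreement about what counts as a valid value.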

std::optional conversions require C++17. std::expected conversions require C++23 and a standard library that provides std::expected. csv::CSVField::try_parse_timestamp() for uint64_t and std::chrono timestamp targets is available in all supported C++ versions.

csv::CSVField::as() reports conversion failures with csv::CSVConversionError. The enum values are:

CSVConversionError	Meaning
`None`	Conversion succeeded.
`NotANumber`	The field is not compatible with the requested target type.
`Overflow`	The parsed value does not fit in the requested target type.
`FloatToInt`	A floating point field was requested as an integral type.
`NegativeToUnsigned`	A negative value was requested as an unsigned type.

Use csv::csv_conversion_error_message() to convert a CSVConversionError to a stable human-readable message.

Classification Policy

data_type() exposes csv-parser's scalar classification policy. It recognizes empty values, strings, signed integer widths, big integers, hexadecimal integers, floating point values, booleans, and timestamps.

Classified value	DataType result	Notes
Empty field	`CSV_NULL`	Empty `csv::string_view` values are treated as null fields.
Non-scalar text	`CSV_STRING`	Strings such as phone numbers stay strings instead of being partly parsed.
Integer	`CSV_INT8`, `CSV_INT16`, `CSV_INT32`, or `CSV_INT64`	The smallest signed width that can hold the value is used.
Integer outside `int64_t`	`CSV_BIGINT`	The value remains numeric for schema inference but is not narrowed into `long long`.
Hex integer	Integer DataType	`data_type()` and csv::CSVField::get() require the `0x` prefix for hex classification.
Floating point	`CSV_DOUBLE`	Includes scientific notation.
Boolean	`CSV_BOOL`	`true` and `false`, case-insensitive.
Timestamp	`CSV_TIMESTAMP`	ISO 8601-style date/time strings.

    REQUIRE(data_type(csv::string_view()) == DataType::CSV_NULL);
    REQUIRE(data_type("") == DataType::CSV_NULL);
    REQUIRE(data_type("not-a-number") == DataType::CSV_STRING);
    REQUIRE(data_type("510-123-4567") == DataType::CSV_STRING);
 
    REQUIRE(data_type("127") == DataType::CSV_INT8);
    REQUIRE(data_type("128") == DataType::CSV_INT16);
    REQUIRE(data_type("32768") == DataType::CSV_INT32);
    REQUIRE(data_type("2147483648") == DataType::CSV_INT64);
 
    std::string too_big = std::to_string((std::numeric_limits<std::int64_t>::max)());
    too_big.push_back('1');
    REQUIRE(data_type(too_big) == DataType::CSV_BIGINT);
 
    REQUIRE(data_type("0x10") == DataType::CSV_INT8);
    REQUIRE(data_type("3.14") == DataType::CSV_DOUBLE);
    REQUIRE(data_type("1E-06") == DataType::CSV_DOUBLE);

    REQUIRE(data_type("true") == DataType::CSV_BOOL);
    REQUIRE(data_type("false") == DataType::CSV_BOOL);
    REQUIRE(data_type("2024-01-31T23:59:58Z") == DataType::CSV_TIMESTAMP);
 
    CSVField true_field("true");
    CSVField false_field("false");
    CSVField timestamp_field("2024-01-31T23:59:58Z");
 
    REQUIRE(true_field.type() == DataType::CSV_BOOL);
    REQUIRE(false_field.type() == DataType::CSV_BOOL);
    REQUIRE(timestamp_field.type() == DataType::CSV_TIMESTAMP);
 
    REQUIRE(true_field.get<bool>());
    REQUIRE_FALSE(false_field.get<bool>());
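
The smallest-signed-width rule can be sketched with std::from_chars. The enum IntWidth and the function classify_integer() are hypothetical. The sketch handles only decimal digits with an optional leading minus sign, so it covers no hex, whitespace trimming, floats, or booleans. A digit string that overflows long long is reported as a big integer instead of being narrowed:

```cpp
#include <charconv>
#include <cstdint>
#include <limits>
#include <string_view>
#include <system_error>

enum class IntWidth { NotInteger, Int8, Int16, Int32, Int64, BigInt };

template <typename T>
constexpr bool fits_in_signed(long long v) {
    return v >= std::numeric_limits<T>::min() && v <= std::numeric_limits<T>::max();
}

// Hypothetical classifier: picks the smallest signed width that holds the value.
inline IntWidth classify_integer(std::string_view text) {
    const char* end = text.data() + text.size();
    long long value = 0;
    auto [ptr, ec] = std::from_chars(text.data(), end, value);
    if (ptr != end || ec == std::errc::invalid_argument) return IntWidth::NotInteger;
    // Every character matched the integer pattern, but the value exceeds int64_t.
    if (ec == std::errc::result_out_of_range) return IntWidth::BigInt;

    if (fits_in_signed<std::int8_t>(value)) return IntWidth::Int8;
    if (fits_in_signed<std::int16_t>(value)) return IntWidth::Int16;
    if (fits_in_signed<std::int32_t>(value)) return IntWidth::Int32;
    return IntWidth::Int64;
}
```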

Integers and Hex

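Integral conversions are range-checked. Overflow, floating point to integer conversion, and negative to unsigned conversion are rejected rather than passed through a C++ cast. Such a cast would truncate the value, wrap it around, or, for out-of-range floating point values, have undefined behavior.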
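
Below is a minimal sketch of a range-checked integer conversion that reports the same four failure categories as CSVConversionError. ConversionError, Checked, and checked_integer() are hypothetical names. The sketch has three limitations:

- It parses through long long, so any magnitude beyond that range reports Overflow.
- It does not trim whitespace or accept hex.
- It uses std::strtod only to tell a malformed integer that looks like a float apart from non-numeric text.

```cpp
#include <charconv>
#include <cstdlib>
#include <limits>
#include <string>
#include <string_view>
#include <system_error>
#include <type_traits>

// Hypothetical stand-in mirroring the documented CSVConversionError values.
enum class ConversionError { None, NotANumber, Overflow, FloatToInt, NegativeToUnsigned };

template <typename T>
struct Checked {
    T value{};
    ConversionError error = ConversionError::None;
};

// True if `text` reads as a decimal float such as "2.718" or "1E-06".
// Used only to classify a failed integer parse.
inline bool looks_like_float(std::string_view text) {
    if (text.empty() || text.find_first_of(".eE") == std::string_view::npos) return false;
    const char c = text.front();
    if (!(c == '+' || c == '-' || c == '.' || (c >= '0' && c <= '9'))) return false;
    const std::string buf(text);  // strtod needs a NUL-terminated string
    char* end = nullptr;
    (void)std::strtod(buf.c_str(), &end);
    return end == buf.c_str() + buf.size();
}

template <typename T>
bool fits_in(long long v) {
    if constexpr (std::is_signed_v<T>) {
        return v >= static_cast<long long>(std::numeric_limits<T>::min()) &&
               v <= static_cast<long long>(std::numeric_limits<T>::max());
    } else {
        return v >= 0 && static_cast<unsigned long long>(v) <=
                             static_cast<unsigned long long>(std::numeric_limits<T>::max());
    }
}

// Rejects fractional, out-of-range, and negative-to-unsigned input instead of
// letting a cast truncate or wrap the value.
template <typename T>
Checked<T> checked_integer(std::string_view text) {
    static_assert(std::is_integral_v<T> && !std::is_same_v<T, bool>, "integer targets only");
    Checked<T> out;
    const char* end = text.data() + text.size();
    long long wide = 0;
    auto [ptr, ec] = std::from_chars(text.data(), end, wide);

    if (ptr != end || ec == std::errc::invalid_argument) {
        out.error = looks_like_float(text) ? ConversionError::FloatToInt
                                           : ConversionError::NotANumber;
    } else if (ec == std::errc::result_out_of_range) {
        out.error = ConversionError::Overflow;
    } else if (std::is_unsigned_v<T> && wide < 0) {
        out.error = ConversionError::NegativeToUnsigned;
    } else if (!fits_in<T>(wide)) {
        out.error = ConversionError::Overflow;
    } else {
        out.value = static_cast<T>(wide);
    }
    return out;
}
```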

csv::CSVField::get() accepts hexadecimal integers when the field uses the 0x prefix. csv::CSVField::try_parse_hex() accepts hexadecimal values with or without the prefix and rejects values outside the target type's range.

Both paths use classify_scalar's built-in ASCII whitespace trimming before classification or parsing.

    REQUIRE(CSVField("0x10").get<long long>() == 16);
    REQUIRE(CSVField("-69").get<long long>() == -69);
    REQUIRE(CSVField("2018").get<long long>() == 2018);
    REQUIRE(internals::is_equal(CSVField("0.15").get<long double>(), 0.15L));
    REQUIRE(internals::is_equal(CSVField("-1.5E3").get<long double>(), -1500.0L));

    long long value = 0;
 
    SECTION("Valid Hex Values") {
        std::unordered_map<std::string, long long> test_cases = {
            {"  A   ", 10},
            {"0A", 10},
            {"0B", 11},
            {"0C", 12},
            {"0D", 13},
            {"0E", 14},
            {"0F", 15},
            {"0x10", 16},
            {"FF", 255},
            {"B00B5", 721077},
            {"D3ADB33F", 3551376191},
            {"  D3ADB33F  ", 3551376191}
        };
 
        for (auto& _case : test_cases) {
            REQUIRE(CSVField(_case.first).try_parse_hex(value));
            REQUIRE(value == _case.second);
        }
    }
 
    SECTION("Invalid Values") {
        std::vector<std::string> invalid_test_cases = {
            "", "    ", "carneasda", "carne asada", "0fg"
        };
 
        for (auto& _case : invalid_test_cases) {
            REQUIRE(CSVField(_case).try_parse_hex(value) == false);
        }
    }
 
    SECTION("Reject Values Outside Target Type Range") {
        unsigned char byte_value = 0;
        REQUIRE(CSVField("FF").try_parse_hex(byte_value));
        REQUIRE(byte_value == 255);
        REQUIRE_FALSE(CSVField("100").try_parse_hex(byte_value));
 
        signed char signed_byte_value = 0;
        REQUIRE(CSVField("7F").try_parse_hex(signed_byte_value));
        REQUIRE(signed_byte_value == 127);
        REQUIRE_FALSE(CSVField("80").try_parse_hex(signed_byte_value));
 
        unsigned int unsigned_value = 0;
        REQUIRE_FALSE(CSVField("-1").try_parse_hex(unsigned_value));
    }
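
The hex rules above can be sketched as follows: whitespace trimming, an optional 0x prefix, and a range check against the target type. try_parse_hex_sketch() and trim_ascii() are hypothetical helpers, not the library's implementation. The sketch always rejects a leading minus sign and treats the digits as an unsigned magnitude. That matches the tests' rejection of 80 for signed char rather than wrapping it to -128.

```cpp
#include <charconv>
#include <limits>
#include <string_view>
#include <system_error>
#include <type_traits>

// Trims ASCII space, tab, CR, LF, VT and FF from both ends.
inline std::string_view trim_ascii(std::string_view s) {
    const char* ws = " \t\r\n\v\f";
    const auto first = s.find_first_not_of(ws);
    if (first == std::string_view::npos) return {};
    const auto last = s.find_last_not_of(ws);
    return s.substr(first, last - first + 1);
}

// Hypothetical hex parser: optional 0x/0X prefix, range-checked against T.
template <typename T>
bool try_parse_hex_sketch(std::string_view text, T& out) {
    static_assert(std::is_integral_v<T>, "integral targets only");
    std::string_view s = trim_ascii(text);
    if (s.size() > 2 && s[0] == '0' && (s[1] == 'x' || s[1] == 'X')) s.remove_prefix(2);
    if (s.empty()) return false;

    unsigned long long magnitude = 0;
    const char* end = s.data() + s.size();
    auto [ptr, ec] = std::from_chars(s.data(), end, magnitude, 16);
    if (ec != std::errc() || ptr != end) return false;  // bad digit, sign, or overflow
    if (magnitude > static_cast<unsigned long long>(std::numeric_limits<T>::max())) return false;
    out = static_cast<T>(magnitude);  // only assigned when the value fits
    return true;
}
```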

Floats

Floating point conversions support decimal values and scientific notation. Converting a floating point field to an integral type is rejected. Loss of floating point precision is not currently checked.

        CSVField euler("2.718");
        REQUIRE(euler.get<>() == "2.718");
        REQUIRE(euler.get<csv::string_view>() == "2.718");
        REQUIRE(euler.get<float>() == Catch::Approx(2.718f));
        REQUIRE(euler.get<double>() == Catch::Approx(2.718));
        REQUIRE(internals::is_equal(euler.get<long double>(), 2.718L));
 
        float float_out = 0;
        REQUIRE(euler.try_get(float_out));
        REQUIRE(float_out == Catch::Approx(2.718f));
 
        double double_out = 0;
        REQUIRE(euler.try_get(double_out));
        REQUIRE(double_out == Catch::Approx(2.718));
 
        long double long_double_out = 0;
        REQUIRE(euler.try_get(long_double_out));
        REQUIRE(long_double_out == Catch::Approx(2.718l));
 
        int int_out = 0;
        REQUIRE_FALSE(euler.try_get(int_out));
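
Precision loss is not checked, so narrowing 2.718 to float succeeds even though the stored float is not the same value as the parsed long double. For illustration only, here is a round-trip check that csv-parser does not perform; float_round_trips() is a hypothetical name:

```cpp
// Hypothetical check (not performed by csv-parser): does narrowing a parsed
// long double to float preserve the value exactly?
inline bool float_round_trips(long double parsed) {
    const float narrowed = static_cast<float>(parsed);
    return static_cast<long double>(narrowed) == parsed;
}
```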

Scientific notation

Scientific notation is classified as CSV_DOUBLE and can be materialized through csv::CSVField::get() or csv::CSVField::try_get() with a floating point target. Malformed scientific notation is classified as CSV_STRING.

Supported E-notation may use e or E. The exponent sign is optional, and leading zeros in the exponent are accepted. Whitespace may surround the field but must not separate the exponent marker from its exponent.

    REQUIRE(data_type("1E-06") == DataType::CSV_DOUBLE);
    REQUIRE(internals::is_equal(CSVField("1E-06").get<long double>(), 0.000001L));
    REQUIRE(internals::is_equal(CSVField("-1.5E3").get<long double>(), -1500.0L));
    REQUIRE(internals::is_equal(CSVField("+1.5e+003").get<long double>(), 1500.0L));
 
    REQUIRE(data_type("1E -06") == DataType::CSV_STRING);
    REQUIRE(data_type("1.5e") == DataType::CSV_STRING);
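
The E-notation rules can be written as a small recognizer. This sketch follows the rules stated above rather than classify_scalar's actual grammar. is_e_notation() is a hypothetical name. It requires at least one mantissa digit and at least one exponent digit, and it trims only ASCII whitespace.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical recognizer for [sign] mantissa (e|E) [sign] digits, allowing
// ASCII whitespace around the number but not inside it.
inline bool is_e_notation(std::string_view s) {
    auto is_space = [](char c) {
        return c == ' ' || c == '\t' || c == '\r' || c == '\n' || c == '\v' || c == '\f';
    };
    auto is_digit = [](char c) { return c >= '0' && c <= '9'; };
    while (!s.empty() && is_space(s.front())) s.remove_prefix(1);
    while (!s.empty() && is_space(s.back())) s.remove_suffix(1);

    std::size_t i = 0;
    if (i < s.size() && (s[i] == '+' || s[i] == '-')) ++i;

    std::size_t mantissa_digits = 0;
    while (i < s.size() && is_digit(s[i])) { ++i; ++mantissa_digits; }
    if (i < s.size() && s[i] == '.') {
        ++i;
        while (i < s.size() && is_digit(s[i])) { ++i; ++mantissa_digits; }
    }
    if (mantissa_digits == 0) return false;

    if (i >= s.size() || (s[i] != 'e' && s[i] != 'E')) return false;
    ++i;
    if (i < s.size() && (s[i] == '+' || s[i] == '-')) ++i;

    std::size_t exponent_digits = 0;  // leading zeros such as "003" are fine
    while (i < s.size() && is_digit(s[i])) { ++i; ++exponent_digits; }
    return exponent_digits > 0 && i == s.size();
}
```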

Decimal separators

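csv::CSVField::try_parse_decimal() exists for CSV files that use a decimal separator other than a period (.), most commonly comma-decimal values such as 3,14. It produces a long double. After the call, csv::CSVField::type() reflects classification under the given separator. A comma-decimal value such as 3,14 reports CSV_DOUBLE, an integer such as 2024 keeps CSV_INT16, and non-numeric text stays CSV_STRING.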

    SECTION("Test try_parse_decimal() with non-numeric value") {
        long double output = 0;
        std::string input = "stroustrup";
        CSVField testField(input);
 
        REQUIRE(testField.try_parse_decimal(output, ',') == false);
        REQUIRE(testField.type() == DataType::CSV_STRING);
    }
 
    SECTION("Test try_parse_decimal() with integer value") {
        long double output = 0;
        std::string input = "2024";
        CSVField testField(input);
 
        REQUIRE(testField.try_parse_decimal(output, ',') == true);
        REQUIRE(testField.type() == DataType::CSV_INT16);
        REQUIRE(internals::is_equal(output, 2024.0l));
    }
 
    SECTION("Test try_parse_decimal() with various valid values") {
        std::string input;
        long double output = 0;
        long double expected = 0;
 
        std::tie(input, expected) =
            GENERATE(table<std::string, long double>(
                csv_test::FLOAT_TEST_CASES));
 
        // Replace '.' with ','
        std::replace(input.begin(), input.end(), '.', ',');
 
        CSVField testField(input);
 
        REQUIRE(testField.try_parse_decimal(output, ',') == true);
        REQUIRE(testField.type() == DataType::CSV_DOUBLE);
        REQUIRE(internals::is_equal(output, expected));
    }
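
Below is a minimal sketch of parsing with a caller-chosen decimal symbol. The hypothetical parse_with_decimal_symbol() works under these assumptions:

- It accepts only digits, signs, e or E, and the chosen symbol, so a `.` is rejected when the symbol is `,`.
- It substitutes `.` for the symbol.
- It delegates to std::strtold, which expects `.` in the default "C" locale.

csv-parser's own parser may accept or reject different inputs.

```cpp
#include <cerrno>
#include <cstdlib>
#include <string>
#include <string_view>

// Hypothetical decimal-symbol-aware parser. Leaves `out` unchanged on failure.
inline bool parse_with_decimal_symbol(std::string_view text, char decimal_symbol,
                                      long double& out) {
    if (text.empty()) return false;
    std::string buf;
    buf.reserve(text.size());
    for (char c : text) {
        if (c == decimal_symbol) {
            buf.push_back('.');  // strtold in the "C" locale expects '.'
        } else if ((c >= '0' && c <= '9') || c == '+' || c == '-' || c == 'e' || c == 'E') {
            buf.push_back(c);
        } else {
            return false;  // includes '.' when another decimal symbol is in use
        }
    }
    errno = 0;
    char* end = nullptr;
    const long double value = std::strtold(buf.c_str(), &end);
    if (end != buf.c_str() + buf.size() || errno == ERANGE) return false;
    out = value;
    return true;
}
```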

Booleans

Boolean conversion is deliberately narrow. true and false are accepted case-insensitively. Numeric values such as 1 are not implicitly converted to true.

    SECTION("Numeric fields are not implicitly booleans") {
        bool out = false;
        REQUIRE_FALSE(CSVField("1").try_get(out));
    }
 
    SECTION("Boolean literals parse as booleans") {
        bool out = false;
        REQUIRE(CSVField("true").try_get(out));
        REQUIRE(out);
 
        out = true;
        REQUIRE(CSVField("false").try_get(out));
        REQUIRE_FALSE(out);
    }
 
    SECTION("Other string fields are not implicitly booleans") {
        bool out = false;
        REQUIRE_FALSE(CSVField("truthy").try_get(out));
    }
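
The narrow boolean rule fits in a few lines. In this sketch, the hypothetical parse_bool() compares ASCII case-insensitively against the two literals. It returns std::nullopt for everything else, including 1, 0, and truthy. This section does not say whether surrounding whitespace is trimmed for booleans, so the sketch does not trim.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>

// ASCII-only case-insensitive comparison.
inline bool iequals(std::string_view a, std::string_view b) {
    if (a.size() != b.size()) return false;
    for (std::size_t i = 0; i < a.size(); ++i) {
        char x = a[i], y = b[i];
        if (x >= 'A' && x <= 'Z') x = static_cast<char>(x - 'A' + 'a');
        if (y >= 'A' && y <= 'Z') y = static_cast<char>(y - 'A' + 'a');
        if (x != y) return false;
    }
    return true;
}

// Hypothetical narrow boolean parser: only "true" and "false", in any case.
inline std::optional<bool> parse_bool(std::string_view text) {
    if (iequals(text, "true")) return true;
    if (iequals(text, "false")) return false;
    return std::nullopt;  // numbers such as "1" are deliberately not booleans
}
```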

Timestamps

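Timestamp classification supports ISO 8601-style timestamps such as 1970-01-02T00:00:00.123Z. For a uint64_t target, csv::CSVField::try_parse_timestamp() assigns Unix time in milliseconds and returns true. std::chrono::duration and std::chrono::system_clock::time_point targets are also supported, through either csv::CSVField::try_parse_timestamp() or csv::CSVField::try_get(). Converting to a coarser duration such as std::chrono::seconds truncates, so 86400123 ms becomes 86400 s.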

Integer fields can be used with csv::CSVField::try_parse_timestamp(), which lets callers coerce Unix millisecond values into chrono targets explicitly.

    CSVField field("1970-01-02T00:00:00.123Z");
 
    REQUIRE(field.type() == DataType::CSV_TIMESTAMP);
 
    std::uint64_t milliseconds = 0;
    REQUIRE(field.try_parse_timestamp(milliseconds));
    REQUIRE(milliseconds == 86400123);
 
    unsigned long long milliseconds_ull = 0;
    REQUIRE(field.try_parse_timestamp(milliseconds_ull));
    REQUIRE(milliseconds_ull == 86400123ULL);
 
    std::chrono::milliseconds duration_ms(0);
    REQUIRE(field.try_get(duration_ms));
    REQUIRE(duration_ms == std::chrono::milliseconds(86400123));
 
    std::chrono::seconds duration_s(0);
    REQUIRE(field.try_get(duration_s));
    REQUIRE(duration_s == std::chrono::seconds(86400));
 
    std::chrono::system_clock::time_point time_point;
    REQUIRE(field.try_get(time_point));
    REQUIRE(time_point.time_since_epoch() == std::chrono::milliseconds(86400123));
 
    CSVField integer_timestamp("86400123");
    std::chrono::seconds coerced_seconds(0);
    REQUIRE(integer_timestamp.try_parse_timestamp(coerced_seconds));
    REQUIRE(coerced_seconds == std::chrono::seconds(86400));
 
    std::uint64_t unchanged = 123;
    REQUIRE_FALSE(CSVField("not-a-timestamp").try_parse_timestamp(unchanged));
    REQUIRE(unchanged == 123);
 
    unchanged = 123;
    REQUIRE_FALSE(CSVField("-1").try_parse_timestamp(unchanged));
    REQUIRE(unchanged == 123);
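
Below is a self-contained sketch of the arithmetic behind these results. It converts a UTC timestamp to Unix milliseconds with Howard Hinnant's days_from_civil algorithm, then lets std::chrono::duration_cast truncate to coarser units. iso_utc_to_unix_ms() and unix_ms_from_integer() are hypothetical names. The parser accepts only the exact form YYYY-MM-DDTHH:MM:SS[.fff]Z: UTC only, with at most three fractional digits. That is likely narrower than what csv-parser's classifier accepts.

```cpp
#include <charconv>
#include <chrono>
#include <cstddef>
#include <optional>
#include <string_view>
#include <system_error>

// Days since 1970-01-01 for a proleptic Gregorian date (Hinnant's days_from_civil).
constexpr long long days_from_civil(long long y, unsigned m, unsigned d) {
    y -= m <= 2;
    const long long era = (y >= 0 ? y : y - 399) / 400;
    const unsigned yoe = static_cast<unsigned>(y - era * 400);             // [0, 399]
    const unsigned doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1;  // [0, 365]
    const unsigned doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;            // [0, 146096]
    return era * 146097 + static_cast<long long>(doe) - 719468;
}

// Reads exactly n ASCII digits of s starting at pos.
inline bool read_digits(std::string_view s, std::size_t pos, std::size_t n, int& out) {
    if (pos + n > s.size()) return false;
    out = 0;
    for (std::size_t i = pos; i < pos + n; ++i) {
        if (s[i] < '0' || s[i] > '9') return false;
        out = out * 10 + (s[i] - '0');
    }
    return true;
}

// Hypothetical parser for "YYYY-MM-DDTHH:MM:SS[.f|.ff|.fff]Z" (UTC only).
inline std::optional<std::chrono::milliseconds> iso_utc_to_unix_ms(std::string_view s) {
    int y = 0, mo = 0, d = 0, h = 0, mi = 0, sec = 0;
    if (s.size() < 20 || !read_digits(s, 0, 4, y) || s[4] != '-' || !read_digits(s, 5, 2, mo) ||
        s[7] != '-' || !read_digits(s, 8, 2, d) || s[10] != 'T' || !read_digits(s, 11, 2, h) ||
        s[13] != ':' || !read_digits(s, 14, 2, mi) || s[16] != ':' || !read_digits(s, 17, 2, sec))
        return std::nullopt;

    std::size_t pos = 19;
    int ms = 0;
    if (s[pos] == '.') {  // optional fraction, scaled to milliseconds
        std::size_t digits = 0;
        for (++pos; pos < s.size() && digits < 3 && s[pos] >= '0' && s[pos] <= '9'; ++pos, ++digits)
            ms = ms * 10 + (s[pos] - '0');
        if (digits == 0) return std::nullopt;
        for (; digits < 3; ++digits) ms *= 10;  // ".1" means 100 ms
    }
    if (pos + 1 != s.size() || s[pos] != 'Z') return std::nullopt;

    static const int month_days[] = {31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31};
    const bool leap = (y % 4 == 0 && y % 100 != 0) || y % 400 == 0;
    if (mo < 1 || mo > 12 || h > 23 || mi > 59 || sec > 59) return std::nullopt;
    if (d < 1 || d > (mo == 2 && leap ? 29 : month_days[mo - 1])) return std::nullopt;

    const long long days = days_from_civil(y, static_cast<unsigned>(mo), static_cast<unsigned>(d));
    const long long seconds = days * 86400 + h * 3600LL + mi * 60LL + sec;
    return std::chrono::milliseconds(seconds * 1000 + ms);
}

// Hypothetical coercion of a non-negative integer field holding Unix milliseconds.
inline std::optional<std::chrono::milliseconds> unix_ms_from_integer(std::string_view s) {
    long long value = 0;
    const char* end = s.data() + s.size();
    auto [ptr, ec] = std::from_chars(s.data(), end, value);
    if (ec != std::errc() || ptr != end || value < 0) return std::nullopt;  // rejects "-1"
    return std::chrono::milliseconds(value);
}
```

A std::chrono::system_clock::time_point can then be built directly from the returned duration. system_clock measures time since the Unix epoch; this is guaranteed from C++20 and is the behavior of mainstream implementations before that.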

std::optional and std::expected

The std::optional conversion operator is a concise wrapper over csv::CSVField::try_get(). csv::CSVField::as() is the structured-error alternative for callers who need to distinguish not-a-number, overflow, float-to-int, and negative-to-unsigned failures.

    std::optional<std::uint32_t> number = CSVField("2019");
    REQUIRE(number);
    REQUIRE(*number == 2019);
 
    std::optional<std::uint32_t> not_number = CSVField("applesauce");
    REQUIRE_FALSE(not_number);
 
    std::optional<std::uint32_t> negative_unsigned = CSVField("-1");
    REQUIRE_FALSE(negative_unsigned);
 
    std::optional<bool> truth = CSVField("true");
    REQUIRE(truth);
    REQUIRE(*truth);
 
    std::optional<bool> numeric_bool = CSVField("1");
    REQUIRE_FALSE(numeric_bool);

The std::expected checks use csv::CSVField::as() (C++23) and also report why a conversion failed:

    auto number = CSVField("2019").as<std::uint32_t>();
    REQUIRE(number);
    REQUIRE(*number == 2019);
 
    auto not_number = CSVField("applesauce").as<std::uint32_t>();
    REQUIRE_FALSE(not_number);
    REQUIRE(not_number.error() == CSVConversionError::NotANumber);
 
    auto overflow = CSVField("2019").as<signed char>();
    REQUIRE_FALSE(overflow);
    REQUIRE(overflow.error() == CSVConversionError::Overflow);
 
    auto float_to_int = CSVField("2.718").as<int>();
    REQUIRE_FALSE(float_to_int);
    REQUIRE(float_to_int.error() == CSVConversionError::FloatToInt);
 
    auto negative_to_unsigned = CSVField("-1").as<std::uint32_t>();
    REQUIRE_FALSE(negative_to_unsigned);
    REQUIRE(negative_to_unsigned.error() == CSVConversionError::NegativeToUnsigned);
    REQUIRE(std::string(csv_conversion_error_message(negative_to_unsigned.error())) == csv::internals::ERROR_NEG_TO_UNSIGNED);
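
std::expected needs C++23. For code that must also build as C++17, one hypothetical stand-in is a std::variant holding either the value or the error. A helper then collapses it to std::optional, discarding the failure reason as the optional conversion does. Result, parse_integer(), and to_optional() are illustrative names, not csv-parser APIs. The sketch models only NotANumber and Overflow, so -1 for an unsigned target is reported here as NotANumber.

```cpp
#include <charconv>
#include <optional>
#include <string_view>
#include <system_error>
#include <variant>

// Hypothetical subset of the documented error categories.
enum class ConversionError { NotANumber, Overflow };

// Holds either a T or the reason the conversion failed
// (a C++17 stand-in for std::expected<T, ConversionError>).
template <typename T>
using Result = std::variant<T, ConversionError>;

template <typename T>
Result<T> parse_integer(std::string_view text) {
    T value{};
    const char* end = text.data() + text.size();
    auto [ptr, ec] = std::from_chars(text.data(), end, value);
    if (ec == std::errc::invalid_argument || ptr != end) return ConversionError::NotANumber;
    if (ec == std::errc::result_out_of_range) return ConversionError::Overflow;
    return value;
}

// Collapses a Result to std::optional, discarding the failure reason.
template <typename T>
std::optional<T> to_optional(const Result<T>& result) {
    if (const T* value = std::get_if<T>(&result)) return *value;
    return std::nullopt;
}
```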