Vince's CSV Parser
csv::CSVReader::iterator Class Reference

An input iterator capable of handling large files.

#include <csv_reader.hpp>

Public Member Functions

 iterator (CSVReader *reader)
 
 iterator (CSVReader *, CSVRow &&)
 
CONSTEXPR_14 reference operator* ()
 Access the CSVRow held by the iterator.
 
CONSTEXPR_14 reference operator* () const
 
CONSTEXPR_14 pointer operator-> ()
 Return a pointer to the CSVRow the iterator has stopped at.
 
CONSTEXPR_14 pointer operator-> () const
 
iterator & operator++ ()
 Pre-increment iterator.
 
iterator operator++ (int)
 Post-increment iterator.
 
CONSTEXPR bool operator== (const iterator &other) const noexcept
 Returns true if iterators were constructed from the same CSVReader and point to the same row.
 
CONSTEXPR bool operator!= (const iterator &other) const noexcept
 

Detailed Description

An input iterator capable of handling large files.

Note
Created by CSVReader::begin() and CSVReader::end().
Iterating over a file
TEST_CASE("Basic CSVReader Iterator Test", "[read_ints_iter]") {
    // A file with 100 rows and columns A, B, ... J
    // where every value in the ith row is the number i
    CSVReader reader("./tests/data/fake_data/ints.csv");
    std::vector<std::string> col_names = {
        "A", "B", "C", "D", "E", "F", "G", "H", "I", "J"
    };
    int i = 1;

    SECTION("Basic Iterator") {
        for (auto it = reader.begin(); it != reader.end(); ++it) {
            REQUIRE((*it)[0].get<int>() == i);
            i++;
        }
    }

    SECTION("Iterator Post-Increment") {
        auto it = reader.begin();
        REQUIRE((it++)->operator[]("A").get<int>() == 1);
        REQUIRE(it->operator[]("A").get<int>() == 2);
    }

    SECTION("Range-Based For Loop") {
        for (auto& row : reader) {
            for (auto& j : col_names) REQUIRE(row[j].get<int>() == i);
            i++;
        }
    }
}
Using with <algorithm> library
TEST_CASE("CSVReader Iterator + Algorithms Requiring ForwardIterator", "[iter_algorithms]") {
    SECTION("std::max_element - CORRECT approach using vector") {
        // The first file is such that each value in the ith row is the number i
        // There are 100 rows
        CSVReader reader("./tests/data/fake_data/ints.csv");

        // Copy rows to vector to enable ForwardIterator algorithms
        auto rows = std::vector<CSVRow>(reader.begin(), reader.end());
        REQUIRE(rows.size() == 100);

        // Find largest number
        auto int_finder = [](const CSVRow& left, const CSVRow& right) {
            return (left["A"].get<int>() < right["A"].get<int>());
        };

        auto max_int = std::max_element(rows.begin(), rows.end(), int_finder);
        REQUIRE((*max_int)["A"] == 100);
    }

    SECTION("std::max_element - Large File using vector") {
        // The second file is a database of California state employee salaries
        CSVReader reader("./tests/data/real_data/2015_StateDepartment.csv");

        // Copy rows to vector to enable ForwardIterator algorithms
        auto rows = std::vector<CSVRow>(reader.begin(), reader.end());

        // Find highest salary
        auto wage_finder = [](const CSVRow& left, const CSVRow& right) {
            return (left["Total Wages"].get<double>() < right["Total Wages"].get<double>());
        };

        auto max_wage = std::max_element(rows.begin(), rows.end(), wage_finder);
        REQUIRE((*max_wage)["Total Wages"] == "812064.87");
    }
}
Warning
STREAMING CONSTRAINT: DO NOT ATTEMPT TO CACHE ALL DATA
This iterator is intentionally tagged std::input_iterator_tag (single-pass) so that it can stream CSV files that may exceed available RAM (e.g., files of 50 GB or more).
CRITICAL DESIGN CONSTRAINT:
  • The underlying RawCSVData chunks are automatically freed as the iterator advances
  • Only the CURRENT row's data is kept alive (via CSVRow's RawCSVDataPtr member)
  • Previous chunks are freed to allow unlimited file sizes with bounded memory
WHY FORWARD ITERATOR IS NOT POSSIBLE:
  • ForwardIterator requires the multi-pass guarantee: copies of an iterator must remain independently valid at different positions
  • Supporting this would require caching ALL RawCSVDataPtr chunks in the iterator
  • This defeats the streaming purpose: a 50 GB CSV would require 50+ GB of RAM
  • The entire library design depends on automatic chunk cleanup for memory efficiency
IMPLICATIONS FOR ALGORITHM USE:
  • Algorithms requiring a ForwardIterator (e.g., std::max_element) may appear to work in tests with small files, but can cause heap-use-after-free when the algorithm dereferences a saved iterator whose chunk has already been freed; algorithms requiring random access, such as std::sort, do not compile against this iterator at all
  • CORRECT approach: Copy rows to std::vector first, then use algorithms
  • Example: auto rows = std::vector<CSVRow>(reader.begin(), reader.end()); auto max_row = std::max_element(rows.begin(), rows.end(), ...);
FOR AI AGENTS AND MAINTAINERS:
If you encounter heap-use-after-free with ForwardIterator algorithms, the solution is NOT to cache all RawCSVDataPtr chunks - this violates the fundamental streaming architecture. Instead, document the limitation or provide vector-based workarounds.

Definition at line 117 of file csv_reader.hpp.

Constructor & Destructor Documentation

◆ iterator() [1/2]

csv::CSVReader::iterator::iterator ( CSVReader * reader)
inline

Definition at line 128 of file csv_reader.hpp.

◆ iterator() [2/2]

csv::CSVReader::iterator::iterator ( CSVReader * _daddy,
CSVRow &&  _row 
)

Definition at line 28 of file csv_reader_iterator.cpp.

Member Function Documentation

◆ operator!=()

CONSTEXPR bool csv::CSVReader::iterator::operator!= ( const iterator & other) const
inline noexcept

Definition at line 149 of file csv_reader.hpp.

◆ operator*() [1/2]

CONSTEXPR_14 reference csv::CSVReader::iterator::operator* ( )
inline

Access the CSVRow held by the iterator.

Definition at line 132 of file csv_reader.hpp.

◆ operator*() [2/2]

CONSTEXPR_14 reference csv::CSVReader::iterator::operator* ( ) const
inline

Definition at line 133 of file csv_reader.hpp.

◆ operator++() [1/2]

CSVReader::iterator & csv::CSVReader::iterator::operator++ ( )

Pre-increment iterator.

Advance the iterator by one row.

If this CSVReader has an associated file, then the iterator will lazily pull more data from that file until the end of file is reached.

Note
This iterator does not block the thread responsible for parsing the CSV.

Definition at line 40 of file csv_reader_iterator.cpp.

◆ operator++() [2/2]

CSVReader::iterator csv::CSVReader::iterator::operator++ ( int  )

Post-increment iterator.

Definition at line 49 of file csv_reader_iterator.cpp.

◆ operator->() [1/2]

CONSTEXPR_14 pointer csv::CSVReader::iterator::operator-> ( )
inline

Return a pointer to the CSVRow the iterator has stopped at.

Definition at line 136 of file csv_reader.hpp.

◆ operator->() [2/2]

CONSTEXPR_14 pointer csv::CSVReader::iterator::operator-> ( ) const
inline

Definition at line 137 of file csv_reader.hpp.

◆ operator==()

CONSTEXPR bool csv::CSVReader::iterator::operator== ( const iterator & other) const
inline noexcept

Returns true if iterators were constructed from the same CSVReader and point to the same row.

Definition at line 145 of file csv_reader.hpp.

