Vince's CSV Parser
DataFrameExecutor and speculative::ParallelCSVParser both contain a persistent indexed-task worker pool. They should eventually share a small internal executor instead of maintaining two near-identical scheduling loops.
Both implementations:
- capture a std::exception_ptr in the worker and rethrow on the caller thread

The common abstraction is an indexed task pool whose callback receives (worker_index, task_index).
worker_index matters because the speculative parser needs worker-local parser state (ChunkParserCore). DataFrameExecutor can ignore it.
- DataFrameExecutor exposes a public-ish batch API for DataFrame operations.
- ParallelCSVParser is an internal parser implementation detail.

Create an internal header, likely:
Potential class:
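As a rough illustration only, one possible shape for such a class is sketched below. Everything in it is an assumption made for illustration: the constructor and `run` signatures, the member names, the helper `per_worker_sums`, and the choice of a mutex plus two condition variables with an atomic task counter. It is not taken from either existing implementation. It keeps worker threads alive between batches, hands out task indices from a shared counter, and stores the first exception as a std::exception_ptr so that it can be rethrown on the caller thread.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <exception>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical sketch: every name and signature here is an assumption.
class IndexedTaskPool {
public:
    using Task = std::function<void(std::size_t worker_index, std::size_t task_index)>;

    explicit IndexedTaskPool(std::size_t worker_count)
        : worker_count_(worker_count == 0 ? 1 : worker_count) {
        if (worker_count_ <= 1) return;  // serial mode: no threads are started
        for (std::size_t w = 0; w < worker_count_; ++w)
            threads_.emplace_back([this, w] { worker_loop(w); });
    }
    ~IndexedTaskPool() {
        {
            std::lock_guard<std::mutex> guard(mutex_);
            stop_ = true;
        }
        wake_.notify_all();
        for (auto& t : threads_) t.join();
    }
    IndexedTaskPool(const IndexedTaskPool&) = delete;
    IndexedTaskPool& operator=(const IndexedTaskPool&) = delete;

    std::size_t worker_count() const { return worker_count_; }

    // Runs task(worker_index, task_index) for each index in [0, task_count) and
    // blocks until the batch is finished. One batch at a time; not reentrant.
    void run(std::size_t task_count, const Task& task) {
        assert(task && "task callback must not be empty");
        if (task_count == 0) return;
        if (worker_count_ <= 1) {  // serial path: exceptions propagate directly
            for (std::size_t i = 0; i < task_count; ++i) task(0, i);
            return;
        }
        std::unique_lock<std::mutex> lock(mutex_);
        task_ = &task;
        task_count_ = task_count;
        next_.store(0);
        active_ = worker_count_;
        ++generation_;  // publishes the new batch to the workers
        wake_.notify_all();
        done_.wait(lock, [this] { return active_ == 0; });
        task_ = nullptr;
        std::exception_ptr error = error_;
        error_ = nullptr;
        if (error) std::rethrow_exception(error);  // rethrow on the caller thread
    }

private:
    void worker_loop(std::size_t worker_index) {
        std::size_t seen = 0;
        std::unique_lock<std::mutex> lock(mutex_);
        for (;;) {
            wake_.wait(lock, [&] { return stop_ || generation_ != seen; });
            if (stop_) return;
            seen = generation_;
            const Task* task = task_;
            const std::size_t count = task_count_;
            lock.unlock();
            try {
                // Claim task indices from the shared counter until none remain.
                for (std::size_t i = next_.fetch_add(1); i < count; i = next_.fetch_add(1))
                    (*task)(worker_index, i);
            } catch (...) {
                std::lock_guard<std::mutex> guard(mutex_);
                if (!error_) error_ = std::current_exception();  // keep the first error
                next_.store(count);  // stop handing out further tasks
            }
            lock.lock();
            if (--active_ == 0) done_.notify_one();
        }
    }

    const std::size_t worker_count_;
    std::mutex mutex_;
    std::condition_variable wake_, done_;
    std::vector<std::thread> threads_;
    const Task* task_ = nullptr;
    std::size_t task_count_ = 0;
    std::atomic<std::size_t> next_{0};
    std::size_t active_ = 0;
    std::size_t generation_ = 0;
    std::exception_ptr error_;
    bool stop_ = false;
};

// Usage sketch: worker_index selects worker-local scratch state, standing in for
// a worker-local ChunkParserCore (whose real interface is not shown here).
inline std::vector<long> per_worker_sums(IndexedTaskPool& pool, std::size_t task_count) {
    std::vector<long> sums(pool.worker_count(), 0);
    pool.run(task_count, [&](std::size_t worker, std::size_t i) {
        sums[worker] += static_cast<long>(i);
    });
    return sums;
}
```

In this sketch a caller that does not need worker-local state simply ignores the first callback argument, while a caller that does can size a vector by `worker_count()` and index it with `worker_index`, as `per_worker_sums` does. Because each worker only ever touches its own slot, no extra locking is needed for that state.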
Behavior:
- If worker_count <= 1, run serially.
- If task_count == 0, return immediately.
- Pass (worker_index, task_index) to the callback.

- Add IndexedTaskPool and its tests.
- Update DataFrameExecutor to delegate to IndexedTaskPool.
- Update ParallelCSVParser to delegate to IndexedTaskPool, using worker_index to address a worker-local ChunkParserCore.

This is a cleanup/risk-reduction refactor, but it is not the next performance hot path. The tuple/parser-core policy refactor should land first because it directly enables allocation-free typed parsing.

TODO: re-expose/document a public tuple-reader API only after it is backed by the parser-core typed path.