
ParaText: CSV parsing at 2.5 GB per second


  • It is difficult to make claims about Spark's memory efficiency, which makes it hard to decide how to provision resources for Spark jobs.
  • The alpha release includes a parallel Comma Separated Values (CSV) reader with Python bindings.
  • We conducted extensive benchmarks of ParaText against 7 CSV readers and 5 binary readers.
  • ParaText reads text files in parallel on a single multi-core machine so it can consume more of the available bandwidth.
  • ParaText had a higher throughput than any of the other CSV readers tested, on every dataset tried.
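The bullets state ParaText's core idea, reading one text file in parallel on a multi-core machine, without showing how the work can be divided. Below is a minimal, hypothetical Python sketch of one common approach. It is not ParaText's implementation or API. The approach splits the file into byte ranges, moves each split point forward to the next newline so no row is cut in half, and parses the ranges in separate workers. All function names are illustrative. The sketch assumes UTF-8 input, a single header line, and no newlines inside quoted fields.

```python
import csv
import io
import os
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def read_header(path):
    """Parse the first line as the header; return (column names, byte offset of the data)."""
    with open(path, "rb") as f:
        first = f.readline()
        data_start = f.tell()
    header = next(csv.reader([first.decode("utf-8")]), [])
    return header, data_start


def chunk_boundaries(path, start, num_chunks):
    """Split bytes [start, EOF) into up to num_chunks ranges that each begin at a line start."""
    size = os.path.getsize(path)
    step = max(1, (size - start) // num_chunks)
    bounds = [start]
    with open(path, "rb") as f:
        for i in range(1, num_chunks):
            pos = start + i * step
            if pos <= bounds[-1]:
                continue  # a long line already carried us past this split point
            f.seek(pos)
            f.readline()  # skip the partial line so the split lands just after a newline
            pos = f.tell()
            if pos >= size:
                break
            bounds.append(pos)
    bounds.append(size)
    return list(zip(bounds[:-1], bounds[1:]))


def parse_range(task):
    """Worker: read one byte range and parse it with the standard csv module."""
    path, start, end = task
    with open(path, "rb") as f:
        f.seek(start)
        text = f.read(end - start).decode("utf-8")
    return list(csv.reader(io.StringIO(text, newline="")))


def parallel_read_csv(path, workers=None, use_processes=True):
    """Return (header, rows), parsing the file's byte ranges concurrently."""
    workers = workers or os.cpu_count() or 1
    header, data_start = read_header(path)
    tasks = [(path, s, e) for s, e in chunk_boundaries(path, data_start, workers)]
    # Processes sidestep the GIL for CPU-bound parsing; threads are handy for debugging.
    pool_cls = ProcessPoolExecutor if use_processes else ThreadPoolExecutor
    with pool_cls(max_workers=workers) as pool:
        # map() yields results in submission order, so rows keep their file order.
        rows = [row for part in pool.map(parse_range, tasks) for row in part]
    return header, rows
```

For example, `header, rows = parallel_read_csv("data.csv")` returns the column names and every row as a list of strings. To get a throughput figure in the same units as the title's 2.5 GB per second, divide `os.path.getsize(path)` by the elapsed wall-clock time. Expect a pure-Python sketch like this one to run far slower than ParaText's C++ reader, partly because it also pays to pickle rows back from the worker processes. On platforms that start workers with `spawn` (Windows, macOS), call it from under an `if __name__ == "__main__":` guard.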



@randal_olson: “ParaText: CSV parsing at 2.5 GB per second. #Python #BigData #DataScience”


CSV, Python, C++, parallelism, data science


