SchwertImStein@lemmy.dbzer0.com to Programmer Humor@programming.dev · English · 7 months ago
muskrat's data eng expert's hard drive overheats while processing 60k rows (lemmy.dbzer0.com)
wise_pancake@lemmy.ca · 7 months ago
60k rows is generally very usable, even with wide tables in row formats. I've had pandas work with 1M-plus rows and 100 columns in memory just fine. After 1M rows, move on to something better like Dask, Polars, Spark, or literally any DB. The first thing I'd do with whatever data they're running into issues with is rewrite it as partitioned and sorted Parquet.
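For a rough sense of scale, here is a back-of-the-envelope sketch of the in-memory footprint of the table sizes mentioned above. It assumes every cell is an 8-byte numeric value (such as a 64-bit float or integer) and ignores index and string overhead, so treat the numbers as a lower bound rather than a measurement. The helper name `estimate_bytes` is made up for illustration.

```python
def estimate_bytes(rows: int, cols: int, bytes_per_value: int = 8) -> int:
    """Rough size of a dense numeric table: rows * cols * bytes per cell.

    Assumption: fixed-width 8-byte values; string columns would be larger.
    """
    return rows * cols * bytes_per_value


def to_megabytes(n_bytes: int) -> float:
    """Convert bytes to decimal megabytes (1 MB = 1,000,000 bytes)."""
    return n_bytes / 1_000_000


# The two cases from the comment, both with 100 columns.
small = estimate_bytes(60_000, 100)
large = estimate_bytes(1_000_000, 100)

print(f"60k rows x 100 cols: {to_megabytes(small):.1f} MB")
print(f"1M rows x 100 cols:  {to_megabytes(large):.1f} MB")
```

Under those assumptions, 60k rows by 100 columns comes to about 48 MB and 1M rows to about 800 MB, which is consistent with both fitting in memory on an ordinary laptop.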
Onno (VK6FLAB)@lemmy.radio · 7 months ago
My go-to tool of late is DuckDB: it comes with binaries for most platforms, works out of the box, loads any number of database formats, and is FAST.
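To illustrate the "literally any DB" point, here is a minimal sketch that loads 60k rows into an embedded database and aggregates them. It deliberately uses Python's built-in `sqlite3` as a stand-in rather than DuckDB, so that it runs with no extra installs. The `payments` table, its columns, and the random data are all hypothetical and have nothing to do with the data in the original post.

```python
import random
import sqlite3


def build_table(n_rows: int = 60_000, seed: int = 0) -> sqlite3.Connection:
    """Create an in-memory SQLite database with a hypothetical payments table."""
    rng = random.Random(seed)
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE payments (id INTEGER PRIMARY KEY, agency TEXT, amount REAL)"
    )
    # Generate made-up rows spread evenly over 10 made-up agencies.
    rows = (
        (i, f"agency_{i % 10}", rng.uniform(0, 1000))
        for i in range(n_rows)
    )
    con.executemany("INSERT INTO payments VALUES (?, ?, ?)", rows)
    con.commit()
    return con


con = build_table()

# Count all rows, then group by agency.
row_count = con.execute("SELECT COUNT(*) FROM payments").fetchone()[0]
per_agency = con.execute(
    "SELECT agency, COUNT(*), AVG(amount) "
    "FROM payments GROUP BY agency ORDER BY agency"
).fetchall()

print(row_count)
for agency, count, avg_amount in per_agency:
    print(agency, count, round(avg_amount, 2))
```

The same queries should carry over to DuckDB or any other SQL engine with little change; the point is only that 60k rows is a small workload for a database.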