
That's not a lot.

A CPU can stream through gigabytes a second, so it shouldn't be taking that many cycles per record.

So it should be one pass through the data, with maybe an O(log n) lookup per record or something like that...

Python is also notoriously slow, though.

There could be something more sophisticated going on, but Python isn't really a database. I'd just shove it into MySQL or something. Buffer the inserts and do them in batches of at least 1000 rows. Most databases are very fast if you work around the bottlenecks instead of fighting them. Put it in MySQL and you won't need to worry about dataset size versus memory.
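A minimal sketch of the buffered-insert pattern. It uses stdlib sqlite3 as a stand-in so it runs anywhere; with MySQL you'd swap in a connector library and the same executemany() call. The table name `records` and its two columns are hypothetical, just for illustration.

```python
import csv
import sqlite3
from itertools import islice

BATCH = 1000  # insert rows in blocks of at least 1000

def batches(rows, size=BATCH):
    """Yield lists of up to `size` rows from any iterator."""
    it = iter(rows)
    while chunk := list(islice(it, size)):
        yield chunk

def load_csv(path, conn):
    """Stream a CSV into the database with batched inserts."""
    conn.execute("CREATE TABLE IF NOT EXISTS records (a TEXT, b TEXT)")
    with open(path, newline="") as f:
        reader = csv.reader(f)
        next(reader)  # skip the header row
        for chunk in batches(reader):
            # one round trip per batch instead of per row
            conn.executemany("INSERT INTO records VALUES (?, ?)", chunk)
    conn.commit()
```

The point of the batching is cutting per-statement overhead: one executemany() per thousand rows instead of a thousand separate inserts.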

Also, that amount of memory is probably mostly overhead. I suspect it could be cut by a third to a half at least.

In the morning, if I can get the data, I'll run some tests myself and share the scripts. It's a lot of data, but I wouldn't process it in a way that requires it all to be in memory. The speed of an SSD won't make any noticeable difference.
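A sketch of what "not all in memory" looks like: read the CSV row by row and aggregate as you go, so memory use stays constant no matter how big the file is. The column name `user` here is hypothetical.

```python
import csv
from collections import Counter

def count_column(path, column):
    """Single pass over a CSV, tallying values in one column.

    csv.DictReader pulls one row at a time from the file object,
    so only the running Counter lives in memory, not the dataset.
    """
    counts = Counter()
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            counts[row[column]] += 1
    return counts
```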

I've got 64GB but jobs like this can be done in under 10GB. How big is the CSV file uncompressed? Memory use should be close to that.

111 days ago
1 score