
How Do I Debug "OverflowError: value too large to convert to int32_t"?

What I am trying to do: I am using PyArrow to read some CSVs and convert them to Parquet. Some of the files I read have plenty of columns and a high memory footprint (enough to

Solution 1:

If I understand correctly, the third argument to generate_arrow_tables is batch_size, which you are passing in as the block_size to the CSV reader. I'm not sure what data.size's value is, but you are guarding it with min(data.size, GB ** 10).

A block_size of 10 GB will not work. The error you are receiving means the block size does not fit in a signed 32-bit integer (max ~2 GB).
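A quick arithmetic check makes the limit concrete (the numbers here are just the int32 maximum and 10 GiB, not values taken from your code):

```python
INT32_MAX = 2**31 - 1      # 2_147_483_647, roughly 2 GiB
ten_gib = 10 * 1024**3     # 10_737_418_240

# 10 GiB is about five times larger than what fits in an int32_t,
# so PyArrow raises OverflowError when it casts the block size.
print(ten_gib > INT32_MAX)  # True
```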

Besides that limit, I'm not sure it is a good idea to use a block size much bigger than the default (1 MB). I wouldn't expect you'd see much performance benefit, and you will end up using far more RAM than you need to.
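As a rough sketch of the approach, here is one way to stream a large CSV into Parquet with a modest, bounded block_size using pyarrow.csv.open_csv and pyarrow.parquet.ParquetWriter. The function name, paths, and the 16 MB default are illustrative assumptions, not your generate_arrow_tables code:

```python
import pyarrow as pa
import pyarrow.csv as pv
import pyarrow.parquet as pq

INT32_MAX = 2**31 - 1  # block_size must fit in a signed 32-bit integer

def csv_to_parquet(csv_path, parquet_path, block_size=16 * 2**20):
    """Stream a CSV to Parquet one block at a time (names here are illustrative)."""
    # Clamp the block size so it can never overflow int32_t; a few MB is usually plenty.
    read_options = pv.ReadOptions(block_size=min(block_size, INT32_MAX))

    reader = pv.open_csv(csv_path, read_options=read_options)
    writer = pq.ParquetWriter(parquet_path, reader.schema)
    try:
        # Each batch corresponds to roughly one block of the file,
        # so peak memory stays near block_size instead of the whole CSV.
        for batch in reader:
            writer.write_table(pa.Table.from_batches([batch]))
    finally:
        writer.close()
```

Keeping the block size small and writing batches as they arrive is what keeps memory bounded; raising block_size mostly just raises the peak RSS.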
