How Do I Debug OverflowError: Value Too Large To Convert To int32_t?
What I am trying to do
I am using PyArrow to read some CSVs and convert them to Parquet. Some of the files I read have plenty of columns and have a high memory footprint (enough to
Solution 1:
If I understand correctly, the third argument to `generate_arrow_tables` is `batch_size`, which you are passing in as the `block_size` to the CSV reader. I'm not sure what `data.size`'s value is, but you are guarding it with `min(data.size, GB ** 10)`.
A `block_size` of 10 GB will not work. The error you are receiving says that the block size does not fit in a signed 32-bit integer (max ~2 GB).
Besides that limit, I'm not sure it is a good idea to use a block size much bigger than the default (1 MB). I wouldn't expect you'd see much performance benefit, and you will end up using far more RAM than you need to.
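If it helps, here is a minimal sketch of the CSV-to-Parquet conversion with a bounded block size. The helper name, file paths, and the 64 MB cap are illustrative assumptions on my part; the `pyarrow.csv.ReadOptions(block_size=...)` option and the read/write calls are PyArrow's actual API:

```python
import pyarrow.csv as pv
import pyarrow.parquet as pq

MB = 1024 ** 2

def csv_to_parquet(csv_path, parquet_path, block_size=64 * MB):
    # block_size must fit in a signed 32-bit integer; the default is
    # 1 MB, so 64 MB is already a generous cap for wide rows.
    read_options = pv.ReadOptions(block_size=block_size)
    table = pv.read_csv(csv_path, read_options=read_options)
    pq.write_table(table, parquet_path)

csv_to_parquet("input.csv", "output.parquet")
```

Capping the value like this keeps you well clear of the int32 limit while still letting you raise the block size above the default if a single row is too large to fit in one block.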