
Memory Leaks In Requests Library

I noticed a very large increase in memory usage when retrieving a PDF file using the requests library. The file itself is only ~4 MB, but the physical memory allocated to the Python process i…

Solution 1:

When no charset parameter is included in the content type and the response is not a text/* mimetype, a character detection library is used to determine the codec.

By using response.text you triggered this detection, loading the library; its modules include some sizable tables.

Depending on the exact version of requests you have installed, you'll find that sys.modules['requests.packages.chardet'] or sys.modules['requests.packages.charade'] is now present, together with around 36 sub-modules, where it wasn't before you used r.text.
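You can observe this lazy-import behaviour with any package; the sketch below uses a standard-library module (encodings.idna) as a stand-in for requests' detection package, since the exact name (chardet or charade) varies by version:

```python
import sys

# Illustration only: like requests' detection library, any package that is
# imported lazily shows up in sys.modules at first use -- along with its
# sub-modules -- and stays there for the life of the process.
before = set(sys.modules)
import encodings.idna  # stand-in for the detection package
added = set(sys.modules) - before

# Once imported, the module remains resident even if nothing references it
print("encodings.idna" in sys.modules)
```

In the same way, inspecting sys.modules before and after the first access to r.text shows the detection package and its tables being pulled in.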

As the detection runs, a number of objects are created that apply various statistical techniques to your PDF document, because detection fails to hit on any specific codec with enough certainty. To fit all this in memory, Python asks your OS to allocate more memory. Once the detection process is complete, that memory is freed again within Python, but the OS does not immediately reclaim it. This is done to prevent wild memory churn, since processes can easily request and free memory in cycles.
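That allocator behaviour can be seen directly. The sketch below (Unix-only, using the standard-library resource module) allocates and frees a large buffer standing in for the detection pass's working set; the process's high-water mark stays elevated even after Python has freed the object:

```python
import resource  # Unix-only

def peak_rss():
    # Peak resident set size of this process
    # (kilobytes on Linux, bytes on macOS).
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

baseline = peak_rss()
blob = bytearray(50 * 2**20)   # ~50 MB, standing in for detection's working set
after_alloc = peak_rss()
del blob                       # Python frees the object immediately...
after_free = peak_rss()
# ...but the high-water mark does not drop: the OS holds on to the pages
print(baseline, after_alloc, after_free)
```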

Note that you also added the result of r.text to your memory, bound to t. This is a Unicode text object, which in Python 2 takes up between 2 and 4 times as much memory as the bytestring object. The specific download you have there is nearly 4 MB as a bytestring, but if you are using a UCS-4 Python build, then the resulting Unicode value adds another 16MB just for the decoded value.
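The arithmetic behind those figures, assuming a hypothetical UCS-4 Python 2 build as described above:

```python
# Rough arithmetic for the sizes quoted above (hypothetical UCS-4 build):
pdf_bytes = 4 * 2**20                  # ~4 MB bytestring from the download
bytes_per_char = 4                     # UCS-4 stores each code point in 4 bytes
decoded = pdf_bytes * bytes_per_char   # ~16 MB extra for the Unicode value
total = pdf_bytes + decoded            # ~20 MB resident while both are bound
print(total // 2**20)                  # megabytes held at once
```

For a binary payload like a PDF, reading r.content (the raw bytes) rather than r.text avoids both the detection pass and the decoded Unicode copy.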
