
Sizeof(string) Not Equal To String Length

I used to think that each character is one byte (at least that is the case in C/C++), so the size of a string should equal len(string) bytes. However, a simple experiment tells

Solution 1:

Python string objects contain more information than just the characters. In Python 2, a str object also holds a reference count, a pointer to its type object, the length of the string, the cached hash and the interning state. See the PyStringObject struct, as well as the PyObject_VAR_HEAD macro it references.

As a result, an empty string has a memory size too:

>>> import sys
>>> sys.getsizeof('')
37

This size is platform dependent, because pointers and C integers have different sizes on different platforms. 37 is the size of a Python 2 str object on Mac OS X.
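On a current CPython 3 the absolute numbers differ from the Python 2 figure above, but the principle is the same: every str carries a fixed per-object header plus its character data. The sketch below assumes CPython 3.3 or later and pure-ASCII text (one byte per character); `header_overhead` is a hypothetical helper name used only for illustration. It subtracts the character count from `sys.getsizeof()`, and the difference stays the same no matter how long the string is, because only the data part grows.

```python
import sys

def header_overhead(s: str) -> int:
    """Bytes reported by sys.getsizeof() beyond one byte per character.

    Only meaningful for pure-ASCII strings on CPython 3.3+, where each
    character occupies one byte; the rest (object header plus a trailing
    NUL byte) is a fixed cost per object.
    """
    return sys.getsizeof(s) - len(s)

# Build the samples at runtime so each one is an ordinary, fresh str object.
samples = ["x" * n for n in (5, 50, 500)]

for s in samples:
    print(f"len={len(s):4d}  getsizeof={sys.getsizeof(s):4d}  "
          f"overhead={header_overhead(s)}")
```

The exact overhead printed depends on the Python version and platform, which is why no specific number is shown here.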

For unicode objects the picture is even more distorted: Python 2 uses either 2 or 4 bytes per code point, depending on a compile-time choice (a "narrow" or "wide" build). Python 3.3 and later use a variable number of bytes for Unicode text: 1, 2 or 4 bytes per code point, depending on the highest code point in the text.
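The variable-width storage in Python 3.3+ is CPython's flexible string representation (PEP 393). A rough way to observe it, assuming CPython (other implementations may store text differently), is to measure how much `sys.getsizeof()` grows when a string of one repeated character gets `n` characters longer; dividing by `n` gives the per-code-point width. The helper `bytes_per_codepoint` is a hypothetical name chosen for this sketch.

```python
import sys

def bytes_per_codepoint(ch: str, n: int = 1000) -> int:
    """Estimate how many bytes CPython stores per character for strings of `ch`.

    Both strings use the same kind of header, so the difference in
    sys.getsizeof() is n extra characters times the storage width.
    """
    short_s = ch * 2
    long_s = ch * (2 + n)
    return (sys.getsizeof(long_s) - sys.getsizeof(short_s)) // n

# ASCII, Latin-1, Basic Multilingual Plane, and an astral-plane emoji.
for ch in ("a", "\u00e9", "\u20ac", "\U0001F600"):
    print(f"U+{ord(ch):04X}: {bytes_per_codepoint(ch)} byte(s) per code point")

# A single wide character widens the storage of the whole string.
text = "a" * 1000
print(sys.getsizeof(text), sys.getsizeof(text + "\U0001F600"))
```

The last line illustrates "depending on the highest code point in the text": appending one emoji to 1000 ASCII characters switches the entire string to 4 bytes per code point.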

As such, it is normal for sys.getsizeof() to return a higher value than len(). sys.getsizeof() reports the memory footprint of the object, not the length of the string; use len() for that.

If you want to know how much memory other software uses for a string, you definitely can't use the sys.getsizeof() value; other software will make different choices about how to store text, and will have different overheads. The len() of the encoded text may be a starting point, but you'll have to check the documentation of that other piece of software, or ask its developers, to find out how much memory it requires for a given piece of text.
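If the other software stores text in a known encoding, the length of the encoded bytes is that starting point. This small sketch (the helper name `encoded_sizes` is illustrative, not from any library) compares the code-point count from len() with the byte count in three common encodings; the `-le` codecs are used so that no byte-order mark is added. It says nothing about the other program's own per-string overhead.

```python
def encoded_sizes(text: str) -> dict:
    """Return the code-point count and the encoded byte count per codec."""
    sizes = {"code points": len(text)}
    for codec in ("utf-8", "utf-16-le", "utf-32-le"):
        sizes[codec] = len(text.encode(codec))
    return sizes

print(encoded_sizes("na\u00efve"))
# {'code points': 5, 'utf-8': 6, 'utf-16-le': 10, 'utf-32-le': 20}
```

Note that the same five code points take 6, 10 or 20 bytes depending on the encoding, so "the size of a string" only has a meaning once an encoding is fixed.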

