Skip to content Skip to sidebar Skip to footer

How Does Python Determine If Two Strings Are Identical

I've tried to understand when Python strings are identical (aka sharing the same memory location). However during my tests, there seems to be no obvious explanation when two string

Solution 1:

From the link you posted:

Avoiding large .pyc files

So why does 'a' * 21 is 'aaaaaaaaaaaaaaaaaaaaa' not evaluate to True? Do you remember the .pyc files you encounter in all your packages? Well, Python bytecode is stored in these files. What would happen if someone wrote something like this ['foo!'] * 10**9? The resulting .pyc file would be huge! In order to avoid this phenomena, sequences generated through peephole optimization are discarded if their length is superior to 20.

If you have the string "HelloHelloHelloHelloHello", Python will necessarily have to store it as it is (asking the interpreter to detect repeating patterns in a string to save space might be too much). However, when it comes to string values that can be computed at parsing time, such as "Hello" * 5, Python evaluate those as part of this so-called "peephole optimization", which can decide whether it is worth it or not to precompute the string. Since len("Hello" * 5) > 20, the interpreter leaves it as it is to avoid storing too many long strings.

EDIT:

As indicated in this question, you can check this on the source code in peephole.c, function fold_binops_on_constants, near the end you will see:

// ...
} elseif (size > 20) {
    Py_DECREF(newconst);
    return -1;
}

EDIT 2:

Actually, that optimization code has recently been moved to the AST optimizer for Python 3.7, so now you would have to look into ast_opt.c, function fold_binop, which calls now function safe_multiply, which checks that the string is no longer than MAX_STR_SIZE, newly defined as 4096. So it seems that the limit has been significantly bumped up for the next releases.

Solution 2:

In Example 2 :

# Example 2
s1 = "Hello" * 3
s2 = "Hello" * 3
print(id(s1) == id(s2)) # True

Here the value of s1 and s2 are evaluated at compile time.so this will return true.

In Example 3 :

# Example 3
i = 3
s1 = "Hello" * i
s2 = "Hello" * i
print(id(s1) == id(s2)) # False

Here the values of s1 and s2 are evaluated at runtime and the result is not interned automatically so this will return false.This is to avoid excess memory allocation by creating "HelloHelloHello" string at runtime itself.

If you do intern manually it will return True

i = 3
s1 = "Hello" * i
s2 = "Hello" * i
print(id(intern(s1)) == id(intern(s2))) # True

Post a Comment for "How Does Python Determine If Two Strings Are Identical"