Why Does `myfloat In Myset` Become Super Slow?

July 09, 2022 Post a Comment

When I re-insert the same float value into my set a few times, the x in s check that should take constant time becomes very slow. Why? Output of timing x in s: 0.06 microseconds

Solution 1:

Because you are using nan, which is notorious for breaking naive expectations regarding __hash__/__eq__ contract... i.e.:

>>> myset = set()
>>> myset.add(float('nan'))
>>> myset
{nan}
>>> myset.add(float('nan'))
>>> myset
{nan, nan}

This happens because:

>>> float('nan') == float('nan')
False

But:

>>> hash(float('nan')) == hash(float('nan'))
True

So you are guaranteed to collide every time, and you are seeing hash set behavior degrade to O(N), which is the worst-case behavior, not O(1). Fundamentally, you are not re-inserting the same float value.

Moreover, watch out for this behavior:

>>> nan = float('nan')
>>> myset = set()
>>> myset.add(nan)
>>> myset.add(nan)
>>> myset
{nan}

Despite:

>>> nan == nan
False

The above is due to an optimization, that for containers, Python actually checks identity first to avoid potentially costly __eq__ operations. Since I re-used the same object, now it is being considered "the same value".

Python Guru

Why Does `myfloat In Myset` Become Super Slow?

Solution 1:

Post a Comment for "Why Does `myfloat In Myset` Become Super Slow?"