Why Does `myfloat In Myset` Become Super Slow?
When I re-insert the same float value into my set a few times, the x in s check that should take constant time becomes very slow. Why? Output of timing x in s: 0.06 microseconds
Solution 1:
Because you are using nan, which is notorious for breaking naive expectations regarding __hash__
/__eq__
contract... i.e.:
>>> myset = set()
>>> myset.add(float('nan'))
>>> myset
{nan}
>>> myset.add(float('nan'))
>>> myset
{nan, nan}
This happens because:
>>> float('nan') == float('nan')
False
But:
>>> hash(float('nan')) == hash(float('nan'))
True
So you are guaranteed to collide every time, and you are seeing hash set behavior degrade to O(N), which is the worst-case behavior, not O(1). Fundamentally, you are not re-inserting the same float value.
Moreover, watch out for this behavior:
>>> nan = float('nan')
>>> myset = set()
>>> myset.add(nan)
>>> myset.add(nan)
>>> myset
{nan}
Despite:
>>> nan == nan
False
The above is due to an optimization, that for containers, Python actually checks identity first to avoid potentially costly __eq__
operations. Since I re-used the same object, now it is being considered "the same value".
Post a Comment for "Why Does `myfloat In Myset` Become Super Slow?"