Comments by "Anony Mousse" (@anon_y_mousse) on "Python Hash Sets Explained & Demonstrated - Computerphile" video.
When implementing a hash table, use a table size that's a power of two so that you can index the table with a single `bitwise and` on the hash code (`hash & (size - 1)`), and store the unnormalized hash alongside each key. Everyone always recommends a prime number for the table size, but that's the wrong approach: you can always make the hashing function more complicated, but the indexing should be as simple as possible. In case you're wondering why you should save the unnormalized hash code, it's so that you can resize the table when too many or too few collisions occur and rebuild it without querying the hashing function again. There's a lot of balancing to be done in implementing a hash table, regardless of whether it's a set or a map, but these two things should always be part of the implementation.
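A minimal Python sketch of those two points, assuming separate chaining with plain lists as buckets. The class name `PowerOfTwoSet`, the starting capacity, and the "grow when entries outnumber buckets" rule are illustrative assumptions, not anything from the video or from CPython's own set implementation; shrinking is left out to keep it short.

```python
class PowerOfTwoSet:
    """Hypothetical chained hash set: power-of-two capacity, raw hashes stored."""

    def __init__(self, capacity=8):
        # A power-of-two capacity makes (capacity - 1) an all-ones bit mask.
        if capacity <= 0 or capacity & (capacity - 1) != 0:
            raise ValueError("capacity must be a power of two")
        self._buckets = [[] for _ in range(capacity)]
        self._count = 0

    def add(self, key):
        raw = hash(key)  # the only call to the hashing function for this key
        # Index with a bitwise and instead of a modulo by a prime.
        bucket = self._buckets[raw & (len(self._buckets) - 1)]
        for stored_hash, stored_key in bucket:
            # Comparing the stored raw hashes first skips most __eq__ calls.
            if stored_hash == raw and stored_key == key:
                return False
        bucket.append((raw, key))  # keep the unnormalized hash with the key
        self._count += 1
        # Assumed growth policy: double once entries outnumber buckets.
        if self._count > len(self._buckets):
            self._resize(len(self._buckets) * 2)
        return True

    def __contains__(self, key):
        raw = hash(key)
        bucket = self._buckets[raw & (len(self._buckets) - 1)]
        return any(h == raw and k == key for h, k in bucket)

    def __len__(self):
        return self._count

    def _resize(self, new_capacity):
        # Rebuild from the stored raw hashes; hash() is never called here.
        mask = new_capacity - 1
        new_buckets = [[] for _ in range(new_capacity)]
        for bucket in self._buckets:
            for raw, key in bucket:
                new_buckets[raw & mask].append((raw, key))
        self._buckets = new_buckets
```

Because Python integers behave like two's complement under `&`, a negative hash still masks to a valid non-negative index, so no extra normalization step is needed in this sketch.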
Keys are unique in a hash table, and generally immutable. If you're thinking of a multiway hash table, you still only store one instance of each key. In that case, a key/value pair works best, with the value side being a balanced tree structure instead of the usual array.
@DanieleCapellini If you have a lot of collisions, then whether it's sorted or not will be the least of your problems. However, for a tiny array, say 16 or fewer elements, an insertion sort is the fastest way to sort it, and as an array it'll be faster to look up and eat less memory. Of course, if you don't resize your table adequately and collision chains back up to an insane degree, then you'd be better off with a tree data structure all around.
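For the tiny-array case, a rough Python sketch might look like the following. The helper names `insertion_sort` and `contains_sorted` are made up for illustration, and a plain list of comparable keys stands in for one collision chain.

```python
from bisect import bisect_left


def insertion_sort(items):
    """Sort a small list in place; cheap for roughly 16 or fewer elements."""
    for i in range(1, len(items)):
        current = items[i]
        j = i - 1
        # Shift larger elements one slot right to open a gap for `current`.
        while j >= 0 and items[j] > current:
            items[j + 1] = items[j]
            j -= 1
        items[j + 1] = current
    return items


def contains_sorted(items, target):
    """Binary-search a sorted list for `target`."""
    pos = bisect_left(items, target)
    return pos < len(items) and items[pos] == target


bucket = [42, 7, 19, 3, 25]  # hypothetical keys that collided in one slot
insertion_sort(bucket)
```

After sorting, `contains_sorted(bucket, 19)` finds the key with a binary search rather than a linear scan, although for chains this short the difference is usually small.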