Comments by "MrAbrazildo" (@MrAbrazildo) on "1 Billion Rows Challenge" video.
-
0:42, if I were handling this, I would start by rewriting the file into something much smaller. After all, 12 GB is too much. Let's say names average 8 letters and values go up to 99.9. Then a text line takes 8 + 1 (;) + 2 (two digits) + 1 (.) + 1 (one digit) + 1 (newline char) = 14 bytes, i.e. 14*8 = 112 bits. In a binary file, 4 bits are enough for the fractional digit, 7 for the integer part and ~11 for a number symbolizing the name, which would be looked up later in a separate table. So 4 + 7 + 11 = 22 bits, which fits in 3 bytes. Since 3 is ~21% of 14 bytes, those 12 GB would be reduced to 12 x 0.214 ~= 2.6 GB.
This would make any kind of search a lot faster.
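A minimal sketch of the packing described above, in Python. The function names (pack, unpack, encode_line), the field order and the big-endian byte order are assumptions made for illustration, not part of the original comment. Like the comment's layout, it has no sign bit, so it only covers readings from 0.0 to 99.9 and at most 2048 distinct names.

```python
from typing import Dict, Tuple

# Assumed layout (22 bits used of 24):
#   11-bit name id | 7-bit integer part | 4-bit fractional digit

def pack(name_id: int, whole: int, tenths: int) -> bytes:
    """Pack one reading into 3 bytes."""
    if not (0 <= name_id < 2048 and 0 <= whole < 128 and 0 <= tenths < 10):
        raise ValueError("field out of range for the 11/7/4-bit layout")
    value = (name_id << 11) | (whole << 4) | tenths
    return value.to_bytes(3, "big")

def unpack(record: bytes) -> Tuple[int, int, int]:
    """Inverse of pack: returns (name_id, whole, tenths)."""
    value = int.from_bytes(record, "big")
    return value >> 11, (value >> 4) & 0x7F, value & 0xF

def encode_line(line: str, names: Dict[str, int]) -> bytes:
    """Encode a 'name;dd.d' text line, assigning name ids on first sight."""
    name, reading = line.rstrip("\n").split(";")
    whole, tenths = reading.split(".")
    name_id = names.setdefault(name, len(names))  # the separate lookup table
    return pack(name_id, int(whole), int(tenths))
```

For example, encode_line("Hamburg;12.0\n", names) yields a 3-byte record and leaves "Hamburg" in the names table, which would have to be stored alongside the binary file to map ids back to names.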