Comments by "LoneTech" (@0LoneTech) on "Bug in Binary Search - Computerphile" video.

At the time, Java's demand for 32-bit processing was remarkable and often wasteful. Today, waste has been normalized to the point people get offended if you remark a 100GB patch is excessive, and Java has been awkwardly forced into smartcards.
24
That style of optimization exists, e.g. in gcc's -ffast-math, which enables some for floating point processing. However, the buggy code has undefined behaviour which the corrected code does not, so this erroneous change should not be produced by an optimization pass.
24
@B3Band You're assuming a widening operation is the only way to check for overflow. This is untrue; the patterns of operands that cause them are known and detectable, and there's often dedicated hardware to detect and flag overflows.
5
This is called hadd or rhadd (h=half, r=right) in OpenCL, and there's mix() for arbitrary sections of floating point values. It's an ancient issue; compare e.g. Forth's */ word, which conceptually does a widening multiply before a narrowing divide.
5
@chri-k There are a handful of others. Division is often around 40 cycles, random number instructions have been known to take thousands of cycles, cache misses are typically a few hundred, and bulk instructions like rep movsb or arbitrary size multiply are unlimited (but usually interruptible).
4
Several languages do have built in checking for this error; Zig is one of them. The overhead may be negligible; on some CPUs it's just a matter of enabling the overflow to cause an interrupt. This is the reason some ISAs distinguish signed or unsigned add.
4
Can you find any processor where the conversion to and from floating point, as well as the division you duplicated, are not each slower than the subtraction you replaced?
3
@ARKGAMING Then why suggest it? I could also note it's going to run into truncation errors far more easily than the original ran into overflows.
3
Firstly, the problem remains with unsigned integers; the incorrectly calculated index just might access defined memory and lead to more confusing misbehaviour, such as an infinite loop or incorrect answer. Secondly, it can be useful to apply an offset to an index, such as the (r-l)/2 value here, and in other algorithms it wouldn't be odd for such a step to be negative. Thirdly, Java doesn't know the number is an index until it's used to index, and the algorithm isn't array specific. There do exist languages which can restrict index types to match, like Ada or Clash. In Python negative indices index from the right.
3
@joshuascholar3220 Do you think arrays require a separate array of indices to exist? Where's the index array for those?
2
@felixmoore6781 Then it's more likely that you also have a narrow index type, and the search is likely to be so short that extra cycle is insignificant.
2
Nothing in the binary search algorithm requires your search space to be something physically stored in memory. Consider e.g. doing a binary search on a function to calculate a root.
2
@dealloc Note that saturation arithmetic, while one way to handle overflows, would not solve this bug but make it stranger.
2
@antoniogarest7516 That just shifts the over/underflow boundaries around, though. -100-100 isn't 56 either. Java was designed with a 32 bits is enough attitude, Python switches to arbitrary precision, and Zig allows you to specify (u160 and i3 are equally valid types). Ada is also quite happy to have a type going from 256 to 511.
2
Note that Python does switch to floating point with / (that's why // exists), and that drops from arbitrary precision to 53 bits (from IEEE 754 binary64). You get rounding errors that go outside the search range. >>> left=(1<<53)+1; right=left+1; mid=int(left/2+right/2); left,mid,right,left<=mid<=right (9007199254740993, 9007199254740992, 9007199254740994, False) Since this mid is too low, the check would move the left side to mid+1, which is right where it was and we have an infinite loop.
1
You can keep going with binary search, though many implementations only return the midpoint upon match, not the high and low range. Searching for any match, left or right edge are distinct goals, often expressed as looking for the first or last index a value could be inserted while maintaining order.
1
@lunarobinson2169 That's when the typescript compiler (with tsc --target) comes in handy to convert your scripts to older versions of ecmascript. The default output is the oldest supported (ES3), so should have wide compatibility anyhow. Numeric separators are rather new, from 12th edition, ECMAScript 2021. Side note, exponential format usually implies floating point. In Javascript that's no surprise because the spec expects it to be the only type of number, but in more strictly typed languages they must be converted to e.g. specify an array size.
1
@phiefer3 Could you elaborate on "something like Java that will sidestep the overflow issue"? Java appears to be one of the languages that specifically don't handle overflows. Compare e.g. Zig's @addWithOverflow, which exists because overflows normally throw an error, and in addition you can use widths wider than your CPU types. The Java spec states "The integer operators do not indicate overflow or underflow in any way." and "Despite the fact that overflow, underflow, or other loss of information may occur, a narrowing primitive conversion never results in a run-time exception."
1
@minilathemayhem The point is that a 16-bit scalar can't fit every Unicode (UCS) code point; currently 17 times larger than that. UTF-16 applies multi-word techniques, in contrast to UCS-2, and for those who picked 16 bits to fit "everything" that tactic failed. There's still a range of code points that are more compact in UTF-16 than UTF-8, but it comes at the cost of endianness issues. Either way, we should not assume one code point represents one character; that's wrong for e.g. modified emoji (colours, nations etc) or decomposed accents.
1
 @skeetskeet9403 While it's true integer (floor) division is not associative, what you showed here was it doesn't have the distributive property.
1
@Packetlust Accurate, but not terribly clear. What you have there is outputs from an array of shifted half-adders (xor is modulo 2 sum, and gives carry), with a carry in to a final full adder to correct truncation errors. Very unlikely to be any faster or smaller on a CPU, but possibly on a gate array.
1
XKCD 583.
1
CPUs often have more adder/subtractors than right shift units, all operating at the same speed. Left shift is a little more common as it's used in address indexing.
1