Comments by "Mikko Rantalainen" (@MikkoRantalainen) on "Internationalis(z)ing Code - Computerphile" video.

@terner1234 Yes, supporting Hebrew when you can already fully support Arabic is just a much better start than only supporting English. I think the hardest part is when you have multiple languages mixed together. In the worst case you could have the overall layout in Arabic, with a long quotation in English (meaning that the quotation must wrap over multiple lines in the middle of Arabic text) and some Japanese names with ruby text above them. And when you can successfully support all that, some joker comes by and messes with your user interface with zalgo text overflowing over all the content.
6
I think you should have included a sentence or two about text input. When you have a mixture of LTR and RTL input, your text caret can split into two to show where the next letter is going to go depending on its direction (a future left-to-right letter would go to one caret, a future right-to-left letter would go to the other caret). I'm pretty sure implementing that after the fact would be pretty hard indeed. And to make things even worse, many languages require an IME to enter the text (e.g. Traditional Chinese) where you have to render something after entering it partially. For more Latin-like scripts, combining characters are one example, too.
5
@felipevasconcelos6736 > “End” in “weekend” doesn’t mean “final section”, but “extremity”. Yeah, that sounds like an explanation that has been invented after the fact.
3
As in "Thi͡s is not a c͒ͪorrupted piece of te̿̔̉xt but ĵũśt a test of a UTF-8 string handling. It will be VȆ̴̟̟͙̞ͩ͌͝ ̅ͫ͏̙̤RY hard to҉ parse this to actual letters (or graphem̡e̶s) and probably should no͛ͫt be̠̅ tried by the web server. The only check requi̍̈́̂̈́red is that this string ̲͚̖͔̙î̩́s a valid UTF-8 encoded string and this string ̲͚̖͔̙î̩́s not lea͠ki̧n͘g HTML special chacters such as < because òtherwise an XS̨̥̫͎̭ͯ̿̔̀ͅSͮ̂҉̯͈͕̹̘̱ attack can be exécütèḑ." It seems that YouTube fails to render some of those letters with at least Chrome on Linux, YMMV.
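A minimal Python sketch of the check described there, assuming the text arrives at the server as raw bytes. The helper name `sanitize_comment` is made up for illustration. It does not try to split the text into graphemes; it only verifies that the bytes are valid UTF-8 and escapes the HTML special characters.

```python
import html


def sanitize_comment(raw: bytes) -> str:
    """Hypothetical helper: reject invalid UTF-8, escape HTML specials."""
    # errors="strict" raises UnicodeDecodeError on any invalid byte sequence.
    text = raw.decode("utf-8", errors="strict")
    # html.escape replaces &, < and > (and quotes by default) with entities;
    # combining marks and other non-ASCII characters pass through untouched.
    return html.escape(text)
```

For example, `sanitize_comment(b"1 < 2")` returns `"1 &lt; 2"`, while a byte string that is not valid UTF-8 raises `UnicodeDecodeError` and can be rejected by the caller.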
2
@Liggliluff And even in countries that group the digits of numbers below 10000, year numbers are an exception. Nobody wants to see "year 2 022". So when you're rendering a number, your software should know the language context of the number and the meaning of the number. And if it's about currency, some languages require rendering negative numbers (e.g. loan amount) differently from mathematical negative numbers. And we have this whole mess "due historical reasons".
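A rough Python sketch of what "knowing the meaning of the number" could look like, assuming a locale that groups digits in threes with a space. The function name `format_number` and the `kind` values are invented for this example, and currency handling is left out.

```python
NBSP = "\u00a0"  # NO-BREAK SPACE


def format_number(value: int, kind: str) -> str:
    """Hypothetical formatter: `kind` is "year" or "quantity"."""
    if kind == "year":
        # Years are never grouped, whatever the locale does elsewhere.
        return str(value)
    if kind == "quantity":
        # Group in threes with "," first, then swap in the real separator.
        return f"{value:,}".replace(",", NBSP)
    raise ValueError(f"unknown kind: {kind!r}")
```

The same integer 2022 comes out as `"2022"` for a year and as `"2"`, a no-break space, then `"022"` for a quantity, which is the whole point: the caller has to say what the number means.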
2
And it turns out that many Finnish users accidentally use a regular space in 1 000 000 but the correct character is the non-breaking space, which avoids getting the number split into two parts because of text wrapping. However, the jury is still out on whether the correct separator is a single non-breaking space or the codepoint sequence zero width joiner, regular space, zero width joiner. Both prevent wrapping in the middle of the number but have different meanings. And some math geeks think that using a full-width space looks bad and one should use 1‍ ‍000‍ ‍000 instead, where the space is replaced with a thin space (U+2009) which is ever so slightly narrower than a regular space. And with that you always have to use the zero width joiner (U+200D) to prevent word wrapping from breaking the number.
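A small Python sketch of the three separator variants mentioned above, plus a heuristic that repairs regular spaces typed between digit groups. The names `SEPARATORS` and `fix_grouping` are made up here, and the regular expression is only an assumption about what a typed group looks like; it will also match two unrelated numbers that happen to sit next to each other, such as "2022 100". How each variant actually wraps depends on the renderer.

```python
import re

NBSP = "\u00a0"        # NO-BREAK SPACE
ZWJ = "\u200d"         # ZERO WIDTH JOINER
THIN_SPACE = "\u2009"  # THIN SPACE

# The three candidate separators discussed in the comment.
SEPARATORS = {
    "nbsp": NBSP,
    "space_zwj": ZWJ + " " + ZWJ,
    "thin_zwj": ZWJ + THIN_SPACE + ZWJ,
}

# A regular space between a digit and a group of exactly three digits.
_GROUP_GAP = re.compile(r"(?<=\d) (?=\d{3}(?!\d))")


def fix_grouping(text: str, style: str = "nbsp") -> str:
    """Replace plain spaces inside grouped numbers with the chosen separator."""
    return _GROUP_GAP.sub(SEPARATORS[style], text)
```

Calling `fix_grouping("1 000 000")` swaps both spaces for U+00A0, and `style="thin_zwj"` produces the joiner, thin space, joiner sequence instead.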
1
Yeah, but unlike that video, there's no silver lining here. Timezones are easy when you just keep track of timezones everywhere and use the black box libraries that can handle all the details. The only hard part about timezones is getting non-developers to wrap their heads around the fact that a "date" is not a thing worldwide. When you have a date such as 2022-03-15 (ISO 8601 syntax) it starts and ends at different times around the globe. You cannot say that e.g. the deadline for a homework is 2022-03-15 because that would be 2022-03-15 plus or minus 12 hours. And if you're close to the switch between summertime and wintertime, make it plus or minus 13 hours. Plus maybe an extra hour if some country is also changing timezones that year. Any deadline or other exact time should always include date, time and timezone. And the timezone is important because when non-developers set a time, they may say that they want "2035-03-15 23:55 Europe/Helsinki" and that means the moment when clocks show that time in Helsinki after all future changes to timezones have already been implemented. As a result, you cannot store timezones as a time delta to UTC, no matter how many existing systems are already doing so.
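One way to sketch this in Python is to store the wall-clock time together with the IANA zone name and resolve it to UTC only when needed, using the standard `zoneinfo` module (Python 3.9+, with a tz database available on the system). The `Deadline` class is a made-up example, and it ignores ambiguous or skipped local times around clock changes.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


@dataclass(frozen=True)
class Deadline:
    """Hypothetical record: wall-clock time plus zone name, no UTC offset."""

    local: datetime  # naive wall-clock time exactly as the user entered it
    zone: str        # IANA zone name, e.g. "Europe/Helsinki"

    def as_utc(self) -> datetime:
        # The offset is looked up from the tz rules installed at call time,
        # so later rule changes are picked up without touching stored data.
        aware = self.local.replace(tzinfo=ZoneInfo(self.zone))
        return aware.astimezone(timezone.utc)


deadline = Deadline(datetime(2035, 3, 15, 23, 55), "Europe/Helsinki")
```

Calling `deadline.as_utc()` today gives the answer under the current rules; if the rules for Helsinki change before 2035, the same stored record resolves to a different UTC instant after a tz database update, which a stored UTC offset could not do.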
1
@NathanTAK I absolutely agree that a week ends with Saturday and Sunday. I have never understood how people in the USA call those days the "weekend", which is literally the end of the week, and still think that the next week starts between those days. ISO 8601 would be the obvious fix here but let's just forget the "T" and replace it with a space.
1
Here in Finland, nearly all the TV programs and movies are shown with original audio and subtitled in Finnish. This was historically done because of lower cost (subtitling is cheaper than dubbing) but when you're fluent with the technique, it's great for any content. For example, I actually watched the "Better Than Us" series on Netflix using the original Russian audio even though I don't understand Russian. The dubbed English just seemed off even though it was technically done about as well as dubbing is possible to do. The only time I watched any dubbed content was when our children were too young to cope with subtitled content. Now that they have learned to read fluently, they also prefer original audio.
1
@SeralyneYT UTC used to be rebranded GMT but then England decided that they wanted summertime and GMT didn't follow UTC for some years. As a result, if you have a GMT time, it may or may not match the UTC time depending on the timestamp you got. And then we have TAI (International Atomic Time) which is the same as UTC without the leap seconds. Currently those differ by 37 seconds.
1
The interface to such a library needs lots of data, though. For example, to compare two strings you need to know the collation for the context in case the comparison should be made case-insensitive. And you need the gender of the subject in case you're trying to combine names with full sentences like Tom explained.
1
We use internal labels such as "Save[button]" and "Save[menu option]" because the same English word may require a different translation depending on context. If you use the gettext library, you have (in theory) support for such context information without putting it into the translatable string, but I've found such support to be so unstable in many translation tools that it's better to use extra tags in the identifiers used in the source code.
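A tiny Python sketch of the tagged-identifier approach, assuming a plain dictionary as the catalog. The catalog contents, the "Open[...]" labels and the `translate` helper are invented for illustration (Finnish "Avaa" for the imperative verb, "Avoin" for the adjective). For comparison, Python's own `gettext` module exposes the built-in context mechanism as `pgettext(context, message)`.

```python
import re

# Hypothetical Finnish catalog keyed by the tagged identifier.
CATALOG_FI = {
    "Open[button]": "Avaa",   # imperative verb
    "Open[status]": "Avoin",  # adjective
}

# A trailing "[...]" context tag on a label.
_TAG = re.compile(r"\[[^\]]*\]$")


def translate(label: str, catalog: dict[str, str]) -> str:
    """Look up a tagged label; fall back to the English text without the tag."""
    if label in catalog:
        return catalog[label]
    # Untranslated labels still render sensibly: strip the context tag.
    return _TAG.sub("", label)
```

With this, `translate("Open[button]", CATALOG_FI)` gives `"Avaa"`, and a label missing from the catalog such as `"Save[menu option]"` falls back to plain `"Save"`. The tag travels with the identifier, so the translation tool never has to understand a separate context field.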
1
Make that non-breaking space and don't use it if the number is a year, though.
1