Comments by "OpenGL4ever" (@OpenGL4ever) on "Creel" channel.

@AlyxSharkBite-2000 As a small update, since Zen4 AMD supports it.
3
Unfortunately, consumers in particular will not benefit from this as long as most CPUs do not support AVX512, even if they themselves have an AVX512-capable CPU. Computer game manufacturers in particular will only support AVX512 when the market for it is big enough. Until then, consumers and gamers will have to be content with AVX2, which is likely to remain the lowest common denominator over the next few years. Of course there are exceptions. As a programmer, I don't have to buy a server CPU to use AVX512 if I can buy a consumer processor from AMD with AVX512 support. And performance-hungry end-user programs, where the effort for extra binaries or plugins is worthwhile, will probably also be able to offer AVX512 support.
2
@glenwaldrop8166 No, it's the complete capability to run 8086 real mode in hardware that Intel wants to drop. The POST is only one of the given reasons, because that is the point at which modern computers still have to run in real mode. Only the support of 32-bit protected mode for applications in Ring 3 is to remain. This also means that you will no longer be able to run 32-bit protected mode operating systems in hardware. If Intel implements this, you can no longer do so even within a virtual machine; it then has to be emulated in software. There is a paper from Intel called "Envisioning a Simplified Intel Architecture" that shows this clearly in a table.
2
@WhatsACreel Intel is planning to get rid of 16-bit real mode support and 32-bit protected mode support in Ring 0 to simplify newer x86-64 CPUs.
2
The next CPU I buy will definitely have AVX512 support.
1
I agree and we already have several MiB as 2nd Level Cache.
1
This probably has a very simple reason: there are currently hardly any end-user applications that support AVX512. But for compiler writers and developers it's good that they can buy CPUs that can do AVX512, so that they can adapt their software and compilers to it. That is very likely why the space on the silicon chip was saved. As soon as software supports AVX512 better, there will also be hardware with more AVX512 execution units per core, so that the additional performance compared to AVX2 can actually be used. It's basically a chicken-and-egg problem. But why waste space on the chicken if the egg hasn't even been laid yet?
1
The compiler will support it and produce code that uses it. As more CPUs support AVX512, there will be more applications using it.
1
If you need a playground: many open source audio and video codecs are already optimized for the x86 and ARM architectures, but this is not yet the case for the RISC-V architecture. So you could buy a single board computer (SBC) with a RISC-V CPU and then see what could be optimized there. You would need to learn RISC-V assembly, though.
1
Today all x86 since the Pentium Pro are RISC chips. Other x86 manufacturers like NexGen went to RISC even earlier. If you want a comparison between ARM64 and x86-64 you can take a look at Apple's M1.
1
@martineyles They use micro-ops, and the underlying hardware is definitely RISC and not CISC.
1
@martineyles It is widely accepted and known that today's x86 processors are considered RISC.
1
You can emulate AVX512 in bochs and run bochs on your ARM processor. It will be very slow, but for learning it should be good enough.
1
@Illya9999 Well, then you're out of luck. As far as I know, on ARM a SIMD unit is only available from a later generation onwards, the ARMv6.
1
@lohphat As AVX-512 becomes more widespread, programmers can be more confident that AVX-512 will be supported. And then at some point AVX-512 will be chosen as the compile target, as the lowest common denominator. Today this is the case with SSE2, for example, because every 64-bit x86 CPU can handle SSE2. What you can also do is simply create several different binaries, with AVX2, with AVX-512 and without AVX support, and have a small launcher program run first when the user starts the application. This launcher checks which features the CPU supports, and only then is the corresponding binary started. This was done, for example, for the 2004 computer game "The Chronicles of Riddick: Escape from Butcher Bay". AVX didn't exist at the time, but MMX, SSE, SSE2 and 3DNow! did, so the binary matching the CPU's capabilities was started. Of course, this is only a small effort if the code is written in a high-level language and the compiler is advanced enough to optimize for the corresponding SIMD units. Otherwise the code, or at least parts of it, would have to be written manually for each SIMD unit type, and that takes a lot of time and therefore money.
1
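A minimal sketch of the launcher idea described in the comment above, assuming GCC or Clang on x86 (the `__builtin_cpu_supports` builtin is specific to those compilers and that architecture). The binary names `app_avx512`, `app_avx2` and `app_baseline` are hypothetical; this is not how the game mentioned above actually did it.

```c
/* Pick the name of the binary that matches the CPU's SIMD capabilities.
 * The three names are hypothetical placeholders for illustration. */
static const char *pick_binary(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();                    /* initialize the CPU feature data */
    if (__builtin_cpu_supports("avx512f"))   /* AVX-512 Foundation */
        return "app_avx512";
    if (__builtin_cpu_supports("avx2"))
        return "app_avx2";
#endif
    /* Other compilers or architectures: fall back to the build without AVX. */
    return "app_baseline";
}
```

A real launcher would then start the chosen file, for example with an exec-style call on POSIX systems.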
AVX2 support started about 10 years ago with Haswell. I don't know about AMD. As soon as Windows 10 is no longer supported with updates, you can rely on AVX2 as a common denominator, because Windows 11 does not officially support most of the old CPUs affected by SPECTRE and MELTDOWN. So if you want to use Windows 11 with official CPU support, you have to upgrade your CPU as a customer anyway. And all new CPUs support at least AVX2.
1
Calling conventions are compiler specific and that's what the compiler expects when an extern function is used. Many different compilers adhere to a specific calling convention and as a developer there isn't much you can do about it because then you would have to change the compiler.
1
@gideonmaxmerling204 You could test it in BOCHS emulator. BOCHS does emulate AVX512 in software.
1
@WhatsACreel The reason is that for AVX2 they use two AVX2 units per core in a superscalar way, and with AVX512 these two units just work together as one. So in the end, the result is the same. But if you ask me, this is not important at the moment, because compilers have to adapt first anyway.
1
A 486DX is only two times as fast as a 386DX at the same clock speed. Later, the real improvements were mainly achieved with an increase in clock speed and the use of pipelining and SIMD instructions.
1
Intel burned their fingers with VLIW on Itanium 1 and 2. VLIW was therefore a dead end, just like the optimization to super high clock rates with the Pentium 4.
1
I have never used the AES instructions, but I assume that using some of them requires trusting them. Thus I would not use them.
1
I love that line. And the background to that is, if you can do that, you don't need to write a virus. You will also find a well-paid job without having to drift into the criminal corner to make a lot of money.
1
inc eax
mov ebx, eax
Does the same job as your code and requires less RAM.
1
@y2ksw1 Why should it? In my opinion it runs at the same speed. Your code might do "mov ebx, eax" and "inc eax" each in its own pipeline, but the nop does nothing and "inc ebx" depends on the "mov ebx, eax" before it.
1
@y2ksw1 You said: "it stalls and waits to settle just that tiny bit which doesn't allow to move the code to the other pipeline." But doesn't that also apply to your "mov ebx, eax", and if not, why not? I assume we can agree that your "inc ebx" mustn't be executed before "mov ebx, eax" is finished. From my understanding your code may use both execution units, but I don't see why your code shouldn't stall too. Or do you mean a mov is faster than an inc?
1
@y2ksw1 "mov and nop can execute together, and inc on two different registers, too. If I would try to use mov and inc together, mov had to wait for inc to finish."
That wasn't my point. My point was that your inc still has to wait for the mov. Let's assume we have a CPU with two execution units, A and B, that is thus able to do 2 instructions per step.
Your version:
Step 1: A does "mov ebx, eax", B does "nop"
Step 2: A does "inc eax", B does "inc ebx" ; here B has to wait until A in step 1 is finished.
And now my version:
Step 1: A does "inc eax", B is unused or does a nop
Step 2: A does "mov ebx, eax", B is unused or does a nop ; here A has to wait until A in step 1 is finished.
So in all variants it's 2 steps. Where is your performance gain?
As far as I know it's the other way around: the U-pipe can execute any instruction in the Intel architecture, while the V-pipe can execute only simple instructions. Source: the book "Computerarchitektur" by Andrew S. Tanenbaum and James Goodman, ISBN 3-8273-7016-7 (it's the German edition, which is why "Computer Architecture" is written differently).
But I agree that you should use the correct pipe for the appropriate instruction and order the instructions accordingly if the pipes are different. I also agree with the rest of what you said, and it might explain why your code is faster, but in that case only because, by luck, the right unit was fed with the right instructions. What happens if you put a NOP into the instruction chain of my code to explicitly make sure that the execution units are fed with instructions they can handle?
For if statements, it is best to use branchless programming techniques on modern CPUs whenever possible. High-level language compilers will do this automatically and optimize the code for it in most cases, but not in all cases. But I assume you know that already.
1
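The step counting in the comment above can be reproduced with a toy model. This is only a sketch under stated assumptions: an in-order CPU that issues at most two adjacent instructions per step, and that refuses to pair them if the second reads or writes a register the first one writes. The type `Insn`, the function `count_steps` and the arrays `order_a` and `order_b` are names made up for this illustration; the model ignores real pairing rules such as the U-pipe/V-pipe restrictions.

```c
#include <stddef.h>

enum { EAX = 1, EBX = 2 };                 /* register bit masks */

typedef struct {
    const char *text;                      /* mnemonic, for readability only */
    int reads;                             /* registers read */
    int writes;                            /* registers written */
} Insn;

/* Variant with the nop: mov ebx,eax / nop / inc eax / inc ebx */
static const Insn order_a[] = {
    { "mov ebx, eax", EAX, EBX },
    { "nop",          0,   0   },
    { "inc eax",      EAX, EAX },
    { "inc ebx",      EBX, EBX },
};

/* Shorter variant: inc eax / mov ebx,eax */
static const Insn order_b[] = {
    { "inc eax",      EAX, EAX },
    { "mov ebx, eax", EAX, EBX },
};

/* Count issue steps: two adjacent instructions share a step only if the
 * second has no read-after-write or write-after-write dependency on the first. */
static int count_steps(const Insn *code, size_t n)
{
    int steps = 0;
    size_t i = 0;
    while (i < n) {
        steps++;
        if (i + 1 < n) {
            int raw = code[i].writes & code[i + 1].reads;
            int waw = code[i].writes & code[i + 1].writes;
            if (!raw && !waw) {            /* independent: issue both */
                i += 2;
                continue;
            }
        }
        i += 1;                            /* dependent or last: issue one */
    }
    return steps;
}
```

Under these assumptions both orderings need two steps, which matches the argument in the comment; a measured difference on real hardware would have to come from effects this model leaves out.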
@y2ksw1 On what kind of CPU did you test it?
1
@y2ksw1 Thank you for your answer. Did you also use the MMX SIMD?
1
@y2ksw1 Thank you for your information. And did you also work with AMD processors and its 3DNow!?
1
@y2ksw1 Thank you for your reply. I have one additional question. You said the last CPU you worked with was the 686, which should be the Pentium Pro. The Pentium Pro introduced out-of-order execution. On a Pentium Pro with out-of-order execution, did it still make any sense to order the instructions manually as you showed above?
1
@Creel Could you make a comparison video between AMD's 3DNow! and Intel's SSE (1st SSE Version)? Which one was better? What were the pros and cons of each? Which was easier to use from a programming point of view, more powerful, more versatile etc.?
1
@glenwaldrop8166 Intel is planning to drop 8086 real mode support.
1
@glenwaldrop8166 You're welcome.
1
Same here
1
@WhatsACreel For those who want to know the generation, it's generation 10 and an Ice Lake.
1
1. Doing work on a dedicated GPU comes with a penalty. The bus is slow, thus data exchange between GPU, CPU and RAM is slow and latency is high. In games, for example, physics effects that should affect the gameplay, and not only be eye candy, are better done on the CPU.
2. General code does not benefit from dedicated units like a GPU. If code is to use a dedicated unit, it must always be written specifically for that unit. It's different with AVX-512: here it is enough to recompile the general code with an optimizing compiler, and it can already benefit from AVX-512 wherever AVX-512 can be used profitably.
1
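As a sketch of what "general code" means in the second point above: the following loop contains no intrinsics and nothing AVX-512-specific. The function name `scale_add` is made up for this illustration. Compilers such as GCC or Clang may vectorize a loop like this when optimization is enabled (for example `-O3`) and the target allows it (for example `-mavx512f`); whether they actually do so depends on the compiler version and its cost model.

```c
#include <stddef.h>

/* y[i] = a * x[i] + y[i] for all i.
 * "restrict" promises the compiler that x and y do not overlap,
 * which makes automatic vectorization easier. */
void scale_add(float a, const float *restrict x, float *restrict y, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```

The same source can be compiled once per target (baseline, AVX2, AVX-512) without changing a line of it, which is the contrast to GPU code drawn in the comment.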
You could use the preprocessor for this. It's available in C and C++.
1
@catchnkill I strongly disagree. I hope AVX-512 will become mainstream in every x86-64 CPU. Reason: a highly optimizing compiler will use it when it's available, even in cases where a normal programmer would never think of using the SIMD unit. Special accelerator chips/cards are expensive, usually have a small market share, access to them is slow due to the bus, and programs have to be written specially for them. Because of the latter, compilers cannot use them for normal programs written for the CPU. Normal program code does not benefit at all from such external accelerator chips. With AVX512 it's different because of compiler optimization. The only prerequisite is wide adoption of AVX512 in all x86-64 CPUs.
1
@catchnkill Apple was only faster for a very short time. Current x86-64 CPUs have long been faster than Apple CPUs again, and this applies to both AMD and Intel CPUs. There's nothing wrong with that. It takes time to adapt the compilers to AVX-512. It also took a long time until x86-64 was properly supported and the extended registers of the 64-bit long mode could be used. Compilers are not created overnight, they need time to evolve. No, AVX-512 doesn't use energy if it is not used. Energy-saving features shut down the units that are not needed, along with their transistors. I disagree, more cores will not improve single-thread performance. But AVX-512 does when it's in use. As soon as AVX-512 becomes more widespread, i.e. there are more CPUs with this feature among end users, applications will use AVX-512 in the same way that SSE2 is used for many things today. And today's compilers use SSE2 for things where a human programmer working directly in assembler would never think of using SSE2, that is, for ordinary tasks. This only affects certain CPU models. They probably made a mistake here and there, but future CPU generations won't have that problem.
1
@catchnkill This YT spamfilter is shadowbanning again, so I have to split my comment from just now into several parts to find the trigger word. Part 1 of 6 Apple was only faster for a very short time. Current x86-64 CPUs have long been faster than Apple CPUs again and this applies to both AMD and Intel CPUs.
1
@catchnkill Part 2 of 6 There's nothing wrong with that. It takes time to adapt the compilers to AVX-512. It also took a long time until x86-64 was properly supported and the extended registers of the 64-bit long mode could be used. Compilers are not created overnight, they need time to evolve.
1
@catchnkill Part 3 of 6 No, AVX-512 doesn't use energy if it is not used. Energy-saving features shut down the units that are not needed, along with their transistors.
1
@catchnkill Part 4 of 6 I disagree, more cores will not improve single-thread performance. But AVX-512 does when it's in use.
1
@catchnkill Part 5 of 6 As soon as AVX-512 becomes more widespread, i.e. there are more CPUs with this feature among end users, applications will use AVX-512 in the same way that SSE2 is used for many things today. And today's compilers use SSE2 for things where a human programmer working directly in assembler would never think of using SSE2, that is, for ordinary tasks.
1
@catchnkill Part 6 of 6 This only affects certain CPU models. They probably made a mistake here and there, but future CPU generations won't have that problem.
1
@catchnkill BTW, if you search for "AVX 512 vs. off" comparison videos here on YT, you will see that AVX-512 improves the frame rate in games. This is a clear sign that AVX-512 increases single-thread performance, like I said. You can't do that with more cores. And it's already well known that games don't scale well with more cores.
1
You've already made an assumption here, using a specific compiler. On the other hand, if you use a compiler that is optimized for the use of fast calls and 68k, then it can look different.
1
@Creel 20:24 Did you notice that it rounded myFloats[0] = 1.5 to 2, but myFloats[4] = 0.5 to 0? I would consider that strange. If a value is x.5, I would always expect it to be rounded upwards.
1
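The behaviour noticed in the comment above is consistent with round-half-to-even, the default rounding mode of IEEE 754 and of the SSE/AVX float-to-integer conversions under the default MXCSR setting: ties go to the nearest even integer, so 1.5 becomes 2 and 0.5 becomes 0. Whether the video used exactly that conversion is an assumption here. The sketch below spells the rule out by hand; `round_half_even` is a name made up for this illustration, and it assumes the input fits into a `long`.

```c
/* Round to nearest integer; exact ties (x.5) go to the even neighbour.
 * Assumes x is within the range of long. */
static long round_half_even(double x)
{
    long t = (long)x;               /* truncates toward zero */
    double frac = x - (double)t;    /* remainder, strictly between -1 and 1 */
    int t_is_odd = (t % 2 != 0);

    if (frac > 0.5 || (frac == 0.5 && t_is_odd))
        return t + 1;               /* round up (away from zero) */
    if (frac < -0.5 || (frac == -0.5 && t_is_odd))
        return t - 1;               /* round down (away from zero) */
    return t;                       /* below one half, or tie with even t */
}
```

Rounding every tie upwards, as the comment expects, would introduce a small upward bias when many values are summed; sending ties to the even neighbour avoids that bias on average.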
There is a Wikipedia article about calling conventions called "x86 calling conventions". It should give a nice overview.
1