Comments by "Siana Gearz" (@SianaGearz) on "Simulating the World To Train AI" video.
@weksauce Image recognition tied to spatial reasoning is a very hard task, which by all accounts requires a large neural network. The network needs to generalise from the data set. When the data set is too small for the size of the network, the network tends to memorise it verbatim instead, or to latch onto secondary, coincidental traits.
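A toy illustration of the memorisation point, as a rough sketch only: a lookup table stands in for a network with far more capacity than data. The 16-bit "images", the labelling rule, and all names here are made up for the example.

```python
import random

def make_data(n, rng):
    # Each sample is a random 16-bit "image"; the true label depends only on bit 0.
    data = []
    for _ in range(n):
        x = tuple(rng.randint(0, 1) for _ in range(16))
        data.append((x, x[0]))
    return data

class Memoriser:
    """Stores the training set verbatim and guesses 0 for anything unseen."""
    def fit(self, data):
        self.table = dict(data)

    def predict(self, x):
        return self.table.get(x, 0)

def accuracy(model, data):
    return sum(model.predict(x) == y for x, y in data) / len(data)

rng = random.Random(0)
train = make_data(20, rng)     # far too few samples to cover 2**16 inputs
test = make_data(2000, rng)

model = Memoriser()
model.fit(train)
train_acc = accuracy(model, train)  # perfect: every training sample is in the table
test_acc = accuracy(model, test)    # near chance: the simple rule was never learned
print(train_acc, test_acc)
```

The rule behind the labels is trivial, yet the memoriser never finds it, because nothing forces it to.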
The problem is that we don't have a good understanding of, or manual access to, the inner workings of neural networks. As for ad hoc, hardcoded solutions, they tend to be very difficult to integrate, except specifically through synthetic data. But I would agree a handcrafted solution should still be present as a failsafe running alongside the neural network subsystems, precisely because we have such a mediocre grasp of them. The two systems should also ideally mostly agree rather than fight, as disagreement between them is a source of danger as well.
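A minimal sketch of that failsafe arrangement, assuming both subsystems emit a single numeric control value. The names `arbitrate` and `max_gap` and the threshold of 5.0 are hypothetical, not taken from any real system.

```python
from dataclasses import dataclass

@dataclass
class Decision:
    value: float   # hypothetical control output that is actually used
    source: str    # which subsystem produced it

def arbitrate(nn_value: float, rule_value: float, max_gap: float = 5.0) -> Decision:
    """Use the network's output while it stays close to the handcrafted one;
    otherwise fall back to the handcrafted value and mark the disagreement."""
    if abs(nn_value - rule_value) <= max_gap:
        return Decision(nn_value, "network")
    return Decision(rule_value, "failsafe")

agree = arbitrate(2.0, 3.0)      # small gap: the network's value is used
disagree = arbitrate(30.0, 3.0)  # large gap: the handcrafted value takes over
```

Counting how often the `"failsafe"` branch fires would give a rough measure of how much the two systems fight.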
Synthetic data also helps solve the chicken-and-egg problem. Yes, ideally you'd have more real-world data, and above all more diverse real-world data with better corner-case coverage. But the raw data we can get easily isn't marked up. You can train the markup model on synthetic data, and human supervision and correction of its output is orders of magnitude cheaper than manual markup from the ground up. Then you can gradually increase the amount of real-world data used in training.
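The last step, gradually shifting the training mix towards real-world data, could look roughly like this. It is a sketch under assumptions: the linear ramp, the 0.8 ceiling, and the function names are illustrative choices, and the pre-labelling and human correction steps are taken as already done.

```python
import random

def real_fraction(round_idx, total_rounds, start=0.0, end=0.8):
    """Linearly ramp the share of real-world samples from start to end."""
    if total_rounds <= 1:
        return end
    t = round_idx / (total_rounds - 1)
    return start + t * (end - start)

def mix_batch(synthetic, real, fraction_real, size, rng):
    """Draw a batch with roughly the requested share of corrected real samples."""
    n_real = min(round(size * fraction_real), len(real))
    batch = rng.sample(real, n_real) + rng.choices(synthetic, k=size - n_real)
    rng.shuffle(batch)
    return batch

rng = random.Random(1)
synthetic = ["synthetic"] * 100  # auto-labelled simulator output
real = ["real"] * 50             # real samples after human correction
batch = mix_batch(synthetic, real, 0.3, 10, rng)
```

Early rounds then train almost entirely on synthetic samples, and later rounds lean on the corrected real ones as more of them become available.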