I recently watched a video of Geoffrey Hinton in which he explained why this happens.

In 2019, when I first learned about neural networks, I tested their robustness by adding random label noise. To my surprise, if the dataset is big enough, a neural net can still reach 95% accuracy even when 80% of the labels are random. [doc]
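As a back-of-the-envelope check on why this is even possible: with uniformly random corruption, the true class still receives a plurality of the noisy votes, so the signal survives. Here is a minimal sketch, assuming 10 classes and an 80% corruption rate (the setup is illustrative, not my original 2019 experiment):

```python
import numpy as np

rng = np.random.default_rng(0)
n_classes, noise_rate = 10, 0.8

# Many draws of a single true class (class 0), each corrupted independently.
true = np.zeros(1_000_000, dtype=int)
flip = rng.random(true.size) < noise_rate
noisy = np.where(flip, rng.integers(0, n_classes, true.size), true)

# Class 0 keeps ~28% of the votes vs ~8% for each wrong class,
# so the true label is still the plurality and remains learnable.
print(np.bincount(noisy, minlength=n_classes) / true.size)
```

At 80% noise over 10 classes, the true class keeps about 28% of the labels versus roughly 8% for each wrong class, which is why a network with enough data can average the noise away.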


In the video, Geoffrey points out that “the rule of thumb is basically what counts is the mutual information between the assigned label and the truth. That tells how valuable your training example is.”

How many noisy labels can we tolerate, and how does the size of the dataset come into it?

Geoffrey asks: if each label carries only 1/50 of the mutual information, but we have 50 times as many examples, do we get the same performance? Yes, we do, although by his estimate the training set needs to be about 2 × 50 times as large.
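To put a number on how much mutual information a noisy label actually carries, here is a small sketch that computes it, assuming corrupted labels are drawn uniformly over all classes (the function name and setup are mine, not from the video):

```python
import math

def label_mutual_info(p_keep: float, n_classes: int) -> float:
    """Bits of mutual information between the true label and a noisy label,
    assuming corrupted labels are drawn uniformly over all classes."""
    c = n_classes
    q = p_keep + (1 - p_keep) / c        # P(noisy label == true label)
    r = (1 - p_keep) / c                 # P(noisy label == a specific wrong class)
    h_cond = 0.0 if r == 0 else -q * math.log2(q) - (c - 1) * r * math.log2(r)
    return math.log2(c) - h_cond         # H(Y) - H(Y|X), labels uniform

print(label_mutual_info(1.0, 10))  # ≈ 3.32 bits per clean label
print(label_mutual_info(0.2, 10))  # ≈ 0.18 bits per label at 80% noise
```

Under these assumptions, at 80% noise over 10 classes each label carries about 0.18 of the original 3.32 bits, roughly 1/18 of the information, so by Geoffrey's rule of thumb you would need on the order of 18 times (or, per his estimate, up to 2 × 18 times) as much data.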

Let me run a test on our own dataset; “garbage in, garbage out” (GIGO) may not hold for neural networks.
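Here is a sketch of such a test, using a synthetic dataset as a stand-in for ours and scikit-learn's MLPClassifier; the sizes, noise rate, and architecture are arbitrary choices, not a definitive setup:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

def corrupt(y, noise_rate, n_classes):
    # Replace a fraction of labels with uniformly random classes.
    y = y.copy()
    flip = rng.random(len(y)) < noise_rate
    y[flip] = rng.integers(0, n_classes, flip.sum())
    return y

X, y = make_classification(n_samples=60_000, n_features=20,
                           n_informative=15, n_classes=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

for n in (2_000, 10_000, 48_000):   # growing training-set sizes, fixed 80% noise
    y_noisy = corrupt(y_train[:n], noise_rate=0.8, n_classes=10)
    clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=200,
                        random_state=0).fit(X_train[:n], y_noisy)
    # Evaluate against CLEAN test labels: does more data offset noisy labels?
    print(n, clf.score(X_test, y_test))
```

If the rule of thumb holds, accuracy on the clean test set should climb back toward the clean-label baseline as the noisy training set grows.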

In the future, I would like to find research papers that explain this phenomenon.