Here is the principle behind tests of statistical significance.
There are two dice. One is given to you, one to me. We each roll just a handful of times and check our own average: somewhere from 1 to 6. With such a small number of rolls, your average and my average could be quite far apart. Maybe 3 vs. 5. Maybe 2 vs. 4.5. Even so, we’d probably trust that the dice were both the same.
Now, suppose we each rolled thousands of times. Chance works according to known rules: over thousands of rolls, random differences tend to even out. If the dice are the same, your average and mine should be very, very close together. Maybe 3.51 vs. 3.48.
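To see that smoothing-out for yourself, here is a small simulation sketch. It assumes both dice are fair and six-sided, and the names `average_roll` and `typical_gap` are made up for this illustration. It repeats the "you roll, I roll" game many times and reports how far apart our two averages typically land.

```python
import random

def average_roll(n_rolls, rng):
    """Average of n_rolls throws of a fair six-sided die (assumed fair)."""
    return sum(rng.randint(1, 6) for _ in range(n_rolls)) / n_rolls

def typical_gap(n_rolls, n_trials=100, seed=0):
    """Mean absolute gap between your average and mine over n_trials repeats."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    gaps = [
        abs(average_roll(n_rolls, rng) - average_roll(n_rolls, rng))
        for _ in range(n_trials)
    ]
    return sum(gaps) / n_trials

small_gap = typical_gap(5)      # a handful of rolls each
large_gap = typical_gap(2000)   # thousands of rolls each

print(f"typical gap with 5 rolls each:    {small_gap:.2f}")
print(f"typical gap with 2000 rolls each: {large_gap:.2f}")
```

With only 5 rolls each, the typical gap should come out at a large fraction of a point. With 2000 rolls each, it should shrink to a few hundredths.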
If they are not very close, most anyone observing would conclude: “Something else besides chance must have been inserted into the process. The dice must not be the same; this isn’t the sort of difference chance alone would produce.” Thus they would call the difference “statistically significant.”
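One way to put that reasoning into code is a permutation test. This is offered only as a sketch of the idea, not as the one method the text has in mind. The function name `permutation_p_value`, the loaded-die weights, and the sample sizes are all assumptions chosen for illustration. The logic: if the dice really were the same, it would not matter which rolls were labelled "yours" and which "mine", so we shuffle the labels many times and see how often chance alone produces a gap at least as big as the one observed.

```python
import random

def mean(values):
    return sum(values) / len(values)

def permutation_p_value(rolls_a, rolls_b, n_shuffles=300, seed=0):
    """Share of label-shuffles whose gap is at least as large as the observed gap."""
    rng = random.Random(seed)
    observed = abs(mean(rolls_a) - mean(rolls_b))
    pooled = list(rolls_a) + list(rolls_b)
    n_a = len(rolls_a)
    at_least_as_big = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)  # pretend the labels "yours"/"mine" are arbitrary
        gap = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if gap >= observed:
            at_least_as_big += 1
    # The +1 counts the observed arrangement itself, so the result is never exactly 0.
    return (at_least_as_big + 1) / (n_shuffles + 1)

rng = random.Random(1)
faces = [1, 2, 3, 4, 5, 6]
fair_rolls = rng.choices(faces, k=1000)
# Hypothetical loaded die: a six is three times as likely as any other face.
loaded_rolls = rng.choices(faces, weights=[1, 1, 1, 1, 1, 3], k=1000)

p_value = permutation_p_value(fair_rolls, loaded_rolls)
print(f"observed averages: {mean(fair_rolls):.2f} vs. {mean(loaded_rolls):.2f}")
print(f"share of shuffles with a gap this large: {p_value:.4f}")
```

A small share means chance alone rarely produces a gap this big, which is what "statistically significant" is meant to capture. Where to draw the line (0.05 is a common convention) is a choice made by the analyst, not something the dice decide.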