I drew no conclusion at all; I only gave the measured performance differences. There's a difference between reporting calculations and coming up with an explanation based on some analysis.
Comparing less than 2 days' worth of data against ~20 days' worth and drawing a conclusion doesn't sound realistic. The disparity in sample size is pretty significant, so it doesn't matter what approach is taken to compile the data IF the goal is to reach a conclusion. Also, so many factors can affect the results of a synthetic benchmark test that using these numbers alone to make such a determination is kind of odd.
I don't understand this reasoning. It's not like the benchmark's expected value and standard deviation will change if we wait a few more days. We already have a good sample size.
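A rough sketch of that point, using made-up scores rather than the actual benchmark data: the sample counts (200 and 2000), the mean of 1000 and the spread of 50 are all assumptions chosen only for illustration. It estimates the mean and its standard error for a small and a large sample, then compares them with Welch's t statistic, which doesn't require equal sample sizes.

```python
import math
import random
import statistics


def standard_error(samples):
    """Standard error of the mean: sample stdev divided by sqrt(n)."""
    return statistics.stdev(samples) / math.sqrt(len(samples))


def welch_t(a, b):
    """Welch's t statistic for two samples with unequal sizes and variances."""
    var_a = statistics.variance(a) / len(a)
    var_b = statistics.variance(b) / len(b)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(var_a + var_b)


# Hypothetical scores only: both groups are drawn from the same distribution.
rng = random.Random(0)
small = [rng.gauss(1000, 50) for _ in range(200)]   # stand-in for ~2 days of runs
large = [rng.gauss(1000, 50) for _ in range(2000)]  # stand-in for ~20 days of runs

for name, data in (("small", small), ("large", large)):
    print(name, "mean:", round(statistics.mean(data), 1),
          "std error:", round(standard_error(data), 2))

print("Welch t:", round(welch_t(small, large), 2))
```

The smaller sample gives a wider error bar on the mean, but the estimate itself doesn't drift just because more days go by. Whether a few hundred runs is "enough" depends on how small a difference you're trying to detect.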