(emphasis mine)
That's exactly what wunderbar was saying; everyone's usage is different, and thus screen-on time is a poor metric. A range of 3-6 hrs SoT is huge, and it means little without knowing exactly what each person was doing on the phone and which apps they were running every minute of their usage.
Then in that case there is no point in doing tests, because no two people will use their devices the same way. A streaming test won't be accurate, because the signal strength might not be the same from one person to another. Because of the near-infinite number of factors (personal usage, core phone settings, installed apps and their individual settings, environmental factors like signal strength and atmospheric conditions, and so on), there is no way to get a measurement that is 100% accurate for everyone. It's just not possible.
But you have to start somewhere. I've had people tell me that tests like those done on PhoneArena aren't valid because they aren't as scientific as those on a site like AnandTech. While that may be true, they do give you a starting point. A rundown test may not tell you exactly how long your battery will last, but it does give you an idea of how one device's battery life compares to another's. Otherwise, no test they could run would be relevant, because everyone uses their devices differently.
The same applies to gas mileage in cars. Every person drives differently, so the tests that say a given model will get X miles per gallon may not match what you'll see, but they give you a rough idea. The only way you'll truly know how much mileage you'll get is by driving it yourself, and the same goes for a smartphone's battery life. But you have to start somewhere.