I can understand the reasoning for video, where any spatial separation might have a context, but for music? Any pronounced and deliberate separation in stereo or multi-channel recordings relies on those channels being reproduced from physically distant points, i.e. 5.1-channel recordings need the 5 speakers placed in the correct orientation and at a suitable distance from each other, and stereo speakers should be placed as far apart as possible. If not, the brain is not able to make the distinction: using headphones or the relatively close-together built-in speakers on a phone negates all but the most pronounced effects. So in short, it doesn't matter.