Here's how I've heard it explained. May not be accurate. Assume 1 second of processing time just to even out the example.
Let's say a single core uses x power at 300 MHz to do some task. Call that 1 unit of power.
Now say there's a more difficult task that would require the single core to run at 600 MHz. The power consumption isn't double the 300 MHz task; it grows faster than linearly, something closer to the square of the clock speed (partly because voltage usually has to rise along with frequency). Call it 4x for this example. So instead of 1 unit at 300 MHz, you're using 4 units at 600 MHz.
If you have two cores, they can each chug along at 300 MHz and split the work. Two cores @ 300 MHz means you only used 2 units of power, compared with 4 units for the single-core example.
Ignore the specific numbers I used. The idea is that power consumption grows faster than linearly as a processor works harder. More processors running slower means better power conservation than one running fast.
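The arithmetic above can be sketched in a few lines. This is just the toy model from the example (power proportional to the square of clock speed, with 300 MHz as the 1-unit baseline), not how real chips are measured:

```python
def power_units(freq_mhz, base_mhz=300):
    """Toy model: power relative to one core at base_mhz (= 1 unit).
    Assumes power scales with the square of clock speed."""
    return (freq_mhz / base_mhz) ** 2

# One core doing the whole job at 600 MHz:
single_core = power_units(600)        # 4 units

# Two cores splitting the same job at 300 MHz each:
dual_core = 2 * power_units(300)      # 2 units

print(single_core, dual_core)         # → 4.0 2.0
```

Same work done in the same time, but the two slow cores burn half the power of the one fast core, which is the whole point.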
Then again, I could be way off