1. How mirroring works
There are two types of mirroring: hardware and software. Hardware mirroring is when the GPU is given instructions on how to build one source (what we refer to as a frame buffer) and two targets (what you would call a display or panel). It is possible that the EVO's GPU supports hardware mirroring, but without technical specifications from Qualcomm on how to directly program the chip, it is unlikely that we would stumble upon it by accident. Software mirroring, or cloning, is when you take two sources and two targets, and somehow copy the contents of one source onto the other. While this may sound simple (cp /dev/graphics/fb0 /dev/graphics/fb1), it seldom works in practice, because the two targets rarely share the same format. Sometimes it's just a variation in timings (for instance, 59.950Hz vs 59.230Hz), or it may be a drastic variation (like the EVO's panel at 480x800 vs the EVO's max HDMI resolution of 1280x720). This makes direct copying problematic.
To make matters even more difficult, the EVO's GPU is programmed to handle output in RGB565 format, which dedicates 5 bits to red, 6 bits to green, and 5 bits to blue. This is a normal 16-bit format. Many app developers will tell you the EVO uses a 32-bit format, but in reality the 32-bit frame buffer gets translated to RGB565 by the hardware during render. To keep memory requirements down, the HDMI frame buffer is natively RGB565. It *can* be switched to 32-bit color, but that runs into other issues such as memory bandwidth and memory consumption. If you want more details about this, just ask; I'd be glad to share as much technical detail as you'd like.
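For the curious, the RGB565 packing is easy to see in code. Here's a small sketch (the function name is mine, not something from the driver) showing how an 8-bit-per-channel color collapses into the 16-bit format:

```python
def rgb888_to_rgb565(r, g, b):
    """Pack an 8-bit-per-channel color into RGB565:
    5 bits of red, 6 bits of green, 5 bits of blue."""
    # The low 3 bits of red/blue and low 2 bits of green are discarded,
    # which is where the color banding discussed in question 8 comes from.
    return ((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3)
```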
So how do you transfer the image from one source to the other? There are two common methods. The first, which has mostly gone out of style in traditional GPUs, is called a hardware overlay. Overlays are odd in that they represent a unique carve-out of memory which acts as an independent frame buffer and can display data in a different format than the rest of the same display. The other method is the bit-block image transfer, or blit. A blit performs multiple operations on a source image while transferring it to a destination location, including rotation, scaling, clipping, and even masking. And the entire operation is done in hardware.
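To make the blit idea concrete, here's a toy software version of just the scaling part (nearest-neighbor, no rotation or masking); a real hardware blitter does all of these transformations in one pass:

```python
def blit_scale(src, sw, sh, dw, dh):
    """Nearest-neighbor scaling blit (illustration only).
    src is a row-major list of sw*sh pixels; returns a dw*dh buffer."""
    dst = [0] * (dw * dh)
    for dy in range(dh):
        sy = dy * sh // dh          # source row for this destination row
        for dx in range(dw):
            sx = dx * sw // dw      # source column for this destination column
            dst[dy * dw + dx] = src[sy * sw + sx]
    return dst
```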
2. Issues with the existing HDMI implementation
It all sounds like roses and perfume above, so why didn't we just use the blit? Well, the problem lies in the software implementation. When Qualcomm released the source code for its MSM video driver, it was designed to do all the operations its customers had requested and that Google felt fit into the AOSP branch. Unfortunately, this included some needless limitations. The limitation in question was in a routine called get_img. This routine prepares for a blit operation by verifying the source and destination data locations and gathering details about them. The problem with this routine is that it assumes that if you're talking to a frame buffer, you're talking about its own frame buffer. So the blit routine could never request a blit from one frame buffer to another. There's no such hardware limitation; it was just a minor issue in the kernel.
3. Fixes done to enable smooth HDMI mirroring
So now that we can blit from fb0 to fb1, applying rotation and scaling, what was all the talk about "smooth" mirroring? And why did people keep talking about this dreadful "tearing" that I heard about/saw/hated/laughed at? Well, this is another problem with how GPUs work, and in particular the GPU on the EVO. The GPU on the EVO has no dedicated memory. By this I don't mean the kernel doesn't carve out memory dedicated to the GPU; I mean the GPU doesn't have its own physical memory designed to support graphics.
To understand why this is a problem, you need to understand how an image appears on your display. There is a rendering side to every image, where some component (like SurfaceFlinger) decides what needs to be shown, what colors it should be, and where to place them. The result lands in a final frame buffer, which represents what the user should see on their panel. The other side is the scan-out. This happens at whatever refresh rate your display is running, and it cannot stop. The scan-out keeps scanning over the same frame buffer and showing its contents on the screen, regardless of whether it has changed. If it stops, the display goes blank because it no longer knows what to display.
The "tearing" issue which plagued everyone in the beginning was actually an "lcdc underflow" failure. That means the scan-out for HDMI was momentarily locked out of memory and ran out of data to send to the display. This is a serious concern, because it can lead to truncated clock signals and failure to maintain the link. When users hit this, they saw graphic anomalies that looked like someone "tearing" the image apart on their screen.
So how did we fix it? The solution lies in understanding the real problem: concurrent memory access. The EVO was never designed for so much GPU activity against the graphics memory, so there is insufficient bandwidth to the carve-out to handle multiple simultaneous DMA operations on the frame buffers. The fix was simply to move the blit operation to the tail end of the DMA operation used to update the screen. This had three benefits and one drawback. The first benefit was that we only blit when the screen is actually updated, not at some pre-defined interval. The second was that we never collide with the update, because the update is blocked on both the DMA operation and the blit operation. The third was guaranteed frame delivery: any frame drawn to the panel is shown via HDMI, and no frames are ever lost. The drawback was that we added the blit time to the update operation, causing a loss of measured frames per second. This could be mitigated by queuing the blit operation after the DMA, with a follow-up update blocking until the previous blit had completed. This allows apps to continue the rendering pipeline while the blit is occurring in hardware.
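The queued-blit idea can be sketched in a few lines. This is only a model of the scheduling (names and timings are made up), not the actual kernel code:

```python
import threading
import time

class BlitQueue:
    """Model of 'queue the blit after the DMA': an update kicks off the
    blit and returns, and the *next* update blocks until the previous
    blit has finished, so app rendering overlaps the hardware blit."""
    def __init__(self):
        self._blit_done = threading.Event()
        self._blit_done.set()          # no blit outstanding yet

    def _hw_blit(self):
        time.sleep(0.002)              # stand-in for the hardware blit time
        self._blit_done.set()

    def update(self):
        self._blit_done.wait()         # block on the previous frame's blit
        # ... panel DMA would happen here ...
        self._blit_done.clear()
        threading.Thread(target=self._hw_blit).start()
```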
4. Why is root required?
Due to the direct access needed to both frame buffers, root access is required. To put it simply: would you want any app you downloaded from the Market to be able to read your phone's screen without your permission? Unfortunately, Google has not created a permission for accessing the frame buffers. There is a design in place for one, but it isn't actually "wired" to anything. Or in other words, it don't work.
5. What does the daemon do?
The daemon (hdmid) enables and disables the mirroring, and does all the math required to turn requests like "center scaling" into blit rectangles. The daemon also watches for HDMI blitting to fail (such as when you turn off your device) and stops the kernel from attempting to mirror. This saves battery life while the device is asleep. The daemon also allows for placing banners on the screen, such as the ones shown in the demo videos. The banners are fully customizable.
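For example, "center scaling" boils down to aspect-ratio math like the following (a sketch with a hypothetical helper name; the daemon's exact rounding may differ):

```python
def center_rect(src_w, src_h, out_w, out_h):
    """Compute a destination rectangle (x, y, w, h) that fits the
    source into the output, preserving aspect ratio and centering."""
    scale = min(out_w / src_w, out_h / src_h)
    w, h = int(src_w * scale), int(src_h * scale)
    return (out_w - w) // 2, (out_h - h) // 2, w, h
```

A landscape 800x480 panel image on a 1280x720 HDMI mode gets scaled by 1.5x and padded with 40-pixel pillars on the left and right.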
6. How do I talk to the daemon?
The daemon uses a TCP listening port to talk to clients. The default port is 61083, but this may be unavailable, at which point the daemon will find the next available port. You can find out which port the daemon is running on by reading the file /data/hdmid/state and looking for port=xxx. The state file is semicolon delimited, and will have a semicolon at the end of the line.
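A client can find the port with a few lines of parsing (hypothetical helper name):

```python
def parse_state(text):
    """Parse the semicolon-delimited state file contents into a dict,
    e.g. 'port=61083;' -> {'port': '61083'}."""
    fields = {}
    for item in text.strip().split(';'):
        if '=' in item:
            key, _, value = item.partition('=')
            fields[key] = value
    return fields
```

On the device you would read /data/hdmid/state and feed its contents to this function.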
After every command is sent, a return status will be sent as the very first character of the string. So the enable command, which is 'e', would return '1' if successful.
The following commands are available:
- e - Enable HDMI mirroring
- d - Disable HDMI mirroring
- o[value between 0 and 255] - Set overscan value
- r[value between 0 and 3] - Set rotation. 0 = portrait, 1 = landscape, 2 and 3 are inverted portrait and landscape
- s[value between 0 and 2] - Set scaling. 0 = default, 1 = centered, 2 = stretched
- v[value between 0 and 3] - Set vsync control. 0 = wait on vsync for all, 1 = wait on vsync for panel, 2 = wait for vsync on HDMI, 3 = no vsync
- t - Dump statistics about the blitter
- l - Return a list of valid resolutions from the EDID. List is returned following the status byte
- m - Returns blitter status as command status. 0 = blitter not running, 1 = blitter running
- ! - Hard reset of HDMI and blitter. This will force disable HDMI
- q - Quit the daemon, disabling the blitter as well
- c - Cable status. Returns cable status as command status. 0 = Cable not detected, 1 = TV connected
- h - Hex dump of EDID, returned following the status byte
- a[filepath] - Banner setup. 'default' will restore the default value.
- a[number] - Replace banners with a solid color. Color is (red * 256 * 256) + (green * 256) + blue
- x[value] - Set resolution. If no resolution is provided, the best match is used. Otherwise, the requested resolution is done as [width]x[height]x[bpp]. Due to driver limitations, bpp must be 16.
- n - Returns the version number of the hdmid daemon.
Multiple commands can be queued together with the '|' separator, although the first failed command will end the execution. Return status is the status of all executed commands.
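Putting it together, a minimal client looks like this (a sketch; error handling and the payload parsing for commands like 'l' or 'h' are left out):

```python
import socket

def send_command(cmd, host="127.0.0.1", port=61083):
    """Send a command string to hdmid and split the reply into the
    status byte and any payload that follows it."""
    with socket.create_connection((host, port)) as s:
        s.sendall(cmd.encode("ascii"))
        reply = s.recv(4096).decode("ascii")
    return reply[:1], reply[1:]

# Per the protocol above, send_command("e") returns ("1", "")
# if mirroring was enabled successfully.
```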
7. Can I write my own UI?
Of course you can! While we do provide a UI which attempts to keep up with all the supported features, you can create any custom UI you'd like, or even embed the controls directly into an application or ROM settings. We are not restricting anyone from using this software.
8. Can the EVO do 32-bit color?
This is a tricky question. The SurfaceFlinger handles 32-bit color, and the driver can be configured for 32-bit color. But it's not actually delivering 32-bit color. This can be verified by writing a gradient pattern of either red or blue, and counting the color bands.
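The band-counting check can be simulated: quantize a 256-step gradient to a given channel depth and count the distinct levels that survive (my own illustration, not the actual test tool):

```python
def count_bands(channel_bits, steps=256):
    """Quantize a 0..255 single-channel gradient to channel_bits and
    count the distinct bands. True 32-bit output shows 256 levels;
    RGB565 collapses red and blue to 32 bands (green to 64)."""
    return len({v >> (8 - channel_bits) for v in range(steps)})
```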
9. I've heard other HDMI software is using your accelerated kernel design, can I do this?
Of course! Most of the work is done in the kernel for you. The following IOCTLs are the key to controlling the accelerated kernel's mirroring support.
- HDMI_GET_MIRRORING / HDMI_SET_MIRRORING : Takes in an unsigned value. 0 = disable, 1 = enable
- HDMI_GET_STATISTICS : Returns statistics about the blitter, even if mirroring is disabled
- HDMI_CLEAR_STATISTICS : Clears the statistics about the blitter. Useful to use when tracking a benchmark only
- HDMI_GET_VSYNC_MODE / HDMI_SET_VSYNC_MODE : Takes in an unsigned value, the same as the 'v' command for the daemon
- HDMI_BLIT : This isn't a new IOCTL, but this must be called at least once to set the blitter parameters. For memory_id, pass in the value of 0x40000000 + framebuffer# instead of a real file descriptor
10. How does the kernel-free version work?
When we were still trying to get integrated into every kernel, we got a lot of negative feedback that, unlike other products, we required a custom kernel to operate. So I started looking at how we could solve the issue without the necessary fixes to the blitter. At first I attempted doing the blitter conversions in software, but this was extremely costly in terms of performance, so that method failed. Then I looked at how much memory we had available and figured out that we could directly copy the fb0 memory into an fb1 backbuffer. This worked better, as I could use the blitter to translate the fb0 data residing in fb1 memory to fb1's front buffer. The problem was that we had awful tearing, because we couldn't sync the blitter and the DMA channel. So we then squeezed in another trick, which is what restricted us to 800x600: we blit the data from the first portion of the backbuffer to the second portion of the backbuffer, then copy the second portion of the backbuffer to the frontbuffer. This greatly reduces tearing, because the blit operation can stall without the user seeing it. Direct copying of memory is very quick, so it seldom tears during that portion of the operation.
To understand the memory requirement, let's take a direct look at what memory we need to do this.
- Panel is 480x800x32 = 1,536,000 bytes (remember, 32 bits = 4 bytes)
- HDMI is 1280x720x16 = 1,843,200 bytes
- HDMI can do 800x600x16 = 960,000 bytes
So to do 720p, we'd need 1,843,200 + 1,536,000 + 1,843,200 bytes = 5,222,400 bytes! Ouch, we don't have that.
But to do 800x600, we only need 960,000 + 1,536,000 + 960,000 bytes = 3,456,000 bytes. We have that!
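The arithmetic above can be double-checked with a few lines (the helper name is mine):

```python
def mirror_memory(panel, hdmi):
    """Bytes needed for the kernel-free path: a panel-sized copy of fb0,
    an HDMI-sized staging area for the blitted frame, and the HDMI
    frontbuffer. Each argument is (width, height, bits-per-pixel)."""
    pw, ph, pbpp = panel
    hw, hh, hbpp = hdmi
    return pw * ph * pbpp // 8 + 2 * (hw * hh * hbpp // 8)
```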
So that's how we get accelerated-kernel-free to work, and why it's restricted to 800x600.
11. What does the app mean by "Download a new module"?
The application has the ability to check the current version of the HDMI control daemon. This is the daemon responsible for most of the features delivered by HDMwIn. If the version detected is not the latest, you will be prompted to get the latest build. It's always recommended to get the latest build to reduce the likelihood of encountering an already fixed issue.
Items still in need of documentation:
- How did you get HDMI audio to work on AOSP?