ref: ed22179a82700f4f0ba58030e2f3af2c17a02e52
parent: 03698aa6d8dc10e526955f4b516799e023663b4d
author: Yunqing Wang <yunqingwang@google.com>
date: Wed Oct 2 13:26:01 EDT 2013
Rewrite HORIZx4 and HORIZx8 in subpixel filter functions

In the subpixel filters, prefetched source data, unrolled loops, and
interleaved instructions. In HORIZx4, integrated the idea from Scott's CL
(commit: d22a504d11a15dc3eab666859db0046b5a7d75c5), which was suggested
by Erik/Tamar from Intel. Further tweaking combined rows 0, 2 and
rows 1, 3 in registers so that more operations process two rows at once,
up until the last add. Tests showed a ~2% decoder speedup.

Change-Id: Ib53d04ede8166c38c3dc744da8c6f737ce26a0e3
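
For readers unfamiliar with these routines, the following is a minimal scalar sketch of what a 4-pixel-wide horizontal 8-tap filter computes. It is not the assembly from this change and does not model the prefetching, unrolling, or row-combining described above. The names `horiz8_x4_ref` and `clip_pixel_ref` are hypothetical, and the sketch assumes 8 taps with 7-bit precision (taps summing to 128) and taps centered so that each output pixel reads `src[-3]` through `src[4]`.

```c
#include <stdint.h>

/* Assumed filter precision: taps sum to 1 << FILTER_BITS_REF (128). */
#define FILTER_BITS_REF 7

/* Clamp an intermediate sum to the 8-bit pixel range. */
static uint8_t clip_pixel_ref(int v) {
  return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Hypothetical scalar reference: filter `rows` rows of 4 output pixels.
 * `src` points at the first output position; the caller must guarantee
 * that src[-3] .. src[3 + 4] are readable on every row. */
static void horiz8_x4_ref(const uint8_t *src, int src_stride,
                          uint8_t *dst, int dst_stride,
                          const int16_t filter[8], int rows) {
  for (int r = 0; r < rows; ++r) {
    for (int x = 0; x < 4; ++x) {
      int sum = 0;
      /* Multiply-accumulate the 8 taps around output pixel x. */
      for (int k = 0; k < 8; ++k) {
        sum += src[x + k - 3] * filter[k];
      }
      /* Round to nearest, drop the filter precision, then clamp.
       * Assumes arithmetic right shift for negative sums. */
      sum = (sum + (1 << (FILTER_BITS_REF - 1))) >> FILTER_BITS_REF;
      dst[x] = clip_pixel_ref(sum);
    }
    src += src_stride;
    dst += dst_stride;
  }
}
```

A SIMD version can evaluate the same sums for several pixels, and for more than one row, per instruction, which is the kind of saving the message refers to.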