ref: 77311e0dff5c5f561e0f6480bad5e276e22bf4ca
parent: 2fa710aa6db08aabd00f139274780e9300e815f1
author: Linfeng Zhang <linfengz@google.com>
date: Tue Mar 7 08:06:06 EST 2017
Update vpx_idct32x32_1024_add_neon()

Most of the changes are cosmetic. Speed is unchanged with clang 3.8 and
about 5% faster with gcc 4.8.4.

Tried the strategy used in 8x8 and 16x16 (whose operation order is
similar to the C code). Speed improves with gcc but gets worse with
clang.

Tried removing store_in_output(), but speed gets worse.

Change-Id: I93c8d284e90836f98962bb23d63a454cd40f776e