ref: bf7a4786bf0d097ca7e36e133d17d825ac4552be
parent: 04b70ea56d3258bedef3002ea877cc90277e5ab2
author: Ronald S. Bultje <rsbultje@gmail.com>
date: Fri Oct 5 12:30:50 EDT 2018
Rewrite horizontal loopfilter

Loop inside the SIMD function (instead of in the caller) so that we can handle multiple 4px blocks per iteration, allowing for more efficient SIMD. To make this easier, also transpose the masks for the horizontal filter.
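
Illustrative sketch only, not the code from this commit: a scalar C stand-in for the change in call shape, where the edge function receives the mask and loops over the 4px blocks itself instead of being called once per block. The names filter_block and filter_edge, the one-bit-per-block mask layout, and the halving "filter" are all hypothetical placeholders. A real SIMD version could load and filter several blocks per loop iteration once the loop lives inside the function.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-block kernel: a placeholder operation on one 4px block.
 * The real loop filter is far more involved. */
static void filter_block(uint8_t *px)
{
    for (int i = 0; i < 4; i++)
        px[i] = (uint8_t)(px[i] >> 1);
}

/* Hypothetical edge-level entry point: the loop over blocks is inside the
 * function, so the caller makes one call per edge rather than one per block.
 * Assumed mask layout: bit b is set if block b along the edge needs filtering.
 * Returns the number of blocks that were filtered. */
static int filter_edge(uint8_t *px, uint32_t mask, int n_blocks)
{
    assert(n_blocks >= 0 && n_blocks <= 32);

    int filtered = 0;
    for (int b = 0; b < n_blocks; b++) {
        if (mask & (UINT32_C(1) << b)) {
            filter_block(px + 4 * b);
            filtered++;
        }
    }
    return filtered;
}
```

With the block flags for one edge packed into a single word like this, the function can test and consume neighbouring bits together, which is the kind of access pattern a transposed mask is meant to make cheap.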