Reduce input sizes to 8 (to test that the function works with widths smaller than the vector length) and 1920 (raising the largest input size to improve benchmark results).