Fix invalid right shift count with recent ifcvt changes

I got too clever trying to simplify the right shift computation in my recent
ifcvt patch.  Interestingly enough, I haven't seen anything but the Linaro CI
configuration actually trip the problem, though the code is clearly wrong.

The problem I was trying to avoid were the leading zeros when calling clz on a
HWI when the real object is just say 32 bits.

The net is we get a right shift count of "2" when we really wanted a right
shift count of 30.  That causes the execution aspect of bics_3 to fail.

The scan failures are due to creating slightly more efficient code. THe new
code sequences don't need to use conditional execution for selection and thus
we can use bic rather bics which requires a twiddle in the scan.

I reviewed recent bug reports and haven't seen one for this issue.  So no new
testcase as this is covered by the armv7 testsuite in the right configuration.

Bootstrapped and regression tested on x86_64, also verified it fixes the Linaro
reported CI failure and verified the crosses are still happy.  Pushing to the
trunk.

gcc/
	* ifcvt.cc (noce_try_sign_bit_splat): Fix right shift computation.

gcc/testsuite/
	* gcc.target/arm/bics_3.c: Adjust expected output
This commit is contained in:
Jeff Law
2025-08-24 19:55:44 -06:00
parent e855cd3b44
commit 56ca14c4c4
2 changed files with 10 additions and 6 deletions

View File

@@ -1283,13 +1283,15 @@ noce_try_sign_bit_splat (struct noce_if_info *if_info)
one with smaller cost. */
rtx and_form = gen_rtx_AND (mode, temp, GEN_INT (val_a));
rtx shift_left = gen_rtx_ASHIFT (mode, temp, GEN_INT (ctz_hwi (val_a)));
rtx shift_right = gen_rtx_LSHIFTRT (mode, temp, GEN_INT (ctz_hwi (~val_a)));
HOST_WIDE_INT rshift_count
= (clz_hwi (val_a) & (GET_MODE_PRECISION (mode).to_constant() - 1));
rtx shift_right = gen_rtx_LSHIFTRT (mode, temp, GEN_INT (rshift_count));
bool speed_p = optimize_insn_for_speed_p ();
if (exact_log2 (val_a + 1) >= 0
&& (rtx_cost (shift_right, mode, SET, 1, speed_p)
<= rtx_cost (and_form, mode, SET, 1, speed_p)))
temp = expand_simple_binop (mode, LSHIFTRT, temp,
GEN_INT (ctz_hwi (~val_a)),
GEN_INT (rshift_count),
if_info->x, false, OPTAB_WIDEN);
else if (exact_log2 (~val_a + 1) >= 0
&& (rtx_cost (shift_left, mode, SET, 1, speed_p)
@@ -1333,13 +1335,15 @@ noce_try_sign_bit_splat (struct noce_if_info *if_info)
one with smaller cost. */
rtx and_form = gen_rtx_AND (mode, temp, GEN_INT (val_b));
rtx shift_left = gen_rtx_ASHIFT (mode, temp, GEN_INT (ctz_hwi (val_b)));
rtx shift_right = gen_rtx_LSHIFTRT (mode, temp, GEN_INT (ctz_hwi (~val_b)));
HOST_WIDE_INT rshift_count
= (clz_hwi (val_b) & (GET_MODE_PRECISION (mode).to_constant() - 1));
rtx shift_right = gen_rtx_LSHIFTRT (mode, temp, GEN_INT (rshift_count));
bool speed_p = optimize_insn_for_speed_p ();
if (exact_log2 (val_b + 1) >= 0
&& (rtx_cost (shift_right, mode, SET, 1, speed_p)
<= rtx_cost (and_form, mode, SET, 1, speed_p)))
temp = expand_simple_binop (mode, LSHIFTRT, temp,
GEN_INT (ctz_hwi (~val_b)),
GEN_INT (rshift_count),
if_info->x, false, OPTAB_WIDEN);
else if (exact_log2 (~val_b + 1) >= 0
&& (rtx_cost (shift_left, mode, SET, 1, speed_p)

View File

@@ -33,5 +33,5 @@ main (void)
return 0;
}
/* { dg-final { scan-assembler-times "bics\tr\[0-9\]+, r\[0-9\]+, r\[0-9\]+" 2 } } */
/* { dg-final { scan-assembler-times "bics\tr\[0-9\]+, r\[0-9\]+, r\[0-9\]+, .sl #2" 1 } } */
/* { dg-final { scan-assembler-times "bic\tr\[0-9\]+, r\[0-9\]+, r\[0-9\]+" 2 } } */
/* { dg-final { scan-assembler-times "bic\tr\[0-9\]+, r\[0-9\]+, r\[0-9\]+, .sl #2" 1 } } */