19th September 2013, 04:00 AM
HatCat

Quote:
Originally Posted by MarathonMan View Post
Not trying to trollolol, just sayin' vectorization is a world of hurt sometimes when relying on the compiler.

Actually, it looks like the only vectorization that broke in GCC 4.8.2, after working fine in 4.7.2, was the basic accumulator write-back.

VSAW-middle:
Code:
static void VSAWM(void)
{
    const int vd = inst.R.sa;

    memcpy(VR[vd], VACC_M, N*sizeof(short));
    return;
}
Code:
_VSAWM:
LFB1159:
	.cfi_startproc
	movzwl	_inst, %eax
	movl	_VACC+16, %ecx
	shrw	$6, %ax
	andl	$31, %eax
	sall	$4, %eax
	leal	_VR(%eax), %edx
	movl	%ecx, _VR(%eax)
	movl	_VACC+20, %eax
	movl	%eax, 4(%edx)
	movl	_VACC+24, %eax
	movl	%eax, 8(%edx)
	movl	_VACC+28, %eax
	movl	%eax, 12(%edx)
	ret
	.cfi_endproc
That's four 32-bit MOVL load/store pairs for a 16-byte copy that fits in a single 128-bit move.
Even Microsoft Visual Studio is intelligent enough to vectorize this simple memcpy with SSE2.

So it should hardly be a "world of hurt" for the latest GCC to do it. This looks like a temporary bug that will hopefully go away in later versions.
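
For reference, this is roughly what a hand-written SSE2 version of that copy could look like. It's a minimal sketch, not the plugin's actual code. The array shapes are assumptions read off the listing above (16 bytes per register, 32 registers, middle accumulator slice at byte offset 16); the three-slice accumulator, the aligned attributes and the name copy_vacc_m are made up for illustration.

```c
#include <assert.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>   /* SSE2 intrinsics */
#endif

#define N 8              /* assumed: 8 lanes x 16 bits = one 128-bit register */

/* Hypothetical stand-ins for the emulator's globals (GCC/Clang attribute syntax). */
static short VR[32][N]  __attribute__((aligned(16)));
static short VACC[3][N] __attribute__((aligned(16)));  /* slice count is assumed */
#define VACC_M (VACC[1])  /* consistent with the _VACC+16 offset in the listing */

/* Copy the middle accumulator slice into vector register vd. */
static void copy_vacc_m(int vd)
{
    assert(vd >= 0 && vd < 32);
#if defined(__SSE2__)
    /* One aligned 128-bit load followed by one aligned 128-bit store. */
    __m128i acc = _mm_load_si128((const __m128i *)VACC_M);
    _mm_store_si128((__m128i *)VR[vd], acc);
#else
    /* Portable fallback when SSE2 is not enabled at compile time. */
    memcpy(VR[vd], VACC_M, N * sizeof(short));
#endif
}
```

With SSE2 enabled (for example -msse2 on 32-bit x86), the load/store pair typically compiles to an aligned 128-bit move such as MOVDQA, though the exact instruction is still up to the compiler. The obvious cost is that the code stops being plain portable C, which is the reason to prefer the memcpy when the auto-vectorizer cooperates.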

I don't know why 4.8.2 has the bug.
I've tried everything to get it to MOVDQA VACC_M over to VR[vd], and the only fix that works, besides downgrading to 4.7.2, is changing the accumulator array's element type from (short) to (unsigned short), for some stupid-ass reason.
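
In case anyone wants to reproduce it, the workaround amounts to something like the following. This is a sketch only: the array dimensions are guessed from the listing (8 shorts per register, 32 registers), the three accumulator slices and the name vsawm_sketch are hypothetical, and whether it actually changes the generated code will depend on the compiler build.

```c
#include <assert.h>
#include <string.h>

#define N 8   /* assumed: 8 elements of 16 bits per vector register */

/* Hypothetical stand-ins for the emulator's globals. */
static short VR[32][N];

/* The workaround: element type changed from short to unsigned short. */
static unsigned short VACC[3][N];   /* slice count is an assumption */
#define VACC_M (VACC[1])

/* Same write-back as VSAWM, with the destination index passed in. */
static void vsawm_sketch(int vd)
{
    assert(vd >= 0 && vd < 32);
    /* memcpy copies raw bytes, so the source's signedness
       does not alter the bit patterns that land in VR[vd]. */
    memcpy(VR[vd], VACC_M, N * sizeof(short));
}
```

The copy itself is unaffected by the type change, since the bytes are identical either way. Any other code that does signed arithmetic or comparisons directly on the accumulator elements would need a cast back to short, though.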