  #971  
Old 22nd September 2014, 04:55 AM
RPGMaster's Avatar
RPGMaster RPGMaster is offline
Alpha Tester
Project Supporter
Super Moderator
 
Join Date: Dec 2013
Posts: 2,008
Default

Sorry my post was confusing, I was in a rush. I meant to say I tried different combinations, because at first I thought the issue was angrylion's plugin. I've tried different LLE video plugins with each RSP.

You don't need a savestate, because it's the intro.

I've been pretty busy myself. Good practice and it's been fun though. I guess I'll just have to experiment with pj64 2.1, for WDC.

So far I've tried a few different commits and went back to one made on September 27, 2013. It's still bugged in that one, but the one in this thread http://forum.pj64-emu.com/showthread.php?t=3959 appears to work.
  #972  
Old 22nd September 2014, 05:00 AM
HatCat's Avatar
HatCat HatCat is offline
Alpha Tester
Project Supporter
Senior Member
 
Join Date: Feb 2007
Location: In my hat.
Posts: 16,236
Default

Sounds like it may be a twin sister bug to the 9-11-2013 introduced regression angrylion saw.

I'll look into it.
  #973  
Old 24th September 2014, 09:37 AM
RPGMaster's Avatar
RPGMaster RPGMaster is offline
Alpha Tester
Project Supporter
Super Moderator
 
Join Date: Dec 2013
Posts: 2,008
Default

Rofl, I wanted to practice analyzing algorithms and learning how to find flaws. I kinda gave up halfway with VCH. I probably would have figured it out by now if I had been more aware of MSVC's debugger bugs ;/. Freakin' annoying how I had to force each variable to be global or static, otherwise the debugger wouldn't show the correct values for the arrays. That wasted quite a bit of my time. Then I tried writing a test program, but somehow failed with that, so I went back and just recorded the data. I need to take a break from this and watch some anime.

The main problem is le[]. You'll need to fix that algorithm.

Here's what I recorded.
Code:
ST
[0x0]	0xfd34	short
[0x1]	0x0eaa	short
[0x2]	0x7669	short
[0x3]	0x0a97	short
[0x4]	0xb2a0	short
[0x5]	0x0a99	short
[0x6]	0xfd20	short
[0x7]	0x0eaa	short

VS
[0x0]	0x0000	short
[0x1]	0x1810	short
[0x2]	0x8000	short
[0x3]	0x0640	short
[0x4]	0x0000	short
[0x5]	0x0190	short
[0x6]	0xe264	short
[0x7]	0x955c	short

VC
[0x0]	0x0001	short
[0x1]	0x0001	short
[0x2]	0xffff	short
[0x3]	0x0001	short
[0x4]	0x0001	short
[0x5]	0x0001	short
[0x6]	0xffff	short
[0x7]	0xffff	short

sn
[0x0]	0x0000	short
[0x1]	0x0000	short
[0x2]	0x0001	short
[0x3]	0x0000	short
[0x4]	0x0000	short
[0x5]	0x0000	short
[0x6]	0x0001	short
[0x7]	0x0001	short

Proper results for comp
[0x0]	0x0000	short
[0x1]	0x0000	short
[0x2]	0x0001	short
[0x3]	0x0000	short
[0x4]	0x0000	short
[0x5]	0x0000	short
[0x6]	0x0001	short
[0x7]	0x0001	short

Improper results for comp
[0x0]	0x0000	short
[0x1]	0x0000	short
[0x2]	0x0000	short
[0x3]	0x0000	short
[0x4]	0x0000	short
[0x5]	0x0000	short
[0x6]	0x0001	short
[0x7]	0x0001	short
Anyway, I'm sooo happy with the progress I've made. Results actually surpassed my expectations. Didn't even need to use SSSE3 instructions to get the results I wanted. The only thing I can really complain about is the amount of time I spent fixing silly mistakes, and the time I wasted trying to cut corners. Working on something super error-prone has made me appreciate accuracy so much more! That's probably the next thing I will focus on.

LLE can be fast if done right. I can always expand and keep adding more optimizations. I don't even need HLE audio anymore. Dunno why people hype HLE MusyX so much.
  #974  
Old 24th September 2014, 07:00 PM
HatCat's Avatar
HatCat HatCat is offline
Alpha Tester
Project Supporter
Senior Member
 
Join Date: Feb 2007
Location: In my hat.
Posts: 16,236
Default

It was indeed a twin sister regression.
Just like the other one found by angrylion, it's a corner-case bug.

On two's complement Intel x86, given a 16-bit register x,
-x is ~x + 1, so negating -32768 gives ~(0x8000) + 1 = 0x7FFF + 1 = 0x8000, which is -32768 itself.
An old, easy fact to lose sight of. Similar to the correction I pointed out in early cen64 rsp for the SSSE3 emulation of VABS.
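
To make the corner case concrete, here is a minimal sketch in C. The helper names (neg16, le_narrow, le_wide, check_corner_case) and the sample value 100 are made up for illustration and are not from the plugin source. It assumes the usual two's complement targets, where converting an out-of-range value to int16_t simply truncates. It is consistent with the recorded data above, where the only lane whose comp result differs (index 2) is also the only lane with VS = 0x8000.

```c
#include <assert.h>
#include <stdint.h>

/* Negate inside 16 bits as ~x + 1, the way a packed 16-bit lane does. */
static int16_t neg16(int16_t x)
{
    return (int16_t)(uint16_t)(~(uint16_t)x + 1u);
}

/* vt <= -vs with the negation kept in 16 bits (wraps for vs == -32768). */
static int le_narrow(int16_t vt, int16_t vs)
{
    return vt <= neg16(vs);
}

/* The same test with both operands widened to int first (no wrap). */
static int le_wide(int16_t vt, int16_t vs)
{
    return (int)vt <= -(int)vs;
}

/* Illustrative checks; 100 is an arbitrary sample value for vt. */
static void check_corner_case(void)
{
    assert(neg16(1) == -1);
    assert(neg16(INT16_MIN) == INT16_MIN);    /* -(-32768) wraps to -32768 */
    assert(le_wide(100, INT16_MIN) == 1);     /* 100 <= 32768 */
    assert(le_narrow(100, INT16_MIN) == 0);   /* 100 <= -32768 is false */
    assert(le_wide(100, -200) == le_narrow(100, -200));
}
```

Every other value of vs negates cleanly in 16 bits, which is why only the one lane goes wrong.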

I can't promote the shorts to ints without breaking SSE2 optimizations and contradicting the purpose behind the commit that clipped triangles in Wrestlemania anyway, so I have a better, faster idea.

Code:
#if (0)
    for (i = 0; i < N; i++)
        le[i] = sn[i] ? (VS[i] <= VC[i]) : (VC[i] < 0);
    for (i = 0; i < N; i++)
        ge[i] = sn[i] ? (VC[i] > 0x0000) : (VS[i] >= VC[i]);
#elif (0)
    for (i = 0; i < N; i++)
        le[i] = sn[i] ? (VT[i] <= -VS[i]) : (VT[i] <= ~0x0000);
    for (i = 0; i < N; i++)
        ge[i] = sn[i] ? (~0x0000 >= VT[i]) : (VS[i] >= VT[i]);
#else
    for (i = 0; i < N; i++)
        diff[i] = -VS[i] | -(sn[i] ^ 1);
    for (i = 0; i < N; i++)
        le[i] = (VT[i] <= diff[i]);
    for (i = 0; i < N; i++)
        diff[i] = +VS[i] | -(sn[i] ^ 0);
    for (i = 0; i < N; i++)
        ge[i] = (diff[i] >= VT[i]);
#endif
It's a good thing I wrote that chain of #if blocks. It wasn't just a "just in case my code is broken" thing, but also a gradual step-by-step way of showing my homework on how I eventually arrived at the code at the bottom.
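
The step from the second branch to the third rests on an OR-mask identity: for a flag that is exactly 0 or 1, `x | -flag` is x when the flag is 0 and all ones (~0) when the flag is 1. A minimal sketch that checks this over every 16-bit value follows. The function names are hypothetical, and it assumes sn[] only ever holds 0 or 1 at that point, as the ternaries in the chain suggest.

```c
#include <assert.h>
#include <stdint.h>

/* Ternary form: flag ? ~0 : x, with flag restricted to 0 or 1. */
static int16_t pick_ternary(int16_t x, int16_t flag)
{
    return flag ? (int16_t)-1 : x;
}

/* Branch-free form: -flag is 0x0000 or 0xFFFF, and OR with 0xFFFF gives ~0. */
static int16_t pick_or_mask(int16_t x, int16_t flag)
{
    return (int16_t)(x | -flag);
}

/* Returns 1 if both forms agree for every 16-bit x and both flag values. */
static int masks_agree(void)
{
    int32_t x;

    for (x = INT16_MIN; x <= INT16_MAX; x++) {
        if (pick_ternary((int16_t)x, 0) != pick_or_mask((int16_t)x, 0))
            return 0;
        if (pick_ternary((int16_t)x, 1) != pick_or_mask((int16_t)x, 1))
            return 0;
    }
    return 1;
}

static void check_masks(void)
{
    assert(masks_agree());
}
```

With flag = sn[i] ^ 1 and x = -VS[i] this gives the diff[] used for le, and with flag = sn[i] and x = VS[i] it gives the diff[] used for ge.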

Instead, I'll do this now.
Code:
...
#else
    for (i = 0; i < N; i++)
        diff[i] = sn[i] | VS[i];
    for (i = 0; i < N; i++)
        ge[i] = (diff[i] >= VT[i]);

    for (i = 0; i < N; i++)
        sn[i] = (unsigned)(sn[i]) >> 15; /* ~0 to 1, 0 to 0 */

    for (i = 0; i < N; i++)
        diff[i] = VC[i] - VS[i];
    for (i = 0; i < N; i++)
        diff[i] = (diff[i] >= 0);
    for (i = 0; i < N; i++)
        le[i] = (VT[i] < 0);
    merge(le, sn, diff, le);
#endif
...with 2 fewer SIMD instructions generated overall, in exchange for these lower-level steps.
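
One detail in the listing above may be worth double-checking. This is a sketch under assumptions: it takes sn[] to be an array of short (merge takes short pointers) and unsigned to be 32 bits wide. Under those assumptions, casting a short to unsigned sign-extends it first, so shifting ~0 right by 15 and storing the result back into a short leaves ~0 rather than 1. Casting to a 16-bit unsigned type gives the 0 or 1 that the comment describes. The helper names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Cast to unsigned int: the short is sign-extended to 32 bits first. */
static int16_t mask_to_flag_wide(int16_t m)
{
    return (int16_t)((unsigned)m >> 15);
}

/* Cast to a 16-bit unsigned type: only bit 15 survives the shift. */
static int16_t mask_to_flag_narrow(int16_t m)
{
    return (int16_t)((uint16_t)m >> 15);
}

static void check_mask_to_flag(void)
{
    assert(mask_to_flag_narrow(0) == 0);
    assert(mask_to_flag_narrow(-1) == 1);
    assert(mask_to_flag_wide(0) == 0);
    assert(mask_to_flag_wide(-1) == -1);   /* low 16 bits of 0x0001FFFF */
}
```

The second `psraw $15, %xmm5` in the assembly further down (an arithmetic shift, not `psrlw`) looks consistent with the wide version, so it may be worth confirming which behavior was intended.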

The `merge' function, on the other hand, is probably the only cost, although it is outweighed by the gains from the rest of the algorithm rewrites. There is no function call here, just a maintainable way to statically execute a ternary `cond ? a : b` expression without any branching.

Code:
static INLINE void merge(short* VD, short* cmp, short* pass, short* fail)
{
    register int i;
#if (0)
/* Do not use this version yet, as it still does not vectorize to SSE2. */
    for (i = 0; i < N; i++)
        VD[i] = (cmp[i] != 0) ? pass[i] : fail[i];
#else
    short diff[N];

    for (i = 0; i < N; i++)
        diff[i] = pass[i] - fail[i];
    for (i = 0; i < N; i++)
        VD[i] = fail[i] + cmp[i]*diff[i]; /* actually `(cmp[i] != 0)*diff[i]` */
#endif
    return;
}
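
For comparison, here is a sketch of the multiply-based select next to a mask-based one. The mask form is a different, commonly used technique and is not what the plugin does. The names select_mul, select_mask and check_select are hypothetical. The multiply form is only correct when cmp[i] is exactly 0 or 1, while the mask form accepts any nonzero value as true.

```c
#include <assert.h>
#include <stdint.h>

#define N 8

/* Arithmetic form, as in merge() above: valid only when cmp[i] is 0 or 1. */
static void select_mul(int16_t *vd, const int16_t *cmp,
                       const int16_t *pass, const int16_t *fail)
{
    int i;

    for (i = 0; i < N; i++)
        vd[i] = (int16_t)(fail[i] + cmp[i] * (pass[i] - fail[i]));
}

/* Mask form: turn "nonzero" into all ones, then blend with AND/OR. */
static void select_mask(int16_t *vd, const int16_t *cmp,
                        const int16_t *pass, const int16_t *fail)
{
    int i;

    for (i = 0; i < N; i++) {
        int16_t mask = (int16_t)-(cmp[i] != 0);   /* 0x0000 or 0xFFFF */
        vd[i] = (int16_t)((pass[i] & mask) | (fail[i] & ~mask));
    }
}

/* Both forms should match the plain ternary for 0/1 inputs. */
static void check_select(void)
{
    int16_t cmp[N]  = { 0, 1, 0, 1, 1, 0, 0, 1 };
    int16_t pass[N] = { 10, 11, 12, 13, 14, 15, 16, 17 };
    int16_t fail[N] = { -1, -2, -3, -4, -5, -6, -7, -8 };
    int16_t a[N], b[N];
    int i;

    select_mul(a, cmp, pass, fail);
    select_mask(b, cmp, pass, fail);
    for (i = 0; i < N; i++) {
        assert(a[i] == (cmp[i] ? pass[i] : fail[i]));
        assert(b[i] == a[i]);
    }
}
```

Whether a given compiler vectorizes either form to SSE2 is something to confirm in the generated assembly rather than assume.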
The overall result for one of the RSP's 2 most complicated vector instructions:
Code:
_VCH:
	pushl	%ebp
	movl	%esp, %ebp
	pushl	%ebx
	andl	$-16, %esp
	subl	$144, %esp
	movl	8(%ebp), %eax
	movl	12(%ebp), %edx
	movl	16(%ebp), %ebx
	movl	20(%ebp), %ecx
	sall	$4, %ebx
	movdqu	_VR(%ebx), %xmm4
	sall	$4, %ecx
	movdqu	_smask(%ecx), %xmm0
	pshufb	%xmm0, %xmm4
	sall	$4, %edx
	movdqu	_VR(%edx), %xmm2
	movdqa	%xmm2, %xmm5
	pxor	%xmm4, %xmm5
	psraw	$15, %xmm5
	movdqa	%xmm5, %xmm6
	pxor	%xmm4, %xmm6
	movdqa	LC0, %xmm3
	movdqa	%xmm2, %xmm0
	pcmpeqw	%xmm6, %xmm0
	pand	%xmm3, %xmm0
	pand	%xmm5, %xmm0
	movdqa	%xmm0, _vce
	psubw	%xmm5, %xmm6
	movdqa	%xmm2, %xmm1
	pcmpeqw	%xmm6, %xmm1
	pand	%xmm3, %xmm1
	por	%xmm0, %xmm1
	movdqa	%xmm2, %xmm0
	por	%xmm5, %xmm0
	movdqa	%xmm4, %xmm7
	pcmpgtw	%xmm0, %xmm4
	pandn	%xmm3, %xmm4
	psraw	$15, %xmm5
	movdqa	%xmm6, %xmm0
	psubw	%xmm2, %xmm0
	movdqa	%xmm0, (%esp)
	pxor	%xmm0, %xmm0
	pcmpgtw	(%esp), %xmm0
	pandn	%xmm3, %xmm0
	psrlw	$15, %xmm7
	psubw	%xmm7, %xmm0
	pmullw	%xmm5, %xmm0
	paddw	%xmm7, %xmm0
	movdqa	%xmm0, %xmm7
	psubw	%xmm4, %xmm7
	pmullw	%xmm5, %xmm7
	paddw	%xmm4, %xmm7
	psubw	%xmm2, %xmm6
	pmullw	%xmm7, %xmm6
	paddw	%xmm2, %xmm6
	movdqa	%xmm6, _VACC+32
	sall	$4, %eax
	movdqa	%xmm6, _VR(%eax)
	movdqa	%xmm4, _clip
	movdqa	%xmm0, _comp
	pxor	%xmm3, %xmm1
	movdqa	%xmm1, _ne
	movdqa	%xmm5, _co
	movl	-4(%ebp), %ebx
	leave
	ret
With, of course, the usual fundamental rewrites still needed, such as passing XMM registers more intelligently on the call stack.

Last edited by HatCat; 24th September 2014 at 07:26 PM.
  #975  
Old 24th September 2014, 07:33 PM
RPGMaster's Avatar
RPGMaster RPGMaster is offline
Alpha Tester
Project Supporter
Super Moderator
 
Join Date: Dec 2013
Posts: 2,008
Default

Wow, nice! It probably would have taken me hours to come up with a good method. I guess I should worry about those kinds of things later. That will be a fun thing to do in the future. Lol, it's great how the output is a bit shorter than the original one too.
  #976  
Old 29th September 2014, 01:36 PM
hellbringer616 hellbringer616 is offline
Junior Member
 
Join Date: Jul 2012
Posts: 20
Default

Quick question, HatCat: do you ever intend on supporting Mupen64Plus? I ask because Gonetz is going to be releasing a beta with HD support within the next two months, at which point I'll be switching to Mupen64Plus. But I don't want to leave your RSP behind.

I love accurate emulation, but I also love all the bells and whistles. And your plugin gives both!
  #977  
Old 29th September 2014, 02:44 PM
HatCat's Avatar
HatCat HatCat is offline
Alpha Tester
Project Supporter
Senior Member
 
Join Date: Feb 2007
Location: In my hat.
Posts: 16,236
Default

Not really a logical reason for it. Mupen64Plus is really just Mupen64 (which already is supported), with a few rebellious changes. Primarily, its own API.

You have mudlord to thank for the fact that this plugin was already ported to Mupen64Plus by ecsv, so there is no cause for concern. You can just use this plugin there as well.
  #978  
Old 29th September 2014, 03:03 PM
V1del V1del is offline
Project Supporter
Senior Member
 
Join Date: Feb 2012
Posts: 442
Default

@Hellbringer, it's called cxd4 rsp there, obtainable here https://github.com/mupen64plus/mupen64plus-rsp-cxd4 or, if you need a Windows build: https://bitbucket.org/ecsv/mupen64pl...9302?at=master (or download m64py for a GUI and pretty recent builds of all plugins)

That said, as far as I know the beta will already have builds for all emulators, so there'd be no real need to switch.
  #979  
Old 1st October 2014, 02:47 AM
theboy181's Avatar
theboy181 theboy181 is offline
Alpha Tester
Project Supporter
Senior Member
 
Join Date: Aug 2014
Location: Prince Rupert,British Columbia Canada
Posts: 424
Default

Are we closing in on a release soon? From all the reading it sounds like you have made some improvements. Care to share?
  #980  
Old 1st October 2014, 06:11 AM
RPGMaster's Avatar
RPGMaster RPGMaster is offline
Alpha Tester
Project Supporter
Super Moderator
 
Join Date: Dec 2013
Posts: 2,008
Default

Quote:
Originally Posted by theboy181 View Post
Are we closing in on a release soon? From all the reading it sounds like you have made some improvements. Care to share?
HatCat's too busy with OpenGL, side projects, anime. Dunno when he will finish.

You seem rather eager for an update. What do you want to see an improvement in?


HatCat, I've noticed that for certain games, using the RSP recompiler, I get message box errors saying "Dp reserved command". I know the RSP is at fault, but do you know the cause? It would be nice if I were able to improve game compatibility for the recompiler.

Last edited by RPGMaster; 1st October 2014 at 09:50 AM.