#971
Sorry my post was confusing, I was in a rush.
You don't need a savestate, because it's the intro. I've been pretty busy myself. Good practice and it's been fun though. I guess I'll just have to experiment with pj64 2.1, for WDC. So far I've tried a few different commits and went back to one made on September 27, 2013. It's still bugged in that one, but the one in this thread http://forum.pj64-emu.com/showthread.php?t=3959 appears to work.
#972
Sounds like it may be a twin sister bug to the regression introduced on 9-11-2013 that angrylion saw.
I'll look into it.
__________________
http://theoatmeal.com/comics/cat_vs_internet |
#973
Rofl, I wanted to practice analyzing algorithms and learning how to find flaws. I kinda gave up halfway with VCH. I probably would have figured it out by now if I had been more aware of MSVC's debugger bugs ;/ . Freakin' annoying how I had to force each variable to be global or static, otherwise the debugger wouldn't show the correct values for the arrays. That wasted quite a bit of my time. Then I tried writing a test program, but somehow failed with that, so I went back and just recorded the data. I need to take a break from this and watch some anime.
The main problem is le[]. You'll need to fix that algorithm. Here's what I recorded. Code:
    (all values are of type short)
    index  ST      VS      VC      sn      comp (proper)  comp (improper)
    [0x0]  0xfd34  0x0000  0x0001  0x0000  0x0000         0x0000
    [0x1]  0x0eaa  0x1810  0x0001  0x0000  0x0000         0x0000
    [0x2]  0x7669  0x8000  0xffff  0x0001  0x0001         0x0000
    [0x3]  0x0a97  0x0640  0x0001  0x0000  0x0000         0x0000
    [0x4]  0xb2a0  0x0000  0x0001  0x0000  0x0000         0x0000
    [0x5]  0x0a99  0x0190  0x0001  0x0000  0x0000         0x0000
    [0x6]  0xfd20  0xe264  0xffff  0x0001  0x0001         0x0001
    [0x7]  0x0eaa  0x955c  0xffff  0x0001  0x0001         0x0001

LLE can be fast if done right. I can always expand and keep adding more optimizations. I don't even need HLE audio anymore.
#974
It was indeed a twin sister regression.
Just like the other one found by angrylion, it's a corner-case bug. On two's complement Intel x86, given a 16-bit register x, -x is ~x + 1, so negating -32768 gives ~(0x8000) + 1 = 0x7FFF + 1 = 0x8000, which is -32768 itself. An old fact, and an easy one to lose sight of. It is similar to the correction I pointed out in early cen64 rsp for the SSSE3 emulation of VABS. I can't promote the shorts to ints without breaking the SSE2 optimizations and contradicting the purpose behind the commit that clipped triangles in Wrestlemania anyway, so I have a better, faster idea. Code:
    #if (0)
        for (i = 0; i < N; i++)
            le[i] = sn[i] ? (VS[i] <= VC[i]) : (VC[i] < 0);
        for (i = 0; i < N; i++)
            ge[i] = sn[i] ? (VC[i] > 0x0000) : (VS[i] >= VC[i]);
    #elif (0)
        for (i = 0; i < N; i++)
            le[i] = sn[i] ? (VT[i] <= -VS[i]) : (VT[i] <= ~0x0000);
        for (i = 0; i < N; i++)
            ge[i] = sn[i] ? (~0x0000 >= VT[i]) : (VS[i] >= VT[i]);
    #else
        for (i = 0; i < N; i++)
            diff[i] = -VS[i] | -(sn[i] ^ 1);
        for (i = 0; i < N; i++)
            le[i] = (VT[i] <= diff[i]);
        for (i = 0; i < N; i++)
            diff[i] = +VS[i] | -(sn[i] ^ 0);
        for (i = 0; i < N; i++)
            ge[i] = (diff[i] >= VT[i]);
    #endif

Instead, I'll do this now. Code:
    ...
    #else
        for (i = 0; i < N; i++)
            diff[i] = sn[i] | VS[i];
        for (i = 0; i < N; i++)
            ge[i] = (diff[i] >= VT[i]);
        for (i = 0; i < N; i++)
            sn[i] = (unsigned)(sn[i]) >> 15; /* ~0 to 1, 0 to 0 */
        for (i = 0; i < N; i++)
            diff[i] = VC[i] - VS[i];
        for (i = 0; i < N; i++)
            diff[i] = (diff[i] >= 0);
        for (i = 0; i < N; i++)
            le[i] = (VT[i] < 0);
        merge(le, sn, diff, le);
    #endif

The `merge` function, on the other hand, is probably the only added cost, although it is outweighed by the gains from the rest of the algorithm rewrites. There is no real function call here (it is inlined), just a maintainable way to statically evaluate a ternary `condition ? pass : fail` selection without any branch prediction. Code:
    static INLINE void merge(short* VD, short* cmp, short* pass, short* fail)
    {
        register int i;
    #if (0)
    /* Do not use this version yet, as it still does not vectorize to SSE2. */
        for (i = 0; i < N; i++)
            VD[i] = (cmp[i] != 0) ? pass[i] : fail[i];
    #else
        short diff[N];

        for (i = 0; i < N; i++)
            diff[i] = pass[i] - fail[i];
        for (i = 0; i < N; i++)
            VD[i] = fail[i] + cmp[i]*diff[i]; /* actually `(cmp[i] != 0)*diff[i]` */
    #endif
        return;
    }

Code:
    _VCH:
        pushl   %ebp
        movl    %esp, %ebp
        pushl   %ebx
        andl    $-16, %esp
        subl    $144, %esp
        movl    8(%ebp), %eax
        movl    12(%ebp), %edx
        movl    16(%ebp), %ebx
        movl    20(%ebp), %ecx
        sall    $4, %ebx
        movdqu  _VR(%ebx), %xmm4
        sall    $4, %ecx
        movdqu  _smask(%ecx), %xmm0
        pshufb  %xmm0, %xmm4
        sall    $4, %edx
        movdqu  _VR(%edx), %xmm2
        movdqa  %xmm2, %xmm5
        pxor    %xmm4, %xmm5
        psraw   $15, %xmm5
        movdqa  %xmm5, %xmm6
        pxor    %xmm4, %xmm6
        movdqa  LC0, %xmm3
        movdqa  %xmm2, %xmm0
        pcmpeqw %xmm6, %xmm0
        pand    %xmm3, %xmm0
        pand    %xmm5, %xmm0
        movdqa  %xmm0, _vce
        psubw   %xmm5, %xmm6
        movdqa  %xmm2, %xmm1
        pcmpeqw %xmm6, %xmm1
        pand    %xmm3, %xmm1
        por     %xmm0, %xmm1
        movdqa  %xmm2, %xmm0
        por     %xmm5, %xmm0
        movdqa  %xmm4, %xmm7
        pcmpgtw %xmm0, %xmm4
        pandn   %xmm3, %xmm4
        psraw   $15, %xmm5
        movdqa  %xmm6, %xmm0
        psubw   %xmm2, %xmm0
        movdqa  %xmm0, (%esp)
        pxor    %xmm0, %xmm0
        pcmpgtw (%esp), %xmm0
        pandn   %xmm3, %xmm0
        psrlw   $15, %xmm7
        psubw   %xmm7, %xmm0
        pmullw  %xmm5, %xmm0
        paddw   %xmm7, %xmm0
        movdqa  %xmm0, %xmm7
        psubw   %xmm4, %xmm7
        pmullw  %xmm5, %xmm7
        paddw   %xmm4, %xmm7
        psubw   %xmm2, %xmm6
        pmullw  %xmm7, %xmm6
        paddw   %xmm2, %xmm6
        movdqa  %xmm6, _VACC+32
        sall    $4, %eax
        movdqa  %xmm6, _VR(%eax)
        movdqa  %xmm4, _clip
        movdqa  %xmm0, _comp
        pxor    %xmm3, %xmm1
        movdqa  %xmm1, _ne
        movdqa  %xmm5, _co
        movl    -4(%ebp), %ebx
        leave
        ret
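
For anyone following along, the -32768 corner case described in this post can be reproduced in isolation. The following is only a minimal sketch, not code from the plugin: `neg16`, `le_narrow`, `le_wide` and `check_negation_corner_case` are hypothetical names made up for illustration, and the operand 100 is an arbitrary positive value. It negates a 16-bit value the way a 16-bit register would (~x + 1, truncated to 16 bits) and compares the result with the same test done after widening to `int`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: negate a 16-bit lane as a 16-bit register would,
 * i.e. ~x + 1 truncated to 16 bits. Unsigned arithmetic keeps the
 * wraparound well defined in C. */
static int16_t neg16(int16_t x)
{
    uint16_t bits = (uint16_t)x;
    uint16_t neg = (uint16_t)(~bits + 1u);

    /* Map the bit pattern back to a signed value without relying on
     * implementation-defined narrowing conversions. */
    return (int16_t)(neg >= 0x8000u ? (int)neg - 65536 : (int)neg);
}

/* vt <= -vs with the negation kept in 16 bits (the failing form). */
static int le_narrow(int16_t vt, int16_t vs)
{
    return vt <= neg16(vs);
}

/* Same test with -vs widened to int, where +32768 is representable. */
static int le_wide(int16_t vt, int16_t vs)
{
    return (int)vt <= -(int)vs;
}

/* Small self-check showing where the two forms disagree. */
void check_negation_corner_case(void)
{
    assert(neg16(1) == -1);                 /* ordinary values negate fine */
    assert(neg16(INT16_MIN) == INT16_MIN);  /* -(-32768) wraps to -32768 */
    assert(le_wide(100, INT16_MIN) == 1);   /* 100 <= 32768 is true */
    assert(le_narrow(100, INT16_MIN) == 0); /* 100 <= -32768 is false */
}
```

The two comparisons agree for every other value of `vs`; only 0x8000 flips the result, which fits the dump in post #973, where the proper and improper `comp` values differ only in the lane whose VS is 0x8000.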
Last edited by HatCat; 24th September 2014 at 07:26 PM.
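
The branchless selection that `merge` performs in the post above can also be tried on its own. This is a sketch under assumptions, not the plugin's code: `merge_sketch` and `check_merge_sketch` are hypothetical names, `N` is assumed to be 8 lanes, and every `cmp[i]` is assumed to be exactly 0 or 1. Unlike the posted version, the difference is kept in `int` rather than in a `short` array, so the sketch does not depend on 16-bit wraparound; whether a compiler vectorizes this form as well as the posted one is not something checked here.

```c
#include <assert.h>

#define N 8 /* assumed lane count, matching the 8-element dumps above */

/* Per lane: vd = cmp ? pass : fail, written as arithmetic, not a branch.
 * With cmp in {0, 1}, fail + cmp * (pass - fail) is exactly fail or pass. */
static void merge_sketch(short *vd, const short *cmp,
                         const short *pass, const short *fail)
{
    int i;

    for (i = 0; i < N; i++)
        vd[i] = (short)(fail[i] + cmp[i] * (pass[i] - fail[i]));
}

/* Small self-check against the plain ternary form. */
void check_merge_sketch(void)
{
    const short cmp[N]  = { 0, 1, 0, 1, 1, 0, 0, 1 };
    const short pass[N] = { 10, 11, 12, 13, 14, 15, 16, 17 };
    const short fail[N] = { -1, -2, -3, -4, -5, -6, -7, -8 };
    short vd[N];
    int i;

    merge_sketch(vd, cmp, pass, fail);
    for (i = 0; i < N; i++)
        assert(vd[i] == (cmp[i] ? pass[i] : fail[i]));
}
```

Each lane reads `fail[i]` before writing `vd[i]`, so the output may alias `fail`, as in the posted call `merge(le, sn, diff, le)`.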
#975
Wow, nice! It probably would have taken me hours to come up with a good method. I guess I should worry about those kinds of things later. That will be a fun thing to do in the future. Lol, it's great how the output is a bit shorter than the original one too.
#976
Quick question HatCat: do you ever intend to support Mupen64Plus? I ask because Gonetz is going to be releasing a beta with HD support within the next two months, at which point I'll be switching to Mupen64Plus. But I don't want to leave your RSP behind.
I love accurate emulation, but I also love all the bells and whistles. And your plugin gives both!
#977
Not really a logical reason for it. Mupen64Plus is really just Mupen64 (which already is supported), with a few rebellious changes. Primarily, its own API.
You have mudlord to thank for the fact that this plugin was already ported to Mupen64Plus by ecsv, so there is no cause for concern. You can just use this plugin there as well.
#978
@Hellbringer, it's called cxd4 rsp there, obtainable here https://github.com/mupen64plus/mupen64plus-rsp-cxd4 or if you need a Windows build: https://bitbucket.org/ecsv/mupen64pl...9302?at=master (or download m64py for a GUI and pretty recent builds of all plugins).
That said, as far as I know the beta will already have builds for all emulators, so there'd be no real necessity to switch.
#979
Are we closing in on a release soon? From all the reading it sounds like you have made some improvements. Care to share?
#980
You seem rather eager for an update.
HatCat, I've noticed that for certain games, using the RSP recompiler, I get message box errors saying "Dp reserved command". I know the RSP is at fault, but do you know the cause? It would be nice if I were able to improve game compatibility for the recompiler.
Last edited by RPGMaster; 1st October 2014 at 09:50 AM.