#391
#392
Maybe, maybe not.
Usually in Mupen64 0.5.1 the regular interpreter is more stable than the pure interpreter. I don't know if that's a bug with RCP feedback in LLE from the CPU, due to wrong information sent from (maybe) the regular interpreter, causing falsely stable games. But otherwise, several ROMs (like Mario64 and Action Replay Pro) won't even boot in "Pure Interpreter" mode (or, in Mario's case, get past the logo), only in the regular "Interpreter" mode or sometimes, of course, with the dynarec (but not for Action Replay Pro). These things may have been fixed at some point during Mupen64Plus development, but I am not really interested in Mupen64Plus. ShadowPrince's port of Hacktarux's original emulator has always been sufficient for my needs.
__________________
http://theoatmeal.com/comics/cat_vs_internet
#393
Sending you some motivation to speed up the bug fixing.
__________________
---------------------
CPU: Intel U7300 1.3 GHz
GPU: Mobile Intel 4 Series (on board)
AUDIO: Realtek HD Audio (on board)
RAM: 4 GB
OS: Windows 7 - 32 bit
#394
This accumulation was helpful, although unnecessary.
I actually got so tired of endlessly guessing at the problem that I decided to watch some shit. Not the kind that you linked, but I took a break from bug-fixing and this RSP emulator and watched all 66 episodes of the animated Teen Titans. It only took 3, maybe 4 days to watch them all, but I guess I am ready to go back to work now. :P
#395
Just what I was afraid of at first!
I had a bug in my debugger. I was so amazed at how lucky I was to get MM's file I/O debugger idea working on the very first try (I had to guess a lot of things related to argument vector endianness between imports/exports) that I forgot it wouldn't necessarily be perfect. No, I still haven't committed anything to Git, because I hate this stupid debugger and I want this to be a one-time deal. Anyway, Super Mario 64 had these results for the very first sync fail: Code:
task number: 1
PC offset: 0x7D4
count: 70
SR[23] = 0xFFFFFEB8
wrong : 0x00227000

I picked this with HLE audio so audio ucodes wouldn't get in the way. Mario64 exploited my bug in this plugin immediately at the very start, so that is why I picked it. It means, in order of listing:
So tomorrow (EDIT: today, when I wake up :P) I can jump into my RSP disassembler for my current unstable beta, step through it 70 times (70 total instructions executed), compare all the results and find out which instruction was just executed when the bug was raised. Also, I am no longer using a prototype of zilmar's 1.4 RSP plugin as the stable comparison base; I plugged in the source code of my newer public release 4 from this very thread instead, to guarantee an accurate RSP comparison. ;P
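The file I/O comparison idea can be sketched roughly like this, under assumptions: the reference plugin appends one record per executed instruction (the PC offset plus the 32 scalar registers) to a file, and the plugin under test reads the records back in the same order and stops at the first difference. The record layout, the names trace_record, trace_write and trace_check, and the return codes are all hypothetical, and counting instructions (the "count" in the report above) is left to the caller. Writing raw structs like this only works if both sides agree on struct layout and byte order, which is exactly the kind of thing that had to be guessed at above.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical trace record: the IMEM offset of the instruction that just
   ran, followed by the 32 scalar registers after it executed. */
typedef struct {
    uint32_t pc;
    uint32_t SR[32];
} trace_record;

/* Hypothetical return codes for trace_check(); 0-31 name a register. */
enum {
    TRACE_IN_SYNC = -1,  /* record matched */
    TRACE_EOF     = -2,  /* reference trace ran out */
    TRACE_PC_DIFF = -3   /* registers matched but the PC did not */
};

/* Reference-plugin side: append one record per executed instruction.
   Returns 1 on success, 0 on a short write. */
static int trace_write(FILE *f, uint32_t pc, const uint32_t SR[32])
{
    trace_record rec;
    rec.pc = pc;
    memcpy(rec.SR, SR, sizeof rec.SR);
    return fwrite(&rec, sizeof rec, 1, f) == 1;
}

/* Plugin-under-test side: read the next reference record into *ref and
   compare it with the current state. Returns the index of the first
   mismatching scalar register, or one of the TRACE_* codes. */
static int trace_check(FILE *f, uint32_t pc, const uint32_t SR[32],
                       trace_record *ref)
{
    int r;
    if (fread(ref, sizeof *ref, 1, f) != 1)
        return TRACE_EOF;
    for (r = 0; r < 32; r++)
        if (ref->SR[r] != SR[r])
            return r;
    return (ref->pc == pc) ? TRACE_IN_SYNC : TRACE_PC_DIFF;
}
```

On a mismatch, *ref still holds the reference values, so the caller can print the task number, PC offset, instruction count and both register values in a report like the one above.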
Last edited by HatCat; 2nd September 2013 at 07:18 AM.
#396
Finally caught the bug.
I find this hard to believe even now, but it really was a bug with LBU/LHU. The only reason I find it hard to believe is that I could have sworn the new zero-extension algorithm wasn't added until after I had already started fixing pre-existing bugs. Anyway, to be more specific: Code:
...
1FC  304200FE  ANDI  $2, $2, 0x00FE
200  84420076  LH    $2, 0x076($2)
204  00400008  JR    $2
208  9361FFFF  LBU   $at, -1($27)   # DMEM[0x6B4]: 0x80
288  9361FFFB  LBU   $at, -5($27)   # DMEM[0x6B0]: 0x06
...

The difference was that the latest release in this thread wrote the byte out to the assembler temporary ($at) as 0x00000080, whereas my unstable beta was writing it as 0x00000000. Here's why: the zero-extension macro I wrote zero-extended it by the little-endian bit number (the index of the top bit to keep, 7 for a byte), using this purposely slower, software-forced mechanism: Code:
#define ZE(x, b) (-(x & (0 << b)) | (x & ~(~0 << b)))

Instead of keeping bit b, the (0 << b) term masks it out, so the macro took this value and accidentally flushed it: the 0x80 loaded by LBU came out as 0x00. Rather than fix it directly, I just switched to type conversion instead of the ZE macro, though I rewrote the macro anyway: Code:
#define ZE(x, b) (+(x & (1 << b)) | (x & ~(~0 << b)))

Anyway, all the games are booting without issues now. It appears I'm back to where I was. I just need to work out how I am going to test speeds now, following MarathonMan's advice.
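To make the failure concrete, here is a small C check of both macros, plus the type-conversion approach the post switched to. It is a sketch: the macros are renamed ZE_BUGGY and ZE_FIXED so they can coexist, their arguments are parenthesized, and ~0 and 1 are written as unsigned constants (~0u, 1u) because left-shifting the negative int ~0 is undefined behavior in C. The helper names ze8 and ze16 are hypothetical.

```c
#include <stdint.h>

/* b is the index of the top bit to keep: 7 for a byte, 15 for a halfword. */
/* Buggy version: (0u << b) is always 0, so bit b itself is dropped. */
#define ZE_BUGGY(x, b) (-((x) & (0u << (b))) | ((x) & ~(~0u << (b))))
/* Fixed version: keeps bit b plus everything below it. */
#define ZE_FIXED(x, b) (+((x) & (1u << (b))) | ((x) & ~(~0u << (b))))

/* Zero-extension by type conversion: converting to a narrower unsigned
   type keeps only the low 8 or 16 bits, then widens back with zeros. */
static uint32_t ze8(uint32_t x)  { return (uint8_t)x; }
static uint32_t ze16(uint32_t x) { return (uint16_t)x; }
```

With the byte from the listing above, ZE_BUGGY(0x80u, 7) yields 0 while ZE_FIXED(0x80u, 7) and ze8(0x80u) both yield 0x80. The casts also avoid evaluating x twice, which the macros do.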
Last edited by HatCat; 3rd September 2013 at 02:09 AM.
#397
I actually did the opposite (kind of): I didn't sign-extend LH, and it took me a long time to find.
#398
Current speed tests are the same as they were when I posted them for Conker's.
Code:
new: 57-65 VI/s
old: 58-66 VI/s

It's still just 1 VI/s slower, consistently, all the time it appears. I just need to switch some shit back on, and I am pretty confident it will be faster than the old version. (Maybe not the LH/LW/SH/SW if/else-if trees for trying to write multiple bytes at once, though... this branch-tree checking unfortunately seems to make it an extra VI/s slower rather than faster.)
I always knew to sign-extend them, but I had an earlier bug where, just because the offset used to compute the address for LSV, LLV, LDV etc. is multiplied by 2, 4 or 8, I falsely assumed you're supposed to follow the same pattern for LH/LW. I'm still amazed my bug was with zero-extension, though. What a simple thing: all you do is & 0xFF or & 0xFFFF. I added that ZE() macro while I was already searching for the bug, so it seemed impossible to me that it had anything to do with it. Proof speaks otherwise, though. :/
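The difference between the two offset encodings can be sketched as follows, assuming a 4 KiB DMEM with addresses wrapped by & 0xFFF (the LS_Group_I code in this thread wraps each byte access the same way). Scalar loads and stores add their 16-bit signed immediate unscaled; the Group I vector loads and stores sign-extend a 7-bit offset (bits 0-6) and multiply it by the access size. The names se, scalar_addr and vector_addr are hypothetical; se() has the same shape as the SE() macro in this thread but uses unsigned constants so every shift is well-defined.

```c
#include <stdint.h>

/* Sign-extend x from bit b (the sign bit) upward: same shape as SE(). */
static uint32_t se(uint32_t x, unsigned b)
{
    return (uint32_t)(-(x & (1u << b))) | (x & ~(~0u << b));
}

/* Scalar LB/LBU/LH/LHU/LW/SB/SH/SW: 16-bit signed immediate, NOT scaled. */
static uint32_t scalar_addr(uint32_t base, uint32_t inst)
{
    return (base + se(inst & 0xFFFFu, 15)) & 0xFFFu;
}

/* Group I vector loads/stores (LBV/LSV/LLV/LDV and the stores): 7-bit
   signed offset, scaled by the access size in bytes (1, 2, 4 or 8).
   Unsigned wraparound makes negative offsets come out right. */
static uint32_t vector_addr(uint32_t base, uint32_t inst, unsigned length)
{
    return (base + length * se(inst & 0x7Fu, 6)) & 0xFFFu;
}
```

For example, the LBU $at, -1($27) from the earlier listing with $27 = 0x6B5 gives scalar_addr(0x6B5, 0x9361FFFF) = 0x6B4, while an LDV with an offset field of 0x7F (that is, -1) from base 0x100 lands at 0x0F8, eight bytes back.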
#399
More speed gains along the way!
In the meantime, I have made a discovery. The union-indexing service for decoding the 6-bit MIPS opcode: Code:
EX_SCALAR[inst.J.op][inst.W>>sub_op_table[inst.J.op] & 077]();

After all, if you use a union instead of trying to hardcode all the bit masks and shifts yourself, the compiler can figure out shortcuts for you, right? Well, even though the answer is yes, it seems that factor may sometimes be outweighed. Code:
EX_SCALAR[inst.W >> 26][inst.W>>sub_op_table[inst.W >> 26] & 077]();

[(inst.W >> 26) is the same value as (inst.R.op, inst.I.op, inst.J.op).] It is fortunate that I tried this experiment. The second method, while a little bit more arrogantly written (rather than letting the compiler *virtually* do the shift by 26 for me), trims a little excess fetching off the final computation: it loads _inst from memory once instead of twice. AT&T x86 code output for the faster, second way to write it: Code:
L1055:
    movl  $0, _SR
    movl  _inst, %eax
    movl  %eax, %edx
    shrl  $26, %edx
    movl  _sub_op_table(,%edx,4), %ecx
    shrl  %cl, %eax
    andl  $63, %eax
    sall  $6, %edx
    addl  %edx, %eax
    call  *_EX_SCALAR(,%eax,4)
Code:
L1055:
    movl  $0, _SR
    movb  _inst+3, %al
    shrb  $2, %al
    movzbl %al, %edx
    movl  _sub_op_table(,%edx,4), %ecx
    movl  _inst, %eax
    shrl  %cl, %eax
    andl  $63, %eax
    sall  $6, %edx
    addl  %edx, %eax
    call  *_EX_SCALAR(,%eax,4)

In key, the latter output block:
Code:
void LS_Group_I(int direction, int length)
{ /* Group I vector loads and stores, as defined in SGI's patent. */
    register unsigned long addr;
    register int i;
    register int e = (inst.R.sa >> 1) & 0xF;
    const signed int offset = -(inst.SW & 0x00000040) | inst.R.func;

    addr = (SR[inst.R.rs] + length*offset);
    if (direction == 0) /* "Load %s to Vector Unit" */
        for (i = 0; i < length; i++)
            VR_B(inst.R.rt, (e + i) | 0x0) = RSP.DMEM[BES(addr + i) & 0xFFF];
    else /* "Store %s from Vector Unit" */
        for (i = 0; i < length; i++)
            RSP.DMEM[BES(addr + i) & 0xFFF] = VR_B(inst.R.rt, (e + i) & 0xF);
    return;
}
Code:
void LS_Group_I(int direction, int length)
{ /* Group I vector loads and stores, as defined in SGI's patent. */
    register unsigned long addr;
    register int i;
    register int e = (inst.R.sa >> 1) & 0xF;
    const signed int offset = SE(inst.SW, 6);

    addr = (SR[inst.R.rs] + length*offset);
    if (direction == 0) /* "Load %s to Vector Unit" */
        for (i = 0; i < length; i++)
            VR_B(inst.R.rt, (e + i) | 0x0) = RSP.DMEM[BES(addr + i) & 0xFFF];
    else /* "Store %s from Vector Unit" */
        for (i = 0; i < length; i++)
            RSP.DMEM[BES(addr + i) & 0xFFF] = VR_B(inst.R.rt, (e + i) & 0xF);
    return;
}
Code:
#define SE(x, b) (-(x & (1 << b)) | (x & ~(~0 << b)))
#define ZE(x, b) (+(x & (1 << b)) | (x & ~(~0 << b)))
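For reference, the two-level EX_SCALAR[op][sub_op] lookup compared in this post can be sketched as a self-contained C fragment. Apart from the indexing expression, everything here is assumed: the handler names, the single non-zero shift put into sub_op_table (21 for COP2, whose sub-opcode is the rs field), and filling every sub-index of an opcode that has no sub-field with the same handler. It decodes with plain shifts on the whole word, as in the faster variant. A bit-field union would also work, but the order of bit-fields within a word is implementation-defined in C, so the shift form is also the more portable one.

```c
#include <stdint.h>

/* Handlers take no arguments; a real interpreter would decode operands
   from a global instruction word, like inst in this thread. */
typedef void (*handler)(void);

static uint32_t inst_W;     /* hypothetical stand-in for inst.W */
static int last_called;     /* records which handler ran, for the demo */

static void h_invalid(void) { last_called = -1; }
static void h_sll(void)     { last_called = 1; }
static void h_jr(void)      { last_called = 2; }
static void h_lh(void)      { last_called = 3; }
static void h_mfc2(void)    { last_called = 4; }

/* Right-shift that brings each primary opcode's sub-opcode down to bit 0.
   Zero-initialized, so SPECIAL (op 0) decodes on funct (bits 0-5). */
static unsigned sub_op_table[64];
static handler EX_SCALAR[64][64];

static void init_tables(void)
{
    int op, sub;
    for (op = 0; op < 64; op++)
        for (sub = 0; sub < 64; sub++)
            EX_SCALAR[op][sub] = h_invalid;

    /* COP2: sub-opcode is rs (bits 21-25); the sixth bit picked up by
       & 077 is bit 26, which is 0 in COP2's opcode (022). */
    sub_op_table[022] = 21;

    EX_SCALAR[000][000] = h_sll;     /* SPECIAL, funct 0 (also NOP) */
    EX_SCALAR[000][010] = h_jr;      /* SPECIAL, funct 8 */
    EX_SCALAR[022][000] = h_mfc2;    /* COP2, rs 0 */
    for (sub = 0; sub < 64; sub++)   /* LH has no sub-field: with shift 0 the */
        EX_SCALAR[041][sub] = h_lh;  /* index is junk, so fill all 64 slots */
}

/* Decode with plain shifts on the whole word, like the faster variant. */
static void execute(uint32_t w)
{
    uint32_t op = w >> 26;
    inst_W = w;
    EX_SCALAR[op][(w >> sub_op_table[op]) & 077]();
}
```

After init_tables(), execute(0x00400008) (JR $2) reaches h_jr and execute(0x84420076) (LH $2, 0x076($2)) reaches h_lh, matching the opcodes in the earlier disassembly.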
#400
Oh yeah, I forgot.
I also made a commit to Git for "fixing" SP_PC_REG. This isn't something I would expect any games to exploit, but if they do exploit it, it will surely break zilmar's RSP emulator. The actual IMEM tracking for instruction fetching has been moved away from SP_PC_REG and uses a local register instead. Meanwhile, there is some evidence suggesting that SP_PC_REG = 0x04001000 | (PC offset), where 0x04001000 is the base address of SP IMEM. So if a game exploited the RCP memory map by having the CPU read SP_PC_REG, outside of the RSP emulator and in the main Project64 CPU core, it could break the current Project64 RSP emulator. There is no speed cost; I just had to compensate for the small rewrite to maintain the same speed.
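A sketch of that split, under stated assumptions: the interpreter fetches from a private IMEM offset that wraps within 4 KiB, and the value seen through SP_PC_REG is derived from it only when the host needs it. The names pc_local, sp_pc_reg, advance_pc and sync_sp_pc_reg are hypothetical, sp_pc_reg stands in for whatever register storage the host emulator shares with the plugin, and the 0x04001000 | offset encoding is the hypothesis from the post, not confirmed hardware behavior.

```c
#include <stdint.h>

#define SP_IMEM_BASE 0x04001000u  /* base address of SP IMEM */

static uint32_t pc_local;   /* private fetch offset, 0x000-0xFFC */
static uint32_t sp_pc_reg;  /* stand-in for the host-visible SP_PC_REG */

/* Advance the private PC: IMEM is 4 KiB and instructions are word-aligned,
   so the offset wraps from 0xFFC back to 0x000. */
static void advance_pc(void)
{
    pc_local = (pc_local + 4) & 0xFFCu;
}

/* Publish the PC in the hypothesized form, e.g. before returning control
   to the CPU core so a CPU-side read of SP_PC_REG sees a sensible value. */
static void sync_sp_pc_reg(void)
{
    sp_pc_reg = SP_IMEM_BASE | pc_local;
}
```

Keeping the fetch offset in a local variable means the inner loop never has to strip the 0x04001000 base back off before indexing IMEM.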