  #371  
Old 27th August 2013, 10:22 PM
MarathonMan's Avatar
MarathonMan MarathonMan is offline
Alpha Tester
Project Supporter
Senior Member
 
Join Date: Jan 2013
Posts: 454
Default

Quote:
Originally Posted by BatCat View Post
Code:
In file included from ../../rsp/rsp.c:7:0:
../../rsp/rsp.h: In function `trace_RSP_registers`:
../../rsp/rsp.h:230:13: warning: format `%hX` expects a machine `int` argument
[-Wformat]
../../rsp/rsp.h:236:13: warning: format `%hX` expects a machine `int` argument
[-Wformat]
Code:
    for (i = 0; i < 10; i++)
        fprintf(
            out,
           " $v%i:  [%04hX][%04hX][%04hX][%04hX][%04hX][%04hX][%04hX][%04hX]\n",
            VR[i][00], VR[i][01], VR[i][02], VR[i][03],
            VR[i][04], VR[i][05], VR[i][06], VR[i][07]);
    for (i = 10; i < 32; i++) /* decimals "10" and higher with two characters */
        fprintf(
            out,
            "$v%i:  [%04hX][%04hX][%04hX][%04hX][%04hX][%04hX][%04hX][%04hX]\n",
            VR[i][00], VR[i][01], VR[i][02], VR[i][03],
            VR[i][04], VR[i][05], VR[i][06], VR[i][07]);
lolnut

VR is an array of short arrays, yet it tells me now it expects type `int`.
Didn't you forget to specify "i" in the argument list to printf? The first match in the format string is %i.

Also, I would recommend taking advantage of format strings; you can fix the length of the number using the format string instead of writing two loops.

Last edited by MarathonMan; 27th August 2013 at 10:25 PM.
  #372  
Old 27th August 2013, 10:24 PM
MarathonMan

Quote:
Originally Posted by BatCat View Post
Now it adds 0x008 which is the actual, documented and correct algorithm, at a speed cost of constantly checking the branch scheduler frame every time in the CPU loop.
This isn't what the actual processor does... the documentation lies. It is what "appears" to happen, though.

Quote:
Originally Posted by BatCat View Post
Maybe you are right though...perhaps it shouldn't have had that drastic of an effect.
I am not sure there.
Good... I don't feel bad then! XD
  #373  
Old 27th August 2013, 10:39 PM
MarathonMan

By the way, I found a new way to decode RSP instructions. I haven't merged the code into my own RSP decoder yet, but I have made the changes to my VR4300 decoder successfully.

My VR4300 decoder is now completely branchless, and only requires 7 arrays of 64 pointers each (total cache overhead of 3584 bytes on x86_64, or 1792 bytes on IA-32). Moreover, the code is stupid short and is made up of zero macro-ops:

Code:
0000000000000000 <VR4300DecodeInstruction>:
   0:	89 f8                	mov    %edi,%eax
   2:	c1 e8 1a             	shr    $0x1a,%eax
   5:	48 c1 e0 04          	shl    $0x4,%rax
   9:	8b 88 00 00 00 00    	mov    0x0(%rax),%ecx
   f:	d3 ef                	shr    %cl,%edi
  11:	23 b8 00 00 00 00    	and    0x0(%rax),%edi
  17:	48 8b 80 00 00 00 00 	mov    0x0(%rax),%rax
  1e:	48 8d 04 f8          	lea    (%rax,%rdi,8),%rax
  22:	c3                   	retq
The trick was to do something similar to your 2-dimensional array, but with less overhead. I could explain it, but the code is straightforward. Here: https://github.com/tj90241/cen64-vr4...Decoder.c#L249
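For readers who cannot follow the link, here is a minimal sketch of what the disassembly above appears to do: index a table of 16-byte entries by the primary opcode, then use that entry's shift and mask to index a sub-table. All names (`escape_t`, `opcode_t`, `decoder_init`, `decode`) are hypothetical, and only two sub-tables are filled in rather than the seven mentioned; this is an illustration, not the cen64 source.

```c
#include <stdint.h>

/* Hypothetical opcode descriptor; one pointer wide, like the 8-byte
   elements the `lea (%rax,%rdi,8)` above steps over on x86_64. */
typedef struct { const char *name; } opcode_t;

/* One entry per primary opcode: how to find the secondary index.
   4 + 4 + 8 = 16 bytes on x86_64, matching the `shl $0x4` above. */
typedef struct {
    uint32_t shift;
    uint32_t mask;
    const opcode_t *table;
} escape_t;

static opcode_t primary_ops[64];  /* indexed by bits 31..26 */
static opcode_t special_ops[64];  /* indexed by bits 5..0   */
static escape_t escapes[64];

void decoder_init(void)
{
    for (unsigned i = 0; i < 64; i++) {
        primary_ops[i].name = "primary";
        special_ops[i].name = "special";
        /* Default: the opcode field itself selects the instruction. */
        escapes[i] = (escape_t){ 26, 0x3F, primary_ops };
    }
    /* Opcode 0 (SPECIAL) escapes to a table indexed by the funct field. */
    escapes[0] = (escape_t){ 0, 0x3F, special_ops };

    primary_ops[0x23].name = "LW";   /* MIPS opcode 0x23 */
    special_ops[0x20].name = "ADD";  /* MIPS funct  0x20 */
}

/* No branches: two dependent loads, a shift, a mask, and an address add. */
const opcode_t *decode(uint32_t iw)
{
    const escape_t *e = &escapes[iw >> 26];
    return &e->table[(iw >> e->shift) & e->mask];
}
```

After `decoder_init()`, `decode(0x8C000000)` lands on the "LW" entry and `decode(0x00000020)` on the "ADD" entry.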

Last edited by MarathonMan; 27th August 2013 at 10:43 PM.
  #374  
Old 27th August 2013, 10:41 PM
HatCat's Avatar
HatCat HatCat is offline
Alpha Tester
Project Supporter
Senior Member
 
Join Date: Feb 2007
Location: In my hat.
Posts: 16,236
Default

Quote:
Originally Posted by MarathonMan View Post
Didn't you forget to specify "i" in the argument list to printf? The first match in the format string is %i.
yep.navi

Also, for some reason, when I added -Wextra, -pedantic, and -ansi, those warnings went away.

So maybe it catches more sometimes when I only use -Wall, without those other three.

Anyway yeah, just need to fill in the count register then that's fixed.

Quote:
Originally Posted by MarathonMan View Post
Also, I would recommend taking advantage of format strings; you can fix the length of the number using the format string instead of writing two loops.
How is this arranged?

Like, "%(space)2X" or something? printf strings confuse me.
Basically I just want it looking like this:
Code:
...
 $v8:
 $v9:
$v10:
$v11:
...
Quote:
Originally Posted by MarathonMan View Post
This isn't what the actual processor does... the documentation lies. It is what "appears" to happen, though.
How are you sure of this?

So it really does add 0x008 to the link/return-address register specified (usually $ra) and not 0x004 then?
Because all the manuals say that return_addr = current_PC + 8, not 4.

That also was the way zilmar did it in his RSP for which he was reversing.
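For reference, a minimal sketch of the documented rule being discussed (return_addr = current_PC + 8, i.e. the instruction after the delay slot). The function name is hypothetical, and the 12-bit wrap assumes the RSP's 4 KiB IMEM; this only models the visible effect, not whatever the silicon does internally.

```c
#include <stdint.h>

/* Link address as the manuals describe it: skip the jump itself (+4)
   and its delay slot (+4). Assumes a 12-bit program counter. */
uint32_t rsp_link_address(uint32_t pc)
{
    return (pc + 8) & 0xFFF;
}
```

For a JAL at 0x100, the delay slot sits at 0x104 and `$ra` would receive 0x108.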
  #375  
Old 27th August 2013, 11:36 PM
HatCat

Quote:
Originally Posted by MarathonMan View Post
By the way, I found a new way to decode RSP instructions. I haven't merged the code into my own RSP decoder yet, but I have made the changes to my VR4300 decoder successfully.

My VR4300 decoder is now completely branchless, and only requires 7 arrays of 64 pointers each (total cache overhead of 3584 bytes on x86_64, or 1792 bytes on IA-32). Moreover, the code is stupid short and is made up of zero macro-ops:

Code:
0000000000000000 <VR4300DecodeInstruction>:
   0:	89 f8                	mov    %edi,%eax
   2:	c1 e8 1a             	shr    $0x1a,%eax
   5:	48 c1 e0 04          	shl    $0x4,%rax
   9:	8b 88 00 00 00 00    	mov    0x0(%rax),%ecx
   f:	d3 ef                	shr    %cl,%edi
  11:	23 b8 00 00 00 00    	and    0x0(%rax),%edi
  17:	48 8b 80 00 00 00 00 	mov    0x0(%rax),%rax
  1e:	48 8d 04 f8          	lea    (%rax,%rdi,8),%rax
  22:	c3                   	retq
The trick was to do something similar to your 2-dimensional array, but with less overhead. I could explain it, but the code is straightforward. Here: https://github.com/tj90241/cen64-vr4...Decoder.c#L249
I don't use a struct currently since my mask is always mod 64, because the array is a fixed size of 64x64. The only thing I need to know is the shift amount, and that one piece of data is not worth creating a whole struct over.
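A rough sketch of that layout, for comparison: one fixed 64x64 table of handlers plus a per-opcode shift, with the mask hard-wired to 63. The handler names, the `int` return values, and the way plain opcodes fill their whole row are assumptions made for illustration, not the plugin's actual code.

```c
#include <stdint.h>

typedef int (*rsp_op_t)(uint32_t iw);

/* Placeholder handlers (hypothetical); each returns an ID so the
   dispatch result can be observed. */
static int op_reserved(uint32_t iw) { (void)iw; return 0; }
static int op_add(uint32_t iw)      { (void)iw; return 1; }
static int op_lw(uint32_t iw)       { (void)iw; return 2; }

static rsp_op_t op_table[64][64];  /* [primary opcode][secondary field] */
static uint8_t  op_shift[64];      /* where the secondary field starts  */

void rsp_table_init(void)
{
    for (unsigned i = 0; i < 64; i++) {
        op_shift[i] = 0;
        for (unsigned j = 0; j < 64; j++)
            op_table[i][j] = op_reserved;
    }
    /* SPECIAL (opcode 0) is selected by the funct field in bits 5..0. */
    op_table[0][0x20] = op_add;
    /* A plain opcode ignores the secondary index, so its row is uniform. */
    for (unsigned j = 0; j < 64; j++)
        op_table[0x23][j] = op_lw;
}

int rsp_dispatch(uint32_t iw)
{
    unsigned op = iw >> 26;
    /* "& 63" is the fixed mod-64 mask; only the shift varies per opcode. */
    return op_table[op][(iw >> op_shift[op]) & 63](iw);
}
```

The cost is the table size: 64 x 64 pointers, most of them duplicates, versus a handful of 64-entry sub-tables.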

So things with structs as parameters kind of make it hard for me to read.
It looks about the same basic concept as before so I'm not really sure what's different.

Always worth arguing over once I isolate this last bug first though.
I'm running the RSP with HLE gfx and LLE audio now. Banjo-Tooie, Zelda, etc. with other RSP LLE types function perfectly, and audio is mostly perfect, even more so in MusyX games like Twine 007 ... damn, what kind of opcode would cause this particular kind of audio issue in the standard ABI...
  #376  
Old 27th August 2013, 11:57 PM
MarathonMan

Quote:
Originally Posted by BatCat View Post
yep.navi
HEY, LISTEN. God that got so annoying sometimes.

Quote:
Originally Posted by BatCat View Post
How is this arranged?

Like, "%(space)2X" or something? printf strings confuse me.
Basically I just want it looking like this:

Code:
...
 $v8:
 $v9:
$v10:
$v11:
...
Whoops. If you want the $ to be mashed against the vX, I'm not certain that it's possible without some extra logic. You would have to do either `$v<space><single_digit>` or `$v<two_digits>`.
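One possible form of that "extra logic", as a minimal sketch: build the `$vN` token first, then right-align the whole token with a width specifier so a single loop covers all 32 registers. The function name and buffer sizes are assumptions.

```c
#include <stdio.h>

/* Writes "$v<n>" right-aligned in a 4-character column, followed by ':'.
   Returns what snprintf returns for the final string. */
int format_vr_label(char *dst, size_t size, int n)
{
    char name[16];

    snprintf(name, sizeof name, "$v%d", n);    /* "$v9" or "$v10" */
    return snprintf(dst, size, "%4s:", name);  /* pad on the left */
}
```

With this, register 9 comes out as " $v9:" and register 10 as "$v10:", and the `$`, `v`, and digits stay contiguous.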

Quote:
How are you sure of this?

So it really does add 0x008 to the link/return-address register specified (usually $ra) and not 0x004 then?
Because all the manuals say that return_addr = current_PC + 8, not 4.

That also was the way zilmar did it in his RSP for which he was reversing.
Let's just say I have some insider details. I actually have to make certain in my notes that this is so, because if memory serves the VR4300 does things differently than the RSP and every other processor on the planet, and I could be mixing them up. Neither zilmar nor anyone else would have been able to figure out what the actual silicon does, as the high-level effects are the same.

Quote:
Originally Posted by BatCat View Post
So things with structs as parameters kind of make it hard for me to read.
It looks about the same basic concept as before so I'm not really sure what's different.
Primarily much smaller overhead. I only have 7 arrays.

Last edited by MarathonMan; 28th August 2013 at 12:02 AM.
  #377  
Old 28th August 2013, 12:43 AM
HatCat

Quote:
Originally Posted by MarathonMan View Post
HEY, LISTEN. God that got so annoying sometimes.


*HEAH!*
*HEEAAH!!*
*HEEAAAHH!!!*

Come on, nao!
Who wants to play dat shit?

Quote:
Originally Posted by MarathonMan View Post
Whoops. If you want the $ to be mashed against the vX, I'm not certain that it's possible without some extra logic. You would have to do either `$v<space><single_digit>` or `$v<two_digits>`.
Can't accept that, because "$v 9" is not a valid RSP assembly language token.
AFAIK the '$', 'v', and '9' must all be contiguous.
But maybe I could settle for "$v9 :" instead of " $v9:",
I don't really care because I'm going to have this code commented out by macros in the stable build anyway.
The debug build is just something you have to compile to make it.
So I'm not interested in optimizing that so much.

I only changed it to shut those warnings up primarily.

Quote:
Originally Posted by MarathonMan View Post
Let's just say I have some insider details. I actually have to make certain in my notes that this is so, because if memory serves the VR4300 does things differently than the RSP and every other processor on the planet, and I could be mixing them up. Neither zilmar nor anyone else would have been able to figure out what the actual silicon does, as the high-level effects are the same.


heh, but seriously that does sound kind of cool, but nothing I need to be too concerned about atm I guess =D

Quote:
Originally Posted by MarathonMan View Post
Primarily much smaller overhead. I only have 7 arrays.
Technically I only have one array; it's just too damned big.
But there are 8 opcode matrices on the RSP, so idk.
  #378  
Old 28th August 2013, 01:22 AM
MarathonMan

Quote:
Originally Posted by BatCat View Post


*HEAH!*
*HEEAAH!!*
*HEEAAAHH!!!*

Come on, nao!
Who wants to play dat shit?


Imma let you finish, but...

Last edited by MarathonMan; 28th August 2013 at 01:26 AM.
  #379  
Old 28th August 2013, 06:42 AM
HatCat

Quote:
Originally Posted by MarathonMan View Post
Regarding debugging, I would recommend you do exactly what I did:

Take your working plugin and make a static FILE *. Initialize it with fopen("rsp.dump", "w"), and, after every instruction, do fwrite(VR, 16, 32, file) and fwrite(SR, 4, 32, file). Then in your NEW plugin, fread both back from the file after every instruction and assert out if they're not equal. Use a debugger to determine the faulting instruction and reason of occurrence - it makes it super easy. I usually can't do such things, though, because I'm in essence bootstrapping my own plugins.
I still haven't found that last bug yet.

I tried what you said, but the CPU core seems to be behaving even more unpredictably than my RSP emulator itself. (Or at least Project64; I did try to get the messages to pop on MUPEN just as well.)

I fwrite the SR and VR register files for every single instruction, to its own file named after the current task number (starting at 0).
So "rcp_0000.bin" is the log of all the SR and VR values before the very first BREAK instruction.
"rcp_0001.bin" is the log of all the SR and VR values before the next BREAK encounter, etc.
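A minimal sketch of that logging scheme, assuming 32 scalar registers of 4 bytes and 32 vector registers of 16 bytes, written VR first and then SR (the order from the quoted suggestion). The function names and the `trace` variable are hypothetical.

```c
#include <stdio.h>
#include <stdint.h>

static FILE *trace;  /* dump file for the current SP task */

/* Starts a new dump named after the task number, e.g. "rcp_0000.bin".
   Returns 0 on success, -1 if the file cannot be opened. */
int trace_open(unsigned task)
{
    char path[32];

    snprintf(path, sizeof path, "rcp_%04u.bin", task);
    if (trace)
        fclose(trace);
    trace = fopen(path, "wb");  /* binary mode matters on Windows */
    return trace ? 0 : -1;
}

/* Appends one snapshot after an instruction: 512 bytes of VR,
   then 128 bytes of SR. Returns 0 on success, -1 on a short write. */
int trace_step(int16_t (*vr)[8], const uint32_t *sr)
{
    if (fwrite(vr, sizeof vr[0], 32, trace) != 32) return -1;
    if (fwrite(sr, sizeof sr[0], 32, trace) != 32) return -1;
    return 0;
}

void trace_close(void)
{
    if (trace) {
        fclose(trace);
        trace = NULL;
    }
}
```

Calling `trace_open(task++)` whenever a BREAK ends a task, and `trace_step(VR, SR)` after every instruction, would give one fixed-size 640-byte record per instruction in each file.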

Unfortunately the results seem unpredictable.
When I load Project64 with a much older, stable version of the RSP emulator (from years back, when it was still mostly zilmar's code), the very first SP task (rcp_xxxx number) to report a mismatch between the register state of my current RSP beta and this older, stable version keeps changing between runs: 22, then 24, then 33, then 54 ... ??? The register results seem unreliable and unstable, possibly due to CPU intervention.

It takes way too long into the RSP emulation thread for a mismatch to pop up to know whether it is the bug I am trying to find, or some stupid unreliability in the Project64 1.6 CPU interpreter I'm using for Super Mario 64, Resident Evil 2, etc. (There it starts at SP task #2, #3, or sometimes later, but it is still random and inconsistent.)

If you ask me, it must be some really, really narrow corner case for it to take that long into a commercial game to be exploited.

And the more I am thinking about this, the more I'm beginning to wonder if it IS a bug in one of my vector ops, even though I didn't touch them one bit during the 2-D jump table rewrite, except for that ANSI compliance stuff you got me to try. That's just how desperately I'm starting to think about this because I have looked over all the scalar ops...the ADDs/SUBs, the logical/arithmetic immediates, all the branches and jumps, and the load/store words, like a hundred times over and see nothing wrong XD.
This is frustrating.
  #380  
Old 28th August 2013, 12:55 PM
MarathonMan

Quote:
Originally Posted by BatCat View Post
Unfortunately the results seem unpredictable.
That is... odd. Obviously, I can't do it anymore with my simulator, as the interrupts are firing at different times on the VR4300 and whatnot, but this is the technique I used almost single-handedly to bash out any bugs when I was working on the SSE additions to your plugin. And I never had an issue of things being unpredictable like that. If it helps, I was using the latest version of mupen64plus.
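For the checking side of the technique, a minimal sketch under the same assumptions as the original suggestion (each record is 32 x 16 bytes of VR followed by 32 x 4 bytes of SR, written by the known-good plugin). The function name and return convention are hypothetical.

```c
#include <stdio.h>
#include <stdint.h>
#include <string.h>

/* Reads the next reference snapshot and compares it with the live state.
   Returns 1 on a match, 0 on a mismatch, -1 if the dump has run out. */
int trace_check(FILE *ref, int16_t (*vr)[8], const uint32_t *sr)
{
    int16_t  ref_vr[32][8];
    uint32_t ref_sr[32];

    if (fread(ref_vr, sizeof ref_vr[0], 32, ref) != 32) return -1;
    if (fread(ref_sr, sizeof ref_sr[0], 32, ref) != 32) return -1;

    return memcmp(ref_vr, vr, sizeof ref_vr) == 0
        && memcmp(ref_sr, sr, sizeof ref_sr) == 0;
}
```

In the plugin under test, something like `assert(trace_check(ref, VR, SR) == 1)` after every instruction stops in the debugger at the first divergence. This only works if both runs execute the same instruction stream, which is exactly what breaks when the host CPU core's timing differs between runs.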

Quote:
Originally Posted by BatCat View Post
And the more I am thinking about this, the more I'm beginning to wonder if it IS a bug in one of my vector ops, even though I didn't touch them one bit during the 2-D jump table rewrite, except for that ANSI compliance stuff you got me to try. That's just how desperately I'm starting to think about this because I have looked over all the scalar ops...the ADDs/SUBs, the logical/arithmetic immediates, all the branches and jumps, and the load/store words, like a hundred times over and see nothing wrong XD.
This is frustrating.
Now you feel my pain.

Remember that it could also be things like the branch delay slots, too, since you changed the operation of those. Many times in my simulator all my instructions have been flawless, but there's been some corner-case with the control flow logic that was just a little off.