  #351  
Old 23rd August 2013, 06:50 PM
HatCat HatCat is offline
Alpha Tester
Project Supporter
Senior Member
 
Join Date: Feb 2007
Location: In my hat.
Posts: 16,236
Default

I'm not sure yet that it converted the functions into the 2-D jump table as it should.
The section labels are there using the names, but ".cfi_startproc" makes me think it is keeping them all to their own function.
Code:
LFE32:
	.p2align 2,,3
	.def	_SLL;	.scl	3;	.type	32;	.endef
_SLL:
LFB35:
	.cfi_startproc
	movb	_inst+1, %al
	shrb	$3, %al
	movzbl	%al, %eax
	movb	_inst+2, %dl
	andl	$31, %edx
	movl	_inst, %ecx
	shrl	$6, %ecx
	movl	_SR(,%edx,4), %edx
	sall	%cl, %edx
	movl	%edx, _SR(,%eax,4)
	ret
	.cfi_endproc
LFE35:
	.p2align 2,,3
	.def	_SRL;	.scl	3;	.type	32;	.endef
_SRL:
LFB36:
	.cfi_startproc
	movb	_inst+1, %al
	shrb	$3, %al
	movzbl	%al, %eax
	movb	_inst+2, %dl
	andl	$31, %edx
	movl	_inst, %ecx
	shrl	$6, %ecx
	movl	_SR(,%edx,4), %edx
	shrl	%cl, %edx
	movl	%edx, _SR(,%eax,4)
	ret
	.cfi_endproc
LFE36:
	.p2align 2,,3
	.def	_SRA;	.scl	3;	.type	32;	.endef
_SRA:
LFB37:
	.cfi_startproc
;# .........
I got it to compile: the idea came to me as I was reading your post, except I didn't have to implement it exactly the way you showed.

Chiefly because my jump table has a fixed, consistent size, always 64x64, I did not need a structure like yours to hold the AND-mask value, only the shift.

Code:
//in execute.h:
        EX_SCALAR[inst.J.op][(inst.W >> sub_op_table[inst.J.op]) & 077]();
//because, in su.h:
#define OFF_FUNCTION     0
#define OFF_SA           6
#define OFF_E            7
#define OFF_RD          11
#define OFF_RT          16
#define OFF_RS          21
#define OFF_OPCODE      26
const int sub_op_table[64] = {
    OFF_FUNCTION, /* SPECIAL */
    OFF_RT, /* REGIMM */
    OFF_OPCODE, /* J */
    OFF_OPCODE, /* JAL */
    OFF_OPCODE, /* BEQ */
    OFF_OPCODE, /* BNE */
    OFF_OPCODE, /* BLEZ */
    OFF_OPCODE, /* BGTZ */
    OFF_OPCODE, /* ADDI */
    OFF_OPCODE, /* ADDIU */
    OFF_OPCODE, /* SLTI */
    OFF_OPCODE, /* SLTIU */
    OFF_OPCODE, /* ANDI */
    OFF_OPCODE, /* ORI */
    OFF_OPCODE, /* XORI */
    OFF_OPCODE, /* LUI */
    OFF_RS, /* COP0 */
    OFF_RS,
    OFF_RS, /* COP2 */
    OFF_RS,
    OFF_OPCODE,
    OFF_OPCODE,
    OFF_OPCODE,
    OFF_OPCODE,
    OFF_OPCODE,
    OFF_OPCODE,
    OFF_OPCODE,
    OFF_OPCODE,
    OFF_OPCODE,
    OFF_OPCODE,
    OFF_OPCODE,
    OFF_OPCODE,
    OFF_OPCODE,
    OFF_OPCODE, /* LB */
    OFF_OPCODE, /* LH */
    OFF_OPCODE,
    OFF_OPCODE, /* LW */
    OFF_OPCODE, /* LBU */
    OFF_OPCODE, /* LHU */
    OFF_OPCODE,
    OFF_OPCODE,
    OFF_OPCODE, /* SB */
    OFF_OPCODE, /* SH */
    OFF_OPCODE,
    OFF_OPCODE, /* SW */
    OFF_OPCODE,
    OFF_OPCODE,
    OFF_OPCODE,
    OFF_OPCODE,
    OFF_OPCODE,
    OFF_RD, /* LWC2 */
    OFF_OPCODE,
    OFF_OPCODE,
    OFF_OPCODE,
    OFF_OPCODE,
    OFF_OPCODE,
    OFF_OPCODE,
    OFF_OPCODE,
    OFF_RD, /* SWC2 */
    OFF_OPCODE,
    OFF_OPCODE,
    OFF_OPCODE,
    OFF_OPCODE,
    OFF_OPCODE
};
I will go back later and change some of those.
Like LB, LH and LW for example, I can make versions of those functions where inst.R.rs is zero, so we save the time of decoding (SR[base] + offset) & 0xFFF, because we always know that SR[0] is fixed to 0.
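A minimal sketch of the lookup from execute.h, to show why a per-opcode shift is enough when every row has 64 slots. The handler names, `last_op`, `op_table`, `init_dispatch` and `execute` are made up for illustration and are not from the plugin; only the shift-then-mask indexing mirrors the line above.

```c
#include <stdint.h>

/* Hypothetical stand-ins; none of these names come from the plugin. */
typedef void (*op_func)(void);

enum { RAN_NONE, RAN_RESERVED, RAN_SLL, RAN_SRL, RAN_ADDI };
int last_op = RAN_NONE;          /* records which handler ran */

void op_reserved(void) { last_op = RAN_RESERVED; }
void op_sll(void)      { last_op = RAN_SLL; }
void op_srl(void)      { last_op = RAN_SRL; }
void op_addi(void)     { last_op = RAN_ADDI; }

int shift_table[64];             /* per-opcode shift exposing the 6-bit sub-index */
op_func op_table[64][64];        /* always 64x64, so one mask fits every row */

void init_dispatch(void)
{
    int op, sub;
    for (op = 0; op < 64; op++) {
        shift_table[op] = 26;    /* default: index the row by the opcode again */
        for (sub = 0; sub < 64; sub++)
            op_table[op][sub] = op_reserved;
    }
    shift_table[0] = 0;          /* SPECIAL: sub-index is the function field */
    op_table[0][0] = op_sll;     /* SPECIAL, function 0 */
    op_table[0][2] = op_srl;     /* SPECIAL, function 2 */
    op_table[8][8] = op_addi;    /* ADDI is opcode 8, reached on the diagonal */
}

void execute(uint32_t word)
{
    unsigned op = word >> 26;    /* primary opcode, always 0..63 */
    op_table[op][(word >> shift_table[op]) & 077]();
}
```

Opcodes whose shift is 26 index their own row by the opcode a second time, so only the diagonal slot of that row is ever reached. The `& 077` mask keeps the second index inside 0..63, which is why the call needs no bounds check.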

In the meantime, wow lmao, the DLL size jumped up from 80 KB to 104.
zilmar's RSP emulator was always way too bulky and over-sized because of deficient build settings. When I modified his RSP emulator I got it down to as low as 34 KB (the one in 2.0 I think is like 70 something), but now it seems I have to go the opposite way and accept the fact that his DLL is always going to be smaller than mine from now on.

As for performance, I did not even bother waiting to test it before making this post. I'm sure I screwed up at least 2 things, probably the branch delay slot for scalar jumps and the BREAK instruction exception handler outside the loop, and hopefully none of the actual RSP operation codes. I'm assuming the DLL is broken, but hopefully I will have it fixed today.

Last edited by HatCat; 23rd August 2013 at 06:58 PM.
  #352  
Old 23rd August 2013, 07:07 PM
MarathonMan MarathonMan is offline
Alpha Tester
Project Supporter
Senior Member
 
Join Date: Jan 2013
Posts: 454
Default

Quote:
Originally Posted by BatCat View Post
I'm not sure yet that it converted the functions into the 2-D jump table as it should.
The section labels are there using the names, but ".cfi_startproc" makes me think it is keeping them all to their own function.
This is good, not bad... this is what rewriting it should have done.

Now that each RSP instruction is contained within its own C function, the interpreter will be able to dispatch functions with ease, since it's a single, flat indirect jump without any checks needed on the bounds of the jump range or things like that.

Quote:
Originally Posted by BatCat View Post
In the meantime, wow lmao, the DLL size jumped up from 80 KB to 104.
zilmar's RSP emulator was always way too bulky and over-sized because of deficient build settings. When I modified his RSP emulator I got it down to as low as 34 KB (the one in 2.0 I think is like 70 something), but now it seems I have to go the opposite way and accept the fact that his DLL is always going to be smaller than mine from now on.
That size isn't a problem. I think my simulator is up to ~250kB active code and I don't have instruction cache thrashing issues yet. The RSP in PJ64 executes tasks in batches, so you have locality on your side anyways.

Quote:
Originally Posted by BatCat View Post
As for performance, I did not even bother waiting to test it before making this post. I'm sure I screwed up at least 2 things, probably the branch delay slot for scalar jumps and the BREAK instruction exception handler outside the loop, and hopefully none of the actual RSP operation codes. I'm assuming the DLL is broken, but hopefully I will have it fixed today.
Huh? So something broke? I'm confused, but anxious to see results.
  #353  
Old 23rd August 2013, 07:31 PM
HatCat HatCat is offline
Alpha Tester
Project Supporter
Senior Member
 
Join Date: Feb 2007
Location: In my hat.
Posts: 16,236
Default

Quote:
Originally Posted by MarathonMan View Post
This is good, not bad... this is what rewriting it should have done.

Now that each RSP instruction is contained within its own C function, the interpreter will be able to dispatch functions with ease, since it's a single, flat indirect jump without any checks needed on the bounds of the jump range or things like that.
True, though before they were not all split to their own functions, and we agreed an aligned JMP was faster than a procedural CALL.

It is possibly still better with this new way I guess.
Before, I avoided assuming that inner sub-opcodes were necessarily common in the ROM, so I only did a switch() on the primary opcode to avoid calling any functions at all. Vector instructions, though, went through a function pointer table regardless, before the primary opcode was even read.

Only one way to know for sure here: get the run-time stable!

Quote:
Originally Posted by MarathonMan View Post
That size isn't a problem. I think my simulator is up to ~250kB active code and I don't have instruction cache thrashing issues yet. The RSP in PJ64 executes tasks in batches, so you have locality on your side anyways.
Yeah I agree; I do not really have a problem with the DLL size going up.
As long as it's not like a megabyte or anything, there is nothing really concerning about it to me.

What I AM picky about is making the file size nice and aligned.
The old size of 80 KB was not the best example, but at least it was 2*2*2*2*5 KB.
It's just an obsession of mine.

Quote:
Originally Posted by MarathonMan View Post
Huh? So something broke? I'm confused, but anxious to see results.
Heh, do not worry about that.
I knew all along that it was going to be broken.

I did not even need to test-drive my plugin in PJ64 just now; I knew before I made that post that I had to have broken something.

All I really concentrated on was stabilizing all the stuff under LWC2/SWC2, the scalar loads/stores and almost every other opcode anyway.

What I know I didn't concentrate on was maintaining the stability of the Jumps and Branches, and the BREAK opcode SP_STATUS_HALT reader for continuing the RSP CPU loop. I almost intentionally broke those things, because I was in a hurry to just rewrite the damn thing quickly!

But those are all the things that are easy to go back to and fix.
I spent most of the rewrite time on the opcodes that are not easy to go back and fix in the case of unexpected/invisible bugs, as you only get one shot to catch those things before such interpreter bugs become extremely hard to find.

Should have it fixed sometime today.
  #354  
Old 24th August 2013, 08:01 AM
HatCat HatCat is offline
Alpha Tester
Project Supporter
Senior Member
 
Join Date: Feb 2007
Location: In my hat.
Posts: 16,236
Default

Sorry, did not even try to fix it today because actually I completed (and I think perfected) my own dynamic RSP disassembler instead!
It would have come in handy so long ago....

The RSP disassembler I wrote is the new file $rsp/matrix.h .

Besides official info, the algorithm came mostly from my own ideas. In terms of accuracy, this is how it compares to zilmar's RSP disassembler built into RSP 1.7.0.9 for Project64 2.x:
  • Register syntax is fixed. Usually the PJ64 commands stepper borrows the main MIPS R4000 macro defines for the register names, but none of these are valid assembly language tokens for the 32 RSP scalar registers because they do not all serve the same function as the CPU core's ones (except $at, $s8, $ra, and $sp).
  • C?C2 instruction encoding is fixed. The three vector control registers are not enumerable through the integers in zilmar's commands stepper, but the RSP assembler allows "$vco", "$vcc", and "$vce".
  • Jump targets are not truncated to twelve bits. Usually the MIPS jump target is an effective 28-bit address, which zilmar AND-masks by 0xFFF since these are the only valid IMEM offsets. However, I like to see the upper 4 nybbles because games seem to set them for a reason... for example, gfx boot ucode for Mario says `J 0x4001068`, because the RSP SP_PC_REG is CPU-mapped to the IMEM offset BASE=0x4001???, which zilmar's commands stepper trims to the ??? part.
  • Branch offsets conform to the official RSP assembly language specifications. It is probably more convenient that the PJ64 stepper just prints the 0x??? IMEM offset so you can jump with your eyes, but the assembly language strictly requires a signed integer offset, in units of 32-bit IWs, relative to the current instruction slot.
  • Interchangeable disassembly-to-assembly compliance. Usually assembler-specific constructs, such as CP0 system control register names defined through directives, do not assemble back, so the internal register names for system control are used.
  • Pseudo-opcodes such as "MOVE", "BEQZ", and "NOP", which are not real RSP instructions, were also dropped to make debugging easier. (Then again I don't think zilmar used the "MOVE" one, or "B", or plenty of the other pseudo-ops, only particular ones for some reason.)
  • "VSAW" has its operands encoded in the wrong order in the PJ64 disassembler.
  • LWC2::"L[T]WV" is disassembled as a reserved instruction which does not exist, while the RSP 1.7.0.9 source seems to still reference it as a valid RSP opcode.
  • Didn't look at the more incidental fixes yet; I just need to hurry up and post this so I can go to bed, wake up and re-work the new execute loop tomorrow.
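To make the jump-target and branch-offset points concrete, here is a rough sketch of the three values involved. The function names are hypothetical and this is not the code in matrix.h; it only follows the standard MIPS field layout (16-bit immediate, 26-bit jump field).

```c
#include <stdint.h>

/* Signed 16-bit branch immediate, counted in 32-bit instruction words. */
int branch_offset_words(uint32_t word)
{
    int imm = (int)(word & 0xFFFFu);
    return (imm ^ 0x8000) - 0x8000;      /* portable sign extension */
}

/* Full 28-bit jump target: the 26-bit field scaled to a byte address. */
uint32_t jump_target_full(uint32_t word)
{
    return (word & 0x03FFFFFFu) << 2;
}

/* The same target trimmed to its 12-bit IMEM offset. */
uint32_t jump_target_imem(uint32_t word)
{
    return jump_target_full(word) & 0xFFFu;
}
```

For the `J 0x4001068` example, the instruction word is 0x0900041A: the full target comes out as 0x4001068 and the trimmed one as 0x068.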
  #355  
Old 25th August 2013, 07:41 AM
HatCat HatCat is offline
Alpha Tester
Project Supporter
Senior Member
 
Join Date: Feb 2007
Location: In my hat.
Posts: 16,236
Default

Have still not fixed it yet!
I underestimated the number of interpreter bugs I would have to go back to.
I am at the same time making the core more stable than the current public release anyway, so the final result won't have any new weaknesses.

Some things I have run into:
* had to fix branch-and-link scheduler phase mechanism, which I broke purposely anyway
* Lazy copy-pasta of bit-wise AND, into the XORI function from ANDI
* Shouldn't use sizeof(VR) as a computational constant for 16-byte LS V lengths
* also purposely broke the SP_STATUS_BROKE exception handler and fixed it
* union bit-field decoder bug with I-type which MM helped me fix today
* was not supposed to shift offset for scalar loads and stores, only vector-DMEM transactions
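Two of the items above (the I-type bit-field bug and the offset that should not be shifted for scalar loads and stores) can be illustrated with a decoder that uses plain shifts and masks. This is only a sketch and not the fix that went into the plugin: the `itype` struct and both function names are invented here. The motivation is that C leaves the order of bit-fields within a storage unit up to the implementation, while shifts and masks behave the same on every compiler.

```c
#include <stdint.h>

/* Hypothetical decoded form of an I-type instruction word. */
typedef struct {
    unsigned op;   /* bits 31..26 */
    unsigned rs;   /* bits 25..21, base register for loads and stores */
    unsigned rt;   /* bits 20..16 */
    int32_t  imm;  /* bits 15..0, sign-extended */
} itype;

itype decode_itype(uint32_t word)
{
    itype d;
    d.op  = (word >> 26) & 0x3Fu;
    d.rs  = (word >> 21) & 0x1Fu;
    d.rt  = (word >> 16) & 0x1Fu;
    d.imm = (int32_t)((word & 0xFFFFu) ^ 0x8000u) - 0x8000;
    return d;
}

/* Scalar load/store address: the byte offset is used unshifted,
 * then wrapped to the 4 KB DMEM range. */
uint32_t scalar_address(uint32_t base_value, int32_t imm)
{
    return (base_value + (uint32_t)imm) & 0xFFFu;
}
```

As a usage example, 0x8C22FFFC decodes to opcode 35 (LW) with rs 1, rt 2 and an immediate of -4, and `scalar_address(0x10, -4)` gives 0x00C.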

Quote:
Originally Posted by MarathonMan View Post
Very much so. That looks much, much easier to read and less error-prone.

You can't really optimize it any way you look at it too much because of all the byte-ordering issues. Unless you do your 'static reinterpreter' thing, but methinks that may give very little benefit or none at all due to the large footprint size.
And since you seemed to like the universal, stable and compliant algorithm there better, I wrote a universal function for handling all the Group I vector loads and stores as categorized in SGI's patent:

Code:
INLINE void LS_Group_I(int direction, int length)
{ /* Group I vector loads and stores, as defined in SGI's patent. */
    register unsigned long addr;
    register int i;
    register int e = (inst.R.sa >> 1) & 0xF;
    const signed int offset = -(inst.W & 0x00000040) | inst.R.func;

    addr = (SR[inst.R.rs] + length*offset);
    if (direction == 0) /* "Load %s to Vector Unit" */
        for (i = 0; i < length; i++)
            VR_B(inst.R.rt, (e+i) | 0x0) = RSP.DMEM[BES((addr+i) & 0xFFF)];
    else /* "Store %s from Vector Unit" */
        for (i = 0; i < length; i++)
            RSP.DMEM[BES((addr+i) & 0xFFF)] = VR_B(inst.R.rt, (e+i) & 0xF);
    return;
}
LBV/LSV/LLV/LDV/SBV/SSV/SLV/SDV functions can all call this function and return instead of using hazard-prone code like I showed you before.

Code:
void LLV(void)
{
    LS_Group_I(0, sizeof(long) > 4 ? 4 : sizeof(long));
    return;
}
void LDV(void)
{
    LS_Group_I(0, sizeof(long long) > 8 ? 8 : sizeof(long long));
    return;
}
void SBV(void)
{
    LS_Group_I(1, sizeof(unsigned char));
    return;
}
void SSV(void)
{
    LS_Group_I(1, sizeof(short) > 2 ? 2 : sizeof(short));
    return;
}
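The address arithmetic inside LS_Group_I can be checked in isolation. The sketch below spells out the 7-bit sign extension differently from the `-(inst.W & 0x00000040) | inst.R.func` line (XOR and subtract instead of OR-ing into a negative value), and it assumes the function field occupies bits 0..5 as OFF_FUNCTION suggests. The function names are hypothetical, and the result is the DMEM address before any BES() swap.

```c
#include <stdint.h>

/* Sign-extend the low 7 bits of the instruction word (bit 6 is the sign). */
int group1_offset(uint32_t word)
{
    int low7 = (int)(word & 0x7Fu);
    return (low7 ^ 0x40) - 0x40;
}

/* DMEM byte address of the i-th transferred byte: the offset is scaled by
 * the element length, then the sum is wrapped to 12 bits. */
uint32_t group1_address(uint32_t base_value, uint32_t word, int length, int i)
{
    int32_t scaled = (int32_t)length * group1_offset(word);
    return (base_value + (uint32_t)scaled + (uint32_t)i) & 0xFFFu;
}
```

With a base of 0x100, an encoded offset of 0x7F (that is, -1) and a length of 8 as for LDV, the first byte lands at 0x0F8. Unlike the scalar loads and stores, the offset here is multiplied by the transfer length.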
  #356  
Old 26th August 2013, 07:31 AM
HatCat HatCat is offline
Alpha Tester
Project Supporter
Senior Member
 
Join Date: Feb 2007
Location: In my hat.
Posts: 16,236
Default

And another bug fixed, caused again by the 144-hour-or-so frustrating rewrite of the entire scalar unit inducing tiredness and laziness.

Even though I rewrote like half of the file, yet again lmao, the actual fix was a simple one-liner for SH:

Code:
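    /* Before: a null statement; the address is computed but nothing is stored. */
    *(short *)(RSP.DMEM + addr - HES(0x000)*(addr%4 - 1));
    /* After: the halfword is actually written. */
    *(short *)(RSP.DMEM + addr - HES(0x000)*(addr%4 - 1)) = (short)(SR[rt]);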
lulz, so much pressure and time spent into rewriting that file, that I accidentally put null statements in.
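For comparison only, a byte-at-a-time halfword store in the style of LS_Group_I has no pointer cast to get wrong. This is not the one-liner fix above. It assumes that BES() amounts to XOR-ing the byte address with 3 on a little-endian host, which the thread does not spell out, and `dmem`, `BYTE_SWAP` and `store_halfword` are made-up names.

```c
#include <stdint.h>

#define DMEM_SIZE 0x1000
#define BYTE_SWAP(a) ((a) ^ 3u)   /* assumed meaning of BES(); not confirmed */

uint8_t dmem[DMEM_SIZE];          /* hypothetical stand-in for RSP.DMEM */

/* Store the low halfword of value at RSP address addr in big-endian order.
 * Each byte address is wrapped to 12 bits, so any alignment is handled. */
void store_halfword(uint32_t addr, uint32_t value)
{
    dmem[BYTE_SWAP(addr & 0xFFFu)]       = (uint8_t)(value >> 8);
    dmem[BYTE_SWAP((addr + 1) & 0xFFFu)] = (uint8_t)(value & 0xFFu);
}
```

It costs two byte writes instead of one halfword write, so it is a readability trade rather than a speed-up.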

The fortunate thing about these bugs is that it's impossible for me to not notice them because a game is always bound to exploit them, so there won't be any invisible/unnoticeable bugs when I am done sorting through this.

I spent about:
  • 2 hours re-reading the entire interpreter functions tree to look for what might be wrong.
  • 3 hours disassembling non-Group-I ?WC2 vector transaction calls to prove that none of the really complex RSP operations had the bug
  • 2 more hours proving that the interpreter::L/S scalar data primary ops must, by process of elimination, have had the bug.

Anyway, it boots Quest 64 and shows all the intro, start menu etc. graphics perfectly and flawlessly.
But there is a new bug caused by some rare RSP opcode that waited until now to execute, probably something under SWC2 Groups II and III somewhere.
So it's not enough to boot Mario64 3-D geometry and compare MarathonMan's 2-D jump table to my old RSP interpreter's speed.

That should be an easy fix tomorrow when I wake up.
I mean today when I wake up. *passes out*
  #357  
Old 26th August 2013, 02:16 PM
MarathonMan MarathonMan is offline
Alpha Tester
Project Supporter
Senior Member
 
Join Date: Jan 2013
Posts: 454
Default

Quote:
Originally Posted by BatCat View Post
lulz, so much pressure and time spent into rewriting that file, that I accidentally put null statements in.
Always, always, always, always, always, always:

Compile with:
Code:
gcc -Wall ...
99.99999% of the time, if -Wall says anything, your code is bad, and you should feel bad.

If you really want to be standard conformant and have clean code:
Code:
gcc -Wall -Wextra -pedantic -ansi ...
  #358  
Old 26th August 2013, 02:54 PM
mudlord_ mudlord_ is offline
Alpha Tester
Project Supporter
Senior Member
 
Join Date: Dec 2012
Posts: 381
Default

your love for gnu/linux and anything made by stallman including gcc sickens me.

go peddle your foss manlove elsewhere.
  #359  
Old 26th August 2013, 04:30 PM
HatCat HatCat is offline
Alpha Tester
Project Supporter
Senior Member
 
Join Date: Feb 2007
Location: In my hat.
Posts: 16,236
Default

Actually those are for the most part in common with Visual Studio also, which has the /Wall parameter.

And if you hate GNU/GCC/Stallman so bad then why keep porting my plugins to mupen64plus?
No doubt that uses GCC and is FOSS.

Last edited by HatCat; 26th August 2013 at 04:34 PM.
  #360  
Old 26th August 2013, 04:43 PM
MarathonMan MarathonMan is offline
Alpha Tester
Project Supporter
Senior Member
 
Join Date: Jan 2013
Posts: 454
Default

Quote:
Originally Posted by haxatax View Post
your love for gnu/linux and anything made by stallman including gcc sickens me.

go peddle your foss manlove elsewhere.
http://3.bp.blogspot.com/-PVMnGpK4re...4016221789.jpg
All times are GMT.