#1  
Old 11th September 2013, 04:32 PM
mudlord_ mudlord_ is offline
Alpha Tester
Project Supporter
Senior Member
 
Join Date: Dec 2012
Posts: 381
RDP development thread

Started porting MarathonMan's work back to a PJ64 plugin.
Attached Files
File Type: zip n64video.zip (792.4 KB, 50 views)
  #2  
Old 11th September 2013, 04:48 PM
MarathonMan MarathonMan is offline
Alpha Tester
Project Supporter
Senior Member
 
Join Date: Jan 2013
Posts: 454

Does it work now? How does it compare to angrylion's stock RDP? CEN64 currently spends the large majority of its time outside of the RDP, so I can't tell much of a difference yet.

Another thing I thought of: performance is going to suffer somewhat without LTO. I'll see if I can compile this with gcc tonight and/or think of a way to cleanly remove this "optimization dependency".
  #3  
Old 11th September 2013, 05:10 PM
mudlord_ mudlord_ is offline
Alpha Tester
Project Supporter
Senior Member
 
Join Date: Dec 2012
Posts: 381

That's the thing: it doesn't work at the moment. I replaced the macros etc.

Would be nice indeed to see a GCC compile with it working.
  #4  
Old 11th September 2013, 07:31 PM
HatCat HatCat is offline
Alpha Tester
Project Supporter
Senior Member
 
Join Date: Feb 2007
Location: In my hat.
Posts: 16,236

angrylion admittedly avoided many endianness issues in his plugin.

If you do any 16- or 32-bit reads and writes (like DST = *(int *)(base + addr)) then this is going to end up little-endian anyway.

Since MarathonMan's emulator is entirely big-endian (which is correct for all storage purposes), I think the RREADADDR8 or w/e functions would themselves all need to be rewritten to become byte-iterative.

Not really an easy thing, and I'm not sure whether it's easier to port it now and keep it in sync with future changes, or to delay porting until after MM "simplifies" it.
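
To make the "byte-iterative" idea concrete, here is a minimal sketch of reading and writing big-endian values one byte at a time, so the result no longer depends on the host's byte order the way a cast like *(int *)(base + addr) does. The helper names read_be32, write_be32 and read_be16 are made up for illustration; they are not the plugin's actual RREADADDR-style functions.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical helpers, not taken from angrylion's plugin or CEN64. */

/* Read a 32-bit big-endian word one byte at a time. */
static uint32_t read_be32(const uint8_t *base, size_t addr)
{
    return ((uint32_t)base[addr + 0] << 24)
         | ((uint32_t)base[addr + 1] << 16)
         | ((uint32_t)base[addr + 2] <<  8)
         |  (uint32_t)base[addr + 3];
}

/* Write a 32-bit word as big-endian bytes, most significant byte first. */
static void write_be32(uint8_t *base, size_t addr, uint32_t value)
{
    base[addr + 0] = (uint8_t)(value >> 24);
    base[addr + 1] = (uint8_t)(value >> 16);
    base[addr + 2] = (uint8_t)(value >>  8);
    base[addr + 3] = (uint8_t)(value);
}

/* Read a 16-bit big-endian halfword one byte at a time. */
static uint16_t read_be16(const uint8_t *base, size_t addr)
{
    return (uint16_t)(((uint16_t)base[addr] << 8) | base[addr + 1]);
}
```

With a buffer holding the bytes 12 34 56 78, read_be32(buf, 0) yields 0x12345678 on any host, which is the property the cast-based read lacks.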
  #5  
Old 11th September 2013, 07:51 PM
MarathonMan MarathonMan is offline
Alpha Tester
Project Supporter
Senior Member
 
Join Date: Jan 2013
Posts: 454

Quote:
Originally Posted by BatCat View Post
angrylion admittedly avoided many endianness issues in his plugin.

If you do any 16- or 32-bit reads and writes (like DST = *(int *)(base + addr)) then this is going to end up little-endian anyway.

Since MarathonMan's emulator is entirely big-endian (which is correct for all storage purposes), I think the RREADADDR8 or w/e functions would themselves all need to be rewritten to become byte-iterative.

Not really an easy thing, but, not sure whether it's easier to port it now and keep it in sync with future changes, or to delay porting it until after MM "simplifies" it.
Thankfully he enclosed all RSP and RDRAM accesses using those macros (or was it MooglyGuy? -- possibly the latter), so all I had to do was enclose the macros in bswap functions.
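
For anyone following along, a rough sketch of what "enclosing the macros in bswap functions" can look like. Everything here is hypothetical (swap32, load_host32, store_host32, RDRAM_READ32 and RDRAM_WRITE32 are illustrative names, not the actual macros from the plugin or CEN64), and it assumes a little-endian host with big-endian emulated memory.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names; the real macros in the plugin may look different. */

/* Portable 32-bit byte swap. */
static uint32_t swap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00u)
         | ((v << 8) & 0x00FF0000u) | (v << 24);
}

/* Host-order load/store through memcpy, which sidesteps alignment
 * and strict-aliasing problems of a raw pointer cast. */
static uint32_t load_host32(const uint8_t *base, size_t addr)
{
    uint32_t v;
    memcpy(&v, base + addr, sizeof v);
    return v;
}

static void store_host32(uint8_t *base, size_t addr, uint32_t v)
{
    memcpy(base + addr, &v, sizeof v);
}

/* ASSUMPTION: little-endian host, big-endian emulated memory.
 * Every word access goes through the swap, so callers stay unchanged. */
#define RDRAM_READ32(base, addr)       swap32(load_host32((base), (addr)))
#define RDRAM_WRITE32(base, addr, val) store_host32((base), (addr), swap32(val))
```

The appeal of this approach is that only the macro definitions change; code that already funnels all memory traffic through them needs no edits.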
  #6  
Old 12th September 2013, 04:07 AM
MarathonMan MarathonMan is offline
Alpha Tester
Project Supporter
Senior Member
 
Join Date: Jan 2013
Posts: 454

Latest version in CEN64 is looking pretty gnarly. Give her some more vectorization and work and fun shall be had.

http://www.youtube.com/watch?v=oaWTlqzBFvw
  #7  
Old 12th September 2013, 05:22 AM
mudlord_ mudlord_ is offline
Alpha Tester
Project Supporter
Senior Member
 
Join Date: Dec 2012
Posts: 381

Wow, looks very nice.
Got back to porting bits over one by one, so I don't break anything.

Almost ported all the precompiled tables except for the cvarray, which doesn't work here for some reason.
Attached Files
File Type: zip AL_sse.zip (663.4 KB, 15 views)

Last edited by mudlord_; 12th September 2013 at 05:40 AM.
  #8  
Old 12th September 2013, 11:21 AM
MarathonMan MarathonMan is offline
Alpha Tester
Project Supporter
Senior Member
 
Join Date: Jan 2013
Posts: 454

Quote:
Originally Posted by haxatax View Post
wow, looks very nice.
got back to porting over bits, one by one, so i dont break anything.

almost ported all the precompiled tables except for the cvarray, which doesnt work for some reason here.
EDIT: Never mind, I didn't know that the port was a work in progress.

Hmm... not sure why it wouldn't work, though...

Last edited by MarathonMan; 12th September 2013 at 12:52 PM.
  #9  
Old 12th September 2013, 07:14 PM
HatCat HatCat is offline
Alpha Tester
Project Supporter
Senior Member
 
Join Date: Feb 2007
Location: In my hat.
Posts: 16,236

Probably because only the cvarray has this struct type declared:
Code:
typedef struct {
    uint8_t cvg;
    uint8_t cvbit;
    uint8_t xoff;
    uint8_t yoff;
} CVtcmaskDERIVATIVE;
I don't know whether endianness for structs is maintained or if it needs to be corrected in this case with a byte-swap, but if it is maintained then maybe it's just the run-time initialization of the array:

Code:
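	/* One table entry per byte value i. */
	for (; i < 0x100; i++)
	{
		mask = decompress_cvmask_frombyte(i);
		/* cvbit = top bit of i; cvg = number of set bits in i. */
		cvarray[i].cvg = cvarray[i].cvbit = 0;
		cvarray[i].cvbit = (i >> 7) & 1;
		for (k = 0; k < 8; k++)
			cvarray[i].cvg += ((i >> k) & 1);

		/* masky bit k is set when nibble k of mask (counted from the top) is non-zero. */
		masky = maskx = offx = offy = 0;
		for (k = 0; k < 4; k++)
			masky |= ((mask & (0xf000 >> (k << 2))) > 0) << k;

		offy = yarray[masky];

		/* Move nibble number offy of mask (counted from the top) into the low 4 bits. */
		maskx = (mask & (0xf000 >> (offy << 2))) >> ((offy ^ 3) << 2);

		offx = xarray[maskx];

		cvarray[i].xoff = offx;
		cvarray[i].yoff = offy;
	}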
Question: Why do we use angrylion's FPTR for this module instead of "nocomment"? Wasn't the latter supposed to be more accurate and up-to-date with more of his fixes and optimizations?
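
Going back to the initialization loop: the cvg and cvbit fields only depend on the index i, so they can be checked in isolation when hunting for a porting bug. Below is a minimal sketch of just that part. The names cv_entry and init_cvg_fields are made up for this post, and xoff/yoff are simply zeroed because they depend on yarray, xarray and decompress_cvmask_frombyte, which are not shown here.

```c
#include <stdint.h>

/* Hypothetical stand-in for the struct posted above. */
typedef struct {
    uint8_t cvg;    /* number of set bits in the index */
    uint8_t cvbit;  /* top bit (bit 7) of the index */
    uint8_t xoff;   /* not computed in this sketch */
    uint8_t yoff;   /* not computed in this sketch */
} cv_entry;

/* Fill only the fields that depend on the index alone. */
static void init_cvg_fields(cv_entry table[0x100])
{
    unsigned i, k;

    for (i = 0; i < 0x100; i++) {
        table[i].cvg = 0;
        table[i].cvbit = (uint8_t)((i >> 7) & 1);
        for (k = 0; k < 8; k++)
            table[i].cvg += (uint8_t)((i >> k) & 1);
        table[i].xoff = table[i].yoff = 0;
    }
}
```

Comparing these two fields against the precompiled table is a cheap way to see whether the mismatch is in this half of the loop or in the xoff/yoff half.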
  #10  
Old 12th September 2013, 08:45 PM
MarathonMan MarathonMan is offline
Alpha Tester
Project Supporter
Senior Member
 
Join Date: Jan 2013
Posts: 454

Quote:
Originally Posted by BatCat View Post
Probably because only the cvarray has this struct type declared:
Code:
typedef struct {
    uint8_t cvg;
    uint8_t cvbit;
    uint8_t xoff;
    uint8_t yoff;
} CVtcmaskDERIVATIVE;
I don't know whether endianness for structs is maintained or if it needs to be corrected in this case with a byte-swap, but if it is maintained then maybe it's just the run-time initialization of the array:

Code:
	for (; i < 0x100; i++)
	{
		mask = decompress_cvmask_frombyte(i);
		cvarray[i].cvg = cvarray[i].cvbit = 0;
		cvarray[i].cvbit = (i >> 7) & 1;
		for (k = 0; k < 8; k++)
			cvarray[i].cvg += ((i >> k) & 1);

		
		masky = maskx = offx = offy = 0;
		for (k = 0; k < 4; k++)
			masky |= ((mask & (0xf000 >> (k << 2))) > 0) << k;

		offy = yarray[masky];
		
		maskx = (mask & (0xf000 >> (offy << 2))) >> ((offy ^ 3) << 2);
		
		
		offx = xarray[maskx];
		
		cvarray[i].xoff = offx;
		cvarray[i].yoff = offy;
	}
Question: Why do we use angrylion's FPTR for this module instead of "nocomment"? Wasn't the latter supposed to be more accurate and up-to-date with more of his fixes and optimizations?
Nah, no byte-ordering issues. The tables are loaded the same if he used the arrays from Tables.c.
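
A small sketch of why a struct made only of uint8_t fields has no byte-order problem: each field is a single byte, and C lays struct members out in declaration order, so the bytes in memory follow the field order on any host. Byte order only comes into play if the four bytes are read back as one 32-bit integer. The names cv_bytes, raw_byte and pack_be_word are hypothetical, and the sketch assumes the compiler inserts no padding between the uint8_t members, which holds on common ABIs but is not guaranteed by the C standard.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical copy of the four-byte table entry. */
typedef struct {
    uint8_t cvg;
    uint8_t cvbit;
    uint8_t xoff;
    uint8_t yoff;
} cv_bytes;

/* Look at the n-th raw byte of the struct as it sits in memory.
 * ASSUMPTION: no padding between the members, so n in 0..3 is valid. */
static uint8_t raw_byte(const cv_bytes *e, size_t n)
{
    return ((const uint8_t *)e)[n];
}

/* Explicitly pack the fields into a word, first field in the top byte.
 * This is host-independent because it uses shifts, not a pointer cast. */
static uint32_t pack_be_word(const cv_bytes *e)
{
    return ((uint32_t)e->cvg   << 24)
         | ((uint32_t)e->cvbit << 16)
         | ((uint32_t)e->xoff  <<  8)
         |  (uint32_t)e->yoff;
}
```

For an entry initialized as {1, 2, 3, 4}, raw_byte returns 1, 2, 3, 4 for offsets 0 to 3 on both little- and big-endian hosts, so field-by-field access never needs a swap.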

I used angrylion's nocomment branch.