Project64 Forums > General Discussion > Open Discussion
  #191  
12th May 2013, 04:52 PM
HatCat
Alpha Tester
Project Supporter
Senior Member
 
Join Date: Feb 2007
Location: In my hat.
Posts: 16,236

Quote:
Originally Posted by nintendo1889 View Post
... just say the word
http://www.youtube.com/watch?v=2WNrx2jq184
  #192  
12th May 2013, 05:32 PM
the_randomizer
Alpha Tester
Project Supporter
Senior Member
 
Join Date: Sep 2008
Location: USA
Posts: 1,136

Quote:
Originally Posted by FatCat View Post
Nice.

I guess I wasn't clear enough

I meant to say "if you need any testers, just PM me..."
__________________
My rig:
CPU: Intel Core i7 4470 3.4 GHz to 3.9 GHz
Video card: MSI nVidia GTX 970 4 GB GDDR5
OS: Windows 7 Professional 64-bit
RAM: 16 GB DDR3 SDRAM 10600
HDD: 2 x Western Digital 1 TB HDDs
Monitor: 23" Asus Full HD LED

Oh, and Snes9x > Zsnes in every way
  #193  
12th May 2013, 08:30 PM
HatCat

Hmmm so anyway as I was saying.

Splitting the scalar software-emulation loop into separate loops saves some instruction space. About half of the time it is faster (Vector Logical, Vector Divide/Multiply), except for some possible hybrids where slice segments have to be re-popped off the vector stack by the outer loop; the other half of the time it is slower (Vector Select, Vector Add).
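For readers unfamiliar with the two loop shapes, here is a minimal standalone sketch. It assumes plain `int16_t` arrays in place of the plugin's `VR`/`ACC_R`/`ACC_W` macros from vu.h, and `vand_split`, `vand_fused` and `N_LANES` are hypothetical names. It only shows the structure (compute into the accumulator slice, then write back to the destination, either in two loops or in one); it says nothing about which form a given compiler makes faster.

```c
#include <stdint.h>

#define N_LANES 8 /* the RSP vector unit works on 8 lanes of 16 bits */

/* Split form: one loop computes into the accumulator slice,
   a second loop copies the accumulator into the destination. */
static void vand_split(int16_t vd[N_LANES], int16_t acc[N_LANES],
                       const int16_t vs[N_LANES], const int16_t vt[N_LANES])
{
    for (int i = 0; i < N_LANES; i++)
        acc[i] = (int16_t)(vs[i] & vt[i]);
    for (int i = 0; i < N_LANES; i++)
        vd[i] = acc[i];
}

/* Fused form: a single loop does both steps per lane. */
static void vand_fused(int16_t vd[N_LANES], int16_t acc[N_LANES],
                       const int16_t vs[N_LANES], const int16_t vt[N_LANES])
{
    for (int i = 0; i < N_LANES; i++) {
        acc[i] = (int16_t)(vs[i] & vt[i]);
        vd[i] = acc[i];
    }
}
```

Both produce identical results; the trade-off discussed in the post is purely about what the compiler generates for each shape.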

Vector Select Less Than illustrates a key problem with attempting to vectorize combined loops into parallel transfers, particularly when such compiler predictions are unsuccessful:

Code:
#include "vu.h"

static void VLT(int vd, int vs, int vt, int e)
{
    int lt; /* less than, or if (CARRY && NOTEQUAL), equal */
    register int i;

    VCC = 0x0000;
    for (i = 0; i < 8; i++)
    {
        const signed short VS = VR[vs][i];
        const signed short VT = VR_T(i);

        lt  = (VS == VT);
        lt &= (VCO >> 0) & (unsigned char)(VCO >> 8) & 0x01;
        lt |= (VS < VT);
        VCC |= lt <<= i;
        VCO >>= 1; /* We need to clear this entire register anyway. */
    }
    for (i = 0; i < 8; i++)
        ACC_R(i) = VCC & (1 << i) ? VR[vs][i] : VR_T(i);
    for (i = 0; i < 8; i++)
        ACC_W(i) = ACC_R(i);
    VCO = 0x0000;
    return;
}
Because GCC cannot work out how to vectorize the writes to the vector condition-code flag registers (in this case, VCC) from the source elements*, its vectorization attempts mostly fail, and the split loop is left redundantly fetching the same source element values off the vector stack a second time.

It was better kept as a single loop:
Code:
static void VLT(int vd, int vs, int vt, int e)
{
    int lt; /* less than, or if (CARRY && NOTEQUAL), equal */
    register int i;

    VCC = 0x0000;
    for (i = 0; i < 8; i++)
    {
        const signed short VS = VR[vs][i];
        const signed short VT = VR_T(i);

        lt  = (VS == VT);
        lt &= (VCO >> 0) & (unsigned char)(VCO >> 8) & 0x01;
// or, with MSVC's <stdlib.h>: lt &= (VCO >> 0) & _byteswap_ushort(VCO) & 0x01;
        lt |= (VS < VT);
        VCC |= lt <<= i;
        ACC_R(i) = lt ? VS : VT;
        VCO >>= 1; /* We need to clear this entire register anyway. */
    }
    for (i = 0; i < 8; i++)
        ACC_W(i) = ACC_R(i);
    VCO = 0x0000;
    return;
}
*MarathonMan found a way to do it with SSE by allocating the RSP flag registers as arrays of 1-bit Booleans, but this creates a problem of its own: the COP2 instructions CFC2/CTC2 need the flags packed into proper registers.
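A rough sketch of the footnote's idea, assuming a hypothetical `flag_lanes` layout (one byte per lane rather than true 1-bit fields; this is not MarathonMan's actual code). In `vlt_lanes` each iteration writes only its own lane's flag, so there is no `VCC |= lt << i` chain across iterations, and packing into a 16-bit register image moves into separate helpers that a CFC2/CTC2 handler could call. The bit layout (lane i in bit i, the high half in bit i + 8) follows how VCO is used in the code above.

```c
#include <stdint.h>

/* Hypothetical layout: one byte per lane instead of packed bits, so each
   lane's flag can be written independently inside a vectorizable loop. */
typedef struct {
    uint8_t lo[8]; /* low half:  lane i <-> register bit i     */
    uint8_t hi[8]; /* high half: lane i <-> register bit i + 8 */
} flag_lanes;

/* Pack into a 16-bit register image, e.g. for a CFC2-style read. */
static uint16_t flags_pack(const flag_lanes *f)
{
    uint16_t r = 0;
    for (int i = 0; i < 8; i++)
        r = (uint16_t)(r | ((f->lo[i] & 1u) << i) | ((f->hi[i] & 1u) << (i + 8)));
    return r;
}

/* Unpack a 16-bit register image, e.g. for a CTC2-style write. */
static void flags_unpack(flag_lanes *f, uint16_t r)
{
    for (int i = 0; i < 8; i++) {
        f->lo[i] = (uint8_t)((r >> i) & 1u);
        f->hi[i] = (uint8_t)((r >> (i + 8)) & 1u);
    }
}

/* VLT-style select on the lane layout: lt = (VS < VT), or VS == VT with both
   VCO halves set for that lane. Each iteration touches only lane i. */
static void vlt_lanes(int16_t res[8], flag_lanes *vcc, const flag_lanes *vco,
                      const int16_t vs[8], const int16_t vt[8])
{
    for (int i = 0; i < 8; i++) {
        const int lt = (vs[i] < vt[i])
                     | ((vs[i] == vt[i]) & vco->lo[i] & vco->hi[i] & 1);
        vcc->lo[i] = (uint8_t)lt;
        vcc->hi[i] = 0; /* VLT leaves the high half of VCC clear, as in the code above */
        res[i] = lt ? vs[i] : vt[i];
    }
}
```

The cost of this layout is exactly the one the footnote names: every CFC2/CTC2 now has to go through a pack or unpack step.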

Quote:
Originally Posted by nintendo1889 View Post
Nice.

I guess I wasn't clear enough

I meant to say "if you need any testers, just PM me..."
Yeah, yeah, wutevs, same testing as usual.
I'm still working my ass off on this thing.

If you're impatient that I haven't released an update yet, no worries, you're not alone. I am, too.

I would recommend trying to play through Resident Evil 2 completely, but it's not playable on Jabo's LLE due to incomplete Direct3D-software communications and also not on z64gl due to a tremendous frame buffer drawing obstructing the view of the game.

Last edited by HatCat; 12th May 2013 at 08:35 PM.
  #194  
12th May 2013, 11:21 PM
the_randomizer

Quote:
Originally Posted by FatCat View Post
Yeah, yeah, wutevs, same testing as usual.
I'm still working my ass off on this thing.

If you're impatient that I haven't released an update yet, no worries, you're not alone. I am, too.

I would recommend trying to play through Resident Evil 2 completely, but it's not playable on Jabo's LLE due to incomplete Direct3D-software communications and also not on z64gl due to a tremendous frame buffer drawing obstructing the view of the game.
No worries, I can wait, but I can test Resident Evil 2 on other plugins.
  #195  
12th May 2013, 11:35 PM
HatCat

To be honest, I would somewhat rather people wait for my next release.

It has a couple of new things that may need testing.
Testing only the current build won't tell us anything about them.

But at the same time, I would rather not make anyone wait.
  #196  
13th May 2013, 03:14 AM
the_randomizer

Quote:
Originally Posted by FatCat View Post
To be honest, I would somewhat rather people wait for my next release.

It has a couple of new things that may need testing.
Testing only the current build won't tell us anything about them.

But at the same time, I would rather not make anyone wait.
Again, no worries. It's out when it's out.
  #197  
13th May 2013, 01:56 PM
shunyuan
Alpha Tester
Project Supporter
Senior Member
 
Join Date: Apr 2013
Posts: 491

Two questions about the idea you mentioned before of using LLE graphics as the rendering engine for HLE graphics emulation:

(1) How should the interface between HLE and LLE be defined?

(2) What if an LLE RSP were added as the graphics-task interpreter to solve the custom ucode problem? Then again, how should the interface be defined?
__________________
---------------------
CPU: Intel U7300 1.3 GHz
GPU: Mobile Intel 4 Series (on board)
AUDIO: Realtek HD Audio (on board)
RAM: 4 GB
OS: Windows 7 - 32 bit
  #198  
13th May 2013, 03:05 PM
HatCat

Quote:
Originally Posted by suanyuan View Post
Two questions about the idea you mentioned before of using LLE graphics as the rendering engine for HLE graphics emulation:
Hmm, my memory must be foggy.
Could you point me to where I said that?
I know I had been thinking about it, just don't remember posting it.

Quote:
Originally Posted by suanyuan View Post
(1) How should the interface between HLE and LLE be defined?
Apart from some basic HLE of RSP tasks inside the RSP plugin itself, I have no experience with HLE of the RCP, so I have little insight into what you mean there.

It should be able to use all the same structures that LLE does, except that you are pre-allocating the recompiler for the RSP.
The RSP interpreter plugin will be the simple LLE plugin.
The RSP HLE plugin will, in a sense, "call" the RSP functions as inline macros. That pulls them into the ucode simulation process and has the compiler "recompile" all the code statically for you, which is much more powerful than writing a dynamic recompiler.

But I never tried it; I've just been more fascinated with optimizing LLE.
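A minimal sketch of that idea, with every name hypothetical and a toy pre-decoded instruction record instead of the real RSP encoding. The same `static inline` handlers serve both a run-time interpreter (the LLE path) and a hand-written routine for one known ucode sequence (the HLE path), where the operands are constants the C compiler can inline and optimize ahead of time. The sketch only shows the structure; whether this beats a dynamic recompiler is the post's claim, not something it demonstrates.

```c
#include <stdint.h>

typedef struct { int16_t vr[32][8]; } rsp_state; /* hypothetical: 32 vector registers x 8 lanes */

enum { OP_VAND, OP_VOR };                    /* toy opcodes, not real RSP encodings */
typedef struct { int op, vd, vs, vt; } insn; /* toy pre-decoded instruction */

static inline void op_vand(rsp_state *s, int vd, int vs, int vt)
{
    for (int i = 0; i < 8; i++)
        s->vr[vd][i] = (int16_t)(s->vr[vs][i] & s->vr[vt][i]);
}

static inline void op_vor(rsp_state *s, int vd, int vs, int vt)
{
    for (int i = 0; i < 8; i++)
        s->vr[vd][i] = (int16_t)(s->vr[vs][i] | s->vr[vt][i]);
}

/* LLE path: dispatch each instruction at run time. */
static void interpret(rsp_state *s, const insn *prog, int n)
{
    for (int k = 0; k < n; k++) {
        switch (prog[k].op) {
        case OP_VAND: op_vand(s, prog[k].vd, prog[k].vs, prog[k].vt); break;
        case OP_VOR:  op_vor(s, prog[k].vd, prog[k].vs, prog[k].vt);  break;
        }
    }
}

/* HLE path: the same sequence written out with constant operands, so the
   compiler can inline and optimize it statically. */
static void hle_routine(rsp_state *s)
{
    op_vand(s, 1, 2, 3);
    op_vor(s, 4, 1, 5);
}
```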

Hope I partially answered your question somehow.

Quote:
Originally Posted by suanyuan View Post
(2) What if an LLE RSP were added as the graphics-task interpreter to solve the custom ucode problem? Then again, how should the interface be defined?
Not sure I'm following, but if you're asking how to define the LLE interface of the RSP gfx then all of that should be laid out in my RSP plugin already.

It would also be useful to add pre-compilers for the other custom graphics microcodes (Star Wars and such), but the endeavor doesn't interest me much, since such ucodes are nonstandard and rarely used. It's better to worry about the hardware than the software, which is why LLE usually does the trick.
  #199  
13th May 2013, 04:16 PM
HatCat

lulz, looks like I might have had a bug in VCH:

Code:
...
static void VCH(int vd, int vs, int vt, int e)
{
    int ge, le, neq;
    register int i;

    VCO = 0x0000;
    VCC = 0x0000;
    VCE = 0x00;
    for (i = 0; i < 8; i++)
    {
        const signed short VS = VR[vs][i];
        const signed short VT = VR_T(i);
        const int sn = (VS ^ VT) < 0; /* sn = (unsigned short)(VS ^ VT) >> 15 */

        if (sn)
        {
            ge = (VT < 0);
            le = (VS + VT <= 0);
            neq = (VS + VT == -1); /* compare extension */
            VCE |= neq << i;
            neq ^= !(VS + VT == 0);
            ACC_R(i) = le ? -VT : VS;
            VCC |=  (ge << (i + 0x8)) | (le << (i + 0x0));
            VCO |= (neq << (i + 0x8)) | (0x0001 << i);
        }
        else
        {
...
It should be: this bit (VCO bit i + 8) is set only if neither (VS + VT == -1) nor (VS + VT == 0). (It's a tricky mess of decoding bit-wise hack algorithms, but zilmar's RSP reverse-engineering notes do agree with that definition.)
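To make the sign-differs branch easier to follow, here is a one-lane sketch that mirrors the formulas in the excerpt above, using the readable form of `neq`. The `vch_lane` struct and `vch_lane_sign_differs` are hypothetical names; the function is only meant to be called when VS and VT have different signs (the `else` branch is elided in the post and is not modeled), and the accumulator value is kept as an `int` rather than guessing how the plugin stores it.

```c
#include <stdint.h>

/* Flags and selected value for one lane i of VCH when VS and VT differ in sign. */
typedef struct {
    int ge;  /* goes to VCC bit i + 8 */
    int le;  /* goes to VCC bit i */
    int vce; /* goes to VCE bit i (the "compare extension") */
    int neq; /* goes to VCO bit i + 8; VCO bit i is always set in this branch */
    int acc; /* value selected for the accumulator */
} vch_lane;

static vch_lane vch_lane_sign_differs(int16_t vs, int16_t vt)
{
    const int sum = vs + vt; /* computed in int, so it cannot overflow */
    vch_lane r;
    r.ge  = (vt < 0);
    r.le  = (sum <= 0);
    r.vce = (sum == -1);
    r.neq = !(sum == -1) & !(sum == 0); /* set only if neither condition holds */
    r.acc = r.le ? -vt : vs;
    return r;
}
```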

I guess I thought I was doing it faster by writing neq = (VS + VT == -1) ^ !(VS + VT == 0), when it should really just be:
neq = !(VS + VT == -1) & !(VS + VT == 0)
or, alternatively,
neq = !((VS + VT == -1) | (VS + VT == 0))

Working it out on paper with C++ Boolean math:

Code:
// x stands for (VS + VT == -1)
// y stands for (VS + VT ==  0)

bool f(bool x, bool y) {
    return !x & !y; // correct
}

bool g(bool x, bool y) {
    return !(x | y); // also correct
}

bool h(bool x, bool y) {
    return x ^ !y; // NOT correct: differs from f and g when x and y are both true
}
Yields these results:
Code:
f(x, y) ::
    !0 & !0 == 1 & 1 == true
    !0 & !1 == 1 & 0 == false
    !1 & !0 == 0 & 1 == false
    !1 & !1 == 0 & 0 == false

g(x, y) ::
    !(0 | 0) == !0 == true
    !(0 | 1) == !1 == false
    !(1 | 0) == !1 == false
    !(1 | 1) == !1 == false

h(x, y) ::
    0 ^ !0 == 0 ^ 1 == true
    0 ^ !1 == 0 ^ 0 == false
    1 ^ !0 == 1 ^ 1 == false
    1 ^ !1 == 1 ^ 0 == true
So the next build of the RSP DLL I release here should have VCH fixed.
  #200  
13th May 2013, 04:31 PM
HatCat

*facepalm*

Of course, one would be hard-pressed to call that a bug when the one missed condition is x && y both being TRUE.

(VS + VT == 0) and (VS + VT == -1) can never both be true at once...so I guess I was right.

Changing it to the more readable version throws in a few dozen extra statements, so I just need to comment that confusion somehow.
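The argument can be checked mechanically. A minimal sketch, assuming VS and VT are 16-bit signed values, so that VS + VT (computed in `int`) ranges over [-65536, 65534]; the function names are hypothetical. It walks every possible sum and counts where the XOR form and the readable form of `neq` disagree. The only disagreeing input, x and y both true, never arises.

```c
/* The two forms from the thread, over x = (sum == -1) and y = (sum == 0). */
static int neq_xor(int x, int y)      { return x ^ !y; }
static int neq_readable(int x, int y) { return !x & !y; }

/* Count the sums in [-65536, 65534] where the two forms disagree. */
static long count_mismatches(void)
{
    long bad = 0;
    for (int sum = -65536; sum <= 65534; sum++) {
        const int x = (sum == -1);
        const int y = (sum == 0);
        if (neq_xor(x, y) != neq_readable(x, y))
            bad++;
    }
    return bad;
}
```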
All times are GMT.