Project64 Forums > General Discussion > Open Discussion
  #1231  
Old 21st July 2014, 08:10 PM
HatCat's Avatar
HatCat HatCat is offline
Alpha Tester
Project Supporter
Senior Member
 
Join Date: Feb 2007
Location: In my hat.
Posts: 16,236
Default

Before:
Code:
static void vi_vl_lerp(CCVG* up, CCVG down, UINT32 frac)
{
    UINT32 r0, g0, b0;

    if (frac == 0)
        return;

    r0 = up->rgba[CCVG_RED];
    g0 = up->rgba[CCVG_GRN];
    b0 = up->rgba[CCVG_BLU];

    up->rgba[CCVG_RED] =
        (((frac*(down.rgba[CCVG_RED] - r0) + 16) >> 5) + r0) & 0xFF;
    up->rgba[CCVG_GRN] =
        (((frac*(down.rgba[CCVG_GRN] - g0) + 16) >> 5) + g0) & 0xFF;
    up->rgba[CCVG_BLU] =
        (((frac*(down.rgba[CCVG_BLU] - b0) + 16) >> 5) + b0) & 0xFF;
    return;
}
After:
Code:
static void vi_vl_lerp(CCVG* up, CCVG down, unsigned char frac)
{
#ifdef USE_SSE_SUPPORT
    __m128i xmm0, xmm1, xmm2, xmm3;
#endif
    ALIGNED unsigned int result[4];
    ALIGNED unsigned int source[4];

    if (frac == 0)
        return;

    source[CCVG_RED] = up -> rgba[CCVG_RED];
    source[CCVG_GRN] = up -> rgba[CCVG_GRN];
    source[CCVG_BLU] = up -> rgba[CCVG_BLU];
    source[CCVG_CVG] = up -> rgba[CCVG_CVG];

    result[CCVG_RED]   = down.rgba[CCVG_RED];
    result[CCVG_GRN]   = down.rgba[CCVG_GRN];
    result[CCVG_BLU]   = down.rgba[CCVG_BLU];
    result[CCVG_CVG]   = down.rgba[CCVG_CVG];

#ifdef USE_SSE_SUPPORT
    xmm0 = _mm_load_si128((__m128i *)result);
    xmm1 = _mm_load_si128((__m128i *)source);
    xmm2 = _mm_set1_epi32(frac);
    xmm3 = _mm_set1_epi32(16);

    xmm0 = _mm_sub_epi32(xmm0, xmm1);
    xmm0 = _mm_mullo_epi16(xmm0, xmm2);
    xmm0 = _mm_add_epi32(xmm0, xmm3);
    xmm0 = _mm_srli_epi32(xmm0, 5);
    xmm0 = _mm_add_epi32(xmm0, xmm1);
    _mm_store_si128((__m128i *)result, xmm0);
#else
    result[CCVG_RED]  -= source[CCVG_RED];
    result[CCVG_GRN]  -= source[CCVG_GRN];
    result[CCVG_BLU]  -= source[CCVG_BLU];

    result[CCVG_RED]  *= frac;
    result[CCVG_GRN]  *= frac;
    result[CCVG_BLU]  *= frac;

    result[CCVG_RED]  += 16;
    result[CCVG_GRN]  += 16;
    result[CCVG_BLU]  += 16;

    result[CCVG_RED] >>= 5;
    result[CCVG_GRN] >>= 5;
    result[CCVG_BLU] >>= 5;

    result[CCVG_RED]  += source[CCVG_RED];
    result[CCVG_GRN]  += source[CCVG_GRN];
    result[CCVG_BLU]  += source[CCVG_BLU];
#endif

    up->rgba[CCVG_RED] = result[CCVG_RED] & 0xFF;
    up->rgba[CCVG_GRN] = result[CCVG_GRN] & 0xFF;
    up->rgba[CCVG_BLU] = result[CCVG_BLU] & 0xFF;
    return;
}
Believe it or not, the previous version was better. Forcing the compiler to emit SSE2 instructions is a negative, vendor-happy eccentricity. In this case the generation of SSE operations makes the code larger overall, but it was a fun experiment nonetheless. I will stick to a cross between the former and the latter and leave it up to the compiler whether to generate them.
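A cross between the two might look roughly like the sketch below: keep the per-channel arithmetic, drop the intrinsics, and write it as a plain loop so the compiler decides whether vectorizing is worth it. The CCVG struct and the CCVG_* index values are assumptions here (the thread never shows their definitions), and vi_vl_lerp_sketch, lerp_ref and check_lerp are made-up names for illustration.

```c
#include <assert.h>

/* Assumed layout: the thread never shows CCVG, so this is a stand-in. */
enum { CCVG_RED, CCVG_GRN, CCVG_BLU, CCVG_CVG };
typedef struct { unsigned char rgba[4]; } CCVG;

static void vi_vl_lerp_sketch(CCVG *up, CCVG down, unsigned char frac)
{
    unsigned int result[4];
    int i;

    if (frac == 0)
        return;

    /* Same arithmetic as both versions above, one color channel per pass.
     * Assumes RED, GRN, BLU are indices 0..2 as in the enum above. */
    for (i = 0; i < 3; i++) {
        const unsigned int src = up->rgba[i];

        result[i] = down.rgba[i] - src; /* may wrap; the & 0xFF below fixes it */
        result[i] = ((result[i] * frac + 16) >> 5) + src;
    }
    for (i = 0; i < 3; i++)
        up->rgba[i] = (unsigned char)(result[i] & 0xFF);
}

/* Reference: the "Before" formula for a single channel. */
static unsigned int lerp_ref(unsigned int a, unsigned int b, unsigned int frac)
{
    return (((frac * (b - a) + 16) >> 5) + a) & 0xFF;
}

/* Compare the loop against the reference over a spread of inputs. */
static void check_lerp(void)
{
    unsigned int a, b, f;

    for (a = 0; a < 256; a += 5)
        for (b = 0; b < 256; b += 3)
            for (f = 1; f < 32; f++) {
                CCVG up, down;

                up.rgba[CCVG_RED] = (unsigned char)a;
                up.rgba[CCVG_GRN] = (unsigned char)b;
                up.rgba[CCVG_BLU] = (unsigned char)a;
                up.rgba[CCVG_CVG] = 7;
                down.rgba[CCVG_RED] = (unsigned char)b;
                down.rgba[CCVG_GRN] = (unsigned char)a;
                down.rgba[CCVG_BLU] = (unsigned char)b;
                down.rgba[CCVG_CVG] = 9;

                vi_vl_lerp_sketch(&up, down, (unsigned char)f);
                assert(up.rgba[CCVG_RED] == lerp_ref(a, b, f));
                assert(up.rgba[CCVG_GRN] == lerp_ref(b, a, f));
                assert(up.rgba[CCVG_BLU] == lerp_ref(a, b, f));
                assert(up.rgba[CCVG_CVG] == 7); /* coverage is left alone */
            }
}
```

Whether a compiler actually turns that loop into SSE2 depends on the compiler and flags, so checking the generated assembly is the only way to know.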
  #1232  
Old 21st July 2014, 08:35 PM
HatCat's Avatar
HatCat HatCat is offline
Alpha Tester
Project Supporter
Senior Member
 
Join Date: Feb 2007
Location: In my hat.
Posts: 16,236
Default

Quote:
Originally Posted by RPGMaster View Post
How can I find out what version of OpenGL my computer can support?
I would probably G**gle if I wanted to know what mine was.
But then again, I don't myself.

Quote:
Originally Posted by RPGMaster View Post
I really want to start developing a plugin that works especially well on my hardware.
It's not what version of OpenGL you use.
It's what functions of OpenGL you use.

Creating an OpenGL 1.0 context could be every bit as fast as running things off a GL 3.1 context, as long as you're not using fixed-function pipeline stuff from GL 1.x that got deprecated in 3.0 and removed from core in 3.1 (which I fear may be impossible for me to avoid even for a basic software-rendering plugin >.<). So maybe I'm not really saying much, but the point is, whether glGetString(GL_VERSION) says you're running a 2.0 context or a 3.1 context doesn't affect whether it "works especially well on [my] hardware", only what features/GL functions are unconditionally available to the developer.

Based on what you told me about your gl_ext.txt, it sounds like your video card has a lot of extensions, so it is probably compatible with a high version number of OpenGL that unconditionally supports some of them. Even if you were programming on OpenGL 1.1 (which, in case you didn't know, Microsoft practically forces you to do), you can still access them using the extensions API in GL.
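On checking for a specific extension: on a pre-3.x context, glGetString(GL_EXTENSIONS) hands back one space-separated string, so the test is just a whole-token search. A minimal sketch, assuming that format (and assuming gl_ext.txt holds the same kind of list); has_extension is a made-up helper name, not part of GL.

```c
#include <assert.h>
#include <string.h>

/* Return 1 if `name` appears as a complete space-separated token in `list`.
 * A plain strstr() alone is not enough: one extension name can be a prefix
 * of another, so both ends of each hit are checked. */
static int has_extension(const char *list, const char *name)
{
    const size_t len = strlen(name);
    const char *p = list;

    if (len == 0 || strchr(name, ' ') != NULL)
        return 0;
    while ((p = strstr(p, name)) != NULL) {
        const int starts = (p == list) || (p[-1] == ' ');
        const int ends = (p[len] == '\0') || (p[len] == ' ');

        if (starts && ends)
            return 1;
        p += len; /* partial hit inside a longer name; keep scanning */
    }
    return 0;
}

/* Quick self-check with a hand-written list. */
static void check_has_extension(void)
{
    const char *list = "GL_ARB_multitexture GL_EXT_texture3D";

    assert(has_extension(list, "GL_EXT_texture3D") == 1);
    assert(has_extension(list, "GL_EXT_texture") == 0);
    assert(has_extension(list, "GL_ARB_multitexture") == 1);
}
```

With a live context you would pass the result of glGetString(GL_EXTENSIONS) as the first argument instead of a literal.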

Quote:
Originally Posted by RPGMaster View Post
Anyway, you mind explaining ways to make the compiler optimize functions better? I remember in your RSP plugin, you had a bunch of header files. What's the difference between having 1 .c file and a bunch of header files, and multiple .c files with some header files. I know that when you have multiple .c files, you get multiple .obj files. As far as optimization goes, is there a significant difference? Because with certain compilers, I can see that it's doing a poor job with functions.
There is no significant difference other than how much extra optimization is done by the compiler versus by the linker. You could put every function in its own .c file for thousands of functions (thousands of .c files) and the linker could still optimize some code across .obj boundaries if link-time code generation is enabled. The more organized approach of course is to not use so many .c files, but only so many as there are different groups of functions, different modules of supported features of the program. Just think of it as doing a multi-tabbed GUI for this plugin with different GUI options "functions" in each tab ".c source file".
  #1233  
Old 21st July 2014, 09:51 PM
GPDP GPDP is offline
Senior Member
 
Join Date: May 2013
Posts: 146
Default



hue
  #1234  
Old 21st July 2014, 10:42 PM
HatCat's Avatar
HatCat HatCat is offline
Alpha Tester
Project Supporter
Senior Member
 
Join Date: Feb 2007
Location: In my hat.
Posts: 16,236
Default

a) original
Code:

    if ((leftr >= centerr && rightr >= leftr) || (leftr >= rightr && centerr >= leftr))
        final->rgba[CCVG_RED] = leftr;
    else if ((rightr >= centerr && leftr >= rightr) || (rightr >= leftr && centerr >= rightr))
        final->rgba[CCVG_RED] = rightr;
b) if/else-if generally compiles to conditional branches, which the CPU then has to predict, so use 2 if's instead.
The logic behind "else if" is that the second test only runs if the first one failed. So move the (else if) condition on top of the (if) one, so that final->rgba[] can still be overwritten by the primary one if that's also true.
Code:

    if ((rightr >= centerr && leftr >= rightr) || (rightr >= leftr && centerr >= rightr))
        final->rgba[CCVG_RED] = rightr;
    if ((leftr >= centerr && rightr >= leftr) || (leftr >= rightr && centerr >= leftr))
        final->rgba[CCVG_RED] = leftr;
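To convince yourself the reordering in (b) changes nothing, the two forms can be compared by brute force. This is only a sketch: pick_else_if, pick_two_ifs and check_reorder are made-up names, and `old` stands in for whatever final->rgba[CCVG_RED] held beforehand.

```c
#include <assert.h>

/* Form (a): if / else-if, returning what final->rgba[] would end up as. */
static int pick_else_if(int leftr, int centerr, int rightr, int old)
{
    if ((leftr >= centerr && rightr >= leftr) || (leftr >= rightr && centerr >= leftr))
        return leftr;
    else if ((rightr >= centerr && leftr >= rightr) || (rightr >= leftr && centerr >= rightr))
        return rightr;
    return old;
}

/* Form (b): two independent ifs, secondary test first so the primary wins. */
static int pick_two_ifs(int leftr, int centerr, int rightr, int old)
{
    int out = old;

    if ((rightr >= centerr && leftr >= rightr) || (rightr >= leftr && centerr >= rightr))
        out = rightr;
    if ((leftr >= centerr && rightr >= leftr) || (leftr >= rightr && centerr >= leftr))
        out = leftr;
    return out;
}

/* Compare both forms over a cube of small inputs; -1 marks "untouched". */
static void check_reorder(void)
{
    int l, c, r;

    for (l = 0; l < 64; l++)
        for (c = 0; c < 64; c++)
            for (r = 0; r < 64; r++)
                assert(pick_else_if(l, c, r, -1) == pick_two_ifs(l, c, r, -1));
}
```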


c) in order to help convert the behavior from dynamic to static, give each if an explicit else that writes the old value back, which is the shape of a ternary conditional statement:
Code:
    if ((rightr >= centerr && leftr >= rightr) || (rightr >= leftr && centerr >= rightr))
        final->rgba[CCVG_RED] = rightr;
    else
        final->rgba[CCVG_RED] = final->rgba[CCVG_RED];

    if ((leftr >= centerr && rightr >= leftr) || (leftr >= rightr && centerr >= leftr))
        final->rgba[CCVG_RED] = leftr;
    else
        final->rgba[CCVG_RED] = final->rgba[CCVG_RED];
d) Start by converting all the `>=' (greater than or equal) operators to `<' (less than).
Less than is much better in 2's complement because a "less than" comparison between 2 signed integers can be evaluated directly: for int32_t x, y; (x < y) == (x - y < 0) == !!((x - y) & 0x80000000) on 2's cmpl., as long as x - y does not overflow (which it never does for 8-bit color components). That has no branch or ugly conditional logic.
Code:
    if (!(rightr < centerr || leftr < rightr) || !(rightr < leftr || centerr < rightr))
        final->rgba[CCVG_RED] = rightr;
    else
        final->rgba[CCVG_RED] = final->rgba[CCVG_RED];

    if (!(leftr < centerr || rightr < leftr) || !(leftr < rightr || centerr < leftr))
        final->rgba[CCVG_RED] = leftr;
    else
        final->rgba[CCVG_RED] = final->rgba[CCVG_RED];
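The sign-bit claim in (d) is easy to check for the 0..255 range that matters here. A small sketch, with lt_sign and check_lt_sign as made-up names; it assumes 2's complement and that x - y cannot overflow, which holds for 8-bit color components stored in ints.

```c
#include <assert.h>
#include <limits.h>

/* (x < y) computed as the sign bit of (x - y): a subtract and a shift.
 * Only valid while x - y cannot overflow, e.g. x and y both in 0..255. */
static unsigned int lt_sign(int x, int y)
{
    return (unsigned int)(x - y) >> (sizeof(unsigned int) * CHAR_BIT - 1);
}

/* Compare against the ordinary operator for every pair of 8-bit values. */
static void check_lt_sign(void)
{
    int x, y;

    for (x = 0; x < 256; x++)
        for (y = 0; y < 256; y++)
            assert(lt_sign(x, y) == (unsigned int)(x < y));
}
```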
e) Right now we're left with conditions of the form: if (!a || !b), which is annoying. Stacked negations aren't the best thing to look at, nor usually for performance. But they can be cancelled algebraically (De Morgan's law) by inverting the if() conditions and swapping the statements within them: (!a || !b) == !(a && b) # "if either of them is NOT true" is the opposite of "if both of them ARE true"
Code:
    if ((rightr < centerr || leftr < rightr) && (rightr < leftr || centerr < rightr))
        final->rgba[CCVG_RED] = final->rgba[CCVG_RED];
    else
        final->rgba[CCVG_RED] = rightr;

    if ((leftr < centerr || rightr < leftr) && (leftr < rightr || centerr < leftr))
        final->rgba[CCVG_RED] = final->rgba[CCVG_RED];
    else
        final->rgba[CCVG_RED] = leftr;
f) Everything is still written in a logical/arithmetic syntax. Better static results are usually achieved with bit-wise operations. This, however, is better divided into 2 steps, than merged into 1 single step.
Code:
    if (((rightr < centerr) | (leftr < rightr)) & ((rightr < leftr) | (centerr < rightr)))
        final->rgba[CCVG_RED] = final->rgba[CCVG_RED];
    else
        final->rgba[CCVG_RED] = rightr;

    if (((leftr < centerr) | (rightr < leftr)) & ((leftr < rightr) | (centerr < leftr)))
        final->rgba[CCVG_RED] = final->rgba[CCVG_RED];
    else
        final->rgba[CCVG_RED] = leftr;
g) Given signed integers a and b (and the standard rule that sizeof(int) >= 2 bytes) encompassing the unsigned 8-bit RGB color components (in the safe, smaller range of values: from 0 to 255), then we can safely conclude that (a < b) can be computed as (a - b < 0), which in the worst case can translate to 16-bit x86 code as:
Code:
SUB ax, bx
SHR ax, 15
Code:
    if (((rightr - centerr < 0) | (leftr - rightr < 0)) & ((rightr - leftr < 0) | (centerr - rightr < 0)))
        final->rgba[CCVG_RED] = final->rgba[CCVG_RED];
    else
        final->rgba[CCVG_RED] = rightr;

    if (((leftr - centerr < 0) | (rightr - leftr < 0)) & ((leftr - rightr < 0) | (centerr - leftr < 0)))
        final->rgba[CCVG_RED] = final->rgba[CCVG_RED];
    else
        final->rgba[CCVG_RED] = leftr;
h) if ((a - b < 0) | (c - d < 0)), then (((a - b) | (c - d)) < 0). That is, if one of the two differences sets the SIGN or most significant bit, then the bit-wise OR-mask of the two differences does so.
Code:
    if ((((rightr - centerr) | (leftr - rightr)) < 0) & (((rightr - leftr) | (centerr - rightr)) < 0))
        final->rgba[CCVG_RED] = final->rgba[CCVG_RED];
    else
        final->rgba[CCVG_RED] = rightr;

    if ((((leftr - centerr) | (rightr - leftr)) < 0) & (((leftr - rightr) | (centerr - leftr)) < 0))
        final->rgba[CCVG_RED] = final->rgba[CCVG_RED];
    else
        final->rgba[CCVG_RED] = leftr;
i) Similarly, if ((m < 0) & (n < 0)), then ((m & n) < 0) because the SIGN MSB is set in both registers, under 2's cmpl.
Code:
    if ((((rightr - centerr) | (leftr - rightr)) & ((rightr - leftr) | (centerr - rightr))) < 0)
        final->rgba[CCVG_RED] = final->rgba[CCVG_RED];
    else
        final->rgba[CCVG_RED] = rightr;

    if ((((leftr - centerr) | (rightr - leftr)) & ((leftr - rightr) | (centerr - leftr))) < 0)
        final->rgba[CCVG_RED] = final->rgba[CCVG_RED];
    else
        final->rgba[CCVG_RED] = leftr;
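Steps (h) and (i) both lean on the same fact: under 2's complement, the sign of (m | n) is the OR of the two signs and the sign of (m & n) is the AND of them. A brute-force sketch over the range of differences that 8-bit components can produce (check_sign_identities is a made-up name):

```c
#include <assert.h>

/* Differences of two values in 0..255 always land in -255..255. */
static void check_sign_identities(void)
{
    int m, n;

    for (m = -255; m <= 255; m++)
        for (n = -255; n <= 255; n++) {
            /* step (h): either one negative <=> OR-mask negative */
            assert(((m < 0) | (n < 0)) == ((m | n) < 0));
            /* step (i): both negative <=> AND-mask negative */
            assert(((m < 0) & (n < 0)) == ((m & n) < 0));
        }
}
```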
j) Set up an index-able mask for use in a future LUT.
Code:
    unsigned char possibilities[2];
    unsigned int mask;

    possibilities[0] = rightr;
    possibilities[1] = final -> rgba[CCVG_RED];

    mask   = (rightr - centerr) | (leftr - rightr);
    mask  &= (rightr - leftr) | (centerr - rightr);
    mask >>= 8*sizeof(unsigned int) - 1;
    final->rgba[CCVG_RED] = possibilities[mask];

    possibilities[0] = leftr;
    possibilities[1] = final -> rgba[CCVG_RED]; /* refresh: the first lookup may have changed it */

    mask   = (leftr - centerr) | (rightr - leftr);
    mask  &= (leftr - rightr) | (centerr - leftr);
    mask >>= 8*sizeof(unsigned int) - 1;
    final->rgba[CCVG_RED] = possibilities[mask];
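Wrapped up as a function and checked against (a), step (j) might look like the sketch below. select_lut, select_ref and check_select are made-up names, and `old` stands in for the previous final->rgba[CCVG_RED]. Note that possibilities[1] is re-read before the second lookup, because the first lookup may already have replaced the value being kept.

```c
#include <assert.h>
#include <limits.h>

/* Reference: the original if / else-if from (a). */
static unsigned char select_ref(int leftr, int centerr, int rightr, unsigned char old)
{
    if ((leftr >= centerr && rightr >= leftr) || (leftr >= rightr && centerr >= leftr))
        return (unsigned char)leftr;
    else if ((rightr >= centerr && leftr >= rightr) || (rightr >= leftr && centerr >= rightr))
        return (unsigned char)rightr;
    return old;
}

/* Step (j): the sign bit of the mask indexes a 2-entry table. */
static unsigned char select_lut(int leftr, int centerr, int rightr, unsigned char old)
{
    const unsigned int shift = sizeof(unsigned int) * CHAR_BIT - 1;
    unsigned char possibilities[2];
    unsigned char out = old;
    unsigned int mask;

    possibilities[0] = (unsigned char)rightr;
    possibilities[1] = out;
    mask  = (unsigned int)((rightr - centerr) | (leftr - rightr));
    mask &= (unsigned int)((rightr - leftr) | (centerr - rightr));
    out = possibilities[mask >> shift];

    possibilities[0] = (unsigned char)leftr;
    possibilities[1] = out; /* must reflect the result of the first lookup */
    mask  = (unsigned int)((leftr - centerr) | (rightr - leftr));
    mask &= (unsigned int)((leftr - rightr) | (centerr - leftr));
    out = possibilities[mask >> shift];

    return out;
}

/* Compare the two over a spread of 8-bit inputs. */
static void check_select(void)
{
    int l, c, r;

    for (l = 0; l < 256; l += 3)
        for (c = 0; c < 256; c += 3)
            for (r = 0; r < 256; r += 3)
                assert(select_lut(l, c, r, 0x5A) == select_ref(l, c, r, 0x5A));
}
```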
And then, there are even steps beyond that.

Notice, for example, the common pattern in the `mask' updates:
Code:
    mask   = (rightr - centerr) | (leftr - rightr);
    mask  &= (rightr - leftr) | (centerr - rightr);

...

    mask   = (leftr - centerr) | (rightr - leftr);
    mask  &= (leftr - rightr) | (centerr - leftr);
We have the basic algebraic form:
Code:
    mask  = (a - b) | (c - a);
    mask &= (a - c) | (b - a);
Obviously, -(a - b) == b - a; and -(c - a) == a - c.
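One possible reading of where that leads (a guess, not something spelled out above): with x = a - b and y = c - a, the second line of the mask is just (-y | -x), so only two subtractions are needed and the rest are negations. A sketch with made-up names not_median and check_not_median, assuming 2's complement and 8-bit inputs so nothing overflows:

```c
#include <assert.h>

/* Nonzero when `a` is NOT the median of (a, b, c), i.e. when the value
 * being kept should win over `a` in the selection above. */
static int not_median(int a, int b, int c)
{
    const int x = a - b;
    const int y = c - a;

    /* (a - b) | (c - a), then & with (a - c) | (b - a) == (-y | -x) */
    return ((x | y) & (-y | -x)) < 0;
}

/* Compare against the >= form of the test from (a). */
static void check_not_median(void)
{
    int a, b, c;

    for (a = 0; a < 64; a++)
        for (b = 0; b < 64; b++)
            for (c = 0; c < 64; c++) {
                const int is_median = (a >= b && c >= a) || (a >= c && b >= a);

                assert(not_median(a, b, c) == !is_median);
            }
}
```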
  #1235  
Old 21st July 2014, 10:49 PM
zuzma zuzma is offline
Junior Member
 
Join Date: Jan 2013
Posts: 28
Default

Quote:
Originally Posted by GPDP View Post
hue
Huh, did that never work before or something?

If it's working because of bsmiles32's cleanup, that'd be a sad reason for it never working properly.

Last edited by zuzma; 21st July 2014 at 10:51 PM.
  #1236  
Old 21st July 2014, 10:59 PM
GPDP GPDP is offline
Senior Member
 
Join Date: May 2013
Posts: 146
Default

Quote:
Originally Posted by zuzma View Post
Huh, did that never work before or something?
It's an old build of Mupen64Plus from when it still used the Zilmar plugin spec that I just found. I tried this plugin on it for shits, and it actually works and plays some games that regular Mupen64 does not, such as Rogue Squadron. Too bad the whole thing is probably even more unstable than 1964.
  #1237  
Old 21st July 2014, 11:09 PM
zuzma zuzma is offline
Junior Member
 
Join Date: Jan 2013
Posts: 28
Default

That's still pretty nifty either way. I've yet to actually test any extreme case games with HatCat's RDP plugin. The only game I seem to play nowadays is Mario 64. Plus I've been tinkering with, and failing to understand, how the NES works.
  #1238  
Old 21st July 2014, 11:13 PM
RPGMaster's Avatar
RPGMaster RPGMaster is offline
Alpha Tester
Project Supporter
Super Moderator
 
Join Date: Dec 2013
Posts: 2,008
Default

Quote:
Originally Posted by HatCat View Post
I would probably G**gle if I wanted to know what mine was.
But then again, I don't myself.
Alright, will do.
Quote:
Originally Posted by HatCat View Post
It's not what version of OpenGL you use.
It's what functions of OpenGL you use.

Creating an OpenGL 1.0 context could be every bit as fast as running things off a GL 3.1 context, as long as you're not using fixed-function pipeline stuff from GL 1.x that got deprecated in 3.0 and removed from core in 3.1 (which I fear may be impossible for me to avoid even for a basic software-rendering plugin >.<). So maybe I'm not really saying much, but the point is, whether glGetString(GL_VERSION) says you're running a 2.0 context or a 3.1 context doesn't affect whether it "works especially well on [my] hardware", only what features/GL functions are unconditionally available to the developer.

Based on what you told me about your gl_ext.txt, it sounds like your video card has a lot of extensions, so it is probably compatible with a high version number of OpenGL that unconditionally supports some of them. Even if you were programming on OpenGL 1.1 (which, in case you didn't know, Microsoft practically forces you to do), you can still access them using the extensions API in GL.
Oh ok. Thanks for the explanation. Well I'm relieved then. Now I'm more motivated to learn it.

Quote:
Originally Posted by HatCat View Post
There is no significant difference other than how much extra optimization is done by the compiler versus by the linker. You could put every function in its own .c file for thousands of functions (thousands of .c files) and the linker could still optimize some code across .obj boundaries if link-time code generation is enabled. The more organized approach of course is to not use so many .c files, but only so many as there are different groups of functions, different modules of supported features of the program. Just think of it as doing a multi-tabbed GUI for this plugin with different GUI options "functions" in each tab ".c source file".
Hmm. So I would have wasted my time testing that out lol. I guess there's not much I can do about functions, besides convert some of them into macros if the compiler isn't inlining an inline function. I hate when I see a function being called, that uses stack parameters instead of registers. I don't mean API functions either. Maybe I should just try using gcc to see how well that compiler does.

Quote:
Originally Posted by GPDP View Post
It's an old build of Mupen64Plus from when it still used the Zilmar plugin spec that I just found. I tried this plugin on it for shits, and it actually works and plays some games that regular Mupen64 does not, such as Rogue Squadron. Too bad the whole thing is probably even more unstable than 1964.
Wow, I didn't know the old version was compatible with the Zilmar plugin spec. How old is it? That might be what I'm looking for.
  #1239  
Old 21st July 2014, 11:19 PM
GPDP GPDP is offline
Senior Member
 
Join Date: May 2013
Posts: 146
Default

The build is from late 2009 if I understand correctly. It's from right before they dropped the Zilmar spec for their own, I believe.

https://onedrive.live.com/?cid=EC92A...47A89073B!5066

It's the one labeled r1416.

Fair warning, though: it's quite prone to crashes, and despite using the Zilmar spec, a lot of plugins do not work or make the emulator crash. I can confirm this plugin as well as HatCat's Static Interpreter RSP plugin both work. Not sure about a working audio plugin, but I honestly didn't try to look for one very hard.
  #1240  
Old 21st July 2014, 11:50 PM
zuzma zuzma is offline
Junior Member
 
Join Date: Jan 2013
Posts: 28
Default

It looks like the svn revision that was copied to the mupen64plus github account is 1416 too. So there's source code for it too if anyone cares.

Last edited by zuzma; 22nd July 2014 at 12:01 AM.