Project64 Forums > General Discussion > Open Discussion

#11
17th January 2015, 04:03 PM
HatCat
Alpha Tester
Project Supporter
Senior Member

Join Date: Feb 2007
Location: In my hat.
Posts: 16,256

Quote:
Originally Posted by Tarek701
I'm kind of confused about the elements now. I took the VSUB example:

VSUB v10, v14, v13[0]

010010 1 0000 01101 01110 01010 010001
{COP2} {El} {VS} {VT} {VD} {VSUB}

That's what I've got so far. I have no idea what that "1" means (maybe a flag, like a carry flag?)

The "1" is just part of the instruction decode.
If you've seen R4300i COP0 decoding, you may have seen an op matrix like this one I wrote in 2009:
https://dl.dropboxusercontent.com/u/...---------.html

Here you'll see that COP0 with (rs >= 10000_2) is "C0": a special coprocessor operation on the main N64 CPU's CP0 (the system control coprocessor). Otherwise, it's a generic COP0 instruction such as MFC0 or MTC0.

On the N64 RCP, the RSP has an analogous "C2" opcode matrix for when this condition is true for COP2; Michael Tedder calls this "VECTOP" in his Project Unreality source. The "1" you decoded is the top bit of that 5-bit rs field, so rs >= 10000_2 simply means that bit is set.
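For anyone writing a decoder, here is a minimal C sketch of splitting such an instruction word into its fields. It assumes the standard RSP vector computational layout (COP2 opcode in bits 31..26, the "1" in bit 25, element field in 24..21, then vt, vs, vd and the function code); `VectorOp`, `decode_vector_op` and `is_vector_op` are hypothetical names. Note that in this layout the register fields run vt, vs, vd, so in the quoted word 01101 (13) is vt and 01110 (14) is vs.

```c
#include <stdint.h>

/* Fields of an RSP vector computational instruction (assumed layout):
 *   31..26 op (010010 = COP2)  25 the "1" bit  24..21 e (element)
 *   20..16 vt  15..11 vs  10..6 vd  5..0 function
 * VectorOp is a hypothetical name used only for illustration. */
typedef struct {
    unsigned op, co, e, vt, vs, vd, funct;
} VectorOp;

static VectorOp decode_vector_op(uint32_t iw)
{
    VectorOp d;

    d.op    = (iw >> 26) & 0x3F;
    d.co    = (iw >> 25) & 0x01;
    d.e     = (iw >> 21) & 0x0F;
    d.vt    = (iw >> 16) & 0x1F;
    d.vs    = (iw >> 11) & 0x1F;
    d.vd    = (iw >>  6) & 0x1F;
    d.funct =  iw        & 0x3F;
    return d;
}

/* COP2 with rs >= 10000_2, i.e. bit 25 set: a vector ("C2") operation. */
static int is_vector_op(uint32_t iw)
{
    return ((iw >> 26) & 0x3F) == 0x12 && ((iw >> 21) & 0x1F) >= 0x10;
}
```

For the quoted word, `decode_vector_op(0x4A0D7291)` gives e = 0, vt = 13, vs = 14, vd = 10 and function 0x11 (VSUB). Writing $v13[0] (scalar whole element 0) instead would set e to 1000_2, giving 0x4B0D7291.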

Quote:
Originally Posted by Tarek701
From a syntactic point of view, did you mean making CajeASM assemble RSP asm elements as hex or octal values?
VSUB v10, v14, v13[0x1]
VSUB v10, v14, v13[0xA]
VSUB v10, v14, v13[0xB]
VSUB v10, v14, v13[17] // octal
etc.

No, my octal explanation wasn't really the best.

I mean that the valid element specifiers permit only these ways of writing the VSUB example you gave:
Code:
VSUB    $v10, $v14, $v13 // vector operand mode
VSUB    $v10, $v14, $v13[0q] // scalar quarter broadcasting
VSUB    $v10, $v14, $v13[1q] // scalar quarters, alternating odds
VSUB    $v10, $v14, $v13[0h] // scalar halves broadcasting
VSUB    $v10, $v14, $v13[1h]
VSUB    $v10, $v14, $v13[2h]
VSUB    $v10, $v14, $v13[3h]
VSUB    $v10, $v14, $v13[0] // scalar whole broadcasting
VSUB    $v10, $v14, $v13[1]
VSUB    $v10, $v14, $v13[2]
VSUB    $v10, $v14, $v13[3]
VSUB    $v10, $v14, $v13[4]
VSUB    $v10, $v14, $v13[5]
VSUB    $v10, $v14, $v13[6]
VSUB    $v10, $v14, $v13[7]
$v13[8], $v13[9], and so on up to $v13[15] are all invalid. IDA Pro seems to print the raw 4-bit element field in plain decimal (so scalar element 0, encoded as 1000_2, shows up as [8]), but that form isn't accepted by the RSP assembler.

In the last 8 lines ($v13[0] through $v13[7]), some vector units write the element as 0w, 1w, 2w, 3w, 4w, 5w, 6w, 7w, but on the RSP's VU it's written without the "w": just a plain octal digit.
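On the assembler side, a minimal C sketch of turning the specifier text into the 4-bit element field might look like this. `parse_element` is a hypothetical helper that receives the text between the brackets (an empty string when there are none) and accepts only the spellings listed above, so `8`..`15` and the `w` suffix are rejected. The bit values are the usual RSP ones: 0000 for vector mode, 001x for quarters, 01xx for halves and 1xxx for wholes.

```c
#include <string.h>

/* Map the text inside the brackets of an element specifier to the 4-bit
 * e field. "" means plain vector mode ($v13 with no brackets).
 * Returns -1 for anything else, including "8".."15" and "0w".."7w". */
static int parse_element(const char *s)
{
    size_t len = strlen(s);

    if (len == 0)
        return 0x0;                          /* $v13        -> 0000      */
    if (s[0] < '0' || s[0] > '7')
        return -1;
    if (len == 1)
        return 0x8 | (s[0] - '0');           /* [0]..[7]    -> 1000..1111 */
    if (len == 2 && s[1] == 'q' && s[0] <= '1')
        return 0x2 | (s[0] - '0');           /* [0q], [1q]  -> 0010, 0011 */
    if (len == 2 && s[1] == 'h' && s[0] <= '3')
        return 0x4 | (s[0] - '0');           /* [0h]..[3h]  -> 0100..0111 */
    return -1;
}
```

For example, `parse_element("2h")` yields 6 (0110_2) and `parse_element("5")` yields 13 (1101_2).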
#12
17th January 2015, 04:58 PM
Tarek701
Member

Join Date: Mar 2009
Posts: 58

Quote:
Originally Posted by HatCat
The "1" is just part of the instruction decode. [...]

Wow, really useful information, thanks. I guess I should learn something about vectors and scalars, because I really have no idea about broadcasting, scalars, and vectors in general.

I'm going to implement it the way you told me.
__________________
==========================
Familiar with MIPS r4300i ASM, Basic stuff in C.
#13
17th January 2015, 05:57 PM
HatCat

I could try to go into detail, but the explanation is probably best given within the scope of the RSP interpreter source.

Here:
https://github.com/project64/project...RSP/Cpu.c#L102

The `EleSpec` array in Project64 stores 16 possible values for the 4-bit element mask in the instruction word (IW) you decoded as:
010010 1 ???? 01101 01110 01010 010001

Since the bit to its left is always 1 for these instructions, ???? alone gives 16 possible values. They're all listed in $project64/Source/RSP/Cpu.c, but explaining the shuffling algorithm without resorting to hard-to-read look-up tables is difficult.
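For the curious, here is roughly what such a look-up table looks like in C: one row per value of ????, giving the VT element that each of the 8 lanes reads. This hypothetical `ele_index` table is written out from the cases described in this post, not copied from Project64's `EleSpec` (whose layout may differ), and row 0001, which the post does not cover, is assumed to behave like 0000.

```c
/* ele_index[e][i]: which VT element lane i reads for element field e.
 * Hypothetical table; row 0001 is assumed to act like 0000. */
static const unsigned char ele_index[16][8] = {
    {0, 1, 2, 3, 4, 5, 6, 7},   /* 0000: vector                */
    {0, 1, 2, 3, 4, 5, 6, 7},   /* 0001: assumed same as 0000  */
    {0, 0, 2, 2, 4, 4, 6, 6},   /* 0010: 0q                    */
    {1, 1, 3, 3, 5, 5, 7, 7},   /* 0011: 1q                    */
    {0, 0, 0, 0, 4, 4, 4, 4},   /* 0100: 0h                    */
    {1, 1, 1, 1, 5, 5, 5, 5},   /* 0101: 1h                    */
    {2, 2, 2, 2, 6, 6, 6, 6},   /* 0110: 2h                    */
    {3, 3, 3, 3, 7, 7, 7, 7},   /* 0111: 3h                    */
    {0, 0, 0, 0, 0, 0, 0, 0},   /* 1000: 0                     */
    {1, 1, 1, 1, 1, 1, 1, 1},   /* 1001: 1                     */
    {2, 2, 2, 2, 2, 2, 2, 2},   /* 1010: 2                     */
    {3, 3, 3, 3, 3, 3, 3, 3},   /* 1011: 3                     */
    {4, 4, 4, 4, 4, 4, 4, 4},   /* 1100: 4                     */
    {5, 5, 5, 5, 5, 5, 5, 5},   /* 1101: 5                     */
    {6, 6, 6, 6, 6, 6, 6, 6},   /* 1110: 6                     */
    {7, 7, 7, 7, 7, 7, 7, 7},   /* 1111: 7                     */
};

/* Table lookup for lane i (0..7) and element field e (0..15). */
static int vt_element(unsigned e, unsigned i)
{
    return ele_index[e & 0xF][i & 7];
}
```

The table is compact to execute but, as said above, hard to read: the pattern in each row only becomes obvious once the individual cases are spelled out.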

Generally speaking, if ???? = 0000_2, then there is no shuffling.

Given vector register file VR[][], let VD = VR[vd], VS = VR[vs], VT = VR[vt].
Code:
for (unsigned int i = 0; i < 8; i++)
    VD[i] = VS[i] <operation> VT[i]; /* Simple, normal vector SIMD. */
If ???? = 0010_2 or 0011_2, it operates in scalar quarters with this broadcasting:
Code:
for (unsigned int i = 0; i < 8; i++)
    VD[i] = VS[i] <operation> VT[i & ~01]; // 0q, or 0010_2

for (unsigned int i = 0; i < 8; i++)
    VD[i] = VS[i] <operation> VT[i |  01]; // 1q, or 0011_2
If ???? = 0100_2, 0101_2, 0110_2, or 0111_2 (0h, 1h, 2h, 3h) it broadcasts in scalar halves:
Code:
for (unsigned int i = 0; i < 8; i++)
    VD[i] = VS[i] <operation> VT[(i & 04) | 0]; // 0h
for (unsigned int i = 0; i < 8; i++)
    VD[i] = VS[i] <operation> VT[(i & 04) | 1]; // 1h
for (unsigned int i = 0; i < 8; i++)
    VD[i] = VS[i] <operation> VT[(i & 04) | 2]; // 2h
for (unsigned int i = 0; i < 8; i++)
    VD[i] = VS[i] <operation> VT[(i & 04) | 3]; // 3h
Finally, if ???? = 1???_2, we are doing SIMD in what some vector unit patents (none by SGI) refer to as "broadcast mode". (I think AVX for x86 calls it that too.) On the RSP this is known as "scalar whole".

Code:
for (unsigned int i = 0; i < 8; i++)
    VD[i] = VS[i] <operation> VT[0]; // 0w (or just "0" in RSP asm)
for (unsigned int i = 0; i < 8; i++)
    VD[i] = VS[i] <operation> VT[1]; // 1w
for (unsigned int i = 0; i < 8; i++)
    VD[i] = VS[i] <operation> VT[2]; // 2w
for (unsigned int i = 0; i < 8; i++)
    VD[i] = VS[i] <operation> VT[3]; // 3w
for (unsigned int i = 0; i < 8; i++)
    VD[i] = VS[i] <operation> VT[4]; // 4w
for (unsigned int i = 0; i < 8; i++)
    VD[i] = VS[i] <operation> VT[5]; // 5w
for (unsigned int i = 0; i < 8; i++)
    VD[i] = VS[i] <operation> VT[6]; // 6w
for (unsigned int i = 0; i < 8; i++)
    VD[i] = VS[i] <operation> VT[7]; // 7w
So the last case should be the easiest to read and understand: it is just MIPS-style rd = rs (op) rt, except that every vector element of rt points to one single, scalar element.
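Putting the four cases together, a minimal C sketch of the whole element selection could look like this. `element_index`, `vu_apply` and `op_sub` are hypothetical names; `op_sub` is plain subtraction standing in for the real VSUB, which also clamps and involves the carry flags. Results are built in a temporary so that vd may be the same register as vs or vt, and e = 0001 is assumed to behave like 0000.

```c
#include <stdint.h>

/* Hypothetical per-lane operation type: returns s (op) t. */
typedef int16_t (*vu_op)(int16_t s, int16_t t);

/* Closed form of the element selection: VT element read by lane i (0..7)
 * for the 4-bit element field e. e == 1 is assumed to act like e == 0. */
static unsigned element_index(unsigned e, unsigned i)
{
    if (e >= 8)                     /* 1xxx: scalar whole, all lanes read e - 8 */
        return e & 7;
    if (e >= 4)                     /* 01xx: scalar halves                      */
        return (i & 4) | (e & 3);
    if (e >= 2)                     /* 001x: scalar quarters                    */
        return (i & 6) | (e & 1);
    return i;                       /* 000x: plain vector mode                  */
}

/* VD[i] = VS[i] (op) VT[element_index(e, i)] for all 8 lanes. Results go to
 * a temporary first, so vd may be the same array as vs or vt. */
static void vu_apply(int16_t vd[8], const int16_t vs[8], const int16_t vt[8],
                     unsigned e, vu_op op)
{
    int16_t tmp[8];
    unsigned i;

    for (i = 0; i < 8; i++)
        tmp[i] = op(vs[i], vt[element_index(e, i)]);
    for (i = 0; i < 8; i++)
        vd[i] = tmp[i];
}

/* Example op: plain subtraction, NOT the real VSUB (no clamping, no carry). */
static int16_t op_sub(int16_t s, int16_t t)
{
    return (int16_t)(s - t);
}
```

With VS all 100 and VT = {0, 1, ..., 7}, element field 1101_2 ($vt[5]) gives 95 in every lane, while 0010_2 (0q) gives {100, 100, 98, 98, 96, 96, 94, 94}.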

I would have linked to RSP source that explains this, but mine isn't the best and zilmar's has no explanation. Before then, the only way to know this kind of thing was bpoint's notes in Project Unreality, which let reverse-engineering attempts be done less hazardously; that was the extent of the information leaked to him.
#14
18th January 2015, 12:24 PM
Tarek701

Quote:
Originally Posted by HatCat
I could try to go into detail, but the explanation is probably best within the scope of RSP interpreter source. [...]

Ah, I think I understand it better now. Let me try an example to check. If I write the following:

VADD v1, v2, v3[2h]

The above would add the elements of v2 to the 2nd scalar halves of v3 and store the result in v1, right? So, like:
4 5 4 5 4 5 4 5
v2[0] + v3[4]
v2[1] + v3[5]
v2[2] + v3[4]
v2[3] + v3[5]
v2[4] + v3[4]
v2[5] + v3[5]
v2[6] + v3[4]
v2[7] + v3[5]

Last edited by Tarek701; 18th January 2015 at 12:26 PM.
#15
18th January 2015, 03:04 PM
HatCat

Quote:
Originally Posted by Tarek701
VADD v1, v2, v3[2h]

The above would add the elements of v2 to the 2nd scalar halves of v3 and store the result in v1, right? So, like:
4 5 4 5 4 5 4 5
v2[0] + v3[4]
v2[1] + v3[5]
v2[2] + v3[4]
v2[3] + v3[5]
v2[4] + v3[4]
v2[5] + v3[5]
v2[6] + v3[4]
v2[7] + v3[5]

Close, though it's a little different.
In 2h's case, I remember thinking of it something like this:
Code:
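#include <stdint.h>
#include <stdlib.h>

#define N       8

typedef int16_t         element;
typedef element*        p_elements;
typedef p_elements      vector;

/* Simplified stand-in for VADD: signed 16-bit add with clamping.
 * (The real instruction also adds the VCO carry and writes the accumulator.) */
static element RSP_vadd(element s, element t)
{
    int32_t sum = (int32_t)s + t;

    if (sum > INT16_MAX) sum = INT16_MAX;
    if (sum < INT16_MIN) sum = INT16_MIN;
    return (element)sum;
}

vector VU_execute_2h(vector VS, vector VT)
{
    element (*VU_function)(element, element) = RSP_vadd;
    vector VD = malloc(N * sizeof(element));   /* caller must free() */
    unsigned int i;

    if (VD == NULL)
        return (NULL);
    for (i = 0; i < N; i++) /* lanes 0-3 read VT[2], lanes 4-7 read VT[6] */
        VD[i] = VU_function(VS[i], VT[(i & ~3u) | 2]);
    return (VD);
}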
So actually it would be like:
Code:
v1[0] = v2[0] + v3[2]
v1[1] = v2[1] + v3[2]
v1[2] = v2[2] + v3[2]
v1[3] = v2[3] + v3[2]
v1[4] = v2[4] + v3[6]
v1[5] = v2[5] + v3[6]
v1[6] = v2[6] + v3[6]
v1[7] = v2[7] + v3[6]
[Note that if v1 and v3 are the same register, the above iterations must complete simultaneously, in parallel; otherwise the result depends on the arbitrary order in which the elements are processed.]
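To see why the order matters, here is a small C sketch of the 2h case with v1 and v3 being the same register. The function names are hypothetical and plain 16-bit addition stands in for VADD's clamping. Writing lane by lane lets lanes 3 and 7 pick up the already-updated elements 2 and 6, while snapshotting VT first gives the simultaneous result.

```c
#include <stdint.h>
#include <string.h>

/* 2h broadcast written lane by lane straight into vd.
 * If vd and vt are the same array, later lanes see earlier writes. */
static void vadd_2h_sequential(int16_t *vd, const int16_t *vs, const int16_t *vt)
{
    for (unsigned i = 0; i < 8; i++)
        vd[i] = (int16_t)(vs[i] + vt[(i & ~3u) | 2]);
}

/* Same operation, but VT is copied first, as if all 8 lanes ran at once. */
static void vadd_2h_parallel(int16_t *vd, const int16_t *vs, const int16_t *vt)
{
    int16_t t[8];

    memcpy(t, vt, sizeof t);
    for (unsigned i = 0; i < 8; i++)
        vd[i] = (int16_t)(vs[i] + t[(i & ~3u) | 2]);
}
```

With every element of v2 set to 1 and v1 = v3 = {10, 20, ..., 80}, the sequential version yields {31, 31, 31, 32, 71, 71, 71, 72}, while the snapshot version yields {31, 31, 31, 31, 71, 71, 71, 71}.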

Last edited by HatCat; 18th January 2015 at 03:23 PM.
#16
18th January 2015, 03:21 PM
Tarek701

Quote:
Originally Posted by HatCat
Close, though it's a little different. [...]

Hm, I see. I took the "4 5 4 5 4 5 4 5" from anarko's docs, and he said himself that some things in there weren't accurate. So, it's

2 2 2 2 6 6 6 6

OK. I think I'd better follow the information in RSP/Cpu.c next time, heh.
#17
18th January 2015, 05:29 PM
HatCat

Yeah, that was the old way of teaching it.

anarko's n64ops was mostly just a prototype from before Project64 came to fruition; Project64 has most of the final corrections.

It came from $ProjectUnreality/RCP/RSPINFO.TXT, though I have no idea how bpoint came up with that for 0q/1q/nh, as it was clearly a misinterpretation of what was leaked. His reverse-engineering of the RSP was not exactly thorough: it covered some interesting unknowns at about the level that krom does on #n64dev, but it was never correct enough to play any games. It could only boot the intros of some N64 ROMs that never even used the RSP, like Namco Museum or the Wave Race 64 N logo intro screen.
#18
19th January 2015, 01:40 AM
Tarek701

Thank you very much, HatCat. I implemented the element modes (quarters, halves, wholes, and normal vector operand mode) in my assembler. I'm testing out the first instructions now.

Here's a small test I made. It seems to work.
[image not preserved]

EDIT:
A bit more:
[image not preserved]

Last edited by Tarek701; 19th January 2015 at 01:44 AM.
#19
19th January 2015, 02:18 AM
HatCat

Nice! Looks great. The only thing I could suggest is making the 'w' in [0w] through [7w] optional, since strictly speaking RSP assembly language forbids it and requires just the plain digit. (I imagine any Nemu64 feature for this part of RSP debugging would not log the 'w' anyway.)
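For the disassembler direction, a minimal C sketch of printing the element field in the plain RSP assembly form, never emitting the 'w'. `format_element` is a hypothetical helper; e = 0001 is left unformatted because it is not one of the standard spellings.

```c
#include <stdio.h>

/* Write the RSP assembler spelling of element field e (0..15) into buf:
 * "" for vector mode, "[0q]"/"[1q]", "[0h]".."[3h]", "[0]".."[7]" (no 'w').
 * Returns 0 on success, -1 if buf is too small, e > 15, or e == 1. */
static int format_element(unsigned e, char *buf, size_t size)
{
    if (size < 5 || e > 15 || e == 1)
        return -1;
    if (e == 0)
        buf[0] = '\0';                           /* 0000: plain $vt       */
    else if (e < 4)
        snprintf(buf, size, "[%uq]", e & 1);     /* 0010, 0011            */
    else if (e < 8)
        snprintf(buf, size, "[%uh]", e & 3);     /* 0100 .. 0111          */
    else
        snprintf(buf, size, "[%u]", e & 7);      /* 1000 .. 1111, no 'w'  */
    return 0;
}
```

For example, e = 1101_2 is printed as "[5]" rather than "[5w]" or "[13]".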
#20
19th January 2015, 02:21 AM
Tarek701

Quote:
Originally Posted by HatCat
Nice! Looks great. In fact the only thing I could suggest is that the 'w' in 0:7[w] be optional, since strictly speaking RSP assembly language forbids it and requires just the plain digit itself. (I imagine any Nemu64 feature to handle this element of RSP debugging probably would not log the 'w' part anyway.)

No worries. I allowed both options: it can be written either as [2] or [2w]. I just did it here to be very exact and detailed.

EDIT:
And yeah, Nemu64 logs it without the 'w'. I just tested it.

Last edited by Tarek701; 19th January 2015 at 02:25 AM.

All times are GMT.