 Project64 Forums (Test Sample) Vector Multiply Fraction

#1 HatCat Alpha Tester Project Supporter Senior Member Join Date: Feb 2007 Location: In my hat. Posts: 16,236 (Test Sample) Vector Multiply Fraction

By far the most-executed RSP instruction (under any sub-op-code matrix) in audio tasks is VMULF, which makes it a rudimentary base for comparison against the basic template I am still working on getting the other op tables to comply with:

Code:
```
void VMULF(int vd, int vs, int vt, int element)
{
    register int i;

    if (element == 0x0) /* if (element >> 1 == 00) */
    {
        for (i = 0; i < 8; i++)
        {
            register int product = VR[vs].s[i] * VR[vt].s[i];

            /* Double the partial product in 64 bits: */
            /* (-32768 * -32768) << 1 does not fit in a 32-bit int. */
            VACC[i].q = (long long)product * 2 + 0x8000; /* fraction rounding */
        }
    }
    else if ((element & 0xE) == 02) /* scalar quarter */
    {
        /* removed for shortness, see (element >> 1 == 0x0) for basic alg. */
    }
    else if ((element & 0xC) == 04) /* scalar half */
    {
        /* [...] */
    }
    else /* if ((element & 0b1000) == 0b1000): scalar whole */
    {
        /* [...] */
    }

    for (i = 0; i < 8; i++)
    { /* Sign-clamp bits 31..16 of ACC file to destination VR file. */
        if (VACC[i].q & 0x800000000000) /* acc < 0 */
        {
            if (~VACC[i].q & ~0x00007FFFFFFF) /* short underflow */
                VR[vd].s[i] = 0x8000;
            else
                VR[vd].s[i] = (short)(VACC[i].q >> 16);
        }
        else
        {
            if (VACC[i].q & ~0x00007FFFFFFF) /* short overflow */
                VR[vd].s[i] = 0x7FFF;
            else
                VR[vd].s[i] = (short)(VACC[i].q >> 16);
        }
    }

    for (i = 0; i < 8; i++) /* 48 bits left by 16 to use high DW sign bit */
        VACC[i].q <<= 16; /*
    for (i = 0; i < 8; i++)
        VACC[i].q >>= 16; /* reverse zilmar's VACC sign-extension hack */
    return;
}
```
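
To make the rounding and clamping above easier to check by hand, here is a minimal single-lane sketch of the same arithmetic. The helper names `vmulf_acc` and `clamp_acc_to_s16` are hypothetical and not taken from the plugin source, and the sketch assumes that `>>` on a negative `int64_t` is an arithmetic shift, as the code above also does.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: accumulator value for one lane of a fraction multiply.
 * The product is widened to 64 bits before doubling, because
 * (-32768 * -32768) * 2 does not fit in a 32-bit int. */
static int64_t vmulf_acc(int16_t s, int16_t t)
{
    int64_t product = (int64_t)s * (int64_t)t;

    return product * 2 + 0x8000; /* doubled partial product + fraction rounding */
}

/* Hypothetical helper: signed clamp of accumulator bits 47..16 to 16 bits. */
static int16_t clamp_acc_to_s16(int64_t acc)
{
    int64_t hi = acc >> 16; /* assumes arithmetic shift for negative values */

    if (hi > INT16_MAX)
        return INT16_MAX; /* short overflow  -> 0x7FFF */
    if (hi < INT16_MIN)
        return INT16_MIN; /* short underflow -> 0x8000 */
    return (int16_t)hi;
}

/* A few hand-checked cases. */
static void vmulf_examples(void)
{
    /* 0.5 * 0.5 = 0.25 in signed 1.15 fixed point */
    assert(clamp_acc_to_s16(vmulf_acc(0x4000, 0x4000)) == 0x2000);
    /* -1.0 * -1.0 would be +1.0, which saturates to 0x7FFF */
    assert(clamp_acc_to_s16(vmulf_acc(-32768, -32768)) == 0x7FFF);
}
```

The second case is the one where a 32-bit intermediate would wrap around to a negative value and produce the wrong clamp.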
At the moment I'm rewriting all of the VU ops (currently the multiplies; the adds are all finished) with three things in mind. First, smaller function blocks: don't use a switch on element with 16 case values, use a natural if-else chain for the 4 basic element codes. Second, updated accuracy: before, these were my revisions of the MAME source; now they are entire rewrites based on the standard, suggested algorithm for each vector op in the Ultra64 information. Third, code that is possibly slower on paper, but shorter and more likely to let the compiler choose to expand each categorized loop, and therefore incidentally faster and more accurate, not slower!
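
As a rough illustration of that four-way if-else chain, the sketch below maps an element code and a destination lane to the lane of `vt` that gets used. It assumes the usual description of the element encoding (0..1 plain vector, 2..3 quarters, 4..7 halves, 8..15 a single broadcast element); the per-lane mapping is that assumption written out, and `select_lane` is a hypothetical name, not a function from the plugin.

```c
#include <assert.h>

/* Hypothetical helper: which lane of vt feeds destination lane i (0..7)
 * for a given 4-bit element code. The lane mapping is an assumption based
 * on the usual quarter/half/whole broadcast description. */
static int select_lane(int element, int i)
{
    if ((element & 0xE) == 0x0)      /* 0..1: plain vector, lane for lane */
        return i;
    else if ((element & 0xE) == 0x2) /* 2..3: scalar quarter */
        return (i & ~1) | (element & 1);
    else if ((element & 0xC) == 0x4) /* 4..7: scalar half */
        return (i & ~3) | (element & 3);
    else                             /* 8..15: scalar whole (broadcast) */
        return element & 7;
}

static void select_lane_examples(void)
{
    assert(select_lane(0x0, 5) == 5); /* vector: unchanged */
    assert(select_lane(0x3, 4) == 5); /* quarter 1: odd lane of each pair */
    assert(select_lane(0x4, 6) == 4); /* half 0: first lane of each group of 4 */
    assert(select_lane(0xF, 0) == 7); /* whole 7: lane 7 everywhere */
}
```

In an actual op each branch would hold its own `for` loop rather than call a helper per lane, which is what gives the compiler a chance to unroll it.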

Using VMULF as an example interpreter, the basic emulation structure for each VU op (multiply or not) can be broken down as follows:
1. Use an if-else chain to solve for the element encoding type. element is either == 0, 2 or 3 (quarter intervals), 4 through 7 (half overlays), or greater than 7 (single-element "broadcast mode" as defined in other, public domain VU manuals by non-SGI vendors).
2. Do a `for (i = 0; i < N_elements_in_SIMD; i += 1)` over the source elements, with each result loaded first into the vector accumulator file. If it is a vector multiply instruction `(opcode < 16)`, split the opcode bits to determine whether to round and mov (multiply fraction), or to += the accumulator with no round (multiply-accumulate, VMAC*).
3. You typically need to find whether saturated arithmetic (where applied) is conducted on bits 0..47, 16..47, or 32..47 of each accumulator element, in order to [un-]signed "clamp" the final result over to the destination VR file. In some operations, the outputs written to both register files may safely be assumed equal.
4. Last, update the accumulator elements file. You either need to store the 48-bit accumulators as a LO subset of a 64-bit register in C, or use zilmar's technique and shift 0..47 to the left by 16 over to 16..63 and let the C register initializations handle sign-extension for you. The former method is more accurate (not to mention faster, I believe).
Once I finish templating all the other VU ops to have the behavior in step 4, I will have a much easier time taking out the accumulator hack that shifts them all to the left by 16 without breaking a crap load of things.
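
The two storage options from step 4 can be sketched side by side. This is only an illustration under stated assumptions: all function names are hypothetical, neither layout is copied from an existing plugin, and the unpack step assumes an arithmetic right shift for negative values.

```c
#include <assert.h>
#include <stdint.h>

/* Option 1: the 48-bit accumulator lives in bits 0..47 of a 64-bit word.
 * Sign-extend from bit 47 explicitly when a signed value is needed. */
static int64_t acc48_sign_extend(uint64_t raw)
{
    const uint64_t sign = UINT64_C(1) << 47;

    raw &= (sign << 1) - 1; /* keep bits 0..47 only */
    return (int64_t)(raw ^ sign) - (int64_t)sign;
}

/* Option 2: the shift-by-16 layout. The value is kept in bits 16..63 so
 * that bit 63 of the 64-bit word doubles as the accumulator's sign bit. */
static int64_t acc48_pack_shifted(int64_t value)
{
    return value * 65536; /* same as << 16, defined for any 48-bit value */
}

static int64_t acc48_unpack_shifted(int64_t shifted)
{
    return shifted >> 16; /* assumes arithmetic shift for negative values */
}

static void acc48_examples(void)
{
    assert(acc48_sign_extend(UINT64_C(0xFFFFFFFFFFFF)) == -1);
    assert(acc48_sign_extend(UINT64_C(0x7FFFFFFFFFFF)) == INT64_C(0x7FFFFFFFFFFF));
    assert(acc48_unpack_shifted(acc48_pack_shifted(-5)) == -5);
}
```

With option 1 every op can read the accumulator bits at their natural positions (0..15, 16..31, 32..47), which is why removing the shift becomes easier once all ops follow the same template.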

Last edited by HatCat; 6th February 2013 at 05:06 AM.
#2 HatCat

None of that talk is really copyrighted, btw.

The op-codes for vector multiplies (VMUL* and VMAC*) are public domain information discussed in non-SGI vector unit manuals and patents. It is traditional to use the basic operation schematic discussed above.

In many other vector systems, the * in VMUL* or VMAC* is the "condition" sub-op-code ("F" meaning "fraction" or "false", for example).

What seems to be unique to SGI are VMUD* and VMAD*. In particular, VMAD* is totally undiscussed in other vector unit references (except for references to "multiply-add", which is inaccurate, since we use that term for accumulation), while "VMUDz" is usually described as "multiply double" (slightly accurate, but in this case it is the multiplication that is double-precision, not the operand quantities).
#3 HatCat

God, the appendix is so full of bugs.

It keeps saying things like "clamp the least-significant accumulator element" while defining clamp masks for 32/48 of the accumulator bits (making it impossible to clamp accurately). It just keeps finding ways to contradict itself. It's incredible how disorganized it is....

One example of that is VMUDL: since we have an unsigned 32-bit product shifted to the right by one 16-bit halfword, clamping by element is applied in a situation where there is absolutely no chance it can affect the arithmetic result, so emulating that phase is wasteful.
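
That claim is easy to check with a little arithmetic. A minimal sketch, assuming (as described above) that each lane is an unsigned 16x16 product shifted right by 16; `mul_low_shifted` is a hypothetical name:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: unsigned 16x16 -> 32-bit product, shifted right by
 * one halfword. The casts keep the multiply in unsigned 32-bit arithmetic;
 * plain uint16_t operands would be promoted to int and could overflow. */
static uint32_t mul_low_shifted(uint16_t s, uint16_t t)
{
    return ((uint32_t)s * (uint32_t)t) >> 16;
}

static void mul_low_shifted_examples(void)
{
    /* Largest possible inputs: 0xFFFF * 0xFFFF = 0xFFFE0001, so the shifted
     * result is 0xFFFE and already fits in 16 bits. */
    assert(mul_low_shifted(0xFFFF, 0xFFFF) == 0xFFFE);
    assert(mul_low_shifted(0x8000, 0x0002) == 0x0001);
}
```

Since the product only grows as either operand grows, no input pair can exceed 0xFFFE, so a 16-bit clamp on this value never triggers.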

And, if you use 32-bit clamp masks for the accumulator, then why detect clamping by comparing LT zero (negative), if you only sign-extend a 16-bit short by another 16 bits (described in the appendix but not the tests for the standard simulator)? If the accumulator is 48 bits then it always skips that condition blissfully!

This thing is full of shit, but I'll try to adhere to it as much as possible regardless for readability and accuracy.
