
#1




(Test Sample) Vector Multiply Fraction
By far the most-executed RSP instruction (under any subopcode matrix) for audio tasks is VMULF. It makes a rudimentary base for comparison against the basic template that I am still working on getting the other op tables to comply with:
Code:
    void VMULF(int vd, int vs, int vt, int element)
    {
        register int i;

        if (element == 0x0) /* if (element >> 1 == 00) */
        {
            for (i = 0; i < 8; i++)
            {
                register int product = VR[vs].s[i] * VR[vt].s[i];

                product <<= 1; /* shift of partial product */
                VACC[i].q = product + 0x8000; /* fraction rounding */
            }
        }
        else if ((element & 0xE) == 02) /* scalar quarter */
        {
            // removed for shortness, see (element >> 1 == 0x0) for basic alg.
        }
        else if ((element & 0xC) == 04) /* scalar half */
        {
            // [...]
        }
        else /* if ((element & 0b1000) == 0b1000) /* scalar whole */
        {
            // [...]
        }
        for (i = 0; i < 8; i++)
        { /* Sign-clamp bits 31..16 of ACC file to destination VR file. */
            if (VACC[i].q & 0x800000000000) /* acc < 0 */
            {
                if (~VACC[i].q & ~0x00007FFFFFFF) /* short underflow */
                    VR[vd].s[i] = 0x8000;
                else
                    VR[vd].s[i] = (short)(VACC[i].q >> 16);
            }
            else
            {
                if (VACC[i].q & ~0x00007FFFFFFF) /* short overflow */
                    VR[vd].s[i] = 0x7FFF;
                else
                    VR[vd].s[i] = (short)(VACC[i].q >> 16);
            }
        }
        for (i = 0; i < 8; i++) /* 48 bits left by 16 to use high DW sign bit */
            VACC[i].q <<= 16;
        /* for (i = 0; i < 8; i++)
            VACC[i].q >>= 16; /* reverse zilmar's VACC sign-extension hack */
        return;
    }

Using VMULF as an example interpreter, the basic emulation table structure for each VU op (multiply or not) is classifiable:
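To check the arithmetic of the VMULF listing in isolation, here is a minimal per-lane sketch. It is not the plugin's code: it assumes the accumulator is kept sign-extended in a 64-bit integer instead of the VACC[i].q layout, and that >> on a negative signed value is an arithmetic shift (true on mainstream compilers, but implementation-defined in C). The names vmulf_acc and clamp_acc_mid are made up for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: accumulator value for one VMULF lane.
 * Assumption: the accumulator is held sign-extended in an int64_t. */
static int64_t vmulf_acc(int16_t s, int16_t t)
{
    int64_t product = (int64_t)s * (int64_t)t; /* widen before multiplying */

    return product * 2 + 0x8000; /* shift of partial product, then fraction rounding */
}

/* Hypothetical helper: sign-clamp bits 31..16 of the accumulator to a short.
 * Assumption: >> on a negative int64_t is an arithmetic shift. */
static int16_t clamp_acc_mid(int64_t acc)
{
    int64_t mid = acc >> 16;

    if (mid < INT16_MIN) /* short underflow */
        return INT16_MIN;
    if (mid > INT16_MAX) /* short overflow */
        return INT16_MAX;
    return (int16_t)mid;
}
```

For example, vmulf_acc(-32768, -32768) gives 0x80008000, which clamp_acc_mid saturates to 0x7FFF. That corner case is why the sketch widens the product before doubling it: the doubled product does not fit in a signed 32-bit int.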
__________________
http://theoatmeal.com/comics/cat_vs_internet Last edited by HatCat; 6th February 2013 at 05:06 AM. 
#2




None of that talk really involves copyrights, btw.
The opcodes for vector multiplies (VMUL* and VMAC*) are public-domain information, discussed in non-SGI vector unit manuals and patents. It is traditional to use the basic operation schematic discussed above. In many other vector systems, the * in VMUL* or VMAC* is the "condition" subopcode ("F" meaning "fraction" or "false", for example). What seems to be unique to SGI are VMUD* and VMAD*. In particular, VMAD* is totally undiscussed in other vector unit references, except for references to "multiply-add", which is inaccurate (we use that term under "accumulation"). "VMUDz", meanwhile, is usually described as "multiply double", which is slightly accurate, but in this case it is the multiplication that is double-precision, not the operand quantities.
#3




God, the appendix is so full of bugs.
It keeps saying things like "clamp the least-significant accumulator element" while defining clamp masks for 32/48 of the accumulator bits (making it impossible to clamp accurately). It just keeps finding ways to contradict itself. It's incredible how disorganized it is. One example of that is VMUDL: since we have an unsigned 32-bit product shifted to the right by one 16-bit halfword, clamping by element is applied in a situation where there is absolutely no chance it can affect the arithmetic result, so emulating the phase is wasteful. And if you use 32-bit clamp masks for the accumulator, then why detect clamping by comparing LT zero (negative), if you only sign-extend a 16-bit short by another 16 bits (described in the appendix but not in the tests for the standard simulator)? If the accumulator is 48 bits, then it always skips that condition blissfully! This thing is full of shit, but I'll try to adhere to it as much as possible regardless, for readability and accuracy.
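The range argument about VMUDL can be checked with a small sketch. It only models what is described in this post (an unsigned 16x16 product shifted right by 16) and does not restate the appendix's clamp rules; vmudl_lane and vmudl_max_lane are hypothetical names, not taken from any simulator source.

```c
#include <stdint.h>

/* Hypothetical helper: one VMUDL lane as described in the post,
 * i.e. an unsigned 32-bit product shifted right by one 16-bit halfword. */
static uint32_t vmudl_lane(uint16_t s, uint16_t t)
{
    uint32_t product = (uint32_t)s * (uint32_t)t;

    return product >> 16;
}

/* Largest value a lane can take. The product grows with each operand,
 * so fixing t at its maximum and scanning s covers the worst case. */
static uint32_t vmudl_max_lane(void)
{
    uint32_t max = 0;
    uint32_t s;

    for (s = 0; s <= 0xFFFFu; s++)
    {
        uint32_t r = vmudl_lane((uint16_t)s, 0xFFFFu);

        if (r > max)
            max = r;
    }
    return max;
}
```

The largest product is 0xFFFF * 0xFFFF = 0xFFFE0001, so the shifted value never exceeds 0xFFFE. Under these assumptions everything above the low 16 accumulator bits is zero, and a clamp on that element has nothing to saturate.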