#131
Quote:
__________________
My rig: CPU: Intel Core i7 4470 3.4 GHz to 3.9 GHz Video card: MSI nVidia GTX 970 4 GB GDDR5 OS: Windows 7 Professional 64-bit RAM: 16 GB DDR3 SDRAM 10600 HDD: 2 x Western Digital 1 TB HDDs Monitor: 23" Asus Full HD LED Oh, and Snes9x > Zsnes in every way
#132
Code:
```c
/* HLE gfx + HLE audio (rsp_hle.dll) */
#if defined(EXTERN_COMMAND_LIST_GBI) && defined(EXTERN_COMMAND_LIST_ABI)
#define L_NAME "Iconoclast's SP Interpreter (HLE)"
/* HLE gfx + LLE audio (rsp_mle.dll) */
#elif defined(EXTERN_COMMAND_LIST_GBI)
#define L_NAME "Iconoclast's SP Interpreter (MLE)"
/* LLE gfx + HLE audio (rsp_lle.dll) */
#elif defined(EXTERN_COMMAND_LIST_ABI)
#define L_NAME "Iconoclast's SP Interpreter (LLE)"
/* Project64 2.x build (rsp_pj64.dll) */
#elif defined(SEMAPHORE_LOCK_CORRECTIONS) || defined(WAIT_FOR_CPU_HOST)
#define L_NAME "Iconoclast's SP Interpreter (PJ642)"
/* debugging module (rsp_dbg.dll) */
#elif defined(SP_EXECUTE_LOG) || defined(VU_EMULATE_SCALAR_ACCUMULATOR_READ)
#define L_NAME "Iconoclast's SP Interpreter (debug)"
/* LLE gfx + LLE audio (rsp.dll) */
#else
#define L_NAME "Iconoclast's SP Interpreter"
#endif
```

Code:
```bat
@ECHO OFF
TITLE MinGW Compiler Suite Invocation
CD ..\..\MINGW\BIN\
ECHO Building RSP interpreter... (LLE gfx, LLE audio)
gcc --shared -s -O3 -m3dnow -mmmx -msse -msse2 -o ../rsp/rsp.dll ../../rsp/rsp.c
ECHO Building RSP interpreter... (LLE gfx, HLE audio)
gcc --shared -s -O3 -m3dnow -mmmx -msse -msse2 -DEXTERN_COMMAND_LIST_ABI -o ../rsp/rsp_lle.dll ../../rsp/rsp.c
ECHO Building RSP interpreter... (HLE gfx, LLE audio)
gcc --shared -s -O3 -m3dnow -mmmx -msse -msse2 -DEXTERN_COMMAND_LIST_GBI -o ../rsp/rsp_mle.dll ../../rsp/rsp.c
ECHO Building RSP interpreter... (HLE gfx, HLE audio)
gcc --shared -s -O3 -m3dnow -mmmx -msse -msse2 -DEXTERN_COMMAND_LIST_GBI -DEXTERN_COMMAND_LIST_ABI -o ../rsp/rsp_hle.dll ../../rsp/rsp.c
ECHO Building RSP interpreter... (need Project64 2.x)
gcc --shared -s -O3 -m3dnow -mmmx -msse -msse2 -DSEMAPHORE_LOCK_CORRECTIONS -DWAIT_FOR_CPU_HOST -o ../rsp/rsp_pj64.dll ../../rsp/rsp.c
ECHO Building RSP interpreter... (debugging module)
gcc --shared -s -O3 -m3dnow -mmmx -msse -msse2 -DSP_EXECUTE_LOG -DVU_EMULATE_SCALAR_ACCUMULATOR_READ -o ../rsp/rsp_dbg.dll ../../rsp/rsp.c
PAUSE
```

The other two files (`dummyvid.dll` and `rsp_free.dll`) are not produced by this GNU toolchain build of the RSP plugin. `rsp_free.dll` is the Microsoft Visual Studio 2010 build (slower than the GCC build). Plus, encouraging activity around inferior compilers is worth it if it means I can get extra feedback.
#133
Quote:
#134
Actually, I was replying to your post.
The internal plugin names that get copied into the name buffer by `GetDllInfo` when you go to change plugins are listed in the C preprocessor template in the first code block of my reply to you. So those will be the names of the different DLL versions. If you have any complaints about those, TELL ME NAOOOOOOOOOOOOOOOOOOOO *throws a donkey at a mule*
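As a rough illustration of how the `L_NAME` string from the preprocessor template in post #132 ends up as the displayed plugin name, here is a minimal sketch. It assumes a Zilmar-style spec where `GetDllInfo` fills a 100-byte `Name` field; `plugin_name_info` and `fill_plugin_name` are hypothetical stand-ins covering only that one field, not the spec's full `PLUGIN_INFO` struct. In the real plugin, `L_NAME` comes from the `#if` chain; the fallback here is used only when that chain is absent.

```c
#include <string.h>

/* In the plugin, L_NAME is chosen by the #if/#elif template; this fallback
 * (the plain name) is an assumption so the sketch compiles on its own. */
#ifndef L_NAME
#define L_NAME "Iconoclast's SP Interpreter"
#endif

/* Hypothetical stand-in: only the Name field of the spec's PLUGIN_INFO. */
typedef struct {
    char Name[100];
} plugin_name_info;

/* Hypothetical helper mirroring what GetDllInfo does with the name. */
static void fill_plugin_name(plugin_name_info *info)
{
    /* copy at most sizeof(Name) - 1 characters and always NUL-terminate */
    strncpy(info->Name, L_NAME, sizeof(info->Name) - 1);
    info->Name[sizeof(info->Name) - 1] = '\0';
}
```

Built with no `-D` flags, the buffer holds the plain name. Passing both `-DEXTERN_COMMAND_LIST_GBI -DEXTERN_COMMAND_LIST_ABI` (as in the `rsp_hle.dll` line) would select the "(HLE)" string instead.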
__________________
http://theoatmeal.com/comics/cat_vs_internet
#135
Quote:
No complaints from me.
#136
I've heard of this happening to other people on GitHub before, where their public-facing account/repo vanished. Could you open a support ticket with them?
#137
Quote:
Quote:
But you've got to admit, it's kind of l33t to have an invisible GitHub account. I'm basically getting private GitHub repositories for free right now. The plugin is still open-source after all, like everything else I have done so far. It happened while I had no Internet access... Around April 30th, Sven at Mupen64Plus updated his branch of my RSP plugin, so I suppose that was roughly when my GitHub account stopped being visible to users other than me. Say I want the GitHub issue to go away: how should I report it to the GitHub staff or raise a ticket?
#138
I don't have much other blog space, so I'm going to keep posting in this thread.
I gained a 3-6 VI/s speed-up last night by performing the DMA transfers in MIPS double-words (8 bytes at a time) instead of 8-bit "safe" transfers. All the crazy DMA-obsessed games seem to remain stable, so this speed-up appears to be safe. Still, I'm not quite satisfied: somewhere there is bound to be another chance to speed up the generalized emulation of the vector instructions, namely the vector element decode and management.

Method A (currently used): use a two-dimensional scalar index look-up table, accumulate first, then do the writeback to vd in a separate phase.

Code:
```c
#include "vu.h"

static void VMADH(int vd, int vs, int vt, int e)
{
    register signed long long product;
    register int i;

    for (i = 0; i < 8; i++)
    {   /* ei[e][i]: element of vt used by lane i for element specifier e */
        product = VR[vs][i]*VR[vt][ei[e][i]];
        VACC[i].DW += product << 16;
    }
    SIGNED_CLAMP(vd, 0); /* separate phase: clamp the accumulator into vd */
    return;
}
```

Method B: unroll the entire vector-scalar coefficient multiplier inside a switch jump table.

Code:
```c
#include "vu.h"

static void VMADH(int vd, int vs, int vt, int e)
{
    register int i;

    switch (e)
    {
    case 0x0:
    case 0x1: /* lanes use elements 0..7 of vt unchanged */
        for (i = 0; i < 8; i++)
            VACC[i].DW += (long long)(VR[vs][i]*VR[vt][i]) << 16;
        break;
    case 0x2: /* elements 0,0,2,2,4,4,6,6 */
        VACC[00].DW += (long long)(VR[vs][00]*VR[vt][00]) << 16;
        VACC[01].DW += (long long)(VR[vs][01]*VR[vt][00]) << 16;
        VACC[02].DW += (long long)(VR[vs][02]*VR[vt][02]) << 16;
        VACC[03].DW += (long long)(VR[vs][03]*VR[vt][02]) << 16;
        VACC[04].DW += (long long)(VR[vs][04]*VR[vt][04]) << 16;
        VACC[05].DW += (long long)(VR[vs][05]*VR[vt][04]) << 16;
        VACC[06].DW += (long long)(VR[vs][06]*VR[vt][06]) << 16;
        VACC[07].DW += (long long)(VR[vs][07]*VR[vt][06]) << 16;
        break;
    case 0x3: /* elements 1,1,3,3,5,5,7,7 */
        VACC[00].DW += (long long)(VR[vs][00]*VR[vt][01]) << 16;
        VACC[01].DW += (long long)(VR[vs][01]*VR[vt][01]) << 16;
        VACC[02].DW += (long long)(VR[vs][02]*VR[vt][03]) << 16;
        VACC[03].DW += (long long)(VR[vs][03]*VR[vt][03]) << 16;
        VACC[04].DW += (long long)(VR[vs][04]*VR[vt][05]) << 16;
        VACC[05].DW += (long long)(VR[vs][05]*VR[vt][05]) << 16;
        VACC[06].DW += (long long)(VR[vs][06]*VR[vt][07]) << 16;
        VACC[07].DW += (long long)(VR[vs][07]*VR[vt][07]) << 16;
        break;
    case 0x4: /* elements 0,0,0,0,4,4,4,4 */
        VACC[00].DW += (long long)(VR[vs][00]*VR[vt][00]) << 16;
        VACC[01].DW += (long long)(VR[vs][01]*VR[vt][00]) << 16;
        VACC[02].DW += (long long)(VR[vs][02]*VR[vt][00]) << 16;
        VACC[03].DW += (long long)(VR[vs][03]*VR[vt][00]) << 16;
        VACC[04].DW += (long long)(VR[vs][04]*VR[vt][04]) << 16;
        VACC[05].DW += (long long)(VR[vs][05]*VR[vt][04]) << 16;
        VACC[06].DW += (long long)(VR[vs][06]*VR[vt][04]) << 16;
        VACC[07].DW += (long long)(VR[vs][07]*VR[vt][04]) << 16;
        break;
    case 0x5: /* elements 1,1,1,1,5,5,5,5 */
        VACC[00].DW += (long long)(VR[vs][00]*VR[vt][01]) << 16;
        VACC[01].DW += (long long)(VR[vs][01]*VR[vt][01]) << 16;
        VACC[02].DW += (long long)(VR[vs][02]*VR[vt][01]) << 16;
        VACC[03].DW += (long long)(VR[vs][03]*VR[vt][01]) << 16;
        VACC[04].DW += (long long)(VR[vs][04]*VR[vt][05]) << 16;
        VACC[05].DW += (long long)(VR[vs][05]*VR[vt][05]) << 16;
        VACC[06].DW += (long long)(VR[vs][06]*VR[vt][05]) << 16;
        VACC[07].DW += (long long)(VR[vs][07]*VR[vt][05]) << 16;
        break;
    case 0x6: /* elements 2,2,2,2,6,6,6,6 */
        VACC[00].DW += (long long)(VR[vs][00]*VR[vt][02]) << 16;
        VACC[01].DW += (long long)(VR[vs][01]*VR[vt][02]) << 16;
        VACC[02].DW += (long long)(VR[vs][02]*VR[vt][02]) << 16;
        VACC[03].DW += (long long)(VR[vs][03]*VR[vt][02]) << 16;
        VACC[04].DW += (long long)(VR[vs][04]*VR[vt][06]) << 16;
        VACC[05].DW += (long long)(VR[vs][05]*VR[vt][06]) << 16;
        VACC[06].DW += (long long)(VR[vs][06]*VR[vt][06]) << 16;
        VACC[07].DW += (long long)(VR[vs][07]*VR[vt][06]) << 16;
        break;
    case 0x7: /* elements 3,3,3,3,7,7,7,7 */
        VACC[00].DW += (long long)(VR[vs][00]*VR[vt][03]) << 16;
        VACC[01].DW += (long long)(VR[vs][01]*VR[vt][03]) << 16;
        VACC[02].DW += (long long)(VR[vs][02]*VR[vt][03]) << 16;
        VACC[03].DW += (long long)(VR[vs][03]*VR[vt][03]) << 16;
        VACC[04].DW += (long long)(VR[vs][04]*VR[vt][07]) << 16;
        VACC[05].DW += (long long)(VR[vs][05]*VR[vt][07]) << 16;
        VACC[06].DW += (long long)(VR[vs][06]*VR[vt][07]) << 16;
        VACC[07].DW += (long long)(VR[vs][07]*VR[vt][07]) << 16;
        break;
    case 0x8: case 0x9: case 0xA: case 0xB:
    case 0xC: case 0xD: case 0xE: case 0xF: /* one element broadcast to all lanes */
        e &= 07;
        VACC[00].DW += (long long)(VR[vs][00]*VR[vt][e]) << 16;
        VACC[01].DW += (long long)(VR[vs][01]*VR[vt][e]) << 16;
        VACC[02].DW += (long long)(VR[vs][02]*VR[vt][e]) << 16;
        VACC[03].DW += (long long)(VR[vs][03]*VR[vt][e]) << 16;
        VACC[04].DW += (long long)(VR[vs][04]*VR[vt][e]) << 16;
        VACC[05].DW += (long long)(VR[vs][05]*VR[vt][e]) << 16;
        VACC[06].DW += (long long)(VR[vs][06]*VR[vt][e]) << 16;
        VACC[07].DW += (long long)(VR[vs][07]*VR[vt][e]) << 16;
        break;
    }
    SIGNED_CLAMP(vd, 0);
    return;
}
```

Method C: do the vector shuffle in pure software (no SSE), copying the selected elements of vt into a separate coefficient register, VC, so that the loop index runs 0..7 on all three accesses to the vector register file.

Code:
```c
static short VR[32][8];
static short VC[8]; /* vector/scalar coefficient: shuffled copy of VR[vt] */

/* bits of the element index that come from e instead of from the lane number i */
static const int sub_mask[16] = {
    0x0, 0x0, 0x1, 0x1, 0x3, 0x3, 0x3, 0x3,
    0x7, 0x7, 0x7, 0x7, 0x7, 0x7, 0x7, 0x7
};

static inline void SHUFFLE_VECTOR(int vt, int e)
{
    register int i, j;

    j = sub_mask[e];
    for (i = 0; i < 8; i++) /* lane bits from i, fixed bits from e */
        VC[i] = VR[vt][(i & (j ^ 07)) | (e & j)];
/*
    if (e & 0x8)
        for (i = 0; i < 8; i++) VC[i] = VR[vt][(i & 00) | (e & 0x7)];
    else if (e & 0x4)
        for (i = 0; i < 8; i++) VC[i] = VR[vt][(i & 04) | (e & 0x3)];
    else if (e & 0x2)
        for (i = 0; i < 8; i++) VC[i] = VR[vt][(i & 06) | (e & 0x1)];
    else // e == 0b0000 || e == 0b0001
        for (i = 0; i < 8; i++) VC[i] = VR[vt][(i & 07) | (e & 0x0)];
*/
    return;
}

/* [... example vector instruction] */
void VAND(int vd, int vs, int vt, int e)
{   /* VC is expected to already hold SHUFFLE_VECTOR(vt, e) */
    register int i;

    for (i = 0; i < 8; i++)
        VR[vd][i] = VR[vs][i] & VC[i];
    for (i = 0; i < 8; i++)
        VACC[i].s[LO] = VR[vd][i];
    return;
}
```

Chiefly, it lets all three accesses to the vector register file walk the same element index in one loop, as in `VR[vd] = VR[vs] * VR[vt];`. With the previous two methods, the special scalar element decoders could overwrite a source element prematurely (for example when vd == vt), so the accumulator had to be written first. I'll see about blending this into the RSP core before I do the next release with the fixes you guys helped with.
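To sanity-check the shuffle idea, here is a small self-contained sketch (hypothetical names, with local copies of the state rather than the plugin's real `vu.h`). It compares a mask-based shuffle, which takes the fixed index bits from `e` via `(i & (j ^ 7)) | (e & j)`, against an explicit method-A-style element table for all 16 element specifiers. The table rows are filled in from the element patterns in the method B switch. It also shows why copying into `VC` avoids the premature-overwrite problem when vd == vt; `vadd_wrap_demo` is a hypothetical, simplified add (no clamping, no carry flags), not the real VADD.

```c
#include <assert.h>

/* Hypothetical stand-ins for the RSP core state (not the plugin's real globals). */
static short VR[32][8]; /* vector register file */
static short VC[8];     /* shuffled copy of VR[vt] */

/* Method A style table: ei[e][i] is the element of vt used by lane i. */
static const int ei[16][8] = {
    {0,1,2,3,4,5,6,7}, {0,1,2,3,4,5,6,7},
    {0,0,2,2,4,4,6,6}, {1,1,3,3,5,5,7,7},
    {0,0,0,0,4,4,4,4}, {1,1,1,1,5,5,5,5},
    {2,2,2,2,6,6,6,6}, {3,3,3,3,7,7,7,7},
    {0,0,0,0,0,0,0,0}, {1,1,1,1,1,1,1,1},
    {2,2,2,2,2,2,2,2}, {3,3,3,3,3,3,3,3},
    {4,4,4,4,4,4,4,4}, {5,5,5,5,5,5,5,5},
    {6,6,6,6,6,6,6,6}, {7,7,7,7,7,7,7,7}
};

/* Method C style: bits of the element index taken from e rather than from i. */
static const int sub_mask[16] = {
    0x0, 0x0, 0x1, 0x1, 0x3, 0x3, 0x3, 0x3,
    0x7, 0x7, 0x7, 0x7, 0x7, 0x7, 0x7, 0x7
};

static void shuffle_vector(int vt, int e)
{
    int i, j;

    assert(e >= 0 && e < 16);
    j = sub_mask[e];
    for (i = 0; i < 8; i++)
        VC[i] = VR[vt][(i & (j ^ 7)) | (e & j)];
}

/* Counts (e, lane) pairs where the mask decoder disagrees with the table. */
static int count_mismatches(void)
{
    int e, i, bad = 0;

    for (i = 0; i < 8; i++)
        VR[1][i] = (short)(100 + i); /* a distinct value in every lane */
    for (e = 0; e < 16; e++) {
        shuffle_vector(1, e);
        for (i = 0; i < 8; i++)
            if (VC[i] != VR[1][ei[e][i]])
                bad++;
    }
    return bad;
}

/* Hypothetical, simplified add (no clamping, no carry flags; not the real VADD).
 * Writing VR[vd] inside the same loop is safe even when vd == vt, because the
 * selected elements of vt were already copied into VC. */
static void vadd_wrap_demo(int vd, int vs, int vt, int e)
{
    int i;

    shuffle_vector(vt, e);
    for (i = 0; i < 8; i++)
        VR[vd][i] = (short)(VR[vs][i] + VC[i]);
}
```

With vd == vt and e = 8 (broadcast element 0), every lane gets `VR[vs][i]` plus the original `VR[vt][0]`. Reading `VR[vt][0]` directly inside the loop would instead pick up the already-updated lane 0 from the second lane onward.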
Last edited by HatCat; 8th May 2013 at 12:27 AM.
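On the double-word DMA speed-up described at the start of post #138: the post does not show the DMA code itself, so this is only a minimal sketch of the idea under stated assumptions. The function names are hypothetical. It assumes the transfer length is a multiple of 8 bytes (assumed here to match the RSP's 8-byte DMA granularity), and that source and destination share the same host byte layout, so moving 8 bytes per step gives the same result as the byte-by-byte copy.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Byte-at-a-time copy: the "8-bit safe" baseline. */
static void dma_copy_bytes(uint8_t *dst, const uint8_t *src, size_t length)
{
    size_t k;

    for (k = 0; k < length; k++)
        dst[k] = src[k];
}

/* Double-word copy: moves 8 bytes per iteration.
 * Assumption: length is a multiple of 8 and both buffers use the same byte layout. */
static void dma_copy_dwords(uint8_t *dst, const uint8_t *src, size_t length)
{
    size_t k;

    assert(length % 8 == 0);
    for (k = 0; k < length; k += 8) {
        uint64_t dw;
        memcpy(&dw, src + k, sizeof dw); /* alignment-safe 64-bit load */
        memcpy(dst + k, &dw, sizeof dw); /* alignment-safe 64-bit store */
    }
}
```

Using `memcpy` into a `uint64_t` avoids relying on pointer alignment; optimizing compilers typically lower each call to a single 64-bit move.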
#139
Has anyone other than Ziggy and Jabo written an LLE graphics plugin?
__________________
--------------------- CPU: Intel U7300 1.3 GHz GPU: Mobile Intel 4 Series (on board) AUDIO: Realtek HD Audio (on board) RAM: 4 GB OS: Windows 7 - 32 bit
#140
What is the difference between an LLE plugin and a software plugin?