#191
__________________
http://theoatmeal.com/comics/cat_vs_internet
#192
Quote:
I guess I wasn't clear enough. I meant to say, "if you need any testers, just PM me..."
__________________
My rig: CPU: Intel Core i7 4470 3.4 GHz to 3.9 GHz Video card: MSI nVidia GTX 970 4 GB GDDR5 OS: Windows 7 Professional 64-bit RAM: 16 GB DDR3 SDRAM 10600 HDD: 2 x Western Digital 1 TB HDDs Monitor: 23" Asus Full HD LED Oh, and Snes9x > Zsnes in every way
#193
Hmm, so anyway, as I was saying:
Splitting the scalar software-emulation loop into separate loops saves some instruction space. About half of the time (Vector Logical, Vector Divide/Multiply) it is faster, except possibly for some hybrid cases where slice segments have to be re-popped off the vector stack by the outer loop; the other half of the time (Vector Select, Vector Add) it is slower. Vector Select Less Than (VLT) illustrates a key problem with trying to vectorize combined loops into parallel transfers, particularly when the compiler's predictions fail:
Code:
    #include "vu.h"

    static void VLT(int vd, int vs, int vt, int e)
    {
        int lt; /* less than, or if (CARRY && NOTEQUAL), equal */
        register int i;

        VCC = 0x0000;
        for (i = 0; i < 8; i++)
        {
            const signed short VS = VR[vs][i];
            const signed short VT = VR_T(i);

            lt = (VS == VT);
            lt &= (VCO >> 0) & (unsigned char)(VCO >> 8) & 0x01;
            lt |= (VS < VT);
            VCC |= lt <<= i;
            VCO >>= 1; /* We need to clear this entire register anyway. */
        }
        for (i = 0; i < 8; i++)
            ACC_R(i) = VCC & (1 << i) ? VR[vs][i] : VR_T(i);
        for (i = 0; i < 8; i++)
            ACC_W(i) = ACC_R(i);
        VCO = 0x0000;
        return;
    }
It was better kept as a single loop:
Code:
    static void VLT(int vd, int vs, int vt, int e)
    {
        int lt; /* less than, or if (CARRY && NOTEQUAL), equal */
        register int i;

        VCC = 0x0000;
        for (i = 0; i < 8; i++)
        {
            const signed short VS = VR[vs][i];
            const signed short VT = VR_T(i);

            lt = (VS == VT);
            lt &= (VCO >> 0) & (unsigned char)(VCO >> 8) & 0x01;
            // or, if stdlib.h, `... & _byteswap_ushort(VCO) & 0x01`
            lt |= (VS < VT);
            VCC |= lt <<= i;
            ACC_R(i) = lt ? VS : VT;
            VCO >>= 1; /* We need to clear this entire register anyway. */
        }
        for (i = 0; i < 8; i++)
            ACC_W(i) = ACC_R(i);
        VCO = 0x0000;
        return;
    }
Quote:
I'm still working my ass off on this thing. If you're impatient that I haven't released an update yet, no worries, you're not alone; I am, too. I would recommend trying to play through Resident Evil 2 completely, but it's not playable with Jabo's plugin under LLE, because communication between Direct3D and the software side is incomplete, nor with z64gl, because a huge frame-buffer draw obstructs the view of the game.
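Returning to the VLT comparison above: the following is a minimal, self-contained sketch (not the plugin's actual code) that checks the split-loop and single-loop forms compute the same flags and selections. It assumes a hypothetical `vu_state` struct in place of the plugin's globals (`VCC`, `VCO`, `ACC_R`), and replaces `VR_T(i)` with a plain `vt[i]`, so element/broadcast selection is ignored.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for the plugin's globals (VCC, VCO, accumulator).
 * VR_T(i) is simplified to vt[i]: no element/broadcast selection here. */
typedef struct {
    unsigned short vcc;   /* compare flags, one bit per lane */
    unsigned short vco;   /* low byte: carry, high byte: not-equal */
    short acc[8];         /* selected results, one per lane */
} vu_state;

/* Split form: compute every VCC bit first, then select in a second loop. */
static void vlt_split(vu_state *s, const short vs[8], const short vt[8])
{
    unsigned short vco = s->vco;
    int i;

    assert(s != NULL && vs != NULL && vt != NULL);
    s->vcc = 0x0000;
    for (i = 0; i < 8; i++) {
        int lt = (vs[i] == vt[i]);
        lt &= vco & (vco >> 8) & 0x01; /* equal only counts if carry && not-equal */
        lt |= (vs[i] < vt[i]);
        s->vcc |= (unsigned short)(lt << i);
        vco >>= 1;                     /* next lane's carry/not-equal bits */
    }
    for (i = 0; i < 8; i++)
        s->acc[i] = (s->vcc & (1 << i)) ? vs[i] : vt[i];
    s->vco = 0x0000;
}

/* Fused form: make the selection in the same pass that computes the flag. */
static void vlt_fused(vu_state *s, const short vs[8], const short vt[8])
{
    unsigned short vco = s->vco;
    int i;

    assert(s != NULL && vs != NULL && vt != NULL);
    s->vcc = 0x0000;
    for (i = 0; i < 8; i++) {
        int lt = (vs[i] == vt[i]);
        lt &= vco & (vco >> 8) & 0x01;
        lt |= (vs[i] < vt[i]);
        s->vcc |= (unsigned short)(lt << i);
        s->acc[i] = lt ? vs[i] : vt[i];
        vco >>= 1;
    }
    s->vco = 0x0000;
}

/* True when two states hold identical flags and accumulator values. */
static int vlt_same_result(const vu_state *a, const vu_state *b)
{
    return a->vcc == b->vcc && a->vco == b->vco &&
           memcmp(a->acc, b->acc, sizeof a->acc) == 0;
}
```

For example, with `vco = 0x0F0A`, vs = {1, -5, 7, 7, 0, 32767, -32768, 3} and vt = {2, -5, 7, 6, 0, -1, -32768, 4}, both forms set VCC to 0x0083 (lanes 0, 1 and 7); lane 1 passes only because its carry and not-equal bits are both set. The fused form simply avoids the second pass over the eight lanes.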
Last edited by HatCat; 12th May 2013 at 08:35 PM.
#194
Quote:
#195
To be honest, I would somewhat rather people wait for my next release.
It has a couple of new things that may need testing, and testing only the current build won't tell us anything about them. At the same time, though, I would rather not make anyone wait.
#196
Quote:
#197
Two questions about the idea you mentioned before, of using LLE graphics as the rendering engine for HLE graphics emulation:
(1) How should the interface between HLE and LLE be defined?
(2) What if an LLE RSP were added as the graphics task interpreter to solve the custom-microcode problem? Again, how would the interface be defined?
__________________
--------------------- CPU: Intel U7300 1.3 GHz GPU: Mobile Intel 4 Series (on board) AUDIO: Realtek HD Audio (on board) RAM: 4 GB OS: Windows 7 - 32 bit
#198
Quote:
Could you point me to where I said that? I know I had been thinking about it; I just don't remember posting it. Apart from some basic HLE of RSP tasks done internally inside the RSP plugin, I have no experience with HLE of the RCP, so I have little insight into what you mean there. It should be able to use all the same structures that LLE does, except that you are pre-allocating the recompiler for the RSP. The RSP interpreter plugin would be the simple LLE plugin. The RSP HLE plugin would, sort of, "call" the RSP functions as inline macros, pulling them into the microcode simulation so that the compiler "recompiles" all of that code statically for you, which is much more powerful than writing a dynamic recompiler. But I never tried it; I've been more fascinated with optimizing LLE. Hope that partially answers your question.
Quote:
It would also be useful to add pre-compilers for the other custom graphics microcodes (Star Wars and such), but I don't find that of the utmost interest, since such microcodes are nonstandard and rarely used. It's better to worry about the hardware than the software, which is why LLE usually does the trick.
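The idea of letting the compiler statically "recompile" a microcode by calling vector ops as inline functions can be sketched with toy ops. Everything here is hypothetical: the register file, the two ops (`op_add_sat`, `op_and`) and the `toy_insn` encoding are illustrations, not real RSP instructions or any plugin's API. `toy_interpret` decodes each instruction at run time, while `toy_task_precompiled` is the same fixed sequence written as direct calls to `static inline` functions, so the compiler is free to inline them and fold the constant register numbers at build time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy register file: 32 vector registers x 8 lanes. */
typedef struct { int16_t vr[32][8]; } toy_rsp;

/* Toy op 1: lane-wise add, clamped to the signed 16-bit range. */
static inline void op_add_sat(toy_rsp *s, int vd, int vs, int vt)
{
    for (int i = 0; i < 8; i++) {
        int32_t sum = (int32_t)s->vr[vs][i] + s->vr[vt][i];
        if (sum > INT16_MAX) sum = INT16_MAX;
        if (sum < INT16_MIN) sum = INT16_MIN;
        s->vr[vd][i] = (int16_t)sum;
    }
}

/* Toy op 2: lane-wise bitwise AND. */
static inline void op_and(toy_rsp *s, int vd, int vs, int vt)
{
    for (int i = 0; i < 8; i++)
        s->vr[vd][i] = (int16_t)(s->vr[vs][i] & s->vr[vt][i]);
}

/* Hypothetical instruction encoding for the run-time interpreter. */
enum { OP_ADD_SAT, OP_AND };
typedef struct { uint8_t op, vd, vs, vt; } toy_insn;

/* Interpreter: decodes and dispatches every instruction at run time. */
static void toy_interpret(toy_rsp *s, const toy_insn *prog, size_t n)
{
    for (size_t k = 0; k < n; k++) {
        switch (prog[k].op) {
        case OP_ADD_SAT: op_add_sat(s, prog[k].vd, prog[k].vs, prog[k].vt); break;
        case OP_AND:     op_and(s, prog[k].vd, prog[k].vs, prog[k].vt);     break;
        default:         assert(!"unknown toy opcode");                     break;
        }
    }
}

/* A fixed two-instruction "task" as data, for the interpreter... */
static const toy_insn toy_task[] = { { OP_ADD_SAT, 3, 1, 2 }, { OP_AND, 4, 3, 1 } };

/* ...and the same task as direct calls, known in full at compile time. */
static void toy_task_precompiled(toy_rsp *s)
{
    op_add_sat(s, 3, 1, 2);
    op_and(s, 4, 3, 1);
}

/* True when every lane of every register matches. */
static int toy_equal(const toy_rsp *a, const toy_rsp *b)
{
    for (int r = 0; r < 32; r++)
        for (int i = 0; i < 8; i++)
            if (a->vr[r][i] != b->vr[r][i])
                return 0;
    return 1;
}
```

Both paths leave the register file in the same state; the difference is only where the decoding happens. Whether a real compiler actually produces faster code this way depends on the compiler and the microcode, which the post itself notes was never tried.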
#199
lulz, looks like I might have had a bug in VCH:
Code:
    ...
    static void VCH(int vd, int vs, int vt, int e)
    {
        int ge, le, neq;
        register int i;

        VCO = 0x0000;
        VCC = 0x0000;
        VCE = 0x00;
        for (i = 0; i < 8; i++)
        {
            const signed short VS = VR[vs][i];
            const signed short VT = VR_T(i);
            const int sn = (VS ^ VT) < 0; /* sn = (unsigned short)(VS ^ VT) >> 15 */

            if (sn)
            {
                ge = (VT < 0);
                le = (VS + VT <= 0);
                neq = (VS + VT == -1); /* compare extension */
                VCE |= neq << i;
                neq ^= !(VS + VT == 0);
                ACC_R(i) = le ? -VT : VS;
                VCC |= (ge << (i + 0x8)) | (le << (i + 0x0));
                VCO |= (neq << (i + 0x8)) | (0x0001 << i);
            }
            else
            {
    ...
I guess I thought I was doing it faster by writing

    neq = (VS + VT == -1) ^ !(VS + VT == 0)

when it should really just be

    neq = !(VS + VT == -1) & !(VS + VT == 0)

or, alternatively,

    neq = !((VS + VT == -1) | (VS + VT == 0))

Doing the deskwork in C++ Boolean math:
Code:
    extern bool f(bool x, bool y);
    extern bool g(bool x, bool y);
    extern bool h(bool x, bool y);

    // #define x (VS + VT == -1)
    // #define y (VS + VT == 0)

    bool f(bool x, bool y)
    {
        return (!x & !y); // correct
    }
    bool g(bool x, bool y)
    {
        return !(x | y); // also correct
    }
    bool h(bool x, bool y)
    {
        return (x ^ !y); // NOT correct
    }
Code:
    f(x, y) ::
        !0 & !0 == 1 & 1 == true
        !0 & !1 == 1 & 0 == false
        !1 & !0 == 0 & 1 == false
        !1 & !1 == 0 & 0 == false
    g(x, y) ::
        !(0 | 0) == !0 == true
        !(0 | 1) == !1 == false
        !(1 | 0) == !1 == false
        !(1 | 1) == !1 == false
    h(x, y) ::
        0 ^ !0 == 0 ^ 1 == true
        0 ^ !1 == 0 ^ 0 == false
        1 ^ !0 == 1 ^ 1 == false
        1 ^ !1 == 1 ^ 0 == true
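The truth tables above can be checked mechanically. This sketch is written in C99 with `<stdbool.h>` to match the plugin's C code; `x` stands for `(VS + VT == -1)` and `y` for `(VS + VT == 0)` as in the post, and `mismatch_mask` is a hypothetical helper that reports which of the four input rows disagree with `f`.

```c
#include <assert.h>
#include <stdbool.h>

/* x stands for (VS + VT == -1) and y for (VS + VT == 0), as in the post. */
static bool f(bool x, bool y) { return !x & !y; }  /* correct */
static bool g(bool x, bool y) { return !(x | y); } /* also correct */
static bool h(bool x, bool y) { return x ^ !y; }   /* the XOR shortcut */

/* Bit (2*x + y) is set for each input row where fn disagrees with f. */
static unsigned mismatch_mask(bool (*fn)(bool, bool))
{
    unsigned mask = 0;
    for (int x = 0; x <= 1; x++)
        for (int y = 0; y <= 1; y++)
            if (fn(x, y) != f(x, y))
                mask |= 1u << (2 * x + y);
    return mask;
}

/* The hand-written truth tables, row by row: (0,0), (0,1), (1,0), (1,1). */
static void check_truth_tables(void)
{
    assert( f(0, 0) && !f(0, 1) && !f(1, 0) && !f(1, 1));
    assert( g(0, 0) && !g(0, 1) && !g(1, 0) && !g(1, 1));
    assert( h(0, 0) && !h(0, 1) && !h(1, 0) &&  h(1, 1));
}
```

`mismatch_mask(g)` is 0 and `mismatch_mask(h)` is 0x8, so `h` differs from `f` only in the x = y = 1 row, the last line of its table.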
#200
*facepalm*
Of course, one would be hard-pressed to call that a bug when the only case it gets wrong is x && y both TRUE: (VS + VT == 0) and (VS + VT == -1) can never be true at the same time, so I guess I was right after all. Changing it to the more readable version throws in a few dozen extra statements, so I just need to comment that confusion somehow.
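The argument here can be checked exhaustively, because both conditions depend only on the sum `VS + VT`. This sketch walks every sum two signed 16-bit operands can produce and counts where the XOR shortcut used in VCH disagrees with the `!(x | y)` form; `count_neq_mismatches` is a hypothetical helper name, not part of the plugin.

```c
#include <assert.h>
#include <stdbool.h>

/* Both conditions depend only on the sum VS + VT, so checking every sum two
 * signed 16-bit values can produce (-65536 .. 65534) covers every input. */
static long count_neq_mismatches(void)
{
    long mismatches = 0;
    for (long sum = -65536; sum <= 65534; sum++) {
        bool x = (sum == -1);       /* VS + VT == -1 */
        bool y = (sum == 0);        /* VS + VT == 0  */
        bool shortcut = x ^ !y;     /* what VCH computes */
        bool readable = !(x | y);   /* the more readable form */

        assert(!(x && y));          /* never both true at once */
        if (shortcut != readable)
            mismatches++;
    }
    return mismatches;
}
```

The count is zero: the only row where the two forms differ (x and y both true) is unreachable, so the shortcut is safe and just needs a comment.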