I've been learning about recompilers. I'm actually baffled at how people end up with inaccurate recompilers. It just doesn't make any sense to me. One should start with a slow and accurate recompiler and eventually tweak it for speed. In the case of PJ64's RSP, I can understand the inaccuracy, since even the interpreter wasn't super accurate itself.
I'm curious though. On Android, what does it use for the RSP? At this rate, I'm almost certain you are better off doing it yourself instead of waiting for someone else to do it. In my case, it looks like no one is going to bother making or improving an RSP recompiler anytime soon, so it's up to me to make that happen. I'm already making progress :D . It doesn't matter how little experience/knowledge you have with programming. Tweaking code is much easier than writing your own code. Even a noob can change numbers in ways that end up benefiting the program. Adding a few if statements could speed up a recompiler that hasn't been well optimized. Wow, I didn't know Mupen64Plus AE was written in Java. That pretty much seals the deal. Would not even bother looking at it. |
Yeah, wouldn't use MMX for RSP. I was happy using MMX with certain parts of the RDP since it had a few facilities for 64-bit pack or unpack, shuffle and other decoding processes with the DP command shuffler. With the RSP...not so much. Most vector stuff on the RSP is 128-bit (except stuff like SP DMA, SDV/LDV, few others), so it wouldn't really be "accurate" to resign oneself to MMX on the RSP.
|
I agree that there's no point in using MMX when dealing with 128-bit data. Looking at the recompiler source was definitely worthwhile, just because it made me realize how much room for improvement there is. When I looked at PJ64 1.6 and 1964's core recompiler, I thought to myself "Wow this looks complex, I doubt I can improve this". Then I looked at the RSP recompiler and was intrigued at how much simpler and more incomplete it was. I used to be so confused as to how a compiler can affect the speed of the recompiler, then I found out it still uses the interpreter. So ya, I'm going to continue learning from your interpreter, then figure out a few things about the recompiler, and I'll be able to make my own RSP recompiler :D . I wonder how the speed of LLE audio will compare to HLE audio. |
ADD is not a simple vector instruction.
NOR isn't as simple as OR. With vector NOR you have to do ~(VR[vs][i] | VR[vt][i]). With OR, it's just VR[vs][i] | VR[vt][i]. All it really means is you're taking 128 bits and doing a bit-wise operation with that and another 128 bits. If you're trying to practice SSE or MMX then you're looking the wrong way. VMULF is not a simple application of SSE, and ADD, XOR and NOR are only scalar opcodes with no use for SIMD. You should be practicing on VAND, VOR, VXOR, ... |
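To make the OR/NOR comparison above concrete, here is a minimal sketch in plain C that treats a vector register as eight 16-bit lanes (128 bits total). The names `vor`, `vnor` and `LANES` are made up for illustration and are not taken from any plugin's source; element shuffling of `vt` is deliberately left out.

```c
#include <stdint.h>

#define LANES 8  /* 8 x 16-bit lanes = one 128-bit vector register */

/* Lane-wise OR: vd[i] = vs[i] | vt[i]. */
static void vor(uint16_t vd[LANES], const uint16_t vs[LANES],
                const uint16_t vt[LANES])
{
    for (int i = 0; i < LANES; i++)
        vd[i] = (uint16_t)(vs[i] | vt[i]);
}

/* Lane-wise NOR: the same OR, followed by a bit-wise complement. */
static void vnor(uint16_t vd[LANES], const uint16_t vs[LANES],
                 const uint16_t vt[LANES])
{
    for (int i = 0; i < LANES; i++)
        vd[i] = (uint16_t)~(vs[i] | vt[i]);
}
```

Because every lane is independent, loops of this shape are the kind a compiler can usually turn into a single 128-bit operation, which is why the purely bit-wise vector opcodes are the easy ones to practice on.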
Specifically for the arm dynarec: https://github.com/mupen64plus-ae/mu...00/new_dynarec |
I've already practiced with instructions like VAND, VOR, VXOR with 1964 audio HLE a while back. Now I just need to understand the complex vector instructions. I'm not solely focusing on SSE and MMX, I want to learn all of the RSP instructions so I can do a complete job.
|
zilmar ESPECIALLY needs to do this. To be honest, one of zilmar's biggest screw-ups with his RSP interpreter (which by his own claim was intended to be "readable" and "accurate", not fast) was forcibly inlining his horrible shuffle code, whose algorithm isn't even accurate to real hardware. Because he forces shuffling to happen within EVERY vector opcode, ALL of them are annoying to read when you're trying to figure out what zilmar is doing. Take the simple Vector AND opcode as an example from zilmar's Interpreter Ops.c:
Code:
void RSP_Vector_VAND (void) {
Second, the names "el" and "del" are switched. The SOURCE element is the target vector register slice of VR[vt], and the destination is VR[vd] computed off of VR[vs]. Third, if he had just centralized this horrible word-swapping shuffle algorithm in one single function rather than in 40+ vector opcode functions (the way accurate RSP timing would shuffle in the main vector scheduler thread rather than inside each opcode; not that this change exactly promotes cycle-accuracy, but it's still readability++ and filesize--), his VAND interpreter opcode would be reduced to only this:
Code:
void RSP_Vector_VAND (void) {
He also made a change after RSP 1.7 to use a VECTOR union type so he could say VR[vd] = result;, bragging about how it moved 128 bits at once using the C language. What he doesn't get is that he's actually multiplying the number of 16-bit moves he's doing, not reducing it. He's moving 16-bit elements into a temporary `result' union, delaying the actual writeback to VR[vd = RSPOpC.sa]. So it adds to the file size, adds to the C code, adds to the number of variables/objects declared... there is no need for a "result" temporary. Thus, further reduced to:
Code:
void RSP_Vector_VAND (void) {
Finally, it shouldn't be:
Code:
void RSP_Vector_VAND (void) {
Code:
#ifdef ARCH_MIN_SSE2
And then there's more shit I could go on about. I'm not telling you all this to bore you though. I'm telling you because you're learning about the RSP recompiler from Jabo's MMX recompiler ... well, the unfortunate thing is that it was based on zilmar's interpreter, so you might have to understand both sides and how the interpreter really could have been different, and simpler. |
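Since the code in the post above did not survive, here is a rough sketch of the general idea it argues for: do the element shuffle once in a shared helper, then let VAND be a plain lane-wise loop that writes straight into VR[vd] with no "result" temporary. This is not zilmar's or the poster's code. The names `shuffle_vt` and `vand`, the register layout, and the element-specifier decoding (0-1 whole vector, 2-3 quarters, 4-7 halves, 8-15 single-lane broadcast) are all assumptions made for illustration, and lane 0 is simply taken to be element 0 with no host byte-order handling.

```c
#include <stdint.h>

#define LANES 8

static int16_t VR[32][LANES];  /* hypothetical register file: 32 x 128 bits */

/* Shared helper: build the shuffled copy of vt once, for any opcode.
 * The decoding of `e` below is an assumed layout, not verified here. */
static void shuffle_vt(int16_t out[LANES], const int16_t vt[LANES], unsigned e)
{
    for (int i = 0; i < LANES; i++) {
        int src;
        if (e & 8)      src = (int)(e & 7);             /* broadcast one lane */
        else if (e & 4) src = (i & 4) | (int)(e & 3);   /* one lane per half */
        else if (e & 2) src = (i & 6) | (int)(e & 1);   /* one lane per pair */
        else            src = i;                        /* no shuffle */
        out[i] = vt[src];
    }
}

/* VAND itself shrinks to a lane-wise AND written directly to VR[vd]. */
static void vand(unsigned vd, unsigned vs, unsigned vt, unsigned e)
{
    int16_t st[LANES];  /* shuffled copy, so vd == vt stays safe */
    shuffle_vt(st, VR[vt], e);
    for (int i = 0; i < LANES; i++)
        VR[vd][i] = (int16_t)(VR[vs][i] & st[i]);
}
```

With the shuffle factored out like this, every other bit-wise opcode differs from `vand` only in the operator inside the loop.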
I'm actually really interested in the RSP right now. I honestly can't stand looking at some of those vector instructions in PJ64's RSP.
I was thinking that using 3 arrays would be great for speed. That was one of the reasons I didn't want to use PJ64's RSP as a base. Now that I have a basic understanding of how recompilers work, I'm going to need to fully understand how the interpreter works before I begin the recompiler plugin. At this point, I wouldn't even benefit from looking at that MMX code in the recompiler. Reading your interpreter source and figuring out the SSE code generated is far more useful. Once I know exactly what each instruction does, I'll be able to make my own SSE recompiler implementation. Lol, one weird thing about zilmar's RSP is that it assigns addresses to a function pointer array in the BuildInterpreterCPU and BuildRecompilerCPU functions. Is there any reason to assign the addresses at runtime? It just seems like overhead to me. |
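For comparison with the runtime-filled function pointer array mentioned above, here is a small sketch of the alternative: a dispatch table initialized statically, so the compiler lays it out at build time. All names here (`op_handler`, `primary_ops`, `dispatch`, the stub handlers and the `last_op` marker) are hypothetical and only exist for this example. It shows the technique and says nothing about why zilmar's plugin builds its tables at runtime.

```c
#include <stdint.h>

typedef void (*op_handler)(uint32_t inst);

static int last_op;  /* demo-only marker recording which handler ran */

static void op_reserved(uint32_t inst) { (void)inst; last_op = -1; }
static void op_j(uint32_t inst)        { (void)inst; last_op = 0x02; }
static void op_addi(uint32_t inst)     { (void)inst; last_op = 0x08; }

/* C99 designated initializers: the table is constant data, so no
 * BuildInterpreterCPU-style setup call is needed before first use.
 * Entries not listed are NULL and fall back to op_reserved below. */
static const op_handler primary_ops[64] = {
    [0x02] = op_j,     /* MIPS primary opcode 2: J */
    [0x08] = op_addi,  /* MIPS primary opcode 8: ADDI */
};

/* Dispatch on the top 6 bits of the instruction word. */
static void dispatch(uint32_t inst)
{
    op_handler h = primary_ops[inst >> 26];
    if (!h)
        h = op_reserved;
    h(inst);
}
```

One trade-off worth noting: a runtime-built table can be switched (for example between interpreter and recompiler handlers) without recompiling, whereas a `const` table would need one table per mode.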
Gotta bring this up again since porting just became a lot less difficult if I understand correctly.
PJ64 primarily uses the .NET Framework, right? Microsoft recently open-sourced the .NET Framework and is getting into other platforms, including Android! I think Visual Studio or Visual C (might be C# instead) is in the picture as well. At least there has already been so much progress with Mupen64Plus AE: DK64 collision has been completely fixed, and Banjo-Tooie is now crashless for me after play-testing it for twelve hours straight! (got to just before fighting Weldar) I wonder if PCSX2 is now more capable of being ported, even if it is incredibly slow. (Dolphin Emulator for GameCube AND Wii got ported to Android) |
It doesn't. So no, this doesn't help at all. And yeah, the .NET Framework implies C#, which again doesn't matter at all in this context.
This doesn't help PCSX2 at all. PCSX2 is maybe even worse off than PJ64: it doesn't run on anything except x86 (not even x64), and the core is so dependent on the architecture that it would need a rewrite to even start to think about running on ARM. You can't compare Dolphin and PCSX2, it won't do you any good (don't ever do this with emulators of different machines anyway). The reason that Dolphin runs is a) it was designed to be portable from the get-go (a relatively small, but nonetheless substantial, part of why it works) and b) it has at least 2 developers who are pretty much solely dedicated to making it work on Android (the biggest part, and something PJ64 and PCSX2 lack; for the reason those 2 are even there, see a) ). |