#1181
Just cmpeq is fine actually, shouldn't have to risk saying set1_epi16(~0x0000).
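For reference, a minimal sketch of the cmpeq trick mentioned here, assuming SSE2 and `<emmintrin.h>`. The helper name `all_ones_epi16` is made up for illustration and is not from the plugin source. Comparing a register with itself makes every lane "equal", so every bit of the result is set without needing a constant at all.

```c
#include <emmintrin.h> /* SSE2 intrinsics */

/* Hypothetical helper: build a vector with all 128 bits set.
 * Each 16-bit lane is compared with itself, so each lane becomes 0xFFFF. */
static __m128i all_ones_epi16(void)
{
    __m128i zero = _mm_setzero_si128();
    return _mm_cmpeq_epi16(zero, zero);
}
```

Compilers commonly turn this into a single `pcmpeqw` on one register, though that depends on the compiler and flags.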
Maybe it just needs to be round = _mm_set1_epi16(round, round); instead or something, I can look at it later. But we're talking about a GCC bug there, not malpractice of either inline assembly or intrinsics. Quote:
VMULF started out a lot faster than how it ended up when I was done stabilizing it.
__________________
http://theoatmeal.com/comics/cat_vs_internet
#1182
Lol dude, I'm very careful. Especially since it's easy to test.
#1183
It's so easy to test, that those code pastes you just edited out of your post were in fact incomplete, or broken, and showed no conceptual understanding of the ultimate clamping process. It is more complicated than that. Doesn't matter how many games you tested, doesn't make it easy to test. I had to revise VMULF many times to fix more than just [non-]MusyX regressions, even before writing a separate C program to loop all combinations of VS and VT, multiply and compare them like you started testing. Even the old code before testing intrinsics in it probably had it wrong.
#1184
#1185
>I don't see how an algorithm...
I don't see an algorithm, period, in your posts; I just see repeated claims that the function is somehow supposed to become more optimized, with little detail given about how. Even before editing you weren't actually quoting the clamping algorithm itself anyway, just a part of it. Running loops and comparing results isn't everything. The function solved several different things in parts scattered across it, and the special RSP element restrictions applied to it made the RSP opcode bug-free, even where results might not necessarily match. (I remember MarathonMan finding a "bug" in my VMADN opcode...it was caused by a huge, nearly prime number product, which could never be reached by multiplying two 16-bit integers.) You have to understand the objective, what is being solved, and the algorithm...just pasting code that I have written in the past doesn't mean you're pasting a sign-clamp algorithm. VMULF just happens to make it easier to avoid doing a clamp by weeding out a single corner case. The only truly accurate signed clamping algorithm is more like this: Code:
i16 signed_clamp(i32 product)
{
    if (product < INT16_MIN)
        return INT16_MIN;
    if (product > INT16_MAX)
        return INT16_MAX;
    return ((i16)product);
}
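Since the thread is about intrinsics, here is a hedged sketch of the same signed clamp applied to eight lanes at once, assuming SSE2. `_mm_packs_epi32` saturates signed 32-bit values to signed 16-bit, which matches the scalar function above lane for lane. The names `signed_clamp_x8` and `lane16` are hypothetical helpers, not taken from the plugin.

```c
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: clamp eight 32-bit products to int16 range.
 * lo supplies result lanes 0..3, hi supplies result lanes 4..7. */
static __m128i signed_clamp_x8(__m128i lo, __m128i hi)
{
    return _mm_packs_epi32(lo, hi); /* signed saturation to 16 bits */
}

/* Hypothetical helper: read one 16-bit lane back out for inspection. */
static int16_t lane16(__m128i v, int index)
{
    int16_t out[8];
    _mm_storeu_si128((__m128i *)out, v);
    return out[index];
}
```

Note that this only covers inputs that already fit in 32 bits; anything wider (such as a 48-bit accumulator) has to be reduced first.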
Last edited by HatCat; 29th January 2015 at 02:10 AM.
#1186
I see what you're saying. I could have worded it better. What I really meant to say is just optimizing the corner case handling. I basically do just enough to get the correct result, rather than implementing the complete algorithm when I don't have to.
So I was testing BattleTanx(U) (LLE audio & LLE gfx) with PJ64's RSP and got an error message saying "SP_DRAM_ADDR_REG not in RDRam space". So I put that same error check in your RSP and it also triggered. Kinda odd ;/ .
Last edited by RPGMaster; 29th January 2015 at 12:33 PM.
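What PJ64's check actually tests isn't shown in the thread, so the following is only a sketch of what a range check on an SP DMA address could look like. The function name, the `length` parameter, and the two size constants are assumptions for illustration (4 MiB of RDRAM, or 8 MiB with the Expansion Pak).

```c
#include <stdint.h>

/* Assumed RDRAM sizes, in bytes. */
#define RDRAM_SIZE_4MB 0x400000u
#define RDRAM_SIZE_8MB 0x800000u

/* Hypothetical check: returns 1 when [dram_addr, dram_addr + length)
 * lies entirely inside RDRAM, 0 otherwise. Written so the arithmetic
 * cannot wrap around for any 32-bit input. */
static int dma_range_in_rdram(uint32_t dram_addr, uint32_t length,
                              uint32_t rdram_size)
{
    if (length > rdram_size)
        return 0;
    return dram_addr <= rdram_size - length;
}
```

Logging the offending address and length when a check like this fails would show whether the transfer is only slightly past the end or somewhere else entirely.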
#1187
I might remove the corner-case handling anyway.
All VMULF is doing is signed-clamping. I just happen to only care about the corner case of -32768 * -32768 because it's the only possible way to overflow or underflow in VMULF's particular case (confirmed by zilmar's success at reverse-engineering the algorithm, and also provable mathematically). So it's better to use the same signed-clamp method for all the functions. This way, if I break the signed clamp function, it affects VMULF as well as everything else, which makes regressions easier to catch when testing games. Quote:
This is either an interpreter bug in PJ64's RSP plugin, or you're getting said SP DRAM address boundary errors as artifacts from using the recompiler and should just try testing using the CPU interpreter core instead.
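On the VMULF corner case discussed above, here is a rough sketch of the kind of standalone C test mentioned earlier in the thread: one version clamps every result, the other only patches -32768 * -32768, and a loop compares them. The lane model (accumulator = 2 * VS * VT + 0x8000, result = accumulator shifted down 16 bits) is an assumption about VMULF, and all function names are made up for this example.

```c
#include <stdint.h>

static int16_t signed_clamp64(int64_t x)
{
    if (x < INT16_MIN) return INT16_MIN;
    if (x > INT16_MAX) return INT16_MAX;
    return (int16_t)x;
}

/* Floor division by 65536; avoids right-shifting a negative value. */
static int64_t floor_shift16(int64_t acc)
{
    return (acc >= 0) ? acc / 65536 : -((-acc + 65535) / 65536);
}

/* Assumed VMULF lane model, clamping every result. */
static int16_t vmulf_clamp(int16_t vs, int16_t vt)
{
    int64_t acc = 2 * (int64_t)vs * (int64_t)vt + 0x8000;
    return signed_clamp64(floor_shift16(acc));
}

/* Same model, but only the single corner case is special-cased. */
static int16_t vmulf_corner(int16_t vs, int16_t vt)
{
    int64_t acc = 2 * (int64_t)vs * (int64_t)vt + 0x8000;
    if (vs == INT16_MIN && vt == INT16_MIN)
        return INT16_MAX;
    return (int16_t)floor_shift16(acc);
}

/* Count disagreements between the two versions.
 * step = 1 visits all 2^32 pairs; larger steps sample the space. */
static uint64_t count_mismatches(int32_t step)
{
    uint64_t mismatches = 0;
    int32_t vs, vt;
    for (vs = INT16_MIN; vs <= INT16_MAX; vs += step)
        for (vt = INT16_MIN; vt <= INT16_MAX; vt += step)
            if (vmulf_clamp((int16_t)vs, (int16_t)vt) !=
                vmulf_corner((int16_t)vs, (int16_t)vt))
                mismatches++;
    return mismatches;
}
```

Under this model the next-largest product, -32768 * -32767, rounds to 32767 and the most negative product, -32768 * 32767, rounds to -32767, so only the one pair ever needs the clamp. A test like this only shows the two versions agree with each other, not that either matches hardware.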
#1188
I was testing Battle for Naboo with your latest RSP + cpu interpreter + modified pj64. I get "LLV Odd addr." when I get close to a certain enemy and start shooting. I'm curious whether there's a problem with the emulator or if the game actually does it intentionally. I know it's from gfx task, since I also got the message when HLE audio was enabled.
I was testing out the code exhalatio posted. The graphics look fine after the change. Maybe I'll try testing your rsp with old zilmar spec M64p.
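On the "LLV Odd addr." message: a minimal sketch of how an interpreter could service a 32-bit load from an odd DMEM address by reading one byte at a time, instead of treating it as an error. It assumes a 4 KiB DMEM stored in plain big-endian byte order and assumes accesses wrap at the end of DMEM; the function name is hypothetical, and a plugin that keeps DMEM byte-swapped on the host would need extra address adjustment.

```c
#include <stdint.h>

#define DMEM_SIZE 0x1000u /* assumed 4 KiB of RSP data memory */

/* Hypothetical helper: read a big-endian 32-bit value from any byte
 * address, aligned or not, wrapping around inside DMEM. */
static uint32_t dmem_read_u32_unaligned(const uint8_t *dmem, uint32_t addr)
{
    uint32_t value = 0;
    unsigned i;
    for (i = 0; i < 4; i++)
        value = (value << 8) | dmem[(addr + i) & (DMEM_SIZE - 1u)];
    return value;
}
```

Whether the game hits odd addresses on purpose is a separate question; a helper like this just lets the interpreter keep going so the result can be compared.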
#1189
Would always help to be able to test for sure with a saved state though so that I could have it implemented on my own end.
#1190
https://www.dropbox.com/s/em4mutzttd...29.pj.zip?dl=0