  #1181  
Old 28th January 2015, 11:15 PM
HatCat HatCat is offline
Alpha Tester
Project Supporter
Senior Member
 
Join Date: Feb 2007
Location: In my hat.
Posts: 16,236
Default

Just cmpeq is fine actually, shouldn't have to risk saying set1_epi16(~0x0000).

Maybe it just needs to be round = _mm_set1_epi16(round, round); instead or something, I can look at it later. But we're talking about a GCC bug there, not malpractice of either inline assembly or intrinsics.
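For reference, a minimal sketch of the two ways of building an all-ones vector that are being compared here, assuming an x86 target with SSE2 and `<emmintrin.h>`. The function names are made up for illustration and are not from the plugin source.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>
#include <string.h>

/* All-ones mask from a broadcast constant: ~0x0000 is the int -1,
 * which becomes 0xFFFF in every 16-bit lane. */
__m128i all_ones_set1(void)
{
    return _mm_set1_epi16((short)~0x0000);
}

/* All-ones mask by comparing a register with itself: every lane
 * compares equal, so every lane is set to 0xFFFF. */
__m128i all_ones_cmpeq(__m128i any)
{
    return _mm_cmpeq_epi16(any, any);
}

/* Hypothetical check: returns 1 if both approaches give identical bits. */
int masks_match(void)
{
    uint8_t a[16], b[16];
    _mm_storeu_si128((__m128i *)a, all_ones_set1());
    _mm_storeu_si128((__m128i *)b, all_ones_cmpeq(_mm_setzero_si128()));
    return memcmp(a, b, sizeof(a)) == 0;
}
```

The cmpeq form needs no constant to be loaded, which is presumably why it is the preferred idiom; which one a given compiler handles better is a separate question.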

Quote:
Originally Posted by RPGMaster View Post
If you want to improve your VMULF, the sign clamp algorithm can be improved.
Doubt it. You have to be very careful when making these kinds of changes.

VMULF started out a lot faster than how it ended up when I was done stabilizing it.
  #1182  
Old 29th January 2015, 12:10 AM
RPGMaster RPGMaster is offline
Alpha Tester
Project Supporter
Super Moderator
 
Join Date: Dec 2013
Posts: 2,008
Default

Quote:
Originally Posted by HatCat View Post
Just cmpeq is fine actually, shouldn't have to risk saying set1_epi16(~0x0000).

Maybe it just needs to be round = _mm_set1_epi16(round, round); instead or something, I can look at it later. But we're talking about a GCC bug there, not malpractice of either inline assembly or intrinsics.
You're not the first person to dislike set1_epi16(~0x0000). Out of curiosity, what's wrong with using that? GCC isn't the only compiler with a bug in _mm_cmpeq_epi16.

Quote:
Originally Posted by HatCat View Post
Doubt it. You have to be very careful when making these kinds of changes.

VMULF started out a lot faster than how it ended up when I was done stabilizing it.
Lol dude, I'm very careful. Especially since it's easy to test.
  #1183  
Old 29th January 2015, 12:37 AM
HatCat HatCat is offline
Alpha Tester
Project Supporter
Senior Member
 
Join Date: Feb 2007
Location: In my hat.
Posts: 16,236
Default

It's so easy to test, that those code pastes you just edited out of your post were in fact incomplete, or broken, and showed no conceptual understanding of the ultimate clamping process. It is more complicated than that. Doesn't matter how many games you tested, doesn't make it easy to test. I had to revise VMULF many times to fix more than just [non-]MusyX regressions, even before writing a separate C program to loop all combinations of VS and VT, multiply and compare them like you started testing. Even the old code before testing intrinsics in it probably had it wrong.
  #1184  
Old 29th January 2015, 01:27 AM
RPGMaster RPGMaster is offline
Alpha Tester
Project Supporter
Super Moderator
 
Join Date: Dec 2013
Posts: 2,008
Default

Quote:
Originally Posted by HatCat View Post
It's so easy to test, that those code pastes you just edited out of your post were in fact incomplete, or broken, and showed no conceptual understanding of the ultimate clamping process. It is more complicated than that.
I edited because I realized it was a bad example. I jumped to conclusions and thought the difference in efficiency was due to the sign clamping algorithm, but now I'm unsure what caused the difference. I sometimes read too fast. My point is still valid, that it could be more optimized. I just tried to give an example of code you're familiar with, but you're even unsure about the accuracy of your old code.
Quote:
Originally Posted by HatCat View Post
Doesn't matter how many games you tested, doesn't make it easy to test. I had to revise VMULF many times to fix more than just [non-]MusyX regressions, even before writing a separate C program to loop all combinations of VS and VT, multiply and compare them like you started testing. Even the old code before testing intrinsics in it probably had it wrong.
I don't test games as extensively as I used to. Imo, it's much more time efficient to just test & analyze algorithms. I don't see how an algorithm that produces the exact same results for every single combination is inaccurate.
  #1185  
Old 29th January 2015, 02:04 AM
HatCat HatCat is offline
Alpha Tester
Project Supporter
Senior Member
 
Join Date: Feb 2007
Location: In my hat.
Posts: 16,236
Default

>I don't see how an algorithm...

I don't see an algorithm, period, in your posts; I just see repeated claims that the function is somehow supposed to become more optimized, through methods you've given little detail about.

Even before editing you weren't actually quoting the clamping algorithm itself anyway, just a part of it. Running loops and comparing results isn't everything. The function had the pieces of several different problems being solved scattered across it, plus special RSP element restrictions that kept the RSP opcode bug-free even where results might not necessarily match. (I remember MarathonMan finding a "bug" in my VMADN opcode...it was caused by a huge, nearly prime number product, which could never be reached by multiplying 2 16-bit integers.) You have to understand the objective, what is being solved, and the algorithm...just pasting code that I have done in the past doesn't mean you're pasting a sign-clamp algorithm. With VMULF it just happens to be easier to skip the clamp and weed out a single corner case.

The only truly accurate signed clamping algorithm is more like this:
Code:
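#include <stdint.h>

/* i16 and i32 are assumed to be fixed-width typedefs along these lines. */
typedef int16_t i16;
typedef int32_t i32;

/* Saturate a 32-bit value to the signed 16-bit range. */
i16 signed_clamp(i32 product)
{
    if (product < INT16_MIN)
        return INT16_MIN;
    if (product > INT16_MAX)
        return INT16_MAX;
    return ((i16)product);
}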
Anything else I've done is just so that it gets vectorized...at quite a heavy price in readability. No matter how many tests you do, a lot of the source can be open to interpretation, and isn't always free of the C standard's strict behavior rules either.
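As an illustration of what a vectorized form of that clamp can look like, here is a minimal sketch assuming an x86 target with SSE2. `_mm_packs_epi32` narrows 32-bit lanes to 16 bits with signed saturation, which applies the same clamp to eight values at once. `clamp8` is a hypothetical name, not a function from the plugin.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Clamp eight 32-bit values to the int16_t range and store them in out[0..7]. */
void clamp8(const int32_t in[8], int16_t out[8])
{
    __m128i lo = _mm_loadu_si128((const __m128i *)&in[0]);  /* in[0..3] */
    __m128i hi = _mm_loadu_si128((const __m128i *)&in[4]);  /* in[4..7] */

    /* Signed-saturating narrow: lo fills lanes 0..3, hi fills lanes 4..7. */
    __m128i packed = _mm_packs_epi32(lo, hi);

    _mm_storeu_si128((__m128i *)out, packed);
}
```

The scalar version stays the readable reference; a sketch like this is only useful once it has been checked against it.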

Last edited by HatCat; 29th January 2015 at 02:10 AM.
  #1186  
Old 29th January 2015, 04:27 AM
RPGMaster RPGMaster is offline
Alpha Tester
Project Supporter
Super Moderator
 
Join Date: Dec 2013
Posts: 2,008
Default

I see what you're saying. I could have worded it better. What I really meant to say is just optimizing the corner case handling. I basically do just enough to get the correct result, rather than implementing the complete algorithm when I don't have to.

So I was testing BattleTanx(U) (LLE audio & LLE gfx) with PJ64's RSP and got an error message saying "SP_DRAM_ADDR_REG not in RDRam space". So I put that same error check in your RSP and it also triggered. Kinda odd ;/ .

Last edited by RPGMaster; 29th January 2015 at 12:33 PM.
  #1187  
Old 29th January 2015, 02:18 PM
HatCat HatCat is offline
Alpha Tester
Project Supporter
Senior Member
 
Join Date: Feb 2007
Location: In my hat.
Posts: 16,236
Default

I might remove the corner-case handling anyway.

All VMULF is doing is signed-clamping. I just happen to only care about the corner case of -32768 * -32768 because it's the only possible way to overflow or underflow in mulf's particular case (confirmed also by zilmar's success at reverse-engineering the algorithm, but also just a mathematically inducible thing). So better to use the same signed-clamp method for all the functions. This way, if I break the signed clamp function, it affects VMULF as well as everything else and makes games easier to test.
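The single corner case can be checked with a short program. This is a sketch under an assumption: that VMULF's accumulator is 2 * vs * vt plus a rounding constant of 0x8000, with the result taken from bits 31..16. The function names are hypothetical.

```c
#include <stdint.h>

/* Assumed VMULF accumulator, computed in 64 bits so nothing overflows in C. */
int64_t vmulf_acc(int16_t vs, int16_t vt)
{
    return 2 * (int64_t)vs * (int64_t)vt + 0x8000;
}

/* The result is bits 31..16 of the accumulator, so a clamp is only needed
 * when the accumulator itself leaves the signed 32-bit range. */
int vmulf_needs_clamp(int16_t vs, int16_t vt)
{
    int64_t acc = vmulf_acc(vs, vt);
    return acc > INT32_MAX || acc < INT32_MIN;
}

/* Count the (vs, vt) pairs that need clamping. For a fixed vs the accumulator
 * is linear in vt, so if neither extreme of vt overflows, nothing between
 * them can; only rows with an overflowing extreme are scanned in full. */
long count_clamped_pairs(void)
{
    long count = 0;
    for (int vs = INT16_MIN; vs <= INT16_MAX; vs++) {
        if (!vmulf_needs_clamp((int16_t)vs, INT16_MIN) &&
            !vmulf_needs_clamp((int16_t)vs, INT16_MAX))
            continue;
        for (int vt = INT16_MIN; vt <= INT16_MAX; vt++)
            count += vmulf_needs_clamp((int16_t)vs, (int16_t)vt);
    }
    return count;
}
```

Under that assumption `count_clamped_pairs()` returns 1, and the one pair is vs = vt = -32768, where the accumulator is 2^31 + 0x8000.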

Quote:
Originally Posted by RPGMaster View Post
So I was testing BattleTanx(U) (LLE audio & LLE gfx) with PJ64's RSP and got an error message saying "SP_DRAM_ADDR_REG not in RDRam space". So I put that same error check in your RSP and it also triggered. Kinda odd ;/ .
I don't have an error message for SP DMA in my plugin, so by "triggered" I guess you mean that the same CP0 inputs for DMA are in my plugin as the ones resulting from the PJ64 RSP interpreter.

This is either an interpreter bug in PJ64's RSP plugin, or you're getting said SP DRAM address boundary errors as artifacts from using the recompiler and should just try testing using the CPU interpreter core instead.
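For context, a range check of the kind that error message implies might look roughly like this. It is a sketch under assumptions, not PJ64's actual code: `RDRAM_SIZE` assumes 8 MiB (Expansion Pak installed), and the function name and the idea of also checking the transfer length are illustrative.

```c
#include <stdint.h>

#define RDRAM_SIZE 0x800000u  /* assumption: 8 MiB with the Expansion Pak */

/* Returns 1 if a DMA of `length` bytes starting at `dram_addr`
 * stays entirely inside RDRAM, 0 otherwise. */
int sp_dma_in_rdram(uint32_t dram_addr, uint32_t length)
{
    if (dram_addr >= RDRAM_SIZE)
        return 0;
    /* Written as a subtraction so dram_addr + length cannot wrap around. */
    return length <= RDRAM_SIZE - dram_addr;
}
```

A check like this only reports the out-of-range inputs; it says nothing about whether the game or the emulator produced them.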
  #1188  
Old 23rd February 2015, 10:22 AM
RPGMaster RPGMaster is offline
Alpha Tester
Project Supporter
Super Moderator
 
Join Date: Dec 2013
Posts: 2,008
Default

I was testing Battle for Naboo with your latest RSP + cpu interpreter + modified pj64. I get "LLV Odd addr." when I get close to a certain enemy and start shooting. I'm curious whether there's a problem with the emulator or if the game actually does it intentionally. I know it's from gfx task, since I also got the message when HLE audio was enabled.

I was testing out the code exhalatio posted. The graphics look fine after the change. Maybe I'll try testing your rsp with old zilmar spec M64p.
  #1189  
Old 23rd February 2015, 03:05 PM
HatCat HatCat is offline
Alpha Tester
Project Supporter
Senior Member
 
Join Date: Feb 2007
Location: In my hat.
Posts: 16,236
Default

Quote:
Originally Posted by RPGMaster View Post
I was testing Battle for Naboo with your latest RSP + cpu interpreter + modified pj64. I get "LLV Odd addr." when I get close to a certain enemy and start shooting. I'm curious whether there's a problem with the emulator or if the game actually does it intentionally.
If it was a problem with the main CPU emulator, then most likely other parts of the ICACHE would get corrupted too, rather than just that one instruction ending up as an unaligned or illegal LLV. So it most likely was intended on the RSP.

Would always help to be able to test for sure with a saved state though so that I could have it implemented on my own end.
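For what it's worth, an odd LLV address needs no special casing if the 32-bit load is done a byte at a time. A minimal sketch, assuming a 4 KiB DMEM whose addresses wrap at the end and plain big-endian byte order in the buffer; `dmem_read_u32` is a hypothetical helper, not code from either plugin.

```c
#include <stdint.h>

#define DMEM_SIZE 0x1000u  /* 4 KiB of RSP data memory */

/* Read 32 bits from DMEM at any address, aligned or not.
 * Bytes are combined big-endian; addresses wrap at the end of DMEM (assumed). */
uint32_t dmem_read_u32(const uint8_t *dmem, uint32_t addr)
{
    uint32_t value = 0;
    for (unsigned i = 0; i < 4; i++)
        value = (value << 8) | dmem[(addr + i) & (DMEM_SIZE - 1)];
    return value;
}
```

Emulators that keep DMEM byte-swapped in host order would need to adjust the index accordingly; that detail is left out here.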
  #1190  
Old 23rd February 2015, 09:08 PM
RPGMaster RPGMaster is offline
Alpha Tester
Project Supporter
Super Moderator
 
Join Date: Dec 2013
Posts: 2,008
Default

Quote:
Originally Posted by HatCat View Post
If it was a problem with the main CPU emulator, then most likely other parts of the ICACHE would get corrupted, not just that one single instruction to be unaligned or illegal LLV. So it most likely was intended on the RSP.

Would always help to be able to test for sure with a saved state though so that I could have it implemented on my own end.
Ok, made a save state. All you have to do is keep shooting lasers (just hold B) and it eventually pops up.

https://www.dropbox.com/s/em4mutzttd...29.pj.zip?dl=0