  #951  
Old 30th August 2014, 01:43 AM
HatCat's Avatar
HatCat HatCat is offline
Alpha Tester
Project Supporter
Senior Member
 
Join Date: Feb 2007
Location: In my hat.
Posts: 16,236
Default

Quote:
Originally Posted by MarathonMan View Post
That comment was made high in the post due to the fact that our two functions, do_abs and RSP_VABS, both take the similar input operands -- post-shuffled arguments, etc.
They do not take similar input operands at all. >.<
*sigh*, why do I even bother.

Quote:
Originally Posted by MarathonMan View Post
So, on that basis, I said 'sue me' for trying to be "deceptive" as I pasted the entirety of my function which carries out the same task and produces the same output, including the ret.
The ret still has nothing to do with this. I never said anything about a ret, nor you missing one. Nor is the task the same, due to old crap functional code I wrote.

Quote:
Originally Posted by MarathonMan View Post
Also note that in debates, it's informal and classless to label someone based on their medical condition, especially when that condition is of no fault of their own.
I didn't label you based on medical condition. How would I know your actual medical diagnosis?

I'm observing the trouble you seem to have with skipping over relevant parts of posts, which has always been a problem. Now if you're convinced that for some strange reason, I *know* what your medical diagnosis is, notice that I didn't say you're diagnosed in so-and-so a manner. Probably what I should have said was "being ADHD". It's perfectly valid for a normal person to be an ass and the other way around.

Quote:
Originally Posted by MarathonMan View Post
I don't care what you implemented then. My response was to the function that RPGMaster has posted... duh?
Clearly that's not accurate because your "response" actually was this in fact:
"If you had used intrinsics, you could have just seen it from the getgo instead of having to refactor your entire CP2 codebase and you wouldn't have the "mess" of SSE2 instructions that you have now. Then I would have never even responded."

If you don't care what I implemented, why is it true that my implementation using intrinsics would have caused you to never respond?

It's fine to have advice for people, but when you already know someone is averse to it, it really just starts to become trolling. Notice how I don't pop on CEN64 forums just to talk to everyone there but you about things you're not doing that I think you should, when I know you're opposed to them. And if I keep posting about it in threads (MarathonMan's not using ANSI or zilmar specs!) to various users, I imagine that you as the owner of said forum would eventually find that worth moderating.

For someone who's so sure that I "kept arrogantly assuming" (as you said on IRC after cxd4 /quit) that you were interested in adopting my RCP code, I find it interesting that I haven't bitched you out about one single thing of how you personally do things, yet you usually only post on this forum about things where you disagree with how I'm doing it. Poor you; I'm sure you must feel on the defensive.

Quote:
Originally Posted by MarathonMan View Post
You should take your own advice and not "construct accusations in the other person's mouth":
- do_abs, as posted, doesn't shuffle either.
- I even mentioned in my original post... "This is after the load and shuffle."
- my function, as posted, also did writeback of the low part of the accumulator as posted (just not vd).
It may have seemed safe to "assume" RPGMaster would read it that way, but clearly that's not how he read it, as you may see from his responses. He thought the shorter code was simply a result of using AVX support, like he said. It was really because you used a different algorithm, and algorithm is the key!

It's hard to feel butthurt when you actually know you're right, especially if year-old code is the best example you have to criticize. I've been posting modern examples of ANSI C loops all over my RDP/angrylion thread and how that perfectly compiled to optimized SSE2 instructions; why didn't you participate in an argument then when I made those posts? Why only now and not then do you bring it up? Because at the time, I was right and you were wrong. In this case, you're right and I'm wrong, because shitty usage of ANSI C loops from my 1-year-less-experienced-self in old RSP code from back then has given you a false hope of proving that intrinsics always guarantee beating out portable code.

That being said, you're right that you commented in your first post, "this is after the load and shuffle." Apparently I missed that part, or possibly forgot. See, maybe I have ADHD. Hell maybe everyone does. To me, it's more an observation.

Quote:
Originally Posted by MarathonMan View Post
And my code was from over 6 months ago. Do you want a cookie or something? Milk to go with it? Do you get bonus points for arguing with older code?
lol were you talking to yourself??

I hardly argued with your code. All of this was in defense of what I didn't know back then about SSE when I wrote that year-old RSP code. None of it was about *your* code, except that time I proved to RPGMaster that auto-vectorization in part of my method potentially beat out a part of your intrinsics code.

I'm telling you that the shit that is those ANSI C loops right now just ain't optimized, and all you can keep doing is bashing that as the reason why it supposedly can't result in as good a thing as intrinsics, even though I've been posting ANSI C loops way more recently than those in my RDP thread that DO equate to the output of intrinsics. You're the one arguing with older code here; get the story straight!

Quote:
Originally Posted by MarathonMan View Post
I must be the devil! I agreed the affirmative:
So now you're saying you're agreeing "the affirmative" with our observation that Clang's vectorized output is worse than GCC's?

If that's true, then why did you say this:

"Clang's IR all the way up to codegen looks awesome from an auto-vectorization standpoint. I'd be surprised if GCC's was any better;"

Also, why did you post GCC output with absolutely no vectorization at all, if you think Clang's is worse, when it at least vectorizes?

Quote:
Originally Posted by MarathonMan View Post
I love these terms that you come up with to make yourself sound smart. "ansi auto-vectorization" is definitely a good one, thanks for the laugh. Is there a standards committee for auto-vectorization? It's just "auto-vectorization", buddy.
?? How is saying "ANSI" auto-vectorization, instead of auto-vectorization, trying to make myself "sound smart"? I'm simply referring to the technique of using ANSI C code to vectorize. Hell, even "auto-vectorize" itself was just a term I made up (dunno if it's official, but you seem to be using it too). I didn't make it up just to sound smart! It's just what I think it is.

But whatever, I understand you feel so wrongly accused on a number of things that it's only right to seek revenge.

Quote:
Originally Posted by MarathonMan View Post
Maybe I've been doing vast amounts of research in auto-vectorization?
No you haven't, and yes I am one to judge.

Quote:
Originally Posted by MarathonMan View Post
I prefer vendor lock-in over assuming capabilities are present in the host compiler. I agree it'd be nice if there was a standardized way of doing all this.
So far RPGMaster/myself have found four compilers:
GCC, MSVC, Clang and Intel compiler

I don't really know that many compilers, probably not compared to you at least. What's an optimizing compiler you trust in the competition for best output that DOESN'T have the capability to auto-vectorize?

Is my RSP VABS emulator a valid reason why intrinsic functions for SSE should be minimized/avoided? No, but that doesn't mean it proves every fucking example is invalid. If realizing that wouldn't have prevented the length of this argument, then I can't guess what it is for you. Well, aside from some political obsession.

Quote:
Originally Posted by MarathonMan View Post
Mmmm... not so much. Again, see what I said about passing data and whatnot around above. I also like the reassurance that my compiler is going to spew out vector instructions regardless of flags passed or what have you.
Using intrinsics does not excuse not passing -msse2 to the compiler.
Even if you fully use intrinsics everywhere you can think of, it's still a good idea to pass -msse2 and other relevant flags to signify that this is the limit, and help vectorize maybe other things you might have missed.

If you really liked the reassurance that your compiler is going to always spew out optimized code, regardless of what flags passed, then don't pass -O3 to gcc. Pass -O0 and code in 1990's "I'm better than my compiler" mode.

Quote:
Originally Posted by MarathonMan View Post
Doesn't matter what you didn't or did know, it's what you did.
It's what I did over a year ago, jesus.

Think about it. You're bashing the way I wrote my non-intrinsic code from back then. You even accused me of willingly passing (int vd, int vs, int vt), when really I told you I had no idea back then that passing __m128i's would have avoided pushes or pops and been superior to that.
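To make the calling-convention point concrete, here is a minimal sketch of the two styles. It is not the code from either project: `vr`, `v8s16`, `vabs_lane`, `vabs_by_index` and `vabs_by_value` are hypothetical names, `vector_size` is a GCC/Clang extension rather than ANSI C, and the lane rule (zero when vs is zero, vt when vs is positive, negated and saturated vt when vs is negative) is an assumption about VABS written from memory, with accumulator writeback left out.

```c
#include <stdint.h>

/* Hypothetical 8-lane vector type; vector_size is a GCC/Clang extension. */
typedef int16_t v8s16 __attribute__((vector_size(16)));

/* Hypothetical register file: 32 vector registers of 8 signed lanes. */
int16_t vr[32][8];

/* Assumed VABS lane rule: apply the sign of s to t, saturating -(-32768). */
static int16_t vabs_lane(int16_t s, int16_t t)
{
    if (s == 0)
        return 0;
    if (s > 0)
        return t;
    return (t == INT16_MIN) ? INT16_MAX : (int16_t)-t;
}

/* Style 1: pass register numbers; operands are read from and written to
   the register file in memory. */
void vabs_by_index(int vd, int vs, int vt)
{
    int i;

    for (i = 0; i < 8; i++)
        vr[vd][i] = vabs_lane(vr[vs][i], vr[vt][i]);
}

/* Style 2: pass and return the vectors by value, so the compiler is free
   to keep them in vector registers where the target ABI allows it. */
v8s16 vabs_by_value(v8s16 vs, v8s16 vt)
{
    v8s16 vd = {0};
    int i;

    for (i = 0; i < 8; i++)
        vd[i] = vabs_lane(vs[i], vt[i]);
    return vd;
}
```

Whether style 2 really avoids the pushes and pops depends on the compiler, the flags and the ABI, so the generated assembly is the thing to check.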

So yes, actually, in the case of being accused of knowing what I was doing while I was doing it, it DOES matter what I did or didn't know. You just don't think it does because you keep pointing out mistakes from the past of which I'm already aware.

Quote:
Originally Posted by MarathonMan View Post
Well! Since you asked, I'll compile that function from your trunk with a bleeding edge release of gcc (4.9.1) and -O2 -march=native:

**** see next post for code ****

Hm... looks like it didn't auto-vectorize. Nor did it with -march=native -O3 -ftree-vectorize.
So, even though RPGMaster was able to figure out how to get my VABS function to auto-vectorize on GCC 4.8.1, yet you weren't able to in GCC 4.9.1 on your setup, you somehow assume you're not forgetting something?

I have GCC 4.9.something (forget) too on that laptop you sent. It auto-vectorizes just fine. Dunno what you're missing.
  #952  
Old 30th August 2014, 02:17 AM
theboy181's Avatar
theboy181 theboy181 is offline
Alpha Tester
Project Supporter
Senior Member
 
Join Date: Aug 2014
Location: Prince Rupert,British Columbia Canada
Posts: 424
Default

I assume that this type of banter is common among coders, and that it actually helps the progression of the scene.

Anyhow, this is making a really good read, and I hope it continues.
  #953  
Old 30th August 2014, 02:26 AM
RPGMaster's Avatar
RPGMaster RPGMaster is offline
Alpha Tester
Project Supporter
Super Moderator
 
Join Date: Dec 2013
Posts: 2,008
Default

I knew that the argument formats were different. It's just that seeing the code made me realize that writing AVX code in assembly would probably be simpler. I'd like to emphasize the fact that I'm focused on making a recompiler, so the code gen is all done manually. Later on, I may try making multiple code paths, but I doubt that will be anytime soon. I'm fairly confident this sse2 one will satisfy my goals.

I admit I am surprised by that output MarathonMan posted. I'm guessing perhaps gcc isn't good with 64-bit? I don't see how you could mess up the compiler settings if you put -march=native . I'm pretty sure when I did -march=native, I saw better asm output, although it obviously wasn't limited to sse2 ;/ .

Clang can vectorize fine, still probably not as well as GCC, but it's certainly better than MSVC right now. The only reason it looked so bad in HatCat's fork is that, for some reason, using intrinsics inhibits that compiler's ability to auto-vectorize. When I disabled the intrinsics in Draw Triangle, the asm output looked much better in Clang than when it was mixed. So what I'll have to do for any project I decide to work on / make from scratch is have 2 different implementations. One will be intrinsics and the other will be pure ANSI C, because they don't seem to mix well. I wouldn't be surprised if other compilers have a problem with intrinsics interfering with auto-vectorization too.

I've gone through most of the vector instructions now. It's been pretty interesting looking at the output. There were functions where both GCC and Intel did weird stuff, so I had optimized them, myself.

Man I've been pretty much working like a machine these past few days. The one thing I'm worried about is whether it will even work. It will be a huge hassle figuring out bugs. Other than that, recompilers are very interesting.
  #954  
Old 30th August 2014, 02:28 AM
MarathonMan's Avatar
MarathonMan MarathonMan is offline
Alpha Tester
Project Supporter
Senior Member
 
Join Date: Jan 2013
Posts: 454
Default

Quote:
Originally Posted by HatCat View Post
It's fine to have advice for people, but when you already know someone is averse to it, it really just starts to become trolling. Notice how I don't pop on CEN64 forums just to talk to everyone there but you about things you're not doing that I think you should, when I know you're opposed to them.
Point taken.

Still bothers me to all hell I can't convince you of anything regardless of how much evidence I supply without getting lip, so regardless of the fact that I feel like I could respond strongly I'll just save both our efforts for development and ban/register myself from here. Peace.
  #955  
Old 30th August 2014, 02:59 AM
HatCat's Avatar
HatCat HatCat is offline
Alpha Tester
Project Supporter
Senior Member
 
Join Date: Feb 2007
Location: In my hat.
Posts: 16,236
Default

Well I adopted your SSSE3 shuffle code. I didn't understand pshufb all that well to begin with, so I looked at how you implemented it and even commented credit to you in my shuffle.h source. So when I build my RSP with -DARCH_MIN_SSSE3 and -mssse3, it uses those intrinsics instead.
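For anyone following along, a compile-time switch of that kind can be sketched roughly like this. It is not the actual shuffle.h: only the `ARCH_MIN_SSSE3` macro comes from the build flags above, `broadcast_lane` is a hypothetical name, and broadcasting one lane is just one kind of shuffle picked for illustration.

```c
#include <stdint.h>

#ifdef ARCH_MIN_SSSE3
#include <tmmintrin.h> /* SSSE3: _mm_shuffle_epi8 (pshufb) */
#endif

/* Broadcast 16-bit lane e (0..7) of src into all 8 lanes of dst. */
void broadcast_lane(int16_t dst[8], const int16_t src[8], unsigned e)
{
    e &= 7;
#ifdef ARCH_MIN_SSSE3
    {
        /* Every 16-bit mask lane holds the byte indices (2e+1, 2e), so
           pshufb copies that source byte pair into each destination lane. */
        __m128i v = _mm_loadu_si128((const __m128i *)src);
        __m128i mask = _mm_set1_epi16((short)(((2 * e + 1) << 8) | (2 * e)));

        _mm_storeu_si128((__m128i *)dst, _mm_shuffle_epi8(v, mask));
    }
#else
    {
        /* Portable fallback: read the lane first so dst may alias src. */
        int16_t lane = src[e];
        int i;

        for (i = 0; i < 8; i++)
            dst[i] = lane;
    }
#endif
}
```

Built without `-DARCH_MIN_SSSE3` only the loop is compiled; built with `-DARCH_MIN_SSSE3 -mssse3` the intrinsic path replaces it, and both paths should store the same eight values.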

There are plenty of other places where I had to use intrinsics (your movemask example being another one); I just want to keep it minimal/portable to non-Intel (like, on mips, for loops could compile to SGI/rsp VAND instruction, rather than sse2 pand!). The days where we both found fixes/corrections for RSP interpreter speed/accuracy were just so long ago that the future just isn't the past I guess. Either way, I'd never even heard of SSE before you brought it up, so like, try not to feel like you've never convinced me of anything before.
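As a minimal sketch of the portable style meant here (`vand_portable` and `N` are names made up for this example, not taken from the RSP source), a vector AND over 8 lanes needs nothing beyond a plain loop:

```c
#include <stdint.h>

#define N 8 /* lanes per RSP vector register */

/* Lane-wise AND written as a plain loop. An optimizing compiler may turn
   this into a single vector instruction (for example pand with SSE2
   enabled), but that depends on the compiler, its version and its flags. */
void vand_portable(int16_t vd[N], const int16_t vs[N], const int16_t vt[N])
{
    int i;

    for (i = 0; i < N; i++)
        vd[i] = (int16_t)(vs[i] & vt[i]);
}
```

Since the pointers may alias, a compiler can add an overlap check or decline to vectorize; looking at the assembly output is the only way to know what a given build actually produced.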

RPGMaster: I just got done reading that entire post and can't remember a single goddamn thing about what I just read. Something about working hard I guess.
  #956  
Old 8th September 2014, 01:10 AM
RPGMaster's Avatar
RPGMaster RPGMaster is offline
Alpha Tester
Project Supporter
Super Moderator
 
Join Date: Dec 2013
Posts: 2,008
Default

Rofl wow I goofed pretty bad. Some of the CP2 functions I implemented have some clear mistakes ;/ .

Taking it slow sure is better. I made a lot of silly mistakes when I was trying to do it fast. And I was staying up till 5 am working on this stuff for a few days.

Anyway, is there a convenient way to use different versions of GCC? Or will I have to constantly install and uninstall?

Hopefully soon I can get my code working ;/ .
  #957  
Old 8th September 2014, 04:09 AM
HatCat's Avatar
HatCat HatCat is offline
Alpha Tester
Project Supporter
Senior Member
 
Join Date: Feb 2007
Location: In my hat.
Posts: 16,236
Default

CP2 don't mean shit. MFC2/CTC2/?? V*?

Just start out small. Try doing something like LBV and SBV in recompiler, then try MFC2 and MTC2, including the 8-bit wrap-around possibility exclusive to MFC2.
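A rough sketch of that MFC2 wrap-around in plain C, before any recompiling, might look like the following. The byte layout is an assumption: `vr_bytes` and `mfc2_sketch` are hypothetical names, and the register is viewed as 16 bytes in big-endian (N64) order rather than however a real emulator stores it.

```c
#include <stdint.h>

/* Read 16 bits starting at byte element e of a vector register and
   sign-extend them to 32 bits, as MFC2 hands them to a scalar register. */
int32_t mfc2_sketch(const uint8_t vr_bytes[16], unsigned e)
{
    uint8_t hi = vr_bytes[e & 15];
    uint8_t lo = vr_bytes[(e + 1) & 15]; /* e == 15 wraps around to byte 0 */

    return (int16_t)(uint16_t)((hi << 8) | lo);
}
```

With e = 15 the high byte comes from the last byte of the register and the low byte from the first one, which is the case worth testing first.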

I just have different folders for different versions of GCC. $MinGW/libexec/bin/... tends to show you all the versions of GCC you have installed in folders, not just one.
  #958  
Old 8th September 2014, 04:20 AM
RPGMaster's Avatar
RPGMaster RPGMaster is offline
Alpha Tester
Project Supporter
Super Moderator
 
Join Date: Dec 2013
Posts: 2,008
Default

You're right. I should start with LBV and SBV, then finish all the other Lc2 and Sc2. I originally tried to cut corners but that failed miserably.

Alright, I guess I'll go ahead and try installing dif versions and do a comparison.
  #959  
Old 8th September 2014, 11:12 PM
RPGMaster's Avatar
RPGMaster RPGMaster is offline
Alpha Tester
Project Supporter
Super Moderator
 
Join Date: Dec 2013
Posts: 2,008
Default

So right now, I'm workin on LDV. Since it's a switch table, I'm wondering the best way to implement the jump table. Is there a good way to allocate a jump table, without using something like malloc? For the time being, I'm using a large static 2d array, but it will need to be changed in the future.

Lol first I was goin too fast, now I'm goin too slow ;/ . Time to pick up the pace !
  #960  
Old 9th September 2014, 03:55 AM
HatCat's Avatar
HatCat HatCat is offline
Alpha Tester
Project Supporter
Senior Member
 
Join Date: Feb 2007
Location: In my hat.
Posts: 16,236
Default

Quote:
Originally Posted by RPGMaster View Post
So right now, I'm workin on LDV. Since it's a switch table, I'm wondering the best way to implement the jump table. Is there a good way to allocate a jump table, without using something like malloc? For the time being, I'm using a large static 2d array, but it will need to be changed in the future.
Silly, don't generate a switch table. I don't even understand dynamic re-compilers that well, and even I know a table isn't needed here.

LDV is a static operation...loads 64 bits to a vector register.
I only made the interpreter implementation of it a switch table to optimize for all the possible alignments. If you do not assume an alignment (addr & 07), you must do 8 1-byte writes with constant endianness conversion of the byte address every time.

I never said to finish LWC2 and SWC2, just to practice on LBV and SBV and M*C2. Little-endian CPU makes writing 64 bits at once break accuracy with LDV.
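The alignment-agnostic version described above, 8 one-byte copies with the byte address converted every time, can be sketched like this. Several things here are assumptions for illustration: DMEM is taken to be 4 KiB kept as host-endian 32-bit words on a little-endian machine (hence the XOR with 3), the vector register is viewed as 16 big-endian bytes, and bytes that would run past the end of the register are simply dropped. `BES`, `DMEM_SIZE` and `ldv_sketch` are hypothetical names.

```c
#include <stdint.h>

#define DMEM_SIZE 0x1000u /* assumed 4 KiB of RSP data memory */
#define BES(a) ((a) ^ 3u) /* assumed byte-address swap on a little-endian host */

/* Load up to 8 bytes from DMEM into a vector register, one byte at a time,
   starting at byte element e. No alignment of addr is assumed. */
void ldv_sketch(uint8_t vr_bytes[16], unsigned e,
                const uint8_t dmem[DMEM_SIZE], uint32_t addr)
{
    unsigned i;

    for (i = 0; i < 8 && e + i < 16; i++)
        vr_bytes[e + i] = dmem[BES((addr + i) & (DMEM_SIZE - 1))];
}
```

A single 64-bit host load would only match this when the address alignment and the host byte order happen to cooperate, which is why the byte-wise form is the safe baseline to check a recompiled LDV against.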