  #901  
Old 12th August 2014, 10:46 PM
Melchior's Avatar
Melchior Melchior is offline
Alpha Tester
Project Supporter
Super Moderator
 
Join Date: Apr 2007
Location: NH, USA
Posts: 230
Default

Quote:
Originally Posted by RPGMaster View Post
For some reason, I'll get stuck on something. Then after taking a break, things all of a sudden start making more sense ;/. I must sound crazy though, with all the things I've said I would try to do.
Not at all, I know the feeling, like a nice cozy nap ZZZZzzzzzz ^_^Zzzzzz
__________________
(PC Specs)
CPU: AMD FX-9590 4.7GHz 8-core
CPU Instructions: MMX, SSE1-4
Motherboard: Asus SABERTOOTH 990FX R2.0
GPU: nVidia GTX 1070 Ti 8GB
GFX Drivers: Nvidia v441.66
OS: Windows 7 Ultimate 64-bit SP1
RAM: 32GB Kingston 1866MHz DDR3

Favorite Emulators:
PS2 : PCSX2 (Auto-Builds)
SNES : ZSNES

My PJ64 setup:
EXE : v2.4.0.1114
GFX : Project64-Video (v2.2.0.1114)
SPU : AziAudioNEW (v0.70)(2017-09-14)
INPUT : NRage(v2.5.3.1114)
RSP : RSP (v1.7.4.1114)
  #902  
Old 14th August 2014, 03:37 AM
RPGMaster's Avatar
RPGMaster RPGMaster is offline
Alpha Tester
Project Supporter
Super Moderator
 
Join Date: Dec 2013
Posts: 2,008
Default

Lol, now I'm going through a benchmarking phase. All I feel like doing right now is learning how to properly benchmark code and practicing algorithms & intrinsics! That being said, sometime later on I might try learning from this RSP source.

I'm curious about something though. I've had instances where benchmarking code written in C was inaccurate. The compiler ended up changing how many times it looped, and I got fooled by it. Have you ever run across an issue like this? I'll probably end up writing assembly code for benchmarks tbh, due to my past issues.

Quote:
Originally Posted by Melchior View Post
Not at all, I know the feeling, like a nice cozy nap ZZZZzzzzzz ^_^Zzzzzz
I feel better now.
  #903  
Old 14th August 2014, 07:22 PM
HatCat's Avatar
HatCat HatCat is offline
Alpha Tester
Project Supporter
Senior Member
 
Join Date: Feb 2007
Location: In my hat.
Posts: 16,236
Default

I think he was talking about one of those cozy marijuana naps.

Also, I read your question, but couldn't follow.
  #904  
Old 14th August 2014, 08:05 PM
RPGMaster's Avatar
RPGMaster RPGMaster is offline
Alpha Tester
Project Supporter
Super Moderator
 
Join Date: Dec 2013
Posts: 2,008
Default

What I was basically saying was that one time I was benchmarking code, and one function had a loop that was incrementing by 4 instead of 1 for some reason. I'm pretty sure it was the compiler's fault, because I used the same type of loop in both functions I was comparing.

So ever since then, I've been writing the benchmark code in assembly. I usually just copy-pasted the assembly output of the 2 pieces of C code, so it wasn't too bad. Now I know that it's better to just use a benchmark program in general cases. I'll only write my own benchmark code when the differences are very small, because I believe it's more accurate.
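
A loop counter stepping by 4 instead of 1 is consistent with the optimizer unrolling or vectorizing the loop, though nothing in the thread confirms the cause. A minimal sketch of one way to keep a C benchmark honest without dropping to assembly: call the function through a volatile pointer so the call cannot be inlined or hoisted, and store each result to a volatile object so it cannot be discarded. The names `sum_u32`, `time_calls`, and `sink` are made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

typedef uint32_t (*bench_fn)(const uint32_t *, size_t);

/* Hypothetical function under test: sums an array of 32-bit values. */
static uint32_t sum_u32(const uint32_t *v, size_t n)
{
    uint32_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += v[i];
    return acc;
}

/* Every store to a volatile object must really happen, so the
   compiler cannot throw away the results of the timed calls. */
static volatile uint32_t sink;

/* Returns the CPU time, in seconds, spent making `calls` calls. */
static double time_calls(bench_fn fn, const uint32_t *v, size_t n, int calls)
{
    /* Reading the pointer from a volatile object on each iteration
       keeps the compiler from inlining the call or moving it out of
       the loop, so the loop runs exactly `calls` times. */
    bench_fn volatile opaque = fn;

    assert(calls > 0);
    clock_t start = clock();
    for (int i = 0; i < calls; i++)
        sink = opaque(v, n);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

The compiler may still unroll or vectorize the loop inside `sum_u32` itself, but that is the code being measured; what the sketch protects is the outer timing loop.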
  #905  
Old 15th August 2014, 02:54 AM
Melchior's Avatar
Melchior Melchior is offline
Alpha Tester
Project Supporter
Super Moderator
 
Join Date: Apr 2007
Location: NH, USA
Posts: 230
Default

Quote:
Originally Posted by HatCat View Post
I think he was talking about one of those cozy marijuana naps.

Also, I read your question, but couldn't follow.

"Not at all, I know the feeling, like a nice cozy nap ZZZZzzzzzz ^_^Zzzzzz"
I know how he/you are feeling... or maybe "that feeling"

lol...
  #906  
Old 15th August 2014, 11:46 PM
RPGMaster's Avatar
RPGMaster RPGMaster is offline
Alpha Tester
Project Supporter
Super Moderator
 
Join Date: Dec 2013
Posts: 2,008
Default

So while profiling, I noticed some interpreter functions being used in Ziggy's PJ64 RSP in recompiler mode. That explains my confusion as to why LLE gfx was faster with Ziggy's PJ64 RSP than with PJ64 1.6's RSP (and I think 2.1's too), yet slower in LLE audio. Ziggy basically disabled a few more recompiler functions than the original 1.4 RSP, so that's why LLE audio was slower.

I looked at PJ64 2.1's RSP source and noticed it also had an incomplete recompiler. I'm happy though because that means there's more room for improvement than I thought!
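
A toy model of what an incomplete recompiler can look like, to show why interpreter functions still appear in a profile taken in recompiler mode. This is not taken from any PJ64 RSP source; the opcodes, tables, and the `select_handler` helper are all invented for the illustration, and `fast_add` merely stands in for natively generated code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy opcodes, not real RSP instructions. */
enum { OP_ADD, OP_SUB, OP_MUL, OP_COUNT };

typedef struct { int32_t acc; } ToyState;
typedef void (*op_fn)(ToyState *, int32_t);

/* Interpreter handlers: one per opcode, always available. */
static void interp_add(ToyState *s, int32_t x) { s->acc += x; }
static void interp_sub(ToyState *s, int32_t x) { s->acc -= x; }
static void interp_mul(ToyState *s, int32_t x) { s->acc *= x; }

/* Stand-in for a handler the recompiler can emit natively. */
static void fast_add(ToyState *s, int32_t x) { s->acc += x; }

static const op_fn interpreter_table[OP_COUNT] = {
    interp_add, interp_sub, interp_mul
};

/* NULL marks an opcode whose recompiled version is missing or disabled. */
static const op_fn recompiler_table[OP_COUNT] = {
    fast_add, NULL, NULL
};

/* Counts how often the recompiler had to use the interpreter. */
static int fallback_count;

/* Prefer the recompiled handler; otherwise fall back to the interpreter. */
static op_fn select_handler(int op)
{
    assert(op >= 0 && op < OP_COUNT);
    if (recompiler_table[op] != NULL)
        return recompiler_table[op];
    fallback_count++;
    return interpreter_table[op];
}
```

In this model, every extra `NULL` in `recompiler_table` sends more work through the slower interpreter path, which is the kind of effect described above for LLE audio.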
  #907  
Old 18th August 2014, 06:37 AM
RPGMaster's Avatar
RPGMaster RPGMaster is offline
Alpha Tester
Project Supporter
Super Moderator
 
Join Date: Dec 2013
Posts: 2,008
Default

I've been studying SSE and RSP a lot lately. It's been fun benchmarking and learning! I realized how inaccurate it is to just use a loop for benchmarks. It also turns out that, for me, using SSE sometimes doesn't even give a speedup.
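
One common way to make a simple timing loop less misleading is to repeat the whole measurement several times and compare the minimum and median instead of trusting a single run. A minimal sketch of that summary step, assuming the timings have already been collected into an array; `BenchSummary` and `summarize` are hypothetical names, not part of any benchmark tool mentioned in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct {
    double min;     /* fastest run: least disturbed by other activity */
    double median;  /* typical run: robust against a few slow outliers */
} BenchSummary;

/* qsort comparator for doubles, ascending order. */
static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a;
    double y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Sorts `samples` in place and reports the minimum and the median. */
static BenchSummary summarize(double *samples, size_t n)
{
    BenchSummary s;

    assert(n > 0);
    qsort(samples, n, sizeof samples[0], cmp_double);
    s.min = samples[0];
    if (n % 2 == 1)
        s.median = samples[n / 2];
    else
        s.median = (samples[n / 2 - 1] + samples[n / 2]) / 2.0;
    return s;
}
```

If two versions of a function differ by less than the gap between a version's own minimum and median, the measured difference is probably noise.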

Anyway, I've also been using assembly more often again, and I'm starting to like it a lot. I guess I'm just one of those people who need breaks after overdoing something. I'm curious where using assembly in your RSP would be advantageous. Would you mind giving me an example?

I'm getting closer to understanding how recompilers work. I think I'm also going to look into static recompilation. I need to find out how the RSP instruction sequence works. Man, multitasking and taking breaks is amazing. I feel like I understand a lot more now. Lol, just the other day I realized that I can choose which type of RSP instructions I want to debug by changing the RSP settings for Audio and Gfx. Before, I used to just use LLE for both when debugging. Now I select HLE for the type I'm not interested in looking at.

Last edited by RPGMaster; 18th August 2014 at 11:04 AM.
  #908  
Old 18th August 2014, 06:55 PM
HatCat's Avatar
HatCat HatCat is offline
Alpha Tester
Project Supporter
Senior Member
 
Join Date: Feb 2007
Location: In my hat.
Posts: 16,236
Default

No real advantage to it, just something I thought of doing over intrinsics.

The only thing I'd really have to do in asm is optimized SSE2 shuffling; that's about it.
  #909  
Old 18th August 2014, 10:23 PM
RPGMaster's Avatar
RPGMaster RPGMaster is offline
Alpha Tester
Project Supporter
Super Moderator
 
Join Date: Dec 2013
Posts: 2,008
Default

I guess it's still worth doing in assembly then. I think I know why I'm liking assembly more again. It's probably due to my frustration with compilers. I know I probably nitpick more than I should, but I just hate it when I see the compiler doing things poorly. I'll need to test out GCC more often.

I wonder if self modifying code is worth implementing for the SSE2 shuffle. I might give it another try.

My opinion of intrinsics has changed ever since I saw that they ruined auto-vectorization with Clang ;/. I'm not sure when I'd use them anymore. Now I know that I'll have to either rely solely on auto-vectorization or do everything with intrinsics in functions where mixing the 2 doesn't work well. I should probably start benchmarking before I get too picky about compiler output, just in case my "fixes" don't make a significant difference. Good thing my optimizations usually lead to a smaller binary size though.

Last edited by RPGMaster; 21st August 2014 at 02:12 AM.
  #910  
Old 22nd August 2014, 12:41 PM
MarathonMan's Avatar
MarathonMan MarathonMan is offline
Alpha Tester
Project Supporter
Senior Member
 
Join Date: Jan 2013
Posts: 454
Default

Quote:
Originally Posted by RPGMaster View Post
I wonder if self modifying code is worth implementing for the SSE2 shuffle. I might give it another try.
If you get some results on this, I'd be interested in hearing how it goes. I've always wondered how SMC is handled in modern x86 pipelines, given that, as you probably know, the ISA says that you don't need to explicitly flush cache lines or anything of that nature.

If the penalty/overhead associated with invalidating internal CPU structures after SMC is detected is low enough, the SSE2 shuffle technique should result in enough reduced overhead to realize noticeable performance improvements.
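
For context on where the overhead comes from: the SSE2 shuffle instructions take their selector as an immediate byte encoded in the instruction, so a shuffle chosen at run time needs either a branch over precompiled variants or code that patches the immediate. A minimal sketch of the branching variant, assuming the goal is to copy one 16-bit lane into all eight lanes; `broadcast_lane` is a hypothetical helper and is not taken from HatCat's plugin. A scalar fallback is included for builds without SSE2.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>
#define HAVE_SSE2 1
#else
#define HAVE_SSE2 0
#endif

/* Copies 16-bit lane `e` (0..7) of `src` into all eight lanes of `dst`. */
static void broadcast_lane(int16_t dst[8], const int16_t src[8], int e)
{
    assert(e >= 0 && e < 8);
#if HAVE_SSE2
    __m128i v = _mm_loadu_si128((const __m128i *)src);

    /* Each selector must be a compile-time constant, hence one case
       per lane. First fill the low (or high) four words with the
       chosen word, then replicate that 32-bit pair everywhere. */
    switch (e) {
    case 0: v = _mm_shuffle_epi32(_mm_shufflelo_epi16(v, 0x00), 0x00); break;
    case 1: v = _mm_shuffle_epi32(_mm_shufflelo_epi16(v, 0x55), 0x00); break;
    case 2: v = _mm_shuffle_epi32(_mm_shufflelo_epi16(v, 0xAA), 0x00); break;
    case 3: v = _mm_shuffle_epi32(_mm_shufflelo_epi16(v, 0xFF), 0x00); break;
    case 4: v = _mm_shuffle_epi32(_mm_shufflehi_epi16(v, 0x00), 0xFF); break;
    case 5: v = _mm_shuffle_epi32(_mm_shufflehi_epi16(v, 0x55), 0xFF); break;
    case 6: v = _mm_shuffle_epi32(_mm_shufflehi_epi16(v, 0xAA), 0xFF); break;
    default: v = _mm_shuffle_epi32(_mm_shufflehi_epi16(v, 0xFF), 0xFF); break;
    }
    _mm_storeu_si128((__m128i *)dst, v);
#else
    /* Portable fallback: read the lane first so dst may alias src. */
    int16_t x = src[e];
    for (int i = 0; i < 8; i++)
        dst[i] = x;
#endif
}
```

The `switch` is the overhead that self-modifying code would try to remove by writing the selector byte directly into a single shuffle instruction; whether that wins depends on the SMC penalty asked about above.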
All times are GMT. The time now is 08:05 PM.

