|
#901
|
||||
|
||||
Quote:
__________________
(PC Specs) CPU: AMD FX-9590 4.7GHz 8-core CPU Instructions: MMX, SSE1-4 Motherboard: Asus SABERTOOTH 990FX R2.0 GPU: nVidia GTX 1070 Ti 8GB GFX Drivers: Nvidia v441.66 OS: Windows 7 Ultimate 64-bit SP1 RAM: 32GB Kingston 1866MHz DDR3 Favorite Emulators: PS2 : PCSX2 (Auto-Builds) SNES : ZSNES My PJ64 setup: EXE : v2.4.0.1114 GFX : Project64-Video (v2.2.0.1114) SPU : AziAudioNEW (v0.70)(2017-09-14) INPUT : NRage(v2.5.3.1114) RSP : RSP (v1.7.4.1114) |
#902
|
||||
|
||||
Lol, now I'm going through a benchmarking phase.
I'm curious about something, though. I've had instances where benchmark code written in C was inaccurate. The compiler ended up changing how many times it looped, and I got fooled by it. Quote:
|
#903
|
||||
|
||||
![]()
I think he was talking about one of those cozy marijuana naps.
Also, I read your question, but couldn't follow.
__________________
http://theoatmeal.com/comics/cat_vs_internet |
#904
|
||||
|
||||
What I was basically saying was that one time I was benchmarking code, and one function had a loop that was incrementing by 4 instead of 1 for some reason. I'm pretty sure it was the compiler's doing, because I used the same type of loop in both functions I was comparing.
So ever since then, I've been writing the benchmark code in assembly. I usually just copy-pasted the assembly output of the two pieces of C code, so it wasn't too bad. Now I know that it's better to just use a benchmark program in general cases. I'll only write my own benchmark code when the differences are very small, because I believe it's more accurate in that case. |
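For anyone who wants to stay in C for this, here is a minimal sketch of a timing loop where the optimizer can't quietly drop work. A loop stepping by 4 is what unrolling or vectorizing usually looks like in the output, which may be what happened above (that's a guess, since the original code isn't shown). The names `work`, `sink`, and `bench` are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical function under test; stands in for whatever is being timed. */
static uint32_t work(uint32_t x)
{
    return x * 2654435761u + 1u;
}

/* Volatile sink: the compiler must perform every store to it, so it cannot
 * drop iterations or collapse the loop into a constant. It may still unroll
 * the loop, but the amount of work per run stays the same. */
static volatile uint32_t sink;

/* Calls work() exactly `iterations` times and returns elapsed CPU seconds. */
static double bench(size_t iterations)
{
    assert(iterations > 0);
    clock_t start = clock();
    for (size_t i = 0; i < iterations; i++) {
        sink = work((uint32_t)i);
    }
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling `bench` with the same iteration count for both functions being compared keeps the comparison fair even if the compiler unrolls one loop and not the other. It's still worth checking the generated assembly when the difference is small.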
#905
|
||||
|
||||
Quote:
"not at all I know the feeling, like a nice cozzy nap ZZZZzzzzzz ^_^Zzzzzz " I know how he/you are feeling... or maybe "that feeling" lol...
|
#906
|
||||
|
||||
So while profiling, I noticed some interpreter functions being used in Ziggy's PJ64 RSP in recompiler mode. That explains my confusion as to why LLE gfx was faster with Ziggy's PJ64 RSP than with PJ64 1.6's RSP (and, I think, 2.1's too), yet slower in LLE audio. Ziggy basically disabled a few more recompiler functions than the original 1.4 RSP, so that's why LLE audio was slower.
I looked at PJ64 2.1's RSP source and noticed it also had an incomplete recompiler. I'm happy, though, because that means there's more room for improvement than I thought! |
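To picture what an "incomplete recompiler" means here, below is a rough sketch of a dispatcher that uses a native path when one exists and falls back to the interpreter otherwise. It is not taken from the PJ64 RSP source: the three-opcode machine, the `Cpu` struct, and every function name are hypothetical, and the "native" handlers are plain C functions standing in for emitted code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical three-opcode machine, not the real RSP instruction set. */
enum { OP_ADD, OP_SUB, OP_MUL, OP_COUNT };

typedef struct { int acc; } Cpu;
typedef void (*Handler)(Cpu *cpu, int operand);

/* Interpreter handlers: always available, one per opcode. */
static void interp_add(Cpu *c, int v) { c->acc += v; }
static void interp_sub(Cpu *c, int v) { c->acc -= v; }
static void interp_mul(Cpu *c, int v) { c->acc *= v; }

/* Stand-ins for natively emitted code. */
static void native_add(Cpu *c, int v) { c->acc += v; }
static void native_sub(Cpu *c, int v) { c->acc -= v; }

static const Handler interp_table[OP_COUNT] = { interp_add, interp_sub, interp_mul };
/* NULL marks an opcode whose recompiler path is missing or disabled. */
static const Handler native_table[OP_COUNT] = { native_add, native_sub, NULL };

/* Returns the handler for one opcode and counts interpreter fallbacks. */
static Handler select_handler(int op, size_t *fallbacks)
{
    assert(op >= 0 && op < OP_COUNT);
    if (native_table[op] != NULL)
        return native_table[op];
    (*fallbacks)++;
    return interp_table[op];
}

/* Runs a program and returns how many instructions used the interpreter. */
static size_t run(Cpu *cpu, const int *ops, const int *operands, size_t n)
{
    size_t fallbacks = 0;
    for (size_t i = 0; i < n; i++)
        select_handler(ops[i], &fallbacks)(cpu, operands[i]);
    return fallbacks;
}
```

A workload that leans on the disabled opcodes pays the fallback cost on every one of them, which fits the pattern of one kind of task (gfx) getting faster while another (audio) gets slower.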
#907
|
||||
|
||||
I've been studying SSE and RSP a lot lately. It's been fun benchmarking and learning! I realized how inaccurate it is to just use a loop for benchmarks.
Anyway, I've also been using assembly more often again, and I'm starting to like it a lot. I guess I'm just one of those people who need breaks after overdoing something.
I'm getting closer to understanding how recompilers work. I think I'm also going to look into static recompilation. I need to find out how the RSP instruction sequence works.
Man, multitasking and taking breaks is amazing. I feel like I understand a lot more now. Lol, just the other day I realized that I can choose which type of RSP instructions I want to debug by changing the RSP settings for Audio and Gfx.
Last edited by RPGMaster; 18th August 2014 at 11:04 AM. |
#908
|
||||
|
||||
No real advantage to it, just something I thought of doing over intrinsics.
The only thing I'd really have to do in asm is optimized SSE2 shuffling; that's about it.
|
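For comparison, this is roughly what an SSE2-only shuffle looks like with intrinsics. SSE2 has no single instruction for an arbitrary 16-bit shuffle, so one permutation can take several steps. The sketch reverses eight 16-bit lanes in three shuffles. The function name `reverse_lanes16` is made up, and the choice of permutation is only an example.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Reverses the eight 16-bit lanes of `in` into `out` using SSE2 only. */
static void reverse_lanes16(const int16_t in[8], int16_t out[8])
{
    assert(in != NULL && out != NULL);
    __m128i v = _mm_loadu_si128((const __m128i *)in);
    /* Reverse the four low words: e3 e2 e1 e0 | e4 e5 e6 e7 */
    v = _mm_shufflelo_epi16(v, _MM_SHUFFLE(0, 1, 2, 3));
    /* Reverse the four high words: e3 e2 e1 e0 | e7 e6 e5 e4 */
    v = _mm_shufflehi_epi16(v, _MM_SHUFFLE(0, 1, 2, 3));
    /* Swap the two 64-bit halves: e7 e6 e5 e4 | e3 e2 e1 e0 */
    v = _mm_shuffle_epi32(v, _MM_SHUFFLE(1, 0, 3, 2));
    _mm_storeu_si128((__m128i *)out, v);
}
```

Whether a compiler turns this into the same three instructions a hand-written asm version would use is something to check in the disassembly rather than assume.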
#909
|
||||
|
||||
I guess it's still worth doing in assembly then.
I wonder if self-modifying code is worth implementing for the SSE2 shuffle. I might give it another try.
My opinion of intrinsics changed once I saw that they ruined auto-vectorization with Clang ;/ . Not sure when I'd use them anymore. Now I know that I'll have to either rely solely on auto-vectorization or write everything with intrinsics in functions where mixing the two doesn't work well.
I should probably start benchmarking before I get too picky about compiler output.
Last edited by RPGMaster; 21st August 2014 at 02:12 AM. |
#910
|
||||
|
||||
Quote:
If the penalty/overhead associated with invalidating internal CPU structures after self-modifying code (SMC) is detected is low enough, the SSE2 shuffle technique should reduce overhead enough to yield noticeable performance improvements. |
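Some background on why SMC comes up for shuffles at all: PSHUFD takes its selector as an immediate byte baked into the instruction, so a selector known only at run time needs either a patched instruction or a branch to one of several fixed instructions. Below is a minimal sketch of the branch version, assuming a simple "broadcast one 32-bit lane" case. The function name `broadcast_lane` is hypothetical, and this is not the technique from the posts above, only the plain alternative it would be measured against.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Broadcasts 32-bit lane `lane` (0..3) of `in` to all four lanes of `out`.
 * Each case is a separate shuffle with its own constant selector, because
 * _mm_shuffle_epi32 requires a compile-time immediate. */
static void broadcast_lane(const int32_t in[4], int lane, int32_t out[4])
{
    assert(lane >= 0 && lane < 4);
    __m128i v = _mm_loadu_si128((const __m128i *)in);
    __m128i r;
    switch (lane) {
    case 0:  r = _mm_shuffle_epi32(v, _MM_SHUFFLE(0, 0, 0, 0)); break;
    case 1:  r = _mm_shuffle_epi32(v, _MM_SHUFFLE(1, 1, 1, 1)); break;
    case 2:  r = _mm_shuffle_epi32(v, _MM_SHUFFLE(2, 2, 2, 2)); break;
    default: r = _mm_shuffle_epi32(v, _MM_SHUFFLE(3, 3, 3, 3)); break;
    }
    _mm_storeu_si128((__m128i *)out, r);
}
```

The switch costs a branch per shuffle, while patching the immediate avoids the branch but triggers the invalidation penalty mentioned above. Which one wins would have to be benchmarked on the target CPU.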