#1081
24th October 2014, 09:44 AM
RPGMaster

Anyway, I decided to run some benchmarks out of curiosity. Gotta say I'm somewhat surprised about Conker. I guess it was hard for me to measure because the VI/s fluctuates like crazy in the intro. So I compiled your latest RSP and used your latest gfx plugin; here are the results, using No Audio with HLE audio enabled, refresh set to 2, and triangles set to 0. I left Conker running for about 3 minutes and 45 seconds, if that matters. For Kirby, I just loaded a savestate and stood still. For F-Zero, I loaded a savestate and played the first level.

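Conker: [benchmark screenshot not preserved]

Kirby: [benchmark screenshot not preserved]

F-Zero: [benchmark screenshot not preserved]
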
As for Kirby, I'm not surprised that it's using ~40%, since it's the game where I noticed the most significant difference. For Conker, it looks like it's heavy on both RDP and RSP. After some careful eyeballing, I can definitely confirm RSP makes a big difference. It's just harder to see because of all the VI/s fluctuation! As for F-Zero, no surprise at all.

Also, I tried manually using your config to enable HLE audio. Dunno what happened, but I couldn't get it working, so I just edited the source to always enable HLE audio.

Perhaps I should actually play Conker to see how important RSP is in gameplay. Intros aren't too important to me xD. Plus, they are just too hard to benchmark!

I wish I knew a good way to benchmark the recompiler lol. Honestly, I used to not like relying on triangle skip, but with focus, it can prove to be useful. Still, for benchmarking RSP, I think that dll should be made. No Audio has been a great benchmarking plugin for LLE audio.

Edit: went back and fixed the rsp config. I just added in
Code:
(mode[0] == 'r') ? OPEN_EXISTING : CREATE_ALWAYS,
(mode[0] == 'r') ? FILE_ATTRIBUTE_NORMAL : FILE_FLAG_WRITE_THROUGH,

Last edited by RPGMaster; 24th October 2014 at 11:26 AM.

#1082
25th October 2014, 02:07 AM
oddMLan

Quote:
Originally Posted by RPGMaster View Post
Perhaps i should actually play conker
Yes, you should.

#1083
25th October 2014, 02:55 AM
HatCat

This is one polite way of putting it.

#1084
27th October 2014, 09:06 PM
RPGMaster

After finishing up all my optimizations, I decided to benchmark WDC. That game is odd ;/. For some reason my recompiler was extra slow, same with Stunt Racer. Turns out it's something I did with LQV and SDV ;/. Time for me to move on to another project though. RSP was a fun project to work on.

Anyway, here's a benchmark for WDC using your latest source.

[benchmark screenshot not preserved]

Lol, I'm thinking I should probably have benchmarked using your older source ;/. I may go back and do that later on.

Anyone interested in more game benchmarks?

Last edited by RPGMaster; 28th October 2014 at 12:38 AM.

#1085
28th October 2014, 06:21 PM
HatCat

I was just working with VMADN and saw that all the games I checked behave exactly the same way even if I do not emulate signed clamping. Skipping it goes as an unexploited bug (possibly even unexploitable, given the way these games do shit).

VMADN without signed clamping:
Code:
_VMADN:
	movdqa	xmm3, xmm0
	movdqa	xmm2, xmm0
	pmullw	xmm3, xmm1
	pmulhuw	xmm2, xmm1
	psraw	xmm1, 15
	pxor	xmm4, xmm4
	pand	xmm0, xmm1
	psubw	xmm2, xmm0
	movdqa	xmm0, XMMWORD PTR _VACC+32
	paddw	xmm0, xmm3
	movdqa	xmm1, xmm0
	psubusw	xmm1, xmm3
	pcmpeqw	xmm3, xmm0
	pcmpeqw	xmm1, xmm4
	pandn	xmm3, xmm1
	psubw	xmm2, xmm3
	movdqa	xmm3, XMMWORD PTR _VACC+16
	paddw	xmm3, xmm2
	movdqa	XMMWORD PTR _VACC+32, xmm0
	movdqa	XMMWORD PTR _VACC+16, xmm3
	movdqa	xmm1, xmm3
	psubusw	xmm1, xmm2
	pcmpeqw	xmm3, xmm2

	psraw	xmm2, 15
	paddw	xmm2, XMMWORD PTR _VACC
	pcmpeqw	xmm1, xmm4
	pandn	xmm3, xmm1
	psubw	xmm2, xmm3
	movdqa	XMMWORD PTR _VACC, xmm2
	ret
VMADN with signed clamping:
Code:
_VMADN:
	movdqa	xmm3, xmm0
	movdqa	xmm2, xmm0
	pmullw	xmm3, xmm1
	pmulhuw	xmm2, xmm1
	psraw	xmm1, 15
	pxor	xmm4, xmm4
	pand	xmm0, xmm1
	psubw	xmm2, xmm0
	movdqa	xmm0, XMMWORD PTR _VACC+32
	paddw	xmm0, xmm3
	movdqa	xmm1, xmm0
	psubusw	xmm1, xmm3
	pcmpeqw	xmm3, xmm0
	pcmpeqw	xmm1, xmm4
	pandn	xmm3, xmm1
	psubw	xmm2, xmm3
	movdqa	xmm1, XMMWORD PTR _VACC+16
	paddw	xmm1, xmm2
	movdqa	XMMWORD PTR _VACC+32, xmm0
	movdqa	XMMWORD PTR _VACC+16, xmm1
	movdqa	xmm3, xmm1
	psubusw	xmm3, xmm2
	movdqa	xmm6, xmm1
	pcmpeqw	xmm3, xmm4

	movdqa	xmm4, xmm2
	psraw	xmm2, 15
	paddw	xmm2, XMMWORD PTR _VACC
	pcmpeqw	xmm4, xmm1
	movdqa	xmm5, xmm4
	pandn	xmm5, xmm3
	movdqa	xmm3, xmm1
	psubw	xmm2, xmm5
	punpckhwd	xmm3, xmm2
	movdqa	XMMWORD PTR _VACC, xmm2
	punpcklwd	xmm6, xmm2
	movdqa	xmm2, xmm6
	packssdw	xmm2, xmm3
	pcmpeqw	xmm2, xmm1
	pand	xmm0, xmm2
	ret
VMADN with signed clamping, but compiled with MSVC 2013 instead of GCC:
Code:
_VMADN	PROC
	push	ebp
	mov	ebp, esp
	and	esp, -8
	movdqa	xmm7, XMMWORD PTR _VACC+32
	movdqa	xmm2, xmm0
	movdqa	xmm6, XMMWORD PTR _VACC+16
	movdqa	xmm4, xmm0
	pmullw	xmm2, xmm1
	xorps	xmm5, xmm5
	pmulhuw	xmm4, xmm1
	psraw	xmm1, 15				; 0000000fH
	pand	xmm1, xmm0
	paddw	xmm7, xmm2
	psubw	xmm4, xmm1
	movdqa	XMMWORD PTR _VACC+32, xmm7
	movdqa	xmm0, xmm7
	movdqa	xmm1, xmm7
	psubusw	xmm0, xmm2
	pcmpeqw	xmm1, xmm2
	pcmpeqw	xmm0, xmm5
	pandn	xmm1, xmm0
	psubw	xmm4, xmm1
	paddw	xmm6, xmm4
	movdqa	xmm3, xmm4
	movdqa	xmm0, xmm6
	psraw	xmm3, 15
	psubusw	xmm0, xmm4
	movdqa	XMMWORD PTR _VACC+16, xmm6
	paddw	xmm3, XMMWORD PTR _VACC
	pcmpeqw	xmm0, xmm5
	movdqa	xmm1, xmm6
	movdqa	xmm2, xmm6
	pcmpeqw	xmm1, xmm4
	movdqa	xmm4, xmm6
	pandn	xmm1, xmm0
	psubw	xmm3, xmm1
	punpckhwd xmm4, xmm3
	punpcklwd xmm2, xmm3
	packssdw xmm2, xmm4
	pcmpeqw	xmm4, xmm4
	pcmpeqw	xmm6, xmm2
	movdqa	XMMWORD PTR _VACC, xmm3
	pxor	xmm4, xmm6
	movdqa	xmm1, xmm6
	movdqa	xmm0, xmm4
	pand	xmm1, xmm7
	pand	xmm0, xmm2
	psllw	xmm4, 15
	por	xmm0, xmm1
	pxor	xmm0, xmm4
	mov	esp, ebp
	pop	ebp
	ret	0
_VMADN	ENDP
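
For anyone reading the listings without the RSP semantics at hand, a rough scalar sketch in C of one 16-bit lane of VMADN may help. It is only an illustration under assumptions: the names `vmadn_lane` and `sext48` and the single `int64_t` accumulator are made up here, whereas the listings keep the 48-bit accumulator as three 16-bit slices (low at `_VACC+32`, middle at `_VACC+16`, high at `_VACC`). The clamp rule is read off the MSVC listing: if the upper 32 accumulator bits do not fit in a signed 16-bit value, the result saturates to 0xFFFF or 0x0000 instead of returning the low slice.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: sign-extend the low 48 bits of x. */
static int64_t sext48(int64_t x)
{
    uint64_t u = (uint64_t)x & 0xFFFFFFFFFFFFULL;   /* keep bits 47..0 */
    int64_t r = (int64_t)u;                         /* fits, u < 2^48 */
    if (u & 0x800000000000ULL)                      /* bit 47 set: negative */
        r -= 281474976710656LL;                     /* subtract 2^48 */
    return r;
}

/*
 * Sketch of one VMADN lane (assumed semantics, not the plugin's code):
 *   acc += unsigned(vs) * signed(vt)
 * then return the low 16 accumulator bits, optionally clamped.
 */
uint16_t vmadn_lane(int64_t *acc, uint16_t vs, int16_t vt, int clamp)
{
    assert(acc != NULL);
    *acc = sext48(*acc + (int64_t)vs * (int64_t)vt);

    uint16_t low = (uint16_t)((uint64_t)*acc & 0xFFFFu);
    if (!clamp)
        return low;                 /* the "without signed clamping" variant */

    /* Bits 47..16 outside the signed 16-bit range means acc is outside
       [-2^31, 2^31 - 1]; saturate according to the sign. */
    if (*acc < -2147483648LL)
        return 0x0000;
    if (*acc > 2147483647LL)
        return 0xFFFF;
    return low;
}
```

Starting from a zero accumulator and calling it twice with vs = 0xFFFF and vt = 0x7FFF, the second call returns 0xFFFF with the clamp and 0x0002 without it. The two variants only differ once the sum overflows past bit 31, which would be consistent with the games checked never showing a difference.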

#1086
28th October 2014, 11:36 PM
HatCat

Did some more profiling, using the original z64gl r17 build in MinGW to test.
Even with VMADL and VMADN optimized the living crap out of (the latter being by far at the top of the latent opcodes list, at about the same position as SHUFFLE_VECTOR), it's still incredible how often VMADN and VMADH are used: they remain bottlenecks even after fixing GCC's inability to auto-vectorize them.

1024x768 Screenshot of Profiler Results: http://ft.trillian.im/785abe041074fd...c3ir2zJdmG.jpg

It even lists some other familiar friends, like VMUDL, which this time (if I may say so for certain, for those who remember me honestly claiming this a year ago already) really can't be optimized any further.

#1087
29th October 2014, 01:28 AM
theboy181

Are you saying that there just won't be anything worth optimizing at this point because of these bottlenecks?

How is your RSP? Is it faster than the last BIN on page one, and will we see full speed on more games with a 4 GHz system?

Last edited by theboy181; 29th October 2014 at 01:33 AM.

#1088
29th October 2014, 09:43 AM
Frank74

Wondering why rsp.dll crashes with an exception while rsp_sse2.dll works fine.

My processor is supposed to support SSE3.
Code:
Processor 1			ID = 0
	Number of cores		2 (max 2)
	Number of threads	2 (max 2)
	Name			Intel Pentium D 820
	Codename		SmithField
	Specification		Intel(R) Pentium(R) D CPU 2.80GHz
	Package (platform ID)	Socket 775 LGA (0x4)
	CPUID			F.4.7
	Extended CPUID		F.4
	Core Stepping		B0
	Technology		90 nm
	Core Speed		2792.8 MHz
	Multiplier x Bus Speed	14.0 x 199.5 MHz
	Rated Bus speed		798.0 MHz
	Stock frequency		2800 MHz
	Instructions sets	MMX, SSE, SSE2, SSE3, EM64T
	L1 Data cache		2 x 16 KBytes, 8-way set associative, 64-byte line size
	Trace cache		2 x 12 Kuops, 8-way set associative
	L2 cache		2 x 1024 KBytes, 8-way set associative, 64-byte line size
	FID/VID Control		no

#1089
29th October 2014, 09:51 AM
RPGMaster

Quote:
Originally Posted by Frank74 View Post
Wondering why rsp.dll crashes with an exception, but rsp_sse2.dll works fine.

My processor is supposed to support sse3.
rsp.dll actually requires SSSE3. It needs it for instructions like pshufb.
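
The SSE3 versus SSSE3 distinction can be checked programmatically. Below is a minimal sketch in C, assuming GCC or Clang on x86 (for `<cpuid.h>`). The helper names and the idea of choosing between the two DLLs at load time are hypothetical and only for illustration. CPUID leaf 1 reports SSE3 in ECX bit 0 and SSSE3 in ECX bit 9, so a CPU like the Pentium D 820 would set the first bit but not the second.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>
#define HAVE_CPUID_H 1
#endif

/* CPUID leaf 1: SSE3 is ECX bit 0, SSSE3 is ECX bit 9. */
int ecx_has_sse3(uint32_t ecx)  { return (int)((ecx >> 0) & 1u); }
int ecx_has_ssse3(uint32_t ecx) { return (int)((ecx >> 9) & 1u); }

/* Returns ECX of CPUID leaf 1, or 0 if it cannot be queried here. */
uint32_t query_leaf1_ecx(void)
{
#ifdef HAVE_CPUID_H
    unsigned int eax, ebx, ecx, edx;
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return ecx;
#endif
    return 0;
}

/* Hypothetical selection: use the SSSE3 build only when pshufb exists. */
const char *pick_rsp_dll(const char *ssse3_dll, const char *sse2_dll,
                         uint32_t ecx)
{
    assert(ssse3_dll != NULL && sse2_dll != NULL);
    return ecx_has_ssse3(ecx) ? ssse3_dll : sse2_dll;
}
```

With ECX = 0x1 (SSE3 only), `pick_rsp_dll("rsp.dll", "rsp_sse2.dll", ecx)` returns the SSE2 name; with bit 9 also set it returns "rsp.dll". On a real machine the ECX value would come from `query_leaf1_ecx()`.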

#1090
29th October 2014, 09:54 AM
Frank74

Quote:
Originally Posted by RPGMaster View Post
rsp.dll actually requires SSSE3. It needs it for instructions like pshufb.
Aha!

Must have a touch of dyslexia, reading SSSE3 as SSE3.

Hopefully I'll be getting a new PC at the end of the year.