  #771  
Old 21st May 2014, 05:07 PM
RPGMaster's Avatar
RPGMaster RPGMaster is offline
Alpha Tester
Project Supporter
Super Moderator
 
Join Date: Dec 2013
Posts: 1,983
Default

I do remember trying assembly through MSVC, since it had some MASM thing built in. I didn't know you could use that to mix with C code though. I bet it would be better to just use MASM for the asm file, since it supports nice things like declaring bytes in the code section. It's not as useful there as it would be in a C compiler though, since my main reason for doing that would be to keep the compiler from interfering with my inline assembly. I'm pretty sure there are other uses for declaring bytes in the code section. Oh ya, maybe for optimized storage of variables xD: placing them at the end of functions instead of padding with int3. It's also nice for aligning loops if the assembler doesn't support those obscure multibyte NOPs. Lol you know... I never found a convenient way to do stuff like *(int*)0x404040 = 24 in assembly lol.

What I don't get is why the compiler did better on a different source file of the same project. Maybe I should turn off PGO since I made a ton of changes. I need to figure out how to do stuff like MOVAPS in intrinsics or something. I'd prefer doing things like *(__m128*)(var1) = *(__m128*)(var2); though, but it sometimes uses MOVUPS which is annoying because I made sure it's aligned.
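For the MOVAPS question, here is a minimal sketch, assuming an x86 target with SSE and that both pointers really are 16-byte aligned. `_mm_load_ps` and `_mm_store_ps` are the aligned load/store intrinsics from `xmmintrin.h`; `copy_vec4` is just a made-up name for illustration. The compiler still has the final say over which instruction it emits, so treat this as a strong hint and not a guarantee.

```c
#include <xmmintrin.h>  /* SSE1: __m128, _mm_load_ps, _mm_store_ps */

/* Hypothetical helper: copy four packed floats from src to dst.
 * Both pointers must be 16-byte aligned; the aligned intrinsics
 * may fault on a misaligned address. */
static void copy_vec4(float *dst, const float *src)
{
    __m128 v = _mm_load_ps(src);  /* aligned 128-bit load  */
    _mm_store_ps(dst, v);         /* aligned 128-bit store */
}
```

Wrapping the buffers in a union with an `__m128` member (or using the compiler's alignment attribute) is one way to make sure the alignment promise actually holds.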

How do I do byteswap with intrinsics? Then I could get rid of some inline assembly in pj64 that's messing up the functions they're in. I noticed a compiler warning and got rid of one of the inline ASM pieces of code that had already commented out the bswap. Dunno why he left it there lol. This just makes it seem more like Zilmar was super busy back then, probably still is today lol.

I want to get rid of FPU stuff in the recompiler for sure, since it converts MIPS floating point to x86 FPU code. I bet I could also speed up the interpreter core using manual SSE instead of auto generated lol. Lol for the neg instruction in the interpreter I explicitly xor'd the sign bit and or'd the sign bit for ABS.

Idk if I'll even bother with PJ64's RSP, unless I completely understand how recompilers work and am able to improve it. It would be interesting to see how much faster your graphics plugin would run with a recompiler RSP. If I never get a full understanding of how recompilers work, I'd rather look at your RSP plugin, since it's more accurate and the interpreter is waaay faster xD. I wonder how fast 2.1's RSP interpreter is though. I should test that.

Edit: I decided to start abusing compiler-specific features. Since PJ64 is pretty much stuck with MSVC, I might as well optimize it for MSVC. My initial problem with using __assume(0), aside from it not being portable, is that I'd have to write out all the cases. Well, what I do now is have a program generate all the cases! The duplicates get highlighted in red, then I delete those and BOOM, I have a perfect jump table!
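A rough sketch of the `__assume(0)` trick, assuming a made-up 2-bit opcode field so that every possible value has its own case. The `UNREACHABLE` macro and the `dispatch` function are hypothetical names; `__builtin_unreachable()` is the GCC/Clang counterpart, included here only so the sketch is not tied to MSVC.

```c
#if defined(_MSC_VER)
#define UNREACHABLE() __assume(0)              /* MSVC-specific hint   */
#elif defined(__GNUC__) || defined(__clang__)
#define UNREACHABLE() __builtin_unreachable()  /* GCC/Clang equivalent */
#else
#define UNREACHABLE() ((void)0)                /* no hint available    */
#endif

/* Hypothetical 2-bit opcode: all four values have a case, so the
 * default is marked unreachable and the compiler is free to drop
 * the range check in front of the jump table. */
static int dispatch(unsigned opcode, int a, int b)
{
    switch (opcode & 3u) {
    case 0: return a + b;
    case 1: return a - b;
    case 2: return a & b;
    case 3: return a | b;
    default: UNREACHABLE();
    }
    return 0; /* not reached; keeps compilers without the hint happy */
}
```

The catch is the one mentioned above: the hint is only safe if the cases really do cover every value, which is why generating them with a program is a sensible idea.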

Also, for 1964 0.85, I'm getting a constant 61.0 VI/s. Is there a way to fix it and set it to 60?

Last edited by RPGMaster; 21st May 2014 at 11:11 PM.
  #772  
Old 22nd May 2014, 02:14 AM
ReyVGM ReyVGM is offline
Project Supporter
Senior Member
 
Join Date: Mar 2014
Posts: 212
Default

The links on the OP are dead, btw.
  #773  
Old 22nd May 2014, 03:22 AM
the_randomizer's Avatar
the_randomizer the_randomizer is offline
Alpha Tester
Project Supporter
Senior Member
 
Join Date: Sep 2008
Location: USA
Posts: 1,127
Default

Yep, OP links are dead, they should probably be fixed ASAP, but that's just me. Or possibly use a different site like Dropbox, where links never expire.
__________________
My rig:
CPU: Intel Core i7 4470 3.4 GHz to 3.9 GHz
Video card: MSI nVidia GTX 970 4 GB GDDR5
OS: Windows 7 Professional 64-bit
RAM: 16 GB DDR3 SDRAM 10600
HDD: 2 x Western Digital 1 TB HDDs
Monitor: 23" Asus Full HD LED

Oh, and Snes9x > Zsnes in every way
  #774  
Old 22nd May 2014, 05:09 AM
HatCat's Avatar
HatCat HatCat is offline
Alpha Tester
Project Supporter
Senior Member
 
Join Date: Feb 2007
Location: In my hat.
Posts: 16,247
Default

Nah that's alright, OP is a fag anyway.

The links aren't dead, just the entire frekkin' website is under maintenance.
http://filesend.net/

I was actually hoping the main site would be fixed soon, so that then the downloads would also come back up, but those people have a very slow development time budget I see.

Well, I didn't want to use Dropbox because then it's not possible for me to count downloads. I just think it's a neat little stat to see how many people download the plugin, although maybe I shouldn't care. Last I checked on FileSend, it was 1,000 something for the latest RSP 7-zip, but now I cannot check....

Quote:
Originally Posted by the_randomizer View Post
Yep, OP links are dead, they should probably be fixed ASAP, but that's just me
Yes, it is just you.
Or is it?

Do people actually care that my RSP plugin can no longer be downloaded (just atm :P), or is this all a bunch of he-said-she-said hearsay? I thought you said if there was anything about my RSP plugin I needed you to test, I should let you know. So if I did not let you know, why is the subject of this plugin only relevant now that the downloads are momentarily inaccessible?

Keep in mind this thread and its downloads have been up for over a year and a half. That's a lot of time to make sure everyone's downloaded my plugin.
  #775  
Old 22nd May 2014, 05:54 AM
HatCat's Avatar
HatCat HatCat is offline
Alpha Tester
Project Supporter
Senior Member
 
Join Date: Feb 2007
Location: In my hat.
Posts: 16,247
Default

Quote:
Originally Posted by RPGMaster View Post
I do remember trying assembly through MSVC, since it had some MASM thing built in. I didn't know you could use that to mix with C code though.
I'm sure it just externally calls ML/ML64 for asm in MS Visual Studio and CL.EXE for C code. I wouldn't say it's "built-in" if Visual Studio ships with ML externally; that would be somewhat redundant.

Quote:
Originally Posted by RPGMaster View Post
Lol you know... I never found a convenient way to do stuff like *(int*)0x404040 = 24 in assembly lol.
...Wait...huh??

Wtf was that lol

I think you modify sp or something, idk. Absolute addressing is the last thing I played around with in asm.

Quote:
Originally Posted by RPGMaster View Post
I'd prefer doing things like *(__m128*)(var1) = *(__m128*)(var2); though, but it sometimes uses MOVUPS which is annoying because I made sure it's aligned.
Then I would just use the former, i.e. the intrinsics.

In fact with MMX I think you have to say *(__m64 *)(dst) = *(__m64 *)src, as I'm not sure I ever found an intrinsic for that.

Quote:
Originally Posted by RPGMaster View Post
How do I do byteswap with intrinsics?
dk;dc

Quote:
Originally Posted by RPGMaster View Post
Then I could get rid of some inline assembly in pj64 that's messing up the functions they're in.
Byte-swapping is just about the only damn reason anybody should ever have to use inline assembly inside a C project. Things like rotates, byte-swaps, exchanging bytes...things that on the C level always require an extra variable or temporary pre-buffer to avoid losing data.

There are some intrinsics for byte-swapping, but which one you use depends on which compiler you use. MM wrote some shit about it, but I didn't bother remembering because not once have I ever needed a byteswap operation in all of my C programming. Even with N64 emulators on a little-endian PC, I've always found ways to get around the byte endian issues without having to byte swap. Should almost never need to byte-swap something.

Code:
unsigned short flip_word(unsigned short ax)
{
    register unsigned short ret_slot;
    const unsigned char ah = (ax >> 8) & 0xFF;
    const unsigned char al = (ax >> 0) & 0xFF;

    ret_slot = (al << 8) | (ah << 0);
    return (ret_slot);
}
... or maybe ...

Code:
unsigned short rotate_word(unsigned short ax)
{
    register unsigned short ret_slot;

    ret_slot  = ax << 8;
    ret_slot |= ax >> 8;
    return (ret_slot);
}
You really have to byte-swap something? I say just use one of those.
Unless it's necessary to force an emission of XCHG or w/e in something performance critical you're doing.
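For the compiler-dependent intrinsics mentioned above, a minimal sketch might look like this. `swap32` is a made-up wrapper name; `_byteswap_ulong` (MSVC, declared in `stdlib.h`) and `__builtin_bswap32` (GCC/Clang) are the real intrinsics, and the last branch is a plain shift-and-mask fallback for anything else.

```c
#include <stdint.h>
#if defined(_MSC_VER)
#include <stdlib.h>  /* _byteswap_ulong */
#endif

/* Hypothetical wrapper: reverse the byte order of a 32-bit value. */
static uint32_t swap32(uint32_t x)
{
#if defined(_MSC_VER)
    return _byteswap_ulong(x);      /* unsigned long is 32-bit on MSVC */
#elif defined(__GNUC__) || defined(__clang__)
    return __builtin_bswap32(x);
#else
    /* Portable fallback: move each byte to its mirrored position. */
    return (x >> 24)
         | ((x >> 8) & 0x0000FF00u)
         | ((x << 8) & 0x00FF0000u)
         | (x << 24);
#endif
}
```

With optimizations on, these typically compile down to a single BSWAP, so the inline assembly should not be needed just for this.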

Quote:
Originally Posted by RPGMaster View Post
I want to get rid of FPU stuff in the recompiler for sure, since it converts MIPS floating point to x86 FPU code. I bet I could also speed up the interpreter core using manual SSE instead of auto generated lol.
I don't know much about COP1 (are MIPS floating-point registers 32-bit or 64-bit? or...both...actually, haven't looked at N64 floating-point opcodes since 2009, jesus), but if they're 32-bit floating-point opcodes then I imagine just SSE1 would be sufficient. Project64 is released with builds assuming SSE2 hardware support anyway, which is a pretty decent assumption considering that any 64-bit-capable home PC includes SSE2 support (and tl;dr naive joke here about using a 64-bit PC to emulate the Nintendo 64).

Quote:
Originally Posted by RPGMaster View Post
Lol for the neg instruction in the interpreter I explicitly xor'd the sign bit and or'd the sign bit for ABS.
-(ax) is the same as (ax ^ ~0) - ~0 on 2's complement, or just ~ax + 1.
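To put the two ideas side by side, here is a small sketch, assuming IEEE 754 single-precision floats where bit 31 is the sign bit. All three function names are made up. For the float case, NEG flips the sign bit with XOR and ABS clears it with AND (OR-ing the sign bit in would instead force the value negative); the integer case is the `~ax + 1` identity from the line above.

```c
#include <stdint.h>
#include <string.h>

/* Float negate: flip the sign bit (bit 31). */
static float f32_neg(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);  /* type-pun without aliasing issues */
    bits ^= 0x80000000u;
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* Float absolute value: clear the sign bit. */
static float f32_abs(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    bits &= 0x7FFFFFFFu;
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* Integer negate on two's complement: invert every bit, then add one. */
static int32_t int_neg(int32_t ax)
{
    return (int32_t)(~(uint32_t)ax + 1u);
}
```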

Quote:
Originally Posted by RPGMaster View Post
It would be interesting to see how much faster your graphics plugin would run with a recompiler RSP.
Probably not all that much faster.
Not saying a recompiler version of my RSP would be slower than my interpreter, but the speed difference wouldn't be normal. My interpreter loop is mostly optimized. It is free from *some*, but not all, of the performance constraints of the constant fetch-decode-execute pattern of an interpreter cycle.

It does still branch at least twice for every RSP instruction though, so there's no doubt a recompiler version of my plugin would fundamentally be able to exceed the interpreter in speed.
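To illustrate the "branches at least twice per instruction" point, here is a toy fetch-decode-execute loop. Everything in it is hypothetical (the `Cpu` struct, the 2-bit opcode encoding, the handler names); it has nothing to do with the real RSP instruction set and only shows where the two branches come from.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy machine state; nothing here matches the real RSP. */
typedef struct { int32_t acc; size_t pc; int halted; } Cpu;
typedef void (*Handler)(Cpu *cpu, uint8_t operand);

static void op_halt(Cpu *c, uint8_t v) { (void)v; c->halted = 1; }
static void op_add (Cpu *c, uint8_t v) { c->acc += v; }
static void op_sub (Cpu *c, uint8_t v) { c->acc -= v; }
static void op_mul (Cpu *c, uint8_t v) { c->acc *= v; }

static const Handler handlers[4] = { op_halt, op_add, op_sub, op_mul };

/* Each instruction byte: top 2 bits = opcode, low 6 bits = operand. */
static int32_t run(const uint8_t *code, size_t length)
{
    Cpu cpu = { 0, 0, 0 };
    while (!cpu.halted && cpu.pc < length) {     /* branch 1: the loop itself   */
        uint8_t insn = code[cpu.pc++];           /* fetch                       */
        handlers[insn >> 6](&cpu, insn & 0x3F);  /* decode + branch 2: indirect */
    }
    return cpu.acc;
}
```

A recompiler would translate the whole program once into straight-line host code, so neither the loop branch nor the indirect call would be paid per instruction.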

Quote:
Originally Posted by RPGMaster View Post
If I never get a full understanding of how recompilers work, I'd rather look at your RSP plugin, since it's more accurate and the interpreter is waaay faster xD. I wonder how fast 2.1's RSP interpreter is though. I should test that.
I think maybe you shouldn't XD?

Might have mentioned something like this before, but Project64 2.1's RSP was not intended for speed. Well the recompiler was, but it doesn't quite outperform the SSSE3 build of my interpreter plugin, which is sometimes even faster than their recompiler in my observation.

Quote:
Originally Posted by RPGMaster View Post
Also, for 1964 0.85, I'm getting a constant 61.0 VI/s. Is there a way to fix it and set it to 60?
That's weird as shit. I only get 60. Sure you didn't activate the +/- speed limit adjust feature 1964 has or something?
  #776  
Old 22nd May 2014, 06:52 AM
RPGMaster's Avatar
RPGMaster RPGMaster is offline
Alpha Tester
Project Supporter
Super Moderator
 
Join Date: Dec 2013
Posts: 1,983
Default

That example I posted, "*(int*)0x404040 = 24", was just a way to assign the value 24 to the address 0x404040. I'm not sure if the syntax I posted was exactly correct, but using addresses directly can be useful. I did that for read process memory since I didn't feel like initializing a variable just to give it the address of another variable.
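On the C side, that store can be sketched like this. `poke32` is a made-up name, and the address is passed in as a `uintptr_t` so the example can be pointed at a real variable instead of a hard-coded 0x404040 (which would most likely crash in a normal process).

```c
#include <stdint.h>

/* Hypothetical helper: store a 32-bit value at a raw numeric address.
 * volatile stops the compiler from optimizing the store away. */
static void poke32(uintptr_t address, int32_t value)
{
    *(volatile int32_t *)address = value;
}
```

Usage would be something like `poke32((uintptr_t)&some_int, 24);`. If I remember right, the MASM-style equivalent is along the lines of `mov dword ptr ds:[404040h], 24`, but I have not double-checked that against an assembler.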

Lol for byte swapping, MSVC does a terrible job compiling the xor swap method, which Zilmar does sometimes use. I'm really confused why he used assembly for a byte swap in a few parts, then C for byte swapping in other parts of his code. If I can't efficiently byte swap in C, then I'll just leave his asm code there lol. One of the major things that needs byte swapping is ROMs.
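For the ROM case, a plain-C sketch that avoids the xor swap entirely could look like the following. `swap_words_in_place` is a hypothetical name and this is not how PJ64 does it; it just reverses each 32-bit word of a byte buffer using two temporaries, which compilers generally handle well.

```c
#include <stddef.h>
#include <stdint.h>

/* Reverse the byte order of every 32-bit word in buf, in place.
 * Any trailing bytes past the last full word are left untouched. */
static void swap_words_in_place(uint8_t *buf, size_t length)
{
    size_t i;
    for (i = 0; i + 4 <= length; i += 4) {
        uint8_t b0 = buf[i];
        uint8_t b1 = buf[i + 1];
        buf[i]     = buf[i + 3];
        buf[i + 1] = buf[i + 2];
        buf[i + 2] = b1;
        buf[i + 3] = b0;
    }
}
```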

LOL I agree with you about the download count. I shouldn't use dropbox anymore .

Now I need to learn more about SSE. For things like convert float to int for his core interpreter, he used inline assembly to do things like
_asm fld var
_asm fistp var
So I'm wondering if there's a faster way to do stuff like that in SSE. Man, I was looking at the recompiler code and got lost.

For 1964 0.8.5, the only options I see are Counter Factor and Video Speed Sync. When I use Jabo's video plugin, I get around 61 VI/s. Oh well, if I can't fix it, then hopefully I can implement some of its pros in another emulator.
  #777  
Old 22nd May 2014, 07:12 AM
HatCat's Avatar
HatCat HatCat is offline
Alpha Tester
Project Supporter
Senior Member
 
Join Date: Feb 2007
Location: In my hat.
Posts: 16,247
Default

What the hell is fistp? Can't any of these FP mnemonics not be sexual

Quote:
Originally Posted by RPGMaster View Post
LOL I agree with you about the download count. I shouldn't use dropbox anymore .
Oh, I use Dropbox all the time really. I just couldn't think of a way to track download counts other than a) attaching them to the OP of this thread or b) using FileSend.net or something.

Quote:
Originally Posted by RPGMaster View Post
Now I need to learn more about SSE. For things like convert float to int for his core interpreter, he used inline assembly to do things like <poop>
_mm_cvt something.
Check the xmmintrin.h header file for a list of conversion intrinsics. Some of those will handle converting between XMM float and XMM int or XMM float to scalar register int etc. if that's what you were wondering. The names are all funky though, so I don't remember any of them.
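As a starting point, here is a sketch of the two scalar float-to-int conversions, assuming an x86 target with SSE. `_mm_set_ss`, `_mm_cvtss_si32` and `_mm_cvttss_si32` are real `xmmintrin.h` intrinsics; the two wrapper names are made up. The first one rounds according to the current MXCSR rounding mode (round-to-nearest-even unless something changed it), which is roughly the job the fld/fistp pair does on the x87 side; the second truncates like a C cast.

```c
#include <xmmintrin.h>  /* SSE1 scalar conversions */

/* Round using the current MXCSR rounding mode
 * (round-to-nearest-even by default). */
static int float_to_int_round(float x)
{
    return _mm_cvtss_si32(_mm_set_ss(x));   /* CVTSS2SI  */
}

/* Truncate toward zero, like a plain (int) cast. */
static int float_to_int_trunc(float x)
{
    return _mm_cvttss_si32(_mm_set_ss(x));  /* CVTTSS2SI */
}
```

The double-precision counterparts live in `emmintrin.h` (SSE2), so which header to look in depends on whether the opcode being emulated is single or double.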
  #778  
Old 22nd May 2014, 07:40 AM
RPGMaster's Avatar
RPGMaster RPGMaster is offline
Alpha Tester
Project Supporter
Super Moderator
 
Join Date: Dec 2013
Posts: 1,983
Default

FPU stuff makes no sense. FISTP stores the value of st(0), rounded to an integer, into a variable and pops the FPU stack. The irony is, it's faster to do fld var, fistp var than to use the FRNDINT instruction (which rounds st(0) to the nearest integer). I practiced FPU in assembly months back.

Rofl I've been avoiding intrinsics just because I hate the ugly names xD. I'll look into it though. I wish I was better at programming. I'm trying to understand both the interpreter core and recompiler core, but it just blows my mind at this point lol. Plus it's harder to read C++ (I'm comparing 2.1 to 1.6).

Do you know anything that 1.6 is > 2.1 at? Now that I realized that 2.1's core is better, idk if I want to continue using 1.6 anymore. Seeing 1964's interpreter's speed has given me hope as well. Although I still couldn't get 60fps with both interpreter and Glide64.

Edit: Lol wth... At first, the r4300i was taking around 9-12% of the emulator's CPU usage while I was getting 61 VI/s on PJ64 2.1. Now it's ~9-15% at 60 VI/s. Idk what's going on!

Last edited by RPGMaster; 22nd May 2014 at 08:44 AM.
  #779  
Old 22nd May 2014, 08:45 AM
HatCat's Avatar
HatCat HatCat is offline
Alpha Tester
Project Supporter
Senior Member
 
Join Date: Feb 2007
Location: In my hat.
Posts: 16,247
Default

Sure it's not some weird audio plugin you're using or something? I've never gotten 61.0 VI/s before as far as I recall.
  #780  
Old 22nd May 2014, 08:49 AM
RPGMaster's Avatar
RPGMaster RPGMaster is offline
Alpha Tester
Project Supporter
Super Moderator
 
Join Date: Dec 2013
Posts: 1,983
Default

Ya, I just found out that it was the sync audio option in the sound plugin. I still dunno why that would mess with the emulator's CPU core though.

The problem is, when I leave sync audio and fixed audio timing both on, I get ~58-59 fps, so I uncheck those ;/. When I disable sync audio in the audio plugin settings, I get 61 VI/s again, but more stable CPU usage.