#1 · 7th July 2013, 02:53 AM · HatCat
Surface DMA using GDI?

Does anyone know how to directly access the surface memory via the Windows Graphics Device Interface?

From what I understand, DCs are what get treated as surfaces, but I don't see any methods for locking/unlocking them so that I can do offset-precision DMA on them.

In zilmar's Basic CFB plugin we DMA to the DirectDraw on-screen CFB surface like so:
(*assuming 32-bit big endian, no MemoryBSwaped)
Code:
    if (DD_CFB->Lock(NULL,&ddsd,DDLOCK_WAIT,NULL) != DD_OK) {
        //DisplayError(GFXInfo.hWnd,"Failed To lock Surface");
        return;
    }
    buffer = GFXInfo.RDRAM + (*GFXInfo.VI_ORIGIN_REG & 0x00FFFFFF);

    SurfBuf = (BYTE*)ddsd.lpSurface;
    /* (VI_WIDTH >> 2) * 3 rows:  3/4 of the width, i.e. a 4:3 height */
    for (count = 0; count < ((*GFXInfo.VI_WIDTH_REG >> 2) * 3); count++)
    {
        /* (VI_WIDTH << 2) bytes per row:  width pixels, 4 bytes each */
        for (x = 0; x < (*GFXInfo.VI_WIDTH_REG << 2); x += 8, buffer+=4)
        {
            *(INT32 *)(SurfBuf + x + 0) = Convet16to32[*(UINT16 *)(buffer + 0)];
            *(INT32 *)(SurfBuf + x + 4) = Convet16to32[*(UINT16 *)(buffer + 2)];
        }
        SurfBuf+=ddsd.lPitch;
    }
    DD_CFB->Unlock(ddsd.lpSurface);
I tried setting SurfBuf as a byte pointer to an HDC (either the one fetched off Gfx_Info.hWnd or a compatible memory DC created from it), but the DLL crashes with a Win32 addressing exception.

Do I need to write single pixels one at a time using GDI functions, or is there a way to replicate zilmar's DMA?

Do I really just need to give up and use a better API?

#2 · 7th July 2013, 02:18 PM · HatCat
Good idea.

I started to think it was more appropriate to DMA to an object of type HBITMAP than to an HDC, since realistically we can read bitmaps from far more functions, and they're always image-specific.

Speaking of which, do you know of a quick way to capture the screen to a BMP file?

With SDL it was extremely simple. I could capture the screen with a single call:
http://www.libsdl.org/docs/html/sdlsavebmp.html

A flag in the RASTERCAPS value returned by gdi32's GetDeviceCaps suggests there is specific support for "saving" bitmaps:
Code:
#define RC_SAVEBITMAP       0x0040 // from WinGDI.h
However, this seems to be very poorly documented and rarely discussed.

#3 · 7th July 2013, 06:03 PM · HatCat
Strange idea...

Sounds good, I will look into it.
I was hoping for a way to directly write out the HBITMAP's binary contents to a .bmp file, with maybe some manipulation to standardize it to the correct BMP file format.
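Until I find something better, I'm picturing roughly this (a minimal, untested sketch; `save_bmp` is a name I just made up, and it assumes a 32-bpp bitmap that is NOT currently selected into any DC, since GetDIBits requires that). The only "manipulation" turns out to be prepending a BITMAPFILEHEADER in front of the BITMAPINFOHEADER:

Code:
#include <stdio.h>
#include <stdlib.h>
#include <windows.h>

static int save_bmp(HDC hdc, HBITMAP hbm, int w, int h, const char *path)
{
    BITMAPFILEHEADER bfh;
    BITMAPINFO bmi;
    const DWORD image_size = 4 * w * h; /* 32 bpp:  rows need no padding */
    BYTE *bits;
    FILE *out;
    int ok = 0;

    memset(&bmi, 0, sizeof(bmi));
    bmi.bmiHeader.biSize = sizeof(BITMAPINFOHEADER);
    bmi.bmiHeader.biWidth = w;
    bmi.bmiHeader.biHeight = h; /* positive height:  bottom-up rows */
    bmi.bmiHeader.biPlanes = 1;
    bmi.bmiHeader.biBitCount = 32;
    bmi.bmiHeader.biCompression = BI_RGB;

    bits = (BYTE *)malloc(image_size);
    if (bits == NULL)
        return 0;
    if (GetDIBits(hdc, hbm, 0, h, bits, &bmi, DIB_RGB_COLORS) != 0)
    {
        memset(&bfh, 0, sizeof(bfh));
        bfh.bfType = 0x4D42; /* "BM" */
        bfh.bfOffBits = sizeof(bfh) + sizeof(BITMAPINFOHEADER);
        bfh.bfSize = bfh.bfOffBits + image_size;

        out = fopen(path, "wb");
        if (out != NULL)
        {
            fwrite(&bfh, sizeof(bfh), 1, out);
            fwrite(&bmi.bmiHeader, sizeof(BITMAPINFOHEADER), 1, out);
            fwrite(bits, image_size, 1, out);
            fclose(out);
            ok = 1;
        }
    }
    free(bits);
    return ok;
}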



How are you declaring `pixels`?

I'm guessing it should be working if declared as `int *pixels`?
I do also want to port zilmar's 16-bpp compatibility code, though, which means I might want it to be a pointer to short to do 16-bit transfers.

I'm going to see about adding support for 8 bits per pixel, too, since that's the minimum that Windows 7 compatibility options will let me force for testing.

Quote:
Originally Posted by suanyuan
Usually we don't call this DMA, since all memory accesses are still done by the CPU, not by a memory-access controller.
Yeah, I only meant DMA in the more traditional sense.
I only needed an OS/API-free way of direct access to the RAM segment containing the screen pixels, without having to call a function just to R/W a pixel.

#4 · 7th July 2013, 07:12 PM · HatCat
Improving idea

It seems I have another bug on the way to this goal.

Currently my plugin is unsuccessful at creating the DIB section:
Code:
int system_info[128];
/* Array of system device capabilities returned by GetDeviceCaps. */

HDC CFB;
HDC FBR;
HDC desktop;

int      PixelFormat, FullScreen, ChangingWindow = FALSE;
short    Convet16to16[65536];
long     Convet16to32[65536];
DWORD    ViStatus, ViWidth;
RECT     rcWindow;
GFX_INFO GFXInfo;

/* here is how I'm initializing the source and destination DCs */
void RomOpen(void)
{
    CFB = GetDC(GFXInfo.hWnd);
    if (CFB == NULL)
        DisplayError(GFXInfo.hWnd, "Could not get render window DC.");

    FBR = CreateCompatibleDC(CFB);
    if (FBR == NULL)
        DisplayError(GFXInfo.hWnd, "CreateCompatibleDC(CFB) == NULL");
    return;
}

UINT32 *cfb; // you called this "pixels"

void UpdateScreen(void)
{
    BITMAPINFO bmi;
    HBITMAP screen;
    DWORD count, x;
    BYTE *SurfBuf, *FB;
    const int pitch = (*GFXInfo.VI_WIDTH_REG + 0x003) & 0xFFC;

    if (CFB == NULL)
        return;
    memset(&bmi, 0, sizeof(bmi)); // zero all of bmi, not just one RGBQUAD
    bmi.bmiHeader.biSize = sizeof(BITMAPINFOHEADER);
    bmi.bmiHeader.biWidth = ViWidth;
    bmi.bmiHeader.biHeight = (3*ViWidth)/4; // fixme:  4:3 guess, not the real VI height
    bmi.bmiHeader.biPlanes = 1;
    bmi.bmiHeader.biBitCount = PixelFormat;
    bmi.bmiHeader.biCompression = BI_RGB; // BI_BITFIELDS ?

    screen = CreateDIBSection(FBR, &bmi, DIB_RGB_COLORS, (void **)&cfb, NULL, 0);
    if (screen == NULL || cfb == NULL)
    {
        DisplayError(NULL, "Failed to set DIB.");
        return;
    }
I can't yet figure out why, but on multiple emulators this function always fails.

Quote:
Originally Posted by suanyuan
I don't think Win7 supports 8-bit color depth anymore.

And it is tricky for 16-bit color depth, since there are RGBA5551, RGBA4444 and RGB565 pixel formats.

You need to check the Win32 SDK documentation for details.
The way zilmar did it was to use DirectDraw surface-format analysis, reading the data into a DDPIXELFORMAT structure.

Similarly, I was able to mirror his method using a GDI pixel-format structure.

He only checks RGBA5551 and RGB565 formats though, not that RGBA4444 thing you mentioned. I have never heard of that.
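
Something like this is what I mean (a sketch, untested here, with `probe` and `info_buf` being names I just made up): as I read the docs, calling GetDIBits twice against a screen-compatible bitmap is the sanctioned way to recover the BI_BITFIELDS channel masks, e.g. to tell RGB565 from RGB555.

Code:
BYTE info_buf[sizeof(BITMAPINFOHEADER) + 3 * sizeof(DWORD)];
BITMAPINFO *info = (BITMAPINFO *)info_buf;
HBITMAP probe = CreateCompatibleBitmap(CFB, 1, 1);

memset(info_buf, 0, sizeof(info_buf));
info->bmiHeader.biSize = sizeof(BITMAPINFOHEADER);
GetDIBits(CFB, probe, 0, 1, NULL, info, DIB_RGB_COLORS); /* 1st call fills the header */
GetDIBits(CFB, probe, 0, 1, NULL, info, DIB_RGB_COLORS); /* 2nd call fills the masks  */
/*
 * info->bmiHeader.biBitCount now holds the depth; if biCompression is
 * BI_BITFIELDS, the three DWORDs after the header are the R/G/B masks
 * (0xF800/0x07E0/0x001F would mean RGB565).
 */
PixelFormat = info->bmiHeader.biBitCount;
DeleteObject(probe);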

#5 · 7th July 2013, 07:40 PM · HatCat

Yeah, actually that was more of a plugin-spec issue, as it turns out.

When the emulation thread began, many emulators sent the UpdateScreen request before VI_WIDTH_REG got set, so I was basically requesting a dynamic screen width of 0 and a screen height of 0, which it seems is why the function had failed.

Checking that ViWidth is nonzero before attempting the rest of the function fixes that issue.
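
In other words, just an early-out guard at the top of UpdateScreen (shown in full context in the next post):

Code:
    if (ViWidth == 0)
        return; /* UpdateScreen arrived before VI_WIDTH_REG was set */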

Quote:
Originally Posted by suanyuan
(2) CreateDIBSection() should be passed a DC, not a memory DC; a memory DC just exists in memory,
What's the difference between "DC" and "a memory DC"?

You declared hdc as type `HDC`.
I declared FBR, the DC input for this function, as type `HDC`.

#6 · 7th July 2013, 08:20 PM · HatCat
Working idea

Sweet, memory access is working properly now with this code:

Code:
UINT32 *cfb;

void UpdateScreen(void)
{
    BITMAPINFO bmi;
    HBITMAP screen;
    DWORD x, y;
    BYTE *FB;

    if (ViWidth == 0)
        return;
    CFB = GetDC(GFXInfo.hWnd);
    if (CFB == NULL)
        DisplayError(GFXInfo.hWnd, "Could not get render window DC.");

    FBR = CreateCompatibleDC(CFB);
    if (FBR == NULL)
        DisplayError(GFXInfo.hWnd, "CreateCompatibleDC(CFB) == NULL");

    memset(&bmi, 0, sizeof(bmi)); // zero all of bmi, not just one RGBQUAD
    bmi.bmiHeader.biSize = sizeof(BITMAPINFOHEADER);
    bmi.bmiHeader.biWidth = ViWidth;
    bmi.bmiHeader.biHeight = (3*ViWidth)/4; // fixme:  4:3 guess; positive height = bottom-up rows
    bmi.bmiHeader.biPlanes = 1;
    bmi.bmiHeader.biBitCount = PixelFormat;
    bmi.bmiHeader.biCompression = BI_RGB; // BI_BITFIELDS ?

    screen = CreateDIBSection(FBR, &bmi, DIB_RGB_COLORS, (void **)&cfb, NULL, 0);
    if (screen == NULL || cfb == NULL)
    {
        DisplayError(NULL, "Failed to set DIB.");
        return;
    }
    SelectObject(FBR, screen);
    FB = GFXInfo.RDRAM + (*GFXInfo.VI_ORIGIN_REG & 0x00FFFFFF);

    if (PixelFormat == 16)
    {
/*
 * omitted -- We don't care about this.
 * PixelFormat == 32
 */
    }
    else
    {
        // 32-bpp DIB rows are exactly biWidth DWORDs, so cfb indexes directly.
        for (y = 0; y < (DWORD)bmi.bmiHeader.biHeight; y++)
        {
            for (x = 0; x < (DWORD)bmi.bmiHeader.biWidth; x += 2, FB += 4)
            {
                UINT32 *p = &cfb[y*bmi.bmiHeader.biWidth + x];
#ifdef BIG_ENDIAN
                p[0] = Convet16to32[*(UINT16 *)(FB + 0)];
                p[1] = Convet16to32[*(UINT16 *)(FB + 2)];
#else
                p[0] = Convet16to32[*(UINT16 *)(FB + 2)];
                p[1] = Convet16to32[*(UINT16 *)(FB + 0)];
#endif
            }
        }
    }
    DrawScreen();
    // fixme:  screen, FBR and CFB leak here on every call; see below.
    return;
}

void DrawScreen(void)
{
    const int x_start = rcWindow.left;
    const int y_start = rcWindow.top;
    const int x_end = rcWindow.right;
    const int y_end = rcWindow.bottom;
    const int x_delta = (x_end - x_start);
    const int y_delta = (y_end - y_start);

    BitBlt(CFB, 0, 0, 320, 240, FBR, 0, 0, SRCCOPY); // fixme:  use x_delta/y_delta
    return;
}
I know I didn't finish implementing your function-local GetDC/ReleaseDC cycle yet; I didn't want to apply it from the get-go, since UpdateScreen gets called very many times and I was afraid it would just add latency, but maybe the memory it frees is worth it.
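
For the record, the cycle I mean would presumably look like this (a sketch only, untested; the fixed 320x240 header is a placeholder, and each release call pairs with the acquisition above it):

Code:
void UpdateScreen(void)
{
    HDC window_dc, memory_dc;
    HBITMAP dib, old_bm;
    BITMAPINFO bmi;
    UINT32 *pixels;

    window_dc = GetDC(GFXInfo.hWnd);
    if (window_dc == NULL)
        return;
    memory_dc = CreateCompatibleDC(window_dc);
    if (memory_dc == NULL)
    {
        ReleaseDC(GFXInfo.hWnd, window_dc);
        return;
    }

    memset(&bmi, 0, sizeof(bmi));
    bmi.bmiHeader.biSize = sizeof(BITMAPINFOHEADER);
    bmi.bmiHeader.biWidth = 320;  /* placeholder dimensions */
    bmi.bmiHeader.biHeight = 240;
    bmi.bmiHeader.biPlanes = 1;
    bmi.bmiHeader.biBitCount = 32;
    bmi.bmiHeader.biCompression = BI_RGB;
    dib = CreateDIBSection(memory_dc, &bmi, DIB_RGB_COLORS, (void **)&pixels, NULL, 0);
    if (dib != NULL)
    {
        old_bm = (HBITMAP)SelectObject(memory_dc, dib);
        /* ... the pixel-copy loop and BitBlt go here ... */
        SelectObject(memory_dc, old_bm); /* restore before deleting */
        DeleteObject(dib);               /* undo CreateDIBSection   */
    }
    DeleteDC(memory_dc);                 /* undo CreateCompatibleDC */
    ReleaseDC(GFXInfo.hWnd, window_dc);  /* undo GetDC, every call  */
}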

It is successfully blitting pixels now, just not the right ones.
I think I need to rewrite zilmar's "Convet" LUT palettes to use GDI's 0x00BBGGRR (or was it 00RRGGBB), instead of DirectDraw's 0xRRGGBBAA.
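If I have the DIB layout right, 32-bpp DIB pixels are 0x00RRGGBB in each DWORD (bytes B, G, R, x in memory), so the rebuilt table would look something like this sketch (untested; RGBA5551 is R in bits 15-11, G in 10-6, B in 5-1, A in bit 0):

Code:
/* RGBA5551 (N64 16-bit) --> 32-bpp GDI DIB (0x00RRGGBB per UINT32) */
for (i = 0; i < 65536; i++)
{
    UINT32 r = (i >> 11) & 0x1F;
    UINT32 g = (i >>  6) & 0x1F;
    UINT32 b = (i >>  1) & 0x1F;

    r = (r << 3) | (r >> 2); /* expand 5 bits to 8 */
    g = (g << 3) | (g >> 2);
    b = (b << 3) | (b >> 2);
    Convet16to32[i] = (r << 16) | (g << 8) | (b << 0);
}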

Also, slightly contrary to your suggestion (while otherwise implementing your pixel-plotter loop exactly the way you specified), I figured we would do better to loop X inside of Y, not Y inside of X.
Three reasons (see the sketch after this list):
  • X max is generally bigger than Y max (e.g. 640 > 480). You had the smaller loop run inside the bigger loop. I was thinking the loop with more iterations (X) should run inside the loop with fewer iterations (Y) as a fair speed-up: this way you DMA more pixels per branch and increment of the parent Y loop.
  • Bitmaps, as documented at MSDN, are classifiable via both "rows" and "scanlines". The difference is that a row of pixels is stored linearly as consecutive bytes in the binary, while a "scanline" is the array-indexed offset into the BMP file format, usually inverting the actual row number, since BMPs tend to be stored bottom-up more often than not.
  • It makes more sense because the pixels along X within a row are contiguous memory, addressable through your array pointer, whereas incrementing Y between writes means jumping several non-consecutive bytes and losing the potential to do some writes contiguously.
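
To illustrate (a generic sketch, not plugin code; dst, src, width and height are placeholders):

Code:
/* X inner (what I did):  stores hit consecutive addresses, and the row
 * offset only advances once per scanline. */
for (y = 0; y < height; y++)
    for (x = 0; x < width; x++)
        dst[y*width + x] = palette_32[src[y*width + x]];

/* Y inner:  every store lands a full row (width elements) away from the
 * previous one, so nothing is ever written contiguously. */
for (x = 0; x < width; x++)
    for (y = 0; y < height; y++)
        dst[y*width + x] = palette_32[src[y*width + x]];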

#7 · 8th July 2013, 04:19 AM · HatCat

With a little touching up to the endian-handler addressing modes, my GDI version of zilmar's CFB plugin is beginning to catch up to his video-memory-accelerated blitter.

Current Absolute Crap 2 speed benchmarks, taken manually by me:
  • zilmar's DirectDraw 4.0: 545 - 575 VI/s
  • Microsoft Windows GDI: 495 - 525 VI/s
Both plugins were built using the exact same compiler and IDE settings.


I can probably boost the speed a little more by handling the halfword endian swap within InitiateGFX rather than UpdateScreen, so that we can optimize my original endian-hindered 32-bit loop:


Code:
    SelectObject(FBR, screen);
    FB = GFXInfo.RDRAM + (*GFXInfo.VI_ORIGIN_REG & 0x00FFFFFF);

    base = (unsigned char *)(cfb); // the CreateDIBSection pixel pointer
    pitch = (bmi.bmiHeader.biWidth + 0x003) & 0xFFC; // Microsoft's rounder
    if (PixelFormat == 16)
    {
// not yet tested
    }
    else
    {
        pitch *= sizeof(INT32);
        for (scanline = 0; scanline < -bmi.bmiHeader.biHeight; scanline++) // biHeight < 0:  top-down DIB
        {
            for (x = 0; x < bmi.bmiHeader.biWidth; x += 2)
            {
#ifdef BIG_ENDIAN
                *(UINT32 *)(base + 4*x + 0) = palette_32[*(UINT16 *)(FB + 0)];
                *(UINT32 *)(base + 4*x + 4) = palette_32[*(UINT16 *)(FB + 2)];
#else
                *(UINT32 *)(base + 4*x + 0) = palette_32[*(UINT16 *)(FB + 2)];
                *(UINT32 *)(base + 4*x + 4) = palette_32[*(UINT16 *)(FB + 0)];
#endif
                FB += 4;
            }
            base += pitch;
        }
    }

To double the parallelism of the 16-bit writes into one 32-bit MIPS WORD-sized write for 16-bit screens:
Code:
        for (scanline = 0; scanline < -bmi.bmiHeader.biHeight; scanline++)
        {
            for (x = 0; x < bmi.bmiHeader.biWidth; x++)
            {
                *(UINT32 *)(base + 4*x) = palette_32[*(UINT16 *)FB];
// MMX can now double this and merge these into a 64-bit write
                FB += 2;
            }
            base += pitch;
        }
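And in plain C, before even reaching for MMX, the pairing could sketch out like this (my guess, untested; assumes a little-endian host, the endian swap already hoisted into InitiateGFX as above, and DIB rows that keep base + 4*x 8-byte aligned for even x):

Code:
        for (scanline = 0; scanline < -bmi.bmiHeader.biHeight; scanline++)
        {
            for (x = 0; x < bmi.bmiHeader.biWidth; x += 2)
            {
                const UINT64 lo = palette_32[*(UINT16 *)(FB + 0)]; /* pixel x     */
                const UINT64 hi = palette_32[*(UINT16 *)(FB + 2)]; /* pixel x + 1 */

                *(UINT64 *)(base + 4*x) = lo | (hi << 32); /* one 64-bit store */
                FB += 4;
            }
            base += pitch;
        }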

#8 · 8th July 2013, 07:03 AM · oddMLan

Off-topic, but damn FatCat, you seem to be very committed to squeezing out every possible performance optimization

#9 · 8th July 2013, 08:07 PM · HatCat

It's a necessary evil.

Especially here, because at the outset it seemed very unlikely that I could make a system-independent software-rendered plugin faster than zilmar's DirectDraw CFB plugin, given the gfx-card memory-access accelerations he took advantage of.

However, if I can make a basic CFB plugin at least as fast as his accelerated DirectDraw one, then we keep that speed along with the ability to use it on practically any Windows machine, graphics-card-independent (unlike ddraw), plus some extra portability. (GDI is a bit lower-level to the OS than DirectX is, so it's simpler to port to approximate APIs on other OSes.)

As a further benefit, I needed a much smaller code base than angrylion's to slowly learn about bit-by-bit implementations of RDP LLE and VI filtering, the fixations that the zilmar demonstration DLL was free of.

More to the point, I was motivated to try this because there are no graphics plugins using the Win32 OS GDI.
It's always either Direct3D, DirectDraw, OpenGL, or, rarely enough, SDL (forgetting GLIDE).

I thought it was about time we had a plugin that worked on any computer no matter what its hardware specs, and that the demonstration graphics plugin made by zilmar should have conformed to that level of simplicity. Too bad that in some ways, I admit, DirectDraw is more simple than GDI...once you bar that component-object-model C++ horseassery.

#10 · 9th July 2013, 01:48 AM · HatCat

This is getting even easier now.
I'm actually beginning to understand the basic CFB palette conversion reader.

Originally zilmar's CFB looping was two-dimensional.
We would loop the x position inside the bitmap scanline (inverted y) counter.
Code:
UINT16 *M = (UINT16 *)(API_SURFACE);
UINT16 *src = (UINT16 *)(buffer);

for (scanline = 0; scanline < VI_HEIGHT; scanline++)
{
    for (x = 0; x < VI_WIDTH; x++)
        M[x] = Convert16to16[src[x]];
    M += VI_WIDTH;
    src += VI_WIDTH; /* the source row has to advance along with the destination */
}
(With some slight syntax readability rewrites from the original plugin.)

I realized we could convert this into a one-dimensional loop for a small speed gain.
Code:
UINT16 *M = (UINT16 *)(API_SURFACE_PTR);
UINT16 *src = (UINT16 *)(buffer);

pitch = (VI_WIDTH + 0x003) & ~0x003;
limit = pitch * VI_HEIGHT;
for (cx = 0; cx < limit; cx++)
    M[cx] = Convert16to16[src[cx]];
To be more specific, 16- and 32-bit monitors emulate the N64 16-bit frame buffer copy more directly now:

Code:
void UpdateScreen(void)
{
// ... GDI32 library calls here for setting up the PC frame buffer ...
    switch (PixelFormat)
    {
        case 1: /* experimental -- any monochrome monitors to test? */
// discussing after this
            break;
        case 16:
        {
            UINT16 *M = (UINT16 *)(SFB);

            for (cx = 0; cx < limit; cx++)
                M[cx] = palette_16[FB[cx ^ 1]];
            break; // was missing:  fell through into the 32-bit case
        }
        case 32:
        {
            UINT32 *M = (UINT32 *)(SFB);

            for (cx = 0; cx < limit; cx++)
                M[cx] = palette_32[FB[cx ^ 1]];
            break;
        }
    }
I have an idea how to implement this plugin for 1-bit monochrome monitor display settings (possible to set indirectly on Win32).

Mind you, most people in their right mind would not want to emulate N64 games on a black and white monitor.
Still, the concept is somewhat intriguing/educational to implement.

I was thinking we examine the MSB of the R/G/B color components, which will each be 5 bits in Color16.
Effectively, in a readable non-bit-wise way of displaying it,
Code:
unsigned char *M = (UINT8 *)(API_SURFACE);
int mono = (R16 >= 16) | (G16 >= 16) | (B16 >= 16); /* a 5-bit channel's MSB means >= 16 */
M[cx >> 3] |= mono << (7 - (cx & 7)); /* 1-bpp rows pack 8 pixels per byte, MSB first */
If N64 color is in Color32 format, it's !!(R32 & 128) | !!(G32 & 128) | !!(B32 & 128).

Though my idea there is somewhat flawed, because it means dark blue (RGB24 = 0x000080) would get converted to white in the CFB plugin's monochrome mode.
I guess the right way to do it is to compare the sum (R+G+B) against the mid-gray sum (128+128+128 == 0x180). If GE, the monochrome approximation of the color is white; if LT, it's black.
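
Sketched out (my own packing math, assuming 8-bit R32/G32/B32 channels and GDI's MSB-first 1-bpp row packing):

Code:
/* white if the channel sum reaches mid-gray's sum, 3 * 128 == 0x180 */
unsigned char *M = (unsigned char *)(SFB);
unsigned int sum = R32 + G32 + B32; /* 0 .. 765 */
unsigned int mono = (sum >= 0x180);

M[cx >> 3] |= mono << (7 - (cx & 7)); /* 8 pixels per byte, MSB first */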
