Sunday, April 17, 2011

Alloca implementation

How does one implement alloca() using inline x86 assembler in languages like D, C, and C++? I want to create a slightly modified version of it, but first I need to know how the standard version is implemented. Reading the disassembly from compilers doesn't help because they perform so many optimizations, and I just want the canonical form.

Edit: I guess the hard part is that I want this to have normal function call syntax, i.e. using a naked function or something, make it look like the normal alloca().

Edit # 2: Ah, what the heck, you can assume that we're not omitting the frame pointer.

From stackoverflow
  • Alloca is easy, you just move the stack pointer up; then generate all the read/writes to point to this new block

    sub esp, 4
    
    jn_ : 1) it's not esi 2) stack grows from high to low adresses
  • I recommend the "enter" instruction. Available on 286 and newer processors (may have been available on the 186 as well, I can't remember offhand, but those weren't widely available anyways).

    Evan Teran : unfortunately, the enter instruction is fairly useless this purpose (implementing alloca in a higher level language) simply because you wouldn't get enough compiler cooperation.
  • alloca is directly implemented in assembly code. That's because you cannot control stack layout directly from high level languages.

    Also note that most implementation will perform some additional optimization like aligning the stack for performance reasons. The standard way of allocating stack space on X86 looks like this:

    sub esp, XXX
    

    Whereas XXX is the number of bytes to allcoate

    Edit:
    If you want to look at the implementation (and you're using MSVC) see alloca16.asm and chkstk.asm.
    The code in the first file basically aligns the desired allocation size to a 16 byte boundary. Code in the 2nd file actually walks all pages which would belong to the new stack area and touches them. This will possibly trigger PAGE_GAURD exceptions which are used by the OS to grow the stack.

  • It would be tricky to do this - in fact, unless you have enough control over the compiler's code generation it cannot be done entirely safely. Your routine would have to manipulate the stack, such that when it returned everything was cleaned, but the stack pointer remained in such a position that the block of memory remained in that place.

    The problem is that unless you can inform the compiler that the stack pointer is has been modified across your function call, it may well decide that it can continue to refer to other locals (or whatever) through the stack pointer - but the offsets will be incorrect.

  • implementing alloca actually requires compiler assistance. A few people here are saying it's as easy as:

    sub esp, <size>
    

    which is unfortunately only half of the picture. Yes that would "allocate space on the stack" but there are a few gotchas.

    1. if the compiler had emitted code which references other variables relative to esp instead of ebp (typical if you compile with no frame pointer). Then those references need to be adjusted.

    2. more importantly, by definition, space allocated with alloca must be "freed" when the function exits.

    The big one is point #2. Because you need the compiler to emit code to symetrically add <size> to esp at every exit point of the function.

    The most likely case is the compile offers some compiler intrinsics which allow library writers to ask the compiler for the help needed.

    EDIT:

    In fact, in glibc (GNU's implementation of libc). The implementation of alloca is simply this:

    #ifdef  __GNUC__
    # define __alloca(size) __builtin_alloca (size)
    #endif /* GCC.  */
    

    EDIT:

    after thinking about it, the minimum I believe would be required would be for the compiler to always use a frame pointer in any functions which use alloca, regardless of optimization settings. This would allow all locals to be referenced through ebp safely and the frame cleanup would be handled by restoring the frame pointer to esp.

    EDIT:

    So i did some experimenting with things like this:

    #include <stdlib.h>
    #include <string.h>
    #include <stdio.h>
    
    #define __alloca(p, N) \
        do { \
         __asm__ __volatile__( \
         "sub %1, %%esp \n" \
         "mov %%esp, %0 \n" \
          : "=m"(p) \
          : "i"(N) \
          : "esp"); \
        } while(0)
    
    int func() {
        char *p;
        __alloca(p, 100);
        memset(p, 0, 100);
        strcpy(p, "hello world\n");
        printf("%s\n", p);
    }
    
    int main() {
        func();
    }
    

    which unfortunately does not work correctly. After analyzing the assembly output by gcc. It appears that optimizations get in the way. The problem seems to be that since the compiler's optimizer is entirely unaware of my inline assembly, it has a habit of doing the things in an unexpected order and still referencing things via esp.

    Here's the resultant ASM:

    8048454: push   ebp
    8048455: mov    ebp,esp
    8048457: sub    esp,0x28
    804845a: sub    esp,0x64                      ; <- this and the line below are our "alloc"
    804845d: mov    DWORD PTR [ebp-0x4],esp
    8048460: mov    eax,DWORD PTR [ebp-0x4]
    8048463: mov    DWORD PTR [esp+0x8],0x64      ; <- whoops! compiler still referencing via esp
    804846b: mov    DWORD PTR [esp+0x4],0x0       ; <- whoops! compiler still referencing via esp
    8048473: mov    DWORD PTR [esp],eax   
    8048476: call   8048338 <memset@plt>
    804847b: mov    eax,DWORD PTR [ebp-0x4]
    804847e: mov    DWORD PTR [esp+0x8],0xd       ; <- whoops! compiler still referencing via esp
    8048486: mov    DWORD PTR [esp+0x4],0x80485a8 ; <- whoops! compiler still referencing via esp
    804848e: mov    DWORD PTR [esp],eax           ; <- whoops! compiler still referencing via esp
    8048491: call   8048358 <memcpy@plt>
    8048496: mov    eax,DWORD PTR [ebp-0x4]
    8048499: mov    DWORD PTR [esp],eax           ; <- whoops! compiler still referencing via esp
    804849c: call   8048368 <puts@plt>
    80484a1: leave
    80484a2: ret
    

    As you can see, it isn't so simple. Unfortunately, I stand by my original assertion that you need compiler assistance.

  • For the D programming language, the source code for alloca() comes with the download. How it works is fairly well commented. For dmd1, it's in /dmd/src/phobos/internal/alloca.d. For dmd2, it's in /dmd/src/druntime/src/compiler/dmd/alloca.d.

    dsimcha : Well, I guess that pretty much answers it. It says right in the comments that it's a magic function and requires compiler support, i.e. I can't do exactly what I wanted. Maybe I'll figure out a way to do it with the existing alloca() and mixins instead.
  • You can examine sources of an open-source C compiler, like Open Watcom, and find it yourself

  • The C and C++ standards don't specify that alloca() has to the use the stack, because alloca() isn't in the C or C++ standards (or POSIX for that matter)¹.

    A compiler may also implement alloca() using the heap. For example, the ARM RealView (RVCT) compiler's alloca() uses malloc() to allocate the buffer (referenced on their website here), and also causes the compiler to emit code that frees the buffer when the function returns. This doesn't require playing with the stack pointer, but still requires compiler support.

    Microsoft Visual C++ has a _malloca() function that uses the heap if there isn't enough room on the stack, but it requires the caller to use _freea(), unlike _alloca(), which does not need/want explicit freeing.

    (With C++ destructors at your disposal, you can obviously do the cleanup without compiler support, but you can't declare local variables inside an arbitrary expression so I don't think you could write an alloca() macro that uses RAII. Then again, apparently you can't use alloca() in some expressions (like function parameters) anyway.)

    ¹ Yes, it's legal to write an alloca() that simply calls system("/usr/games/nethack").

0 comments:

Post a Comment