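I just completed a test as part of a job interview, and one question stumped me, even with Google as a reference. I'd like to see what the StackOverflow crew can do with it: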

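The memset_16aligned function requires a 16-byte aligned pointer to be passed to it, or it will crash.

a) How would you allocate 1024 bytes of memory and align it to a 16-byte boundary?
b) Free the memory after memset_16aligned has executed.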

{    
   void *mem;
   void *ptr;

   // answer a) here

   memset_16aligned(ptr, 0, 1024);

   // answer b) here    
}

Current answer

You could also try posix_memalign() (on POSIX platforms, of course).

Other answers

Perhaps they would have been satisfied with knowledge of memalign? And as Jonathan Leffler points out, there are two newer, preferable functions to know about.

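Oops, florin beat me to it. However, if you read the man page I linked to, you will most likely understand the example supplied by the earlier poster.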

Using memalign (see Aligned-Memory-Blocks) might be a good way to solve this problem.

Original answer

{
    void *mem = malloc(1024+16);
    void *ptr = ((char *)mem+16) & ~ 0x0F;
    memset_16aligned(ptr, 0, 1024);
    free(mem);
}

Fixed answer

{
    void *mem = malloc(1024+15);
    void *ptr = (void *)(((uintptr_t)mem+15) & ~ (uintptr_t)0x0F);
    memset_16aligned(ptr, 0, 1024);
    free(mem);
}

Explanation as requested

The first step is to allocate enough spare space, just in case. Since the memory must be 16-byte aligned (meaning that the leading byte address needs to be a multiple of 16), adding 16 extra bytes guarantees that we have enough space. Somewhere in the first 16 bytes, there is a 16-byte aligned pointer. (Note that malloc() is supposed to return a pointer that is sufficiently well aligned for any purpose. However, the meaning of 'any' is primarily for things like basic types — long, double, long double, long long, and pointers to objects and pointers to functions. When you are doing more specialized things, like playing with graphics systems, they can need more stringent alignment than the rest of the system — hence questions and answers like this.)

The next step is to convert the void pointer to a char pointer; GCC notwithstanding, you are not supposed to do pointer arithmetic on void pointers (and GCC has warning options to tell you when you abuse it). Then add 16 to the start pointer. Suppose malloc() returned you an impossibly badly aligned pointer: 0x800001. Adding the 16 gives 0x800011. Now I want to round down to the 16-byte boundary, so I want to reset the last 4 bits to 0. 0x0F has the last 4 bits set to one; therefore, ~0x0F has all bits set to one except the last four. ANDing that with 0x800011 gives 0x800010. You can iterate over the other offsets and see that the same arithmetic works.

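The final step, free(), is easy: you always, and only, return to free() a value that one of malloc(), calloc() or realloc() returned to you; anything else is a disaster. You correctly provided mem to hold that value, thank you. The free() releases it.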

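Finally, if you know about the internals of your system's malloc package, you could guess that it might well return 16-byte aligned data (or it might be 8-byte aligned). If it was 16-byte aligned, then you would not need to dink with the values. However, this is dodgy and non-portable: other malloc packages have different minimum alignments, and therefore assuming one thing when it does something different would lead to core dumps. Within broad limits, this solution is portable.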

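Someone else mentioned posix_memalign() as another way to get aligned memory; it isn't available everywhere, but it could often be implemented using this technique as a basis. Note that it was convenient that the alignment was a power of 2; other alignments are messier.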

One more comment: this code does not check that the allocation succeeded.

Amendment

Windows Programmer pointed out that you can't do bit mask operations on pointers, and, indeed, GCC (3.4.6 and 4.3.1 tested) does complain like that. So, an amended version of the basic code — converted into a main program, follows. I've also taken the liberty of adding just 15 instead of 16, as has been pointed out. I'm using uintptr_t since C99 has been around long enough to be accessible on most platforms. If it wasn't for the use of PRIXPTR in the printf() statements, it would be sufficient to #include <stdint.h> instead of using #include <inttypes.h>. [This code includes the fix pointed out by C.R., which was reiterating a point first made by Bill K a number of years ago, which I managed to overlook until now.]

#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static void memset_16aligned(void *space, char byte, size_t nbytes)
{
    assert((nbytes & 0x0F) == 0);
    assert(((uintptr_t)space & 0x0F) == 0);
    memset(space, byte, nbytes);  // Not a custom implementation of memset()
}

int main(void)
{
    void *mem = malloc(1024+15);
    void *ptr = (void *)(((uintptr_t)mem+15) & ~ (uintptr_t)0x0F);
    printf("0x%08" PRIXPTR ", 0x%08" PRIXPTR "\n", (uintptr_t)mem, (uintptr_t)ptr);
    memset_16aligned(ptr, 0, 1024);
    free(mem);
    return(0);
}

Here is a marginally more generalized version, which works for alignments that are a power of 2:

#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static void memset_16aligned(void *space, char byte, size_t nbytes)
{
    assert((nbytes & 0x0F) == 0);
    assert(((uintptr_t)space & 0x0F) == 0);
    memset(space, byte, nbytes);  // Not a custom implementation of memset()
}

static void test_mask(size_t align)
{
    uintptr_t mask = ~(uintptr_t)(align - 1);
    void *mem = malloc(1024+align-1);
    void *ptr = (void *)(((uintptr_t)mem+align-1) & mask);
    assert((align & (align - 1)) == 0);
    printf("0x%08" PRIXPTR ", 0x%08" PRIXPTR "\n", (uintptr_t)mem, (uintptr_t)ptr);
    memset_16aligned(ptr, 0, 1024);
    free(mem);
}

int main(void)
{
    test_mask(16);
    test_mask(32);
    test_mask(64);
    test_mask(128);
    return(0);
}

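To convert test_mask() into a general-purpose allocation function, the single return value from the allocator would have to encode the release address, as several people have indicated in their answers.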

Problems with interviewers

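Uri commented: Maybe I am having reading comprehension problems this morning, but if the interview question specifically says "How would you allocate 1024 bytes of memory" and you clearly allocate more than that, wouldn't that be an automatic failure from the interviewer?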

My response won't fit into a 300-character comment...

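It depends, I suppose. I think most people (including me) took the question to mean "How would you allocate a space in which 1024 bytes of data can be stored, and where the base address is a multiple of 16 bytes". If the interviewer really meant how can you allocate 1024 bytes (only) and have it 16-byte aligned, then the options are more limited.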

Clearly, one possibility is to allocate 1024 bytes and then give that address the 'alignment treatment'; the problem with that approach is that the actual available space is not properly determinate (the usable space is between 1008 and 1024 bytes, but there wasn't a mechanism available to specify which size), which renders it less than useful. Another possibility is that you are expected to write a full memory allocator and ensure that the 1024-byte block you return is appropriately aligned. If that is the case, you probably end up doing an operation fairly similar to what the proposed solution did, but you hide it inside the allocator.

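However, if the interviewer expected either of those responses, I'd expect them to recognize that this solution answers a closely related question, and then to reframe their question to point the conversation in the correct direction. (Further, if the interviewer got really stroppy, then I wouldn't want the job; if the answer to an insufficiently precise requirement is shot down in flames without correction, then the interviewer is not someone for whom it is safe to work.)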

The world moves on

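The title of the question has changed recently. It was "Solve the memory alignment in C interview question that stumped me". The revised title ("How to allocate aligned memory only using the standard library?") demands a slightly revised answer, which this addendum provides.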

C11 (ISO/IEC 9899:2011) added the function aligned_alloc():

7.22.3.1 The aligned_alloc function

Synopsis

#include <stdlib.h>
void *aligned_alloc(size_t alignment, size_t size);

Description

The aligned_alloc function allocates space for an object whose alignment is specified by alignment, whose size is specified by size, and whose value is indeterminate. The value of alignment shall be a valid alignment supported by the implementation and the value of size shall be an integral multiple of alignment.

Returns

The aligned_alloc function returns either a null pointer or a pointer to the allocated space.

POSIX defines posix_memalign():

#include <stdlib.h>

int posix_memalign(void **memptr, size_t alignment, size_t size);

DESCRIPTION

The posix_memalign() function shall allocate size bytes aligned on a boundary specified by alignment, and shall return a pointer to the allocated memory in memptr. The value of alignment shall be a power of two multiple of sizeof(void *).

Upon successful completion, the value pointed to by memptr shall be a multiple of alignment.

If the size of the space requested is 0, the behavior is implementation-defined; the value returned in memptr shall be either a null pointer or a unique pointer.

The free() function shall deallocate memory that has previously been allocated by posix_memalign().

RETURN VALUE

Upon successful completion, posix_memalign() shall return zero; otherwise, an error number shall be returned to indicate the error.

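Either or both of these functions could be used to answer the question now, but only the POSIX function was an option when the question was originally answered.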

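Behind the scenes, the new aligned memory functions do much the same job as outlined in the question. The difference is that they can force the alignment more easily and keep track of the start of the aligned memory internally, so the calling code doesn't have to deal with it specially: it just frees the memory returned by whichever allocation function was used.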

Three slightly different answers, depending on how you look at the question:

1) Good enough for the question as asked is Jonathan Leffler's solution, except that to round up to 16-alignment you only need 15 extra bytes, not 16.

A:

/* allocate a buffer with room to add 0-15 bytes to ensure 16-alignment */
void *mem = malloc(1024+15);
ASSERT(mem); // some kind of error-handling code
/* round up to multiple of 16: add 15 and then round down by masking */
void *ptr = (void *)(((uintptr_t)mem+15) & ~ (uintptr_t)0x0F);

B:

free(mem);

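2) For a more generic memory allocation function, the caller doesn't want to have to keep track of two pointers (one to use and one to free). So you store a pointer to the 'real' buffer just below the aligned buffer.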

A:

void *mem = malloc(1024+15+sizeof(void*));
if (!mem) return mem;
void *ptr = (void *)(((uintptr_t)mem+sizeof(void*)+15) & ~ (uintptr_t)0x0F);
((void**)ptr)[-1] = mem;
return ptr;

B:

if (ptr) free(((void**)ptr)[-1]);

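Note that unlike (1), where only 15 bytes were added to mem, this code could actually reduce the alignment if your implementation happens to guarantee 32-byte alignment from malloc (unlikely, but in theory a C implementation could have a 32-byte aligned type). That doesn't matter if all you do is call memset_16aligned, but if you use the memory for a struct then it could matter.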

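I'm not sure offhand what a good fix for this is (other than warning the user that the returned buffer is not necessarily suitable for arbitrary structs), since there's no way to determine programmatically what the implementation-specific alignment guarantee is. I guess at startup you could allocate two or more 1-byte buffers and assume that the worst alignment you see is the guaranteed alignment. If you're wrong, you waste memory. Anyone with a better idea, please say so...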

[Added: The 'standard' trick is to create a union of 'likely to be maximally aligned types' to determine the requisite alignment. The maximally aligned types are likely to be (in C99) 'long long', 'long double', 'void *', or 'void (*)(void)'; if you include <stdint.h>, you could presumably use 'intmax_t' in place of long long (and, on Power 6 (AIX) machines, intmax_t would give you a 128-bit integer type). The alignment requirements for that union can be determined by embedding it into a struct with a single char followed by the union:

struct alignment
{
    char     c;
    union
    {
        intmax_t      imax;
        long double   ldbl;
        void         *vptr;
        void        (*fptr)(void);
    }        u;
} align_data;
size_t align = (char *)&align_data.u.imax - &align_data.c;

You would then use the larger of the requested alignment (16 in the example) and the align value calculated above.

On (64-bit) Solaris 10, it appears that the basic alignment for the result from malloc() is a multiple of 32 bytes.]

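In practice, aligned allocators usually take the alignment as a parameter rather than having it hardwired. The user then passes in the size of the struct they care about (or the smallest power of 2 greater than or equal to that size), and all will be well.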

3) Use what your platform provides: posix_memalign on POSIX, _aligned_malloc on Windows.

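4) If you are using C11, the cleanest option, both portable and concise, is the standard library function aligned_alloc, which was introduced in that version of the language specification.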
