在关于C的一个有信誉的来源中,在讨论&操作符后给出了以下信息:

... 有点不幸的是,术语[地址的]仍然存在,因为它混淆了那些不知道地址是关于什么的人,并误导了那些知道地址的人:将指针视为地址通常会导致悲伤……

我读过的其他材料(来自同样有名望的来源,我想说)总是毫不掩饰地将指针和&操作符作为内存地址。我很愿意继续寻找事情的真相,但当有信誉的消息来源不同意时,这有点困难。

现在我有点困惑了——如果指针不是内存地址,那么它到底是什么?

P.S.

作者后来说:……不过,我将继续使用“地址”这个术语,因为发明一个不同的(术语)会更糟糕。


当前回答

把指针看作地址是一种近似。像所有的近似值一样,它有时足够有用,但也不准确,这意味着依赖它会带来麻烦。

指针就像一个地址,它指出在哪里可以找到一个对象。这种类比的一个直接限制是,并非所有指针都实际包含地址。NULL是一个指针,它不是地址。指针变量的内容实际上可以是以下三种类型之一:

对象的地址,可以被解引用(如果p包含x的地址,则表达式*p与x的值相同); 一个空指针,null是一个例子; 无效内容,不指向对象(如果p不持有有效值,则*p可以做任何事情(“未定义行为”),导致程序崩溃是相当常见的可能性)。

此外,更准确的说法是,一个指针(如果有效且非空)包含一个地址:指针指出在哪里可以找到一个对象,但还有更多与之相关的信息。

In particular, a pointer has a type. On most platforms, the type of the pointer has no influence at runtime, but it has an influence that goes beyond the type at compile time. If p is a pointer to int (int *p;), then p + 1 points to an integer which is sizeof(int) bytes after p (assuming p + 1 is still a valid pointer). If q is a pointer to char that points to the same address as p (char *q = p;), then q + 1 is not the same address as p + 1. If you think of pointer as addresses, it is not very intuitive that the “next address” is different for different pointers to the same location.

It is possible in some environments to have multiple pointer values with different representations (different bit patterns in memory) that point to the same location in memory. You can think of these as different pointers holding the same address, or as different addresses for the same location — the metaphor isn't clear in this case. The == operator always tells you whether the two operands are pointing to the same location, so on these environments you can have p == q even though p and q have different bit patterns.

甚至在某些环境中,指针携带除地址以外的其他信息,例如类型或权限信息。作为一名程序员,你很容易在生活中不会遇到这些问题。

在某些环境中,不同类型的指针具有不同的表示形式。你可以把它想象成不同类型的地址有不同的表示。例如,一些体系结构有字节指针和字指针,或者对象指针和函数指针。

总而言之,只要记住这一点,将指针视为地址并不太糟糕

它只有有效的,非空的地址指针; 同一个位置可以有多个地址; 你不能对地址进行算术运算,地址上也没有顺序; 指针还携带类型信息。

反过来就麻烦多了。并不是所有看起来像地址的东西都可以是指针。在深层的某个地方,任何指针都表示为可以作为整数读取的位模式,并且您可以说这个整数是一个地址。但反过来说,不是每个整数都是指针。

首先有一些众所周知的限制;例如,在程序地址空间之外指定位置的整数不能是有效指针。未对齐的地址不能为需要对齐的数据类型创建有效指针;例如,在int需要4字节对齐的平台上,0x7654321不能是有效的int*值。

然而,它远远不止于此,因为当您将指针设置为整数时,您就会遇到很多麻烦。这个问题的很大一部分是优化编译器在微优化方面比大多数程序员预期的要好得多,因此他们对程序如何工作的思维模型是严重错误的。仅仅因为指针具有相同的地址并不意味着它们是等价的。例如,考虑下面的代码片段:

unsigned int x = 0;
unsigned short *p = (unsigned short*)&x;
p[0] = 1;
printf("%u = %u\n", x, *p);

您可能会期望,在sizeof(int)==4和sizeof(short)==2的普通机器上,这要么打印1 = 1?(little-endian)还是65536 = 1?(大端)。但在我的64位Linux PC上,GCC 4.4:

$ c99 -O2 -Wall a.c && ./a.out 
a.c: In function ‘main’:
a.c:6: warning: dereferencing pointer ‘p’ does break strict-aliasing rules
a.c:5: note: initialized from here
0 = 1?

在这个简单的例子中,GCC会提醒我们哪里出了问题——在更复杂的例子中,编译器可能不会注意到。由于p与&x的类型不同,改变p指向的对象不会影响&x指向的对象(除了一些定义良好的异常)。因此,编译器可以自由地将x的值保存在寄存器中,而不会在*p更改时更新该寄存器。程序解引用两个指向相同地址的指针,得到两个不同的值!

The moral of this example is that thinking of a (non-null valid) pointer as an address is fine, as long as you stay within the precise rules of the C language. The flip side of the coin is that the rules of the C language are intricate, and difficult to get an intuitive feeling for unless you know what happens under the hood. And what happens under the hood is that the tie between pointers and addresses is somewhat loose, both to support “exotic” processor architectures and to support optimizing compilers.

因此,可以将指针作为地址作为理解的第一步,但不要过于遵循这种直觉。

其他回答

Come to think about it, I think it's a matter of semantics. I don't think the author is right, since the C standard refers to a pointer as holding an address to the referenced object as others have already mentioned here. However, address!=memory address. An address can be really anything as per C standard although it will eventually lead to a memory address, the pointer itself can be an id, an offset + selector (x86), really anything as long as it can describe (after mapping) any memory address in the addressable space.

我不确定你的来源,但你描述的语言类型来自C标准:

6.5.3.2地址和间接操作符 […] 3.一元&操作符产生其操作数的地址。[…]

所以…是的,指针指向内存地址。至少这是C标准所暗示的意思。

更清楚地说,指针是保存某个地址值的变量。对象的地址(可以存储在指针中)使用一元&操作符返回。

我可以将地址“42 Wallaby Way, Sydney”存储在一个变量中(该变量将是某种“指针”,但由于这不是一个内存地址,所以我们不能正确地称之为“指针”)。您的计算机有内存桶的地址。指针存储地址的值(例如,指针存储值“42 Wallaby Way, Sydney”,这是一个地址)。

编辑:我想对Alexey Frunze的评论进行扩展。

指针到底是什么?让我们看看C标准:

6.2.5类型 […] 20.[…] 指针类型可以从函数类型或对象类型派生,称为引用类型。指针类型描述了一个对象,该对象的值提供了对所引用类型实体的引用。从引用类型T派生的指针类型有时称为“指向T的指针”。从引用类型构造指针类型称为“指针类型派生”。指针类型是一个完整的对象类型。

从本质上讲,指针存储一个值,该值提供对某些对象或函数的引用。种。指针用于存储提供对某些对象或函数引用的值,但情况并非总是如此:

6.3.2.3指针 […] 5. 整数可以转换为任何指针类型。除非像前面指定的那样,否则结果是由实现定义的,可能没有正确对齐,可能没有指向引用类型的实体,并且可能是陷阱表示。

The above quote says that we can turn an integer into a pointer. If we do that (that is, if we stuff an integer value into a pointer instead of a specific reference to an object or function), then the pointer "might not point to an entity of reference type" (i.e. it may not provide a reference to an object or function). It might provide us with something else. And this is one place where you might stick some kind of handle or ID in a pointer (i.e. the pointer isn't pointing to an object; it's storing a value that represents something, but that value may not be an address).

是的,正如Alexey Frunze所说,指针可能没有存储对象或函数的地址。有可能一个指针存储的是某种“句柄”或ID,你可以通过给指针赋某个任意整数值来做到这一点。这个句柄或ID表示什么取决于系统/环境/上下文。只要您的系统/实现能够理解这个值,您就处于良好的状态(但这取决于具体的值和具体的系统/实现)。

通常,指针存储对象或函数的地址。如果它没有存储实际的地址(到对象或函数),则结果是实现定义的(这意味着究竟发生了什么以及指针现在表示什么取决于您的系统和实现,因此它可能是特定系统上的句柄或ID,但在另一个系统上使用相同的代码/值可能会使程序崩溃)。

结果比我想象的要长……

A pointer, like any other variable in C, is fundamentally a collection of bits which may be represented by one or more concatenated unsigned char values (as with any other type of cariable, sizeof(some_variable) will indicate the number of unsigned char values). What makes a pointer different from other variables is that a C compiler will interpret the bits in a pointer as identifying, somehow, a place where a variable may be stored. In C, unlike some other languages, it is possible to request space for multiple variables, and then convert a pointer to any value in that set into a pointer to any other variable within that set.

Many compilers implement pointers by using their bits store actual machine addresses, but that is not the only possible implementation. An implementation could keep one array--not accessible to user code--listing the hardware address and allocated size of all of the memory objects (sets of variables) which a program was using, and have each pointer contain an index into an array along with an offset from that index. Such a design would allow a system to not only restrict code to only operating upon memory that it owned, but also ensure that a pointer to one memory item could not be accidentally converted into a pointer to another memory item (in a system that uses hardware addresses, if foo and bar are arrays of 10 items that are stored consecutively in memory, a pointer to the "eleventh" item of foo might instead point to the first item of bar, but in a system where each "pointer" is an object ID and an offset, the system could trap if code tried to index a pointer to foo beyond its allocated range). It would also be possible for such a system to eliminate memory-fragmentation problems, since the physical addresses associated with any pointers could be moved around.

Note that while pointers are somewhat abstract, they're not quite abstract enough to allow a fully-standards-compliant C compiler to implement a garbage collector. The C compiler specifies that every variable, including pointers, is represented as a sequence of unsigned char values. Given any variable, one can decompose it into a sequence of numbers and later convert that sequence of numbers back into a variable of the original type. Consequently, it would be possible for a program to calloc some storage (receiving a pointer to it), store something there, decompose the pointer into a series of bytes, display those on the screen, and then erase all reference to them. If the program then accepted some numbers from the keyboard, reconstituted those to a pointer, and then tried to read data from that pointer, and if user entered the same numbers that the program had earlier displayed, the program would be required to output the data that had been stored in the calloc'ed memory. Since there is no conceivable way the computer could know whether the user had made a copy of the numbers that were displayed, there would be no conceivable may the computer could know whether the aforementioned memory might ever be accessed in future.

你是对的,是理智的。通常,指针只是一个地址,因此您可以将其强制转换为整数并进行任何算术运算。

但有时指针只是地址的一部分。在一些体系结构上,指针被转换为一个增加了基数的地址或使用另一个CPU寄存器。

但是现在,在PC和ARM架构上,使用平面内存模型和原生编译的C语言,可以认为指针是指向一维可寻址RAM中某个位置的整数地址。

在这幅图中,

Pointer_p是一个位于0x12345的指针,它指向0x34567的变量variable_v。