当问到C语言中常见的未定义行为时,人们有时会提到严格的混叠规则。 他们在谈论什么?
当前回答
根据C89的基本原理,标准的作者不想要求编译器给出如下代码:
int x;
int test(double *p)
{
x=5;
*p = 1.0;
return x;
}
应该要求在赋值和返回语句之间重新加载x的值,以便允许p指向x的可能性,而对*p的赋值可能因此改变x的值。编译器应该有权假定在上述情况下不会出现混叠的概念是没有争议的。
不幸的是,C89的作者写规则的方式,如果从字面上读,甚至会使下面的函数调用未定义行为:
void test(void)
{
struct S {int x;} s;
s.x = 1;
}
because it uses an lvalue of type int to access an object of type struct S, and int is not among the types that may be used accessing a struct S. Because it would be absurd to treat all use of non-character-type members of structs and unions as Undefined Behavior, almost everyone recognizes that there are at least some circumstances where an lvalue of one type may be used to access an object of another type. Unfortunately, the C Standards Committee has failed to define what those circumstances are.
大部分问题是缺陷报告#028的结果,它询问了程序的行为,如:
int test(int *ip, double *dp)
{
*ip = 1;
*dp = 1.23;
return *ip;
}
int test2(void)
{
union U { int i; double d; } u;
return test(&u.i, &u.d);
}
缺陷报告#28指出,程序调用了未定义行为,因为写入类型为“double”的联合成员并读取类型为“int”的联合成员的操作调用了实现定义的行为。这样的推理是毫无意义的,但却形成了有效类型规则的基础,这些规则不必要地使语言复杂化,而对解决原始问题毫无帮助。
解决原始问题的最好办法可能是治疗 关于规则目的的脚注,就像它是规范的一样,并作出 除非实际涉及使用别名的冲突访问,否则该规则不可执行。假设是这样的:
void inc_int(int *p) { *p = 3; }
int test(void)
{
int *p;
struct S { int x; } s;
s.x = 1;
p = &s.x;
inc_int(p);
return s.x;
}
在inc_int中没有冲突,因为所有通过*p访问的存储都是使用int类型的左值完成的,在test中也没有冲突,因为p明显地派生于结构体S,并且在下次使用S时,所有通过p访问的存储都已经发生了。
如果代码稍微改变一下……
void inc_int(int *p) { *p = 3; }
int test(void)
{
int *p;
struct S { int x; } s;
p = &s.x;
s.x = 1; // !!*!!
*p += 1;
return s.x;
}
这里,p和对s.x的访问在被标记的行上存在别名冲突,因为在执行时存在另一个引用,该引用将用于访问相同的存储。
如果缺陷报告028说原始示例调用UB是因为两个指针的创建和使用之间有重叠,那么事情就会变得更清楚,而不必添加“有效类型”或其他类似的复杂性。
其他回答
根据C89的基本原理,标准的作者不想要求编译器给出如下代码:
int x;
int test(double *p)
{
x=5;
*p = 1.0;
return x;
}
应该要求在赋值和返回语句之间重新加载x的值,以便允许p指向x的可能性,而对*p的赋值可能因此改变x的值。编译器应该有权假定在上述情况下不会出现混叠的概念是没有争议的。
不幸的是,C89的作者写规则的方式,如果从字面上读,甚至会使下面的函数调用未定义行为:
void test(void)
{
struct S {int x;} s;
s.x = 1;
}
because it uses an lvalue of type int to access an object of type struct S, and int is not among the types that may be used accessing a struct S. Because it would be absurd to treat all use of non-character-type members of structs and unions as Undefined Behavior, almost everyone recognizes that there are at least some circumstances where an lvalue of one type may be used to access an object of another type. Unfortunately, the C Standards Committee has failed to define what those circumstances are.
大部分问题是缺陷报告#028的结果,它询问了程序的行为,如:
int test(int *ip, double *dp)
{
*ip = 1;
*dp = 1.23;
return *ip;
}
int test2(void)
{
union U { int i; double d; } u;
return test(&u.i, &u.d);
}
缺陷报告#28指出,程序调用了未定义行为,因为写入类型为“double”的联合成员并读取类型为“int”的联合成员的操作调用了实现定义的行为。这样的推理是毫无意义的,但却形成了有效类型规则的基础,这些规则不必要地使语言复杂化,而对解决原始问题毫无帮助。
解决原始问题的最好办法可能是治疗 关于规则目的的脚注,就像它是规范的一样,并作出 除非实际涉及使用别名的冲突访问,否则该规则不可执行。假设是这样的:
void inc_int(int *p) { *p = 3; }
int test(void)
{
int *p;
struct S { int x; } s;
s.x = 1;
p = &s.x;
inc_int(p);
return s.x;
}
在inc_int中没有冲突,因为所有通过*p访问的存储都是使用int类型的左值完成的,在test中也没有冲突,因为p明显地派生于结构体S,并且在下次使用S时,所有通过p访问的存储都已经发生了。
如果代码稍微改变一下……
void inc_int(int *p) { *p = 3; }
int test(void)
{
int *p;
struct S { int x; } s;
p = &s.x;
s.x = 1; // !!*!!
*p += 1;
return s.x;
}
这里,p和对s.x的访问在被标记的行上存在别名冲突,因为在执行时存在另一个引用,该引用将用于访问相同的存储。
如果缺陷报告028说原始示例调用UB是因为两个指针的创建和使用之间有重叠,那么事情就会变得更清楚,而不必添加“有效类型”或其他类似的复杂性。
严格的混叠是不允许不同的指针类型指向相同的数据。
本文将帮助您全面详细地理解这个问题。
从技术上讲,在c++中,严格的混叠规则可能永远都不适用。
注意indirection(*运算符)的定义:
一元*运算符执行间接操作:它所对应的表达式 是指向对象类型的指针,还是指向对象类型的指针 函数类型,结果是指向对象或的左值 表达式所指向的函数。
同样来自glvalue的定义
glvalue是一个表达式,其求值决定的标识 一个对象,(…剪)
因此,在任何定义良好的程序跟踪中,glvalue都指向对象。所以所谓的严格混叠规则并不适用。这可能不是设计师想要的。
这是严格的混叠规则,可以在c++ 03标准的3.10节中找到(其他答案提供了很好的解释,但没有一个提供了规则本身):
If a program attempts to access the stored value of an object through an lvalue of other than one of the following types the behavior is undefined: the dynamic type of the object, a cv-qualified version of the dynamic type of the object, a type that is the signed or unsigned type corresponding to the dynamic type of the object, a type that is the signed or unsigned type corresponding to a cv-qualified version of the dynamic type of the object, an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), a type that is a (possibly cv-qualified) base class type of the dynamic type of the object, a char or unsigned char type.
c++ 11和c++ 14的措辞(强调更改):
If a program attempts to access the stored value of an object through a glvalue of other than one of the following types the behavior is undefined: the dynamic type of the object, a cv-qualified version of the dynamic type of the object, a type similar (as defined in 4.4) to the dynamic type of the object, a type that is the signed or unsigned type corresponding to the dynamic type of the object, a type that is the signed or unsigned type corresponding to a cv-qualified version of the dynamic type of the object, an aggregate or union type that includes one of the aforementioned types among its elements or non-static data members (including, recursively, an element or non-static data member of a subaggregate or contained union), a type that is a (possibly cv-qualified) base class type of the dynamic type of the object, a char or unsigned char type.
有两个变化很小:glvalue代替了lvalue,并澄清了聚合/并集的情况。
第三个变化提供了更强的保证(放宽强混叠规则):类似类型的新概念现在可以安全地进行混叠。
还有C的措辞(C99;Iso / iec 9899:1999 6.5/7;在ISO/IEC 9899:2011§6.5¶7中使用了完全相同的措辞:
An object shall have its stored value accessed only by an lvalue expression that has one of the following types 73) or 88): a type compatible with the effective type of the object, a qualified version of a type compatible with the effective type of the object, a type that is the signed or unsigned type corresponding to the effective type of the object, a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object, an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or a character type. 73) or 88) The intent of this list is to specify those circumstances in which an object may or may not be aliased.
作为Doug T.已经写过的附录,在这里 是一个简单的测试用例,可能会触发GCC:
check.c
#include <stdio.h>
void check(short *h,long *k)
{
*h=5;
*k=6;
if (*h == 5)
printf("strict aliasing problem\n");
}
int main(void)
{
long k[1];
check((short *)k,k);
return 0;
}
编译gcc -O2 -o check check.c。 通常(我尝试过的大多数gcc版本)这会输出“严格的混叠问题”,因为编译器假设“h”不能与“check”函数中的“k”地址相同。因此,编译器会优化if (*h == 5),并始终调用printf。
对于那些感兴趣的人,这里有x64汇编代码,由gcc 4.6.3生成,运行在ubuntu 12.04.2的x64上:
movw $5, (%rdi)
movq $6, (%rsi)
movl $.LC0, %edi
jmp puts
所以if条件从汇编代码中完全消失了。
推荐文章
- 什么是“参数依赖查找”(又名ADL,或“Koenig查找”)?
- 公共朋友交换成员函数
- 如何将文件指针(file * fp)转换为文件描述符(int fd)?
- 如何在Go中使用c++
- 自定义c++分配器的引人注目的例子?
- RAII和c++中的智能指针
- 如何构建和使用谷歌TensorFlow c++ api
- C“int”的大小是2字节还是4字节?
- 多维数组如何在内存中格式化?
- printf()和puts()在C语言中的区别是什么?
- 断言是邪恶的吗?
- 下面这些短语在c++中是什么意思:0 -,default-和value-initialization?
- 在STL地图中,使用map::insert比[]更好吗?
- C++ Linux的想法?
- 有效,但毫无价值的语法在开关情况下?