哪一个更快:while(1)还是while(2)?

这是一位高级经理在面试时问的问题。

哪个更快?

while(1) {
    // Some code
}

while(2) {
    //Some code
}

我说过两者具有相同的执行速度，因为while中的表达式最终应求值为true或false。在这种情况下，两者都求值为true，并且while条件中没有额外的条件指令。因此，两者将具有相同的执行速度，我更喜欢while(1)。

但面试官自信地说: “检查你的基本知识。While(1)比While(2)快。” (他不是在试探我的信心)

这是真的吗?

请参见:“for(;;)”是否比“while (TRUE)”快?如果不是，人们为什么要使用它?

当前回答

这两个循环都是无限的，但我们可以看到哪个循环每次迭代需要更多的指令/资源。

使用gcc，我编译了以下两个程序，以不同的优化级别进行汇编:

int main(void) {
    while(1) {}
    return 0;
}

int main(void) {
    while(2) {}
    return 0;
}

即使没有优化(-O0)，两个程序生成的程序集也是相同的。因此，两个循环之间没有速度差异。

作为参考，下面是生成的程序集(使用gcc main.c -S -masm=intel并带有优化标志):

o0:

    .file   "main.c"
    .intel_syntax noprefix
    .def    __main; .scl    2;  .type   32; .endef
    .text
    .globl  main
    .def    main;   .scl    2;  .type   32; .endef
    .seh_proc   main
main:
    push    rbp
    .seh_pushreg    rbp
    mov rbp, rsp
    .seh_setframe   rbp, 0
    sub rsp, 32
    .seh_stackalloc 32
    .seh_endprologue
    call    __main
.L2:
    jmp .L2
    .seh_endproc
    .ident  "GCC: (tdm64-2) 4.8.1"

o1群:

    .file   "main.c"
    .intel_syntax noprefix
    .def    __main; .scl    2;  .type   32; .endef
    .text
    .globl  main
    .def    main;   .scl    2;  .type   32; .endef
    .seh_proc   main
main:
    sub rsp, 40
    .seh_stackalloc 40
    .seh_endprologue
    call    __main
.L2:
    jmp .L2
    .seh_endproc
    .ident  "GCC: (tdm64-2) 4.8.1"

-O2和-O3(相同输出):

    .file   "main.c"
    .intel_syntax noprefix
    .def    __main; .scl    2;  .type   32; .endef
    .section    .text.startup,"x"
    .p2align 4,,15
    .globl  main
    .def    main;   .scl    2;  .type   32; .endef
    .seh_proc   main
main:
    sub rsp, 40
    .seh_stackalloc 40
    .seh_endprologue
    call    __main
.L2:
    jmp .L2
    .seh_endproc
    .ident  "GCC: (tdm64-2) 4.8.1"

事实上，为循环生成的程序集对于每个级别的优化都是相同的:

 .L2:
    jmp .L2
    .seh_endproc
    .ident  "GCC: (tdm64-2) 4.8.1"

重要的部分是:

.L2:
    jmp .L2

我不太懂汇编，但这显然是一个无条件循环。jmp指令无条件地将程序重置回. l2标签，甚至不将值与true进行比较，当然，它会立即再次这样做，直到程序以某种方式结束。这直接对应于C/ c++代码:

L2:
    goto L2;

编辑:

有趣的是，即使没有优化，下面的循环都在汇编中产生完全相同的输出(无条件jmp):

while(42) {}

while(1==1) {}

while(2==2) {}

while(4<7) {}

while(3==3 && 4==4) {}

while(8-9 < 0) {}

while(4.3 * 3e4 >= 2 << 6) {}

while(-0.1 + 02) {}

甚至令我惊讶的是:

#include<math.h>

while(sqrt(7)) {}

while(hypot(3,4)) {}

对于用户定义函数，事情变得更加有趣:

int x(void) {
    return 1;
}

while(x()) {}

#include<math.h>

double x(void) {
    return sqrt(7);
}

while(x()) {}

At -O0, these two examples actually call x and perform a comparison for each iteration.

First example (returning 1):

.L4:
    call    x
    testl   %eax, %eax
    jne .L4
    movl    $0, %eax
    addq    $32, %rsp
    popq    %rbp
    ret
    .seh_endproc
    .ident  "GCC: (tdm64-2) 4.8.1"

Second example (returning sqrt(7)):

.L4:
    call    x
    xorpd   %xmm1, %xmm1
    ucomisd %xmm1, %xmm0
    jp  .L4
    xorpd   %xmm1, %xmm1
    ucomisd %xmm1, %xmm0
    jne .L4
    movl    $0, %eax
    addq    $32, %rsp
    popq    %rbp
    ret
    .seh_endproc
    .ident  "GCC: (tdm64-2) 4.8.1"

However, at -O1 and above, they both produce the same assembly as the previous examples (an unconditional jmp back to the preceding label).

TL;DR

Under GCC, the different loops are compiled to identical assembly. The compiler evaluates the constant values and doesn't bother performing any actual comparison.

The moral of the story is:

There exists a layer of translation between C source code and CPU instructions, and this layer has important implications for performance.
Therefore, performance cannot be evaluated by only looking at source code.
The compiler should be smart enough to optimize such trivial cases. Programmers should not waste their time thinking about them in the vast majority of cases.

2014-07-20 08:03:37

其他回答

这两个循环都是无限的，但我们可以看到哪个循环每次迭代需要更多的指令/资源。

使用gcc，我编译了以下两个程序，以不同的优化级别进行汇编:

int main(void) {
    while(1) {}
    return 0;
}

int main(void) {
    while(2) {}
    return 0;
}

即使没有优化(-O0)，两个程序生成的程序集也是相同的。因此，两个循环之间没有速度差异。

作为参考，下面是生成的程序集(使用gcc main.c -S -masm=intel并带有优化标志):

o0:

    .file   "main.c"
    .intel_syntax noprefix
    .def    __main; .scl    2;  .type   32; .endef
    .text
    .globl  main
    .def    main;   .scl    2;  .type   32; .endef
    .seh_proc   main
main:
    push    rbp
    .seh_pushreg    rbp
    mov rbp, rsp
    .seh_setframe   rbp, 0
    sub rsp, 32
    .seh_stackalloc 32
    .seh_endprologue
    call    __main
.L2:
    jmp .L2
    .seh_endproc
    .ident  "GCC: (tdm64-2) 4.8.1"

o1群:

    .file   "main.c"
    .intel_syntax noprefix
    .def    __main; .scl    2;  .type   32; .endef
    .text
    .globl  main
    .def    main;   .scl    2;  .type   32; .endef
    .seh_proc   main
main:
    sub rsp, 40
    .seh_stackalloc 40
    .seh_endprologue
    call    __main
.L2:
    jmp .L2
    .seh_endproc
    .ident  "GCC: (tdm64-2) 4.8.1"

-O2和-O3(相同输出):

    .file   "main.c"
    .intel_syntax noprefix
    .def    __main; .scl    2;  .type   32; .endef
    .section    .text.startup,"x"
    .p2align 4,,15
    .globl  main
    .def    main;   .scl    2;  .type   32; .endef
    .seh_proc   main
main:
    sub rsp, 40
    .seh_stackalloc 40
    .seh_endprologue
    call    __main
.L2:
    jmp .L2
    .seh_endproc
    .ident  "GCC: (tdm64-2) 4.8.1"

事实上，为循环生成的程序集对于每个级别的优化都是相同的:

 .L2:
    jmp .L2
    .seh_endproc
    .ident  "GCC: (tdm64-2) 4.8.1"

重要的部分是:

.L2:
    jmp .L2

L2:
    goto L2;

编辑:

有趣的是，即使没有优化，下面的循环都在汇编中产生完全相同的输出(无条件jmp):

while(42) {}

while(1==1) {}

while(2==2) {}

while(4<7) {}

while(3==3 && 4==4) {}

while(8-9 < 0) {}

while(4.3 * 3e4 >= 2 << 6) {}

while(-0.1 + 02) {}

甚至令我惊讶的是:

#include<math.h>

while(sqrt(7)) {}

while(hypot(3,4)) {}

对于用户定义函数，事情变得更加有趣:

int x(void) {
    return 1;
}

while(x()) {}

#include<math.h>

double x(void) {
    return sqrt(7);
}

while(x()) {}

At -O0, these two examples actually call x and perform a comparison for each iteration.

First example (returning 1):

.L4:
    call    x
    testl   %eax, %eax
    jne .L4
    movl    $0, %eax
    addq    $32, %rsp
    popq    %rbp
    ret
    .seh_endproc
    .ident  "GCC: (tdm64-2) 4.8.1"

Second example (returning sqrt(7)):

.L4:
    call    x
    xorpd   %xmm1, %xmm1
    ucomisd %xmm1, %xmm0
    jp  .L4
    xorpd   %xmm1, %xmm1
    ucomisd %xmm1, %xmm0
    jne .L4
    movl    $0, %eax
    addq    $32, %rsp
    popq    %rbp
    ret
    .seh_endproc
    .ident  "GCC: (tdm64-2) 4.8.1"

However, at -O1 and above, they both produce the same assembly as the previous examples (an unconditional jmp back to the preceding label).

TL;DR

Under GCC, the different loops are compiled to identical assembly. The compiler evaluates the constant values and doesn't bother performing any actual comparison.

The moral of the story is:

There exists a layer of translation between C source code and CPU instructions, and this layer has important implications for performance.
Therefore, performance cannot be evaluated by only looking at source code.
The compiler should be smart enough to optimize such trivial cases. Programmers should not waste their time thinking about them in the vast majority of cases.

2014-07-20 08:03:37

当然，我不知道这位经理的真实意图，但我提出了一个完全不同的观点:当一个团队雇佣一个新成员时，了解他如何应对冲突情况是很有用的。

他们让你陷入冲突。如果这是真的，他们很聪明，这个问题也很好。对于某些行业，比如银行业，将你的问题发布到Stack Overflow可能会成为被拒绝的理由。

但我当然不知道，我只是提出一个选择。

2014-07-22 03:17:31

我能想到为什么while(2)会更慢的唯一原因是:

代码将循环优化为 2 . CMP eax 当减法发生时，你实际上是在减法 A. 00000000 - 00000010 CMP eax, 2 而不是 B. 00000000 - 00000001 CMP eax, 1

CMP只设置标志，不设置结果。所以对于最低有效位，我们知道b是否需要借位，而对于a，你必须做两次减法才能得到借位。

2014-07-22 20:46:35

现有的答案显示了由特定编译器为特定目标使用特定选项集生成的代码，并不能完全回答这个问题——除非在特定的上下文中提出了这个问题(例如，“对于带有默认选项的x86_64，使用gcc 4.7.2哪个更快?”)。

就语言定义而言，在抽象机器中while(1)计算整数常数1,while(2)计算整数常数2;在这两种情况下，比较结果是否等于零。语言标准完全没有说明这两种构造的相对性能。

我可以想象，一个极其幼稚的编译器可能会为这两种表单生成不同的机器代码，至少在编译时没有请求优化。

另一方面，C编译器绝对必须在编译时计算一些常量表达式，当它们出现在需要常量表达式的上下文中时。例如，这个:

int n = 4;
switch (n) {
    case 2+2: break;
    case 4:   break;
}

需要诊断;延迟编译器没有将2+2的计算延迟到执行时的选项。由于编译器必须具备在编译时计算常量表达式的能力，因此即使不需要也没有充分的理由不利用这种能力。

C标准(N1570 6.8.5p4)说

一个迭代语句导致一个称为循环体的语句重复执行，直到控制表达式比较为 0.

所以相关的常数表达式是1 == 0和2 == 0，它们的值都是int值0。(这些比较隐含在while循环的语义中;它们并不作为实际的C表达式存在。)

非常简单的编译器可能会为这两种构造生成不同的代码。例如，对于第一个，它可以生成一个无条件的无限循环(将1视为特殊情况)，对于第二个，它可以生成一个显式的运行时比较，等价于2 != 0。但我从来没有遇到过这样的C编译器，我严重怀疑这样的编译器是否存在。

大多数编译器(我倾向于说所有生产质量的编译器)都有请求额外优化的选项。在这种选择下，任何编译器都不太可能为这两种表单生成不同的代码。

如果编译器为这两种构造生成不同的代码，首先检查不同的代码序列是否具有不同的性能。如果有，请尝试使用优化选项(如果可用)再次编译。如果它们仍然不同，则向编译器供应商提交错误报告。这(不一定)是一个不符合C标准的错误，但几乎可以肯定这是一个应该纠正的问题。

底线:while(1)和while(2)几乎肯定具有相同的性能。它们具有完全相同的语义，任何编译器都没有理由不生成相同的代码。

尽管编译器为while(1)生成比while(2)更快的代码是完全合法的，但在同一个程序中，编译器为while(1)生成比另一个while(1)更快的代码也是合法的。

(你问的问题中还暗含着另一个问题:你如何应对坚持提出一个错误的技术观点的面试官?这可能是一个适合Workplace网站的好问题)。

2014-07-21 00:35:06

对这个问题最有可能的解释是，面试官认为处理器会逐一检查数字的每一位，直到它达到一个非零值:

1 = 00000001
2 = 00000010

如果“is 0 ?”算法从数字的右侧开始，并且必须检查每一位直到它达到非零位，那么while(1){}循环每次迭代必须检查的比特数是while(2){}循环的两倍。

这需要一个关于计算机如何工作的非常错误的思维模型，但它确实有自己的内部逻辑。一种检查方法是询问while(-1){}或while(3){}是否同样快，或者while(32){}是否更慢。

2014-07-21 14:41:00