了解汇编程序的原因之一是,有时可以使用汇编程序来编写比用高级语言(特别是C语言)编写的代码性能更好的代码。然而,我也听人说过很多次,尽管这并非完全错误,但实际上可以使用汇编程序来生成性能更好的代码的情况极其罕见,并且需要汇编方面的专业知识和经验。

这个问题甚至没有涉及到这样一个事实,即汇编程序指令将是特定于机器的、不可移植的,或者汇编程序的任何其他方面。当然,除了这一点之外,了解汇编还有很多很好的理由,但这是一个需要示例和数据的具体问题,而不是关于汇编程序与高级语言的扩展论述。

谁能提供一些具体的例子,说明使用现代编译器汇编代码比编写良好的C代码更快,并且您能否用分析证据支持这一说法?我相信这些案例确实存在,但我真的很想知道这些案例到底有多深奥,因为这似乎是一个有争议的问题。


当前回答

在运行时创建机器代码怎么样?

我的兄弟曾经(大约在2000年)通过在运行时生成代码实现了一个非常快速的实时光线跟踪器。我不记得细节了,但有一些主模块是通过对象循环的,然后它准备和执行一些特定于每个对象的机器代码。

然而,随着时间的推移,这种方法被新的图形硬件淘汰,变得毫无用处。

今天,我认为大数据(数百万条记录)上的一些操作,如数据透视表、钻孔、实时计算等,都可以用这种方法进行优化。问题是:这样的努力值得吗?

其他回答

这很难具体地回答,因为这个问题非常不具体:到底什么是“现代编译器”?

理论上,几乎任何手动的汇编器优化都可以由编译器来完成——实际上它是否已经完成,不能笼统地说,只能说特定编译器的特定版本。许多可能需要花费大量的精力来确定它们是否可以在特定的上下文中应用而不产生副作用,以至于编译器编写者不会为它们烦恼。

以下是我个人经历中的几个例子:

Access to instructions that are not accessible from C. For instance, many architectures (like x86-64, IA-64, DEC Alpha, and 64-bit MIPS or PowerPC) support a 64 bit by 64 bit multiplication producing a 128 bit result. GCC recently added an extension providing access to such instructions, but before that assembly was required. And access to this instruction can make a huge difference on 64-bit CPUs when implementing something like RSA - sometimes as much as a factor of 4 improvement in performance. Access to CPU-specific flags. The one that has bitten me a lot is the carry flag; when doing a multiple-precision addition, if you don't have access to the CPU carry bit one must instead compare the result to see if it overflowed, which takes 3-5 more instructions per limb; and worse, which are quite serial in terms of data accesses, which kills performance on modern superscalar processors. When processing thousands of such integers in a row, being able to use addc is a huge win (there are superscalar issues with contention on the carry bit as well, but modern CPUs deal pretty well with it). SIMD. Even autovectorizing compilers can only do relatively simple cases, so if you want good SIMD performance it's unfortunately often necessary to write the code directly. Of course you can use intrinsics instead of assembly but once you're at the intrinsics level you're basically writing assembly anyway, just using the compiler as a register allocator and (nominally) instruction scheduler. (I tend to use intrinsics for SIMD simply because the compiler can generate the function prologues and whatnot for me so I can use the same code on Linux, OS X, and Windows without having to deal with ABI issues like function calling conventions, but other than that the SSE intrinsics really aren't very nice - the Altivec ones seem better though I don't have much experience with them). As examples of things a (current day) vectorizing compiler can't figure out, read about bitslicing AES or SIMD error correction - one could imagine a compiler that could analyze algorithms and generate such code, but it feels to me like such a smart compiler is at least 30 years away from existing (at best).

On the other hand, multicore machines and distributed systems have shifted many of the biggest performance wins in the other direction - get an extra 20% speedup writing your inner loops in assembly, or 300% by running them across multiple cores, or 10000% by running them across a cluster of machines. And of course high level optimizations (things like futures, memoization, etc) are often much easier to do in a higher level language like ML or Scala than C or asm, and often can provide a much bigger performance win. So, as always, there are tradeoffs to be made.

我认为汇编程序更快的一般情况是,当一个聪明的汇编程序员看到编译器的输出并说“这是性能的关键路径,我可以写这个更有效”,然后那个人调整汇编程序或从头重写它。

Actually you can build large scale programs in a large model mode segaments may be restricted to 64kb code but you can write many segaments, people give the argument against ASM as it is an old language and we don't need to preserve memory anymore, If that were the case why would we be packing our PC's with memory, the only Flaw I can find with ASM is that it is more or less Processor based so most programs written for the intel architecture Most likely would not run on An AMD Architecture. As for C being faster than ASM there is no language faster than ASM and ASM can do many thing's C and other HLL's can not do at processor level. ASM is a difficult language to learn but once you learn it no HLL can translate it better than you. If you could only see some of the things HLL's Do to you code, and understand what it is doing, you would wonder why More people don't use ASM and why assembers are no longer being updated ( For general public use anyway). So no C is not faster than ASM. Even experiences C++ programmers still use and write code Chunks in ASM added to there C++ code for speed. Other Languages Also that some people think are obsolete or possibly no good is a myth at times for instance Photoshop is written in Pascal/ASM 1st release of souce has been submitted to the technical history museum, and paintshop pro is written still written in Python,TCL and ASM ... a common denominator of these to "Fast and Great image processors is ASM, although photoshop may have Upgraded to delphi now it is still pascal. and any speed problems are comming from pascal but this is because we like the way programs look and not what they do now days. I would like to make a Photoshop Clone in pure ASM which I have been working on and its comming along rather well. not code,interpret,arange,rewwrite,etc.... Just code and go process complete.

在处理器速度以MHz为单位,屏幕尺寸低于100万像素的时代,一个众所周知的更快显示的技巧是展开循环:为屏幕的每个扫描行写操作。它避免了维护循环索引的开销!再加上检测屏幕刷新,它非常有效。 这是C编译器不会做的事情……(虽然通常可以在速度优化和规模优化之间进行选择,但我认为前者使用了一些类似的技巧。)

我知道有些人喜欢用汇编语言编写Windows应用程序。他们声称他们更快(很难证明)和更小(确实如此!)。 显然,虽然这样做很有趣,但可能会浪费时间(当然,学习目的除外!),特别是对于GUI操作…… 现在,也许某些操作(比如在文件中搜索字符串)可以通过精心编写的汇编代码进行优化。