When is assembly faster than C?

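One of the stated reasons for knowing assembler is that, on occasion, it can be employed to write code that performs better than the same code written in a higher-level language, C in particular. However, I've also heard it said many times that although that's not entirely false, the cases where assembler can actually be used to generate more performant code are extremely rare and require expert knowledge of and experience with assembly.

This question doesn't even get into the fact that assembler instructions are machine-specific and non-portable, or any of the other aspects of assembler. There are plenty of good reasons for knowing assembly besides this one, of course, but this is meant to be a specific question soliciting examples and data, not an extended discourse on assembler versus higher-level languages.

Can anyone provide some specific examples of cases where assembly is faster than well-written C code using a modern compiler, and can you support that claim with profiling evidence? I am fairly confident these cases exist, but I really want to know exactly how esoteric they are, since it seems to be a point of some contention.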

Current answer

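Although C is "close" to the low-level manipulation of 8-bit, 16-bit, 32-bit, and 64-bit data, there are a few mathematical operations not supported by C that can often be performed elegantly in certain assembly instruction sets: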

1. Fixed-point multiplication: The product of two 16-bit numbers is a 32-bit number. But the rules in C say that the product of two 16-bit numbers is a 16-bit number, and the product of two 32-bit numbers is a 32-bit number -- the bottom half in both cases. If you want the top half of a 16x16 multiply or a 32x32 multiply, you have to play games with the compiler. The general method is to cast to a larger-than-necessary bit width, multiply, shift down, and cast back:

       int16_t x, y;   // int16_t is a typedef for "short"
       // set x and y to something
       int16_t prod = (int16_t)(((int32_t)x*y)>>16);

   In this case the compiler may be smart enough to know that you're really just trying to get the top half of a 16x16 multiply and do the right thing with the machine's native 16x16 multiply. Or it may be stupid and require a library call to do the 32x32 multiply, which is way overkill because you only need 16 bits of the product -- but the C standard doesn't give you any way to express yourself.

2. Certain bitshifting operations (rotation/carries):

       // 256-bit array shifted right in its entirety:
       uint8_t x[32];
       for (int i = 32; --i > 0; )
       {
          x[i] = (x[i] >> 1) | (x[i-1] << 7);
       }
       x[0] >>= 1;

   This is not too inelegant in C, but again, unless the compiler is smart enough to realize what you are doing, it's going to do a lot of "unnecessary" work. Many assembly instruction sets allow you to rotate or shift left/right with the result in the carry register, so you could accomplish the above in 34 instructions: load a pointer to the beginning of the array, clear the carry, and perform 32 8-bit right-shifts, using auto-increment on the pointer.

   For another example, there are linear feedback shift registers (LFSR) that are elegantly performed in assembly: take a chunk of N bits (8, 16, 32, 64, 128, etc.), shift the whole thing right by 1 (see the algorithm above), then if the resulting carry is 1, XOR in a bit pattern that represents the polynomial.
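For readers who want to see the LFSR step spelled out, here is a minimal C sketch of it for a single 16-bit register. The tap mask 0xB400 and the names lfsr16_step and lfsr16_period are assumptions chosen for illustration, not something from the answer; the mask corresponds to a commonly cited maximal-length 16-bit polynomial. Note how C has to recover the "carry" by testing the low bit before the shift, which is exactly the bookkeeping an assembly version gets for free.

```c
#include <stdint.h>

/* Hypothetical tap mask: 0xB400 encodes x^16 + x^14 + x^13 + x^11 + 1,
   a commonly cited maximal-length polynomial for a 16-bit register. */
#define LFSR16_TAPS 0xB400u

/* Advance a 16-bit Galois LFSR by one step. */
static uint16_t lfsr16_step(uint16_t state)
{
    uint16_t carry = state & 1u;  /* the bit assembly would leave in the carry flag */
    state >>= 1;                  /* shift the whole register right by 1 */
    if (carry)
        state ^= LFSR16_TAPS;     /* XOR in the polynomial when the carry is 1 */
    return state;
}

/* Count the steps until the register returns to its starting value. */
static uint32_t lfsr16_period(uint16_t start)
{
    uint16_t s = start;
    uint32_t n = 0;
    do {
        s = lfsr16_step(s);
        n++;
    } while (s != start);
    return n;
}
```

Whether a given compiler turns the test-shift-XOR sequence into a shift followed by a branch on carry is something to check in the generated assembly rather than assume.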

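Having said that, I wouldn't resort to these techniques unless I had serious performance constraints. As others have said, assembly is much harder to document/debug/test/maintain than C code: the performance gain comes with some serious costs.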

Edit: 3. Overflow detection is possible in assembly (you can't really do it in C), which makes some algorithms much easier.
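To illustrate the overflow point: standard C gives no access to the processor's overflow flag, and signed overflow is undefined behavior, so the check has to be written by hand before the addition takes place. The sketch below shows one common way to do that for 32-bit signed addition; the function name checked_add_i32 is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true and stores a + b in *out if the sum fits in int32_t.
   Returns false and leaves *out untouched if the sum would overflow. */
static bool checked_add_i32(int32_t a, int32_t b, int32_t *out)
{
    /* The tests run before the addition, because the overflowing
       addition itself would be undefined behavior in C. */
    if (b > 0 && a > INT32_MAX - b)
        return false;
    if (b < 0 && a < INT32_MIN - b)
        return false;
    *out = a + b;
    return true;
}
```

In assembly the same check is typically an add followed by a branch on the overflow flag, which is the convenience the answer is pointing at.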

2009-02-23 14:34:56

Other answers

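This is very hard to answer specifically, because the question is very unspecific: what exactly is a "modern compiler"?

In theory, practically any manual assembler optimization could be done by a compiler as well. Whether it actually is done cannot be said in general, only about a specific version of a specific compiler. Many optimizations probably require so much effort to determine whether they can be applied in a particular context without side effects that compiler writers don't bother with them.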

2009-02-23 13:17:02

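Longpoke, there is just one limitation: time. When you don't have the resources to optimize every single change to the code and spend your time allocating registers, optimizing a few spills away and whatnot, the compiler will win every single time. You make your modification to the code, recompile, and measure. Repeat if necessary.

Also, you can do a lot on the high-level side. And inspecting the resulting assembly may give the impression that the code is crap, while in practice it runs faster than what you think would be quicker. Example: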

    int y = data[i];
    // do some stuff here..
    call_function(y, ...);

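The compiler will read the data, push it to the stack (spill), and later read it back from the stack and pass it as an argument. Sounds terrible? It might actually be very effective latency compensation and result in a faster runtime.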

    // optimized version
    call_function(data[i], ...); // not so optimized after all..

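The idea behind the optimized version was that we reduce register pressure and avoid spilling. But in truth, the "crappy" version was faster!

Looking at the assembly code, counting instructions, and concluding that more instructions means slower would be a misjudgment.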

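The thing to pay attention to here is that many assembly experts think they know a lot but know very little. The rules also change from one architecture to the next. There is no silver-bullet x86 code, for example, that is always the fastest. These days it is better to go by rules of thumb: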

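- memory is slow
- cache is fast
- try to make better use of the cache
- how often are you going to miss? do you have a latency compensation strategy?
- you can execute 10-100 ALU/FPU/SSE instructions for one single cache miss
- application architecture is important..
- .. but it doesn't help when the problem isn't in the architecture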
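As a small illustration of the cache rules of thumb above, the sketch below sums the same matrix in two traversal orders. Both functions compute the same total; the row-major version touches adjacent addresses on consecutive iterations, while the column-major version jumps by a whole row each time and so tends to miss the cache more often on large matrices. The function names and the flat row-major layout are assumptions for the example, and the actual speed difference depends on the matrix size and the machine, so it has to be measured.

```c
#include <stddef.h>

/* m is a rows x cols matrix stored row by row in one flat array. */

/* Cache-friendly order: the inner loop walks consecutive addresses. */
static long sum_row_major(const int *m, size_t rows, size_t cols)
{
    long total = 0;
    for (size_t r = 0; r < rows; r++)
        for (size_t c = 0; c < cols; c++)
            total += m[r * cols + c];
    return total;
}

/* Cache-unfriendly order: the inner loop strides by cols elements. */
static long sum_col_major(const int *m, size_t rows, size_t cols)
{
    long total = 0;
    for (size_t c = 0; c < cols; c++)
        for (size_t r = 0; r < rows; r++)
            total += m[r * cols + c];
    return total;
}
```

The instruction counts of the two loops are nearly identical, which is one more reason why counting instructions says little about runtime.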

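Also, trusting the compiler too much to magically transform poorly thought-out C/C++ code into "theoretically optimal" code is wishful thinking. You have to know the compiler and toolchain you use if you care about "performance" at this low level.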

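C/C++ compilers are generally not very good at reordering sub-expressions because, for starters, functions have side effects. Functional languages don't suffer from this caveat, but they don't fit the current ecosystem that well. There are compiler options that relax the precision rules, which allows the order of operations to be changed by the compiler/linker/code generator.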
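One reason the compiler needs permission before reordering floating-point operations is that floating-point addition is not associative. The sketch below, with hypothetical function names, adds the same three doubles in two groupings and can produce different results. Options such as GCC's and Clang's -ffast-math tell the compiler it may treat the two groupings as interchangeable.

```c
/* (a + b) + c, with the intermediate forced through a double in memory. */
static double sum_left(double a, double b, double c)
{
    volatile double t = a + b;  /* volatile keeps the grouping as written */
    return t + c;
}

/* a + (b + c), same operands, different grouping. */
static double sum_right(double a, double b, double c)
{
    volatile double t = b + c;
    return a + t;
}
```

With a = 1e20, b = -1e20 and c = 1.0, the left grouping cancels the large terms first and keeps the 1.0, while the right grouping loses the 1.0 when it is added to -1e20, so the two calls return 1.0 and 0.0 respectively.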

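This topic is a bit of a dead end; for most people it isn't relevant, and the rest already know what they are doing anyway.

It all boils down to this: "understand what you are doing", which is a bit different from knowing what you are doing.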

2010-09-17 13:12:59

Actually, you can build large-scale programs in large model mode. Segments may be restricted to 64 KB of code, but you can write many segments. People argue against ASM that it is an old language and that we don't need to conserve memory anymore. If that were the case, why would we be packing our PCs with memory? The only flaw I can find with ASM is that it is more or less processor-specific, so most programs written for the Intel architecture most likely would not run on an AMD architecture.

As for C being faster than ASM: there is no language faster than ASM, and ASM can do many things at the processor level that C and other HLLs cannot. ASM is a difficult language to learn, but once you learn it, no HLL can translate it better than you. If you could see some of the things HLLs do to your code, and understand what they are doing, you would wonder why more people don't use ASM and why assemblers are no longer being updated (for general public use, anyway). So no, C is not faster than ASM. Even experienced C++ programmers still write chunks of ASM and add them to their C++ code for speed.

The idea that certain other languages are obsolete or no good is also a myth at times. For instance, Photoshop was written in Pascal/ASM (the source of the first release has been submitted to the technical history museum), and Paint Shop Pro is still written in Python, TCL and ASM. The common denominator of these fast, great image processors is ASM. Photoshop may have moved to Delphi by now, but that is still Pascal, and any speed problems come from Pascal; that is because nowadays we care about the way programs look and not what they do. I would like to make a Photoshop clone in pure ASM, which I have been working on, and it is coming along rather well. No code, interpret, arrange, rewrite, etc. Just code and go, process complete.

2014-08-19 15:09:15

In my job, there are three reasons for me to know and use assembly. In order of importance:

1. Debugging - I often get library code that has bugs or incomplete documentation. I figure out what it's doing by stepping in at the assembly level. I have to do this about once a week. I also use it as a tool to debug problems in which my eyes don't spot the idiomatic error in C/C++/C#. Looking at the assembly gets past that.

2. Optimizing - the compiler does fairly well at optimizing, but I play in a different ballpark than most. I write image processing code that usually starts with code that looks like this:

       for (int y=0; y < imageHeight; y++) {
           for (int x=0; x < imageWidth; x++) {
               // do something
           }
       }

   The "do something" part typically happens on the order of several million times (i.e., between 3 and 30 million). By scraping cycles in that "do something" phase, the performance gains are hugely magnified. I don't usually start there - I usually start by writing the code to work first, then do my best to refactor the C to be naturally better (better algorithm, less load in the loop, etc.). I usually need to read assembly to see what's going on and rarely need to write it. I do this maybe every two or three months.

3. Doing something the language won't let me. This includes getting the processor architecture and specific processor features, accessing flags in the CPU (man, I really wish C gave you access to the carry flag), etc. I do this maybe once every year or two.
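To show why access to the carry flag would be welcome, here is a minimal sketch of a 128-bit addition built from two 64-bit halves in plain C. The u128 struct and the u128_add function are hypothetical names for this example. Because C cannot read the carry flag, the carry out of the low half is reconstructed with a comparison; instruction sets with an add-with-carry instruction can do the same job with one add and one add-with-carry.

```c
#include <stdint.h>

/* Hypothetical 128-bit unsigned value split into two 64-bit halves. */
typedef struct {
    uint64_t lo;
    uint64_t hi;
} u128;

/* Add two 128-bit values, wrapping around on overflow of the high half. */
static u128 u128_add(u128 a, u128 b)
{
    u128 r;
    r.lo = a.lo + b.lo;               /* unsigned addition wraps, no UB */
    uint64_t carry = (r.lo < a.lo);   /* wrapped result means a carry out */
    r.hi = a.hi + b.hi + carry;
    return r;
}
```

A good compiler may recognize the comparison idiom and emit an add-with-carry anyway, but that is something to confirm by reading the generated assembly.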

2009-02-23 16:22:00

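Short answer? Sometimes.

Technically, every abstraction has a cost, and a programming language is an abstraction of how the CPU works. C, however, is very close. Years ago I remember laughing out loud when I logged onto my UNIX account and got the following fortune message (back when such things were popular):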

The C Programming Language -- A language which combines the flexibility of assembly language with the power of assembly language.

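It's funny because it's true: C is like portable assembly language.

It's worth noting that assembly language just runs however you write it. There is, however, a compiler between C and the assembly language it generates, and that is extremely important, because how fast your C code is has an awful lot to do with how good your compiler is.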

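When gcc came on the scene, one of the things that made it so popular was that it was often so much better than the C compilers that shipped with many commercial UNIX flavours. Not only was it ANSI C (none of this K&R C rubbish), it was also more robust and typically produced better (faster) code. Not always, but often.

I tell you all this because there is no blanket rule about the speed of C versus assembler, since there is no objective standard for C.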

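Likewise, assembler varies a lot depending on what processor you're running, your system spec, what instruction set you're using, and so on. Historically there have been two CPU architecture families: CISC and RISC. The biggest player in CISC was, and still is, the Intel x86 architecture (and instruction set). RISC dominated the UNIX world (MIPS6000, Alpha, Sparc and so on). CISC won the battle for hearts and minds.

Anyway, the popular wisdom when I was a younger developer was that hand-written x86 could often be much faster than C because of the way the architecture worked: it had a complexity that benefited from a human doing it. RISC, on the other hand, seemed designed for compilers, so no one (that I knew) wrote, say, Sparc assembler. I'm sure such people existed, but no doubt they have both gone insane and been institutionalized by now.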

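Instruction sets are an important point even within the same family of processors. Certain Intel processors have extensions like SSE through SSE4. AMD had their own SIMD instructions. The benefit of a programming language like C was that someone could write their library so that it was optimized for whichever processor you were running on. That was hard work in assembler.

There are still optimizations you can make in assembler that no compiler could make, and a well-written assembler algorithm will be as fast as or faster than its C equivalent. The bigger question is: is it worth it?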

Ultimately though assembler was a product of its time and was more popular at a time when CPU cycles were expensive. Nowadays a CPU that costs $5-10 to manufacture (Intel Atom) can do pretty much anything anyone could want. The only real reason to write assembler these days is for low level things like some parts of an operating system (even so the vast majority of the Linux kernel is written in C), device drivers, possibly embedded devices (although C tends to dominate there too) and so on. Or just for kicks (which is somewhat masochistic).

2009-02-23 13:35:59
