What is the difference between UTF-8 and UTF-8 with BOM?

How do UTF-8 and UTF-8 with BOM differ? Which is better?

Current answer

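This question already has countless answers, and many of them are quite good, but I wanted to try to clarify when a BOM should and should not be used.

As mentioned elsewhere, any use of the UTF BOM (byte order mark) to determine whether a string is UTF-8 is educated guesswork. If proper metadata is available (such as charset="utf-8"), then you already know what you are supposed to be using. Otherwise you need to test and make some assumptions. This involves checking whether the file the string comes from begins with the hexadecimal bytes EF BB BF.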

If the byte sequence of the UTF-8 BOM is found, the probability is high enough to assume the data is UTF-8, and you can go from there. When forced to make this guess, however, additional error checking while reading is still a good idea in case something comes up garbled. You should only assume that input starting with a BOM is not UTF-8 (i.e. latin-1 or ANSI) if, based on its source, it definitely shouldn't be UTF-8. If there is no BOM, you can simply determine whether the data is supposed to be UTF-8 by validating it against the encoding.
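The guessing procedure described above can be sketched in Python roughly as follows. This is only an illustration: the function name `sniff_utf8` and its three return labels are made up for this example, and real input may call for more careful handling.

```python
import codecs


def sniff_utf8(data: bytes) -> str:
    """Classify raw bytes as 'utf-8-bom', 'utf-8', or 'unknown'.

    Hypothetical helper: the name and the labels are illustrative only.
    """
    bom = codecs.BOM_UTF8  # the three bytes EF BB BF
    if data.startswith(bom):
        # A BOM makes UTF-8 very likely, but still validate the rest
        # in case something comes up garbled.
        try:
            data[len(bom):].decode("utf-8")
        except UnicodeDecodeError:
            return "unknown"
        return "utf-8-bom"
    # No BOM: simply validate the bytes against the encoding.
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "unknown"
    return "utf-8"
```

Anything reported as "unknown" would then be handled by whatever legacy-encoding fallback suits the source of the data.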

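Why is a BOM not recommended?

Software that is not Unicode-aware, or that is poorly compliant, may assume the text is latin-1 or ANSI and will not strip the BOM from the string, which can obviously cause problems. The BOM is also not really needed: just check whether the contents are valid for an encoding, and always use UTF-8 as the fallback when no compliant encoding can be found.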
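As a small illustration of the stripping problem, Python's plain "utf-8" codec keeps the BOM as a leading U+FEFF character, while the "utf-8-sig" codec removes it when present and otherwise behaves like "utf-8". The variable names are only for this example.

```python
with_bom = b"\xef\xbb\xbfabc"
without_bom = b"abc"

# Plain UTF-8 decoding does not strip the BOM; it survives as U+FEFF.
naive = with_bom.decode("utf-8")

# "utf-8-sig" strips a leading BOM if there is one...
tolerant = with_bom.decode("utf-8-sig")

# ...and decodes BOM-less input unchanged.
also_fine = without_bom.decode("utf-8-sig")
```

A reader that has to accept both variants can therefore decode with "utf-8-sig" unconditionally.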

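When should you encode with a BOM?

If you cannot record the metadata in any other way (through a charset tag or file system metadata), and the programs being used expect BOMs, you should encode with a BOM. This is especially true on Windows, where anything without a BOM is generally assumed to use a legacy code page. The BOM tells programs like Office that, yes, the text in this file is Unicode, and here is the encoding used.

When it comes down to it, the only files I ever really have problems with are CSV files. Depending on the program, a CSV file either must or must not have a BOM. For example, if you are using Excel 2007+ on Windows, the file must be encoded with a BOM if you want to open it smoothly without resorting to importing the data.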
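For the CSV case, one way to produce a BOM-prefixed file from Python is the "utf-8-sig" codec. The sketch below builds the bytes in memory; the helper name `csv_bytes_for_excel` and the sample rows are invented for illustration, and whether a given Excel version needs the BOM is the answer's claim, not something this code checks.

```python
import csv
import io


def csv_bytes_for_excel(rows) -> bytes:
    """Serialize rows as CSV and encode as UTF-8 with a leading BOM.

    Hypothetical helper for illustration only.
    """
    buf = io.StringIO(newline="")  # let the csv module control line endings
    csv.writer(buf).writerows(rows)
    # "utf-8-sig" prepends EF BB BF when encoding.
    return buf.getvalue().encode("utf-8-sig")


data = csv_bytes_for_excel([["name", "city"], ["Zoë", "Köln"]])
```

When writing straight to disk, passing encoding="utf-8-sig" and newline="" to open() has the same effect. For a consumer that must not see a BOM, use encoding="utf-8" instead.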

2016-01-25 16:03:13

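Other answers

The BOM tends to boom (no pun intended) somewhere, someplace. When it booms (for example, when it is not recognized by browsers, editors, etc.), it shows up as the weird characters ï»¿ at the start of the document (for example, an HTML file, a JSON response, RSS, etc.) and causes the kind of embarrassment seen in the recent encoding issue during Obama's talk on Twitter.

It is very annoying when it shows up in places that are hard to debug or when testing is neglected. So it is best to avoid it unless you must use it.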
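The stray characters can be reproduced by decoding the UTF-8 BOM bytes with a single-byte legacy code page. The snippet below assumes Windows-1252 as that code page. latin-1 gives the same three characters.

```python
import codecs

# EF BB BF read as Windows-1252 yields one character per byte.
bom_as_cp1252 = codecs.BOM_UTF8.decode("cp1252")

# A BOM-prefixed HTML document misread by a consumer that assumes cp1252.
page = "<!DOCTYPE html>".encode("utf-8-sig").decode("cp1252")

print(repr(bom_as_cp1252))
print(page)
```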

2011-07-11 07:56:16

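From http://en.wikipedia.org/wiki/Byte-order_mark:

The byte order mark (BOM) is a Unicode character used to signal the endianness (byte order) of a text file or stream. Its code point is U+FEFF. BOM use is optional, and, if used, it should appear at the start of the text stream. Beyond its specific use as a byte-order indicator, the BOM character may also indicate which of the several Unicode representations the text is encoded in.

Always using a BOM in your file will ensure that it always opens correctly in an editor that supports UTF-8 and BOM.

My real problem with the absence of a BOM is the following. Suppose we have a file that contains: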
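To make the "which representation" point concrete, this short sketch encodes U+FEFF in several Unicode encoding forms using Python's standard codecs. The dictionary name `forms` is just for this example.

```python
bom = "\ufeff"  # the BOM code point, U+FEFF

# Encode the same character in different Unicode encoding forms.
forms = {
    name: bom.encode(name)
    for name in ("utf-8", "utf-16-le", "utf-16-be", "utf-32-le", "utf-32-be")
}

for name, raw in forms.items():
    print(f"{name:10} {raw.hex(' ')}")
```

Each form starts with a different byte pattern, which is why the leading bytes can double as an encoding signature. In UTF-8 there is only one possible byte order, so there the mark acts purely as a signature.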

abc

Without a BOM, this opens as ANSI in most editors. So another user of the file opens it and appends some native characters, for example:

abg-αβγ

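Oops... Now the file is still in ANSI and, guess what, "αβγ" does not occupy 6 bytes but 3. This is not UTF-8, and it causes other problems later in the development chain.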
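The byte counts can be checked directly. The sketch below assumes that "ANSI" on the second user's machine means the Greek code page Windows-1253 (Python codec "cp1253"). The answer does not say which code page was involved, and the helper `is_valid_utf8` is made up for this example.

```python
line = "abg-αβγ"

as_utf8 = line.encode("utf-8")    # each Greek letter takes 2 bytes
as_ansi = line.encode("cp1253")   # each Greek letter takes 1 byte


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


greek_utf8_len = len("αβγ".encode("utf-8"))
greek_ansi_len = len("αβγ".encode("cp1253"))
```

Under that assumption the ANSI-saved bytes fail UTF-8 validation, which is the downstream breakage the answer describes.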

2010-02-08 18:31:00

The Unicode Byte Order Mark (BOM) FAQ provides a concise answer:

Q: How I should deal with BOMs?

A: Here are some guidelines to follow:

1. A particular protocol (e.g. Microsoft conventions for .txt files) may require use of the BOM on certain Unicode data streams, such as files. When you need to conform to such a protocol, use a BOM.
2. Some protocols allow optional BOMs in the case of untagged text. In those cases,
   - Where a text data stream is known to be plain text, but of unknown encoding, BOM can be used as a signature. If there is no BOM, the encoding could be anything.
   - Where a text data stream is known to be plain Unicode text (but not which endian), then BOM can be used as a signature. If there is no BOM, the text should be interpreted as big-endian.
3. Some byte oriented protocols expect ASCII characters at the beginning of a file. If UTF-8 is used with these protocols, use of the BOM as encoding form signature should be avoided.
4. Where the precise type of the data stream is known (e.g. Unicode big-endian or Unicode little-endian), the BOM should not be used. In particular, whenever a data stream is declared to be UTF-16BE, UTF-16LE, UTF-32BE or UTF-32LE a BOM must not be used.

2018-03-08 13:58:08

One practical difference is that if you write a shell script for Mac OS X and save it as plain UTF-8 (that is, with a BOM), you will get the response:

#!/bin/bash: No such file or directory

in response to the shebang line specifying which shell you wish to use:

#!/bin/bash

If you save it as UTF-8 without a BOM (say, in BBEdit), all will be well.
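The failure comes down to the first bytes of the file. Unix-like systems recognize an interpreter line only when the file begins with the two bytes "#!", and a BOM pushes them to offset 3. A minimal check, with the helper name `has_shebang` invented for this example:

```python
script = "#!/bin/bash\necho hello\n"

plain = script.encode("utf-8")         # no BOM
with_bom = script.encode("utf-8-sig")  # EF BB BF comes first


def has_shebang(data: bytes) -> bool:
    """Return True if the file content starts with the bytes '#!'."""
    return data[:2] == b"#!"


print(has_shebang(plain), has_shebang(with_bom))
```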

2014-01-24 20:38:21

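UTF-8 without BOM has no BOM, which does not make it any better than UTF-8 with BOM, except when the consumer of the file needs to know (or would benefit from knowing) whether the file is UTF-8 encoded.

The BOM is usually useful for determining the endianness of the encoding, which is not required for most use cases.

Also, the BOM can be unnecessary noise or pain for consumers that do not know or care about it, and it can result in user confusion.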

2010-02-08 18:30:19
