How do I check if a string is unicode or ascii?

What do I have to do in Python to figure out which encoding a string has?

Current answer

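Unicode is not an encoding. To quote Kumar McMillan:

If ASCII, UTF-8, and other byte strings are "text" ... then Unicode is "text-ness"; it is the abstract form of text

Have a read of McMillan's "Unicode In Python, Completely Demystified" talk from PyCon 2008. It explains things a lot better than most of the related answers on Stack Overflow.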

2012-05-21 14:12:19

Other answers

How to tell if an object is a unicode string or a byte string

You can use type or isinstance.

In Python 2:

>>> type(u'abc')  # Python 2 unicode string literal
<type 'unicode'>
>>> type('abc')   # Python 2 byte string literal
<type 'str'>

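In Python 2, str is just a sequence of bytes. Python doesn't know what its encoding is. The unicode type is the safer way to store text. If you want to understand this more, I recommend http://farmdev.com/talks/unicode/.

In Python 3: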

>>> type('abc')   # Python 3 unicode string literal
<class 'str'>
>>> type(b'abc')  # Python 3 byte string literal
<class 'bytes'>

In Python 3, str is like Python 2's unicode, and is used to store text. What was called str in Python 2 is called bytes in Python 3.
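The examples above use type; the isinstance variant is usually preferable because it also accepts subclasses. A minimal Python 3 sketch follows (the function name describe is only illustrative, not from the original answer):

```python
def describe(obj):
    """Return a short label for the Python 3 type of obj."""
    if isinstance(obj, str):
        return "text (str)"
    if isinstance(obj, (bytes, bytearray)):
        return "bytes"
    return "not a string"


print(describe("abc"))    # text (str)
print(describe(b"abc"))   # bytes
print(describe(42))       # not a string
```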

How to tell if a byte string is valid utf-8 or ascii

You can call decode. If it raises a UnicodeDecodeError exception, the bytes are not valid in that encoding.

>>> u_umlaut = b'\xc3\x9c'   # UTF-8 representation of the letter 'Ü'
>>> u_umlaut.decode('utf-8')
u'\xdc'
>>> u_umlaut.decode('ascii')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)
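The same check can be wrapped in a small helper. This is a sketch for Python 3, where the successful decode above would display as 'Ü' rather than u'\xdc'; the name is_valid_encoding is an assumption made for illustration:

```python
def is_valid_encoding(data, encoding):
    """Return True if the bytes in data decode cleanly with encoding."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


u_umlaut = b"\xc3\x9c"  # UTF-8 bytes for the letter 'Ü'
print(is_valid_encoding(u_umlaut, "utf-8"))  # True
print(is_valid_encoding(u_umlaut, "ascii"))  # False
print(is_valid_encoding(b"abc", "ascii"))    # True
```

Note that a clean decode only shows the bytes are valid in that encoding, not that it is the encoding the author intended. For example, latin-1 accepts every possible byte sequence.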

2011-02-13 22:33:39

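You can use the Universal Encoding Detector, but be aware that it will only give you its best guess, not the actual encoding, because it is impossible to know the encoding of a string such as "abc". You will need to get the encoding information elsewhere; for example, the HTTP protocol uses the Content-Type header for that.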
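To make both points concrete, here is a rough Python 3 sketch using only the standard library. It first shows that the bytes b"abc" decode identically under several encodings, then reads the declared charset from a Content-Type header value. The helper name charset_from_content_type and the sample header values are assumptions for illustration:

```python
from email.message import Message

# The same bytes decode to the same text under several encodings,
# so the bytes alone cannot reveal which encoding was intended.
data = b"abc"
decoded = {enc: data.decode(enc) for enc in ("ascii", "utf-8", "latin-1")}
print(decoded)


def charset_from_content_type(header_value):
    """Extract the charset parameter from a Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = header_value
    # Returns the charset in lower case, or None if none is declared.
    return msg.get_content_charset()


print(charset_from_content_type("text/html; charset=utf-8"))
print(charset_from_content_type("text/plain"))
```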

2011-02-13 22:34:55

Use:

import six
if isinstance(obj, six.text_type):
    print("obj is a text (unicode) string")

Inside the six library, the relevant types are defined as:

if PY3:
    string_types = str,
    text_type = str
else:
    string_types = basestring,
    text_type = unicode

2016-08-08 08:50:49

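In Python 3, all strings are sequences of Unicode characters. There is a bytes type that holds raw bytes.

In Python 2, a string may be of type str or of type unicode. You can tell which using code like this: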

def whatisthis(s):
    if isinstance(s, str):
        print "ordinary string"
    elif isinstance(s, unicode):
        print "unicode string"
    else:
        print "not a string"

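This does not distinguish "Unicode or ASCII"; it only distinguishes Python types. A Unicode string may consist purely of characters in the ASCII range, and a byte string may contain ASCII, encoded Unicode, or even non-textual data.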
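If the actual question is whether the content stays within the ASCII range, that is a separate check from the type check. A possible Python 3 sketch (the helper name is_ascii_only is hypothetical):

```python
def is_ascii_only(value):
    """Return True if a str or bytes value contains only ASCII (0-127)."""
    if isinstance(value, str):
        try:
            value.encode("ascii")
        except UnicodeEncodeError:
            return False
        return True
    if isinstance(value, (bytes, bytearray)):
        # Iterating over bytes in Python 3 yields integers.
        return all(byte < 128 for byte in value)
    raise TypeError("expected str or bytes")


print(is_ascii_only("abc"))           # True
print(is_ascii_only("\u00dc"))        # False
print(is_ascii_only(b"\xc3\x9c"))     # False
```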

2011-02-13 22:40:50

For py2/py3 compatibility, simply use:

import six
if isinstance(obj, six.text_type):
    pass  # obj is a text (unicode) string

2018-05-28 11:56:41
