Best way to strip punctuation from a string

It seems like there should be a simpler way than this:

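import string
s = "string. With. Punctuation?"  # Sample string (Python 2 code)
# string.maketrans("", "") is an identity table; the second argument lists characters to delete
out = s.translate(string.maketrans("",""), string.punctuation)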

Is there?

Current answer

I haven't seen this answer yet. Just use a regular expression; it removes every character that is not a word character (\w), a digit (\d), or whitespace (\s):

import re
s = "string. With. Punctuation?"  # Sample string
out = re.sub(r'[^\w\d\s]+', '', s)  # the old ur'' prefix is a SyntaxError in Python 3
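Because `\w` already matches digits, the `\d` inside the character class is redundant. The sketch below (the sample strings and the name `PUNCT_RE` are illustrative, not from the answer) checks that both patterns give the same result and precompiles the shorter one so it can be reused in a loop.

```python
import re

# \w already includes digits, so [^\w\d\s] and [^\w\s] match the same characters
PUNCT_RE = re.compile(r'[^\w\s]+')

samples = ["string. With. Punctuation?", "Version 3.11 (beta)!", "a-b_c d"]
for text in samples:
    long_form = re.sub(r'[^\w\d\s]+', '', text)
    short_form = PUNCT_RE.sub('', text)
    assert long_form == short_form  # redundant \d changes nothing
    print(repr(short_form))
```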

2016-06-18 06:38:57

Other answers

Regular expressions are simple enough, if you know them.

import re
s = "string. With. Punctuation?"
s = re.sub(r'[^\w\s]','',s)
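One caveat: `_` counts as a word character, so `[^\w\s]` keeps underscores even though `string.punctuation` contains `_`. If underscores should be removed as well, one option (a sketch, not part of the original answer; the function name is hypothetical) is to add them as an alternative branch:

```python
import re

def strip_punct_and_underscore(s: str) -> str:
    # [^\w\s] removes punctuation except "_"; the "|_" branch removes underscores too
    return re.sub(r'[^\w\s]|_', '', s)

print(strip_punct_and_underscore("snake_case, please!"))
```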

2013-05-28 18:47:47

string.punctuation is ASCII only! A more correct (but also much slower) approach is to use the unicodedata module:

from unicodedata import category
s = 'String — with -  «punctuation »...'
# Drop every character whose Unicode general category starts with "P" (punctuation)
s = ''.join(ch for ch in s if category(ch)[0] != 'P')
print('stripped', s)
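To see the difference, the sketch below (the sample text is chosen for illustration) runs both the ASCII-only `string.punctuation` filter and the `unicodedata.category` filter over text with French guillemets.

```python
import string
from unicodedata import category

text = '«Bonjour», dit-il.'

# ASCII-only: string.punctuation has no guillemets, so « and » survive
ascii_only = ''.join(ch for ch in text if ch not in string.punctuation)

# Unicode-aware: drop anything whose general category starts with "P"
unicode_aware = ''.join(ch for ch in text if category(ch)[0] != 'P')

print(ascii_only)
print(unicode_aware)
```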

You can also generalize this and strip other kinds of characters:

''.join(ch for ch in s if category(ch)[0] not in 'SP')

This will also strip characters such as ~*+§$, which may or may not count as "punctuation" depending on your point of view.
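Whether a character is removed by 'P' or only by 'SP' depends on its Unicode general category. Note that `*` is already in category Po, so the plain 'P' filter removes it too. The sketch below (the helper name `strip_categories` is hypothetical) prints a few categories and compares the two filters.

```python
from unicodedata import category

def strip_categories(s: str, majors: str = 'P') -> str:
    # Keep a character unless the first letter of its category is in `majors`
    return ''.join(ch for ch in s if category(ch)[0] not in majors)

for ch in '~*+$':
    # Sm = math symbol, Po = other punctuation, Sc = currency symbol
    print(ch, category(ch))

text = 'a+b=c costs $5!'
print(strip_categories(text))        # punctuation only
print(strip_categories(text, 'SP'))  # punctuation and symbols
```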

2011-09-01 09:29:45

myString.translate(None, string.punctuation)  # Python 2 only: None means "no mapping, just delete"
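The two-argument form above works only in Python 2. In Python 3, `str.translate` takes a single table, and the third argument of `str.maketrans` lists characters to delete. A minimal Python 3 equivalent, building the table once so it can be reused (the names `PUNCT_TABLE` and `strip_punctuation` are illustrative):

```python
import string

# Characters in the third argument are mapped to None, i.e. deleted by translate()
PUNCT_TABLE = str.maketrans('', '', string.punctuation)

def strip_punctuation(s: str) -> str:
    return s.translate(PUNCT_TABLE)

print(strip_punctuation("string. With. Punctuation?"))
```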

2010-03-08 15:19:09

>>> import re
>>> s = "string. With. Punctuation?"
>>> s = re.sub(r'[^\w\s]','',s)
>>> re.split(r'\s+', s)
['string', 'With', 'Punctuation']
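If the end goal is a list of words, `re.findall` can do it in one step by matching runs of word characters instead of deleting punctuation first. This is a different technique from the answer above: a hyphenated word such as `dit-il` comes out as two tokens here, whereas deleting punctuation first joins it into one.

```python
import re

s = "string. With. Punctuation?"
# \w+ matches maximal runs of letters, digits and underscores
words = re.findall(r'\w+', s)
print(words)
```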

2016-08-24 05:43:58

This takes Unicode into account. The code has been tested in Python 3.

from unicodedata import category
text = 'hi, how are you?'
text_without_punc = ''.join(ch for ch in text if not category(ch).startswith('P'))
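Because this filter calls `category` on every character of the input, it can be slow on long texts. One option (a sketch, not from the original answers; the names are hypothetical) is to pay a one-time cost to build a `str.translate` table covering every code point in a P category, then reuse it.

```python
import sys
from unicodedata import category

# One-time cost: map every punctuation code point to None (delete)
UNICODE_PUNCT_TABLE = dict.fromkeys(
    cp for cp in range(sys.maxunicode + 1) if category(chr(cp)).startswith('P')
)

def strip_unicode_punctuation(text: str) -> str:
    return text.translate(UNICODE_PUNCT_TABLE)

print(strip_unicode_punctuation('hi, how are you?'))
print(strip_unicode_punctuation('«Bonjour», dit-il.'))
```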

2020-06-04 05:08:05
