Best way to remove punctuation from a string

It seems like there should be a simpler way than this:

# Python 2 only: string.maketrans and the two-argument translate() do not exist in Python 3
import string
s = "string. With. Punctuation?"  # Sample string
out = s.translate(string.maketrans("",""), string.punctuation)

Is there?

Current answer

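For serious natural language processing (NLP), you should let a library such as spaCy handle punctuation through tokenization, and then adjust the result by hand as needed.

For example, how do you want to handle hyphens inside words? Exceptions such as abbreviations? Opening and closing quotation marks? URLs? In NLP it is often useful to split a contraction such as "let's" into "let" and "'s" for further processing.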
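
To make those questions concrete, here is a minimal standard-library sketch. It is not spaCy's tokenizer, only a rough illustration; `TOKEN_RE` and `simple_tokens` are hypothetical names, and the rule "keep apostrophes and hyphens that sit between word characters" is an assumption you would tune for real data.

```python
import re

# Hypothetical token pattern: a run of word characters, optionally joined
# by single internal apostrophes or hyphens (e.g. "let's", "well-known").
TOKEN_RE = re.compile(r"\w+(?:['\u2019-]\w+)*")

def simple_tokens(text):
    """Return word-like tokens, dropping the punctuation around them."""
    return TOKEN_RE.findall(text)

text = "Let's test a well-known case: quotes, 'like this', too."

# Naive stripping glues contractions and hyphenated words together.
naive = re.sub(r"[^\w\s]", "", text).split()

# The token pattern keeps "Let's" and "well-known" intact.
tokens = simple_tokens(text)

print(naive)
print(tokens)
```

A real tokenizer also handles URLs, abbreviations, and splitting contractions, which this sketch deliberately ignores.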

2022-03-31 01:53:41

Other answers

Regular expressions are simple enough, if you know them.

import re
s = "string. With. Punctuation?"
s = re.sub(r'[^\w\s]','',s)
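
One thing to watch with this pattern: `\w` matches the underscore, so `_` survives the substitution. A small sketch of both behaviours follows; the function name `strip_punct_regex` and its `drop_underscore` flag are made up for illustration.

```python
import re

def strip_punct_regex(text, drop_underscore=False):
    """Remove everything that is neither a word character nor whitespace.

    \\w includes the underscore, so it is kept unless drop_underscore is True.
    """
    pattern = r"[^\w\s]|_" if drop_underscore else r"[^\w\s]"
    return re.sub(pattern, "", text)

print(strip_punct_regex("snake_case, ok!"))
print(strip_punct_regex("snake_case, ok!", drop_underscore=True))
```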

2013-05-28 18:47:47

string.punctuation is ASCII only! A more correct (but also much slower) way is to use the unicodedata module:

# -*- coding: utf-8 -*-
from unicodedata import category
s = u'String — with -  «punctuation »...'
# Keep every character whose Unicode category is not one of the P* (punctuation) classes
s = ''.join(ch for ch in s if category(ch)[0] != 'P')
print('stripped', s)

You can also generalize this and strip other kinds of characters:

''.join(ch for ch in s if category(ch)[0] not in 'SP')

This will also remove characters such as ~*+§$, which may or may not count as "punctuation" depending on your point of view.
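
If the per-character generator is too slow for repeated use, one possible workaround is to build a translation table once and reuse it with `str.translate`. This is a sketch for Python 3, not part of the original answer; `PUNCT_TABLE` and `strip_unicode_punct` are hypothetical names, and building the table scans every code point once, so it costs some time up front.

```python
import sys
import unicodedata

# Built once: map every code point in a Unicode punctuation category (P*)
# to None, which tells str.translate to delete it.
PUNCT_TABLE = dict.fromkeys(
    cp for cp in range(sys.maxunicode + 1)
    if unicodedata.category(chr(cp)).startswith("P")
)

def strip_unicode_punct(text):
    """Delete all characters whose Unicode category starts with 'P'."""
    return text.translate(PUNCT_TABLE)

# Em dash, guillemets and an exclamation mark are all removed.
print(strip_unicode_punct("a\u2014b\u00abc\u00bb!"))
```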

2011-09-01 09:29:45

>>> import re
>>> s = "string. With. Punctuation?"
>>> s = re.sub(r'[^\w\s]','',s)
>>> re.split(r'\s+', s)
['string', 'With', 'Punctuation']

2016-08-24 05:43:58

Here is a one-liner for Python 3.5:

import string
"l*ots! o(f. p@u)n[c}t]u[a'ti\"on#$^?/".translate(str.maketrans({a:None for a in string.punctuation}))

2016-03-21 02:46:47
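
For comparison, Python 3's `str.maketrans` also accepts three arguments, where the third is a string of characters to delete. That gives a close equivalent of the Python 2 snippet in the question. A short sketch, with `PUNCT_DELETE` and `strip_ascii_punct` as illustrative names:

```python
import string

# Third argument to str.maketrans lists the characters to delete.
PUNCT_DELETE = str.maketrans("", "", string.punctuation)

def strip_ascii_punct(text):
    """Remove ASCII punctuation (string.punctuation) from text."""
    return text.translate(PUNCT_DELETE)

result = strip_ascii_punct("l*ots! o(f. p@u)n[c}t]u[a'ti\"on#$^?/")
print(result)
```

Like the dict-based one-liner, this only covers the ASCII characters in `string.punctuation`.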

Here is a function I wrote. It is not very efficient, but it is simple, and you can add or remove whatever punctuation you like:

def stripPunc(wordList):
    """Strips punctuation from a list of words"""
    puncList = [".",";",":","!","?","/","\\",",","#","@","$","&",")","(","\""]
    for punc in puncList:
        # Rebuild the list once per punctuation mark
        wordList = [word.replace(punc,'') for word in wordList]
    return wordList

2015-09-22 14:30:47
