似乎应该有一种比以下更简单的方法:
import string
s = "string. With. Punctuation?" # Sample string
out = s.translate(string.maketrans("",""), string.punctuation)
有?
似乎应该有一种比以下更简单的方法:
import string
s = "string. With. Punctuation?" # Sample string
out = s.translate(string.maketrans("",""), string.punctuation)
有?
当前回答
字符串标点符号仅为ASCII!一种更正确(但也慢得多)的方法是使用unicodedata模块:
# -*- coding: utf-8 -*-
from unicodedata import category
s = u'String — with - «punctation »...'
s = ''.join(ch for ch in s if category(ch)[0] != 'P')
print 'stripped', s
您还可以概括和剥离其他类型的字符:
''.join(ch for ch in s if category(ch)[0] not in 'SP')
它还将删除~*+§$等字符,这些字符可能是“标点符号”,也可能不是“标点符号。
其他回答
我在寻找一个非常简单的解决方案。这是我得到的:
import re
s = "string. With. Punctuation?"
s = re.sub(r'[\W\s]', ' ', s)
print(s)
'string With Punctuation '
使用Python从文本文件中删除停止词
print('====THIS IS HOW TO REMOVE STOP WORS====')
with open('one.txt','r')as myFile:
str1=myFile.read()
stop_words ="not", "is", "it", "By","between","This","By","A","when","And","up","Then","was","by","It","If","can","an","he","This","or","And","a","i","it","am","at","on","in","of","to","is","so","too","my","the","and","but","are","very","here","even","from","them","then","than","this","that","though","be","But","these"
myList=[]
myList.extend(str1.split(" "))
for i in myList:
if i not in stop_words:
print ("____________")
print(i,end='\n')
正则表达式很简单,如果你知道的话。
import re
s = "string. With. Punctuation?"
s = re.sub(r'[^\w\s]','',s)
您也可以这样做:
import string
' '.join(word.strip(string.punctuation) for word in 'text'.split())
>>> s = "string. With. Punctuation?"
>>> s = re.sub(r'[^\w\s]','',s)
>>> re.split(r'\s*', s)
['string', 'With', 'Punctuation']