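I would like to use the .replace function to replace multiple strings.
I currently have
string.replace("condition1", "")
but would like to have something like
string.replace("condition1", "").replace("condition2", "text")
although that does not feel like good syntax.
What is the proper way to do this? Something like how, in grep/regex, you can use \1 and \2 to refer to fields captured by the search pattern in the replacement.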
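Current answer
I needed a solution where the strings to be replaced can be regular expressions, for example to help normalize a long text by replacing runs of whitespace characters with a single space. Building on a chain of answers from others (including MiniQuark and mmj), this is what I came up with: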
import re
def multiple_replace(string, reps, re_flags=0):
    """ Transforms string, replacing keys from reps with values.
    reps: dictionary, or list of key-value pairs (to enforce ordering;
          earlier items have higher priority).
          Keys are used as regular expressions.
    re_flags: interpretation of regular expressions, such as re.DOTALL
    """
    if isinstance(reps, dict):
        # list() makes the pairs indexable on Python 3, where items() is a view
        reps = list(reps.items())
    # wrap each key in a named group (_0, _1, ...) so a match identifies its pair
    pattern = re.compile("|".join("(?P<_%d>%s)" % (i, re_str[0])
                                  for i, re_str in enumerate(reps)),
                         re_flags)
    return pattern.sub(lambda x: reps[int(x.lastgroup[1:])][1], string)
It works for the examples given in other answers, for example:
>>> multiple_replace("(condition1) and --condition2--",
... {"condition1": "", "condition2": "text"})
'() and --text--'
>>> multiple_replace('hello, world', {'hello' : 'goodbye', 'world' : 'earth'})
'goodbye, earth'
>>> multiple_replace("Do you like cafe? No, I prefer tea.",
... {'cafe': 'tea', 'tea': 'cafe', 'like': 'prefer'})
'Do you prefer tea? No, I prefer cafe.'
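The main thing for me is that you can use regular expressions as well, for example to replace whole words only, or to normalize white space: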
>>> s = "I don't want to change this name:\n Philip II of Spain"
>>> re_str_dict = {r'\bI\b': 'You', r'[\n\t ]+': ' '}
>>> multiple_replace(s, re_str_dict)
"You don't want to change this name: Philip II of Spain"
If you want to use the dictionary keys as plain strings, you can escape them before calling multiple_replace, for example with this function:
def escape_keys(d):
    """ transform dictionary d by applying re.escape to the keys """
    return dict((re.escape(k), v) for k, v in d.items())
>>> multiple_replace(s, escape_keys(re_str_dict))
"I don't want to change this name:\n Philip II of Spain"
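To see why the escaping step matters for plain-string keys, here is a small sketch using only re.sub and re.escape from the standard library. The input string and the key "1+1" are made up for illustration; they are not from the answer above.

```python
import re

# Hypothetical input; "1+1" contains "+", which is a regex metacharacter.
text = "1+1 is not 11"

# Unescaped, the pattern "1+1" means "one or more 1s, followed by a 1",
# so it matches "11" and leaves the literal "1+1" alone.
unescaped = re.sub("1+1", "two", text)

# Escaped, the pattern matches only the literal characters "1+1".
escaped = re.sub(re.escape("1+1"), "two", text)

print(unescaped)
print(escaped)
```

So a key that looks like an ordinary string can quietly match something else if it is passed through unescaped.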
The following function can help find erroneous regular expressions among your dictionary keys (since the error message from multiple_replace is not very telling):
def check_re_list(re_list):
    """ Checks if each regular expression in list is well-formed. """
    for i, e in enumerate(re_list):
        try:
            re.compile(e)
        except (TypeError, re.error):
            print("Invalid regular expression string "
                  "at position {}: '{}'".format(i, e))
>>> check_re_list(re_str_dict.keys())
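If you would rather collect the bad patterns than print them (say, to fail a test), a variant along these lines might do. The name find_invalid_patterns and the sample patterns are my own, purely illustrative.

```python
import re

def find_invalid_patterns(patterns):
    """Return (index, pattern, message) for each pattern that fails to compile."""
    problems = []
    for i, p in enumerate(patterns):
        try:
            re.compile(p)
        except (TypeError, re.error) as exc:
            # keep the compiler's own message; it usually names the problem
            problems.append((i, p, str(exc)))
    return problems

# Hypothetical keys: the second and third are malformed on purpose.
bad = find_invalid_patterns([r"\bI\b", "(unclosed", "[a-z"])
for index, pattern, message in bad:
    print(index, repr(pattern), message)
```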
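Note that it does not chain the replacements; it performs them all in a single pass. This makes it more efficient without constraining what it can do. To mimic the effect of chaining, you may just need to add more string-replacement pairs and make sure the pairs are in the intended order: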
>>> multiple_replace("button", {"but": "mut", "mutton": "lamb"})
'mutton'
>>> multiple_replace("button", [("button", "lamb"),
... ("but", "mut"), ("mutton", "lamb")])
'lamb'
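The difference between chaining and a single pass shows up most clearly when two keys swap with each other. A minimal sketch, using plain-string keys and the cafe/tea example from earlier in this answer (the variable names are mine):

```python
import re

text = "Do you like cafe? No, I prefer tea."
swaps = {"cafe": "tea", "tea": "cafe"}

# Chained: the second replace also rewrites the output of the first,
# so every "tea" produced by the first step turns back into "cafe".
chained = text
for old, new in swaps.items():
    chained = chained.replace(old, new)

# Single pass: each position of the original text is matched at most once.
pattern = re.compile("|".join(re.escape(k) for k in swaps))
single_pass = pattern.sub(lambda m: swaps[m.group(0)], text)

print(chained)
print(single_pass)
```

Only the single-pass version actually swaps the two words.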
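Other answers
You really shouldn't do it this way, but I just think it's way too cool:
>>> replacements = {'cond1':'text1', 'cond2':'text2'}
>>> cmd = 'answer = s'
>>> for k, v in replacements.items():
...     cmd += ".replace(%r, %r)" % (k, v)
...
>>> exec(cmd)
Now answer holds the result of applying all the replacements, in turn, to the string s.
Again, this is very hacky and not something you should use regularly. But it's nice to know that you can do something like this if you need to.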
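For comparison, the same chained result can be had without exec. This is just a sketch with functools.reduce; the input string s is made up, since the snippet above never defines it.

```python
from functools import reduce

replacements = {'cond1': 'text1', 'cond2': 'text2'}
s = 'cond1 then cond2'  # hypothetical input string

# Fold the (old, new) pairs over the string, one str.replace per pair.
answer = reduce(lambda acc, pair: acc.replace(pair[0], pair[1]),
                replacements.items(), s)
print(answer)
```

As with the exec version, the replacements run one after another, so a later pair can rewrite the output of an earlier one.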
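sentence = 'its some sentence with a something text'
def replaceAll(f, Array1, Array2):
    # apply the replacements one after another; Array1[x] is replaced by Array2[x]
    if len(Array1) == len(Array2):
        for x in range(len(Array1)):
            f = f.replace(Array1[x], Array2[x])
    # the string is returned unchanged if the two lists differ in length
    return f
newSentence = replaceAll(sentence, ['a', 'sentence', 'something'], ['another', 'sentence', 'something something'])
print(newSentence)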
Or, as a quick hack:
for line in to_read:
    read_buffer = line
    stripped_buffer1 = read_buffer.replace("term1", " ")
    stripped_buffer2 = stripped_buffer1.replace("term2", " ")
    write_to_file = to_write.write(stripped_buffer2)
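The loop above assumes to_read and to_write are already open file objects. A self-contained sketch of the same idea, with io.StringIO standing in for real files (the file contents are invented for illustration):

```python
import io

# io.StringIO stands in for files you would normally get from open(...).
to_read = io.StringIO("keep term1 this\nand term2 that\n")
to_write = io.StringIO()

for line in to_read:
    # chain the two replacements directly; no intermediate buffers needed
    to_write.write(line.replace("term1", " ").replace("term2", " "))

result = to_write.getvalue()
print(repr(result))
```

Note that each term is replaced by a space, so the spaces that surrounded it are kept and you end up with three in a row.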
Here is a short example that should do the trick with regular expressions:
import re
rep = {"condition1": "", "condition2": "text"} # define desired replacements here
# use these three lines to do the replacement
rep = dict((re.escape(k), v) for k, v in rep.items())
# on Python 2, use rep.iteritems() instead of rep.items()
pattern = re.compile("|".join(rep.keys()))
text = pattern.sub(lambda m: rep[re.escape(m.group(0))], text)
For example:
>>> pattern.sub(lambda m: rep[re.escape(m.group(0))], "(condition1) and --condition2--")
'() and --text--'
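One thing to watch with this approach: regex alternation tries the alternatives left to right, so if one key is a prefix of another, the shorter one can win. Sorting the keys longest-first is one way around that. A sketch with made-up overlapping keys:

```python
import re

rep = {"cat": "dog", "catalog": "index"}  # hypothetical overlapping keys

# Naive order: "cat" is tried first and shadows "catalog".
naive = re.compile("|".join(re.escape(k) for k in rep))
naive_result = naive.sub(lambda m: rep[m.group(0)], "cat catalog")

# Longest first: "catalog" gets a chance before "cat".
keys = sorted(rep, key=len, reverse=True)
pattern = re.compile("|".join(re.escape(k) for k in keys))
result = pattern.sub(lambda m: rep[m.group(0)], "cat catalog")

print(naive_result)
print(result)
```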
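My approach would be to first tokenize the string, then decide for each token whether to keep it or not.
Potentially, this could perform better, if we can assume O(1) lookup for a hashmap/set: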
remove_words = {"we", "this"}
target_sent = "we should modify this string"
target_sent_words = target_sent.split()
filtered_sent = " ".join(list(filter(lambda word: word not in remove_words, target_sent_words)))
filtered_sent is now 'should modify string'
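The same token-based idea extends from removing words to replacing them, by looking each token up in a dict instead of a set. A rough sketch; the replacements mapping is hypothetical:

```python
replacements = {"we": "you", "this": "that"}  # hypothetical word mapping
target_sent = "we should modify this string"

# dict.get(word, word) returns the replacement if there is one,
# otherwise the word itself.
swapped = " ".join(replacements.get(word, word) for word in target_sent.split())
print(swapped)
```

Keep in mind that split() only separates on whitespace, so a token like "this," (with punctuation attached) would not match the key "this", and runs of whitespace are collapsed to single spaces when joining.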