我想使用.replace函数替换多个字符串。
我目前有
string.replace("condition1", "")
但想要一些像
string.replace("condition1", "").replace("condition2", "text")
尽管这样的语法感觉不太好
正确的做法是什么?有点像在grep/regex中,你可以用\1和\2来替换某些搜索字符串的字段
我想使用.replace函数替换多个字符串。
我目前有
string.replace("condition1", "")
但想要一些像
string.replace("condition1", "").replace("condition2", "text")
尽管这样的语法感觉不太好
正确的做法是什么?有点像在grep/regex中,你可以用\1和\2来替换某些搜索字符串的字段
当前回答
这是我的0.02美元。它基于Andrew Clark的答案,只是更清楚一点,它还涵盖了当一个字符串被替换为另一个字符串的子字符串时的情况(更长的字符串胜出)
def multireplace(string, replacements):
"""
Given a string and a replacement map, it returns the replaced string.
:param str string: string to execute replacements on
:param dict replacements: replacement dictionary {value to find: value to replace}
:rtype: str
"""
# Place longer ones first to keep shorter substrings from matching
# where the longer ones should take place
# For instance given the replacements {'ab': 'AB', 'abc': 'ABC'} against
# the string 'hey abc', it should produce 'hey ABC' and not 'hey ABc'
substrs = sorted(replacements, key=len, reverse=True)
# Create a big OR regex that matches any of the substrings to replace
regexp = re.compile('|'.join(map(re.escape, substrs)))
# For each match, look up the new string in the replacements
return regexp.sub(lambda match: replacements[match.group(0)], string)
这就是这个要点,如果你有任何建议,请随意修改。
其他回答
在我的情况下,我需要一个简单的唯一键替换名称,所以我想到了这个:
a = 'This is a test string.'
b = {'i': 'I', 's': 'S'}
for x,y in b.items():
a = a.replace(x, y)
>>> a
'ThIS IS a teSt StrIng.'
注意:测试你的案例,见注释。
这里有一个例子,它在长弦上更有效,有许多小的替换。
source = "Here is foo, it does moo!"
replacements = {
'is': 'was', # replace 'is' with 'was'
'does': 'did',
'!': '?'
}
def replace(source, replacements):
finder = re.compile("|".join(re.escape(k) for k in replacements.keys())) # matches every string we want replaced
result = []
pos = 0
while True:
match = finder.search(source, pos)
if match:
# cut off the part up until match
result.append(source[pos : match.start()])
# cut off the matched part and replace it in place
result.append(replacements[source[match.start() : match.end()]])
pos = match.end()
else:
# the rest after the last match
result.append(source[pos:])
break
return "".join(result)
print replace(source, replacements)
关键是要避免长字符串的多次连接。我们将源字符串切成片段,在我们形成列表时替换一些片段,然后将整个字符串连接回字符串。
为什么没有这样的解决方案呢?
s = "The quick brown fox jumps over the lazy dog"
for r in (("brown", "red"), ("lazy", "quick")):
s = s.replace(*r)
#output will be: The quick red fox jumps over the quick dog
您可以使用pandas库和replace函数,它既支持精确匹配,也支持正则表达式替换。例如:
df = pd.DataFrame({'text': ['Billy is going to visit Rome in November', 'I was born in 10/10/2010', 'I will be there at 20:00']})
to_replace=['Billy','Rome','January|February|March|April|May|June|July|August|September|October|November|December', '\d{2}:\d{2}', '\d{2}/\d{2}/\d{4}']
replace_with=['name','city','month','time', 'date']
print(df.text.replace(to_replace, replace_with, regex=True))
修改后的文本为:
0 name is going to visit city in month
1 I was born in date
2 I will be there at time
你可以在这里找到一个例子。请注意,文本上的替换是按照它们在列表中出现的顺序进行的
另一个例子: 输入列表
error_list = ['[br]', '[ex]', 'Something']
words = ['how', 'much[ex]', 'is[br]', 'the', 'fish[br]', 'noSomething', 'really']
期望的输出将是
words = ['how', 'much', 'is', 'the', 'fish', 'no', 'really']
代码:
[n[0][0] if len(n[0]) else n[1] for n in [[[w.replace(e,"") for e in error_list if e in w],w] for w in words]]