How to search and replace text in a file using Python 3?
Here is my code:
import os
import sys
import fileinput
print ("Text to search for:")
textToSearch = input( "> " )
print ("Text to replace it with:")
textToReplace = input( "> " )
print ("File to perform Search-Replace on:")
fileToSearch = input( "> " )
#fileToSearch = 'D:\dummy1.txt'
tempFile = open( fileToSearch, 'r+' )
for line in fileinput.input( fileToSearch ):
    if textToSearch in line :
        print('Match Found')
    else:
        print('Match Not Found!!')
    tempFile.write( line.replace( textToSearch, textToReplace ) )
tempFile.close()
input( '\n\n Press Enter to exit...' )
Input file:
hi this is abcd hi this is abcd
This is dummy text file.
This is how search and replace works abcd
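When I search and replace 'ram' with 'abcd' in the input file above, it works like a charm. But when I do it the other way round, i.e. replacing 'abcd' with 'ram', some junk characters are left at the end.
Replacing 'abcd' with 'ram':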
hi this is ram hi this is ram
This is dummy text file.
This is how search and replace works rambcd
Besides the answers already mentioned, here is an explanation of why you have some random characters at the end:
You are opening the file in r+ mode, not w mode. The key difference is that w mode clears the contents of the file as soon as you open it, whereas r+ doesn't.
This means that if your file content is "123456789" and you write "www" to it, you get "www456789". It overwrites the characters with the new input, but leaves any remaining input untouched.
You can clear a section of the file contents by using truncate(<startPosition>), but you are probably best off saving the updated file content to a string first, then doing truncate(0) and writing it all at once.
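To make the difference concrete, here is a minimal sketch of both behaviours. The helper name replace_in_place and the throwaway temp file are my own choices for illustration, not something from the question.

```python
import os
import tempfile

def replace_in_place(path, old, new):
    """Read the whole file, replace in memory, then clear the file and rewrite it."""
    with open(path, 'r+') as f:
        content = f.read().replace(old, new)
        f.seek(0)       # go back to the start of the file
        f.truncate(0)   # drop the old content so nothing is left over
        f.write(content)

# Demo on a throwaway file (hypothetical content, not the asker's real file)
fd, demo_path = tempfile.mkstemp(suffix='.txt')
os.close(fd)
with open(demo_path, 'w') as f:
    f.write('123456789')

# r+ without truncate: the old tail survives
with open(demo_path, 'r+') as f:
    f.write('www')
with open(demo_path) as f:
    leftover = f.read()

# Same file, but going through replace_in_place
with open(demo_path, 'w') as f:
    f.write('search and replace works abcd\n')
replace_in_place(demo_path, 'abcd', 'ram')
with open(demo_path) as f:
    fixed = f.read()

os.remove(demo_path)
print(repr(leftover))  # 'www456789'
print(repr(fixed))     # 'search and replace works ram\n'
```

The first result shows the leftover tail you get from r+ alone; the second shows that truncating before the rewrite removes it.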
Or you can use my library :D
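As michaelb958 pointed out, you cannot replace part of a file in place with data of a different length, because that shifts everything after it out of position. I disagree with the other posters who suggest reading from one file and writing to another. Instead, I would read the file into memory, fix up the data, and then write it back to the same file in a separate step.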
# Read in the file
with open('file.txt', 'r') as file:
    filedata = file.read()
# Replace the target string
filedata = filedata.replace('abcd', 'ram')
# Write the file out again
with open('file.txt', 'w') as file:
    file.write(filedata)
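This is the approach I would use unless you have a massive file that is too big to load into memory in one go, or you are concerned about losing data if the process is interrupted during the second step, while the data is being written back to the file.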
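For those two cases (a huge file, or worry about an interrupted write), one possible variation is to stream line by line into a temporary file in the same directory and then swap it in with os.replace. This is only a sketch under assumptions: the function name replace_in_file_streaming is made up for illustration, it will not find matches that span a line break, and the new file gets the temp file's permissions rather than the original's.

```python
import os
import tempfile

def replace_in_file_streaming(path, old, new):
    """Replace old with new one line at a time, then swap the result into place."""
    # Create the temp file next to the target so os.replace stays on one filesystem.
    dir_name = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=dir_name, text=True)
    try:
        with os.fdopen(fd, 'w') as tmp, open(path, 'r') as src:
            for line in src:
                tmp.write(line.replace(old, new))
        # The original file is only replaced once the new one is fully written.
        os.replace(tmp_path, path)
    except BaseException:
        # Something went wrong before the swap: discard the partial temp file.
        os.remove(tmp_path)
        raise
```

Usage would look like replace_in_file_streaming('file.txt', 'abcd', 'ram'). If the script dies halfway through, the original file is still intact because it is never opened for writing.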
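You can use sed, awk or grep from Python (with some restrictions). Here is a very simple example. It changes Banana to BananaToothpaste in the file. You can edit it and use it. (I tested it and it worked. Note: if you are testing under Windows, you need to install the sed command and set the path first.)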
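import os
import subprocess
file = "a.txt"
oldtext = "Banana"
newtext = " BananaToothpaste"
# Pass the arguments as a list so the shell does not reinterpret spaces or quotes.
# oldtext and newtext are still parsed by sed, so "/" and regex metacharacters need escaping.
# "-i" without an argument is GNU sed syntax; BSD/macOS sed expects a backup suffix after it.
subprocess.run(['sed', '-i', 's/{}/{}/g'.format(oldtext, newtext), file], check=True)
print('This command was applied: sed -i "s/{}/{}/g" {}'.format(oldtext, newtext, file))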
If you want to see the result in the file directly, use "type" on Windows or "cat" on Linux:
# For Windows:
print(os.popen("type " + file).read())
# For Linux:
print(os.popen("cat " + file).read())
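Using re.subn gives you more control over the substitution process, for example matching a word that is split across two lines, or choosing between case-sensitive and case-insensitive matching. It also returns the number of matches, which you can use to avoid wasting resources (rewriting the file) when the string is not found.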
import re
file = ...  # set this to the path of your file
# they can also be raw strings and regexes
textToSearch = r'Ha.*O'  # here an example with a regex
textToReplace = 'hallo'
# read and replace
with open(file, 'r') as fd:
    # sample case-insensitive find-and-replace
    # (flags must be passed by keyword: the fourth positional argument of re.subn is count)
    text, counter = re.subn(textToSearch, textToReplace, fd.read(), flags=re.I)
# check if there is at least a match
if counter > 0:
    # edit the file
    with open(file, 'w') as fd:
        fd.write(text)
# summary result
print(f'{counter} occurrence(s) of "{textToSearch}" were replaced with "{textToReplace}".')
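Some notes on the regex:
Add the re.I flag, the short form of re.IGNORECASE, for a case-insensitive match.
For a multi-line replacement use re.subn(r'\n*'.join(textToSearch), textToReplace, fd.read()), or, depending on the data, '\n{,1}' as the separator. Note that in this case textToSearch must be a plain string, not a regex!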
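Here is a small sketch of the multi-line idea combined with the case-insensitive flag. The function name replace_across_lines and the sample text are invented for illustration; I also added re.escape so that characters such as "." in the search string are taken literally, which the one-liner above does not do.

```python
import re

def replace_across_lines(text, search, replacement, ignore_case=False):
    """Replace a plain search string even if newlines are interleaved in it."""
    # Escape each character, then allow optional newlines between characters.
    pattern = r'\n*'.join(re.escape(ch) for ch in search)
    flags = re.IGNORECASE if ignore_case else 0
    # re.subn returns a tuple: (new_text, number_of_replacements)
    return re.subn(pattern, replacement, text, flags=flags)

new_text, count = replace_across_lines('say hel\nlo and Hello', 'hello', 'hi',
                                       ignore_case=True)
print(new_text)  # say hi and hi
print(count)     # 2
```

Note that the newline inside the first match is consumed by the replacement, so the line count of the text can change.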
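I modified Jayram Singh's post slightly in order to replace every instance of the '!' character with a number that increments with each instance. I thought it might be helpful to someone who wants to modify a character that occurs more than once per line and needs to keep a running count. Hope that helps someone. PS: I'm very new to coding, so apologies if my post is inappropriate in any way, but this worked for me.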
f1 = open('file1.txt', 'r')
f2 = open('file2.txt', 'w')
n = 1
# Walk through file1 one character at a time: replace each '!' with [n]
# and increment n; copy every other character to file2 unchanged.
for line in f1:
    for char in line:
        if char == '!':
            f2.write(f'[{n}]')
            n += 1
        else:
            f2.write(char)
f1.close()
f2.close()
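If the text fits in memory, the same numbering can probably be done on a string with re.sub and a replacement function. This is a sketch of that alternative, not the method above; number_marks and its parameters are names I made up.

```python
import itertools
import re

def number_marks(text, mark='!', start=1):
    """Replace each occurrence of mark with [n], where n counts up from start."""
    counter = itertools.count(start)
    # re.sub calls the lambda once per match, so each match gets the next number.
    return re.sub(re.escape(mark), lambda m: f'[{next(counter)}]', text)

numbered = number_marks('Hi! Wow!! Bye.')
print(numbered)  # Hi[1] Wow[2][3] Bye.
```

You would read file1.txt into a string, pass it through number_marks, and write the result to file2.txt.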