这是最简单的解释。这是我正在使用的:

re.split('\W', 'foo/bar spam\neggs')
>>> ['foo', 'bar', 'spam', 'eggs']

这是我想要的:

someMethod('\W', 'foo/bar spam\neggs')
>>> ['foo', '/', 'bar', ' ', 'spam', '\n', 'eggs']

原因是我想把一个字符串分割成令牌,操作它,然后再把它组合在一起。


当前回答

一个懒惰而简单的解决方案

假设你的正则表达式模式是split_pattern = r'(!|\?)'

首先,添加一些与新分隔符相同的字符,如'[cut]'

New_string = re.sub(split_pattern, '\\1[cut]', your_string)

然后拆分新的分隔符new_string.split('[cut]')

其他回答

如果你想拆分字符串,同时通过regex保留分隔符,而不捕获组:

def finditer_with_separators(regex, s):
    matches = []
    prev_end = 0
    for match in regex.finditer(s):
        match_start = match.start()
        if (prev_end != 0 or match_start > 0) and match_start != prev_end:
            matches.append(s[prev_end:match.start()])
        matches.append(match.group())
        prev_end = match.end()
    if prev_end < len(s):
        matches.append(s[prev_end:])
    return matches

regex = re.compile(r"[\(\)]")
matches = finditer_with_separators(regex, s)

如果假设regex被封装到捕获组中:

def split_with_separators(regex, s):
    matches = list(filter(None, regex.split(s)))
    return matches

regex = re.compile(r"([\(\)])")
matches = split_with_separators(regex, s)

这两种方法也将删除空组,在大多数情况下是无用和恼人的。

安装wrs时“不拆卸分离器”

pip install wrs

(由Rao Hamza开发)

import wrs
text  = "Now inbox “how to make spam ad” Invest in hard email marketing."
splitor = 'email | spam | inbox'
list = wrs.wr_split(splitor, text)
print(list)

结果: ['现在','收件箱'如何制作','垃圾广告'努力投资','电子邮件营销'。]

# This keeps all separators  in result 
##########################################################################
import re
st="%%(c+dd+e+f-1523)%%7"
sh=re.compile('[\+\-//\*\<\>\%\(\)]')

def splitStringFull(sh, st):
   ls=sh.split(st)
   lo=[]
   start=0
   for l in ls:
     if not l : continue
     k=st.find(l)
     llen=len(l)
     if k> start:
       tmp= st[start:k]
       lo.append(tmp)
       lo.append(l)
       start = k + llen
     else:
       lo.append(l)
       start =llen
   return lo
  #############################

li= splitStringFull(sh , st)
['%%(', 'c', '+', 'dd', '+', 'e', '+', 'f', '-', '1523', ')%%', '7']

将所有分隔符:(\W)替换为分隔符+ new_分隔符:(\W;) 由new_separator分隔符拆分:(;)

def split_and_keep(seperator, s):
  return re.split(';', re.sub(seperator, lambda match: match.group() + ';', s))

print('\W', 'foo/bar spam\neggs')
>>> line = 'hello_toto_is_there'
>>> sep = '_'
>>> [sep + x[1] if x[0] != 0 else x[1] for x in enumerate(line.split(sep))]
['hello', '_toto', '_is', '_there']