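How do I extract a substring between two markers?

Suppose I have the string 'gfgfdAAA1234ZZZuijjk' and I want to extract just the '1234' part.

I only know the few characters that come directly before (AAA) and directly after (ZZZ) the part I am interested in, 1234.

With sed, this can be done on a string as follows: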

echo "$STRING" | sed -e "s|.*AAA\(.*\)ZZZ.*|\1|"

The result is 1234.

How can I do the same thing in Python?

Current answer

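TypeScript. Gets the string in between two other strings.

Searches for the shortest string between the prefixes and the postfixes.

prefixes - a string, an array of strings, or null (null means search from the start).

postfixes - a string, an array of strings, or null (null means search until the end).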

public getStringInBetween(str: string, prefixes: string | string[] | null,
                          postfixes: string | string[] | null): string {

    if (typeof prefixes === 'string') {
        prefixes = [prefixes];
    }

    if (typeof postfixes === 'string') {
        postfixes = [postfixes];
    }

    if (!str || str.length < 1) {
        throw new Error(str + ' should contain ' + prefixes);
    }

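    // this.indexOf is a helper of the same class that is not shown here; judging by its use,
    // it returns { pos, sub } for the first match and throws when nothing is found.
    let start = prefixes === null ? { pos: 0, sub: '' } : this.indexOf(str, prefixes);
    const end = postfixes === null ? { pos: str.length, sub: '' } : this.indexOf(str, postfixes, start.pos + start.sub.length);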

    let value = str.substring(start.pos + start.sub.length, end.pos);
    if (!value || value.length < 1) {
        throw new Error(str + ' should contain string in between ' + prefixes + ' and ' + postfixes);
    }

    // Shrink to the shortest match: skip past any further prefix found inside value.
    while (true) {
        try {
            start = this.indexOf(value, prefixes);
        } catch (e) {
            break;
        }
        value = value.substring(start.pos + start.sub.length);
        if (!value || value.length < 1) {
            throw new Error(str + ' should contain string in between ' + prefixes + ' and ' + postfixes);
        }
    }

    return value;
}
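
Since the question asks about Python, here is a minimal sketch of the same "shortest string between prefix and postfix" idea. It is not a port of the method above: the name shortest_between is made up for illustration, it accepts only a single prefix and a single postfix (no arrays, no null), and it raises ValueError where the TypeScript version throws an Error.

```python
def shortest_between(text, prefix, postfix):
    """Return the shortest substring of text enclosed by prefix and postfix."""
    start = text.find(prefix)
    if start < 0:
        raise ValueError(f"{text!r} should contain {prefix!r}")
    start += len(prefix)

    end = text.find(postfix, start)
    if end < 0:
        raise ValueError(f"{text!r} should contain {postfix!r} after {prefix!r}")

    # If the prefix occurs again before the postfix, start after its last
    # occurrence so that the shortest candidate is returned.
    last = text.rfind(prefix, start, end)
    if last >= 0:
        start = last + len(prefix)
    return text[start:end]


print(shortest_between("gfgfdAAA1234ZZZuijjk", "AAA", "ZZZ"))  # 1234
print(shortest_between("AAAxAAA1234ZZZ", "AAA", "ZZZ"))        # 1234
```

Unlike the TypeScript method, this sketch returns an empty string instead of raising when the two markers are directly adjacent.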

2020-09-04 11:16:46

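Other answers

In case somebody has to do the same thing that I did: I had to extract everything inside parentheses in a line. For example, if I have a line like 'US president (Barack Obama) met with ...' and I want to get only 'Barack Obama', this is the solution: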

import re

regex = r'.*\((.*?)\).*'
matches = re.search(regex, line)
line = matches.group(1) + '\n'

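That is, you need to escape the parentheses with a backslash (\). This is more a question about regular expressions than about Python, though.

Also, you will often see an 'r' prefix before a regular expression definition. It marks a raw string literal; without the r prefix, you need to escape the backslashes themselves, as in C (for example '\\(' instead of '\('). There is more discussion about this here.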
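
To make the point about the r prefix concrete, here is a small sketch. The sample sentence is made up for illustration; it contains two parenthesized parts so that re.findall has something to show. The check for None is an addition, since re.search returns None when nothing matches.

```python
import re

line = "The US president (Barack Obama) met with the press (briefly)."

# A raw string and a normal string with doubled backslashes are the same pattern.
assert r"\((.*?)\)" == "\\((.*?)\\)"

# First parenthesized part, or None if the line has no parentheses.
match = re.search(r"\((.*?)\)", line)
first = match.group(1) if match else None

# Every parenthesized part in the line.
all_groups = re.findall(r"\((.*?)\)", line)

print(first)       # Barack Obama
print(all_groups)  # ['Barack Obama', 'briefly']
```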

2014-01-19 19:29:00

>>> s = '/tmp/10508.constantstring'
>>> s.split('/tmp/')[1].split('constantstring')[0].strip('.')

2014-02-08 00:12:43

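A one-liner that returns another string if there is no match. Edit: the improved version uses the next function; replace "not-found" with something else if needed: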

import re
res = next( (m.group(1) for m in [re.search("AAA(.*?)ZZZ", "gfgfdAAA1234ZZZuijjk" ),] if m), "not-found" )

My other way of doing this is less optimal because it uses a regex a second time; I still have not found a shorter way:

import re
res = ( ( re.search("AAA(.*?)ZZZ", "gfgfdAAA1234ZZZuijjk") or re.search("()","") ).group(1) )
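
If brevity is not the goal, the same behavior can be written as a small function. This is only a sketch: the name between and its default parameter are made up for illustration, and re.escape is added on the assumption that the markers are literal text rather than patterns.

```python
import re


def between(text, start, end, default="not-found"):
    """Return the text between start and end, or default if there is no match."""
    # re.escape keeps markers such as "(" or "." from being read as regex syntax.
    match = re.search(re.escape(start) + "(.*?)" + re.escape(end), text)
    return match.group(1) if match else default


print(between("gfgfdAAA1234ZZZuijjk", "AAA", "ZZZ"))  # 1234
print(between("no markers here", "AAA", "ZZZ"))       # not-found
```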

2017-12-07 00:55:20

Using regular expressions

import re

re.search(r"(?<=AAA).*?(?=ZZZ)", your_text).group(0)

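The above as-is will fail with an AttributeError if your_text does not contain "AAA" followed by "ZZZ", because re.search then returns None.

Using string methods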

your_text.partition("AAA")[2].partition("ZZZ")[0]

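The above returns an empty string if "AAA" does not exist in your_text. If only "ZZZ" is missing, it returns everything after "AAA".

PS: Python Challenge?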
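
The behavior of the partition chain is easy to check case by case. The sketch below also includes a stricter variant that returns None when either marker is missing; the function name strict_between is made up for illustration.

```python
def strict_between(text, start, end):
    """Return the text between start and end, or None if either marker is missing."""
    _, found_start, rest = text.partition(start)
    if not found_start:
        return None
    value, found_end, _ = rest.partition(end)
    if not found_end:
        return None
    return value


# Both markers present.
print("gfgfdAAA1234ZZZuijjk".partition("AAA")[2].partition("ZZZ")[0])  # 1234
# "AAA" missing: the chain yields an empty string.
print(repr("gfgfd1234ZZZ".partition("AAA")[2].partition("ZZZ")[0]))    # ''
# Only "ZZZ" missing: the chain yields everything after "AAA".
print("gfgfdAAA1234".partition("AAA")[2].partition("ZZZ")[0])          # 1234

print(strict_between("gfgfdAAA1234", "AAA", "ZZZ"))                    # None
```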

2011-02-06 23:43:17

Also, you can find all combinations with the function below:

s = 'Part 1. Part 2. Part 3 then more text'
def find_all_places(text,word):
    # Collect the start index of every non-overlapping occurrence of word.
    word_places = []
    i=0
    while True:
        word_place = text.find(word,i)
        if word_place<0:
            break
        word_places.append(word_place)
        # Continue searching just past the current match.
        i=word_place+len(word)
    return word_places
def find_all_combination(text,start,end):
    start_places = find_all_places(text,start)
    end_places = find_all_places(text,end)
    combination_list = []
    for start_place in start_places:
        for end_place in end_places:
            # Keep only pairs where the start marker comes before the end marker.
            if start_place>=end_place:
                continue
            combination_list.append(text[start_place:end_place])
    return combination_list
find_all_combination(s,"Part","Part")

Result:

['Part 1. ', 'Part 1. Part 2. ', 'Part 2. ']
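
For comparison, the same pairing can be sketched more compactly with re.finditer. This swaps in the re module for the manual str.find loop; the function name find_all_combinations is made up for illustration, and re.escape assumes the markers are literal text. As in the code above, each returned slice begins at the start marker and stops just before the end marker.

```python
import re


def find_all_combinations(text, start, end):
    """Return text[s:e] for every start-marker position s before an end-marker position e."""
    starts = [m.start() for m in re.finditer(re.escape(start), text)]
    ends = [m.start() for m in re.finditer(re.escape(end), text)]
    return [text[s:e] for s in starts for e in ends if s < e]


s = 'Part 1. Part 2. Part 3 then more text'
print(find_all_combinations(s, "Part", "Part"))
# ['Part 1. ', 'Part 1. Part 2. ', 'Part 2. ']
```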

2021-10-05 19:02:30
