Filter pandas DataFrame by substring criteria

I have a pandas DataFrame with a column of string values. I need to select rows based on partial string matches.

Something like this idiom:

re.search(pattern, cell_in_question)

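which returns a boolean. I am familiar with the syntax df[df['A'] == "helloworld"], but I can't seem to find a way to do the same with a partial string match, say "hello".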
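As a rough sketch of what the question is after, the re.search idiom can be applied cell by cell to build a boolean mask, and Series.str.contains produces the same mask in vectorized form (it matches anywhere in the string, like re.search). The sample data below is hypothetical.

```python
import re

import pandas as pd

# Hypothetical sample data; column 'A' holds strings as in the question
df = pd.DataFrame({'A': ['helloworld', 'hello there', 'goodbye', 'say hello']})

pattern = 'hello'

# Literal translation of the re.search idiom: one boolean per cell
mask_apply = df['A'].apply(lambda cell: re.search(pattern, cell) is not None)

# Vectorized equivalent: str.contains matches the pattern anywhere in each string
mask_contains = df['A'].str.contains(pattern)

# Boolean indexing keeps only the rows where the mask is True
print(df[mask_contains])
```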

Current answer

If you need to do a case-insensitive search for strings in a pandas DataFrame column:

df[df['A'].str.contains("hello", case=False)]

2020-04-29 17:31:18
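A small sketch of two related str.contains options, on hypothetical data: na=False treats missing values as non-matches (otherwise the mask contains NaN, and indexing with it typically raises a ValueError), and regex=False matches the pattern literally, so characters like "." are not treated as regex wildcards.

```python
import numpy as np
import pandas as pd

# Hypothetical data: mixed case plus a missing value
df = pd.DataFrame({'A': ['Hello World', 'HELLO', 'goodbye', np.nan, 'a.b']})

# Case-insensitive substring match; na=False makes the missing value a non-match
hits = df[df['A'].str.contains('hello', case=False, na=False)]

# regex=False matches '.' literally instead of as "any character"
dots = df[df['A'].str.contains('.', regex=False, na=False)]

print(hits)
print(dots)
```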

Other answers

My 2c worth:

I did the following:

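import numpy as np
import pandas as pd

# model_data is the poster's own DataFrame with a 'Sale Method' column
sale_method = pd.DataFrame(model_data['Sale Method'].str.upper())
# Exact match -> 'private'; substring match -> 'auction'; anything else -> 'other'
# (na=False keeps missing values from being treated as matches)
sale_method['sale_classification'] = np.where(
    sale_method['Sale Method'].isin(['PRIVATE']),
    'private',
    np.where(sale_method['Sale Method'].str.contains('AUCTION', na=False),
             'auction',
             'other'),
)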

2021-06-24 01:40:21
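For comparison, here is a sketch of the same classification written with np.select instead of nested np.where calls. The answer does not show model_data, so the frame below is a hypothetical stand-in; np.select checks the conditions in order and uses the first one that is True.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the poster's model_data
model_data = pd.DataFrame(
    {'Sale Method': ['Private', 'auction', 'Online Auction', 'Tender', None]}
)

methods = model_data['Sale Method'].str.upper()

# Conditions are evaluated in order; the first True one picks the choice
conditions = [
    methods.isin(['PRIVATE']),                  # exact match
    methods.str.contains('AUCTION', na=False),  # substring match
]
choices = ['private', 'auction']

model_data['sale_classification'] = np.select(conditions, choices, default='other')
print(model_data)
```

A flat list of conditions tends to stay readable as more categories are added, whereas each extra category in the nested np.where version adds another level of nesting.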


If anyone is wondering how to do the related task, "select columns by partial string":

Use:

df.filter(like='hello')  # select columns which contain the word hello

To select rows whose index label contains a partial string, pass axis=0 to filter:

# selects rows which contain the word hello in their index label
df.filter(like='hello', axis=0)

2016-10-12 21:04:32
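A minimal sketch of both calls on a hypothetical frame. Note that DataFrame.filter only looks at labels (column names or index labels), never at cell values; for matching on values, str.contains is the tool.

```python
import pandas as pd

# Hypothetical frame whose column and row labels partly contain 'hello'
df = pd.DataFrame(
    {'hello_a': [1, 2, 3], 'other': [4, 5, 6], 'say_hello': [7, 8, 9]},
    index=['hello_row', 'row2', 'hello_again'],
)

# Columns whose label contains 'hello' (filter defaults to columns)
cols = df.filter(like='hello')

# Rows whose index label contains 'hello'; cell values are not inspected
rows = df.filter(like='hello', axis=0)

print(cols)
print(rows)
```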

A more general example, if you are looking for words, or parts of particular words, within a string:

import pandas as pd

df = pd.DataFrame([('cat andhat', 1000.0), ('hat', 2000000.0), ('the small dog', 1000.0),
                   ('fog', 330000.0), ('pet', 330000.0)],
                  columns=['col1', 'col2'])

The specific parts of a sentence or word to search for:

searchfor = '.*cat.*hat.*|.*the.*dog.*'

Create a column showing which rows match (you can then filter on it as needed):

df["TrueFalse"]=df['col1'].str.contains(searchfor, regex=True)

    col1             col2           TrueFalse
0   cat andhat       1000.0         True
1   hat              2000000.0      False
2   the small dog    1000.0         True
3   fog              330000.0       False
4   pet              330000.0       False

2021-02-16 09:41:59
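If the goal is only to keep the matching rows, a helper column is not required; the boolean mask can be used directly. Since str.contains searches anywhere in the string, the leading and trailing ".*" in the pattern above should be optional. The sketch below reuses the answer's data and checks that the shortened pattern gives the same mask.

```python
import pandas as pd

df = pd.DataFrame([('cat andhat', 1000.0), ('hat', 2000000.0), ('the small dog', 1000.0),
                   ('fog', 330000.0), ('pet', 330000.0)],
                  columns=['col1', 'col2'])

# str.contains searches anywhere in each string, so the outer '.*' can be dropped
searchfor = 'cat.*hat|the.*dog'

mask = df['col1'].str.contains(searchfor, regex=True)
matches = df[mask]  # keep only the matching rows, no helper column needed

print(matches)
```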

Quick tip: if you want to select rows based on a partial string contained in the index, try the following:

df['stridx']=df.index
df[df['stridx'].str.contains("Hello|Britain")]

2014-04-10 15:36:14
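An alternative sketch that skips the helper column: an Index of strings has its own .str accessor, so the mask can be built from df.index directly. This assumes the index labels are strings; the frame below is hypothetical.

```python
import pandas as pd

# Hypothetical frame with a string index
df = pd.DataFrame({'value': [1, 2, 3]},
                  index=['Hello there', 'Great Britain', 'France'])

# Index.str.contains builds the boolean mask without adding a column to df
selected = df[df.index.str.contains('Hello|Britain')]

print(selected)
```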
