Filter a pandas DataFrame by substring criteria

I have a pandas DataFrame with a column of string values. I need to select rows based on partial string matches.

Something like this idiom:

re.search(pattern, cell_in_question)

returning a boolean. I am familiar with the syntax df[df['A'] == "helloworld"], but I can't seem to find a way to do the same with a partial string match, say "hello".

Current answer

Suppose you have the following DataFrame:

>>> df = pd.DataFrame([['hello', 'hello world'], ['abcd', 'defg']], columns=['a','b'])
>>> df
       a            b
0  hello  hello world
1   abcd         defg

You can always use the in operator in a lambda expression to create your filter.

>>> df.apply(lambda x: x['a'] in x['b'], axis=1)
0     True
1    False
dtype: bool

The trick here is to use the axis=1 option in apply, which passes elements to the lambda function row by row rather than column by column.
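
The boolean Series above is only the mask; to actually select rows it still has to be passed back to the DataFrame. A minimal sketch of that last step, reusing the same sample data (the names mask and matched are illustrative, not from the answer):

```python
import pandas as pd

df = pd.DataFrame([['hello', 'hello world'], ['abcd', 'defg']], columns=['a', 'b'])

# Row-wise check: is the string in column 'a' a substring of the string in column 'b'?
mask = df.apply(lambda row: row['a'] in row['b'], axis=1)

# Boolean indexing keeps only the rows where the mask is True.
matched = df[mask]
print(matched)
```

Note that apply with axis=1 calls the lambda once per row in Python, so this is likely to be slower than the vectorized Series.str methods on large frames.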

2014-11-10 19:26:27

Other answers

Here's what I ended up doing for partial string matches. If anyone has a more efficient way of doing this, please let me know.

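import re
import pandas as pd

def stringSearchColumn_DataFrame(df, colName, regex):
    # Collect the index labels of the rows whose cell matches the regex.
    matches = []
    # Series.items() replaces iteritems(), which was removed in pandas 2.0.
    for idx, record in df[colName].items():
        if re.search(regex, record):
            matches.append(idx)

    # Select the matching rows once, instead of concatenating inside the loop.
    return df.loc[matches]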

2012-07-06 17:08:46

Vectorized string methods (i.e. Series.str) let you do the following:

df[df['A'].str.contains("hello")]

This is available in pandas 0.8.1 and up.
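
A few keyword arguments of str.contains tend to matter in practice: the pattern is treated as a regular expression by default, matching is case-sensitive, and missing values do not produce a plain True/False. The sketch below, on made-up sample data, shows the na, case and regex parameters; the variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'A': ['hello world', 'HELLO there', 'goodbye', None, 'a.b']})

# na=False treats missing values as non-matches, so the mask is purely boolean.
hello_rows = df[df['A'].str.contains('hello', na=False)]

# case=False makes the match case-insensitive.
hello_any_case = df[df['A'].str.contains('hello', case=False, na=False)]

# regex=False treats the pattern as a literal string, so '.' is not a wildcard.
literal_dot = df[df['A'].str.contains('.', regex=False, na=False)]
```

With regex=False the call behaves like a plain substring test, which is the closest match to the "partial string" wording of the question.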

2012-07-17 21:52:18

My 2c worth:

I did the following:

sale_method = pd.DataFrame(model_data['Sale Method'].str.upper())
sale_method['sale_classification'] = \
    np.where(sale_method['Sale Method'].isin(['PRIVATE']),
             'private',
             np.where(sale_method['Sale Method']
                      .str.contains('AUCTION'),
                      'auction',
                      'other'
             )
    )
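
The snippet above depends on a model_data frame that is not shown. As a self-contained sketch of the same classification, the example below uses hypothetical sample data and swaps the nested np.where calls for np.select, which takes a list of conditions and picks the first one that matches. This is an alternative formulation, not the code from the answer.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for model_data.
model_data = pd.DataFrame(
    {'Sale Method': ['Private', 'Public Auction', 'auction (online)', 'Tender']}
)

sale_method = pd.DataFrame(model_data['Sale Method'].str.upper())

# Conditions are checked in order; the first True one decides the label.
conditions = [
    sale_method['Sale Method'] == 'PRIVATE',
    sale_method['Sale Method'].str.contains('AUCTION'),
]
choices = ['private', 'auction']

# Rows matching no condition fall back to the default label.
sale_method['sale_classification'] = np.select(conditions, choices, default='other')
```

The exact-match test uses ==, while the partial match uses str.contains; adding another category only requires one more entry in each list.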

2021-06-24 01:40:21

Quick note: if you want to select based on a partial string contained in the index, try the following:

df['stridx']=df.index
df[df['stridx'].str.contains("Hello|Britain")]
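
An index that holds strings exposes the same .str accessor as a Series, so the helper column may not be needed. A minimal sketch on made-up data (the column name value and the index labels are assumptions for illustration):

```python
import pandas as pd

df = pd.DataFrame(
    {'value': [1, 2, 3]},
    index=['Hello World', 'Great Britain', 'France'],
)

# The pattern is a regular expression, so '|' means "either side matches".
mask = df.index.str.contains('Hello|Britain')

# The boolean array selects rows without adding a column to df.
selected = df[mask]
```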

2014-04-10 15:36:14

Suppose we have a column named "ENTITY" in a DataFrame df. We can filter df to keep only the rows whose "ENTITY" column does not contain "DM", using a mask as follows:

mask = df['ENTITY'].str.contains('DM')

df = df.loc[~(mask)].copy(deep=True)
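
One caveat with negating the mask: if the column has missing values, str.contains does not return True or False for them by default, and applying ~ to such a mask can fail or give surprising results. A sketch on hypothetical data, assuming missing entries should count as "does not contain":

```python
import pandas as pd

# Hypothetical sample data with one missing entry.
df = pd.DataFrame({'ENTITY': ['DM-001', 'XY-002', None, 'ADM-3']})

# na=False marks missing values as non-matches, keeping the mask boolean
# so that ~ can negate it safely.
mask = df['ENTITY'].str.contains('DM', na=False)

# Keep the rows that do NOT contain 'DM'; copy() detaches the result from df.
without_dm = df.loc[~mask].copy()
```

Here 'ADM-3' is dropped as well, because str.contains matches the substring anywhere in the value.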

2021-03-30 12:06:24
