Using pandas GroupBy?

I have a DataFrame df and I group it by several of its columns:

df[['col1','col2','col3','col4']].groupby(['col1','col2']).mean()

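This way I almost get the table (DataFrame) I need. What is missing is an additional column containing the number of rows in each group. In other words, I have the means, but I would also like to know how many values were used to compute them. For example, the first group has 8 values, the second has 10, and so on.

In short: how do I get group-wise statistics for a DataFrame?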

Current answer

If you are familiar with the tidyverse R packages, here is a way to do the same thing in Python:

from datar.all import tibble, rnorm, f, group_by, summarise, mean, n, rep

df = tibble(
  col1=rep(['A', 'B'], 5), 
  col2=rep(['C', 'D'], each=5), 
  col3=rnorm(10), 
  col4=rnorm(10)
)
df >> group_by(f.col1, f.col2) >> summarise(
  count=n(),
  col3_mean=mean(f.col3), 
  col4_mean=mean(f.col4)
)

  col1 col2  count  col3_mean  col4_mean
0    A    C      3  -0.516402   0.468454
1    A    D      2  -0.248848   0.979655
2    B    C      2   0.545518  -0.966536
3    B    D      3  -0.349836  -0.915293
[Groups: ['col1'] (n=2)]

I am the author of the package. If you have any questions about using it, feel free to submit an issue.
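
For comparison, the same kind of summary can be sketched in plain pandas with named aggregation. This is only an illustration: the sample values below are made up (integers instead of random draws, so the result is reproducible), and the output column names count, col3_mean and col4_mean are chosen to mirror the summarise call above.

```python
import pandas as pd

# Hypothetical data laid out like the tibble above, but with fixed values.
df = pd.DataFrame({
    "col1": ["A", "B"] * 5,
    "col2": ["C"] * 5 + ["D"] * 5,
    "col3": list(range(10)),
    "col4": list(range(10, 20)),
})

# Named aggregation: each keyword becomes an output column,
# defined as (source column, aggregation function).
summary = (
    df.groupby(["col1", "col2"])
      .agg(
          count=("col3", "size"),      # number of rows in the group
          col3_mean=("col3", "mean"),
          col4_mean=("col4", "mean"),
      )
      .reset_index()
)
print(summary)
```

The result has one row per (col1, col2) pair and ordinary flat column names, so no further renaming is needed.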

2021-04-29 17:51:36

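Other answers

On a groupby object, the agg function can take a list so that several aggregation methods are applied at once. This should give you the result you need: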

df[['col1', 'col2', 'col3', 'col4']].groupby(['col1', 'col2']).agg(['mean', 'count'])
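
A small sketch of what that call returns, using made-up data (the values and the missing entry in col4 are assumptions added for illustration). It shows that the result has two-level column labels, and that count only counts non-missing values.

```python
import pandas as pd

# Hypothetical sample data; column names follow the question.
df = pd.DataFrame({
    "col1": ["A", "A", "A", "B", "B"],
    "col2": ["x", "x", "y", "x", "x"],
    "col3": [1.0, 3.0, 5.0, 2.0, 4.0],
    "col4": [10.0, 20.0, 30.0, 40.0, None],
})

stats = df.groupby(["col1", "col2"]).agg(["mean", "count"])

# The columns form a two-level MultiIndex: (source column, statistic).
print(stats)

# 'count' skips missing values, so col4 in group (B, x) counts one value
# even though the group has two rows.
print(stats.loc[("B", "x"), ("col4", "count")])
```

If the number of rows per group is wanted regardless of missing values, size is the usual alternative to count.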

2013-10-15 15:49:28

Please try this code:

# transform returns one value per original row, so it can be assigned as a column
df['count_it'] = df.groupby(['col1', 'col2'])['col3'].transform('count')
df

I think this code adds a column named "count_it" that holds the count for each group.
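
A minimal sketch of attaching a per-group count to every original row, on made-up data. It assumes GroupBy.transform is used, because transform returns a result aligned with the original index, whereas a plain groupby(...).count() is indexed by group and cannot be assigned to a single column directly.

```python
import pandas as pd

# Hypothetical sample data; column names follow the question.
df = pd.DataFrame({
    "col1": ["A", "A", "B", "A"],
    "col2": ["x", "x", "y", "z"],
    "col3": [1.0, 2.0, 3.0, 4.0],
})

# Count the non-missing col3 values in each (col1, col2) group and
# broadcast that count back onto the rows of the group.
df["count_it"] = df.groupby(["col1", "col2"])["col3"].transform("count")
print(df)
```

Each row keeps its original values and gains the size of the group it belongs to, which is different from the one-row-per-group table that agg produces.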

2020-02-08 01:34:26

Another option:

import pandas as pd
import numpy as np

df = pd.DataFrame({'A' : ['foo', 'bar', 'foo', 'bar',
                          'foo', 'bar', 'foo', 'foo'],
                   'B' : ['one', 'one', 'two', 'three',
                          'two', 'two', 'one', 'three'],
                   'C' : np.random.randn(8),
                   'D' : np.random.randn(8)})
df

     A      B         C         D
0  foo    one  0.808197  2.057923
1  bar    one  0.330835 -0.815545
2  foo    two -1.664960 -2.372025
3  bar  three  0.034224  0.825633
4  foo    two  1.131271 -0.984838
5  bar    two  2.961694 -1.122788
6  foo    one -0.054695  0.503555
7  foo  three  0.018052 -0.746912

pd.crosstab(df.A, df.B).stack().reset_index(name='count')

Output:

     A      B  count
0  bar    one      1
1  bar  three      1
2  bar    two      1
3  foo    one      2
4  foo  three      1
5  foo    two      2
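
The same count table can also be sketched with groupby and size instead of crosstab. The frame below reuses the A and B columns from this answer; the variable name counts is just for illustration.

```python
import pandas as pd

# Same A and B columns as in the example above.
df = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar", "foo", "bar", "foo", "foo"],
    "B": ["one", "one", "two", "three", "two", "two", "one", "three"],
})

# size() counts rows per group; reset_index(name=...) turns the
# resulting Series into a DataFrame with a named count column.
counts = df.groupby(["A", "B"]).size().reset_index(name="count")
print(counts)
```

One difference to keep in mind: crosstab builds the full A by B grid, so combinations that never occur appear with a count of 0, while groupby on ordinary (non-categorical) columns lists only the combinations present in the data.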

2023-01-05 18:28:46

To get multiple statistics, flatten the two-level column index while keeping the column names, then reset the row index:

df = df.groupby(['col1','col2']).agg(['mean', 'count'])
df.columns = [ ' '.join(str(i) for i in col) for col in df.columns]
df.reset_index(inplace=True)
df

This produces a flat table with one row per group and columns named like col3 mean and col3 count.
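
The flattening step can be tried on a small made-up frame. The data and the variable name flat are assumptions for illustration; the operations are the ones used in the snippet above.

```python
import pandas as pd

# Hypothetical sample data with a single value column.
df = pd.DataFrame({
    "col1": ["A", "A", "B"],
    "col2": ["x", "x", "y"],
    "col3": [1.0, 3.0, 5.0],
})

flat = df.groupby(["col1", "col2"]).agg(["mean", "count"])

# Each column label is a tuple such as ('col3', 'mean');
# join the parts into a single string label.
flat.columns = [" ".join(str(part) for part in col) for col in flat.columns]

# Move the group keys from the index back into regular columns.
flat = flat.reset_index()
print(flat.columns.tolist())
print(flat)
```

Joining with a space keeps the labels readable; an underscore would work the same way if the column names need to be valid attribute names.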

2019-11-13 01:31:03
