Using merge and groupby in pandas

2022-07-17 09:48:48

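I have been doing machine learning competitions lately. While studying the source code of top competitors I noticed these two functions come up constantly, and it took me a while to understand them. Here is a quick record of what I have used in competitions so far; I will write a more thorough summary after the competition ends.
1. groupby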

In [34]: import numpy as np
    ...: import pandas as pd

In [35]: df = pd.DataFrame({'key1':['a', 'a', 'b', 'b', 'a'],
    ...:                    'key2':['one', 'two', 'one', 'two', 'one'],
    ...:                    'data1':np.random.randn(5),
    ...:                    'data2':np.random.randn(5)})

In [36]: df
Out[36]:
  key1 key2     data1     data2
0    a  one -1.400763  0.494059
1    a  two  1.303229 -2.396705
2    b  one -0.482499 -1.590093
3    b  two -0.902582 -0.909068
4    a  one -0.628412  1.724196

In [100]: df.groupby(['key1','key2'])['key1'].agg(TotalNumber='count').reset_index()
Out[100]:
  key1 key2  TotalNumber
0    a  one            2
1    a  two            1
2    b  one            1
3    b  two            1

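Here key1 and key2 together are used as the grouping keys, and the rows in each group are counted by counting key1 (I used something similar in a competition). Passing a dict such as {'TotalNumber':'count'} to agg on a single selected column relies on dict renaming, which was deprecated in pandas 0.20 and raises SpecificationError from pandas 1.0 on; named aggregation, agg(TotalNumber='count'), is the current equivalent.
The agg function is also used often, usually together with groupby, to apply one or more aggregation functions to each group.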
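Counting a column per group is not always the same as counting rows. A minimal sketch on a small hypothetical DataFrame (deterministic values, with one NaN added on purpose) showing that `size()` counts every row in a group while `count()` skips missing values of the selected column:

```python
import numpy as np
import pandas as pd

# Hypothetical data: data1 is missing in one row of group ('a', 'one')
df = pd.DataFrame({
    "key1": ["a", "a", "b", "b", "a"],
    "key2": ["one", "two", "one", "two", "one"],
    "data1": [1.0, 2.0, 3.0, 4.0, np.nan],
})

grouped = df.groupby(["key1", "key2"])

# size(): number of rows in each group, NaN rows included
sizes = grouped.size().reset_index(name="TotalNumber")

# count(): number of non-NaN values of data1 in each group
counts = grouped["data1"].count().reset_index(name="NonNullData1")

print(sizes)
print(counts)
```

Here group ('a', 'one') has 2 rows but only 1 non-missing data1 value, so the two results differ only in that group.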
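A minimal sketch of three common ways to call agg after groupby, assuming pandas 0.25 or later (needed for named aggregation). The DataFrame is hypothetical with fixed values so the results are reproducible:

```python
import pandas as pd

df = pd.DataFrame({
    "key1": ["a", "a", "b", "b", "a"],
    "key2": ["one", "two", "one", "two", "one"],
    "data1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "data2": [10.0, 20.0, 30.0, 40.0, 50.0],
})

# 1) A list of functions on one column: one output column per function
stats = df.groupby("key1")["data1"].agg(["mean", "max", "count"])

# 2) A dict mapping each column to its own function
per_col = df.groupby("key1").agg({"data1": "sum", "data2": "max"})

# 3) Named aggregation: output_name=(input_column, function)
named = df.groupby(["key1", "key2"], as_index=False).agg(
    data1_sum=("data1", "sum"),
    data2_mean=("data2", "mean"),
)

print(stats)
print(per_col)
print(named)
```

Named aggregation lets you choose output column names directly, which is handy when the aggregated columns are later merged back as features.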

2. merge

In [89]: left = pd.DataFrame({'key1':['foo','foo','bar'],'key2':['one','one','two'],'lval':[1,2,3]})

In [90]: right = pd.DataFrame({'key1':['foo','foo','bar','bar'],'key2':['one','one','one','two'],'rval':[4,5,6,7]})

In [91]: left
Out[91]:
  key1 key2  lval
0  foo  one     1
1  foo  one     2
2  bar  two     3

In [92]: right
Out[92]:
  key1 key2  rval
0  foo  one     4
1  foo  one     5
2  bar  one     6
3  bar  two     7

In [93]: left.merge(right,on=['key1','key2'],how='left')
Out[93]:
  key1 key2  lval  rval
0  foo  one     1     4
1  foo  one     1     5
2  foo  one     2     4
3  foo  one     2     5
4  bar  two     3     7

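Here key1 and key2 together are the join keys, and the join type is left: every row of the left DataFrame is kept, and only rows of the right DataFrame whose keys match are attached (a left row with no match would get NaN in rval). Because ('foo', 'one') occurs twice on each side, those rows are paired 2 × 2 = 4 times, while the ('bar', 'one') row of right has no match and is dropped.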
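To compare the join types on the same data, here is a minimal sketch reusing the left and right frames above. It counts result rows for each `how` value, uses `indicator=True` to show where each row came from, and uses `validate` to detect the duplicated keys that caused the 2 × 2 row multiplication:

```python
import pandas as pd

left = pd.DataFrame({"key1": ["foo", "foo", "bar"],
                     "key2": ["one", "one", "two"],
                     "lval": [1, 2, 3]})
right = pd.DataFrame({"key1": ["foo", "foo", "bar", "bar"],
                      "key2": ["one", "one", "one", "two"],
                      "rval": [4, 5, 6, 7]})

keys = ["key1", "key2"]

# Result size for each join type; the ('foo', 'one') rows pair up 2 x 2 = 4
# times in every join type
sizes = {how: len(left.merge(right, on=keys, how=how))
         for how in ["inner", "left", "right", "outer"]}

# indicator=True adds a '_merge' column: 'both', 'left_only' or 'right_only'
outer = left.merge(right, on=keys, how="outer", indicator=True)

# validate raises MergeError when the key relationship is not as declared
try:
    left.merge(right, on=keys, how="left", validate="one_to_one")
    validated = True
except pd.errors.MergeError:
    validated = False

print(sizes)
print(outer)
print("one_to_one holds:", validated)
```

inner and left both give 5 rows here because every left key has a match; right and outer give 6 because the unmatched ('bar', 'one') row of right is kept with NaN in lval.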

  • Author: chandelierds
  • Original post: https://blog.csdn.net/chandelierds/article/details/84496372