import pandas as pd
import numpy as np
df = pd.DataFrame({'key1': ['a', 'a', 'b', 'b', 'a'],
                   'key2': ['one', 'two', 'one', 'two', 'one'],
                   'data1': np.random.randint(1, 10, size=5),  # 5 random integers in [1, 10)
                   'data2': np.random.randint(1, 8, size=5)})  # 5 random integers in [1, 8)
df
This gives the following df (the values are random, so they differ from run to run):
  key1 key2  data1  data2
0    a  one      4      7
1    a  two      1      1
2    b  one      8      6
3    b  two      2      2
4    a  one      5      2
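Because np.random.randint draws new values on every run, the outputs in the sections below can only be reproduced with the same data. A minimal sketch, assuming you simply want to hard-code the sample values printed above instead of generating them:

```python
import pandas as pd

# Hard-coded copy of the sample values shown above (an assumption made for
# reproducibility; the original post generates them randomly).
df = pd.DataFrame({'key1': ['a', 'a', 'b', 'b', 'a'],
                   'key2': ['one', 'two', 'one', 'two', 'one'],
                   'data1': [4, 1, 8, 2, 5],
                   'data2': [7, 1, 6, 2, 2]})
print(df)
```

With this df, every groupby result that follows is deterministic.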
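1. Aggregating a single column
Aggregate first, then rename the result. This is the recommended approach.
# Aggregate a single column
df_temp = df.groupby(['key1', 'key2']).agg({'data1': 'min'})
# The per-group minimum of data1 is still named data1, so it needs to be renamed
df_temp.rename(columns={'data1': 'data1_min'}, inplace=True)  # rename the column
print(df_temp)
Output: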
           data1_min
key1 key2
a    one           4
     two           1
b    one           8
     two           2
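As an alternative to renaming afterwards, the aggregation and the new name can be given in one step with named aggregation. This is only a sketch, assuming pandas 0.25 or later and the hard-coded sample values from the df above; the name df_named is made up for this illustration:

```python
import pandas as pd

# Sample values copied from the df printed earlier
df = pd.DataFrame({'key1': ['a', 'a', 'b', 'b', 'a'],
                   'key2': ['one', 'two', 'one', 'two', 'one'],
                   'data1': [4, 1, 8, 2, 5],
                   'data2': [7, 1, 6, 2, 2]})

# Named aggregation: output_name=(input_column, function)
df_named = df.groupby(['key1', 'key2']).agg(data1_min=('data1', 'min'))
print(df_named)
```

The result has the same shape as df_temp above, with the column already called data1_min.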
2. Aggregating multiple columns
# Aggregate several columns, one function per column
df_temp = df.groupby(['key1', 'key2']).agg({'data1': 'min', 'data2': 'max'})
df_temp.rename(columns={'data1': 'data1_min', 'data2': 'data2_max'}, inplace=True)  # rename the columns
print(df_temp)
Output:
           data1_min  data2_max
key1 key2
a    one           4          7
     two           1          1
b    one           8          6
     two           2          2
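When many columns are involved, typing each new name by hand is error-prone. One possible approach is to build the rename mapping from the same dict that is passed to agg. The names agg_spec and rename_map below are hypothetical helpers, and the data is the hard-coded sample from above:

```python
import pandas as pd

# Sample values copied from the df printed earlier
df = pd.DataFrame({'key1': ['a', 'a', 'b', 'b', 'a'],
                   'key2': ['one', 'two', 'one', 'two', 'one'],
                   'data1': [4, 1, 8, 2, 5],
                   'data2': [7, 1, 6, 2, 2]})

# One aggregation function per column
agg_spec = {'data1': 'min', 'data2': 'max'}

# Derive the new names from the spec: 'data1' -> 'data1_min', 'data2' -> 'data2_max'
rename_map = {col: f"{col}_{func}" for col, func in agg_spec.items()}

# rename() without inplace returns a new DataFrame, so the calls can be chained
df_temp = df.groupby(['key1', 'key2']).agg(agg_spec).rename(columns=rename_map)
print(df_temp)
```

This only works when each column has a single function given as a string; the multi-function case in the next section produces tuple column labels instead.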
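3. Applying multiple aggregations to multiple columns
# min and max of data1, max and count of data2; the result has two-level (MultiIndex) columns
df_temp2 = df.groupby(['key1', 'key2']).agg({'data1': ['min', 'max'], 'data2': ['max', 'count']})
print(df_temp2)
# Note how the columns are renamed: each label is a (column, function) tuple, joined with "_"
df_temp2.columns = [i[0] + "_" + i[1] for i in df_temp2.columns]
print(df_temp2)
Output (first before flattening the column names, then after):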
          data1     data2
            min max   max count
key1 key2
a    one      4   5     7     2
     two      1   1     1     1
b    one      8   8     6     1
     two      2   2     2     1
           data1_min  data1_max  data2_max  data2_count
key1 key2
a    one           4          5          7            2
     two           1          1          1            1
b    one           8          8          6            1
     two           2          2          2            1
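To see why the list comprehension works, it helps to look at the column labels before and after flattening. A small sketch using the hard-coded sample values from above, with '_'.join(col) as an equivalent of i[0] + "_" + i[1]:

```python
import pandas as pd

# Sample values copied from the df printed earlier
df = pd.DataFrame({'key1': ['a', 'a', 'b', 'b', 'a'],
                   'key2': ['one', 'two', 'one', 'two', 'one'],
                   'data1': [4, 1, 8, 2, 5],
                   'data2': [7, 1, 6, 2, 2]})

df_temp2 = df.groupby(['key1', 'key2']).agg({'data1': ['min', 'max'],
                                             'data2': ['max', 'count']})

# With several functions per column, the columns form a MultiIndex,
# so each label is a (column, function) tuple.
labels_before = df_temp2.columns.tolist()
print(labels_before)

# Join the two levels of each tuple into a single flat name
df_temp2.columns = ['_'.join(col) for col in df_temp2.columns]
labels_after = df_temp2.columns.tolist()
print(labels_after)
```

The join form also handles column indexes with more than two levels, as long as every level is a string.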