The pandas DataFrame in Python: data alignment and handling missing data
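Contents
1. Basic operations
Adding DataFrames
Filling missing values in a DataFrame: fillna
Dropping missing values from a DataFrame: dropna
2. Advanced operations
Dropping missing values: dropna(how='')
Drop a row when all of its values are missing
Drop a row when any of its values is missing
Drop a column when all of its values are missing
Drop a column when any of its values is missing
Code implementation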
1. Basic operations
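import numpy as np
import pandas as pd

# df1 and df2 share the same index labels, but in a different order
df1 = pd.DataFrame({'one': [1, 2, 3, 4], 'two': [5, 6, 7, 8]}, index=['a', 'b', 'c', 'd'])
df2 = pd.DataFrame({'one': [1, 2, 3, np.nan], 'two': [5, 6, 7, 8]}, index=['d', 'c', 'b', 'a'])

df_add = df1 + df2        # values are matched by index label, not by position
df_fill = df2.fillna(0)   # replace every missing value with 0
df_drop = df2.dropna()    # drop every row that contains at least one missing value

print('df1=', df1, '\n')
print('df2=', df2, '\n')
print('df_add=', df_add, '\n')
print('df_fill=', df_fill, '\n')
print('df_drop=', df_drop, '\n')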
df1=    one  two
a    1    5
b    2    6
c    3    7
d    4    8

df2=    one  two
d  1.0    5
c  2.0    6
b  3.0    7
a  NaN    8

df_add=    one  two
a  NaN   13
b  5.0   13
c  5.0   13
d  5.0   13

df_fill=    one  two
d  1.0    5
c  2.0    6
b  3.0    7
a  0.0    8

df_drop=    one  two
d  1.0    5
c  2.0    6
b  3.0    7
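
In df_add above, row a of column one is NaN because df2 has a missing value at that label. As a minimal sketch of one alternative, the add() method accepts a fill_value argument, so a value missing on only one side is treated as that fill value before the addition. The names df_plain and df_filled are illustrative only and are not part of the original code.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'one': [1, 2, 3, 4], 'two': [5, 6, 7, 8]},
                   index=['a', 'b', 'c', 'd'])
df2 = pd.DataFrame({'one': [1, 2, 3, np.nan], 'two': [5, 6, 7, 8]},
                   index=['d', 'c', 'b', 'a'])

# Plain + aligns on index labels; a NaN in either operand gives NaN.
df_plain = df1 + df2

# add() with fill_value=0 treats a value missing on one side as 0.
# A cell missing on both sides would still come out as NaN.
df_filled = df1.add(df2, fill_value=0)

print(df_plain.loc['a', 'one'])   # nan
print(df_filled.loc['a', 'one'])  # 1.0
```

Whether 0 is a sensible stand-in depends on the data; for a sum it leaves the other operand unchanged, which is why it is used here.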
2. Advanced operations
# Advanced ways of dropping missing values
df3 = pd.DataFrame({'one': [1, 2, 3, np.nan], 'two': [5, 6, np.nan, np.nan]}, index=['d', 'c', 'b', 'a'])
df3_drop = df3.dropna(how='all')    # drop a row only when all of its values are missing
df3_drop2 = df3.dropna(how='any')   # drop a row when any of its values is missing

df4 = pd.DataFrame({'one': [1, 2, 3, 4], 'two': [5, 6, np.nan, np.nan], 'three': [np.nan, np.nan, np.nan, np.nan]}, index=['d', 'c', 'b', 'a'])
df4_drop = df4.dropna(how='any', axis=1)    # drop a column when any of its values is missing
df4_drop2 = df4.dropna(how='all', axis=1)   # drop a column only when all of its values are missing

print('df3=', df3, '\n')
print('df3_drop=', df3_drop, '\n')
print('df3_drop2=', df3_drop2, '\n')
print('df4=', df4, '\n')
print('df4_drop=', df4_drop, '\n')
print('df4_drop2=', df4_drop2, '\n')
df3=    one  two
d  1.0  5.0
c  2.0  6.0
b  3.0  NaN
a  NaN  NaN

df3_drop=    one  two
d  1.0  5.0
c  2.0  6.0
b  3.0  NaN

df3_drop2=    one  two
d  1.0  5.0
c  2.0  6.0

df4=    one  two  three
d    1  5.0    NaN
c    2  6.0    NaN
b    3  NaN    NaN
a    4  NaN    NaN

df4_drop=    one
d    1
c    2
b    3
a    4

df4_drop2=    one  two
d    1  5.0
c    2  6.0
b    3  NaN
a    4  NaN
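
The how='any' and how='all' options are the two extremes. As a rough sketch of something in between, dropna also accepts thresh (the minimum number of non-missing values a row must have to be kept) and subset (the columns to check). The names rows_thresh and rows_subset are illustrative only; df3 is rebuilt here so the snippet runs on its own.

```python
import numpy as np
import pandas as pd

df3 = pd.DataFrame({'one': [1, 2, 3, np.nan], 'two': [5, 6, np.nan, np.nan]},
                   index=['d', 'c', 'b', 'a'])

# thresh=2: keep only rows that have at least 2 non-missing values.
rows_thresh = df3.dropna(thresh=2)

# subset=['one']: only column 'one' is checked for missing values,
# so row b survives even though its 'two' value is NaN.
rows_subset = df3.dropna(subset=['one'])

print(list(rows_thresh.index))   # ['d', 'c']
print(list(rows_subset.index))   # ['d', 'c', 'b']
```

Note that thresh is used here without how; recent pandas versions do not allow passing both at once.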