import numpy as np
import pandas as pd
Missing Data
Here are a few convenient methods for dealing with missing data in pandas:
df = pd.DataFrame({'A':[1,2,np.nan],
                   'B':[5,np.nan,np.nan],
                   'C':[1,2,3]})
df
| | A | B | C |
|---|---|---|---|
| 0 | 1.0 | 5.0 | 1 |
| 1 | 2.0 | NaN | 2 |
| 2 | NaN | NaN | 3 |
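Before dropping or filling anything, it can help to see where the gaps are. The following is a minimal sketch, assuming the same `df` as above; the names `mask` and `missing_per_column` are introduced here for illustration only.

```python
import numpy as np
import pandas as pd

# Same example frame as above
df = pd.DataFrame({'A': [1, 2, np.nan],
                   'B': [5, np.nan, np.nan],
                   'C': [1, 2, 3]})

# Boolean mask: True wherever a value is missing
mask = df.isna()

# Summing the mask counts the missing values in each column
missing_per_column = mask.sum()
print(missing_per_column)
```

Here column A has one missing value, B has two, and C has none.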
df.dropna()
| | A | B | C |
|---|---|---|---|
| 0 | 1.0 | 5.0 | 1 |
df.dropna(axis=1)
| | C |
|---|---|
| 0 | 1 |
| 1 | 2 |
| 2 | 3 |
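By default `dropna` works on rows and drops any row with at least one missing value; `axis=1` applies the same rule to columns. Two other parameters of `dropna`, `how` and `subset`, narrow what counts as "missing enough". A rough sketch on the same example frame (the names `rows_all` and `rows_subset` are only illustrative):

```python
import numpy as np
import pandas as pd

# Same example frame as above
df = pd.DataFrame({'A': [1, 2, np.nan],
                   'B': [5, np.nan, np.nan],
                   'C': [1, 2, 3]})

# how='all' drops a row only if every value in it is missing;
# no row here is entirely NaN, so all three rows survive
rows_all = df.dropna(how='all')

# subset restricts the check to the listed columns;
# only row 2 is missing a value in 'A', so rows 0 and 1 remain
rows_subset = df.dropna(subset=['A'])

print(list(rows_all.index))
print(list(rows_subset.index))
```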
df.dropna(thresh=2)
| | A | B | C |
|---|---|---|---|
| 0 | 1.0 | 5.0 | 1 |
| 1 | 2.0 | NaN | 2 |
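The `thresh` argument is the minimum number of non-missing values a row needs in order to be kept. Row 2 has only one real value, so `thresh=2` drops it. A small sketch comparing several thresholds on the same frame (`kept_by_thresh` is just an illustrative name):

```python
import numpy as np
import pandas as pd

# Same example frame as above
df = pd.DataFrame({'A': [1, 2, np.nan],
                   'B': [5, np.nan, np.nan],
                   'C': [1, 2, 3]})

# Row 0 has 3 non-missing values, row 1 has 2, row 2 has 1.
# For each threshold, record which row labels survive.
kept_by_thresh = {t: list(df.dropna(thresh=t).index) for t in (1, 2, 3)}
print(kept_by_thresh)
```

With `thresh=1` every row stays, with `thresh=2` rows 0 and 1 stay, and with `thresh=3` only row 0 stays.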
df.fillna(value='FILL VALUE')
| | A | B | C |
|---|---|---|---|
| 0 | 1 | 5 | 1 |
| 1 | 2 | FILL VALUE | 2 |
| 2 | FILL VALUE | FILL VALUE | 3 |
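Note that `fillna` returns a new object and leaves `df` itself unchanged. It also accepts a dict that maps column names to fill values, which keeps numeric columns numeric. A minimal sketch, with fill values of `0` and `-1` chosen arbitrarily for illustration:

```python
import numpy as np
import pandas as pd

# Same example frame as above
df = pd.DataFrame({'A': [1, 2, np.nan],
                   'B': [5, np.nan, np.nan],
                   'C': [1, 2, 3]})

# Fill each column with its own value (the values are arbitrary examples)
filled = df.fillna(value={'A': 0, 'B': -1})
print(filled)

# The original frame still has its three missing values
print(df.isna().sum().sum())
```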
df['A'].fillna(value=df['A'].mean())
0 1.0
1 2.0
2 1.5
Name: A, dtype: float64
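The same idea extends to the whole DataFrame: passing the Series returned by `df.mean()` to `fillna` fills each column with that column's own mean, since `mean()` skips missing values by default. A short sketch on the same frame (`column_means` and `filled_means` are illustrative names):

```python
import numpy as np
import pandas as pd

# Same example frame as above
df = pd.DataFrame({'A': [1, 2, np.nan],
                   'B': [5, np.nan, np.nan],
                   'C': [1, 2, 3]})

# Per-column means, computed over the non-missing values only
column_means = df.mean()

# fillna matches the Series index against the column names
filled_means = df.fillna(column_means)
print(filled_means)
```

Column A is filled with 1.5 and column B with 5.0; column C has no gaps, so it is unchanged.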