Many useful pandas operations don't fall into any distinct category. This lecture walks through several of them:
import pandas as pd
df = pd.DataFrame({'col1':[1,2,3,4],'col2':[444,555,666,444],'col3':['abc','def','ghi','xyz']})
df.head()
   col1  col2 col3
0     1   444  abc
1     2   555  def
2     3   666  ghi
3     4   444  xyz
Info on Unique Values
df['col2'].unique()
array([444, 555, 666])
df['col2'].nunique()
3
df['col2'].value_counts()
444    2
555    1
666    1
Name: col2, dtype: int64
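The three calls above are closely related. The following is a minimal sketch of how they fit together, rebuilding only `col2` so the block runs on its own. The variable names `counts`, `n_distinct`, and `shares` are illustrative and not part of the lecture.

```python
import pandas as pd

# Same 'col2' values as the DataFrame built at the top of this lecture.
df = pd.DataFrame({'col2': [444, 555, 666, 444]})

# value_counts() has one entry per distinct value, so its length
# matches nunique().
counts = df['col2'].value_counts()
n_distinct = len(counts)

# normalize=True returns each value's share of the rows instead of a raw count.
shares = df['col2'].value_counts(normalize=True)

print(n_distinct)
print(shares)
```

Here `444` appears in two of the four rows, so its share is 0.5.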
Selecting Data
# Select from DataFrame using criteria from multiple columns
newdf = df[(df['col1']>2) & (df['col2']==444)]
newdf
   col1  col2 col3
3     4   444  xyz
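The selection above combines two conditions with `&`. Each condition needs its own parentheses because `&` binds more tightly than comparison operators such as `>` and `==` in Python. As a rough sketch, the same conditions can be stored as named boolean masks and combined with `|` (or) and `~` (not) as well. The mask names below are illustrative and do not appear in the lecture.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, 4],
                   'col2': [444, 555, 666, 444],
                   'col3': ['abc', 'def', 'ghi', 'xyz']})

# Each comparison yields a boolean Series with one True/False per row.
mask_col1 = df['col1'] > 2
mask_col2 = df['col2'] == 444

# & keeps rows where both conditions hold; | keeps rows where either holds.
both = df[mask_col1 & mask_col2]
either = df[mask_col1 | mask_col2]

# ~ negates a mask, keeping the rows that match neither condition.
neither = df[~(mask_col1 | mask_col2)]

print(both)
print(either)
print(neither)
```

Naming the masks first can make longer filters easier to read and reuse.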
Applying Functions
def times2(x):
    return x*2
df['col1'].apply(times2)
0    2
1    4
2    6
3    8
Name: col1, dtype: int64
df['col3'].apply(len)
0    3
1    3
2    3
3    3
Name: col3, dtype: int64
df['col1'].sum()
10
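For a one-off function like `times2`, `apply` also accepts a lambda. The sketch below, which rebuilds `col1` so it runs standalone, shows the lambda form next to plain arithmetic on the column, which gives the same result for this simple case. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, 4]})

# A lambda avoids defining a named function for a one-line operation.
doubled_lambda = df['col1'].apply(lambda x: x * 2)

# Arithmetic operators work element-wise on a Series directly.
doubled_vectorized = df['col1'] * 2

print(doubled_lambda)
print(doubled_vectorized)
```

`apply` remains the more general tool when the function has no direct element-wise operator equivalent, as with `len` above.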
Permanently Removing a Column
del df['col1']
df
   col2 col3
0   444  abc
1   555  def
2   666  ghi
3   444  xyz
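`del` changes `df` in place. If the original should stay intact, one alternative is `DataFrame.drop`, which returns a new DataFrame. The sketch below rebuilds the original three-column frame, since `col1` no longer exists in `df` at this point in the lecture. The name `without_col1` is illustrative.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, 4],
                   'col2': [444, 555, 666, 444],
                   'col3': ['abc', 'def', 'ghi', 'xyz']})

# drop() returns a copy without the column; df keeps all three columns.
without_col1 = df.drop(columns='col1')

print(without_col1.columns.tolist())
print(df.columns.tolist())
```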
Get column and index names:
df.columns
Index(['col2', 'col3'], dtype='object')
df.index
RangeIndex(start=0, stop=4, step=1)
Sorting and Ordering a DataFrame:
df
   col2 col3
0   444  abc
1   555  def
2   666  ghi
3   444  xyz
df.sort_values(by='col2') #inplace=False by default