Convert Categorical Features to Integers with scikit-learn
Most machine learning algorithms expect numeric input, so categorical text features usually have to be converted to integers before training. We can readily do this conversion using LabelEncoder from the scikit-learn Python package.
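As a minimal sketch of what this conversion does on a single list of values: the `colours` list below is made up purely for illustration and is not part of the example that follows. LabelEncoder collects the distinct labels, sorts them, and uses each label's position in that sorted list as its integer code.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical sample values, used only for illustration
colours = ["red", "green", "blue", "green"]

encoder = LabelEncoder()
codes = encoder.fit_transform(colours)

# classes_ holds the distinct labels in sorted order; the position of
# a label in this array is the integer assigned to it
print(encoder.classes_.tolist())  # ['blue', 'green', 'red']
print(codes.tolist())             # [2, 1, 0, 1]
```

Note that the integers reflect alphabetical order only, not any meaningful ranking of the categories.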
Take the dataframe as follows:
import pandas as pd
df = pd.DataFrame({"Name": ['Alfred','Steve','Ally','Jane','Tony'],
                   "Gender": ['Male','Male','Female','Female','Male'],
                   "Race": ['Chinese','Malay','Chinese','Chinese','Malay'],
                   "Height": [170,172,153,161,180]})
df
     Name  Gender     Race  Height
0  Alfred    Male  Chinese     170
1   Steve    Male    Malay     172
2    Ally  Female  Chinese     153
3    Jane  Female  Chinese     161
4    Tony    Male    Malay     180
We can apply LabelEncoder to convert the Gender and Race columns to integers as follows:
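from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
categorical_cols = ['Gender','Race']
# fit_transform runs once per column, so le is refit each time and
# afterwards only remembers the classes of the last column (Race)
df[categorical_cols] = df[categorical_cols].apply(lambda col: le.fit_transform(col))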
df
     Name  Gender  Race  Height
0  Alfred       1     0     170
1   Steve       1     1     172
2    Ally       0     0     153
3    Jane       0     0     161
4    Tony       1     1     180
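In the output above, Female became 0 and Male became 1, and likewise Chinese became 0 and Malay became 1, because LabelEncoder numbers the labels in sorted order. If you later need to look up that mapping or turn the integers back into text, one option is to keep a separate fitted encoder for each column. The following is a sketch of that variation rather than the only way to do it; the `encoders` and `mappings` names are chosen here for illustration.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.DataFrame({
    "Name": ["Alfred", "Steve", "Ally", "Jane", "Tony"],
    "Gender": ["Male", "Male", "Female", "Female", "Male"],
    "Race": ["Chinese", "Malay", "Chinese", "Chinese", "Malay"],
    "Height": [170, 172, 153, 161, 180],
})
categorical_cols = ["Gender", "Race"]

# Keep one fitted encoder per column so each mapping stays available
encoders = {}
for col in categorical_cols:
    encoders[col] = LabelEncoder()
    df[col] = encoders[col].fit_transform(df[col])

# Integer -> original label lookup for every encoded column
mappings = {
    col: dict(enumerate(enc.classes_.tolist()))
    for col, enc in encoders.items()
}
print(mappings)
# {'Gender': {0: 'Female', 1: 'Male'}, 'Race': {0: 'Chinese', 1: 'Malay'}}

# Recover the original text from the integer codes
original_gender = encoders["Gender"].inverse_transform(df["Gender"]).tolist()
print(original_gender)
# ['Male', 'Male', 'Female', 'Female', 'Male']
```

The scikit-learn documentation describes LabelEncoder as intended for target labels; for feature columns it also provides OrdinalEncoder, which can encode several columns in one call.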
July 15, 2021