Why One-Hot Encode Data in Machine Learning?
Updated: 06/10/2021 by Computer Hope
Categorical data is information that is divided into groups. For example, if an organisation or agency collects the biodata of its employees, the resulting data is categorical, because it can be grouped according to variables present in the biodata such as sex or state of residence.
- A “pet” variable with the values “dog” and “cat”.
- A “color” variable with the values “red”, “green” and “blue”.
- A “place” variable with the values “first”, “second” and “third”.

One Hot Encoding
For example, consider data in which bikes, their corresponding categorical values, and their prices are given.
| Bike | Categorical value of Bike | Price |
|---|---|---|
| ktm | 1 | 100 |
| ninza | 3 | 200 |
| suzuki | 4 | 300 |
After one-hot encoding, each bike becomes its own column holding 1 where the row is that bike and 0 otherwise:
| ktm | ninza | suzuki | Price |
|---|---|---|---|
| 1 | 0 | 0 | 100 |
| 0 | 1 | 0 | 200 |
| 0 | 0 | 1 | 300 |
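
The transformation shown in the tables can be sketched in a few lines of plain Python. This is only a minimal illustration, not a definitive implementation: the function name `one_hot_encode` and the `rows` list are assumptions made for this example, and the data is copied from the bike table.

```python
# Rows taken from the bike table: (bike, price).
# The names `rows` and `one_hot_encode` are hypothetical, chosen for this sketch.
rows = [("ktm", 100), ("ninza", 200), ("suzuki", 300)]


def one_hot_encode(rows):
    """Return (header, encoded_rows) with one 0/1 column per distinct bike."""
    # Keep categories in order of first appearance so the columns
    # line up with the order used in the table.
    categories = list(dict.fromkeys(bike for bike, _ in rows))
    header = categories + ["Price"]

    encoded = []
    for bike, price in rows:
        # Exactly one indicator is 1 (the "hot" one); all others are 0.
        indicators = [1 if bike == category else 0 for category in categories]
        encoded.append(indicators + [price])
    return header, encoded


header, encoded = one_hot_encode(rows)
print(header)
for row in encoded:
    print(row)
```

Each encoded row contains a single 1 among the bike columns, so no ordering between the bikes is implied, unlike the integer codes 1, 3 and 4 in the first table. In practice a library routine would typically replace this hand-written loop.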