Preprocessing methods
Sources:
- https://towardsdatascience.com/all-about-categorical-variable-encoding-305f3361fd02
One Hot Encoding
- (-) n labels will create n variables, so high-cardinality features produce many columns
- (+) interpretable
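A minimal sketch of one-hot encoding in plain Python, to show that each distinct label becomes its own 0/1 column. The function name `one_hot` and the sample labels are illustrative assumptions, not taken from the source.

```python
def one_hot(values):
    """Map each value to a 0/1 indicator vector, one column per distinct label."""
    labels = sorted(set(values))                      # fixed, reproducible column order
    index = {label: i for i, label in enumerate(labels)}
    rows = []
    for v in values:
        row = [0] * len(labels)                       # one slot per distinct label
        row[index[v]] = 1                             # mark the slot of this value
        rows.append(row)
    return labels, rows


colors = ["red", "green", "blue", "green"]            # hypothetical sample data
labels, encoded = one_hot(colors)
print(labels)    # column names
print(encoded)   # one indicator row per input value
```

Each row contains exactly one 1, which is what keeps the encoding easy to read back.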
Hashing Encoding
- (+) considerably reduces the number of variables
- (-) not interpretable, loses information
Classic Hash
- (-) collisions with unrelated labels
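A rough sketch of classic hashing encoding, assuming a fixed number of buckets and MD5 as the hash function (both are illustrative choices; `bucket_of` and `hash_encode` are hypothetical helper names). It shows why the number of variables stays small and why unrelated labels can collide.

```python
import hashlib


def bucket_of(label, n_buckets=8):
    """Deterministically map a label to a bucket index in [0, n_buckets)."""
    digest = hashlib.md5(label.encode("utf-8")).hexdigest()
    return int(digest, 16) % n_buckets


def hash_encode(label, n_buckets=8):
    """Return a 0/1 vector of length n_buckets with a 1 in the label's bucket."""
    vec = [0] * n_buckets
    vec[bucket_of(label, n_buckets)] = 1
    return vec


# 20 hypothetical labels squeezed into 8 buckets: collisions are unavoidable.
labels = [f"city_{i}" for i in range(20)]
used = {bucket_of(label) for label in labels}
print(len(labels), "labels ->", len(used), "buckets in use")
```

The output width depends only on `n_buckets`, not on the number of labels, which is the trade-off between compactness and lost information.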
Locality-Sensitive Hashing (LSH)
- (+/-) close elements hash to the same bucket and are treated as similar
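One way to illustrate the idea is MinHash over character bigrams, a common LSH family for set similarity. This is only a sketch under assumptions: the source does not say which LSH scheme is meant, and the helper names, the choice of bigrams, and the use of seeded MD5 hashes are all illustrative.

```python
import hashlib


def bigrams(text):
    """Set of lowercase character bigrams of a label."""
    text = text.lower()
    return {text[i:i + 2] for i in range(len(text) - 1)}


def seeded_hash(seed, token):
    """Deterministic hash of a token; each seed acts as a separate hash function."""
    data = f"{seed}:{token}".encode("utf-8")
    return int(hashlib.md5(data).hexdigest(), 16)


def minhash_signature(text, n_hashes=128):
    """For each hash function, keep the smallest hash value over the label's bigrams."""
    grams = bigrams(text) or {text}   # fall back for labels shorter than two characters
    return [min(seeded_hash(seed, g) for g in grams) for seed in range(n_hashes)]


def estimated_similarity(a, b, n_hashes=128):
    """Fraction of matching signature slots, an estimate of the Jaccard similarity."""
    sig_a = minhash_signature(a, n_hashes)
    sig_b = minhash_signature(b, n_hashes)
    return sum(x == y for x, y in zip(sig_a, sig_b)) / n_hashes


print(estimated_similarity("New York", "New York City"))
print(estimated_similarity("New York", "Tokyo"))
```

Labels that share many bigrams agree on many signature slots, so they are likely to collide; whether that is a benefit or a drawback depends on whether textual closeness reflects real similarity in the data.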
Target Encoding (Level Encoding)
- (+) works well for categorical variables with high cardinality
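A minimal sketch of target encoding, assuming the plain per-category mean of a numeric target with no smoothing. The function name `target_encode` and the sample data are hypothetical.

```python
from collections import defaultdict


def target_encode(categories, targets):
    """Replace each category with the mean target value observed for that category."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for c, t in zip(categories, targets):
        sums[c] += t
        counts[c] += 1
    means = {c: sums[c] / counts[c] for c in sums}
    return [means[c] for c in categories], means


# Hypothetical data: a categorical feature and a binary target.
cities = ["A", "B", "A", "C", "B", "A"]
bought = [1, 0, 0, 1, 1, 1]
encoded, means = target_encode(cities, bought)
print(means)
print(encoded)
```

The feature stays a single numeric column no matter how many categories exist. The means should be computed on training data only, since computing them on the rows being evaluated leaks the target into the feature.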