Preprocessing methods
Sources:
- https://towardsdatascience.com/all-about-categorical-variable-encoding-305f3361fd02
One Hot Encoding
- (-) n labels create n variables (n - 1 if one level is dropped)
- (+) interpretable
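
A minimal sketch of one-hot encoding in plain Python, to make the "one column per label" cost concrete. The function name `one_hot` and the sample labels are illustrative assumptions, not taken from the source.

```python
def one_hot(labels):
    """Return the sorted category list and one indicator row per input label."""
    categories = sorted(set(labels))
    index = {category: i for i, category in enumerate(categories)}
    rows = []
    for label in labels:
        row = [0] * len(categories)  # one column per distinct label
        row[index[label]] = 1        # mark the column that matches this label
        rows.append(row)
    return categories, rows


# Illustrative data: 3 distinct labels produce 3 columns.
categories, rows = one_hot(["red", "green", "blue", "green"])
```

Each row contains exactly one 1, so the number of columns grows linearly with the number of distinct labels.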
Hashing Encoding

- (+) considerably reduce the number of variables
- (-) not interpretable; loses information
Classic Hash
- (-) collisions with unrelated labels
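
A rough sketch of the hashing trick, assuming MD5 from the standard library as the hash function and a fixed number of buckets. The names `hash_bucket` and `hash_encode` and the choice of 8 buckets are hypothetical, picked only for illustration.

```python
import hashlib


def hash_bucket(label, n_buckets):
    """Map a label to a bucket index with a deterministic hash."""
    digest = hashlib.md5(label.encode("utf-8")).hexdigest()
    return int(digest, 16) % n_buckets


def hash_encode(labels, n_buckets=8):
    """Encode each label as an indicator row of fixed width n_buckets."""
    rows = []
    for label in labels:
        row = [0] * n_buckets
        row[hash_bucket(label, n_buckets)] = 1
        rows.append(row)
    return rows


# Illustrative data: 20 distinct labels squeezed into 8 columns.
labels = [f"label_{i}" for i in range(20)]
buckets = [hash_bucket(label, 8) for label in labels]
encoded = hash_encode(labels, n_buckets=8)
```

With 20 labels and 8 buckets, some unrelated labels necessarily share a column, which is the collision drawback noted above.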
Locality-Sensitive Hashing (LSH)
- (+/-) similar elements are likely to receive the same hash, so close labels are treated as similar
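
One way to illustrate the idea is MinHash, a common LSH family for set similarity. The sketch below assumes labels are compared through their character 3-grams and are at least 3 characters long. The helper names, the 64 hash functions, and the sample strings are all illustrative assumptions.

```python
import hashlib


def shingles(text, k=3):
    """Set of overlapping character k-grams (text must have length >= k)."""
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def minhash_signature(text, n_hashes=64, k=3):
    """For each seeded hash function, keep the smallest hash over all shingles."""
    signature = []
    for seed in range(n_hashes):
        signature.append(min(
            int(hashlib.md5(f"{seed}:{s}".encode("utf-8")).hexdigest(), 16)
            for s in shingles(text, k)
        ))
    return signature


def estimated_similarity(sig_a, sig_b):
    """Fraction of matching positions, an estimate of Jaccard similarity."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


# Illustrative labels: two near-duplicates and one unrelated string.
sig_red = minhash_signature("color_red")
sig_reds = minhash_signature("color_reds")
sig_other = minhash_signature("zzzyyyxxx")

sim_close = estimated_similarity(sig_red, sig_reds)
sim_far = estimated_similarity(sig_red, sig_other)
```

Near-duplicate labels agree on most signature positions, while unrelated labels agree on few or none. This helps when similar labels really are related, and hurts when they are not.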
Target Encoding (Level Encoding)
- (+) well suited to high-cardinality categorical variables
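
A minimal sketch of target encoding, assuming a numeric target and plain per-category means with no smoothing. The function name `target_encode` and the sample data are hypothetical.

```python
from collections import defaultdict


def target_encode(categories, targets):
    """Replace each category with the mean target value observed for it."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for category, target in zip(categories, targets):
        sums[category] += target
        counts[category] += 1
    means = {category: sums[category] / counts[category] for category in sums}
    encoded = [means[category] for category in categories]
    return encoded, means


# Illustrative data: the whole column collapses into a single numeric feature.
cities = ["a", "a", "b", "b", "b"]
bought = [1, 0, 1, 1, 0]
encoded, means = target_encode(cities, bought)
```

The output is one column regardless of how many distinct labels exist. In practice the means should be computed on training data only, since they are derived from the target.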