阿新 • Published: 2021-01-12
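- Ordinal type: this kind of categorical data has a natural order. If you sort ordinal data, the result can be ascending or descending. For example, in a school-grade feature the possible values might be:
- Nominal type: this is the regular categorical type. Nominal data cannot be meaningfully sorted. For example, the possible values of a blood-type feature are: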
- Use pandas’ get_dummies() function to return a new DataFrame containing a new column for each dummy variable
- Use the concat() function to add these dummy columns back to the original DataFrame
- Then drop the original columns entirely using the drop() method
- If you are dealing with an ordinal feature, you map its values to 1, 2, 3, 4 or 3, 2, 1 or whatever order fits, if they are not already mapped. An ordinal feature is one whose values can be arranged in some order that makes logical sense. For example, say you have a feature “Size” with alphanumeric values “small, medium, big”. “Big” is indeed bigger than “small”, so comparing those values makes sense, and you can map “small, medium, big” to 1, 2, 3. Example in Titanic: Pclass is an ordinal feature, because Pclass=1 is better than Pclass=3. Note that in this case Pclass is already mapped to 1, 2, 3, so you don’t have to do anything with it. You would have to map it if Pclass contained alphanumeric values like “high_class, medium_class, low_class”.
- If you are dealing with a categorical (nominal) feature, you look at how many categories (possible values of that particular feature) you have. If you have only 2 categories, you map them to 0 and 1 or to -1 and 1, and that’s it. If you have more than 2 categories, you create dummy variables. Example in Titanic: Sex is a categorical variable with 2 categories - ‘male’ and ‘female’. You map them, for example, to 0 and 1, and that’s it. Note that it’s not ordinal because male is neither better nor worse than female; you can’t logically compare them. Now, Embarked is a categorical feature too, but it has 3 categories instead of just 2. You make dummy variables out of this feature, and you make just 2, not 3, because the 3rd one is redundant. Well, this feature is redundant by itself, but anyway.
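The encoding rules above can be sketched in pandas roughly as follows. This is a minimal illustration, not the original author's code: the toy DataFrame and its values are made up, and `size_order` is a hypothetical name. Only `Series.map`, `pd.get_dummies`, `pd.concat` and `DataFrame.drop` are used.

```python
import pandas as pd

# Hypothetical toy data in the spirit of the Titanic example (values are made up).
df = pd.DataFrame({
    "Size": ["small", "big", "medium", "small"],
    "Sex": ["male", "female", "female", "male"],
    "Embarked": ["S", "C", "Q", "S"],
})

# Ordinal feature: map values to integers that preserve the logical order.
size_order = {"small": 1, "medium": 2, "big": 3}
df["Size"] = df["Size"].map(size_order)

# Nominal feature with only 2 categories: map them to 0 and 1.
df["Sex"] = df["Sex"].map({"male": 0, "female": 1})

# Nominal feature with more than 2 categories: create dummy variables.
# drop_first=True keeps k-1 columns, because the remaining one is redundant.
dummies = pd.get_dummies(df["Embarked"], prefix="Embarked", drop_first=True)

# Add the dummy columns back to the DataFrame, then drop the original column.
df = pd.concat([df, dummies], axis=1).drop(columns=["Embarked"])

print(df)
```

With `drop_first=True`, pandas drops the first category in sorted order (here `C`), so a row with both remaining dummy columns equal to zero stands for that category.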
Edit: following further discussion, there are cases when turning ordinal features into dummies may improve your score a bit. It’s hard to tell beforehand, so it can be useful to make 2 sets of features, one with the ordinal data as-is and the other with the ordinal features one-hot encoded, compare the results on various models, and pick the one that works best in your specific case.
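As a rough sketch of how the two candidate feature sets could be built, assuming a toy frame where Pclass is already coded 1, 2, 3 (the rows and the names `features_ordinal` and `features_onehot` are hypothetical):

```python
import pandas as pd

# Hypothetical toy frame; Pclass is already coded 1, 2, 3 as in Titanic.
df = pd.DataFrame({"Pclass": [1, 3, 2, 3], "Fare": [80.0, 7.5, 13.0, 8.0]})

# Feature set A: keep the ordinal column as integers.
features_ordinal = df.copy()

# Feature set B: replace the ordinal column with one-hot columns.
pclass_dummies = pd.get_dummies(df["Pclass"], prefix="Pclass")
features_onehot = pd.concat([df.drop(columns=["Pclass"]), pclass_dummies], axis=1)

print(features_ordinal.columns.tolist())
print(features_onehot.columns.tolist())
```

Each set would then be fed to the same models under the same validation scheme, so that any difference in score comes from the encoding alone.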