Python Sklearn - Determine The Encoding Order Of LabelEncoder
Solution 1:
You cannot do that with the original LabelEncoder. LabelEncoder.fit() uses numpy.unique, which always returns the unique values in sorted order, as shown in the source:
def fit(...):
    y = column_or_1d(y, warn=True)
    self.classes_ = np.unique(y)
    return self
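As a quick illustration of that sorting behaviour, here is a minimal sketch using a made-up list of labels (the list itself is an assumption, not from the original answer):

```python
import numpy as np

# Hypothetical labels, deliberately not in alphabetical order.
labels = ['b', 'a', 'c', 'd', 'a']

# np.unique removes duplicates and returns the result sorted,
# so the order of first appearance is lost.
classes = np.unique(labels)
print(classes)  # ['a' 'b' 'c' 'd']
```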
So if you want to do that, you need to override the fit() function. Something like this:
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.utils import column_or_1d

class MyLabelEncoder(LabelEncoder):
    def fit(self, y):
        y = column_or_1d(y, warn=True)
        # pandas unique() keeps the order of first appearance
        self.classes_ = pd.Series(y).unique()
        return self
Then you can do this:
le = MyLabelEncoder()
le.fit(['b', 'a', 'c', 'd'])
le.classes_
# Output: array(['b', 'a', 'c', 'd'], dtype=object)
Here, I am using pandas.Series.unique() to get the unique classes in their order of appearance. If you cannot use pandas for any reason, refer to this question, which does the same thing using numpy:
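One common numpy-only way to get order-preserving unique values is sketched below. The helper name unique_in_order is hypothetical, and this is not necessarily the approach from the question referred to above; it relies on the return_index option of numpy.unique.

```python
import numpy as np

def unique_in_order(values):
    """Return the unique values of a 1-D sequence in first-seen order."""
    arr = np.asarray(values)
    # np.unique returns sorted uniques; return_index also gives the
    # position of each unique value's first occurrence in the input.
    _, first_idx = np.unique(arr, return_index=True)
    # Sorting those positions restores the original order of appearance.
    return arr[np.sort(first_idx)]

classes = unique_in_order(['b', 'a', 'c', 'a', 'd', 'b'])
print(classes)  # ['b' 'a' 'c' 'd']
```

Inside a custom fit(), the result of such a helper could be assigned to self.classes_ in place of pd.Series(y).unique().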
Solution 2:
Note: there is potentially a better way to do this now with http://contrib.scikit-learn.org/categorical-encoding/ordinal.html. In particular, see the mapping parameter:
A mapping of class to label to use for the encoding, optional. The dict contains the keys 'col' and 'mapping'. The value of 'col' should be the feature name. The value of 'mapping' should be a dictionary of 'original_label' to 'encoded_label'. Example mapping: [{'col': 'col1', 'mapping': {None: 0, 'a': 1, 'b': 2}}]
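To show what such an explicit label-to-code mapping amounts to, here is a rough sketch in plain pandas. It deliberately uses Series.map instead of the encoder from that library, so it illustrates the idea rather than the library's API; the DataFrame contents are made up, and only the column name col1 and the codes for 'a' and 'b' come from the example mapping.

```python
import pandas as pd

# Explicit label -> code mapping, mirroring the 'mapping' dict quoted above.
mapping = {'a': 1, 'b': 2}

# Hypothetical data for the 'col1' feature.
df = pd.DataFrame({'col1': ['b', 'a', 'b', 'a']})

# Series.map looks each label up in the dict, so the codes follow
# the mapping you wrote rather than any sorted order.
df['col1_encoded'] = df['col1'].map(mapping)
print(df['col1_encoded'].tolist())  # [2, 1, 2, 1]
```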
Solution 3:
Vivek Kumar's solution worked for me, but I had to do it this way:
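import numpy as np

class LabelEncoder(LabelEncoder):
    def fit(self, y):
        y = column_or_1d(y, warn=True)
        # ndarray.sort() sorts in place and returns None,
        # so use np.sort() to get the sorted array back
        self.classes_ = np.sort(pd.Series(y).unique())
        return self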
Solution 4:
NOTE: This is not a standard way but a hacky approach. I used the 'classes_' attribute to customize my mapping:
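import numpy as np
from sklearn import preprocessing

# df_1 is an existing DataFrame with a 'Temp' column
le_temp = preprocessing.LabelEncoder()
le_temp = le_temp.fit(df_1['Temp'])
print(df_1['Temp'])
# Overwrite the learned classes; the position in this array becomes the code
le_temp.classes_ = np.array(['Cool', 'Mild', 'Hot'])
print("New classes sequence::", le_temp.classes_)
df_1['Temp'] = le_temp.transform(df_1['Temp'])
print(df_1['Temp'])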
My output looks like this:
1     Hot
2     Hot
3     Hot
4    Mild
5    Cool
6    Cool
Name: Temp, dtype: object
New classes sequence:: ['Cool' 'Mild' 'Hot']
1    2
2    2
3    2
4    1
5    0
6    0
Name: Temp, dtype: int32
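For anyone who wants to try this without an existing DataFrame, here is a self-contained sketch of the same trick. The temp series is made-up data standing in for df_1['Temp'], and the round trip through inverse_transform is an extra check that is not part of the original answer. It sticks to string labels; whether overriding classes_ behaves the same way for numeric labels may depend on the scikit-learn version, so treat that case as untested here.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical data standing in for df_1['Temp'].
temp = pd.Series(['Hot', 'Hot', 'Hot', 'Mild', 'Cool', 'Cool'], dtype=object)

le_temp = LabelEncoder().fit(temp)

# Replace the sorted classes with the desired order:
# each label's index in this array becomes its encoded value.
le_temp.classes_ = np.array(['Cool', 'Mild', 'Hot'])

encoded = le_temp.transform(temp)
decoded = le_temp.inverse_transform(encoded)

print(encoded.tolist())
print(decoded.tolist())
```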