Python sklearn RandomForestClassifier Non-Reproducible Results
Solution 1:
First, make sure that you have the latest versions of the required modules (e.g. scipy, numpy). When you call numpy's random.seed(1234) (that is, numpy.random.seed), you seed numpy's global random number generator.
When you use the random_state parameter of RandomForestClassifier, there are several options: an int, a RandomState instance, or None.
From the docs:
If int, random_state is the seed used by the random number generator;
If RandomState instance, random_state is the random number generator;
If None, the random number generator is the RandomState instance used by np.random.
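As a rough illustration of those three options, here is a minimal sketch. The helper fit_predict, the seed 42, and the smaller dataset are my own choices for the example, not something taken from the docs.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=200, n_features=4,
                           n_informative=2, n_redundant=0,
                           random_state=0, shuffle=False)


def fit_predict(random_state):
    """Fit a small forest with the given random_state and predict on X."""
    clf = RandomForestClassifier(n_estimators=10, max_depth=2,
                                 random_state=random_state)
    return clf.fit(X, y).predict(X)


# Option 1: an int is used as the seed, so two fits with the same int match.
same_int = np.array_equal(fit_predict(42), fit_predict(42))

# Option 2: a RandomState instance is used as the generator itself.
# Two fresh instances built from the same seed produce the same stream.
same_instance = np.array_equal(fit_predict(np.random.RandomState(42)),
                               fit_predict(np.random.RandomState(42)))

# Option 3: None falls back to numpy's global generator, so the result
# depends on the state of that generator at the moment fit() is called.
np.random.seed(42)
first = fit_predict(None)
np.random.seed(42)
second = fit_predict(None)
same_global = np.array_equal(first, second)

print(same_int, same_instance, same_global)
```

If all you need is reproducibility, passing an int is probably the least fragile of the three, since it does not depend on any other code touching the global generator.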
One way to use the same generator in both cases (the global seed and the random_state parameter) is the following. Both classifiers draw from the same numpy generator, seeded identically just before fitting, so the results are reproducible and identical in both cases.
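import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=4,
                           n_informative=2, n_redundant=0,
                           random_state=0, shuffle=False)

# Seed numpy's global generator. random_state defaults to None,
# so the forest draws from that global generator.
np.random.seed(1234)
clf = RandomForestClassifier(max_depth=2)
clf.fit(X, y)

# np.random.seed() returns None, so this also passes random_state=None.
# The call matters only because it reseeds the global generator.
clf2 = RandomForestClassifier(max_depth=2, random_state=np.random.seed(1234))
clf2.fit(X, y)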
Check if the results are the same:
all(clf.predict(X) == clf2.predict(X))
#True
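It may help to spell out why the second classifier matches the first. As far as I can tell, the seed call passed as random_state does not hand a generator to the forest: it reseeds the global generator as a side effect and returns None, so the forest again falls back to the global generator. The sketch below checks both halves of that claim. The smaller dataset and the variable names are just for this example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# seed() reseeds the global generator in place and returns None.
returned = np.random.seed(1234)
print(returned)

X, y = make_classification(n_samples=200, n_features=4,
                           n_informative=2, n_redundant=0,
                           random_state=0, shuffle=False)

# First draw from a freshly seeded global generator.
np.random.seed(1234)
fresh_draw = np.random.rand()

# Same seed, but fit a forest with random_state=None before drawing.
np.random.seed(1234)
RandomForestClassifier(n_estimators=10, max_depth=2).fit(X, y)
draw_after_fit = np.random.rand()

# The fit consumed numbers from the global generator, so the two draws
# are expected to differ.
print(fresh_draw == draw_after_fit)
```

The practical consequence is that with random_state=None you have to reseed before every fit, and any other code that uses numpy's global generator in between will change the outcome.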
Check again after running the same code 5 times:
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

for i in range(5):
    X, y = make_classification(n_samples=1000, n_features=4,
                               n_informative=2, n_redundant=0,
                               random_state=0, shuffle=False)

    # Reseed the global generator before each fit.
    np.random.seed(1234)
    clf = RandomForestClassifier(max_depth=2)
    clf.fit(X, y)

    # np.random.seed() returns None, so random_state is None here as well.
    clf2 = RandomForestClassifier(max_depth=2, random_state=np.random.seed(1234))
    clf2.fit(X, y)

    print(all(clf.predict(X) == clf2.predict(X)))
Results:
True
True
True
True
True
Solution 2:
OK, what eventually solved it was reinstalling the conda environment. I'm still not sure why the results differed. Thanks.
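It is not clear what was wrong with the old environment, but if results differ between two machines or two environments, comparing the installed versions is a cheap first check. A minimal sketch (the versions dictionary is only a convenience for this example):

```python
import numpy
import scipy
import sklearn

# Collect the versions of the packages involved in fitting the forest.
versions = {
    "numpy": numpy.__version__,
    "scipy": scipy.__version__,
    "scikit-learn": sklearn.__version__,
}

for name, version in versions.items():
    print(f"{name}: {version}")
```

Running this in both environments and comparing the output should show whether a version mismatch is a plausible cause.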