
Python sklearn RandomForestClassifier Non-reproducible Results

I've been using sklearn's random forest, and I've tried to compare several models. Then I noticed that random-forest is giving different results even with the same seed. I tried it

Solution 1:

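First, make sure that you have the latest versions of the required modules (e.g. scipy, numpy). The seed in the code below is set with NumPy's numpy.random.seed(1234), so it seeds NumPy's global generator, not the generator in Python's built-in random module.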
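Which generator a call to random.seed(1234) reaches depends on what the name random is bound to. Here is a small sketch, assuming only NumPy and the standard library (the variable names are just for illustration): seeding Python's built-in random module leaves NumPy's global generator untouched, whereas numpy.random.seed resets it.

```python
import random          # Python's built-in generator
import numpy as np     # NumPy's global generator lives in np.random

# Seeding the standard-library module does not reset NumPy's stream,
# so the two draws below come from different points in that stream.
random.seed(1234)
a = np.random.rand(3)
random.seed(1234)
b = np.random.rand(3)
stdlib_seed_resets_numpy = np.array_equal(a, b)

# Seeding NumPy's global generator does reset it.
np.random.seed(1234)
c = np.random.rand(3)
np.random.seed(1234)
d = np.random.rand(3)
numpy_seed_resets_numpy = np.array_equal(c, d)

print(stdlib_seed_resets_numpy, numpy_seed_resets_numpy)
```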


When you use random_state parameter inside the RandomForestClassifier, there are several options: int, RandomState instance or None.


From the docs:

  • If int, random_state is the seed used by the random number generator;

  • If RandomState instance, random_state is the random number generator;

  • If None, the random number generator is the RandomState instance used by np.random.
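The three options can be sketched as follows. This is only a minimal illustration on a small synthetic dataset; the helper name fit_predict and the seed value 42 are made up for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=200, n_features=4,
                           n_informative=2, n_redundant=0,
                           random_state=0, shuffle=False)

def fit_predict(random_state):
    """Fit a small forest with the given random_state and return its predictions."""
    clf = RandomForestClassifier(n_estimators=10, max_depth=2,
                                 random_state=random_state)
    return clf.fit(X, y).predict(X)

# int: each fit builds its own generator from the seed.
same_with_int = np.array_equal(fit_predict(42), fit_predict(42))

# RandomState instance: the estimator draws from the object it is given,
# so a fresh instance with the same seed is passed for each fit.
same_with_instance = np.array_equal(fit_predict(np.random.RandomState(42)),
                                    fit_predict(np.random.RandomState(42)))

# None: the estimator draws from NumPy's global generator,
# so that generator has to be reseeded before each fit.
np.random.seed(42)
first = fit_predict(None)
np.random.seed(42)
second = fit_predict(None)
same_with_global = np.array_equal(first, second)

print(same_with_int, same_with_instance, same_with_global)
```

In day-to-day use the integer form is usually the least error-prone, because it does not depend on any state outside the estimator.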


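One way to use the same generator in both cases is shown below. Both classifiers draw from NumPy's global generator, which is reseeded with 1234 before each one is created, so the two fits give the same results. Note that seed() returns None, so passing its return value as random_state is the same as passing random_state=None; the reproducibility comes from the reseeding, not from the argument itself.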

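from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
import numpy as np

X, y = make_classification(n_samples=1000, n_features=4,
                           n_informative=2, n_redundant=0,
                           random_state=0, shuffle=False)

# Seed NumPy's global generator; clf uses it because random_state defaults to None.
np.random.seed(1234)
clf = RandomForestClassifier(max_depth=2)
clf.fit(X, y)

# np.random.seed() returns None, so this is also random_state=None;
# the call matters only because it reseeds the global generator.
clf2 = RandomForestClassifier(max_depth=2, random_state=np.random.seed(1234))
clf2.fit(X, y)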

Check if the results are the same:

all(clf.predict(X) == clf2.predict(X))
#True

Check after running the same code for 5 times:

from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
import numpy as np

for i in range(5):

    X, y = make_classification(n_samples=1000, n_features=4,
                               n_informative=2, n_redundant=0,
                               random_state=0, shuffle=False)

    # Reseed NumPy's global generator before the first fit.
    np.random.seed(1234)
    clf = RandomForestClassifier(max_depth=2)
    clf.fit(X, y)

    # np.random.seed() returns None; the call reseeds the global generator again.
    clf2 = RandomForestClassifier(max_depth=2, random_state=np.random.seed(1234))
    clf2.fit(X, y)

    # Prints True when both forests predict the same label for every sample.
    print(np.all(clf.predict(X) == clf2.predict(X)))

Results:

True
True
True
True
True
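Identical predictions are a fairly weak check, since two different forests can still agree on every label. A stricter comparison is sketched below. Note that it passes an integer seed directly rather than reseeding the global generator (my substitution, not the method used above), and it also compares the predicted probabilities and the feature importances.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=4,
                           n_informative=2, n_redundant=0,
                           random_state=0, shuffle=False)

# Two independent fits that share nothing except the integer seed.
clf_a = RandomForestClassifier(max_depth=2, random_state=1234).fit(X, y)
clf_b = RandomForestClassifier(max_depth=2, random_state=1234).fit(X, y)

same_labels = np.array_equal(clf_a.predict(X), clf_b.predict(X))
same_probabilities = np.array_equal(clf_a.predict_proba(X),
                                    clf_b.predict_proba(X))
same_importances = np.array_equal(clf_a.feature_importances_,
                                  clf_b.feature_importances_)

print(same_labels, same_probabilities, same_importances)
```

If any of these comes back False on the same machine with the same seed, that points at the environment rather than at the seeding code.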

Solution 2:

OK, what eventually solved it was reinstalling the conda environment. I'm still not sure why the results differed. Thanks.
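Since a fresh environment made the problem go away, it may help to record the library versions in each environment before comparing results. A minimal sketch; it only collects and prints version strings and does not diagnose anything by itself.

```python
import numpy
import scipy
import sklearn

# Collect the versions of the libraries involved in fitting the forest.
versions = {
    "numpy": numpy.__version__,
    "scipy": scipy.__version__,
    "scikit-learn": sklearn.__version__,
}

for name, version in versions.items():
    print(f"{name}: {version}")
```

sklearn.show_versions() prints a fuller report, including the Python build and other dependencies.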
