Python Pandas Global Vs Passed Variable
Solution 1:
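Local variables are faster to access than global variables in Python. In CPython, a function's locals live in a fixed-size array in its frame, while a global name has to be looked up in the module's dictionary (and then in builtins if it is not found there).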
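A minimal sketch, using the standard-library `dis` module, that shows the difference in CPython bytecode. The function names and `SCALE` are hypothetical. The exact opcode names vary between Python versions (for example, newer versions fuse two local loads into one `LOAD_FAST_...` instruction), so no expected output is shown.

```python
import dis

SCALE = 2.0  # module-level (global) name


def uses_global(x):
    # SCALE is never assigned in this function, so the compiler emits a global lookup
    return x * SCALE


def uses_local(x, scale):
    # scale is a parameter, i.e. a local variable stored in the function's frame
    return x * scale


def load_ops(func):
    """Return the names of the LOAD_* instructions in func's bytecode."""
    return [ins.opname for ins in dis.get_instructions(func) if ins.opname.startswith("LOAD_")]


print(load_ops(uses_global))  # contains LOAD_GLOBAL for SCALE
print(load_ops(uses_local))   # only LOAD_FAST-family instructions
```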
In the context of pandas, this means you should pass variables into functions where it makes sense, since they can then be found more quickly inside the function. On the other hand, function calls in Python are expensive if you make a lot of them, which is why NumPy/pandas use vectorised functions where possible. Obviously, if you're doing things inside a function, you have to be careful to ensure all your calculations are done in place: the caller only sees changes made to the object it passed in, not a new object bound to the parameter name.
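A minimal sketch of two points from the paragraph above, assuming a hypothetical DataFrame with `price` and `qty` columns (all function names are made up for illustration). `add_total_rowwise` calls a Python function once per row, while the other two functions compute the column with a single vectorised expression. `add_total_inplace` assigns a column, which mutates the caller's DataFrame; `add_total_rebind` builds a new frame, which the caller only sees if it uses the return value.

```python
import pandas as pd


def add_total_inplace(df):
    # Column assignment mutates the very DataFrame object the caller passed in.
    df["total"] = df["price"] * df["qty"]  # vectorised: one operation over whole columns


def add_total_rebind(df):
    # assign() returns a new DataFrame; rebinding the local name leaves the caller's object untouched.
    df = df.assign(total=df["price"] * df["qty"])
    return df


def add_total_rowwise(df):
    # Same numbers, but a Python-level function call for every row.
    return df.apply(lambda row: row["price"] * row["qty"], axis=1)


orders = pd.DataFrame({"price": [2.5, 4.0, 1.0], "qty": [4, 2, 10]})

add_total_rebind(orders)
print("total" in orders.columns)  # False: the new frame was returned and discarded

add_total_inplace(orders)
print(orders["total"].tolist())  # [10.0, 8.0, 10.0]
```

On a large frame, timing `add_total_rowwise` against the vectorised versions is a quick way to see the per-call overhead the answer describes.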
I would usually get things working first, in a "pythonic"/"pandastic" way, before worrying about speed. Then use %timeit and see if it's fast enough already (usually it is). Add a unit test (or several). Tweak for speed, then %timeit, %prun and %timeit some more. If it's a big project, use vbench.
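%timeit and %prun are IPython magics. Outside IPython, the standard-library `timeit` and `cProfile`/`pstats` modules do comparable jobs. A minimal sketch of the workflow above, applied to the global-versus-local question: the two functions are hypothetical, the `assert` stands in for the unit-test step (check correctness before tuning), and binding the global to a default argument is just one way to turn it into a local. Timings depend on the machine and Python version, so none are shown.

```python
import cProfile
import io
import pstats
import timeit

SCALE = 2.0  # module-level (global) name


def sum_scaled_global(values):
    total = 0.0
    for v in values:
        total += v * SCALE  # global lookup on every iteration
    return total


def sum_scaled_local(values, scale=SCALE):
    # The default is evaluated once at definition time; inside, scale is a local.
    total = 0.0
    for v in values:
        total += v * scale  # local lookup on every iteration
    return total


data = list(range(10_000))

# Correctness first: both versions must agree before comparing speed.
assert sum_scaled_global(data) == sum_scaled_local(data)

# timeit.timeit runs the callable `number` times and returns the total time in seconds.
t_global = timeit.timeit(lambda: sum_scaled_global(data), number=200)
t_local = timeit.timeit(lambda: sum_scaled_local(data), number=200)
print(f"global: {t_global:.4f}s  local: {t_local:.4f}s")

# cProfile + pstats give a per-function breakdown, similar to what %prun reports.
profiler = cProfile.Profile()
profiler.enable()
sum_scaled_global(data)
profiler.disable()
out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(5)
print(out.getvalue())
```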
Solution 2:
You will need to profile it, but my guess is that if there is any significant difference at all, it is in favor of globals. The reference is still in memory, and no reference counting happens.
(EDIT: Anyway, see @Andy Hayden's link about their relative access times, and the link here, which says that local variables are much faster.)
The main consideration is one of "software engineering": using global data is a bad idea, since it's hard to follow when and where it is being changed. Of course, if you can't meet the runtime requirements any other way, then it has to be done; but to know that, measure first.
Anyway, I would recommend a different solution: keep this data inside a class. It will cost one more dictionary lookup (the first lookup, of the variable name, happens anyway; the second is the lookup in the class dict), but it may be more efficient than passing around many objects, and it will help the organization of your program.
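A minimal sketch of keeping shared DataFrames on an object instead of in globals or long argument lists. The class name, its method, and the sample prices and rates are all hypothetical. Each `self.prices` or `self.rates` access is roughly the extra lookup the answer mentions: a fast local lookup of `self`, then an attribute lookup on the object.

```python
import pandas as pd


class PriceModel:
    """Hypothetical container that holds shared DataFrames as attributes."""

    def __init__(self, prices, rates):
        self.prices = prices  # stored as instance attributes
        self.rates = rates

    def converted(self, currency):
        # self is a local; .rates and .prices are attribute lookups on it
        rate = self.rates.loc[currency, "rate"]
        return self.prices["amount"] * rate


prices = pd.DataFrame({"amount": [10.0, 20.0]})
rates = pd.DataFrame({"rate": [1.0, 0.5]}, index=["USD", "GBP"])  # made-up sample rates

model = PriceModel(prices, rates)
print(model.converted("GBP").tolist())  # [5.0, 10.0]
```

If an attribute is read many times in a tight Python loop, binding it to a local first (for example `prices = self.prices`) avoids repeating the attribute lookup, which combines this approach with the local-variable advice from Solution 1.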