Why Strings Object Are Cached In Python
Solution 1:
Well, there is a reason why modifying one string isn't going to modify the second one.
Strings in python are immutable.
It's not exactly that strings are cached in Python; the point is that you can't change them. Because of that, the interpreter is free to optimize and bind two names to the same object (the same id).
In python, you're never actually editing a string directly. Look at this:
a = "fun"
a.capitalize()
print(a)
>> fun
The capitalize method creates a capitalized copy of a but won't change a. Another example is str.replace. As you probably already noticed, to change a string using replace, you have to do something like this:
a = "fun"
a = a.replace("u", "a")
print(a)
>> fan
What you see here is that the name a is first bound to the string "fun". On the second line, a is rebound to a new object with a new id, and the old string may be reclaimed by the garbage collector once nothing else references it.
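To make the rebinding visible, here is a minimal sketch in Python 3 syntax. The extra name old is not part of the example above; it is introduced only for illustration, to keep the original object reachable so the two strings can be compared.

```python
a = "fun"
old = a                   # hypothetical second name; keeps the original object reachable
a = a.replace("u", "a")   # replace() returns a new string and a is rebound to it

print(old)                # fun
print(a)                  # fan
print(a is old)           # False: a now refers to a different object

try:
    old[0] = "F"          # in-place modification is not allowed for str
except TypeError as exc:
    print("TypeError:", exc)
```

The original "fun" object is untouched; only the name a moved to a different object.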
What you have to understand is that, since strings are immutable, Python can safely let several names point to the same object, because the string will never be modified. A string can never be changed implicitly behind your back.
Also, you'll see that some other types such as numbers are immutable too and show the same behaviour with ids. But don't be fooled by ids: sharing an id is an implementation detail, not a guarantee.
CPython keeps a cache of small integers (-5 through 256), so a number outside that range may receive a different id even though the value is the same. Similarly, short identifier-like string literals are usually interned, while longer strings or strings built at runtime often get different ids.
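The id behaviour for numbers and strings can be sketched as follows. This assumes CPython and Python 3 (where sys.intern lives in the sys module); the variable names are made up for the example, and the values are built at runtime so that the compiler cannot merge equal literals. Other interpreters may give different results for the identity checks.

```python
import sys

# Build the values at runtime so equal literals cannot be merged by the compiler.
small_a, small_b = int("100"), int("100")
big_a, big_b = int("100000"), int("100000")

print(small_a is small_b)   # CPython: True, 100 is inside the small-int cache
print(big_a is big_b)       # CPython: False, two separate int objects
print(big_a == big_b)       # True: the values are equal regardless of identity

s1 = "".join(["hello", " ", "world"])
s2 = "".join(["hello", " ", "world"])
print(s1 == s2)             # True: same contents
print(s1 is s2)             # CPython: False, each join() builds a new string

# sys.intern() returns one canonical object per distinct string value.
print(sys.intern(s1) is sys.intern(s2))   # True
```

This is why values should be compared with == and never with is or id().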
Note:
The ids may also differ depending on whether the code is evaluated in a REPL or run as a program. Equal constants that are compiled together in one code block can be merged into a single object, while the REPL compiles each input line separately. That means putting the code on different lines in the REPL can be enough to prevent the optimization.
Here's an example in the REPL:
>>> a = '[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]'; b = '[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]'
>>> id(a), id(b)
(4561897488, 4561897488)
>>> a = '[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]'
>>> b = '[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]'
>>> id(a), id(b)
(4561897416, 4561897632)
With numbers:
>>> a = 100000
>>> b = 100000
>>> id(a), id(b)
(140533800516256, 140533800516304)
>>> a = 100000; b = 100000
>>> id(a), id(b)
(140533800516232, 140533800516232)
But running the same statements from a file as a Python script prints identical ids for each pair, because the whole file is compiled as a single code block (as far as I understand):
4406456232 4406456232
4406456232 4406456232
140219722644160 140219722644160
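The difference between one code block and several can be imitated without a REPL by compiling source text explicitly. This is only a sketch under the assumption of CPython; the helper name ids_match and the file name labels are invented for the example.

```python
def ids_match(source):
    """Compile source as a single code block, run it, and report whether a is b."""
    namespace = {}
    exec(compile(source, "<sketch>", "exec"), namespace)
    return namespace["a"] is namespace["b"]

# Both assignments compiled together, like a script file or one REPL line.
one_block = ids_match("a = 100000\nb = 100000")

# Each assignment compiled on its own, like two separate REPL lines.
ns = {}
exec(compile("a = 100000", "<first>", "exec"), ns)
exec(compile("b = 100000", "<second>", "exec"), ns)
separate = ns["a"] is ns["b"]

print(one_block)   # CPython merges equal constants within one code block
print(separate)    # separately compiled constants are usually distinct objects
```

Either way the two values compare equal with ==; only their identity depends on how the code was compiled.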
Solution 2:
The strings aren't cached - they're literally the same string.
See, strings are immutable in Python. Just like the number 1 is the same number 1 no matter where you write it in your code, the string "Hello" is the same string no matter where you write it in your code.
Since it's immutable, you also can't change it in-place like you would a list or somesuch - for example, if you call list.reverse(), it changes the original list, but if you call str.replace("a", "b"), it returns a new string and the old string isn't affected (this is what it means to be immutable). Because you can't ever change that string, there's no point in Python having two different copies of "Hello" when they both mean exactly the same thing and neither can ever change.
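The contrast between a mutable list and an immutable string can be sketched like this (Python 3 syntax; the variable names are chosen only for the example):

```python
numbers = [1, 2, 3]
alias = numbers              # second name for the same list object
result = numbers.reverse()   # reverses the list in place and returns None

print(alias)                 # [3, 2, 1]: the change is visible through every name
print(result)                # None

text = "banana"
changed = text.replace("a", "b")   # builds and returns a new string

print(text)                  # banana: the original string is untouched
print(changed)               # bbnbnb
```

Because no operation can alter text in place, it makes no difference whether two names share one string object or hold two equal copies.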
Edit - @Keeper has pointed out that there's a section of the Python FAQ detailing why strings are immutable and hence why they behave like this.
Solution 3:
Strings in Python are not cached :)
a = 'a'
b = 'a'
id(a) == id(b) == id('a')  # True here because all three refer to the same constant object 'a'
a = 'z'  # rebinds the name a only; b is a separate name, so b is unchanged
id(a) == id('z')  # a now refers to 'z'; b is unrelated to a and still refers to 'a'
You can do something like this to get the shared behaviour you may be after:
class Thing(object):  # Dummy object; it can store any field since it is Python
    pass

a = Thing()
a.str = 'a'
b = a
print(b.str)  # prints 'a' since both names reference the same object
a.str = 'b'
print(b.str)  # prints 'b': still the same object, but its attribute changed