Python: How To Move A File With Unicode Filename To A Unicode Folder
Solution 1:
The basic problem is an unconverted mix of Unicode and byte strings. The fix is either to convert everything to a single string type or to sidestep the problem with some trickery. All of my solutions use the glob and shutil standard library modules.
For the sake of example, I have some Unicode filenames ending with ods, and I want to move them to the subdirectory called א (Hebrew Aleph, a Unicode character).
First solution - express directory name as byte string:
>>> import glob
>>> import shutil
>>> files = glob.glob('*.ods')  # List of byte-string file names
>>> for file in files:
...     shutil.copy2(file, 'א')  # Byte-string directory name
...
Second solution - convert the file names to Unicode:
>>> import glob
>>> import shutil
>>> files = glob.glob(u'*.ods')  # List of Unicode file names
>>> for file in files:
...     shutil.copy2(file, u'א')  # Unicode directory name
...
Credit to Ezio Melotti, Python bug list.
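When the file names come from somewhere other than glob and may already be byte strings, the same "one string type everywhere" idea can be applied by decoding them first. The following is only a minimal sketch: the helper name to_unicode is hypothetical, and it assumes the byte strings were encoded with the filesystem encoding unless another encoding is passed in.

```python
import sys


def to_unicode(name, encoding=None):
    """Return name as Unicode text.

    Byte strings are decoded with the given encoding, or with the
    filesystem encoding when none is given (an assumption; the bytes
    may have been produced with a different encoding).
    """
    if isinstance(name, bytes):
        return name.decode(encoding or sys.getfilesystemencoding())
    return name  # Already Unicode text; return unchanged
```

Every name passed through such a helper can then be combined with a Unicode directory name without an implicit conversion.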
Third solution - avoiding destination Unicode directory name
Although this isn't the best solution in my opinion, there is a nice trick here that's worth mentioning.
Change your directory to the destination directory using os.chdir(), and then copy the files to it by referring to it as '.':
# -*- coding: utf-8 -*-
import os
import shutil
import glob

os.chdir('א')  # CD to the destination Unicode directory
print os.getcwd()  # DEBUG: Make sure you're in the right place
files = glob.glob('../*.ods')  # List of byte-string file names
for file in files:
    shutil.copy2(file, '.')  # Copy each file
# Don't forget to go back to the original directory here, if it matters
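The reminder about returning to the original directory can be handled automatically. Here is one possible sketch using a context manager; the name working_directory is a hypothetical helper, not something from the standard library.

```python
import contextlib
import os


@contextlib.contextmanager
def working_directory(path):
    """Temporarily change the current directory, restoring it on exit."""
    original = os.getcwd()  # Remember where we started
    os.chdir(path)
    try:
        yield
    finally:
        os.chdir(original)  # Runs even if the body raises an exception
```

With this helper, the copy loop would sit inside a `with working_directory(...)` block, and the original directory is restored even when a copy fails.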
Deeper explanation
The straightforward approach, shutil.copy2(src, dest), fails because shutil joins the unicode directory name with a byte-string file name, and Python 2 then tries to decode the byte string with the ASCII codec:
>>> files = glob.glob('*.ods')
>>> for file in files:
...     shutil.copy2(file, u'א')
...
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/usr/lib/python2.6/shutil.py", line 98, in copy2
    dst = os.path.join(dst, os.path.basename(src))
  File "/usr/lib/python2.6/posixpath.py", line 70, in join
    path += '/' + b
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd7 in position 1: ordinal not in range(128)
As seen before, this can be avoided by using 'א' instead of the Unicode u'א'.
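To see the failing step in isolation, the implicit conversion can be reproduced by hand. This is a small sketch, assuming the file name is stored as UTF-8 bytes (the 0xd7 byte in the traceback is consistent with a UTF-8 encoded Hebrew letter); the variable names are illustrative only.

```python
# Byte-string file name, similar to what glob.glob('*.ods') returns in Python 2.
# u'\u05d0' is Hebrew Aleph; UTF-8 is an assumption about the filesystem.
file_name = u'\u05d0.ods'.encode('utf-8')

try:
    # This is the conversion Python 2 attempts implicitly when a byte
    # string is concatenated with a unicode string.
    file_name.decode('ascii')
    error_message = None
except UnicodeDecodeError as exc:
    error_message = str(exc)  # Mentions the first non-ASCII byte
```

Keeping both operands the same type, as in the first two solutions, means this decode step never happens.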
Is this a bug?
In my opinion, this is a bug, because Python cannot expect basedir names to always be str rather than unicode. I have reported this as an issue in the Python bug list and am waiting for responses.
Solution 2:
Use Unicode strings everywhere:
# -*- coding: utf-8 -*-
# source code ^^ encoding; it might be different from sys.getfilesystemencoding()
import glob
import os

srcdir = u'مصدر الدليل'  # <-- unicode string
dstdir = os.path.join('..', u'κατάλογο προορισμού')  # relative path
for path in glob.glob(os.path.join(srcdir, u'*.ext')):
    newpath = os.path.join(dstdir, os.path.basename(path))
    os.rename(path, newpath)  # move file or directory; assume the same filesystem
There are many subtle details in moving files; see the shutil.copy* functions. You could use the one that is appropriate in your particular case and remove the source files on success, e.g., via os.remove().
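As a rough illustration of the copy-then-remove approach, here is a minimal sketch. The function name copy_then_remove is hypothetical, and it assumes that both arguments are Unicode strings and that the destination directory already exists.

```python
import os
import shutil


def copy_then_remove(src, dstdir):
    """Copy src into dstdir, then delete src once the copy has succeeded.

    Assumes src and dstdir are both Unicode strings and that dstdir
    already exists. Returns the path of the new copy.
    """
    dst = os.path.join(dstdir, os.path.basename(src))
    shutil.copy2(src, dst)  # Raises on failure, so src is kept in that case
    os.remove(src)  # Reached only when the copy succeeded
    return dst
```

Unlike os.rename(), this also works when source and destination are on different filesystems. The standard library's shutil.move() may be a simpler choice when its behavior fits the case at hand.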