
Why Python2 And Python3 Treat Same Windows Directory Differently?

My Windows display language is Chinese. To illustrate my point, I use the pathlib package.

from pathlib import *
rootdir=Path(r'D:\新建文件夹')
print(rootdir.exists())

Python2.7 I get Fa

Solution 1:

In Python 3, string literals are Unicode by default. In Python 2, they are byte strings encoded in the source file's encoding. Use a Unicode string literal in Python 2.

Also declare the source file encoding, and make sure the file is actually saved in that encoding.

#coding:utf8
from pathlib import *
rootdir=Path(ur'D:\新建文件夹')
print(rootdir.exists())

Solution 2:

The main difference between Python 2 and Python 3 is the set of basic types for dealing with text and bytes. Python 3 has one text type, str, which holds Unicode data, and two byte types, bytes and bytearray.

Python 2, on the other hand, has two text types: str, which for all intents and purposes is limited to ASCII plus some undefined data above the 7-bit range, and unicode, which is equivalent to the Python 3 str type. It also has one byte type, bytearray, which was backported from Python 3.

Python 3 removed every codec that does not convert between bytes and Unicode, along with the now-useless .encode() method on bytes and .decode() method on strings.
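As a minimal sketch of the Python 3 side of this (Python 3 only; the variable names are just for illustration), the three types and the one-way encode/decode methods look like this:

```python
# Python 3 only: one text type (str) and two byte types (bytes, bytearray).
text = "新建文件夹"              # str: a sequence of Unicode code points
data = text.encode("utf-8")     # bytes: the UTF-8 encoding of that text
buf = bytearray(data)           # bytearray: a mutable sequence of bytes

print(type(text).__name__, len(text))   # length counted in code points
print(type(data).__name__, len(data))   # length counted in bytes

# Encoding goes str -> bytes and decoding goes bytes -> str, never the reverse.
round_trip = data.decode("utf-8")
str_has_decode = hasattr(text, "decode")    # False on Python 3
bytes_has_encode = hasattr(data, "encode")  # False on Python 3
print(round_trip == text, str_has_decode, bytes_has_encode)
```

The five-character folder name becomes fifteen bytes in UTF-8, which is why code that treats a path as bytes has to know which encoding produced them.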


Solution 3:

Use Unicode literals for Windows paths: add from __future__ import unicode_literals at the top.
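A rough sketch of what that looks like follows. It swaps Path for PureWindowsPath so that it does not touch the file system and runs on any OS; that substitution, and the names raw_path and text_type, are illustrative choices rather than part of the answer. On Python 2 it assumes a pathlib backport is installed, and how that backport handles Unicode input is not verified here.

```python
# -*- coding: utf-8 -*-
# The __future__ import must come before any other statement; it is a no-op on Python 3.
from __future__ import unicode_literals

from pathlib import PureWindowsPath  # Python 2: assumes a pathlib backport is installed

raw_path = r'D:\新建文件夹'   # now a Unicode string on Python 2 as well as Python 3
text_type = type(u'')        # unicode on Python 2, str on Python 3

print(isinstance(raw_path, text_type))

# PureWindowsPath only parses the string, so no real D: drive is needed.
root = PureWindowsPath(raw_path)
print(root.drive)
print(root.name == '新建文件夹')
```

With a real directory you would use Path(raw_path).exists() as in the question; the point here is only that the literal is text rather than bytes before it reaches the path API.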

Explanation

  1. r'D:\新建文件夹' is a bytestring on Python 2. Its value depends on the encoding declaration at the top of the file (such as # -*- coding: utf-8 -*-). Without the declaration, Python 2 should raise an error for a non-ASCII literal. On Python 3, r'D:\新建文件夹' is a Unicode string, and the default source code encoding is UTF-8, so no encoding declaration is required.
  2. Python uses the Unicode API when working with files on Windows if the input is Unicode, and the "ANSI" API if the input is bytes.

Two things can go wrong with a bytes path. First, if the source code encoding differs from the "ANSI" encoding (such as cp1252), the result may differ: the bytes are passed as is, and the same byte sequence can represent different characters in different encodings. Second, if the filename can't be represented in the "ANSI" encoding at all, the result may also differ. A single-byte encoding such as cp1252 cannot represent every Unicode character, because there are around a million Unicode code points but only 256 byte values. Using Unicode strings for filenames on Windows fixes both issues.
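Both failure modes can be reproduced without a Windows machine. The sketch below is Python 3 and assumes that the "ANSI" code page on a Chinese-language system is cp936 (GBK); that code page is an assumption here, since the answer only names cp1252 as an example.

```python
name = "新建文件夹"

# Issue 1: the same text produces different bytes in different encodings.
utf8_bytes = name.encode("utf-8")   # what a UTF-8 source file stores for the literal
gbk_bytes = name.encode("gbk")      # what a cp936 (GBK) "ANSI" API would expect
print(len(utf8_bytes), len(gbk_bytes))

# Passing the UTF-8 bytes "as is" to something that reads them as GBK
# yields a different name, so the directory is not found.
misread = utf8_bytes.decode("gbk", errors="replace")
print(misread == name)

# Issue 2: a single-byte code page such as cp1252 cannot represent the name at all.
try:
    name.encode("cp1252")
    representable = True
except UnicodeEncodeError:
    representable = False
print(representable)
```

A Unicode path avoids both problems because no byte-level reinterpretation happens between the literal and the Windows API call.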
