Reading Russian Language Data From CSV
Solution 1:
\xea is the windows-1251 / cp5347 encoding for к. Therefore, you need to use windows-1251 decoding, not UTF-8.
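As a quick check of the claim above, here is a minimal Python 3 sketch that decodes the single byte 0xEA both ways. The variable names are illustrative only.

```python
# Python 3 sketch: the same byte means different things under different codecs.
raw = b"\xea"  # the byte discussed above

# Under windows-1251 (Python alias "cp1251"), 0xEA maps to the Cyrillic letter "к".
as_cp1251 = raw.decode("cp1251")

# Under UTF-8, a lone 0xEA is an incomplete multi-byte sequence, so decoding fails.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

print(as_cp1251, utf8_ok)
```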
In Python 2.7, the csv library does not support Unicode properly; see "Unicode" in https://docs.python.org/2/library/csv.html
The docs propose a simple workaround using:
import csv

# UTF8Recoder is the helper class defined next to UnicodeReader
# in the same section of the Python 2 csv docs.

class UnicodeReader:
    """
    A CSV reader which will iterate over lines in the CSV file "f",
    which is encoded in the given encoding.
    """

    def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds):
        f = UTF8Recoder(f, encoding)
        self.reader = csv.reader(f, dialect=dialect, **kwds)

    def next(self):
        row = self.reader.next()
        return [unicode(s, "utf-8") for s in row]

    def __iter__(self):
        return self
This would allow you to do:
def loadCsv(filename):
    lines = UnicodeReader(open(filename, "rb"), delimiter=";", encoding="windows-1251")
    # if you really need lists then uncomment the next line
    # this will let you call exact lines by doing `line_12 = lines[12]`
    # return list(lines)
    # this will return an "iterator", so that the file is read on each call
    # use this if you'll do a `for x in x`
    return lines
If you try to print dataset, then you'll get a representation of a list within a list, where the outer list holds the rows and each inner list holds the columns of one row. Any encoded bytes or non-ASCII literals will be represented with \x or \u escapes. To print the values, do:
for csv_line in loadCsv("myfile.csv"):
    print u", ".join(csv_line)
If you need to write your results to another file (fairly typical), you could do:
import io

with io.open("my_output.txt", "w", encoding="utf-8") as my_output:
    for csv_line in loadCsv("myfile.csv"):
        # add a newline so each CSV row ends up on its own line
        my_output.write(u", ".join(csv_line) + u"\n")
This will automatically convert/encode your output to UTF-8.
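In Python 3, where open() accepts an encoding argument directly, the same read-as-windows-1251, write-as-UTF-8 step might look like the sketch below. The function name transcode_csv and its parameters are hypothetical and not part of the answer above; the ";" delimiter is carried over from the loadCsv example.

```python
import csv


def transcode_csv(src_path, dst_path, src_encoding="cp1251", delimiter=";"):
    """Read a CSV in src_encoding, rewrite it as UTF-8, and return the row count."""
    count = 0
    # newline="" lets the csv module handle line endings itself.
    with open(src_path, encoding=src_encoding, newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        writer = csv.writer(dst, delimiter=delimiter)
        for row in csv.reader(src, delimiter=delimiter):
            writer.writerow(row)
            count += 1
    return count
```

Calling transcode_csv("myfile.csv", "my_output.csv") would then produce a UTF-8 copy of the file with the same rows and delimiter.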
Solution 2:
You can try:
import pandas as pd
# pass the encoding by keyword; a second positional argument is not the encoding
pd.read_csv(path_file, encoding="cp1251")
or
import csv
with open(path_file, encoding="cp1251", errors='ignore') as source_file:
    reader = csv.reader(source_file, delimiter=",")
Solution 3:
Could your .csv be in an encoding other than UTF-8? (Judging by the error message, it must be.) Try other Cyrillic encodings such as Windows-1251, CP866, or KOI8.
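One way to try several encodings is to decode the same bytes with each candidate and look at the results. The sketch below is a Python 3 illustration; the helper name preview_decodings and the sample word are assumptions made for the example. Note that single-byte Cyrillic codecs will decode almost any byte sequence without raising an error, so a successful decode does not prove the guess is right: you still have to check which output is readable.

```python
def preview_decodings(raw, encodings=("utf-8", "cp1251", "cp866", "koi8_r")):
    """Decode the same bytes with each candidate codec.

    Returns {encoding: decoded text, or None if the bytes are invalid there}.
    """
    results = {}
    for name in encodings:
        try:
            results[name] = raw.decode(name)
        except UnicodeDecodeError:
            results[name] = None
    return results


# Sample bytes: the word "привет" as it would be stored in a windows-1251 file.
sample = "привет".encode("cp1251")
for name, text in preview_decodings(sample).items():
    print(name, repr(text))
```

For a real file, you could read the first few hundred bytes in binary mode and pass them to the helper instead of the hard-coded sample.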
Solution 4:
In Python 3:
import csv
path = 'C:/Users/me/Downloads/sv.csv'
with open(path, encoding="UTF8") as f:
    reader = csv.reader(f)
    for row in reader:
        print(row)