
How To Display Chinese Characters Inside A Pandas Dataframe?

I can read a csv file in which there is a column containing Chinese characters (other columns are English and numbers). However, Chinese characters don't display correctly. see pho

Solution 1:

I just remembered that the source dataset was created using encoding='GBK', so I tried again with the same encoding:

data06_16 = pd.read_csv("../data/stocks1542monthly.csv", encoding="GBK")

Now, I can see all the Chinese characters.

Thanks guys!
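
To illustrate why the encoding argument matters, here is a minimal sketch. The sample rows and column names are made up for the example (they are not from the original stock file), and an in-memory buffer stands in for the file on disk.

```python
import io

import pandas as pd

# Hypothetical sample rows; the real file would be read from disk instead.
csv_text = "code,name\n601988,中国银行\n601398,工商银行\n"
raw = csv_text.encode("gbk")  # the bytes a GBK-encoded CSV would contain

# GBK bytes are generally not valid UTF-8, so decoding them as UTF-8 fails.
try:
    raw.decode("utf-8")
    decoded_as_utf8 = True
except UnicodeDecodeError:
    decoded_as_utf8 = False

# Passing the encoding the file was written with recovers the characters.
# dtype keeps the stock codes as strings so leading zeros would survive.
df = pd.read_csv(io.BytesIO(raw), encoding="gbk", dtype={"code": str})
print(df)
```

With a real file, the buffer would simply be replaced by the path, as in the call above.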


Solution 2:

I see three possible fixes here:

1) You can try this:

import codecs
x = codecs.open("testdata.csv", "r", "utf-8")

2) Another possibility, in theory, is this:

import pandas as pd
# read_csv already returns a DataFrame, so no extra pd.DataFrame() wrapper is needed
df = pd.read_csv('testdata.csv', encoding='utf-8')

3) Maybe you should convert your CSV file to UTF-8 before importing it with Python (for example in Notepad++)? That can work for a one-time import, but of course not for an automated process.
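
If the conversion in option 3 does need to be automated, it can also be done in Python itself. This is only a sketch: convert_to_utf8 is a hypothetical helper name, and it assumes you already know the source encoding (GBK is used as the default purely for illustration).

```python
from pathlib import Path


def convert_to_utf8(src, dst, source_encoding="gbk"):
    """Re-encode a text file from source_encoding to UTF-8.

    Works on bytes so that the original line endings are preserved.
    """
    text = Path(src).read_bytes().decode(source_encoding)
    Path(dst).write_bytes(text.encode("utf-8"))
```

A call such as convert_to_utf8("testdata.csv", "testdata_utf8.csv", "gbk") would then produce a file that pd.read_csv can read with encoding='utf-8'. The output file name here is made up.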


Solution 3:

Try this:

df = pd.read_csv(path, engine='python', encoding='utf-8-sig')
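
For context, 'utf-8-sig' is UTF-8 with a leading byte order mark (BOM), which some Windows tools add when saving a file as UTF-8. The sketch below uses made-up sample data and an in-memory buffer, and leaves the engine argument at its default. It shows how the two codecs treat the BOM.

```python
import io

import pandas as pd

# Hypothetical sample data; encoding with 'utf-8-sig' prepends the BOM bytes.
raw = "name,price\n中国银行,3.5\n".encode("utf-8-sig")
has_bom = raw.startswith(b"\xef\xbb\xbf")

# The plain 'utf-8' codec keeps the BOM as an invisible first character,
# while 'utf-8-sig' strips it during decoding.
plain = raw.decode("utf-8")
stripped = raw.decode("utf-8-sig")

# Reading with 'utf-8-sig' gives a clean first column name.
df = pd.read_csv(io.BytesIO(raw), encoding="utf-8-sig")
print(repr(plain[0]), repr(stripped[0]), list(df.columns))
```

This only helps when the file really is UTF-8. A file written in GBK or GB18030 still needs its own encoding name.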

Solution 4:

A non-Python-related answer. I ran into this problem this afternoon and found that Excel's CSV import dialog lists many encoding names. We can try the encodings there and see which one fits our needs. For instance, I found that in Excel both gb2312 and gb18030 convert the data cleanly from CSV to XLSX, but only gb18030 works in Python.

pd.read_csv(in_path + 'XXX.csv', encoding='gb18030')

Anyway, this is not about how to import a CSV in Python, but rather about how to find candidate encodings to try.
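
The same trial-and-error can be sketched in Python. first_working_encoding is a hypothetical helper, and the candidate list is only an example. One plausible reason gb2312 fails in Python while gb18030 works is that GB18030 is a superset of GB2312, so a file may contain characters that the stricter gb2312 codec rejects. The sample below uses 镕, which as far as I know lies outside GB2312.

```python
def first_working_encoding(raw, candidates):
    """Return the first candidate that decodes raw without error, else None."""
    for enc in candidates:
        try:
            raw.decode(enc)
        except UnicodeDecodeError:
            continue
        return enc
    return None


# Made-up sample bytes; with a real file, read it in binary mode instead.
sample = "朱镕基".encode("gb18030")
result = first_working_encoding(sample, ["utf-8", "gb2312", "gb18030"])
print(result)
```

A successful decode does not prove the encoding is right, so it is still worth checking the resulting text by eye. Listing stricter encodings before broader ones keeps the guess conservative.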

