PyMongo error: bson.errors.InvalidBSON: 'utf8' codec can't decode byte 0xa1 in position 25: invalid start byte
Solution 1:
You need to pass the unicode_decode_error_handler argument to MongoClient, which requires pymongo 3.5.1 or later.
from pymongo import MongoClient

if __name__ == '__main__':
    # 'ignore' drops bytes that cannot be decoded as UTF-8 instead of raising InvalidBSON.
    client = MongoClient(
        host="whatever_your_host_is",
        maxPoolSize=50,
        unicode_decode_error_handler='ignore'
    )
    my_db = client['my_db']
    collection = my_db['my_collection']
    cursor = collection.find({"whatever": "some_stuff"})
    for document in cursor:
        print(document)
It looks like 'ignore' is the default behaviour on Python 2.7, but on Python 3.6.1 you have to set it yourself. With this setting the Unicode errors are ignored and the cursor keeps iterating; pymongo does its best to reconstruct the document data, dropping the bytes it cannot decode.
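To get a feel for what the handler values do, the same handler names can be tried with Python's built-in bytes.decode. This is only a sketch of the decoding behaviour, not of pymongo's internals; the sample bytes and the decode_with helper are made up for illustration.

```python
# Hypothetical sample: text that was stored as Latin-1 instead of UTF-8.
# Byte 0xa1 cannot start a UTF-8 sequence, so strict decoding fails on it.
raw = b"Hola \xa1mundo!"


def decode_with(handler):
    """Decode the sample bytes as UTF-8 using the given error handler."""
    return raw.decode("utf-8", errors=handler)


try:
    decode_with("strict")
    strict_error = None
except UnicodeDecodeError as exc:
    # Same kind of message that pymongo surfaces inside InvalidBSON.
    strict_error = str(exc)

ignored = decode_with("ignore")    # the invalid byte is silently dropped
replaced = decode_with("replace")  # the invalid byte becomes U+FFFD

print(strict_error)
print(repr(ignored))
print(repr(replaced))
```

Note that 'ignore' loses data silently, so 'replace' may be the better choice if you want to see afterwards which strings were affected.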
Solution 2:
I recently had a similar error message, and it is quite hard to find help with it.
Quick fix
I solved my problem by downgrading pymongo to a version below 3.0. The pymongo changelog advertises "A rewritten pure Python BSON implementation" in version 3.0. I found that the new implementation has trouble handling Python UTF-8 and unicode encoding when serializing to BSON.
Analysis
It seems that the error comes from invalid BSON in your DB ... similar to this. Maybe you should post your error there.
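If the stored bytes really are not valid UTF-8, one possible cause is that whatever wrote them used a legacy single-byte encoding. The sketch below assumes Latin-1, which is purely a guess since the original data's encoding is not known. decode_lenient is a hypothetical helper, not part of pymongo.

```python
def decode_lenient(raw, fallback="latin-1"):
    """Try UTF-8 first; fall back to a single-byte encoding (assumed Latin-1)."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a code point, so this decode cannot fail.
        return raw.decode(fallback)


# Byte 0xa1 alone is invalid UTF-8, but it is the character U+00A1 in Latin-1.
print(decode_lenient(b"\xa1Hola!"))

# Bytes that are already valid UTF-8 are decoded unchanged.
print(decode_lenient("\u00a1Hola!".encode("utf-8")))
```

If the fallback output looks like sensible text, that hints that the data was written in the assumed encoding and should be re-encoded at the source rather than patched on read.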
Solution 3:
I'm using Python 3.6, pymongo 3.4.0.
According to the documentation, you can clone a collection with the 'with_options' method, which did the trick for me:
from bson.codec_options import CodecOptions
col_article = col_article.with_options(codec_options=CodecOptions(unicode_decode_error_handler="ignore"))