Chardet: encoding auto detect for Python

Last week, I has been fighting against the hordes of the character encoding. My new task in tha job-tasks-pool is develop a friendly web app to manage configuration files of cluster apps. Ohh, great idea!, why didn’t I realize it before? (irony) . Only I need 4 things: one parser for each conffile syntax, a stable and secure way to get/save remote files, work with non-controlled kind of differents char encodings, … and a fancy GUI. As you can intuit, the task is not just a bit app, but this post only refer to a very useful python lib ( chardet discovered a few days ago. This lib can be auto-detect the encoding of a file very reliable. I suggest you that visit chardet homesite to see some clear examples.

Using it in a couple of lines of code:

import io
import chardet

s=io.open(‘channels_info’, ‘r’, errors=’replace’)
r=s.read()

# For example:
# fileencoding=”iso-8859-15″
fe = chardet.detect(r)[‘encoding’]
fl = fl.decode(fe)
print fl

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s