Last week I was fighting against the hordes of character encoding. My new task in the job-task pool is to develop a friendly web app to manage the configuration files of cluster apps. Ohh, great idea! Why didn't I think of it before? (irony) I only need 4 things: one parser for each conffile syntax, a stable and secure way to get/save remote files, a way to work with all kinds of different character encodings that I don't control, … and a fancy GUI. As you can guess, this is not exactly a small app, but this post only refers to a very useful Python lib, chardet, which I discovered a few days ago. This lib can auto-detect the encoding of a file quite reliably. I suggest you visit the chardet homesite to see some clear examples.

Using it in a couple of lines of code:

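import io
import chardet

# Open in binary mode: chardet needs the raw, undecoded bytes
with io.open('channels_info', 'rb') as s:
    r = s.read()

# For example:
# fileencoding="iso-8859-15"
fe = chardet.detect(r)['encoding']
# detect() can return None as the encoding; fall back to UTF-8 in that case
fl = r.decode(fe or 'utf-8', 'replace')
print(fl)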
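
The snippet above trusts whatever chardet guesses. Besides 'encoding', the dict returned by chardet.detect() also carries a 'confidence' value between 0 and 1, so a slightly more defensive version could look like the sketch below. The helper names decode_with_guess and read_text, the 0.5 threshold and the UTF-8 fallback are my own assumptions for illustration, not something chardet provides or recommends.

```python
def decode_with_guess(raw, guess, min_confidence=0.5, fallback="utf-8"):
    """Decode raw bytes using a chardet-style guess dict.

    `guess` is expected to look like the result of chardet.detect(),
    e.g. {"encoding": "ISO-8859-1", "confidence": 0.73}.
    min_confidence and fallback are illustrative choices (assumptions),
    not chardet defaults.
    """
    encoding = guess.get("encoding")
    confidence = guess.get("confidence") or 0.0
    # No guess at all, or a weak one: use the fallback encoding instead
    if encoding is None or confidence < min_confidence:
        encoding = fallback
    # errors="replace" substitutes undecodable bytes instead of raising
    return raw.decode(encoding, errors="replace")


def read_text(path):
    """Read a file as bytes, let chardet guess the encoding, return text."""
    # Imported here so decode_with_guess stays usable without chardet
    import chardet  # third-party: pip install chardet

    with open(path, "rb") as f:
        raw = f.read()
    return decode_with_guess(raw, chardet.detect(raw))
```

With that in place, something like text = read_text('channels_info') would replace the read/detect/decode steps. Detection is still a guess, so for short files the fallback branch may be hit more often than you expect.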


I’m Pablo Saavedra, a former Unix systems administrator turned embedded software developer, now dedicated to squashing bugs and optimizing performance on embedded devices.

I hold a degree in Computer Science from the Universidade da Coruña (Spain).

Of course, my hobbies are anything related to computers, but also boxing, fitness, good beers, … You can follow me on Twitter or on my LinkedIn profile.