Friday, 6 September 2013

Python: Remove soft hyphen(s)

In an HTML file, I've got words containing soft hyphens, e.g. "Schilderung"
with a soft hyphen between "Schilde" and "rung";
repr(word) = "Schilde\xc2\xadrung". How can I remove them? Since my
file also contains umlauts and other special characters, solutions with
printable or with words.decode('ascii', 'ignore') aren't suitable, because
they would strip those characters as well.
I already tried words.replace('\xc2\xad', ''), but this didn't work.
Thanks for any help :)
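
One possible approach, as a minimal sketch rather than a definitive answer: decode the bytes to text first and then remove the single character U+00AD (the soft hyphen, which UTF-8 encodes as the two bytes \xc2\xad). The sketch is written for Python 3 and assumes the file is UTF-8 encoded; the helper name remove_soft_hyphens is hypothetical. Two common reasons a replace call appears to do nothing are that the result was not assigned back (strings are immutable, so replace returns a new string) or that the text was already decoded, in which case the soft hyphen is the single character '\xad' and not the byte pair.

```python
SOFT_HYPHEN = "\u00ad"  # U+00AD; encoded in UTF-8 as the bytes 0xC2 0xAD


def remove_soft_hyphens(text):
    """Return text with every soft hyphen removed (hypothetical helper)."""
    if isinstance(text, bytes):
        # Assumption: raw bytes read from the file are UTF-8 encoded.
        text = text.decode("utf-8")
    # str.replace returns a new string, so the result must be kept.
    return text.replace(SOFT_HYPHEN, "")


# The byte string from the question.
word = b"Schilde\xc2\xadrung"
cleaned = remove_soft_hyphens(word)

# Umlauts and other non-ASCII characters are left untouched.
with_umlaut = remove_soft_hyphens("Ger\u00e4usch\u00adkulisse")
```

Because only U+00AD is targeted, umlauts and other special characters pass through unchanged, unlike an ASCII-only filter.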
