Text encoding

belbernard

New Member
eMail Extractor garbles non-ASCII characters with every encoding I have tried.
I tried Unicode UTF-8 (which should be the default setting), UTF-16, ISO Latin and Mac OS Roman: all of them produced garbled names whenever a name contains accented characters, e.g.:
"Hélène Martin <helene.martin@somedomain>"
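Garbled names like this are typically what you get when UTF-8 bytes are decoded with a single-byte encoding. The Python sketch below is an illustration only and does not reproduce eMail Extractor's internals. It encodes the example name as UTF-8, then decodes the same bytes as ISO Latin-1 and as Mac OS Roman:

```python
# Illustration of mojibake: UTF-8 bytes read back with the wrong encoding.
name = "Hélène Martin"

# In UTF-8, "é" and "è" each become two bytes (C3 A9 and C3 A8).
utf8_bytes = name.encode("utf-8")

# A single-byte encoding maps every byte to its own character,
# so each accented letter turns into two unrelated symbols.
as_latin1 = utf8_bytes.decode("latin-1")
as_mac_roman = utf8_bytes.decode("mac_roman")

print(as_latin1)     # HÃ©lÃ¨ne Martin
print(as_mac_roman)  # H√©l√®ne Martin

# The damage is reversible as long as nothing else re-encoded the text:
repaired = as_latin1.encode("latin-1").decode("utf-8")
print(repaired)      # Hélène Martin
```

If the garbage in your output follows this pattern, the input bytes were almost certainly fine and only the encoding used to read them was wrong.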

stanbusk

Administrator
Staff member
eMail Extractor is UTF-8 native, which means UTF-8 is its preferred encoding. However, it is possible that eMail Extractor is not detecting your file as UTF-8. Try saving the file as UTF-8 with BOM, or as Mac OS Roman, using TextWrangler.
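A UTF-8 BOM is just the three bytes EF BB BF at the very start of a file, which gives a reader an unambiguous hint that the content is UTF-8. As a minimal sketch, assuming (as the suggestion above implies) that the BOM helps eMail Extractor detect the encoding, and assuming the file is already valid UTF-8, the hypothetical helper `add_utf8_bom` below prepends the marker in Python, the equivalent of re-saving as "UTF-8 with BOM" in an editor:

```python
import codecs
from pathlib import Path


def add_utf8_bom(path: str) -> bool:
    """Prepend a UTF-8 BOM to the file at `path` unless it already has one.

    Hypothetical helper, not part of eMail Extractor.
    Returns True if the file was modified.
    """
    file = Path(path)
    data = file.read_bytes()
    if data.startswith(codecs.BOM_UTF8):
        return False  # already marked as UTF-8
    # Strict decode: raises UnicodeDecodeError if the bytes are not valid UTF-8,
    # so a file in another encoding is never mislabelled as UTF-8.
    data.decode("utf-8")
    file.write_bytes(codecs.BOM_UTF8 + data)
    return True
```

Python's `utf-8-sig` codec does the same when writing text, and transparently skips the BOM when reading, so `Path(path).read_text(encoding="utf-8-sig")` returns the text without the marker.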

belbernard

New Member
This is an example of input that does not work:
http://sldr.org/doc/tmp/adr-utf8.txt.zip
(needs to be unzipped)

Using Smultron, I checked that this input is valid UTF-8. The text output is detected as "ISO-Latin" and contains wrongly encoded characters. Both Smultron and TextEdit refuse to recognize it as UTF-8. Excel also converts it to a table with wrongly encoded characters.
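To confirm independently of any editor that a file is well-formed UTF-8, a strict decode is enough: any invalid byte sequence raises an error. The Python sketch below uses a hypothetical function name, `describe_encoding`, and is a rough check rather than a full encoding detector. It also reports whether a BOM is present and whether the file contains any non-ASCII bytes at all, since pure ASCII is byte-for-byte identical in UTF-8, ISO Latin-1 and Mac OS Roman:

```python
import codecs


def describe_encoding(data: bytes) -> str:
    """Classify raw file bytes by how they decode (rough sketch, not a detector)."""
    has_bom = data.startswith(codecs.BOM_UTF8)
    body = data[len(codecs.BOM_UTF8):] if has_bom else data
    try:
        body.decode("utf-8")  # strict mode: invalid sequences raise
    except UnicodeDecodeError as err:
        # Report the offset relative to the start of the whole file.
        offset = err.start + (len(data) - len(body))
        return f"not valid UTF-8 (bad byte at offset {offset})"
    if body.isascii():
        # Nothing above 0x7F: every ASCII-compatible encoding reads it the same way.
        return "plain ASCII"
    return "valid UTF-8 with BOM" if has_bom else "valid UTF-8 without BOM"
```

For example, after unzipping the attachment, `describe_encoding(Path("adr-utf8.txt").read_bytes())` (filename assumed from the archive name, with `from pathlib import Path`) should report valid UTF-8 if the input is as described; running it on eMail Extractor's output shows whether the corruption happened there.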