Hi, I need to convert a CSV file from ISO-8859-1 to UTF-8 so that the accents are kept in the database.
French accents (é, è, ê, and the like) are not preserved when I try to convert them to UTF-8; they are changed to "?".
I'm stumped.
I use the following function for the conversion:
public static string iso8859ToUnicode(string src) {
    Encoding iso = Encoding.GetEncoding("iso8859-1");
    Encoding unicode = Encoding.UTF8;
    byte[] isoBytes = iso.GetBytes(src);
    byte[] unibytes = Encoding.Convert(iso,unicode,isoBytes);
    char[] unichars = new char[iso.GetCharCount(unibytes,0,unibytes.Length)];
    unicode.GetChars(unibytes,0,unibytes.Length,unichars,0);
    return new string(unichars);
}
But it doesn't seem to work well. Help?
-
You might be losing your encoding when you declare the new string, or when you store the data in the char array.
MrZombie : I shouldn't be losing the encoding that way, as I'm converting the iso to bytes, then the bytes to utf-8... Unless there is byte-level automatic character conversion that I'm not aware of, it shouldn't be the problem. -
Instead of the GetChars() method, can't you just call unicode.GetString(unibytes)? -
I strongly suspect that your original string doesn't have the correct values. My guess is that you've read it from the file as if it were UTF-8.
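As a rough illustration of that suspicion (a minimal sketch, not the asker's actual code): the class name MisdecodeDemo, its helper methods, and the sample bytes for "café" are made up for this example. It decodes the same ISO-8859-1 bytes once as UTF-8 and once as Latin-1. The wrong decoding yields the replacement character U+FFFD, and re-encoding that string to ISO-8859-1 afterwards is likely where the "?" comes from.

```csharp
using System;
using System.Text;

public static class MisdecodeDemo
{
    // Hypothetical helper: decodes the bytes as if they were UTF-8 (the suspected mistake).
    public static string DecodeAsUtf8(byte[] bytes)
    {
        return Encoding.UTF8.GetString(bytes);
    }

    // Hypothetical helper: decodes the bytes as ISO-8859-1 (code page 28591).
    public static string DecodeAsLatin1(byte[] bytes)
    {
        return Encoding.GetEncoding(28591).GetString(bytes);
    }

    public static void Main()
    {
        // "café" in ISO-8859-1: the "é" is the single byte 0xE9.
        byte[] latin1Bytes = { 0x63, 0x61, 0x66, 0xE9 };

        // 0xE9 is not a complete UTF-8 sequence, so the decoder substitutes U+FFFD.
        string wrong = DecodeAsUtf8(latin1Bytes);
        string right = DecodeAsLatin1(latin1Bytes);

        Console.WriteLine(wrong.Contains("\uFFFD")); // True
        Console.WriteLine(right == "caf\u00E9");     // True

        // U+FFFD has no ISO-8859-1 equivalent, so encoding it back falls
        // through to the encoder's fallback; print the last byte to inspect it.
        byte[] roundTrip = Encoding.GetEncoding(28591).GetBytes(wrong);
        Console.WriteLine(roundTrip[roundTrip.Length - 1].ToString("X2"));
    }
}
```

Once the string holds U+FFFD, the original accent is gone for good; no later conversion can bring it back, which is why the fix has to happen where the file is read.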
To convert between two encodings, you shouldn't have the string in the first place - you should basically load the bytes of the file and call Encoding.Convert() that way. Alternatively, load the file using ISO-Latin-1 and just save it as UTF-8. For example:
public static void ConvertLatin1ToUtf8(string inputFile, string outputFile)
{
    Encoding latin1 = Encoding.GetEncoding(28591);
    string text = File.ReadAllText(inputFile, latin1);
    File.WriteAllText(outputFile, text, Encoding.UTF8);
}
or
public static void ConvertLatin1ToUtf8(string inputFile, string outputFile)
{
    Encoding latin1 = Encoding.GetEncoding(28591);
    byte[] latinBytes = File.ReadAllBytes(inputFile);
    byte[] utf8Bytes = Encoding.Convert(latin1, Encoding.UTF8, latinBytes);
    File.WriteAllBytes(outputFile, utf8Bytes);
}
MrZombie : Thank you a million times and a half. Is it okay for me to hate encoding issues? :P
Jon Skeet : Only if I can hate time zone issues more :)