German characters in a filename

Added by Стойчо Стефанов Stoycho Stefanov almost 12 years ago

Hi,

I'm trying to list the contents of a folder on the server using WStrings. German characters such as "ä", "ö", "ü" and "ß" are all converted to "?". When I try to create a WString from a boost::filesystem path:

boost::filesystem::path myPath;
...
WString(myPath.filename().string())

I get this error message:

[error] "WString: widen(): could not widen string: pöäüß"

It seems that the server correctly prints the string returned by myPath.filename().string(), which is not a UTF-8 encoded string (fromUTF8() fails with the error "Invalid UTF-8 sequence"), but Wt cannot convert it.

The same error occurs when I specify the local encoding:

WString(myPath.filename().string(),std::locale());
WString(myPath.filename().string(),std::locale(""));
WString(myPath.filename().string(),std::locale("C"));
WString(myPath.filename().string(),LocalEncoding);

Any suggestions?!

Regards,

Stoycho


Replies (9)

RE: German characters in a filename - Added by Wim Dumon almost 12 years ago

I'm a bit surprised that none of your WStrings with a std::locale() parameter worked (especially the one with "").

I tried to find out what encoding boost.filesystem uses by default for its paths, but I couldn't find it in the documentation. They do specify some of it in http://www.boost.org/doc/libs/1_50_0/libs/filesystem/doc/reference.html#path-Encoding-conversions but their rationale makes me wonder whether the codecvt methods are used to convert the internal wstrings back and forth to Windows API strings, or to convert the wstrings to strings when queried through the API, or both. In any case, it seems that you are getting some ANSI encoding from boost.filesystem, which is less desirable than Unicode.

I see two solutions:

Wim.

RE: German characters in a filename - Added by Стойчо Стефанов Stoycho Stefanov almost 12 years ago

Hey Wim,

In reverse order: with a UTF-8 codecvt facet I didn't get any further:

[2012-Aug-01 10:42:22] [info] "DEBUG: WString(myPath.filename().string());"
[2012-Aug-01 10:42:22] [error] "WString: widen(): could not widen string: pöäüß"

boost::filesystem::detail::utf8_codecvt_facet utf8_codecvt;
[2012-Aug-01 10:42:22] [info] "DEBUG: WString(myPath.filename().string(utf8_codecvt));"
[2012-Aug-01 10:42:22] [error] "WString: widen(): could not widen string: pöäüß"

[2012-Aug-01 10:42:22] [info] "DEBUG: WString(myPath.filename().string(),std::locale(""));"
[2012-Aug-01 10:42:22] [error] "WString: widen(): could not widen string: pöäüß"

std::locale utf8_locale(std::locale(), new boost::filesystem::detail::utf8_codecvt_facet());
[2012-Aug-01 10:42:22] [info] "DEBUG: WString(myPath.filename().string(),utf8_locale));"
[2012-Aug-01 10:42:22] [error] "WString: widen(): could not widen string: pöäüß"

and filename().wstring() doesn't work at all:

[2012-Aug-01 10:42:22] myPath.filename().string() = pöäüß

[2012-Aug-01 10:35:44] [info] "DEBUG: WString(myPath.filename().wstring());"
[2012-Aug-01 10:35:44] [error] "Wt: fatal error: boost::filesystem::path codecvt to wstring: error"

boost::filesystem::detail::utf8_codecvt_facet utf8_codecvt;
[2012-Aug-01 10:35:44] [info] "DEBUG: WString(myPath.filename().wstring(utf8_codecvt));"
[2012-Aug-01 10:35:44] [error] "Wt: fatal error: boost::filesystem::path codecvt to wstring: error"

I don't have any idea what to do.

regards

Stoycho

RE: German characters in a filename - Added by Wim Dumon almost 12 years ago

Are you on Windows or Linux?

RE: German characters in a filename - Added by Стойчо Стефанов Stoycho Stefanov almost 12 years ago

Sorry,

I should have started with this. I'm on embedded Linux.

RE: German characters in a filename - Added by Wim Dumon almost 12 years ago

Ah I assumed Windows - Linux is usually UTF-8.

It looks like your Linux locale is not UTF-8. Do you know what it is? What's the output of 'locale' and 'locale -a' on your system? The environment variable LANG? The environment variable LC_CTYPE? If you print every character in filename().string() separately as ints, what list of numbers do you get?
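A minimal sketch of such a byte dump, assuming the filename is available as a std::string. The helper name byteValues is hypothetical and not part of Wt or boost:

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: returns the bytes of s as space-separated
// unsigned decimal values.
// The cast to unsigned char matters: plain char may be signed, so
// bytes above 127 would otherwise show up as negative numbers.
std::string byteValues(const std::string& s) {
    std::ostringstream out;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (i > 0) out << ' ';
        out << static_cast<int>(static_cast<unsigned char>(s[i]));
    }
    return out.str();
}
```

Calling byteValues(myPath.filename().string()) and logging the result shows the raw encoding. A single byte per accented letter (for example 246 for "ö") points to a single-byte encoding such as ISO 8859-1, whereas UTF-8 would produce two bytes (195 182) for the same letter.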

Wim.

RE: German characters in a filename - Added by Стойчо Стефанов Stoycho Stefanov almost 12 years ago

Hey Wim,

Yes, my locale is definitely not UTF-8; otherwise WString(myPath.filename().string(),UTF8) wouldn't fail with "Invalid UTF-8 sequence", I suppose.

No, I don't know what it is and cannot find it out.

# locale
-sh: locale: not found
# locale -a
-sh: locale: not found
# echo $LANG

# echo $LC_CTYPE

#

Ok, it seems to be the ascii character map:

> filename().string() = abcd_öäüß
> a = 97
> b = 98
> c = 99
> d = 100
> _ = 95
> ö = 246
> ä = 228
> ü = 252
> ß = 223

Regards,

Stoycho

RE: German characters in a filename - Added by Стойчо Стефанов Stoycho Stefanov almost 12 years ago

I meant Windows code page 1252 when I wrote "ascii".

RE: German characters in a filename - Added by Wim Dumon almost 12 years ago

ISO 8859-1 is probably more likely?

You could try to name the locale explicitly: std::locale("en_US.ISO-8859-1"). You may have to generate or install the locale on your embedded system, or use one that is installed.
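Whether a named locale is installed can be probed at run time, because the std::locale constructor throws std::runtime_error for an unknown name. A small sketch; the helper name localeAvailable is hypothetical:

```cpp
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper: returns true if the C++ runtime can construct
// the named locale, false if the name is unknown on this system.
bool localeAvailable(const std::string& name) {
    try {
        std::locale loc(name.c_str());
        (void)loc;  // only the construction matters here
        return true;
    } catch (const std::runtime_error&) {
        return false;
    }
}
```

For example, localeAvailable("en_US.ISO-8859-1") tells you whether that locale can be passed to the WString constructor on the target system. The exact locale names available depend on what is installed there.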

Or if that still doesn't work, maybe convert all OS strings manually:

http://stackoverflow.com/questions/4059775/convert-iso-8859-1-strings-to-utf-8-in-c-c

If possible at all, change the locale of your system to UTF-8. It's so much easier to work with.

Wim.

RE: German characters in a filename - Added by Стойчо Стефанов Stoycho Stefanov almost 12 years ago

Hey Wim,

thanks for your help. I converted the OS strings manually into UTF-8, and then with WString::fromUTF8() I can display these special characters.
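A minimal sketch of such a manual conversion, assuming the OS strings are ISO 8859-1 (the thread does not confirm the exact encoding, and this is not necessarily the code that was actually used). The function name latin1ToUtf8 is hypothetical:

```cpp
#include <string>

// Hypothetical helper: converts ISO 8859-1 (Latin-1) bytes to UTF-8.
// Every Latin-1 byte value equals its Unicode code point, so bytes
// below 0x80 are copied unchanged and bytes 0x80..0xFF become a
// two-byte UTF-8 sequence.
std::string latin1ToUtf8(const std::string& in) {
    std::string out;
    out.reserve(in.size() * 2);
    for (std::size_t i = 0; i < in.size(); ++i) {
        unsigned char c = static_cast<unsigned char>(in[i]);
        if (c < 0x80) {
            out += static_cast<char>(c);
        } else {
            out += static_cast<char>(0xC0 | (c >> 6));    // lead byte
            out += static_cast<char>(0x80 | (c & 0x3F));  // continuation byte
        }
    }
    return out;
}
```

The result can then be passed to WString::fromUTF8(). If the source encoding is really Windows code page 1252, the bytes 0x80 to 0x9F would need a lookup table instead, but "ä", "ö", "ü" and "ß" have the same byte values in both encodings.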

I have to find out how to change the system locale as you suggest, if that is possible at all. That would be the better solution.

Thanks once again!

Stoycho
