string to/from wstring conversion
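#include <algorithm>
#include <string>

// Widens each char to a wchar_t one element at a time; only correct for ASCII input.
std::wstring s2ws(const std::string& s)
{
    std::wstring temp(s.length(), L' ');
    std::copy(s.begin(), s.end(), temp.begin());
    return temp;
}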
// Narrows each wchar_t to a char; anything outside the char range is truncated.
std::string ws2s(const std::wstring& s)
{
    std::string temp(s.length(), ' ');
    std::copy(s.begin(), s.end(), temp.begin());
    return temp;
}
Update: an article kindly provided by a reader
August 29th, 2008 | Tags: c++ | Category: Programming
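September 3rd, 2008 at 12:32 am
On the Win32 platform, wchar_t is 16 bits, which means a single wchar_t cannot always represent a Unicode code point (code points can need up to 21 bits). The encoding used by std::wstring on Win32 is UTF-16, so some code points are represented by two elements.
See a rough discussion I wrote a while back.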
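To make the two-element point concrete, here is a minimal sketch of how one code point maps to UTF-16 code units. It uses char16_t rather than wchar_t so the result is the same on every platform (wchar_t is 16 bits on Win32 but usually 32 bits elsewhere). The helper name utf16_units is made up for this illustration, and it assumes the input is a valid code point.

```cpp
#include <string>

// Hypothetical helper: encode one Unicode code point as UTF-16 code units.
// Assumes cp is valid (not a surrogate, not above 0x10FFFF).
std::u16string utf16_units(char32_t cp)
{
    std::u16string out;
    if (cp < 0x10000) {
        // Inside the BMP: one 16-bit unit is enough.
        out.push_back(static_cast<char16_t>(cp));
    } else {
        // Outside the BMP: split the remaining 20 bits into a surrogate pair.
        cp -= 0x10000;
        out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));    // high surrogate
        out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));  // low surrogate
    }
    return out;
}
```

For example, U+0041 comes back as a single unit, while U+1F600 comes back as two, which is why an element-by-element copy cannot be treated as a character-by-character copy.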
September 3rd, 2008 at 6:58 am
Thanks!!
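September 3rd, 2008 at 11:20 am
Let me nitpick a bit more... Setting aside support for non-BMP code points for the moment,
std::wstring s2ws(const std::string& s)
{
    return std::wstring(s.begin(), s.end());
}
would be a better way to write it. It is not only more concise but also easier for the compiler to optimize (RVO). And because the string's iterator/const_iterator is a random access iterator, there are no repeated allocations. It also reduces the original n copy-constructions + n assignments to just n assignments.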
September 3rd, 2008 at 4:27 pm
My reply got eaten, so I posted it on my own site instead; please take a look.
September 3rd, 2008 at 4:31 pm
Recovered it.
September 5th, 2008 at 9:36 am
I usually use ios::widen instead.
And even though Win32 wchar_t can hold UTF-16LE, the conversion will never be correct without a facet that converts code points.
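For readers unfamiliar with ios::widen, the sketch below shows the same idea applied to a whole string: it calls the std::ctype<wchar_t> facet that ios::widen delegates to. The function name widen_with_locale and the default of the classic "C" locale are assumptions for this example. As the comment says, this is still a per-char mapping and does not decode multi-byte input.

```cpp
#include <locale>
#include <string>

// Hypothetical helper: widen every char through the locale's ctype facet.
// This is a one-to-one mapping, so it does not decode UTF-8 or other
// multi-byte encodings.
std::wstring widen_with_locale(const std::string& s,
                               const std::locale& loc = std::locale::classic())
{
    std::wstring out(s.size(), L'\0');
    if (!s.empty()) {
        std::use_facet<std::ctype<wchar_t> >(loc)
            .widen(s.data(), s.data() + s.size(), &out[0]);
    }
    return out;
}
```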
September 5th, 2008 at 9:42 am
Oh, sorry, s/facet/codecvt/g actually.
Since standard C++ does not provide a codecvt implementation for this (only Boost does), one of the best references is Glib::ustring (from glibmm). However, since glib's license may not be easy to comply with, UTF8-CPP is still your good friend.
September 12th, 2008 at 5:28 pm
Why not use mbstowcs() and wcstombs() instead, which are more accurate?! Especially considering multi-byte encodings such as Shift-JIS, which has state information embedded in the byte stream. If you prefer character-by-character conversion, please use mbrtowc() and wctomb(). The mbrtowc() function can be replaced by mbtowc() if we're sure that the multi-byte encoding has no embedded state information.
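A minimal sketch of the character-by-character route with mbrtowc() might look like the following. The function name mbs_to_ws is invented for this example, and the conversion uses whatever C locale is currently active, so the caller is assumed to have called setlocale() appropriately beforehand.

```cpp
#include <cstddef>
#include <cwchar>
#include <stdexcept>
#include <string>

// Hypothetical helper: decode a multi-byte string one character at a time.
// Uses the current C locale; the mbstate_t carries any shift state between calls.
std::wstring mbs_to_ws(const std::string& in)
{
    std::wstring out;
    std::mbstate_t state = std::mbstate_t();
    const char* p = in.data();
    std::size_t left = in.size();
    while (left > 0) {
        wchar_t wc;
        std::size_t n = std::mbrtowc(&wc, p, left, &state);
        if (n == static_cast<std::size_t>(-1) || n == static_cast<std::size_t>(-2))
            throw std::runtime_error("invalid or truncated multi-byte sequence");
        if (n == 0)
            n = 1;  // an embedded null character was decoded; skip its byte
        out.push_back(wc);
        p += n;
        left -= n;
    }
    return out;
}
```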
September 12th, 2008 at 5:30 pm
For variable-length C++ (w)strings, use mbstowcs() and wcstombs() to convert to a wide-character string "chunk by chunk".
IMHO, C++ locale implementations are far from stable. We'd better use the C locale instead.
(the above comments will also be posted to http://fsfoundry.org/codefreak/2008/09/03/re-string-tofrom-wstring-conversion/)
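One way to read "chunk by chunk" is sketched below. It swaps in mbsrtowcs(), the restartable sibling of mbstowcs(), because that function reports where it stopped and so can resume with a fixed-size buffer. The name mbs_to_ws_chunked and the chunk size of 64 are arbitrary choices for the example. Conversion follows the current C locale and stops at the first null character.

```cpp
#include <cstddef>
#include <cwchar>
#include <stdexcept>
#include <string>

// Hypothetical helper: convert through a small fixed buffer, one chunk per call.
std::wstring mbs_to_ws_chunked(const std::string& in)
{
    const std::size_t chunk = 64;  // arbitrary buffer size for this sketch
    wchar_t buf[chunk];
    std::wstring out;
    std::mbstate_t state = std::mbstate_t();
    const char* src = in.c_str();
    // mbsrtowcs sets src to a null pointer once it has consumed the terminator.
    while (src != 0) {
        std::size_t n = std::mbsrtowcs(buf, &src, chunk, &state);
        if (n == static_cast<std::size_t>(-1))
            throw std::runtime_error("invalid multi-byte sequence");
        out.append(buf, n);
    }
    return out;
}
```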
September 12th, 2008 at 7:13 pm
About mbstowcs/wcstombs, my 2 cents are:
- Although they seem to be portable because of , they still have buffer overflow/underflow problems, especially on Win32.
- The Win32-specific versions, MultiByteToWideChar/WideCharToMultiByte, may be no safer against overflow/underflow attacks; however, they have a special (strange) usage that reports the required buffer size when called with a null output buffer and a size of zero.
- For both security and portability, I would still suggest UTF8-CPP and sticking to UTF-8/UTF-16; converting all other native encodings beforehand is encouraged.
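To show what sticking to UTF-8/UTF-16 involves without any locale machinery, here is a hand-rolled sketch of a UTF-8 to UTF-16 conversion. It is not UTF8-CPP's API; the function name utf8_to_utf16 and the choice to throw on malformed input are assumptions made for this example. A maintained library remains the safer choice in real code.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: strict UTF-8 to UTF-16 conversion, no locale involved.
std::u16string utf8_to_utf16(const std::string& in)
{
    // Smallest code point allowed for each sequence length (rejects overlong forms).
    static const char32_t min_cp[] = { 0, 0x80, 0x800, 0x10000 };
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char b = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t extra;  // number of continuation bytes expected
        if (b < 0x80)                { cp = b;        extra = 0; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; extra = 1; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; extra = 2; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; extra = 3; }
        else throw std::runtime_error("invalid UTF-8 lead byte");
        if (in.size() - i <= extra)
            throw std::runtime_error("truncated UTF-8 sequence");
        for (std::size_t k = 1; k <= extra; ++k) {
            unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::runtime_error("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        i += extra + 1;
        if (cp < min_cp[extra] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::runtime_error("invalid code point in UTF-8 input");
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            cp -= 0x10000;  // encode as a surrogate pair
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```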
September 12th, 2008 at 7:14 pm
Hmm, <stdlib> was missing from my previous comment.
September 12th, 2008 at 7:14 pm
<cstdlib> actually.
September 14th, 2008 at 5:23 pm
Another follow-up post.
September 16th, 2008 at 1:54 pm
Actually, for mbs/wcs/u8s/gcs/tcs strings, I have all the any2any() functions implemented on top of libiconv, even on Windows. :-p
Where…
mbs stands for multi-byte string,
wcs stands for wide-character string,
u8s stands for UTF-8 string,
gcs stands for generic-character string, whose actual character type comes from a template parameter,
and tcs stands for generic-mapping-text string with the TCHAR character type.
On Windows, a libiconv build that matches the platform is required.