string to/from wstring conversion

2008/08/29
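#include <algorithm>  // std::copy
#include <string>

// Widens each char to one wchar_t. Only correct for 7-bit ASCII input.
std::wstring s2ws(const std::string& s)
{
    std::wstring temp(s.length(), L' ');
    std::copy(s.begin(), s.end(), temp.begin());
    return temp;
}

// Narrows each wchar_t to one char. Only correct for 7-bit ASCII input.
std::string ws2s(const std::wstring& s)
{
    std::string temp(s.length(), ' ');
    std::copy(s.begin(), s.end(), temp.begin());
    return temp;
}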

Update: an article kindly provided by a reader

14 Responses to string to/from wstring conversion

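  1. fr3@K on 2008/09/03 at 12:32 AM

    On the Win32 platform, wchar_t is 16 bits wide, so a single wchar_t cannot always represent a Unicode code point, which can need up to 21 bits. std::wstring on Win32 uses UTF-16 as its encoding, so some code points are represented by two elements (a surrogate pair).

    See also a rough, introductory discussion I wrote earlier.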
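
The two-element case can be sketched as follows. This is a minimal illustration of the UTF-16 surrogate rule, not code from the original post; `encode_utf16` is a hypothetical helper name, and it uses C++11 `char16_t` so the result is the same on every platform regardless of the width of wchar_t.

```cpp
#include <cassert>
#include <string>

// Encode one Unicode code point as UTF-16 code units.
// Hypothetical helper for illustration only.
std::u16string encode_utf16(char32_t cp)
{
    // Valid scalar values end at U+10FFFF and exclude the surrogate range.
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));

    std::u16string out;
    if (cp < 0x10000) {
        // BMP code point: fits in a single 16-bit element.
        out.push_back(static_cast<char16_t>(cp));
    } else {
        // Supplementary code point: split the 20-bit offset into two halves.
        const char32_t offset = cp - 0x10000;
        out.push_back(static_cast<char16_t>(0xD800 + (offset >> 10)));   // high surrogate
        out.push_back(static_cast<char16_t>(0xDC00 + (offset & 0x3FF))); // low surrogate
    }
    return out;
}
```

For example, U+1F600 becomes the two units 0xD83D 0xDE00, so a Win32 std::wstring holding that single code point has a length of 2.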

  2. clsung on 2008/09/03 at 6:58 AM

    Thanks!!

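  3. fr3@K on 2008/09/03 at 11:20 AM

    Allow me to butt in once more... Setting aside support for non-BMP code points for now,

    std::wstring s2ws(const std::string& s)
    {
        return std::wstring(s.begin(), s.end());
    }

    would be a better way to write it. It is more concise and easier for the compiler to optimize (RVO). Also, because the iterator/const_iterator of std::string is a random access iterator, the constructor can size the buffer once and there are no repeated allocations. It also reduces the original n copy-constructions plus n assignments to a single pass over the n elements.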
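
A small sketch of the range-constructor approach in both directions, with the names `s2ws_range` and `ws2s_range` chosen here for illustration. It is still an element-by-element copy, so it is only meaningful for 7-bit ASCII: each byte of a multi-byte sequence ends up in its own wchar_t.

```cpp
#include <string>

// Element-wise widening via the iterator-range constructor.
// Every char becomes exactly one wchar_t; no decoding takes place.
std::wstring s2ws_range(const std::string& s)
{
    return std::wstring(s.begin(), s.end());
}

// Element-wise narrowing; values that do not fit in a char are truncated.
std::string ws2s_range(const std::wstring& s)
{
    return std::string(s.begin(), s.end());
}
```

Passing the two-byte UTF-8 sequence "\xC3\xA9" (one character) to `s2ws_range` yields a wide string of length 2, which shows why a real decoding step is needed for anything beyond ASCII.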

  4. fr3@K on 2008/09/03 at 4:27 PM

    My reply got eaten, so I posted it on my own site instead; please see it there.

  5. clsung on 2008/09/03 at 4:31 PM

    Rescued it :)

  6. augustinus on 2008/09/05 at 9:36 AM

    I usually use ios::widen instead.
    And even though Win32 wchar_t holds UTF-16LE, the conversion will never be correct without a facet-based conversion of code points.
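
As a rough sketch of the widen approach: `std::basic_ios::widen` forwards to the `std::ctype` facet of the stream's locale, so the same facet can be used directly on a whole string. The function name `widen_with_ctype` and the default of the classic "C" locale are assumptions made for this example. Like the plain copy, this maps one char to one wchar_t and does not decode multi-byte sequences.

```cpp
#include <locale>
#include <string>

// Widen a narrow string through the std::ctype<wchar_t> facet of a locale.
// Hypothetical helper; defaults to the classic "C" locale.
std::wstring widen_with_ctype(const std::string& s,
                              const std::locale& loc = std::locale::classic())
{
    std::wstring out(s.size(), L'\0');
    if (!s.empty()) {
        const std::ctype<wchar_t>& facet =
            std::use_facet<std::ctype<wchar_t> >(loc);
        // widen(begin, end, dest) converts the whole range in one call.
        facet.widen(s.data(), s.data() + s.size(), &out[0]);
    }
    return out;
}
```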

  7. augustinus on 2008/09/05 at 9:42 AM

    Oh, sorry, s/facet/codecvt/g actually.

    Since C++ codecvt doesn't have a standard implementation (only Boost does), one of the best references is glib::ustring. However, since glib's license may not be easy to adopt, UTF8-CPP is still your good friend.

  8. jeffhung on 2008/09/12 at 5:28 PM

    Why not use mbstowcs() and wcstombs() instead, which are more accurate?! Especially considering multi-byte encodings such as Shift-JIS, which has state information embedded within the byte stream. If you prefer character-to-character conversion, please use mbrtowc() and wctomb(). The mbrtowc() function can be replaced by mbtowc() if we're sure that the multi-byte encoding has no embedded state information.
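
A minimal sketch of the character-by-character route with mbrtowc(), assuming the caller has already selected the desired locale with std::setlocale (in the default "C" locale only ASCII is guaranteed to convert). The name `mb_to_wide` is hypothetical.

```cpp
#include <cstddef>
#include <cwchar>
#include <stdexcept>
#include <string>

// Convert a multi-byte string to a wide string one character at a time.
// The std::mbstate_t object carries shift state between calls.
std::wstring mb_to_wide(const std::string& s)
{
    std::wstring out;
    std::mbstate_t state = std::mbstate_t();  // initial conversion state
    const char* p = s.data();
    std::size_t left = s.size();

    while (left > 0) {
        wchar_t wc;
        std::size_t n = std::mbrtowc(&wc, p, left, &state);
        if (n == static_cast<std::size_t>(-1) ||
            n == static_cast<std::size_t>(-2)) {
            // -1: invalid sequence, -2: sequence cut off at the end of input.
            throw std::runtime_error("invalid or truncated multi-byte sequence");
        }
        if (n == 0) {
            n = 1;  // embedded '\0'; assumed here to occupy a single byte
        }
        out.push_back(wc);
        p += n;
        left -= n;
    }
    return out;
}
```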

  9. jeffhung on 2008/09/12 at 5:30 PM

    For variable-length C++ (w)strings, use mbstowcs() and wcstombs() to convert to a wide-character string "chunk-by-chunk".

    IMHO, C++ locale implementations are far from stable. We'd better use the C locale instead.

    (The above comments will also be posted to http://fsfoundry.org/codefreak/2008/09/03/re-string-tofrom-wstring-conversion/)
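
One way the "chunk-by-chunk" idea could look is sketched below. Note the substitution: it uses mbsrtowcs(), the restartable sibling of mbstowcs(), because it reports where conversion stopped and keeps the shift state between chunks. The name `mbs_to_wide_chunked` and the chunk size of 64 are arbitrary choices for this example, and the input is treated as a C string, so conversion stops at the first '\0'.

```cpp
#include <cstddef>
#include <cwchar>
#include <stdexcept>
#include <string>

// Convert a multi-byte string through a small fixed-size buffer.
std::wstring mbs_to_wide_chunked(const std::string& s)
{
    const std::size_t chunk = 64;  // arbitrary chunk size
    wchar_t buf[chunk];
    std::wstring out;
    std::mbstate_t state = std::mbstate_t();
    const char* src = s.c_str();

    // mbsrtowcs sets src to a null pointer once the terminator is converted.
    while (src != nullptr) {
        std::size_t n = std::mbsrtowcs(buf, &src, chunk, &state);
        if (n == static_cast<std::size_t>(-1)) {
            throw std::runtime_error("invalid multi-byte sequence");
        }
        out.append(buf, n);  // n excludes the terminating L'\0'
    }
    return out;
}
```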

  10. augustinus on 2008/09/12 at 7:13 PM

    About mbstowcs/wcstombs, my 2 cents are:

    – Although they seem to be portable because of , buffer overflow/underflow problems still lurk in them, especially on Win32.
    – The Win32-specific versions, MultiByteToWideChar/WideCharToMultiByte, may be no safer against overflow/underflow attacks; however, they have a special (strange) usage that computes the required buffer size when passed a null pointer.
    – For both security and portability, I would still suggest UTF8-CPP and sticking to UTF-8/UTF-16; converting all other native encodings beforehand is encouraged.
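
The size-query pattern described for MultiByteToWideChar has a close analogue in mbstowcs(): POSIX and the Microsoft C runtime both document that a null destination makes it return the number of wide characters required. A hedged sketch under that assumption, with the hypothetical name `mbs_to_wide_sized`, which sizes the buffer first to avoid the overflow risk:

```cpp
#include <cstddef>
#include <cstdlib>
#include <stdexcept>
#include <string>
#include <vector>

// Two-pass conversion: ask for the required length, then convert.
std::wstring mbs_to_wide_sized(const std::string& s)
{
    const std::size_t failed = static_cast<std::size_t>(-1);

    // Pass 1: a null destination only counts the wide characters needed.
    std::size_t needed = std::mbstowcs(nullptr, s.c_str(), 0);
    if (needed == failed) {
        throw std::runtime_error("invalid multi-byte sequence");
    }

    // Pass 2: convert into a buffer with room for the terminating L'\0'.
    std::vector<wchar_t> buf(needed + 1);
    std::size_t written = std::mbstowcs(&buf[0], s.c_str(), buf.size());
    if (written == failed) {
        throw std::runtime_error("invalid multi-byte sequence");
    }
    return std::wstring(&buf[0], written);
}
```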

  11. augustinus on 2008/09/12 at 7:14 PM

    Hmm, <stdlib> was missing from my previous comment.

  12. augustinus on 2008/09/12 at 7:14 PM

    <cstdlib> actually.

  13. fr3@K on 2008/09/14 at 5:23 PM

    Another follow-up post.

  14. jeffhung on 2008/09/16 at 1:54 PM

    Actually, for mbs/wcs/u8s/gcs/tcs strings, I have all any2any() functions implemented, on top of libiconv, even on Windows. :-p

    Where…

    mbs stands for multi-byte string,
    wcs stands for wide-character string,
    u8s stands for UTF-8 string,
    gcs stands for generic-character string, whose actual character type comes from a template parameter,
    and tcs stands for generic-mapping-text string with the TCHAR character type.

    On Windows, the right libiconv version for Windows is required.
