Hi! I’m clsung

clsung’s blog site, or you can call me AlanSung

string to/from wstring conversion

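#include <algorithm>  // std::copy
#include <string>

// Widens each char to one wchar_t; only correct for plain ASCII input.
std::wstring s2ws(const std::string& s)
{
    std::wstring temp(s.length(), L' ');
    std::copy(s.begin(), s.end(), temp.begin());
    return temp;
}

// Narrows each wchar_t to one char; values outside the char range are truncated.
std::string ws2s(const std::wstring& s)
{
    std::string temp(s.length(), ' ');
    std::copy(s.begin(), s.end(), temp.begin());
    return temp;
}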

Update: an article provided by a kind reader

14 Responses to “string to/from wstring conversion”

  1. 1
    fr3@K:

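    On the Win32 platform, wchar_t is 16 bits wide, so a single wchar_t cannot always hold a Unicode code point, which can need up to 21 bits. std::wstring on Win32 is encoded as UTF-16, so some code points are represented by two elements.

    You can refer to a rough discussion I wrote a while ago.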
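
To make the point about two-element code points concrete, here is a minimal sketch of how a code point outside the BMP is split into two 16-bit units in UTF-16. The helper name encode_utf16 is hypothetical and does not come from the referenced discussion; it uses char16_t so the result does not depend on how wide wchar_t is on a given platform.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: encode one Unicode code point as UTF-16 code units.
std::u16string encode_utf16(char32_t cp)
{
    // Valid code points end at U+10FFFF and exclude the surrogate range.
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));

    std::u16string out;
    if (cp < 0x10000) {
        // BMP code point: a single 16-bit unit is enough.
        out.push_back(static_cast<char16_t>(cp));
    } else {
        // Supplementary code point: split the 20 bits of (cp - 0x10000)
        // into a high surrogate and a low surrogate.
        const char32_t v = cp - 0x10000;
        out.push_back(static_cast<char16_t>(0xD800 + (v >> 10)));
        out.push_back(static_cast<char16_t>(0xDC00 + (v & 0x3FF)));
    }
    return out;
}
```

For example, U+1F600 comes out as the two units 0xD83D and 0xDE00, which is why copying element by element cannot treat one wchar_t as one character on Win32.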

  2. 2
    clsung:

    Thanks!!

  3. 3
    fr3@K:

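    Allow me to be nosy once more… Setting aside support for non-BMP code points for now,

    std::wstring s2ws(const std::string& s)
    {
        return std::wstring(s.begin(), s.end());
    }

    would be a better way to write it. It is more concise and easier for the compiler to optimize (RVO). Because the iterator/const_iterator of std::string is a random access iterator, the constructor can size the buffer once, so there are no repeated allocations. It also reduces the original n copy-constructions plus n assignments to a single pass over the n elements.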
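
The range-constructor idea in the comment above can be sketched in both directions as follows. This is only an illustration under the same assumption as the original post: every element is converted on its own, so it is meaningful for plain ASCII text only. The round_trip_example function is a hypothetical usage helper.

```cpp
#include <cassert>
#include <string>

// Widen by constructing the result directly from the source range.
std::wstring s2ws(const std::string& s)
{
    return std::wstring(s.begin(), s.end());
}

// Narrow the same way; wchar_t values outside the char range are truncated.
std::string ws2s(const std::wstring& s)
{
    return std::string(s.begin(), s.end());
}

// Hypothetical usage: ASCII text survives a round trip unchanged.
void round_trip_example()
{
    const std::string original = "hello";
    assert(ws2s(s2ws(original)) == original);
}
```

Some compilers warn about the narrowing conversion in ws2s, which is a fair reminder that the function silently loses information for anything outside ASCII.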

  4. 4
    fr3@K:

    My reply got eaten, so I posted it on my own site instead; please have a look.

  5. 5
    clsung:

    Rescued it :)

  6. 6
    augustinus:

    I usually use ios::widen instead.
    And even though Win32 wchar_t can hold UTF-16LE, the conversion will never be correct without a facet conversion for code points.
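
A minimal sketch of the ios::widen approach mentioned above, assuming it refers to std::basic_ios<wchar_t>::widen, which maps one char to one wchar_t through the stream's locale. The function name widen_string and the helper widen_example are hypothetical. As the comment itself points out, this is still a per-character mapping and does not decode multi-byte input.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: widen a narrow string one char at a time.
std::wstring widen_string(const std::string& s)
{
    // The stream is used only for its locale; widen() consults the
    // ctype<wchar_t> facet of that locale.
    std::wostringstream os;
    std::wstring out;
    out.reserve(s.size());
    for (std::string::const_iterator it = s.begin(); it != s.end(); ++it)
        out.push_back(os.widen(*it));  // basic_ios<wchar_t>::widen(char)
    return out;
}

// Hypothetical usage with ASCII input.
void widen_example()
{
    assert(widen_string("abc") == L"abc");
}
```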

  7. 7
    augustinus:

    Oh, sorry, s/facet/codecvt/g actually.

    Since C++ codecvt doesn’t have a standard implementation (only Boost does), one of the best references is glib::ustring. However, since glib’s license may not be easy to adopt, UTF8-CPP is still your good friend.

  8. 8
    jeffhung:

    Why not use mbstowcs() and wcstombs() instead, which are more accurate?! Especially considering some multi-byte encodings such as Shift-JIS, which has state information embedded within the byte flow. If you prefer character-to-character conversion, please use mbrtowc() and wctomb(). The mbrtowc() function can be replaced by mbtowc() if we’re sure that the multi-byte encoding has no embedded state information.

  9. 9
    jeffhung:

    For variable-length C++ (w)strings, use mbstowcs() and wcstombs() to convert to a wide-character string “chunk-by-chunk”.

    IMHO, C++ locale implementations are far from stable. We’d better use the C locale instead.

    (the above comments will also be posted to http://fsfoundry.org/codefreak/2008/09/03/re-string-tofrom-wstring-conversion/)
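
A minimal sketch of the mbstowcs()/wcstombs() suggestion, wrapped for std::string and std::wstring. The names mbs2ws, ws2mbs and c_locale_example are hypothetical. For simplicity it converts the whole string in one call rather than chunk by chunk, stops at the first embedded null, and uses whatever C locale is currently active, so non-ASCII input needs a prior call such as std::setlocale(LC_ALL, "").

```cpp
#include <cassert>
#include <cstdlib>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: multi-byte string to wide string via the C locale.
std::wstring mbs2ws(const std::string& s)
{
    // Every wide character consumes at least one byte, so s.size() + 1
    // elements (including the terminator) are always enough.
    std::vector<wchar_t> buf(s.size() + 1);
    const std::size_t n = std::mbstowcs(buf.data(), s.c_str(), buf.size());
    if (n == static_cast<std::size_t>(-1))
        throw std::runtime_error("invalid multi-byte sequence");
    return std::wstring(buf.data(), n);
}

// Hypothetical helper: wide string to multi-byte string via the C locale.
std::string ws2mbs(const std::wstring& s)
{
    // Each wide character needs at most MB_CUR_MAX bytes.
    std::vector<char> buf(s.size() * MB_CUR_MAX + 1);
    const std::size_t n = std::wcstombs(buf.data(), s.c_str(), buf.size());
    if (n == static_cast<std::size_t>(-1))
        throw std::runtime_error("unrepresentable wide character");
    return std::string(buf.data(), n);
}

// Hypothetical usage: ASCII text works even in the default "C" locale.
void c_locale_example()
{
    assert(ws2mbs(mbs2ws("hello")) == "hello");
}
```

Sizing the buffers from the input length up front is one way to avoid the overflow risk that comes with passing a fixed-size buffer to these functions.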

  10. 10
    augustinus:

    About mbstowcs/wcstombs, my 2 cents are:

    - Although they seem to be portable because of , buffer overflow/underflow problems still exist in them, especially on Win32.
    - The Win32-specific versions MultiByteToWideChar/WideCharToMultiByte may not be safer against overflow/underflow attacks; however, they have a special (strange) usage for computing the required buffer size by passing a null pointer.
    - For both security and portability, I still suggest UTF8-CPP and sticking to UTF-8/UTF-16; converting all other native encodings beforehand is encouraged.
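
To illustrate what sticking to UTF-8 with checked conversion involves, here is a minimal hand-written sketch of a validating UTF-8 decoder. It is not the UTF8-CPP API; the names utf8_to_utf32 and decode_example are hypothetical, and a real project would more likely rely on the library the comment recommends.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: decode UTF-8 into code points, throwing on bad input.
std::u32string utf8_to_utf32(const std::string& in)
{
    std::u32string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char b0 = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t len;
        // The lead byte determines the sequence length and the first bits.
        if (b0 < 0x80)                { cp = b0;        len = 1; }
        else if ((b0 & 0xE0) == 0xC0) { cp = b0 & 0x1F; len = 2; }
        else if ((b0 & 0xF0) == 0xE0) { cp = b0 & 0x0F; len = 3; }
        else if ((b0 & 0xF8) == 0xF0) { cp = b0 & 0x07; len = 4; }
        else throw std::runtime_error("invalid UTF-8 lead byte");

        // Bounds check before reading the continuation bytes.
        if (in.size() - i < len)
            throw std::runtime_error("truncated UTF-8 sequence");

        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(in[i + k]);
            if ((b & 0xC0) != 0x80)
                throw std::runtime_error("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (b & 0x3F);
        }

        // Reject overlong forms, surrogates, and values past U+10FFFF.
        static const char32_t min_cp[5] = {0, 0, 0x80, 0x800, 0x10000};
        if (cp < min_cp[len] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::runtime_error("invalid code point in UTF-8 input");

        out.push_back(cp);
        i += len;
    }
    return out;
}

// Hypothetical usage: two bytes of UTF-8 decode to the single code point U+00E9.
void decode_example()
{
    const std::u32string r = utf8_to_utf32("\xC3\xA9");
    assert(r.size() == 1 && r[0] == 0xE9);
}
```

Unlike the element-by-element copy in the original post, a decoder like this reports malformed input instead of silently producing wrong characters.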

  11. 11
    augustinus:

    Hmm, <stdlib> was missing from my previous comment.

  12. 12
    augustinus:

    <cstdlib> actually.

  13. 13
    fr3@K:

    Another follow-up post.

  14. 14
    jeffhung:

    Actually, for mbs/wcs/u8s/gcs/tcs strings, I have all any2any() functions implemented, on top of libiconv, even on Windows. :-p

    Where…

    mbs stands for multi-byte string,
    wcs stands for wide-character string,
    u8s stands for UTF-8 string,
    gcs stands for generic-character string, whose actual character type comes from a template parameter,
    and tcs stands for generic-text-mapping string with TCHAR character type.

    On Windows, the right libiconv version for Windows is required.

Creative Commons Attribution-NonCommercial-ShareAlike 2.5 Taiwan