问题是如何将wstring转换为字符串?

我还有一个例子:

#include <string>
#include <iostream>

int main()
{
    std::wstring ws = L"Hello";
    std::string s( ws.begin(), ws.end() );

  //std::cout <<"std::string =     "<<s<<std::endl;
    std::wcout<<"std::wstring =    "<<ws<<std::endl;
    std::cout <<"std::string =     "<<s<<std::endl;
}

带注释的输出为:

std::string =     Hello
std::wstring =    Hello
std::string =     Hello

但是without只是:

std::wstring =    Hello

这个例子中有什么问题吗?我可以像上面那样进行转换吗?

EDIT

新例子(考虑到一些答案)是

#include <string>
#include <iostream>
#include <sstream>
#include <locale>

int main()
{
    setlocale(LC_CTYPE, "");

    const std::wstring ws = L"Hello";
    const std::string s( ws.begin(), ws.end() );

    std::cout<<"std::string =     "<<s<<std::endl;
    std::wcout<<"std::wstring =    "<<ws<<std::endl;

    std::stringstream ss;
    ss << ws.c_str();
    std::cout<<"std::stringstream =     "<<ss.str()<<std::endl;
}

输出结果为:

std::string =     Hello
std::wstring =    Hello
std::stringstream =     0x860283c

因此,不能使用stringstream将wstring转换为string。


当前回答

我花了很多悲伤的日子,试图为c++ 17找到一种方法来做到这一点,它已经弃用了code_cvt facet,这是我通过组合来自几个不同来源的代码所能想到的最好的方法:

setlocale( LC_ALL, "en_US.UTF-8" ); //Invoked in main()

std::string wideToMultiByte( std::wstring const & wideString )
{
     std::string ret;
     std::string buff( MB_CUR_MAX, '\0' );

     for ( wchar_t const & wc : wideString )
     {
         int mbCharLen = std::wctomb( &buff[ 0 ], wc );

         if ( mbCharLen < 1 ) { break; }

         for ( int i = 0; i < mbCharLen; ++i ) 
         { 
             ret += buff[ i ]; 
         }
     }

     return ret;
 }

 std::wstring multiByteToWide( std::string const & multiByteString )
 {
     std::wstring ws( multiByteString.size(), L' ' );
     ws.resize( 
         std::mbstowcs( &ws[ 0 ], 
             multiByteString.c_str(), 
             multiByteString.size() ) );

     return ws;
 }

我在Windows 10上测试了这段代码,至少就我的目的而言,它似乎运行良好。如果这没有考虑到你可能需要处理的一些疯狂的边缘情况,请不要对我进行私刑,我相信有更多经验的人可以改进这一点!: -)

此外,在该表扬的地方表扬:

适用于wideToMultiByte()

复制multiByteToWide

其他回答

下面是一个基于其他建议的解决方案:

#include <string>
#include <iostream>
#include <clocale>
#include <locale>
#include <vector>

int main() {
  std::setlocale(LC_ALL, "");
  const std::wstring ws = L"ħëłlö";
  const std::locale locale("");
  typedef std::codecvt<wchar_t, char, std::mbstate_t> converter_type;
  const converter_type& converter = std::use_facet<converter_type>(locale);
  std::vector<char> to(ws.length() * converter.max_length());
  std::mbstate_t state;
  const wchar_t* from_next;
  char* to_next;
  const converter_type::result result = converter.out(state, ws.data(), ws.data() + ws.length(), from_next, &to[0], &to[0] + to.size(), to_next);
  if (result == converter_type::ok or result == converter_type::noconv) {
    const std::string s(&to[0], to_next);
    std::cout <<"std::string =     "<<s<<std::endl;
  }
}

这通常适用于Linux,但会在Windows上产生问题。

这个解决方案的灵感来自dk123的解决方案,但是使用了一个依赖于地区的codecvt facet。结果是区域编码的字符串而不是UTF-8(如果它没有设置为区域设置):

std::string w2s(const std::wstring &var)
{
   static std::locale loc("");
   auto &facet = std::use_facet<std::codecvt<wchar_t, char, std::mbstate_t>>(loc);
   return std::wstring_convert<std::remove_reference<decltype(facet)>::type, wchar_t>(&facet).to_bytes(var);
}

std::wstring s2w(const std::string &var)
{
   static std::locale loc("");
   auto &facet = std::use_facet<std::codecvt<wchar_t, char, std::mbstate_t>>(loc);
   return std::wstring_convert<std::remove_reference<decltype(facet)>::type, wchar_t>(&facet).from_bytes(var);
}

我一直在找,但我找不到。最后,我发现我可以从std::locale使用std::use_facet()函数与正确的typename获得正确的facet。希望这能有所帮助。

除了转换类型之外,还应该注意字符串的实际格式。

当编译多字节字符集Visual Studio和Win API时,假设UTF8(实际上是windows编码,即windows -28591)。 当为Unicode字符集Visual studio和Win API编译时,假设UTF16。

因此,您必须将字符串从UTF16转换为UTF8格式,而不仅仅是转换为std::string。 当使用多字符格式(如一些非拉丁语言)时,这将是必要的。

其思想是确定std::wstring总是表示UTF16。 std::string总是表示UTF8。

这不是由编译器强制执行的,这是一个更好的策略。 注意我用来定义UTF16 (L)和UTF8 (u8)的字符串前缀。

要在两种类型之间进行转换,您应该使用:std::codecvt_utf8_utf16< wchar_t>

#include <string>

#include <codecvt>

int main()
{

    std::string original8 = u8"הלו";

    std::wstring original16 = L"הלו";

    //C++11 format converter
    std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> convert;

    //convert to UTF8 and std::string
    std::string utf8NativeString = convert.to_bytes(original16);

    std::wstring utf16NativeString = convert.from_bytes(original8);

    assert(utf8NativeString == original8);
    assert(utf16NativeString == original16);

    return 0;
}

如果你正在处理文件路径(当我发现需要wstring-to-string时,我经常这样做),你可以使用filesystem::path(自c++ 17以来):

#include <filesystem>

const std::wstring wPath = GetPath(); // some function that returns wstring
const std::string path = std::filesystem::path(wPath).string();

代码有两个问题:

The conversion in const std::string s( ws.begin(), ws.end() ); is not required to correctly map the wide characters to their narrow counterpart. Most likely, each wide character will just be typecast to char. The resolution to this problem is already given in the answer by kem and involves the narrow function of the locale's ctype facet. You are writing output to both std::cout and std::wcout in the same program. Both cout and wcout are associated with the same stream (stdout) and the results of using the same stream both as a byte-oriented stream (as cout does) and a wide-oriented stream (as wcout does) are not defined. The best option is to avoid mixing narrow and wide output to the same (underlying) stream. For stdout/cout/wcout, you can try switching the orientation of stdout when switching between wide and narrow output (or vice versa): #include <iostream> #include <stdio.h> #include <wchar.h> int main() { std::cout << "narrow" << std::endl; fwide(stdout, 1); // switch to wide std::wcout << L"wide" << std::endl; fwide(stdout, -1); // switch to narrow std::cout << "narrow" << std::endl; fwide(stdout, 1); // switch to wide std::wcout << L"wide" << std::endl; }