When browsing the C++ Standard Library for a way to convert UTF-8 text to C wide-character text (wchar_t[]) and vice versa, one is surprised to find that there is no built-in solution for such a common problem. It seems one has to resort to the services of the operating system and write non-portable code, e.g. with the Objective-C runtime:
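NSString* intrnl = [NSString stringWithContentsOfFile:path
                                             encoding:NSUTF8StringEncoding
                                                error:&e];
// Keep the converted bytes in a variable so that both pointer and length come from the same object.
NSData* asData = [intrnl dataUsingEncoding:NSUTF32StringEncoding];
std::wstring wideTxt((const wchar_t*)[asData bytes],
                     [asData length] / sizeof(wchar_t));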
Surprisingly, a web search does not reveal many lightweight and elegant alternatives either. Some, like UtfConverter, require two separate copies of the same text, one in UTF-8 and one as wide characters; some, like utf8::ostream, work in only one direction; and others, like Poco::UnicodeConverter, stop at UTF-16, which is the wide-character encoding on Windows.
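To make the UTF-16 remark concrete, here is a minimal sketch in plain C++ of what a UTF-8 to wide-character conversion has to do. The helper names appendCodePoint and utf8ToWide are made up for this illustration and are not part of Boost or any of the libraries mentioned above. The sketch assumes that a 16-bit wchar_t means UTF-16 and a wider one means UTF-32, and it does not reject overlong encodings or encoded surrogates, so treat it as an illustration rather than a validating decoder.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: append one Unicode code point to a wide string.
inline void appendCodePoint(std::wstring& out, char32_t cp)
{
    assert(cp <= 0x10FFFF);
    if (sizeof(wchar_t) >= 4 || cp < 0x10000) {
        // Fits into a single wide character.
        out.push_back(static_cast<wchar_t>(cp));
    } else {
        // 16-bit wchar_t (e.g. Windows): encode as a UTF-16 surrogate pair.
        cp -= 0x10000;
        out.push_back(static_cast<wchar_t>(0xD800 + (cp >> 10)));
        out.push_back(static_cast<wchar_t>(0xDC00 + (cp & 0x3FF)));
    }
}

// Hypothetical helper: decode a UTF-8 byte string into a wide string.
// Throws std::runtime_error on malformed or truncated input.
inline std::wstring utf8ToWide(const std::string& in)
{
    std::wstring out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i++]);
        char32_t cp = 0;
        int extra = 0; // number of continuation bytes that must follow
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::runtime_error("invalid UTF-8 lead byte");

        for (; extra > 0; --extra) {
            if (i >= in.size())
                throw std::runtime_error("truncated UTF-8 sequence");
            const unsigned char c = static_cast<unsigned char>(in[i++]);
            if ((c & 0xC0) != 0x80)
                throw std::runtime_error("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F); // each continuation byte carries 6 bits
        }
        if (cp > 0x10FFFF)
            throw std::runtime_error("code point out of Unicode range");
        appendCodePoint(out, cp);
    }
    return out;
}
```

Calling utf8ToWide("\xE2\x82\xAC") yields a one-character wide string holding U+20AC on every platform, whereas a character outside the Basic Multilingual Plane occupies one wchar_t where wchar_t is 32 bits wide and two where it is 16 bits wide. A converter that stops at UTF-16 therefore only covers the second case.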
With the Boost C++ libraries, however, UTF-8 conversion takes just a single line of code after creating your input or output stream, and you rely on a thoroughly tested code base.
Regrettably, the conversion code is not header-only and thus requires at least one Boost library (e.g. serialization) to be built on your platform. If you think that this is overkill, you can just grab the libs/detail/utf8_codecvt_facet.cpp file and add it to the source files of your target. This is what this post is about.
I found that with Boost 1.44 the original Hello World example published by Paul Dixon did not work as advertised: the linker kept complaining about the missing vtable of the utf8_codecvt_facet object, which I found was caused by a missing definition of one of its methods. I guess that this in turn was the effect of non-matching namespaces in the header and implementation files, since the error went away when I removed all namespace macros from the original files as follows:
main.cpp:
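#include <iostream>
#include <fstream>
#include <locale>
#include <string>
#if defined LINK_BOOST_SERIALIZATION_LIB
// Variant 1: use the facet from the prebuilt Boost serialization library.
#include <boost/archive/detail/utf8_codecvt_facet.hpp>
#else
// Variant 2: use the local copy with the namespace macros removed.
#include "utf8_codecvt_facet.hpp"
#endif
using namespace std;
int main (int argc, char * const argv[])
{
    wifstream inFile("utf8.txt");
    // The locale takes ownership of the facet and deletes it when no longer needed.
#if defined LINK_BOOST_SERIALIZATION_LIB
    inFile.imbue(std::locale(std::locale(), new boost::archive::detail::utf8_codecvt_facet));
#else
    inFile.imbue(std::locale(std::locale(), new boost::utf8_codecvt_facet));
#endif
    // Read the first whitespace-delimited word and report its length in wide characters.
    wstring wideString;
    inFile >> wideString;
    wcout << L"wideString.length() = " << wideString.length() << endl;
    // Echo the rest of the file line by line.
    wstring line;
    while (getline(inFile, line)) {
        wcout << line << endl;
    }
    return 0;
}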
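The program above only reads. For the opposite direction, a facet imbued on an output stream has to turn every wide character back into one to four bytes. The following is a minimal sketch of that step in plain C++, not the Boost implementation; the helper names appendUtf8 and wideToUtf8 are invented for this illustration, and the code assumes that a 16-bit wchar_t holds UTF-16 and a wider one holds UTF-32.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: append the UTF-8 encoding of one code point.
inline void appendUtf8(std::string& out, char32_t cp)
{
    assert(cp <= 0x10FFFF);
    if (cp < 0x80) {
        // 1 byte: 0xxxxxxx
        out.push_back(static_cast<char>(cp));
    } else if (cp < 0x800) {
        // 2 bytes: 110xxxxx 10xxxxxx
        out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else if (cp < 0x10000) {
        // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else {
        // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    }
}

// Hypothetical helper: encode a wide string as UTF-8.
inline std::string wideToUtf8(const std::wstring& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = static_cast<char32_t>(in[i]);
        // With a 16-bit wchar_t, combine a surrogate pair into one code point.
        if (sizeof(wchar_t) == 2 && cp >= 0xD800 && cp <= 0xDBFF
            && i + 1 < in.size()) {
            const char32_t low = static_cast<char32_t>(in[i + 1]);
            if (low >= 0xDC00 && low <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
                ++i; // the low surrogate has been consumed
            }
        }
        appendUtf8(out, cp);
    }
    return out;
}
```

For instance, wideToUtf8(L"abc") returns the same three ASCII bytes, while a wide string holding the single character U+20AC comes back as the three bytes E2 82 AC. With the facet in place none of this has to be written by hand: imbuing a wofstream in the same way as the wifstream above should be all that is needed.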
utf8_codecvt_facet.hpp:
utf8_codecvt_facet.cpp:

