Standard C++ IOStreams and Locales: Advanced Programmer's Guide and Reference

Author: Angelika Langer, Klaus Kreft

Comments

by anonymous   2017-08-20

Personally I would go with this answer, but it might be possible to use a bit of streambuf magic to do this as the text is written to the stream. If you're really interested in doing this though, please take a look at Standard C++ IOStreams and Locales by Langer and Kreft, it's the bible of iostreams.

The following assumes that everything written to the buffer is to be translated, and that each full line can be translated completely:

std::string xgettext (std::string const & s)
{
  return s;
}

The following transbuf class overrides the "overflow" function and translates the buffer every time it sees a newline.

class transbuf : public std::streambuf {
public:
  transbuf (std::streambuf * realsb) : std::streambuf (), m_realsb (realsb)
    , m_buf () {}

  ~transbuf () {
    // ... flush  m_buf if necessary
  }

  virtual std::streambuf::int_type overflow (std::streambuf::int_type c) {
    // eof () means "nothing to store": report success without buffering it.
    if (traits_type::eq_int_type (c, traits_type::eof ()))
      return traits_type::not_eof (c);
    m_buf.push_back (traits_type::to_char_type (c));
    if (c == '\n') {
      // We have a complete line, translate it and write it to our stream:
      std::string transtext = xgettext (m_buf);
      for (std::string::const_iterator i = transtext.begin ()
        ; i != transtext.end ()
        ; ++i) {
        m_realsb->sputc (*i);
        // ... check that sputc did not return traits_type::eof ()...
      }
      m_buf.clear ();
    }
    return c;
  }

  std::streambuf * get () { return m_realsb; }

  // data
private:
  std::streambuf * m_realsb;
  std::string m_buf;
};

And here's an example of how that might be used:

int main ()
{
  // Automatic storage: buf is declared first, so it outlives trans.
  transbuf buf (std::cout.rdbuf ());
  std::ostream trans (&buf);

  trans << "Hello";  // Added to m_buf
  trans << " World"; // Added to m_buf
  trans << "\n";     // Causes m_buf to be written

  trans << "Added to buffer\neach new line causes\n"
           "the string to be translated\nand written" << std::endl;
}
by litb   2017-08-20

Beginner

Introductory, no previous programming experience

Introductory, with previous programming experience

* Not to be confused with C++ Primer Plus (Stephen Prata), with a significantly less favorable review.

Best practices


Intermediate


Advanced


Reference Style - All Levels

C++11/14 References:

  • The C++ Standard (INCITS/ISO/IEC 14882-2011): this, of course, is the final arbiter of all that is or isn't C++. Be aware, however, that it is intended purely as a reference for experienced users willing to devote considerable time and effort to its understanding. As usual, the first release was quite expensive ($300+ US), but it has now been released in electronic form for $60 US.

  • The C++14 standard is available, but seemingly not in an economical form – directly from the ISO it costs 198 Swiss Francs (about $200 US). For most people, the final draft before standardization is more than adequate (and free). Many will prefer an even newer draft, documenting new features that are likely to be included in C++17.

  • Overview of the New C++ (C++11/14) (PDF only) (Scott Meyers) (updated for C++1y/C++14) These are the presentation materials (slides and some lecture notes) of a three-day training course offered by Scott Meyers, who's a highly respected author on C++. Even though the list of items is short, the quality is high.

  • The C++ Core Guidelines (C++11/14/17/…) (edited by Bjarne Stroustrup and Herb Sutter) is an evolving online document consisting of a set of guidelines for using modern C++ well. The guidelines are focused on relatively higher-level issues, such as interfaces, resource management, memory management and concurrency affecting application architecture and library design. The project was announced at CppCon'15 by Bjarne Stroustrup and others and welcomes contributions from the community. Most guidelines are supplemented with a rationale and examples as well as discussions of possible tool support. Many rules are designed specifically to be automatically checkable by static analysis tools.

  • The C++ Super-FAQ (Marshall Cline, Bjarne Stroustrup and others) is an effort by the Standard C++ Foundation to unify the C++ FAQs previously maintained individually by Marshall Cline and Bjarne Stroustrup and also incorporating new contributions. The items mostly address issues at an intermediate level and are often written with a humorous tone. Not all items might be fully up to date with the latest edition of the C++ standard yet.

  • cppreference.com (C++03/11/14/17/…) (initiated by Nate Kohl) is a wiki that summarizes the basic core-language features and has extensive documentation of the C++ standard library. The documentation is very precise but is easier to read than the official standard document and provides better navigation due to its wiki nature. The project documents all versions of the C++ standard and the site allows filtering the display for a specific version. The project was presented by Nate Kohl at CppCon'14.


Classics / Older

Note: Some information contained within these books may not be up-to-date or no longer considered best practice.

by anonymous   2017-08-20

I don't understand exactly what you're trying to accomplish here. User code shouldn't inherit from the streams themselves, as the streams are intended to provide a generalized, locale-specific conversion/"stringizing" facility. If you're trying to use an ostream which can write to a new buffer location (e.g. a gzip stream), then you should generally inherit from basic_streambuf instead, which lets you use the existing iostream conversion facilities while redirecting their input/output.
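As a rough illustration of the basic_streambuf approach, here is a minimal sketch of a sink that collects output in a std::string. The names string_sink and format_via_sink are made up for this example; a real gzip buffer would compress the bytes instead of simply storing them.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <streambuf>
#include <string>

// Hypothetical sink: stores every character written to it in a std::string.
class string_sink : public std::streambuf {
public:
  const std::string& data() const { return data_; }

protected:
  // No put area is set up, so single-character writes end up here.
  int_type overflow(int_type c) override {
    if (!traits_type::eq_int_type(c, traits_type::eof()))
      data_.push_back(traits_type::to_char_type(c));
    return traits_type::not_eof(c);  // anything but eof() signals success
  }

  // Bulk writes arrive here; append them in one step.
  std::streamsize xsputn(const char* s, std::streamsize n) override {
    assert(n >= 0);
    data_.append(s, static_cast<std::size_t>(n));
    return n;
  }

private:
  std::string data_;
};

// The ostream still does all the formatting; only the destination changes.
std::string format_via_sink(int value) {
  string_sink sink;
  std::ostream out(&sink);
  out << "value=" << value;
  return sink.data();
}
```

The point of the design is that operator<< and the locale machinery are untouched: only the two virtual functions decide where the bytes go.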

If you want to learn the ins and outs of how iostream itself operates, the best book I've heard of on the subject is Standard C++ IOStreams and Locales by Angelika Langer and Klaus Kreft. I can't vouch for the book myself because I have yet to get my copy (it is next on my list), but you can find several recommendations for it here on Stack Overflow.

You also probably want to take a peek at boost::iostreams, which provides some helpers for anyone wishing to customize the behavior of the iostream system.

by anonymous   2017-08-20

C++ supports character encodings by means of std::locale and the facet std::codecvt. The general idea is that a locale object describes the aspects of the system that might vary from culture to culture and from (human) language to language. These aspects are broken down into facets, which are objects held by the locale (instances of classes derived from std::locale::facet) that define how localization-dependent objects (including I/O streams) behave. When you read from an istream or write to an ostream, the actual reading or writing of each character is filtered through the locale's facets. The facets cover not only encoding of Unicode types but also such varied features as how large numbers are written (e.g. with commas or periods), currency, time, capitalization, and a slew of other details.
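To make the facet idea a bit more concrete, here is a small sketch that installs a custom std::numpunct facet so that large numbers are written with commas. The names comma_grouping, format_with_grouping and grouping_example are invented for this illustration.

```cpp
#include <cassert>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical facet: group digits in threes, separated by commas.
class comma_grouping : public std::numpunct<char> {
protected:
  char do_thousands_sep() const override { return ','; }
  std::string do_grouping() const override { return "\3"; }
};

// Format a number through a stream whose locale carries the facet above.
std::string format_with_grouping(long value) {
  std::ostringstream out;
  // The locale takes ownership of the facet and deletes it when it is
  // no longer referenced.
  out.imbue(std::locale(std::locale::classic(), new comma_grouping));
  out << value;
  return out.str();
}

// Usage sketch: the stream consults the facet while formatting the number.
void grouping_example() {
  assert(format_with_grouping(1234567) == "1,234,567");
}
```

Nothing about operator<< changed here; the stream simply asked its locale for the numpunct facet and used whatever it found.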

However, just because the facilities exist to do encodings doesn't mean the standard library actually handles all encodings, nor does it make such code simple to get right. Even something as basic as the size of the character type you should be reading into (let alone the encoding part) is difficult, as wchar_t can be too small (mangling your data) or too large (wasting space), and the most common compilers differ on how big their implementation is: wchar_t is 16 bits in Visual C++ and typically 32 bits with GNU C++ on Linux. So you generally need to find external libraries to do the actual encoding.
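The wchar_t size problem can be checked at compile time. The sketch below is only an illustration: wchar_bits and wchar_units_needed are made-up names, and it assumes that a 16-bit wchar_t stores text as UTF-16, which is common but not guaranteed by the standard.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>

// Width of wchar_t on this implementation, in bits.
constexpr std::size_t wchar_bits = sizeof(wchar_t) * CHAR_BIT;

// char32_t, by contrast, is required to be at least 32 bits wide.
static_assert(sizeof(char32_t) * CHAR_BIT >= 32,
              "char32_t can hold any Unicode code point");

// Hypothetical helper: how many wchar_t units one code point needs,
// assuming a narrow wchar_t uses UTF-16 (an assumption, not a guarantee).
std::size_t wchar_units_needed(char32_t code_point) {
  assert(code_point <= 0x10FFFF);      // largest valid Unicode code point
  if (wchar_bits >= 21) return 1;      // 21 bits cover U+0000..U+10FFFF
  return code_point > 0xFFFF ? 2 : 1;  // surrogate pair outside the BMP
}
```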

  • iconv is generally acknowledged to be correct, but examples of how to bind it to the C++ mechanism are hard to find.
  • jla3ep mentions libICU, which is very thorough but the C++ API does not try to play nicely with the standard (As far as I can tell: you can scan the examples to see if you can do better.)
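For what it's worth, calling iconv directly from C++ is not hard, even if binding it to a codecvt facet is. The following is a minimal sketch that assumes a POSIX iconv (as in glibc) and wraps it in a plain function; convert_encoding is a hypothetical helper, not part of any library, and it does not attempt the facet binding mentioned above.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <iconv.h>
#include <stdexcept>
#include <string>

// Hypothetical helper: convert `input` from encoding `from` to encoding `to`.
std::string convert_encoding(const std::string& input,
                             const char* from, const char* to) {
  assert(from != nullptr && to != nullptr);
  const iconv_t cd = iconv_open(to, from);  // note the order: target first
  if (cd == (iconv_t)-1)                    // documented failure value
    throw std::runtime_error("iconv_open: unsupported conversion");

  std::string in = input;  // iconv takes a non-const char**
  char* src = &in[0];
  std::size_t src_left = in.size();
  std::string out;
  char chunk[256];

  while (src_left > 0) {
    char* dst = chunk;
    std::size_t dst_left = sizeof chunk;
    const std::size_t rc = iconv(cd, &src, &src_left, &dst, &dst_left);
    const int err = errno;  // save before anything else can change it
    out.append(chunk, sizeof chunk - dst_left);
    // E2BIG only means the chunk was full, so loop again; anything else
    // is invalid or incomplete input.
    if (rc == static_cast<std::size_t>(-1) && err != E2BIG) {
      iconv_close(cd);
      throw std::runtime_error("iconv: invalid or incomplete input");
    }
  }
  iconv_close(cd);
  return out;
}
```

A stateful target encoding would also need a final iconv call with a null input pointer to flush the shift state; that is left out here because UTF-8 and the ISO-8859 family are stateless.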

The most straightforward example I can find that covers all the bases is from Boost's UTF-8 codecvt facet, with an example that specifically tries to convert between UTF-8 and UCS-4 for use by IO streams. It looks like this, though I don't suggest just copying it verbatim. It takes a little more digging in the source to understand it (and I don't claim to):

typedef wchar_t ucs4_t;

std::locale old_locale;
std::locale utf8_locale(old_locale,new utf8_codecvt_facet<ucs4_t>);

...

std::wifstream input_file("data.utf8");
input_file.imbue(utf8_locale);
ucs4_t item = 0;
while (input_file >> item) { ... }
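To show what the UTF-8 to UCS-4 step amounts to, here is a hand-written decoder sketch. It is not how Boost's facet is implemented; decode_utf8 and decode_utf8_example are made-up names, and the function skips some validation (it does not reject overlong forms or surrogate code points).

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: decode a UTF-8 byte string into UCS-4 code points.
std::u32string decode_utf8(const std::string& bytes) {
  std::u32string out;
  std::size_t i = 0;
  while (i < bytes.size()) {
    const unsigned char lead = static_cast<unsigned char>(bytes[i]);
    char32_t cp = 0;
    std::size_t len = 0;
    // The lead byte tells us how long the sequence is.
    if (lead < 0x80)                { cp = lead;        len = 1; }
    else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; len = 2; }
    else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; len = 3; }
    else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; len = 4; }
    else throw std::invalid_argument("invalid UTF-8 lead byte");

    if (i + len > bytes.size())
      throw std::invalid_argument("truncated UTF-8 sequence");

    // Each continuation byte contributes its low six bits.
    for (std::size_t k = 1; k < len; ++k) {
      const unsigned char cont = static_cast<unsigned char>(bytes[i + k]);
      if ((cont & 0xC0) != 0x80)
        throw std::invalid_argument("invalid UTF-8 continuation byte");
      cp = (cp << 6) | (cont & 0x3F);
    }
    out.push_back(cp);
    i += len;
  }
  return out;
}

// Usage sketch: U+00C0 is stored in UTF-8 as the two bytes C3 80.
void decode_utf8_example() {
  assert(decode_utf8("\xC3\x80") == U"\u00C0");
}
```

Decoding into char32_t rather than wchar_t sidesteps the question of how wide wchar_t is on a given compiler.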

To understand more about locales, and how they use facets (including codecvt), take a look at the following:

  • Nathan Myers has a thorough explanation of locales and facets. Myers was one of the designers of the locale concept. He has more formal documentation if you want to wade through it.
  • Apache's Standard Library implementation (formerly RogueWave's) has a full list of facets.
  • Nicolai Josuttis' The C++ Standard Library Chapter 14 is devoted to the subject.
  • Angelika Langer and Klaus Kreft's Standard C++ IOStreams and Locales devotes a whole book.
by anonymous   2017-08-20

In addition to the standard-mandated encodings, C++ also supports an implementation-defined list of encodings via locales:

#include <locale>
#include <codecvt>
#include <iostream>
#include <string>

template <typename Facet>
struct usable_facet : Facet {
  using Facet::Facet;
};

using codecvt = usable_facet<std::codecvt_byname<wchar_t, char, std::mbstate_t>>;

int main() {
  std::wstring_convert<codecvt> convert(new codecvt(".1252")); // platform specific locale strings

  std::wstring w = convert.from_bytes("\u00C0");
}

Unfortunately, one of the things about wchar_t is that the standard mandates only that it use a fixed-width encoding for all locales, but there's no requirement that it use the same encoding in different locales, and so you can't portably convert to wchar_t using one locale and then convert that back to char using a different locale.

There is potentially some portable support for such conversions using functions like std::mbrtoc32 and related functions, but these are not yet widely implemented.

I understand that this can be done with a library such as iconv, but I am curious whether it can be done using only the C++ standard library. I ask this question not because I don't want to use iconv, but because I don't really understand how locales work in C++.

The locale library's design doesn't really lend itself to modern usage. C and C++ are themselves confused about encodings vs. character sets, and locales conflate lexical and orthographic issues with computational aspects such as encoding.

How locales work is a topic a bit broader than is suitable for a Stack Overflow answer, but there are books on the topic. You'd probably also need to read platform-specific materials, because the standard doesn't really give any context for much of the functionality. For example, the locale library supports message catalogues, but doesn't tell you what they are or how you'd actually make one, because that functionality is not standardized by C++.