Standard C++ IOStreams and Locales: Advanced Programmer's Guide and Reference

Author: Angelika Langer, Klaus Kreft
4.4
All Stack Overflow 9
This Year Stack Overflow 4
This Month Stack Overflow 4

Comments

by anonymous   2017-08-20

Personally I would go with this answer, but it might be possible to use a bit of streambuf magic to do this as the text is written to the stream. If you're really interested in doing this though, please take a look at Standard C++ IOStreams and Locales by Langer and Kreft, it's the bible of iostreams.

The following assumes that everything written to the buffer is to be translated, and that each full line can be translated completely:

std::string xgettext (std::string const & s)
{
  return s;
}

The following transbuf class overrides the "overflow" function and translates the buffer every time it sees a newline.

class transbuf : public std::streambuf {
public:
  transbuf (std::streambuf * realsb) : std::streambuf (), m_realsb (realsb)
    , m_buf () {}

  ~transbuf () {
    // ... flush  m_buf if necessary
  }

  virtual std::streambuf::int_type overflow (std::streambuf::int_type c) {
    m_buf.push_back (c);
    if (c == '\n') {
      // We have a complete line, translate it and write it to our stream:
      std::string transtext = xgettext (m_buf);
      for (std::string::const_iterator i = transtext.begin ()
        ; i != transtext.end ()
        ; ++i) {
        m_realsb->sputc (*i);
        // ... check that overflow returned the correct value...
      }
      m_buf = "";
    }
    return c;
  }    

  std::streambuf * get () { return m_realsb; }

  // data
private:
  std::streambuf * m_realsb;
  std::string m_buf;
};

And here's an example of how that might be used:

int main ()
{
  transbuf * buf = new transbuf (std::cout.rdbuf ());
  std::ostream trans (buf);

  trans << "Hello";  // Added to m_buf
  trans << " World"; // Added to m_buf
  trans << "\n";     // Causes m_buf to be written

  trans << "Added to buffer\neach new line causes\n"
           "the string to be translated\nand written" << std::endl;

  delete buf;
}    
by litb   2017-08-20

Beginner

Introductory, no previous programming experience

  • Programming: Principles and Practice Using C++ (Bjarne Stroustrup) (updated for C++11/C++14) An introduction to programming using C++ by the creator of the language. A good read, that assumes no previous programming experience, but is not only for beginners.

Introductory, with previous programming experience

  • C++ Primer * (Stanley Lippman, Josée Lajoie, and Barbara E. Moo) (updated for C++11) Coming at 1k pages, this is a very thorough introduction into C++ that covers just about everything in the language in a very accessible format and in great detail. The fifth edition (released August 16, 2012) covers C++11. [Review]

  • A Tour of C++ (Bjarne Stroustrup) (EBOOK) The “tour” is a quick (about 180 pages and 14 chapters) tutorial overview of all of standard C++ (language and standard library, and using C++11) at a moderately high level for people who already know C++ or at least are experienced programmers. This book is an extended version of the material that constitutes Chapters 2-5 of The C++ Programming Language, 4th edition.

  • Accelerated C++ (Andrew Koenig and Barbara Moo) This basically covers the same ground as the C++ Primer, but does so on a fourth of its space. This is largely because it does not attempt to be an introduction to programming, but an introduction to C++ for people who've previously programmed in some other language. It has a steeper learning curve, but, for those who can cope with this, it is a very compact introduction into the language. (Historically, it broke new ground by being the first beginner's book to use a modern approach at teaching the language.) [Review]

  • Thinking in C++ (Bruce Eckel) Two volumes; is a tutorial style free set of intro level books. Downloads: vol 1, vol 2. Unfortunately they’re marred by a number of trivial errors (e.g. maintaining that temporaries are automatically const), with no official errata list. A partial 3rd party errata list is available at (http://www.computersciencelab.com/Eckel.htm), but it’s apparently not maintained.

* Not to be confused with C++ Primer Plus (Stephen Prata), with a significantly less favorable review.

Best practices

  • Effective C++ (Scott Meyers) This was written with the aim of being the best second book C++ programmers should read, and it succeeded. Earlier editions were aimed at programmers coming from C, the third edition changes this and targets programmers coming from languages like Java. It presents ~50 easy-to-remember rules of thumb along with their rationale in a very accessible (and enjoyable) style. For C++11 and C++14 the examples and a few issues are outdated and Effective Modern C++ should be preferred. [Review]

  • Effective Modern C++ (Scott Meyers) This is basically the new version of Effective C++, aimed at C++ programmers making the transition from C++03 to C++11 and C++14.

  • Effective STL (Scott Meyers) This aims to do the same to the part of the standard library coming from the STL what Effective C++ did to the language as a whole: It presents rules of thumb along with their rationale. [Review]

Intermediate

  • More Effective C++ (Scott Meyers) Even more rules of thumb than Effective C++. Not as important as the ones in the first book, but still good to know.

  • Exceptional C++ (Herb Sutter) Presented as a set of puzzles, this has one of the best and thorough discussions of the proper resource management and exception safety in C++ through Resource Acquisition is Initialization (RAII) in addition to in-depth coverage of a variety of other topics including the pimpl idiom, name lookup, good class design, and the C++ memory model. [Review]

  • More Exceptional C++ (Herb Sutter) Covers additional exception safety topics not covered in Exceptional C++, in addition to discussion of effective object oriented programming in C++ and correct use of the STL. [Review]

  • Exceptional C++ Style (Herb Sutter) Discusses generic programming, optimization, and resource management; this book also has an excellent exposition of how to write modular code in C++ by using nonmember functions and the single responsibility principle. [Review]

  • C++ Coding Standards (Herb Sutter and Andrei Alexandrescu) “Coding standards” here doesn't mean “how many spaces should I indent my code?” This book contains 101 best practices, idioms, and common pitfalls that can help you to write correct, understandable, and efficient C++ code. [Review]

  • C++ Templates: The Complete Guide (David Vandevoorde and Nicolai M. Josuttis) This is the book about templates as they existed before C++11. It covers everything from the very basics to some of the most advanced template metaprogramming and explains every detail of how templates work (both conceptually and at how they are implemented) and discusses many common pitfalls. Has excellent summaries of the One Definition Rule (ODR) and overload resolution in the appendices. A second edition is scheduled for 2017. [Review]


Advanced

  • Modern C++ Design (Andrei Alexandrescu) A groundbreaking book on advanced generic programming techniques. Introduces policy-based design, type lists, and fundamental generic programming idioms then explains how many useful design patterns (including small object allocators, functors, factories, visitors, and multimethods) can be implemented efficiently, modularly, and cleanly using generic programming. [Review]

  • C++ Template Metaprogramming (David Abrahams and Aleksey Gurtovoy)

  • C++ Concurrency In Action (Anthony Williams) A book covering C++11 concurrency support including the thread library, the atomics library, the C++ memory model, locks and mutexes, as well as issues of designing and debugging multithreaded applications.

  • Advanced C++ Metaprogramming (Davide Di Gennaro) A pre-C++11 manual of TMP techniques, focused more on practice than theory. There are a ton of snippets in this book, some of which are made obsolete by typetraits, but the techniques, are nonetheless useful to know. If you can put up with the quirky formatting/editing, it is easier to read than Alexandrescu, and arguably, more rewarding. For more experienced developers, there is a good chance that you may pick up something about a dark corner of C++ (a quirk) that usually only comes about through extensive experience.


Reference Style - All Levels

  • The C++ Programming Language (Bjarne Stroustrup) (updated for C++11) The classic introduction to C++ by its creator. Written to parallel the classic K&R, this indeed reads very much alike it and covers just about everything from the core language to the standard library, to programming paradigms to the language's philosophy. [Review]

  • C++ Standard Library Tutorial and Reference (Nicolai Josuttis) (updated for C++11) The introduction and reference for the C++ Standard Library. The second edition (released on April 9, 2012) covers C++11. [Review]

  • The C++ IO Streams and Locales (Angelika Langer and Klaus Kreft) There's very little to say about this book except that, if you want to know anything about streams and locales, then this is the one place to find definitive answers. [Review]

C++11/14 References:

  • The C++ Standard (INCITS/ISO/IEC 14882-2011) This, of course, is the final arbiter of all that is or isn't C++. Be aware, however, that it is intended purely as a reference for experienced users willing to devote considerable time and effort to its understanding. As usual, the first release was quite expensive ($300+ US), but it has now been released in electronic form for $60US.

  • The C++14 standard is available, but seemingly not in an economical form – directly from the ISO it costs 198 Swiss Francs (about $200 US). For most people, the final draft before standardization is more than adequate (and free). Many will prefer an even newer draft, documenting new features that are likely to be included in C++17.

  • Overview of the New C++ (C++11/14) (PDF only) (Scott Meyers) (updated for C++1y/C++14) These are the presentation materials (slides and some lecture notes) of a three-day training course offered by Scott Meyers, who's a highly respected author on C++. Even though the list of items is short, the quality is high.

  • The C++ Core Guidelines (C++11/14/17/…) (edited by Bjarne Stroustrup and Herb Sutter) is an evolving online document consisting of a set of guidelines for using modern C++ well. The guidelines are focused on relatively higher-level issues, such as interfaces, resource management, memory management and concurrency affecting application architecture and library design. The project was announced at CppCon'15 by Bjarne Stroustrup and others and welcomes contributions from the community. Most guidelines are supplemented with a rationale and examples as well as discussions of possible tool support. Many rules are designed specifically to be automatically checkable by static analysis tools.

  • The C++ Super-FAQ (Marshall Cline, Bjarne Stroustrup and others) is an effort by the Standard C++ Foundation to unify the C++ FAQs previously maintained individually by Marshall Cline and Bjarne Stroustrup and also incorporating new contributions. The items mostly address issues at an intermediate level and are often written with a humorous tone. Not all items might be fully up to date with the latest edition of the C++ standard yet.

  • cppreference.com (C++03/11/14/17/…) (initiated by Nate Kohl) is a wiki that summarizes the basic core-language features and has extensive documentation of the C++ standard library. The documentation is very precise but is easier to read than the official standard document and provides better navigation due to its wiki nature. The project documents all versions of the C++ standard and the site allows filtering the display for a specific version. The project was presented by Nate Kohl at CppCon'14.


Classics / Older

Note: Some information contained within these books may not be up-to-date or no longer considered best practice.

  • The Design and Evolution of C++ (Bjarne Stroustrup) If you want to know why the language is the way it is, this book is where you find answers. This covers everything before the standardization of C++.

  • Ruminations on C++ - (Andrew Koenig and Barbara Moo) [Review]

  • Advanced C++ Programming Styles and Idioms (James Coplien) A predecessor of the pattern movement, it describes many C++-specific “idioms”. It's certainly a very good book and might still be worth a read if you can spare the time, but quite old and not up-to-date with current C++.

  • Large Scale C++ Software Design (John Lakos) Lakos explains techniques to manage very big C++ software projects. Certainly a good read, if it only was up to date. It was written long before C++98, and misses on many features (e.g. namespaces) important for large scale projects. If you need to work in a big C++ software project, you might want to read it, although you need to take more than a grain of salt with it. The first volume of a new edition is expected in 2015.

  • Inside the C++ Object Model (Stanley Lippman) If you want to know how virtual member functions are commonly implemented and how base objects are commonly laid out in memory in a multi-inheritance scenario, and how all this affects performance, this is where you will find thorough discussions of such topics.

  • The Annotated C++ Reference Manual (Bjarne Stroustrup, Margaret A. Ellis) This book is quite outdated in the fact that it explores the 1989 C++ 2.0 version - Templates, exceptions, namespaces and new casts were not yet introduced. Saying that however, this book goes through the entire C++ standard of the time explaining the rationale, the possible implementations and features of the language. This is not a book to learn programming principles and patterns on C++, but to understand every aspect of the C++ language.

by anonymous   2017-08-20

I don't understand exactly what you're trying to accomplish here. User code shouldn't inherit from the streams themselves, as the streams are intended to provide a generalized locale specific conversion/"stringizing" facility. If you're trying to use an ostream which can write to a new buffer location (i.e. a gzip stream), then one should generally inherit from basic_streambuf instead, which allows you to use the existing iostream conversion facilities but will allow you to redirect their input/output.

If you want to learn the ins and outs of how iostream itself operates, the best book I've heard about the subject is Standard C++ IOStreams and Locales by Angelika Langer and Klaus Kreft. I can't myself vouch for the book because I have yet to get my copy (it is next on my list), but you can find several recommendations for it here on StackOverflow.

You also probably want to take a peek at boost::iostreams, which provides some helpers for anyone wishing to customize the behavior of the iostream system.

by anonymous   2017-08-20

C++ supports character encodings by means of std::locale and the facet std::codecvt. The general idea is that a locale object describes the aspects of the system that might vary from culture to culture, (human) language to language. These aspects are broken down into facets, which are template arguments that define how localization-dependent objects (include I/O streams) are constructed. When you read from an istream or write to a ostream, the actual writing of each character is filtered through the locale's facets. The facets cover not only encoding of Unicode types but such varied features as how large numbers are written (e.g. with commas or periods), currency, time, capitalization, and a slew of other details.

However just because the facilities exist to do encodings doesn't mean the standard library actually handles all encodings, nor does it make such code simple to do right. Even such basic things as the size of character you should be reading into (let alone the encoding part) is difficult, as wchar_t can be too small (mangling your data), or too large (wasting space), and the most common compilers (e.g. Visual C++ and Gnu C++) do differ on how big their implementation is. So you generally need to find external libraries to do the actual encoding.

  • iconv is generally acknowledge to be correct, but examples of how to bind it to the C++ mechanism are hard to find.
  • jla3ep mentions libICU, which is very thorough but the C++ API does not try to play nicely with the standard (As far as I can tell: you can scan the examples to see if you can do better.)

The most straightforward example I can find that covers all the bases, is from Boost's UTF-8 codecvt facet, with an example that specifically tries to encode UTF-8 (UCS4) for use by IO streams. It looks like this, though I don't suggest just copying it verbatim. It takes a little more digging in the source to understand it (and I don't claim to):

typedef wchar_t ucs4_t;

std::locale old_locale;
std::locale utf8_locale(old_locale,new utf8_codecvt_facet<ucs4_t>);

...

std::wifstream input_file("data.utf8");
input_file.imbue(utf8_locale);
ucs4_t item = 0;
while (ifs >> item) { ... }

To understand more about locales, and how they use facets (including codecvt), take a look at the following:

by anonymous   2017-08-20

In addition to the standard mandated encodings C++ also supports an implementation defined list of encodings via locales:

#include <locale>
#include <codecvt>
#include <iostream>

template <typename Facet>
struct usable_facet : Facet {
  using Facet::Facet;
};

using codecvt = usable_facet<std::codecvt_byname<wchar_t, char, std::mbstate_t>>;

int main() {
  std::wstring_convert<codecvt> convert(new codecvt(".1252")); // platform specific locale strings

  std::wstring w = convert.from_bytes("\u00C0");
}

Unfortunately one of the things about wchar_t is that the standard mandates only that it use a fixed width encoding for all locales, but there's no requirement that it use the same encoding in different locales, and so you can't portably convert to wchar_t using one locale and then convert that back to char using a different locale.

There is potentially some portable support for such conversions using functions like std::mbrtoc32 and related functions, but these are not yet widely implemented.

I understand that this can be done with a library such as iconv, but I am curious whether it can be done using only the C++ standard library. I ask this question not because I don't want to use iconv, but because I don't really understand how locales work in C++.

The locale library's design doesn't really lend itself to modern usage. C and C++ are themselves confused about encodings vs. character sets, and locales conflate lexical and orthographic issues with computational aspects such as encoding.

How locales work is a topic a bit broader than is suitable for a stackoverflow answer but there are books on the topic. You'd probably also need to read platform specific materials, because the standard doesn't really give any context for much of the functionality. For example the locale library supports message catalogues, but doesn't tell you what they are or how you'd actually make one because that's functionality is not standardized by C++.