C++ Boost

Serialization

Dataflow Iterators


Motivation

Consider the problem of translating an arbitrary length sequence of 8 bit bytes to base64 text. Such a process can be summarized as:

source => 8 bit bytes => 6 bit integers => encode to base64 characters => insert line breaks => destination

We would prefer the solution that is:

The approach that comes closest to meeting these requirements is that described and implemented with Iterator Adaptors. The fundamental feature of an Iterator Adaptor template that makes it interesting to us is that it takes as a parameter a base iterator from which it derives its input. This suggests that something like the following might be possible.

typedef
    insert_linebreaks<         // insert line breaks every 72 characters
        base64_from_binary<    // convert binary values to base64 characters
            transform_width<   // retrieve 6 bit integers from a sequence of 8 bit bytes
                const char *,
                6,
                8
            >
        >
        ,72
    >
    base64_text; // compose all the above operations into a new iterator

std::copy(
    base64_text(address),
    base64_text(address + count),
    ostream_iterator<CharType>(os)
);
Indeed, this seems to be exactly the kind of problem that iterator adaptors are intended to address. The Iterator Adaptor library already includes modules which can be configured to implement some of the operations above. For example, included is transform_iterator, which can be used to implement 6 bit integer => base64 code.
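
To make the stages of the pipeline concrete, here is a minimal sketch that performs the same three transformations as plain functions over whole strings, using only the standard library. It is not the library's implementation: the function names (sextets_from_bytes, base64_from_sextets, insert_breaks, encode) are hypothetical, the sketch does not append '=' padding, and it materializes every intermediate sequence, which is exactly the cost the iterator composition above is meant to avoid.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Stage 1 (hypothetical helper): regroup 8 bit bytes into 6 bit integers.
// A trailing partial group is padded on the right with zero bits.
std::vector<unsigned> sextets_from_bytes(const std::string& bytes) {
    std::vector<unsigned> out;
    unsigned buffer = 0;  // bits read from the input but not yet emitted
    int bits = 0;         // number of valid bits currently in buffer
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        buffer = (buffer << 8) | static_cast<unsigned char>(bytes[i]);
        bits += 8;
        while (bits >= 6) {
            bits -= 6;
            out.push_back((buffer >> bits) & 0x3Fu);
        }
        buffer &= (1u << bits) - 1u;  // keep only the unconsumed low bits
    }
    if (bits > 0)
        out.push_back((buffer << (6 - bits)) & 0x3Fu);
    return out;
}

// Stage 2 (hypothetical helper): map each 6 bit integer to a base64 character.
std::string base64_from_sextets(const std::vector<unsigned>& values) {
    static const char table[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string out;
    for (std::size_t i = 0; i < values.size(); ++i) {
        assert(values[i] < 64);
        out += table[values[i]];
    }
    return out;
}

// Stage 3 (hypothetical helper): insert a newline after every width characters.
std::string insert_breaks(const std::string& text, std::size_t width) {
    assert(width > 0);
    std::string out;
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (i > 0 && i % width == 0)
            out += '\n';
        out += text[i];
    }
    return out;
}

// The whole pipeline, composed by hand. No '=' padding is produced.
std::string encode(const std::string& bytes, std::size_t width = 72) {
    return insert_breaks(base64_from_sextets(sextets_from_bytes(bytes)), width);
}
```

For instance, encode("Man") yields "TWFu". Each stage can be tested on its own, but every call allocates a full intermediate container; composing iterators instead lets each element flow through all stages one at a time.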

Dataflow Iterators

Unfortunately, not all iterators which inherit from Iterator Adaptors are guaranteed to meet the composability goals stated above. To accomplish this purpose, they have to be written with some additional considerations in mind. We define a Dataflow Iterator as a class inherited from iterator_adaptor which fulfills a small set of additional requirements.

Templated Constructors

Templated constructors have the form:


template<class T>
dataflow_iterator(T start) :
    iterator_adaptor(Base(start))
{}
When these constructors are applied to our example above, the following code is generated:

std::copy(
    insert_linebreaks(
        base64_from_binary(
            transform_width(
                address
            )
        )
    ),
    insert_linebreaks(
        base64_from_binary(
            transform_width(
                address + count
            )
        )
    ),
    ostream_iterator<char>(os)
);
The recursive application of this template is what automatically generates the constructor base64_text(const char *) in our example above. The original Iterator Adaptors include a make_xxx_iterator function to fulfill this role. However, I believe these are unwieldy to use compared to the above solution using templated constructors.
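
The forwarding can be seen in isolation in the following sketch, which uses two toy adaptors rather than the library's classes. The names plus_one, times_two, composite and sum are hypothetical, and the adaptors are written by hand instead of deriving from iterator_adaptor so that the example depends only on the language itself.

```cpp
#include <cassert>

// Hypothetical adaptor: yields each base element plus one.
template<class Base>
class plus_one {
    Base base_;
public:
    // Templated constructor: accepts whatever the innermost iterator
    // accepts and passes it down by constructing the base from it.
    template<class T>
    explicit plus_one(T start) : base_(Base(start)) {}

    int operator*() const { return *base_ + 1; }
    plus_one& operator++() { ++base_; return *this; }
    bool operator==(const plus_one& rhs) const { return base_ == rhs.base_; }
    bool operator!=(const plus_one& rhs) const { return !(*this == rhs); }
};

// Hypothetical adaptor: yields each base element times two.
template<class Base>
class times_two {
    Base base_;
public:
    template<class T>
    explicit times_two(T start) : base_(Base(start)) {}

    int operator*() const { return *base_ * 2; }
    times_two& operator++() { ++base_; return *this; }
    bool operator==(const times_two& rhs) const { return base_ == rhs.base_; }
    bool operator!=(const times_two& rhs) const { return !(*this == rhs); }
};

// The composite is constructible directly from a const int *, even though
// neither adaptor declares such a constructor explicitly.
typedef times_two<plus_one<const int*> > composite;

// Sums (x + 1) * 2 over the range [first, last).
int sum(const int* first, const int* last) {
    int total = 0;
    for (composite it(first), end(last); it != end; ++it)
        total += *it;
    return total;
}
```

Constructing composite(first) instantiates times_two's constructor with T = const int *, which in turn instantiates plus_one's constructor with the same T. No make_xxx helper is needed at any level of nesting.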

Unfortunately, some systems which fail to properly support partial function template ordering cannot support the concept of a templated constructor as implemented above. A special "wrapper" macro has been created to work around this problem. With this "wrapper" the above example is modified to:


std::copy(
    base64_text(BOOST_MAKE_PFTO_WRAPPER(address)),
    base64_text(BOOST_MAKE_PFTO_WRAPPER(address + count)),
    ostream_iterator<char>(os)
);
This macro is defined in <boost/serialization/pfto.hpp>. For more information about this topic, check the source.

Dereferencing

Dereferencing some iterators can cause problems. For example, a natural way to write a remove_whitespace iterator is to increment past any initial whitespace when the iterator is constructed. This fails if the iterator passed to the constructor "points" to the end of a string, because skipping whitespace requires dereferencing the base iterator, and an end iterator must not be dereferenced. The filter_iterator is implemented in this way, so it can't be used in our context. Instead, in our implementation of this iterator, whitespace removal is deferred until the iterator is actually dereferenced.
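
A minimal sketch of the deferred approach follows. It is an illustration only, not the library's remove_whitespace: the names lazy_skip_space and strip_spaces are hypothetical, and the sketch assumes the input does not end with whitespace, since skipping trailing whitespace would run into the end of the sequence.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical adaptor that skips whitespace lazily.
// Assumption: the underlying sequence does not end with whitespace.
template<class Base>
class lazy_skip_space {
    Base base_;

    // Advance the base past any whitespace at the current position.
    void skip() {
        while (std::isspace(static_cast<unsigned char>(*base_)))
            ++base_;
    }
public:
    // The constructor never dereferences its argument, so it is safe to
    // construct this adaptor from an end iterator.
    template<class T>
    explicit lazy_skip_space(T start) : base_(Base(start)) {}

    // Whitespace is skipped only here, when a value is actually requested.
    char operator*() { skip(); return *base_; }

    lazy_skip_space& operator++() { skip(); ++base_; return *this; }

    bool operator==(const lazy_skip_space& rhs) const { return base_ == rhs.base_; }
    bool operator!=(const lazy_skip_space& rhs) const { return !(*this == rhs); }
};

// Copies text with its whitespace removed (see the assumption above).
std::string strip_spaces(const std::string& text) {
    typedef lazy_skip_space<std::string::const_iterator> iter;
    std::string out;
    for (iter it(text.begin()), end(text.end()); it != end; ++it)
        out += *it;
    return out;
}
```

Because the end adaptor is only ever compared and never dereferenced, constructing it from text.end() reads nothing. An eager version that skipped whitespace in its constructor would have to dereference text.end() at that point.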

Comparison

The default implementation of iterator equality in iterator_adaptor just invokes the equality operator on the base iterators. Generally this is satisfactory. However, it requires that other operations (e.g. dereference) do not prematurely increment the base iterator. Avoiding this can be surprisingly tricky in some cases (e.g. transform_width).
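
The constraint is easiest to see with an adaptor that produces more than one output element per base element. The sketch below is a hypothetical hex_from_bytes adaptor (not part of the library, and much simpler than transform_width) that emits two hexadecimal digits per byte while comparing only base iterators, as the default equality does. It works because the base is advanced only after both digits of the current byte have been consumed.

```cpp
#include <cassert>
#include <string>

// Hypothetical adaptor: yields two lowercase hex digits for each base byte.
template<class Base>
class hex_from_bytes {
    Base base_;
    bool low_;  // false: the high nibble is next; true: the low nibble is next
public:
    template<class T>
    explicit hex_from_bytes(T start) : base_(Base(start)), low_(false) {}

    // Dereferencing never moves the base iterator.
    char operator*() const {
        static const char digits[] = "0123456789abcdef";
        unsigned byte = static_cast<unsigned char>(*base_);
        return digits[low_ ? (byte & 0x0Fu) : (byte >> 4)];
    }

    // The base advances only once both nibbles of the byte are consumed.
    hex_from_bytes& operator++() {
        if (low_)
            ++base_;
        low_ = !low_;
        return *this;
    }

    // Equality looks at the base iterators only, mirroring the default.
    bool operator==(const hex_from_bytes& rhs) const { return base_ == rhs.base_; }
    bool operator!=(const hex_from_bytes& rhs) const { return !(*this == rhs); }
};

// Converts a byte string to its hexadecimal representation.
std::string to_hex(const std::string& bytes) {
    typedef hex_from_bytes<std::string::const_iterator> iter;
    std::string out;
    for (iter it(bytes.begin()), end(bytes.end()); it != end; ++it)
        out += *it;
    return out;
}
```

If operator++ advanced the base as soon as the high nibble had been read, the adaptor would compare equal to the end adaptor while the low nibble of the last byte was still pending, and that digit would be silently dropped.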

Iterators which fulfill the above requirements should be composable and the above sample code should implement our binary to base64 conversion.

Iterators Included in the Library

Dataflow iterators for the serialization library are all defined in the namespace boost::archive::iterators. Included here are:
base64_from_binary
transforms a sequence of integers to base64 text
binary_from_base64
transforms a sequence of base64 characters to a sequence of integers
insert_linebreaks
given a sequence, creates a sequence with newline characters inserted
mb_from_wchar
transforms a sequence of wide characters to a sequence of multi-byte characters
remove_whitespace
given a sequence of characters, returns a sequence with the whitespace characters removed. This is derived from boost::filter_iterator
transform_width
transforms a sequence of x bit elements into a sequence of y bit elements. This is a key component in iterators which translate to and from base64 text.
wchar_from_mb
transforms a sequence of multi-byte characters in the current locale to wide characters.
xml_escape
escapes xml meta-characters from xml text
xml_unescape
unescapes xml escape sequences to create a sequence of normal text

The standard stream iterators don't quite work for us. On systems which implement wchar_t as unsigned short integers (e.g. VC 6) they didn't function as I expected. I also made some adjustments to be consistent with our concept of Dataflow Iterators. Like the rest of our iterators, they are found in the namespace boost::archive::iterators to avoid conflict with the standard library versions.

istream_iterator
ostream_iterator

© Copyright Robert Ramey 2002-2004. Distributed under the Boost Software License, Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)