Wednesday, December 13, 2006

Counted Method Chaining

I built a fun set of classes using C++ template meta-programming technique to count the number of times a member method of a class is invoked in a chain. These template classes implement a meta-programming technique called Counted Method Chaining. It counts the number of times a member function is called at compile-time and does not let programmer invoke it more than a used defiend limit. The limit is passed as a integer template parameter to the class to which to method belongs. It gives a compile time error only when method chaining is used. Otherwise it throws and exception. These set of classes use the C++ technique of co-variant return types.

/// @brief A genereral case of the template where add function
/// is public and therefore, it can be invoked.
template <unsigned int C>
class Base : public Base <C-1>
{
public:
virtual Base <C-1> & add (void const *, size_t length) = 0;
};

/// @brief The special case of the template when C (count) becomes zero.
/// The add member function is private and therefore, can't be
/// invoked by the client of the class resulting in a compile time error.
template <>
class Base <0>
{
public:
virtual ~Base () {}
private:
virtual Base <0> & add (void const *, size_t length) = 0;
};

template <unsigned int SIZE>
class IOV_Helper : public Base <SIZE-1>
{
public:
/// @brief Constructor: Makes a IOV_Helper object.
IOV_Helper () : count_(0) {}

/// @brief Adds a buffer pointer and its size to be sent over network later
/// using gather write technique.
Base <SIZE-1> & add (void const *ptr, size_t length)
{
if (0 == length)
return *this;

if (count_ >= SIZE)
throw count_ + 1;

iov_[count_].iov_base = const_cast <void *> (ptr);
iov_[count_].iov_len = length;
++count_;

return *this;
}

/// @brief Returns the number of I/O vectors populated currently
unsigned int size () { return count_; }

/// @brief Sends the data in the I/O vectors out to a remote peer.
int send (ACE_SOCK_Stream &out)
{
return out.sendv_n (iov_, count_);
}

private:
iovec iov_[SIZE]; /// @brief I/O vectors array.
unsigned int count_; /// @brief count of the number vectors added.
};

int main (void)
{
IOV_Helper <2> sender; /// Here 2 is the maximum limit of method chaining.
char * data = new char[20];
sender.add(data, 20).add(data, 20); /// OK
sender.add(data, 20).add(data, 20).add(data, 20); /// Compile-time Error
return 0;
}

Friday, September 08, 2006

Template argument dependent name lookup

Puzzle: What will be the output of the following program?

struct B {
void foo () {
printf ("B::foo");
}
};

void foo () {
printf("::foo");
}

template <typename Base>
struct Derived : public Base {
void bar () { foo (); }
};

int main (void) {
Derived <B> d;
d.bar();

return 0;
}

Answer: C++ compiler does not perform template argument dependent lookup because the rule is to lookup as many names as possible at the time of parsing the template the first time. It does not wait till the instantiation of it. So ::foo!

Friday, September 01, 2006

Named Parameter Idiom

I came across another C++ idiom which is "useful if a function takes a large number of mostly default-able parameters". You can check it out here. It makes use of a fairly common technique called Method Chaining. I am thinking of putting together a list of C++ idioms, which are spread across web and also the idioms from popular books.

Thursday, August 17, 2006

Double Application Of Smart Pointers

In my reading, I came across a cool, little technique in C++ which allows us to invoke a pre-method and a post-method automatically when a member method of the class is called. It makes use of so called "Double Application Of Smart Pointers" (see the second bullet).

This technique might be useful to implement different popular things such as pre-conditions and post-conditions, the "Monitor Object" pattern (POSA2), pointcut models in AOP, the Decorator pattern (GOF) and also the Execute-Around-Method pattern.

g++ compiler option -Weffc++

I found it quite interesting to know that g++ compiler actually provides an option (-Weffc++) which enables warnings for constructs that violate guidelines in Effective C++!!

From g++ manual (man g++)
== Begin ==
Warn about violations of the following style guidelines from Scott Meyers’ Effective C++ book:

* Item 11: Define a copy constructor and an assignment operator for classes with dynamically allocated memory.
* Item 12: Prefer initialization to assignment in constructors.
* Item 14: Make destructors virtual in base classes.
* Item 15: Have "operator=" return a reference to *this.
* Item 23: Don’t try to return a reference when you must return an object.

Also warn about violations of the following style guidelines from Scott Meyers’ More Effective C++ book:

* Item 6: Distinguish between prefix and postfix forms of increment and decrement operators.
* Item 7: Never overload "&&", "││", or ",".

When selecting this option, be aware that the standard library headers do not obey all of these guidelines.

== End ==

Indeed, Effective C++ is a very nice work by Scott Meyer. He has compiled two lists of top C++ publications: top 5 books and top 5 articles. His list captures right set of very well written, excellent publications in C++ as I have read most of the literature in those lists. I would also like to add More C++ Gems to that list.

Saturday, July 15, 2006

Generic container idioms

Developing generic containers in C++ can become complex if you want to develope truly generic containers (as much as they can get). Relaxing the requirements on type T is the key behind developing truly generic containers. There a few C++ idioms to actually achieve the "lowest denominator" possible with requirements on type T.

It is easy to come up with a generic stack which requires following operations defiend on type T: a default constructor, a copy constructor, a non-throwing destructor and a copy assignment operator. But thats too much!

The requirements can be reduced to the folloing list: a copy constructor and a non-throwing destructor.

To achieve this, a generic container should be able to allocate uninitialized memory and invoke constructor(s) only once on each element while "initializing" them. This is possible using following two techniques:

1. operator new:
void * mem = operator new (sizeof (T) * NUMBER_OF_ELEMENTS);

2. construct helper using placement new:
template <class T1, class T2>
void construct (T1 *p, const T2 &value) {
new (p) T1(value);
}

operator new allocates uninitialized memory. It is a fancy way of calling malloc.
The construct helper template function invokes placement new and in turn invokes a copy constructor on the initialized memory. The pointer p is supposed to be one of the uninitialized memory chunks allocated using operator new.

Moreover, pointers in the range [end, end_of_allocated_range) should not point to objects of type T, but to uninitialized memory. (end can be considered an iterator pointing at an element one past the last initialized element of the container)

When an element is removed from the container, destructot should be invoked on them. A destroy helper function can be helpful here as shown.

template <class T>
void destroy (T *p) {
p->~T();
}

Similarly, to delete a range, another overloaded destroy function which takes two iterators could be useful. It essentially invokes first destroy helper on each element in the sequence.

Please see More C++ gems for elaborate articles on this topic. (authors Hurb Sutter and Matthew H. Austern)

Sunday, June 11, 2006

Resource Management Patterns

Resource Acquisition catagory:
***********************************
Lookup: Naming service, Trading service, Directory service

Lazy Acquisition: Resource proxy, web browser image proxy
and deferred image download, Java JIT compiler. Good
for scalability and low latency.

Eager Acquisition: Good for predictability, bandwidth
pre-allocation, pre-connect.

Partial Acquisition: Acquire resoruce in steps.

Resource Lifecycle catagory:
********************************
Caching: for performance, identity of resources is important.

Pooling: for boundedness, identity of resources is not important.

Coordinator: To provide commit-or-rollback semantics. Similar to strong
exception safety guarantee. Best example: two-phase commit protocol (prepare() and commit())

Resource Release catagory:
*******************************
Leasing: Free up unused/invalid resources/service location references after some pre-determined lease time if not accessed. Renew lease if accessed in between.

Evictor: Used with caching or pooling. Implements a strategy
for discarding resources from cache or pool (LRU, FIFO etc)

Source:
Pattern-Oriented Software Architecture, Volume 3: Patterns for Resource Management

Wednesday, May 03, 2006

Upcoming C++ language changes (C++0x)

C++ programming laguage is undergoing interesting language changes.

The delegating constructor is a useful new addition which changes the semantics of throwing an exception from a constructor. As of now, exception raised during execution of any constructor of a class (including base-member initialization list), the destructor of that class is not invoked (for that object). This is because the object never really got created successfully. This will be now longer be true after C++0x is released!!!

Below is a list of some of the upcoming language changes.

* Nearly all of the Library Technical Report (TR1)
* Auto type deduction
* Delegating constructors
* Right angle brackets
* extern templates

Source : Highlights of the April 2006 ISO C++ meeting (Herb Sutter)

Sunday, April 30, 2006

Swapping two integers in a one liner

I found some really cool one liner solutions to swap two integers without using a temporary variable. These were posted on "C/C++ Programming Puzzles" community on www.orkut.com. Some of them are listed here.

1. a+=b-=a=b-a;
2. a/=b=(a=a*b)/b;
3. a^=b^=a^=b; same as a=(b=(a=b^a)^b)^a; (Thanks to Robert Vukovic)
4. b=a+b-(a=b);

Each individually are cool solutions. But all of them have one big problem. These are not portable! C standard says that when a value of an operand is changed in the same expression at some other place, then the result of the expression is undefined. Try running the program on gcc with -Wall option. All the above one liners (which are really cool to begin with!) are plagued by that. So I feel the most portable way to achieve the goal is:

a = a xor b;
b = a xor b;
a = a xor b;
(by Harpreet)

Unfortunately it is not a one liner.

Double Checked Locking Pattern, volatile keyword and memory barriers

Double Checked Locking Pattern (DCLP) is used to initialize singleton only once in a multithreaded application and also to avoid cost of acquiring lock everytime singleton is accessed. DCLP should be considered in two different dimensions. (1) CPU instruction reordering (2) Multi-processor machines.

In a single threaded application running on a modern CPU, use of volatile keyword in DCLP is really important. volatile keyword prevents any aggressive instruction reordering that modern CPUs might do. The static instance pointer of the singleton and the singleton by itself (both) should be volatile for the magic to work!

volatile exists for special purposes:
(1) the content of a volatile variable is “unstable” (can change by means unknown to the compiler),
(2) all writes to volatile data are “observable” so they must be executed religiously, and
(3) all operations on volatile data are executed in the sequence in which they appear in the source code.
The first two rules ensure proper reading and writing. The last one allows implementation of I/O protocols that mix input and output. This is informally what C and C++’s volatile guarantees.

In multiprocessor environment, DCLP should be used with memory barriers. This takes care of cache coherency issues between multiple CPUs and its effects on DCLP.

Source: C++ and the Perils of Double-Checked Locking

Saturday, April 29, 2006

Some C/C++ standards trivia

"Maximal munch" rule of C++ compiler tokenizer causes ">>" token to be indentified as a right shift operator. Basically, the tokenizer tries to match a string with longest possible "known" token. Because of this, in nested templates, X<Y<int>> can't be parsed sensibly as the ">>" is not split into two ">" (as desired by programmer). A simple solution is to put a space between two consecutive ">"s. This would have otherwise required context sensitive parsing. But C++0x standard promises to relieve programmers from this seemling vexing parse of C++. One of the few language changes in C++0x is to allow "X<Y<int>>!

Another addition is "extern template", which provides a syntax for telling the compiler that a specific template instantiation will be provided somewhere else. For more information please see: DDJ - More News From the Front

C99 standard defines a new (not new anymore) keyword "restrict". The restrict keyword tells a compiler that the specified pointer is the only way to access a given piece of data. This can allow the compiler to optimize aggressively. The restrict keyword is a bit confusing because it applies to the pointer itself rather than the data the pointer points to (contrast const int *x and int *restrict x). I think C++ still does not have this keyword like many other things in C99 (for example, variable size arrays!!)

Please see: Guidelines for writing efficient C/C++ code

Tuesday, April 25, 2006

C++ templates should define traits

Exposing type information "out" from a class template appears to me as a good practice. For example, a class template Array<T> should define a trait which looks something like typedef T Array:TYPE in its public interface. Not only the parameter type, it should also expose any composite types used inside the template.

There are several good reasons behind it.

1. Member functions of the class can make use of the traits.
2. You can avoid typing really long names using these quick typedefs.
3. Consistency.
4. Other classes can make use of the traits. For example, derived classes. Sometimes it is simply not possible to write a non-template derived class of a parameterized class if the parameterized base-class does not expose necessary traits.

For example,
template <typename T> class Array;
template <typename T> class Access_Table {
Array<T> array_;
};

A new class AT_Adapter wants to adapt Access_Table<T> by adding get_array() function. It simply returns the Access_Table<T>::array_ by reference.

class AT_Adapter : Access_Table <char> {
const WHAT_IS_THIS_TYPE &get_array () const;
};

But what is the return type of the function? Therefore, class Access_Table<T> should be modified as

template <typename T> class Access_Table {
typedef Array<T> ARRAY_TYPE;
ARRAY_TYPE array_;
};

So that, the following works:

class AT_Adapter : Access_Table <char> {
const Access_Table<char>::ARRAY_TYPE &get_array () const;
};

Friday, April 14, 2006

Really short STL notes

My STL notes after reading Effective STL by Scott Meyer. For most of the points below, there are subtle important reasons for that. The reasons are not mentioned here as it would be too long. Please read Effective C++ by Scott Meyer.

1. Use member insert function instead of copy algorithm and inserter.

2. Minimize capacity: string (s).swap (s)
Clear and minimize capacity : string ().swap (s)

3. Associative containers use equivalence (strict weak ordering) and not equality.
Equivalence: !(w1 < w2) && !(w2 > w1)
In strict weak ordering, equal values are not equivalent!
Therefore, Comparison function should return false for equal values when equivalence is expected.

4. Use map::operator [] to modify/retrieve existing elements in the map
Use map::insert to add new elements

5. Iterator can be implicitely converted to reverse_iterator.
reverse_iterator to iterator: use member base() function.
const_cast works for conversion from const_iterator to iterator of strings and vector because they are char* and T* pointers. For others to convert from const_iterator to iterator use std::advance (i, distance (i,ci))

6. istream_iterator is used for formatted input. (does not read spaces)
istreambuf_iterator also reads spaces between strings.

7. v.reserve(size) does not invoke constructor (even default). It simply allocates more memory, if any, to hold size objects. Insertion of value at v.end () is wrong in general, use back_inserter (v) instead. Insertion is important and not assignment even after reserve.

8. v.remove () may not erase elements from the container, therefore, v.size() may not change after v.remove.

9. binary_search should receive same comparison function as the sort function did.

10. Custom function objects should be adaptable. To make them adaptable inherit from unary_function or binary_function. Adaptable function objects can be used with not1.
ptr_fun, mem_fun and mem_fun_ref are used to adapt member functions for algorithms like for_each.

Monday, April 10, 2006

STL short / one liners

A very nice collection of wrappers is available on Stackoverflow.

1. Copy everything in the a container to std::cout (e.g. std::set<std::string> c; )
std::copy (c.begin(), c.end(), std::ostream_iterator <std::string> (std::cout, "\n"));

2. Clear vector v and minimize its capacity (potential number of elements it can hold without resizing).
vector <std::string>().swap (v); Yes! It is a swap with an empty temporary!!

3. Remove all integers of value = 0xDeadBeef;
std::erase (std::remove (c.begin(), c.end(), value), c.end());
This is known as erase-remove idiom.

4. Invoke a member function on all the elements of the container.
std::for_each (c.begin(), c.end(), std::mem_fun (&Class::function));

5. Copy one container to other
std::copy(V.begin(), V.end(), L.begin());

6. Fill an array with random numbers
std::generate(V.begin(), V.end(), rand);

7. Read a file of integers in a set
std::istream_iterator <int> data_begin (std::cin), data_end;
std::set <int> S (data_begin, data_end);

OR
std::set <int> data ((istream_iterator <int> (datafile)),istream_iterator <int> ());

OR
istream_iterator <int> start (cin), end;
copy (start, end, std::back_inserter (v));

8. Reading entire file in one go.

Solution 1:
std::istreambuf_iterator < char > begin(std::cin), end;
std::string str(begin, end);

Solution 2:
std::ostringstream temp;
std::ifstream infile ("file.txt");
temp << infile.rdbuf();
std::string str = temp.str();

Saturday, March 18, 2006

Standard concurrency support to C++

The "Concurrency Revolution" is coming at us fast (to gulp us) whether you are ready for it or not! Please see a presentation (audio/video) by Hurb Sutter at parc. Our favorite language, C++, is also not far behind in providing standard concurrency support to the language/library. Please see: Threads and memory model for C++. I feel there is exciting time ahead and a lot of really cool stuff to continue this blog for many years to come!

Sunday, March 12, 2006

What is a "thunk"?

A "thunk" appears to be an overloaded term in computer science. Use of the word "thunk" goes back to Algol 60 implementation. In general, as I understand it, thunk is a function which is used to replace something. More often than not, it is auto-generated. This "something" could be an expression (in a programming language) or an address.

In programming language world, a known usecase is call-by-name. The mechanism of dynamic linking (.DLL/.so files) uses local thunks to invoke dynamic linker at run-time and replace the thunk with the actual function for all the later invocations of the function. Please see how this technique can be exploited to modify program behavior at run-time.

In the C++ world, transparent replacement of addresses is very useful. An example is, adjusting pointers when virtual functions are invoked using one of the many possible base classes of the derived most class when multiple inheritance is involved. Depending upon which base class is used, the "this" pointer is adjusted using thunk methods to point at the right position in the object.

Helpful source: http://en.wikipedia.org/wiki/Thunk

Friday, March 10, 2006

C++ object layout implementation


There are several cool implementation techniques of C++ object layout, especially when multiple inheritance, virtual functions and virtual inheritance is involved. Understanding the consequences of these rather advanced concepts in the language on the implementation of the language is quite interesting. There are several "gotchas" involved. Example, it is highly recommended to invoke the virtual base class's constructor in the derived most class when there is a diamond shape inheritance hierarchy. It is a pretty eloborate subject so I won't duplicate the same thing here. I would just assemble some really good articles on the topic. Lippman's book "Inside The C++ Object Model" talks about it at length.

Please see:

C++ Object layout in multiple inheritance
Adjustor thunks
Multiple inheritance in C++ (Stroustrup)

Friday, February 24, 2006

Functions inside constructor should throw (if they fail)

This is a for constructors, which do non-trivial resource allocation
and initialization using helper member functions in the class. Exceptions
raised in such member functions should be propagated upto the constructor
and not eat them. In other words, such member functions should throw
exception if they can't satisfy their contract even though they provide
strong exception safety guarantee. Exception propagated upto
the constructor will be automatically propagated where the object is being
constructed indicating failure in object construction. Otherwise, silently
eating exceptions during object initialization will create an object in
an inconsistent state. The scope in which it is created will never know
that actually the object construction failed. If object construction can
be reasonably performed even in the face of exception, then it need not be
propagated out.

Example, LStack: A Stack implemented as a linked-list.

LStack::LStack (const LStack & s)
{
this->copy_all_nodes (s);
}

/// Following function provides strong exception safety guarantee
LStack::copy_all_nodes (const LStack &s)
{
try {
/// Copy nodes from the linked list of s into this's linked-list
/// copying of list elements may fail at arbitrary intermediate
/// position.

}
catch (...) {
this->delete_partially_initialized_list ();
/// throw; /// Should be done.
}
}

In the above case, copy_all_nodes is flowed because it failed to satisfy its
contract and still did not indicate its failure to the constructor.
Consequently, the client trying to use copy of LStack object will
be disappointed for sure.

This is important in case of constructor because it is difficult to tell
that object construction failed because there is no return value.

Tuesday, February 14, 2006

Why overloaded ++ operator signatures are different?

The canonical form of overloaded pre-increment and post-increment
operators for class Foo are declared in the following way in C++.

/// Preincrement operator
Foo & operator++ (void);

/// Postincrement operator
const Foo operator++ (int);

The int in the post-increment operator is obviously to disambiguate
between post and pre forms of the operator. Then, why is the return type
different? As many other C++ features, this one also has a subtle
meaning!

In C++, for ints, you can write

++i = k;

but not,

i++ = k;

This is because, i++ results in a Rvalue to which you can't assign.
Unlike i++, ++i results in a Lvalue to which you can assign which
is infact ( incremented ) i itself. Not that ++i = k; has a great
importance, but it is a fact that C++ has been designed that way.
I would be interested in knowing the reason. One reason is that
i++ = k; is not allowed is that it is just ambiguous.
but ++i = k; is not ambiguous.

A const in the return type is also required to disallow passing
the return value of i++ to a function taking non-const parameters
by reference. e.g. bar (Foo &); .... bar (f++); The return value of
f++ (which is f) is not supposed to last beyond the function call.

Therefore, in C++ these semantics of basic types should be followed by user-defined types.

Unless you declare pre-increment operator to return a reference to *this
it will not completely mimic the above basic type semantics. It is quite
important to preserve the basic type semantics when you overload these
opertors because the client and the compiler expect that you indeed preserved them.

Thursday, January 05, 2006

Curiously Recurring Template Pattern (CRTP)

It is a C++ idiom in which inheritance and template type parameters are cleverly integrated to generate flexible types at compile-time. In CRTP, a template class inherits from its own template parameter. For example,

template <typename Base>
class MixinOne : public Base {};

This technique is also one of the ways of using "Mixins". Here class Mixin helps to "mix in" some useful functionality into any class type Base. For example, class Mixin can make class Base a singleton. Complete code example is here. In this case of CRTP, a hierarchy (a new class plus inheritance relationship) is created when Mixin template is instantiated. Thus hierarchy creation is differed till the instantiation of the template. This is pretty cool!

Moreover, CRTP can appear in a different fashion in which a class passes itself as a template parameter to its base class. For example,

class Derived: public MixinTwo<Derived> {};

The non-template class Derived inherits interface/structure (hopefully customized for itself) from Mixin class. In this case of CRTP, the class hierarchy is "created" when Derived class is coded unlike in the earlier case where, class hierarchy is "created" when template is instantiated.

This curious name to the idiom was given by James Coplien. Also see this for a complete example of the technique.