In this series I've explained why Perl roles exist, and discussed Perl roles versus inheritance and Perl roles versus duck typing. Comments on the latter posting have raised several good questions that I'll address in another posting. In particular, some people see the relative informality of duck typing as a major benefit and rarely see value in the other possibilities for abstraction and safety that roles provide. (Sometimes that's the right choice, too.)
Today's topic takes the opposite approach.
Subclassing Inheritance
Many people who learned object orientation through languages such as Java and C++ see inheritance as a vital component of managing large programs. I've argued before that this type of inheritance (by which I mean "a subclass extends a superclass") provides two features. First, the language's type system understands that a subclass/superclass relationship means that it's safe to substitute an instance of a subclass in any code which expects an instance of a superclass. Second, the subclass may (mostly) transparently reuse code defined in the superclass with little or no syntax required.
In other words, subclassing inheritance provides a mechanism of code reuse and a mechanism of identifying safe substitutability.
This works great when you can model all of the entities in your program in a singly-rooted hierarchy. Many simple programs do this effectively.
As the difficulty of creating sane biological taxonomies indicates, the real world does not lend itself to such artificial simplicity. (Extinction of the duck-billed platypus might have helped Linnaeus -- thanks to educated foo's suggestion for correcting this analogy -- but I'd miss the little guys. Besides that, I can't give birth to live young myself, so I'm obviously not a mammal.)
This is miles from the interesting question, however. I assert that the real and proper question for any API which wants to assert a property about the objects it affects is "Do you behave in a way consistent with my expectations?" In other words, it's much less interesting and much less general to say "My log_message() function requires an instance of a String as its argument" than to say "My log_message() function requires an instance of something which Stringifies."
If an object of one type can stand in for an object of another type, does it matter how that object does so? Hold onto that thought.
Abstract Base Classes
If you're arguing in your head right now saying "Program to an interface, not an implementation!" you're absolutely right. Well-encapsulated programs define well-understood APIs and let the internals of those APIs worry about themselves. As long as you know that an object (whatever its type) implements the proper interface, surely you can get on with your programming and let it go its own way.
One mechanism of ensuring that all instances of the classes in a hierarchy implement the proper interface is to specify that interface in an abstract base class from which all classes in that hierarchy inherit.
If you've programmed in C, you might recognize this as a somewhat more modern descendant of a separate header file (unless you put executable code in your public header files, in which case no advice I give will help you).
Depending on the strictness and dynamism of your language and compiler and runtime environment, you might get a compile-time guarantee that every subclass of this ABC implements all of the required methods itself rather than inheriting them. In other words, you do get the enforcement of this contract without all of that pesky code reuse.
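Here's a minimal sketch of that arrangement in Java. The names (Message, Notice, text()) are hypothetical, invented for illustration and not taken from any real library. In Java the enforcement shows up as a compile error, not a warning: a concrete subclass that omits text() won't compile.

```java
// Hypothetical abstract base class: it names the required method
// but supplies no implementation for subclasses to reuse.
abstract class Message {
    abstract String text();
}

// A concrete subclass must provide text() itself. Deleting this
// method makes javac reject the class, because a non-abstract class
// has to implement every abstract method it inherits.
class Notice extends Message {
    private final String detail;

    Notice(String detail) {
        this.detail = detail;
    }

    @Override
    String text() {
        return "NOTICE: " + detail;
    }
}

class AbstractBaseDemo {
    public static void main(String[] args) {
        // Code written against Message accepts any subclass.
        Message m = new Notice("disk nearly full");
        System.out.println(m.text());
    }
}
```

Every further subclass of Message has to write its own text() from scratch, which is the "contract without reuse" trade described above.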
If you believe that Don't Repeat Yourself is important in software -- and it is -- you may have the unenviable task of rooting around in a singly-rooted inheritance hierarchy to push concrete method implementations to the root of the tree where you have to maintain less duplicate code. This is why concrete base classes often contain a lot of methods that may or may not apply to all of their subclasses. Copy and paste seems wronger than overinheritance from god classes.
Some languages suggest that a singly-rooted inheritance hierarchy creates more problems than it solves, and they allow any class to inherit from multiple parents. This solves the code reuse problem to an extent, but it creates other problems related to the structural layout of objects of multiple classes, potential conflicts in attribute names, method resolution and visibility ordering, and circular parent relationships. These can present debugging difficulty.
Interfaces
Java (probably wisely) eschewed multiple inheritance, but recognized that an instance of any given class may conform to multiple interfaces properly. Thus it provides Java interfaces. (I've glossed over the history of this feature by speculating only on its motivation. Students of programming languages should look at C++, Eiffel, Objective C, and Sather for a better view of design influences.)
A Java interface is, effectively, an abstract base class from which you do not inherit. Thus, your Java class can inherit from a parent class and implement as many interfaces as you like.
One benefit of a Java interface is that you can use the name of an interface anywhere you could use the name of a class, and then you can use any object which implements that interface in any API that expects an object which implements that interface, no matter how it implements that interface.
One drawback of the Java interface is that it offers no code reuse either.
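A small sketch may make both points concrete. Everything here (Printable, Person, Employee, Invoice, Report) is a made-up name for illustration. Employee inherits from a parent class and also implements the interface; Invoice sits in no shared hierarchy at all; and the interface name appears as a parameter type just as a class name would.

```java
// Hypothetical interface: a contract with no code attached.
interface Printable {
    String render();
}

class Person {
    protected final String name;

    Person(String name) {
        this.name = name;
    }
}

// One parent class plus an interface.
class Employee extends Person implements Printable {
    Employee(String name) {
        super(name);
    }

    @Override
    public String render() {
        return "Employee: " + name;
    }
}

// An unrelated class. It shares nothing with Employee, so it has to
// write its own render() body; the interface gives it nothing to reuse.
class Invoice implements Printable {
    private final int number;

    Invoice(int number) {
        this.number = number;
    }

    @Override
    public String render() {
        return "Invoice #" + number;
    }
}

class Report {
    // The interface name is the parameter type, so any implementor works.
    static String format(Printable item) {
        return "[" + item.render() + "]";
    }
}

class InterfaceDemo {
    public static void main(String[] args) {
        System.out.println(Report.format(new Employee("Bob")));
        System.out.println(Report.format(new Invoice(7)));
    }
}
```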
Think of Java interfaces as slightly safer multiple inheritance without the possibility of code reuse and a slightly worse syntax, and you have them.
Are they really that bad? If you use them correctly, no. Does anyone?
Concrete Problems
Imagine you have an API written by someone else. You don't have the right (or access or source code) to change it. You have to live with it.
You have a method on a Logger object called log_message(). It takes a single argument -- a String. Any String you pass to the logger gets logged to the appropriate place.
Suppose you have an object which represents a Customer -- a name, an address, some notes. Suppose you want to log the relevant customer information. Easy, right? Just produce a String from the Customer and send that String to the log_message() method.
Except suppose that the library's version of String supports one encoding and the String produced by the Customer object is in an incompatible encoding... or this or that or you just object to the two-step boilerplate code that makes you manually stringify your Customer objects when they already know how to stringify themselves.
A better approach is to change the log_message() signature, decoupling it from the concrete String class in favor of an interface which means "Anything which implements this interface produces a String when I call its stringify() method."
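As a rough sketch of what that changed signature could look like in Java, assuming you control the Logger: the interface name Stringifies and the method stringify() come from the discussion above, while logMessage() (a Java-style spelling of log_message()), logged(), PlainText, and the Customer fields are my own hypothetical choices. The logger collects lines in a list here only so the sketch stays self-contained.

```java
import java.util.ArrayList;
import java.util.List;

// Anything which implements this interface produces a String on request.
interface Stringifies {
    String stringify();
}

// Hypothetical logger whose signature names the interface, not String.
class Logger {
    private final List<String> lines = new ArrayList<>();

    void logMessage(Stringifies item) {
        lines.add(item.stringify());
    }

    List<String> logged() {
        return lines;
    }
}

// The customer already knows how to stringify itself.
class Customer implements Stringifies {
    private final String name;
    private final String address;
    private final String notes;

    Customer(String name, String address, String notes) {
        this.name = name;
        this.address = address;
        this.notes = notes;
    }

    @Override
    public String stringify() {
        return name + ", " + address + " (" + notes + ")";
    }
}

// java.lang.String is final and cannot be taught a new interface,
// so plain text needs a small wrapper (an assumption of this sketch).
class PlainText implements Stringifies {
    private final String text;

    PlainText(String text) {
        this.text = text;
    }

    @Override
    public String stringify() {
        return text;
    }
}

class LoggingDemo {
    public static void main(String[] args) {
        Logger log = new Logger();
        log.logMessage(new Customer("Alice", "1 Main St", "prefers email"));
        log.logMessage(new PlainText("nightly import started"));
        for (String line : log.logged()) {
            System.out.println(line);
        }
    }
}
```

The PlainText wrapper is the telling part: even in this toy version, the existing String class can't join the new interface without somebody modifying or wrapping it.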
Of course, that means changing the library. You may not have access to do this.
Suppose you did, however. You could create an abstract base class from which String could inherit -- if you have permission to modify that library, and if it doesn't already inherit from a concrete base class. You could create an interface which String implements, if you have permission to modify that library.
Perhaps that's a silly example. It's a deliberately simple example. Imagine that the necessary interface has two methods, or ten. Imagine that you want to pass in an object which performs its own logging, or diagnostic testing, or remote proxying.
You will likely find that everywhere you want genericity and code reuse decoupled from a singly-rooted inheritance hierarchy you need to program to interfaces, not instantiable classes -- and you use interface types, not classes, in all of your signatures, because you don't want to prohibit people from doing useful things even if their singly-rooted inheritance hierarchy doesn't match yours.
If I'm right -- if the real question is "Do you understand these methods the same way I understand them in this context?" -- this is busywork because there's one glaring flaw in the interface and abstract base class concept: there's no way of saying "Instances of this class are semantically equivalent and substitutable for instances of that class" unless you manually extract interfaces from every class defined in your system.
Every Class Implies a Role
Here's one subtle trait of roles: every class declaration also declares a role.
If you define a class:
class MyAwesomeClass
{
    method foo { ... }
    method bar { ... }
    method be_very_awesome { ... }
}
You can write:
class MyCompletelyUnrelatedClass does MyAwesomeClass
{
    method foo { ... }
    method bar { ... }
    method be_very_awesome { ... }
}
... and even if there's no inheritance relationship between these classes (as you can see, there isn't) and no formal declaration of the MyAwesomeClass role (and there isn't), you get all of the benefits of roles (method composition, compile-time API verification, genericity, substitutability) without having to modify the declaration of MyAwesomeClass, or rejigger its inheritance hierarchy, or extract a role manually. Any API you've already written which operates on an instance of MyAwesomeClass will operate safely and correctly on an instance of MyCompletelyUnrelatedClass without modification.
You get code reuse too, if you want it.
Hi,
I was reading your last code example, which suggests that declaring a class also creates a role based on that class (if I am reading correctly). I can see in the Moose source where we create a type for the class, but I can't spot where we create the role. If it does, this is a feature I didn't know about, and it is definitely interesting.
Just one thing, regarding: "the language's type system understands that a subclass/superclass relationship means that it's safe to substitute an instance of a subclass in any code which expects an instance of a superclass" I feel strongly about this as well, that you should try hard to make sure it's the case, but I think right now since you can add attributes to subclasses that can be required, isn't it possible to create a subclass that can't be stuck in the same slot as a superclass?
Thanks for this series of articles chromatic. Very helpful.