What if everything we thought about types in dynamic languages were wrong?
(Dear Internet forum readers, Thank you for reading all the way into the second paragraph. I shouldn't have to answer a rhetorical question with the obvious answer of "It isn't," but now you have both the answer and the moral authority to mock everyone with a narcissistic streak large enough and an attention span short enough to rush immediately to a favored Internet forum and post a lengthy "Look at how smart I am and how dumb someone else is!" screed. Have a nice afternoon!)
I've updated Test::MockObject, UNIVERSAL::isa, and UNIVERSAL::can recently. I've also spent a lot of time working with Moose-powered code to eliminate duplication, improve genericity and polymorphism, reduce the possibility of errors, and increase testability and test coverage.
Then I read the documentation for Data::Thunk and something interesting struck me.
I've long argued that checking types by name and mechanism in Perl 5 is problematic. So is giving up by pretending everything is a duck. The conflict between specifying your requirements too strictly (and so ruling out extension or genericity you can't yet imagine) and being too lax (and so allowing the possibility of error) is difficult to navigate.
Yet sometimes I think about this in the wrong way.
A thunk in the Data::Thunk sense is a way to delay an expensive calculation until you need it. (I quite like lazy synthetic attributes in Moose for similar reasons.) While synthetic object attributes are promises encapsulated in an object behind accessors (and you can pass that object around without ever triggering the lazy generation until you need it), a thunk exposed as a non-object, perhaps an array or another primitive first-class value, doesn't have the same level of encapsulation.
In other words, it's far too easy to tell that @immediate_values is different from $thunk_to_calculate_values.
... except that perhaps we think about things incorrectly.
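The distinction between an exposed thunk and an encapsulated lazy attribute might look something like the following. This is only a sketch, in Python rather than Perl 5, and it does not use the actual Data::Thunk or Moose APIs: `make_thunk`, `Report`, and the list of squares are hypothetical names invented for illustration. The lazy attribute relies on the standard library's `functools.cached_property`.

```python
from functools import cached_property


def make_thunk(compute):
    """Wrap a zero-argument function so it runs at most once, on first use."""
    cache = []

    def force():
        if not cache:
            cache.append(compute())
        return cache[0]

    return force


class Report:
    """Hypothetical object whose expensive attribute is computed lazily."""

    def __init__(self):
        self.calls = 0  # counts how often the expensive work really runs

    @cached_property
    def values(self):
        self.calls += 1
        return [n * n for n in range(5)]


immediate_values = [0, 1, 4, 9, 16]
thunk_to_calculate_values = make_thunk(lambda: [n * n for n in range(5)])

# The caller of the bare thunk has to know it is a thunk and force it:
forced = thunk_to_calculate_values()

# The caller of the object cannot tell whether `values` was lazy or eager:
report = Report()
from_attribute = report.values
```

In this sketch the thunk leaks its mechanism into every call site, while the attribute keeps the promise hidden behind the same syntax an eager attribute would use.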
Suppose you want to generate a list of prime numbers for cryptographic purposes. The first few prime numbers are cryptographically worthless. The numbers get ever more expensive to calculate as you go on. Your code needs to find a balance between calculating too many or too few, but you don't necessarily know which pair will suffice for your purposes until you calculate them. (Also, you probably don't have to calculate all of the intermediate primes between 1 and 2 and n and m, if you have a good algorithm to pick a few potential primes and continue from there.)
I see three possibilities to represent the data structure containing this list of primes:
- A plain array
- An iterator or generator (whether with internal language support or an object)
- Something lazy as a combination of both
You might be fortunate enough to use a language such as Haskell with this laziness as a fundamental language feature. Good for you! You may use Python, in which case a generator expression might be the best approach. Hooray, I suppose. You may use Perl 5, and so you have plenty of options for syntax. That flexibility can be handy.
How much are you going to let your internal representation of a storage mechanism tie your hands from a design perspective?
If you choose the array, you've tied your code to a specific mechanism and a specific syntax. The same applies to an object or generator, though the object gives you slightly more options, in the polymorphic sense, to retain an interface but provide a different implementation. Even so, you can't swap a lazy array for an object without bridging the difference in interface, unless your language explicitly supports this.
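To make that coupling concrete, here is a small sketch in Python. The function names and the trial-division `is_prime` helper are assumptions for illustration, far too naive for real cryptographic use. The same data, "some primes", arrives in two representations, and each consumer works with only one of them.

```python
from itertools import count, islice


def is_prime(n):
    """Naive trial division; fine for a demonstration, not for cryptography."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def primes_as_list(limit):
    """Eager: every prime below `limit`, computed up front."""
    return [n for n in range(2, limit) if is_prime(n)]


def primes_as_generator():
    """Lazy: an unbounded stream; each prime is computed only when requested."""
    return (n for n in count(2) if is_prime(n))


def third_prime_by_index(primes):
    return primes[2]  # depends on the array mechanism: indexing


def third_prime_by_iteration(primes):
    return next(islice(primes, 2, None))  # depends only on iteration
```

Passing the generator to `third_prime_by_index` raises a `TypeError`, because a Python generator is not subscriptable. The consumer asked for a mechanism, not for "a supply of primes", which is exactly the coupling described above.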
Therein lies my question about type checking in dynamic languages.
I've been a good programmer. I rewarded myself with a couple of cookies for separating the concern of generating a list of random numbers from the code which uses that list of random numbers. It's easier to test, to maintain, to document, to do everything I might need to do to it.
Yet the two pieces of code I've worked so hard to decouple in form are still tightly coupled via the types of a parameter used to communicate between them, because they both have a dependency on being some sort of array, some sort of generator or iterator, or some sort of object which provides a generator or iterator, when all I really want to be able to say is "These two separate pieces of my system communicate when one of them provides a promise to provide multiple prime numbers".
I used the word tied in a previous paragraph. If you caught the pun, great. (If you're not a Perl 5 programmer and you wonder about the pun, don't. It's really not that clever.) That approach is good in some ways and awful in others, because it does allow uniformity of interface (with an awkward and slow implementation) but it doesn't allow me to decorate the arrayish variable with the type information that "This thingie you can treat as an array is a promise to provide random numbers when you want them. Don't ask how. Leave efficiency concerns to the implementation. You worry about what you get out of it."
(One also sometimes wonders why Python simultaneously prides itself on a rigid orthogonality and parsimony of syntax such that toddlers often speak in valid Python programs before they learn the vagaries and inconsistencies of English while borrowing and uglifying generator and filter syntax from other languages when it would have been simpler to say that generators are specific enhancements and refinements of lists. Then again, I as an American from the frontier states lack certain clever Continental irony.)
In other other words, maybe we dynamic language users get types so wrong because we're so caught up in thinking "What primitive does this resemble?" or "What global string name does this or any class in its inheritance hierarchy match?" that we too rarely stop to ask ourselves the important question of "What does this data represent?"
Would that our languages allowed us to express that meaning instead of merely the mechanism.
Maybe it's because it's 1 am here, but I read this twice and I have only a very faint idea how you got from the first to the last line.
I want a safe subtype of a builtin (array, in this case). I don't want it to look like an object (even if I have to use a metaobject protocol to define it) or a generator expression (even if it uses a generator expression internally). I don't want to monkeypatch a global metaobject and I don't want to modify any of the other array-handling code in the language or libraries I use.
For bonus points, I want to create it with as little ceremony as possible, perhaps my InfiniteLazyPrimes @primes;.
Perl 5's tie is not right, but I'm starting to think it's a lot less wrong than "everything that's not syntax is a method call on an object" or "special data cases deserve special syntax".
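For comparison, here is roughly what that idea looks like in Python, where the `__getitem__` and `__iter__` protocols play a part loosely analogous to `tie`. This is a sketch and not a Perl 5 solution: the class name is borrowed from the wish above, the internals are assumptions, and it is still an object underneath, so it only approximates "doesn't look like an object" at the call site.

```python
from itertools import count


class InfiniteLazyPrimes:
    """Array-like view of the primes, computed on demand and cached."""

    def __init__(self):
        self._found = []             # primes discovered so far, in order
        self._candidates = count(2)  # integers not yet examined

    def _extend_to(self, index):
        """Discover primes until position `index` is available."""
        while len(self._found) <= index:
            n = next(self._candidates)
            # n is prime if no known prime up to sqrt(n) divides it.
            if all(n % p for p in self._found if p * p <= n):
                self._found.append(n)

    def __getitem__(self, index):
        if isinstance(index, slice):
            bounds = (index.start or 0, index.stop, index.step or 1)
            if index.stop is None or min(bounds) < 0:
                raise IndexError("slices need a non-negative start, stop, and step")
            self._extend_to(index.stop - 1)
            return self._found[index]
        if index < 0:
            raise IndexError("an infinite sequence has no end to count back from")
        self._extend_to(index)
        return self._found[index]

    def __iter__(self):
        return (self[i] for i in count())


primes = InfiniteLazyPrimes()
tenth = primes[9]        # computes only the first ten primes
first_five = primes[:5]  # served from the cache, no new work
```

Code that only indexes, slices with a bounded stop, or loops with an early exit can treat `primes` like a list. Code that asks for `len(primes)` cannot, which is one place where the "safe subtype of an array" idea gets awkward for an infinite sequence.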
This makes me think of two issues:
1. Arrays shouldn't be basic primitives, and type restrictions should be defined by roles. Arrays are an iterable, countable, indexable sequence, and most often when a function says it wants an array, it really only needs an argument with one or two of those properties. Asking for an array when the only behaviour you really care about is being able to loop across it... well that's going to lock out the most elegant solutions to some problems. Unfortunately that's the way things are.
2. Often people misuse type restrictions when what they really mean is "I don't want this other type because that's an error." What they should be doing is a sanity check to make sure the wrong types aren't accepted, rather than restricting input to the known-good list of types they can conceive of using. Unfortunately, strict typing makes the restrictive approach the easier one.
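Both points can be sketched in Python, whose standard `collections.abc` module offers abstract base classes that behave somewhat like roles. The `total` function is a hypothetical example: it asks only for the one behaviour it needs (iteration) and rejects the one input it knows to be a mistake (text), instead of demanding a list.

```python
from collections.abc import Iterable, Sized


def total(numbers):
    """Sum anything that can be looped over; reject only the known-bad case."""
    # Sanity check: strings are iterable, but passing one here is an error.
    if isinstance(numbers, (str, bytes)):
        raise TypeError("expected an iterable of numbers, got text")
    # Role check: the only behaviour this function relies on is iteration.
    if not isinstance(numbers, Iterable):
        raise TypeError("expected something iterable")
    return sum(numbers)


from_list = total([1, 2, 3])
from_range = total(range(4))
from_generator = total(n for n in range(4))

# A generator is iterable but not countable, so a signature that demanded
# a full array would lock it out for no good reason.
generator_is_sized = isinstance((n for n in range(4)), Sized)
```

Here `generator_is_sized` is `False`, yet `total` accepts the generator anyway, because counting was never something it needed.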
One of the reasons I like working with Perl is that when you're reusing a module that someone else has developed, you run into situations where you're constrained by the imagination of the original module's author far less frequently than in languages that lead people to pepper their code with unnecessary type declarations, final declarations, private member variables and so forth.
Ironically, in my experience this leads to code that respects encapsulation and resorts to fewer hacks.
I've wandered a bit off-topic, but it comes back to the argument of restrictions based on what something is vs what it does. I fall firmly in the does camp, and an array is a thing, not a behaviour.