I don’t think we have any actually good programming languages, and I don’t think I’m alone in believing this. Programming is hard, and language design is harder. We’re still learning. But I think they’re all failing us in a shockingly fundamental way.
The root of the trouble is a distinction I’d like to draw between data and objects. Let me know if you think there are better terms to use.
Programming languages give us tools to represent things. Sometimes these things are values: the integer 1. A 1 is a 1, same as any other 1. Sometimes these things have identity: this integer 1 lives over here, and that integer 1 lives over there; they aren’t the same. This can matter because (just as one example) sometimes these things are mutable: now that integer 1 becomes integer 2. It wasn’t just a value.
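To make the value/identity split concrete, here is a minimal Python sketch. The names `Point` and `Counter` are made up for illustration; the only thing assumed about the language is that a frozen dataclass compares by value while a plain class falls back to identity-based equality.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    """Data: compared by value, cannot be reassigned after construction."""
    x: int
    y: int


class Counter:
    """Object: keeps the default identity-based equality and mutable state."""

    def __init__(self, count: int) -> None:
        self.count = count


# Two separately built values are interchangeable: a 1 is a 1.
p, q = Point(1, 2), Point(1, 2)
values_equal = (p == q)

# Two objects with identical contents are still two different things.
a, b = Counter(1), Counter(1)
objects_equal = (a == b)

# Mutation is where identity starts to matter: `alias` sees the change
# because it names the same object, while `b` does not.
alias = a
a.count = 2
```

After this runs, `values_equal` is true and `objects_equal` is false, and `alias.count` has followed `a` to 2 while `b.count` is still 1.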
For some things, the internals should be hidden away, but for others, we wonder why we’re expected to litter all these repetitive getters and setters about. Sometimes, we can use a thing here or there without issue, and sometimes we have to serialize and deserialize it to get around. That action causes it to lose some of its identity: now there are two things.
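A small sketch of that last point, again in Python and again with a hypothetical class (`Account`) and a hand-rolled JSON format that are assumptions of mine, not anything a particular library prescribes. The object guards an invariant behind a method, and a serialize/deserialize round trip produces a second, independent object.

```python
import json


class Account:
    """Object with hidden state and an invariant: deposits must be positive."""

    def __init__(self, balance: int) -> None:
        self._balance = balance  # internal by convention

    def deposit(self, amount: int) -> None:
        if amount <= 0:
            raise ValueError("deposit must be positive")
        self._balance += amount

    @property
    def balance(self) -> int:
        return self._balance

    def to_json(self) -> str:
        # Serializing flattens the object into plain data.
        return json.dumps({"balance": self._balance})

    @classmethod
    def from_json(cls, text: str) -> "Account":
        # Deserializing builds a brand new object from that data.
        return cls(json.loads(text)["balance"])


original = Account(100)
round_tripped = Account.from_json(original.to_json())

# The copy starts with the same contents but does not follow later changes.
original.deposit(50)
same_object = round_tripped is original
```

Here `same_object` is false, `original.balance` is 150, and `round_tripped.balance` is still 100: now there are two things.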
Sometimes we care about how these things get used, especially how they get extended. Perhaps we decide a Thing has a fixed set of behaviors. (I’m going to stop playing coy: an interface, with a fixed set of methods.) Then users can create their own Things (i.e. new classes that implement that interface). Alternatively, we can decide a Thing has a fixed set of variants (i.e. algebraic datatypes). Then users can freely create any behavior, meaning any function, over a Thing.
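Here is a rough sketch of the two extension styles side by side. Python has no built-in algebraic datatypes, so the "fixed set of variants" half is approximated with frozen dataclasses and a `Union`; all of the names (`Shape`, `Square`, `Circle`, `Rect`, `Figure`) are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Protocol, Union


# Object style: a fixed set of operations, an open set of implementations.
class Shape(Protocol):
    def area(self) -> float: ...


class Square:
    """A user-defined Thing: any class with an area() method qualifies."""

    def __init__(self, side: float) -> None:
        self._side = side

    def area(self) -> float:
        return self._side ** 2


# Data style: a fixed set of variants, an open set of functions.
@dataclass(frozen=True)
class Circle:
    radius: float


@dataclass(frozen=True)
class Rect:
    width: float
    height: float


Figure = Union[Circle, Rect]


def area(fig: Figure) -> float:
    if isinstance(fig, Circle):
        return math.pi * fig.radius ** 2
    if isinstance(fig, Rect):
        return fig.width * fig.height
    raise TypeError(f"not a Figure: {fig!r}")


def perimeter(fig: Figure) -> float:
    # A new behavior added later, without touching Circle or Rect.
    if isinstance(fig, Circle):
        return 2 * math.pi * fig.radius
    if isinstance(fig, Rect):
        return 2 * (fig.width + fig.height)
    raise TypeError(f"not a Figure: {fig!r}")
```

Adding `Triangle` is trivial on the `Shape` side (write one more class) and invasive on the `Figure` side (every function needs a new branch). Adding `perimeter` is the reverse.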
If you’ve been counting, that’s 5 different choices a language can make about how to represent something. Naively, there could be 32 different possible designs, and that’s assuming there aren’t a lot more choices than the ones above, and that they’re all binary (some aren’t!). But these decisions correlate together into essentially just two major designs that make a lot of sense:
| | Data | Objects |
| --- | --- | --- |
| Equality | Values: any 1 is a 1. | Object identity: this 1 is not that 1. |
| Identity | It’s just bytes, copy away. | You can serialize / deserialize or clone, but that’s an action that gives you a new object with distinct identity. |
| Mutability | Immutable: nothing else makes sense. You wouldn’t assign 1 = 2. | Usually mutable: objects help us organize state. |
| Abstraction | Exposed internals: it’s just data, it conforms to a schema. There’s no harm in seeing it. | Encapsulation is one of the best aspects of OO, you can maintain internal invariants, as long as you don’t mistakenly expose mutable internals. |
| Extensibility | The schema gives us a fixed set of variants, over which you can of course write any function you want. | We have a fixed set of exposed operations, but different variants can be constructed (including an improved ability to evolve a variant without impacting clients). |
We do see a couple other variants in the wild, but I submit that (other than a few domain-specific examples) these are really just work-arounds for inadequate language design. We have immutable objects so often in Java because what choice do you have? You can’t actually represent data as data in that language. Java’s support for data begins at int and ends at float. C doesn’t do much better: tack on structs and arrays.
An anecdote: ever see generated code in Java try to store a blob of data? Say, an array of integers? If you just have a static array, well, that turns into bytecode that allocates an array and assigns each element one by one at class load time, and it can quickly exceed the bytecode size limit for a method. There isn’t (yet?) a format for just any kind of data in .class files, so you end up encoding the data as strings, then concatenating and decoding those strings at runtime. “We built this nice compact state machine transition table and now we… oh, EWW!”
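The workaround itself is specific to Java class files, but its shape is easy to show. The sketch below uses Python purely as illustration, and the table contents, the chunk split, and the one-character-per-entry encoding are all assumptions of mine rather than what any particular generator emits.

```python
# Hypothetical transition table; each entry is assumed to fit in 16 bits.
TABLE = [0, 3, 7, 300, 65535, 12]


def encode(values):
    """Pack each integer into one character of a string."""
    for v in values:
        if not 0 <= v <= 0xFFFF:
            raise ValueError(f"entry out of range: {v}")
    return "".join(chr(v) for v in values)


def decode(text):
    """Recover the integers from the packed string."""
    return [ord(c) for c in text]


# Generated code would emit the string in pieces (to stay under per-constant
# limits) and stitch them back together before decoding at load time.
encoded = encode(TABLE)
CHUNKS = [encoded[:3], encoded[3:]]
decoded = decode("".join(CHUNKS))
```

`decoded` comes back equal to `TABLE`, which is the whole point: a lot of ceremony just to get a list of integers into a program.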
In a world where we’re good at program design, we’d be making a conscious choice about whether the thing we’re trying to represent is data or an object. And our programming languages would help us and support us in representing it as we chose.