Why 6? Why Not Objects? Why Java?

Six Things

If you already know a computer programming language, and especially if you are an academic doing computer science, you might start to wonder where these SIX things came from. Why not five or eight?

The academic answer (for not more than six) is that the set is "Turing Complete." That means you can use these six things to make a Universal Turing Machine (UTM), which is mathematically the simplest possible computer that can be programmed to emulate anything that can be computed -- including another UTM. Turing Machines are so foundational to everything computational that it's not hard to find a reasonably good description of how they work, but here's one that's still open to the public. My website here is probably not the best place for a proof of Turing Completeness, but I can discuss it informally...

I write compilers (I wrote the book!) which translate one Turing Complete language (like C or Java) into another Turing Complete language (like the machine language of the hardware), so I have become very familiar with what needs to be carried across in the translation. The grammar that readers of my book (and users of my TAG compiler) write their compilers in essentially has exactly those six elements as the syntax of their specification. It's a good start.

Now we look to see if there could be fewer than six, mostly by trying to identify one of them that is redundant.

You can remove subroutines from a program by replicating the code -- which is basically how you do them in a UTM -- but then you have all these separate, hopefully identical, copies to maintain, and if an update misses one of them, what a nightmare! We really want to go the other direction, making subroutines more powerful -- and Java does that! We call them "objects" and they are really little more than collections of subroutines plus the data they operate on, all grouped together with a single group name so you do not need to look inside unless it failed to operate as advertised. In the final segment of your learning to program in Java, you will come to appreciate the power of these collections of subroutines -- I mean objects and classes -- that enable you to write a much bigger program than you otherwise would have thought possible. The reason we write programs in the highest-level Turing-complete language available is that it enables us to think abstractly, which is faster and less error-prone than spending all your time at the lowest level. All programming languages -- indeed all computer hardware -- in use today have subroutines. We cannot remove subroutines from the list of Six.

You could leave out iteration and make everything you want to do in a loop as a recursive subroutine -- as is necessary in Lisp -- but that loads your thinking down with all the power and cost of recursion when all you want to do is repeat something a bunch of times. Recursion is not an essential aspect of writing programs the way (non-recursive) subroutines and simple iteration are. You can do recursion using an array and iteration. Iteration is more fundamental than recursion.
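To make the last claim concrete, here is a minimal sketch of "recursion using an array and iteration." It computes Fibonacci numbers twice: once the usual recursive way, and once with a plain array standing in for the call stack. The class and method names are invented for this illustration, and the array version assumes n is not negative.

```java
class RecursionAsIteration {
    // Ordinary recursive definition, shown for comparison.
    static long fibRecursive(int n) {
        if (n < 2) return n;
        return fibRecursive(n - 1) + fibRecursive(n - 2);
    }

    // Same computation with no recursive calls: the array plays the
    // role of the call stack, and a loop replaces the calls.
    // Assumes n >= 0.
    static long fibWithArray(int n) {
        int[] pending = new int[n + 1];  // arguments still waiting to be handled
        int top = 0;                     // number of array entries in use
        long sum = 0;
        pending[top++] = n;
        while (top > 0) {
            int k = pending[--top];
            if (k < 2) {
                sum += k;                // base case contributes its own value
            } else {
                pending[top++] = k - 1;  // stands in for the call fib(k - 1)
                pending[top++] = k - 2;  // stands in for the call fib(k - 2)
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(fibRecursive(10));   // prints 55
        System.out.println(fibWithArray(10));   // prints 55
    }
}
```

The array version is longer and harder to read, which is the usual price of manufacturing one abstraction out of another.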

Similarly, you can leave out sequence and substitute for it some other mechanism to enforce the proper execution order -- such as recursion in Lisp -- but in doing so you have not disproved the need for sequence (or iteration); you have only shown that it is so fundamental that you are forced to manufacture it out of other available primitives in your language. I did the same thing with the need for subroutines in Chomp. We can do that, but doing so proves the necessity of that abstraction.

Sequence is particularly interesting, because the UTM itself is defined to be sequential, one step at a time, in order. It similarly has conditionals, changes in the sequence based on the symbols in the input tape. The UTM tape is its input and output, and can also be used to off-load variables, which are normally stored in the UTM states. Value calculation in a UTM is particularly messy, so nobody talks about it, but when they do, it turns out to be a combination of the states and transition rules. A UTM that does anything useful would need an astronomical number of states -- something like 2 raised to the power of the number of bits in the computer it emulates, and/or a very long tape (no wonder it's defined to be infinite).

Iteration in a UTM is done with GoTos -- "GOTO" is a 4-letter word that polite people do not use in public, at least not in computer-literate public after Edsger Dijkstra's famous 1968 letter in CACM -- indeed all iteration was done with GoTos before structured programming showed us they were unnecessary. Well, almost unnecessary: sometimes when your program encounters something horribly wrong, the only way to recover is to jump way over there in your program. Modern languages like C and Java do that kind of semi-structured GoTo with exception and break and out-of-sequence return, all of which programmers use so often that they forget that these commands are really limited GoTos. The reason GoTos are considered harmful is that they make your code so random and disorganized that not even the programmer who wrote it can read it a week later. Unrestricted use of break and try/catch (exception handling keywords in Java) can still make your code unreadable, and I have been accused of it often.
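Here is a small sketch of what those limited GoTos look like in Java. The class and method names are made up for this example; the labeled break, try/catch, and Integer.parseInt are standard Java.

```java
class LimitedGoTos {
    // Returns the position of target as {row, column}, or null if it is absent.
    static int[] find(int[][] grid, int target) {
        int[] found = null;
        search:
        for (int r = 0; r < grid.length; r++) {
            for (int c = 0; c < grid[r].length; c++) {
                if (grid[r][c] == target) {
                    found = new int[] {r, c};
                    break search;   // a forward jump out of both loops at once
                }
            }
        }
        return found;
    }

    // An exception is a jump to a handler somewhere else in the program.
    static int parseOrDefault(String text, int fallback) {
        try {
            return Integer.parseInt(text);   // throws NumberFormatException on bad input
        } catch (NumberFormatException e) {
            return fallback;                 // control lands here instead
        }
    }

    public static void main(String[] args) {
        int[] where = find(new int[][] {{1, 2}, {3, 4}}, 3);
        System.out.println(where[0] + "," + where[1]);   // prints 1,0
        System.out.println(parseOrDefault("oops", -1));  // prints -1
    }
}
```

Both jumps are restricted: the break can only leave an enclosing labeled block, and the exception can only land in a catch that encloses the throw, which is what keeps them more readable than an unrestricted GoTo.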

So why not make GoTos the fundamental primitive (one of my "six")? The biggest reason, as I said, is that they are not necessary and having them there in the language for the programmers to use leads them to make far more mistakes than ever they can hope to find. GoTos are not necessary, because you can write a completely GoTo-less program using only conditionals, iteration, and subroutines -- not even exceptions and break -- but it gets very messy and hard to read. We want our abstractions to help us understand what the program is all about, not make it harder. Programs are the biggest, most complicated things in the known universe that must be absolutely -- well, maybe 99.999% -- error-free to be useful, so anything we can do to make that job easier, is A Good Thing.
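As a rough illustration of a completely GoTo-less style, the search below uses only a conditional, one loop, and a single return at the end: no break, no early return, no exception. The names are invented for this sketch.

```java
class NoJumps {
    // Index of the first occurrence of target, or -1 if it is not present.
    static int indexOf(int[] values, int target) {
        int result = -1;
        int i = 0;
        // The loop condition carries the "are we done yet?" bookkeeping
        // that a break would otherwise have handled.
        while (i < values.length && result < 0) {
            if (values[i] == target) {
                result = i;
            }
            i++;
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(indexOf(new int[] {4, 7, 7, 9}, 7));   // prints 1
        System.out.println(indexOf(new int[] {4, 7, 7, 9}, 5));   // prints -1
    }
}
```

With one loop the extra bookkeeping is tolerable; with two or three nested loops it multiplies, which is where the messiness comes from.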

You can replace iteration with GoTos, but not conditionals. The problem is that if you do that (if you go back to the 1940s and 50s, or assembly language even today), you have made your program much harder to read. Among my "six things," sequence and conditionals move the execution path forward, but iteration moves it backward. So if you look at a program and see the keywords for iteration, you know that the control flow goes backward there; a subroutine jumps off to some distant place but always comes back to the next command; and the others all go forward. A GoTo, on the other hand, can go anywhere; you cannot tell with a glance what's going to happen next. Sometimes you cannot even tell after hours of studying it.
 

Why Not Objects?

Objects are not one of my Six (I guess they would make it 7 if so), yet many educators consider them to be fundamental to the nature of programming. I deny that, for two important reasons.

Most important is the fact that the Real World is not object-oriented. Some things are easily understood in that model, many more are not. Numbers are not objects, they are ways to count objects (and other things that are also not objects). You can see that in the fact that numbers in Java are not objects. In the Java Math class, all of the methods are "static" -- which is the lame excuse we give to methods that might logically belong in a class because they are related in subject matter, but there is no reasonable way for them to operate on objects, and the language recognizes that fact.

Many things in our experience are not intrinsically object-oriented, but if we try hard enough we can fit them into the OOPS Procrustean bed. Java Strings are that kind of thing. In the wild, character strings -- also known as text -- are like numbers: they are descriptors of real things, both objects and other things that are not objects. Many of the operations we want to perform on character strings are binary, that is they take two operands that are equal in rank, and return a third string that is not a modification of one of the originals, just like numbers. In fact languages like C++ and Java use the "+" operator for string concatenation, just like numbers. I personally consider that a mistake, because concatenation is a fundamentally different operation from addition, leading to all sorts of hard-to-find errors, but there it is, still evidence that the two operations share their binary-ness.
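One example of the kind of hard-to-find error that "+" invites: because Java evaluates "+" left to right, the same three operands give different answers depending on where the string sits. The class and method names here are made up for the illustration.

```java
class PlusPitfall {
    // 1 + 2 is added as numbers first, then "3" is concatenated.
    static String numbersFirst() {
        return 1 + 2 + "3";
    }

    // "1" + 2 is already a string, so 3 is concatenated too.
    static String stringFirst() {
        return "1" + 2 + 3;
    }

    public static void main(String[] args) {
        System.out.println(numbersFirst());   // prints 33
        System.out.println(stringFirst());    // prints 123
    }
}
```

The compiler accepts both lines without complaint, so a mistake of this kind only shows up when you look at the output.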

Second, object-oriented thinking is not intrinsic to programming the way sequence, iteration, conditionals, variables and I/O are. Notice that I left subroutines out as being intrinsic, because they are not. Subroutines are an important convenience that enables us to write bigger programs than we can without them, but a UTM can be written without subroutines. That is also true of the encapsulation that OOPS does (as also the other languages before OOPS that permitted separate compilation units), because subroutines and objects are fundamentally the same thing in that regard. OOPS is not a different kind of thing to learn than any of the other Six, the way each of the Six is different from its brothers, but rather it is fundamentally the same concept as subroutines, only bigger.

We have three necessary control-flow kinds of things: sequence, iteration, and conditionals. These are about arranging the steps (I called them "commands") into a "structure" to operate on data. Data are the numbers that the operational steps operate on. The three structural things plus the data that they operate on together form a program. As a programming convenience, we can take some of the data and some of the structured programming steps and put a named box around them, which we call a subroutine. It is very like a whole program, complete with I/O (that would be the parameters and the return value, and also any access that subroutine makes of outside variables), except that a program can have many subroutines that talk to each other. Some programming languages even allowed subroutines to contain other subroutines.

Objects are basically the same thing, putting a named box around some data and some subroutines and maybe even some structured programming steps. Objects can (but do not necessarily) contribute to stronger data types, which enables the compiler to warn programmers when they make certain kinds of programming errors. I consider a strongly typed language so important that I refuse to program in weakly typed languages like C.
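A minimal sketch of both points, using invented class names: each class below is a named box around one number plus the subroutines that work on it, and because the two boxes are different types, the compiler can catch a units mix-up before the program ever runs.

```java
class StrongTypeDemo {
    // A named box: one piece of data plus the subroutines that use it.
    static final class Meters {
        private final double value;
        Meters(double value) { this.value = value; }
        Meters plus(Meters other) { return new Meters(value + other.value); }
        double value() { return value; }
    }

    // A second box with the same kind of data but a different type.
    static final class Feet {
        private final double value;
        Feet(double value) { this.value = value; }
        Meters toMeters() { return new Meters(value * 0.3048); }
    }

    static double totalMeters() {
        Meters track = new Meters(100.0);
        Feet ramp = new Feet(10.0);
        // track.plus(ramp);   // would not compile: a Feet is not a Meters
        return track.plus(ramp.toMeters()).value();
    }

    public static void main(String[] args) {
        System.out.println(totalMeters());
    }
}
```

If both lengths were plain double variables, adding feet to meters would compile and quietly give a wrong answer.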

In their essence, objects are "just like" subroutines, so they do not merit a separate category in my enumeration of important programming concepts. However, OOPS is an inescapable part of Java as defined, and Java is the most strongly typed programming language in wide use today, so we teach objects in their place, which is when we get to larger programs where they become noticeably useful.
 

Why Java?

If subroutines are good, then collections of subroutines are better. The technical word used for these collections is abstraction, but it's slightly misleading. Abstraction is the process of finding a common feature of a bunch of different things, and giving that common feature a name so it refers to any of those things and not any particular one of them. It's a way of thinking about your program as "This part calculates the results" and "This part draws it on the screen" and "this part does this other thing..." and so on. My "six things" are abstractions, so that you don't need to think about the details of how the hardware does iterations (with GoTos!) or how a subroutine does what it promises to do; it's just a subroutine with this particular name, and so on.

The proponents of OOPS try to tell you that Objects (one of the "O"s in "OOPS") are the only or best way to get abstraction, but they forget that we had abstraction in programming languages long before Objects escaped from Smalltalk. The real benefit that the Objects in Java give you that you didn't have before is that these Objects are inextricably part of the (static) strong-type system.

This is important. Some people try to insist that dynamically typed languages still can be (or are) "strongly typed" but they ignore the essential nature of strong types, which is to give the compiler advice about your intentions concerning this data, so it can warn you if you deviate from your declared intentions. So-called dynamically typed data is tagged at runtime, then the system decides what to do with it (at runtime) and you get what you get, which is probably not what you wanted, but it's too late to fix it. If you want that kind of flexibility (and confusion and danger) you can do it in a truly strongly typed language like Java, but you must do so intentionally in your own code.
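Here is a sketch of what doing so "intentionally in your own code" might look like. The first method is ordinary statically typed Java; the second deliberately accepts anything and inspects the runtime tag itself. The names are hypothetical, chosen for this illustration.

```java
class TaggedAtRuntime {
    // Statically typed: a call like twice("7") is rejected by the compiler.
    static int twice(int n) {
        return n * 2;
    }

    // Deliberately dynamic: any value is accepted, and the type is
    // checked only when the program is already running.
    static Object twiceDynamic(Object value) {
        if (value instanceof Integer) {
            return ((Integer) value) * 2;
        }
        if (value instanceof String) {
            return (String) value + (String) value;
        }
        // Anything else is discovered too late to fix at compile time.
        throw new IllegalArgumentException("cannot double " + value);
    }

    public static void main(String[] args) {
        System.out.println(twice(7));             // prints 14
        System.out.println(twiceDynamic(7));      // prints 14
        System.out.println(twiceDynamic("ab"));   // prints abab
    }
}
```

Calling twiceDynamic(1.5) compiles without complaint and fails only at runtime, which is exactly the trade described above.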

Make no mistake, strong data types are a hassle to program -- at least for beginners, although after a while it becomes second nature -- but that is far less hassle than trying to figure out after your program fails, why it failed. MOST OF YOUR TIME as a programmer is finding mistakes in your code after it has successfully compiled; you want to shift that burden to the compiler as much as possible, so that it is easily fixed before you try it out on real data. Strong types (as in Java) do that for you.

I have written large programs in assembly language (completely untyped), Modula2 (which was more strongly typed than Java, but now extinct) and HyperTalk (which was dynamically typed = weakly typed, and also now extinct) and finally now in Java. The same client who bought my assembly language work budgeted a whole year for the next project, which I finished in three months (including debugging my own compiler) because it was in Modula2. Then I moved on to HyperTalk, and was thoroughly frustrated by how much longer things were taking than I was used to. After Apple killed HyperCard and Microsoft killed VB6, I moved to Java because it was not a single-vendor product, and my productivity shot back up. I was using my own compilers and development tools in all four cases; the only difference was how strongly the language was typed.

Before Modula2 died, I was the official USA delegate to the International Standards Organization Working Group standardizing the language. The New Zealand delegation reported frequently on the testing of their compiler, which tracked the evolving standard. Their compiler compiled M2 to C, then used the university C compiler to produce runtime code. They ran a study comparing students programming in C and M2, each using their own preferred language of choice, and the C programmers made (I think it was) six times more errors than the M2 programmers, some of those errors not even possible in Modula2. The most remarkable thing was that the compiled M2 programs were smaller and ran faster than the same programs written in C, even though the final code was generated by the same compiler! It seems that the programming language C was designed for the PDP-11, a minicomputer too small to run an optimizing compiler, and also too small to not need optimized code, so the language intentionally exposed a lot of the low-level hardware to the programmer, who was thus encouraged to make manual optimizations that only worked on the PDP-11. Compilers for larger and faster computers needed to undo all those quirky machine hacks before they could generate good code for computers whose internal architecture was very different from the PDP-11, but only if they could figure out what the programmer's intentions were, which isn't very often because people are far more clever than computers. Without all those error-prone low-level hacks, the M2 code could be optimized much more efficiently.

The bottom line: write your program in as high-level a programming language as you can find, and let the compiler generate the best code for what you told the computer is your intent. Java is the best we have for that today. One exception is writing a video game -- for example, for the third segment of this course -- where much of the graphics can more easily be expressed in very-high-level game engine semantics, so if such an engine is available, use it. It will go much faster. Java has libraries for numerous common programming tasks, and using those libraries effectively makes a very-high-level programming language out of the successive library calls, just like the Chomp commands are a higher-level programming language than the Java code behind those commands. Use the best you can get your hands on; you will make fewer errors and reach your target much faster.

Whatever your programming language, as you design your program, you will think of abstractions that you can write subroutines (and Objects) for: do it, then call on the abstractions and stop thinking about the details. You won't think of all possible abstractions up front, but as you are writing code, you may find yourself writing the same (or similar) code over and over. Convert it into a subroutine and call it where needed. You will be a better programmer. Those abstractions will be your very own very-high-level programming language. It will save you time and grief.

Tom Pittman
Revised 2020 July 6+