Note: This page is my personal journey to discover Forth and put it in the context of computing history. It is adapted from my slides for a short talk. I've done everything in my power to make this page scale up and down for various screen sizes. I welcome suggestions and corrections for both the content and display of this page. Here's my contact page .
(The "Easter egg" in this drawing is alt.religion.kibology, which should get a chuckle from old timers. The rest of you can look it up.)
There were programming resources on the Web, but nothing like what's available now! To actually learn to program, I bought books, and still do.
The comp.* (wikipedia.org) groups and particularly comp.lang.* were great places to learn about and discuss programming. By the time I got there in the late 90s, Perl was a pretty hot topic, especially as it took a dominant role in the early Web as the dynamic page and form processing programming language via CGI (wikipedia.org).
In the 1990s, Usenet newsgroups (wikipedia.org) were where it was at. For example, Linus Torvalds's initial announcement of Linux was to comp.os.minix in 1991.
When I was a wee programmer, I would sit around the virtual Usenet campfires listening to the tall tales and legends of the elders.
The only time I understood this was when I completed the book The Little Schemer by Friedman and Felleisen, which walks you through creating it for yourself. It is a magical book and I implore you to try it.
Sharp-eyed Lisp-lovers and other mutants will perhaps recognize this thing as the Y combinator expressed with lambdas.
For a while, Royal was owned by the Italian typewriter company Olivetti, which also made some really interesting computers (wikipedia.org).
Mel was real and the Royal McBee RPC-4000 was real. Look at that teletype (aka "teleprinter"). If typewriters and "Royal" together make a little bell in your head go "bing" as your mental carriage hits the end of the page, then you're right: Royal McBee was a merger between the Royal Typewriter Company (wikipedia.org) and McBee, a manufacturer of accounting machines.
I heard tell of a programming language so flexible that you could change the values of integers.
They said that language was called Forth and it was created by a mad wizard called Chuck Moore who could write any program in a couple screens of code.
But I never forgot about the legend of Forth.
Years went by and I wrote a lot of PHP and JavaScript. I watched the Web evolve (and sometimes de-evolve).
He made Forth a recurring theme and it just sounded so darned interesting.
The blog series "programming in the twenty-first century" (prog21.dadgum.com) by game developer James Hague gave me the final push.
So I went on an adventure and now that I have returned, I think I have some answers.
(Oh, and I confirmed the legend . I can make any integer equal anything I want. Stick around 'til the end to see that Forth magic trick.)
RPN is considered to be highly efficient and, being somewhat inscrutable to outsiders, highly geeky.
HP calculators are famous for using RPN syntax. If it weren't for these calculators, I suspect RPN syntax would be virtually unknown outside of computer science.
Putting all of that power into a "shirt pocket" calculator was an astounding accomplishment at the time. Legend has it that the size of the HP-35 was based on the dimensions of Bill Hewlett's actual shirt pocket. HP-35 calculators have been in space. They killed off the slide rule.
By the way, the HP-35 calculator (wikipedia.org) pictured here is really interesting. In the early 1970s, HP had powerful desktop calculators. Actually, what they had were really programmable computers, but they still called them calculators (wikipedia.org) for sales reasons. But these were big desktop machines that ran off of wall current.
In fact, as you'll see, my quest is mostly a series of incorrect assumptions I made by looking at the language without the context of history.
RPN notation is one of the most visually obvious aspects of the Forth programming language. But it turns out, RPN is not what Forth is about or the reason Forth exists. As we'll see, the situation is reversed.
Perhaps you've seen postfix or Reverse Polish Notation (RPN) (wikipedia.org) before? The principle is simple: Instead of the usual "infix" notation which puts operators between operands ( 3 + 4 ), RPN puts operators after the operands ( 3 4 + ).
At first, I thought this was what Forth was all about:
So this relates to Forth how?
As is famously true of "ed, the standard text editor" (gnu.org), dc doesn't waste your VALUABLE time (or teletype paper) with output you don't need!
In this example, we input 3, then 4. * multiplies them. Now we have the result (12) available. But first, we input 5 and 6 and multiply them with another * to also store that result (30). The final + adds both stored results (12 + 30) and stores that result (42). Unlike an HP calculator, dc doesn't show us any of the stored results, including the last one until we "print" it with the p command.
Anyway, the point here is that RPN syntax lets you express nested expressions without requiring parentheses to get the order of operations you want. This is one of the reasons RPN fans (including those HP calculator fans I alluded to) are so enamoured with it.
Besides using RPN syntax, the dc calculator (wikipedia.org) is completely programmable. Oh and it also happens to be one of the very first Unix programs and pre-dates the C programming language!
I'm being cheeky here. Users of bc are hardly noobs. But it is arguably even geekier to use the much older dc program. bc was once just an infix expression translator for dc to make it more palatable for people who didn't want to use RPN. Thus the gentle teasing.
But does that mean we know what Forth is all about now? If we know how to enter things in postfix notation, we "get" Forth? No! Not even close...
So Forth is like an RPN calculator? We input values and then operate on them? Well, that statement is not wrong
Sharp-eyed readers will note that we print the result with a "." command rather than "p". But that's the only difference.
As you can see, someone sitting at a Forth interpreter can perform this calculation exactly the same as with the dc calculator (or an HP calculator).
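Here's a minimal sketch of what such a session might look like (the exact prompt and echo vary from Forth to Forth; a modern system like gforth answers each line with "ok"):

    3 4 * 5 6 * + .
    42 ok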
Historical note 2: When a computer asks, "SHALL WE PLAY A GAME?" in all caps, you must answer NO, as we learned in 1983's WarGames (wikipedia.org)
Historical note 1: In the old days, people and computers just WENT ABOUT SHOUTING AT EACH OTHER ALL THE TIME IN ALL CAPS BECAUSE LOWERCASE LETTERS WERE TOO EXPENSIVE.
As you may have guessed, these four stack words (PUSH, POP, SWAP, DUP) also happen to be Forth words.
Above, we have an illustration of PUSH and two other common stack operations:
A stack is a data structure often explained with a "stack of plates" analogy. You PUSH a plate on the stack and you POP a plate off the stack. The first item you put on the stack is the last item out of the stack.
The use of a data stack is probably the second most visible thing about the Forth programming language.
Now let's see something you probably wouldn't find on an HP calculator. Something non-numerical...
The '.' (DOT) operator is different since it only takes one value (to print it) and does not put anything back on the stack. As far as the stack is concerned, it is equivalent to DROP. As far as humans are concerned, it has the useful side-effect of letting us see the number.
As you can see, entering a number puts it on the stack. The math operators take two values from the stack, do something with them, and put a new value back on the stack.
Rather than being concerned with the syntax or notation, we're now interested in what these operations are doing with our data stack.
Let's revisit our math problem from earlier. This is the Forth code on the left and the results on "the stack" on the right.
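Here's that idea rendered as text, with the state of the stack after each step written as a Forth comment (the rightmost item is the top of the stack):

    3    \ stack: 3
    4    \ stack: 3 4
    *    \ stack: 12
    5    \ stack: 12 5
    6    \ stack: 12 5 6
    *    \ stack: 12 30
    +    \ stack: 42
    .    \ prints 42; the stack is empty again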
But sometimes not naming things is even better.
It's up to the Forth developer to make harmonious word choices. It can get far more clever or poetic than my example.
There can, indeed, be some object named CAKE that we have placed on the stack (probably a memory reference) which can be DUPed, and then HAVEd and EATen.
Actually, this isn't just a silly example. Forth's use of the stack can lead to a natural, if somewhat backward, use of nouns and verbs. (Kind of like Yoda's speech habits. "Cake you will dup, yes? Have it and eat it you will, hmmm?")
Getting the joke here will require knowing this English idiom (wikipedia.org).
This is valid Forth, assuming CAKE, HAVE, and EAT have been defined:
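    \ a sketch -- CAKE, HAVE, and EAT are hypothetical words, not standard Forth
    CAKE DUP HAVE EAT

(Reading it with the stack in mind: CAKE leaves a reference on the stack, DUP copies it, and then HAVE and EAT each consume a copy. You really can have your cake and eat it too.)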
Have we "cracked" Forth yet? Now we know two things: it uses RPN syntax and it is stack-based.
Great, so stacks can be a very elegant way to handle expressions.
Working without names (also known as implicit or tacit or point-free programming) is sometimes a more natural and less irritating way to compute. Getting rid of names can also lead to much more concise code. And less code is good code.
(It's almost as bad as file names used as a versioning system: my_doc_new_v5.4(copy)-final2 ...)
Does your programming language make you do this?
A lot of imperative languages are like that factory. As your values go down the line, you've got to come up with nonsense names like result2 , or matched_part3 .
Imagine, if you will, a factory assembly line in which each person working the line is a hateful fussbudget who refuses to work on the part in front of them until you name it. And each time the part has been worked on it must be given a new name. Furthermore, they refuse to let you re-use a name you've already used.
But it's funny how our programming languages often require us to explicitly name intermediate results so that we can refer to them. On paper, we would never give these values names - we would just happily start working on the list.
(Hopefully your answer is "no" or the rhetorical question doesn't work.)
Do you feel a need to give a name to each sum pair...or even the running total?
If I ask you to add these numbers:
The stack frees us from being forced to create explicit names for intermediate values.
Well, Forth certainly does use a stack. It is definitely a stack-based language.
Let's see if I can stumble through it...
I understand the explanations on these websites now, but it took me a while to get there. Your journey may be shorter or longer. Probably shorter.
For the term "concatenative programming" itself, the Factor programming language website has an excellent page defining the term: Factor documentation: Concatenative Languages (factorcode.org). And, of course, there's the Wikipedia entry, Concatenative programming language (wikipedia.org).
An awesome resource for all things concatenative is The Concatenative Language Wiki (concatenative.org). It lists many concatenative languages and has a page about Forth, of course.
On this journey of Forth discovery, you'll inevitably run into the term "concatenative programming".
Ah, this must be it because it sounds fancy.
But that's just the surface. It turns out this "concatenative language" concept goes way past that...
No names (unless we want them), just nouns and verbs.
The point here is that concatenative style has us "composing" functions (which you can think of as verbs) simply by putting them in sequence. Each function will be called in that sequence. The values that are produced at each step are passed along to be consumed as needed.
(Of course, if you're a programmer used to years of something like C or Java or JavaScript, the inside-out parenthetical form will probably seem pretty natural too. Well, guess what? Your mind has been warped. It's okay, mine has too.)
Unlike the math examples, where the infix notation looks more natural to most of us than the postfix notation, the concatenative example of this baking program looks more natural (at least in a human language sense) than the inside-out function application example, right?
An applicative language has you apply a function to a value, which returns another value. Using familiar Algol-like (or "C-like", or "Java-like", or "JavaScript-like") syntax, arguments are passed to functions within a pair of parentheses. In the above example, the parentheses end up deeply nested as we pass the output of one function to another.
"Concatenative programming" came after. In fact, Joy is where the "concatenative" description comes from! (von Thun specifically credits Billy Tanksley for creating the term "concatenative notation".)
Perhaps you've heard of "functional programming?" As you can see, that term was being used in 1977.
I know just enough to believe I understand this paragraph from the paper's abstract:
Can Programming Be Liberated from the von Neumann Style? (PDF) (worrydream.com) This paper is dense with notation and I haven't personally attempted to wade through it, yet. I'm sure it contains many profound ideas.
As you can see, combinators are crucial in Joy. Let's take a moment to dive into those, because this is a pretty fascinating avenue of computer science...
From A Conversation with Manfred von Thun (nsl.com), which is a really great read in its entirety.
I can't describe Joy's genesis better than the man himself. Here is von Thun in an interview about Joy:
A program is simply a list of functions that is read from left to right.
Specifically, all functions take one stack as input and return one stack as output. The stack is not named, it is implied.
...and all functions are unary (that is, they have an "arity" of 1).
But we've barely touched the conceptual power of combinators with our map examples. Let's go a little deeper on this fascinating subject:
Note: I had a bit of a time actually running Joy to test out these examples. Thankfully, I eventually ran into Joypy (github.com), a Joy interpreter written in Python. My Linux distro comes with Python installed, so the whole process for me was:
If you have the slightest interest in Joy , I highly recommend reading or skimming this delightful tutorial by Manfred von Thun himself: An informal tutorial on Joy (hypercubed.github.io).
There seems to be a happy medium between named and unnamed. Also, the point-free style seems to benefit greatly from short (even very short ) definitions to avoid mental juggling and gain greater composability.
This "point-free" style can be a blessing... or curse. Unlike computers, human brains have a hard time juggling too many things on the stack.
(It's kind of confusing to talk about "first" and "second," though, because that's the opposite of the order in which we supply those arguments on the stack...)
As in the JavaScript examples, map takes two parameters. The first is the function (or "program" in Joy) to apply, and the second is the list to apply it to.
In this case, the first quotation is the number list, [1 2 3 4] .
The syntax here may require a little explanation. The square brackets ( [] ) are Joy's quote mechanism. Quotations are a lot like lists, but they can contain programs as well as data.
Notice how both JavaScript examples have variables such as the parameter n and the result bigger . This is an example of what I mentioned a moment ago when discussing the advantages of stacks: "Traditional" programming languages often make us name values before we can work with them.
The output of map() is a new list containing the result of each application.
Perhaps map() is familiar to you. But if not, just know that it takes two parameters like so: map(array, function) . The first parameter is implicit in these JavaScript examples, but it's there. The array object, [1, 2, 3, 4] calls its own map() method. The second parameter is a function (named inc in the first example and left anonymous in the second), which will be applied to every member of the list.
In the first example, we have familiar Algol-like syntax with functions that take arguments in parentheses.
(The second example with the arrow function syntax works exactly the same way, but more compactly. I included it to make the comparison with Joy a little more even-handed. Feel free to pick a favorite and ignore the other one.)
map is one of the more common higher-order functions, so I'll use it as an example.
You can even have functions that take functions that take functions and so on to do powerful things. But you'll need to meditate on them every time you have to re-read that part of your code.
"Higher-order" just means functions that take other functions as input and do things with them.
What do the building blocks of combinatory logic look like?
Anyway, when we use combinators, this particular flavor of universal computation is called combinatory logic (wikipedia.org).
Electronically speaking, this is the same principle that allows a NAND gate to simulate all other gates. NAND gates are a fundamental computational building block. You can make an entire computer with nothing but NAND gates and that computer can (slowly) solve any computable problem you can imagine.
It turns out that computation is a fundamental feature of the Universe . As far as we can tell, any universal system of computation is equally capable of solving any computational problem. And once you realize how little is required, you can invent a universal computer yourself!
So combinators share something in common with lambda calculus and Turing machines. These systems provide all of the building blocks you need to perform any possible computation in the sense of the Church-Turing thesis (wikipedia.org) or "computability thesis". (We've also discovered some problems, like "the halting problem," that no system can compute, but these are pretty rare.)
(Ben Lynn's whole website is full of neat stuff like this. If you're looking to entertain yourself for any amount of time from an afternoon to the rest of your life, Lynn has you covered.)
It's kind of hard to imagine at first. But you can see it happen right before your very eyes. The mind-blowing tool on this page by Ben Lynn: Combinatory Logic (stanford.edu) takes a term expressed in lambda calculus and replaces everything with just two combinators, K and S. (We'll talk more about those two in just a moment because they are super special.)
Yeah, you can re-work any expression into a combinatorial expression and completely replace everything, including the variables, with combinators.
Remember, combinators are just "higher-order" functions that take functions as input. Well, it turns out these are all you need to perform any computation. They can replace logical operators and even variables.
It would be impossible to write a complete catalog of combinators just as it would be impossible to write a complete catalog of integers. They're both infinite lists. Nevertheless, some well-known combinators have been identified as having special properties. In the book above, many of these have been given the names of birds.
Here's something from my bookshelf. It's To Mock a Mockingbird by mathematician and puzzle-maker Raymond Smullyan. It uses puzzles involving birds to solve logic problems and classify some well-known combinators.
We say "object x" rather than "value x" because, being a combinator, I could take a function as input as well as a value. In fact, "object" is intentionally very abstract, so x could contain a scalar value, or list, or function, or another combinator, or anything. Whatever that object is, I returns it.
The way to read "(I x) = x" is: " I applied to some object x results in... x ."
The simplest of all combinators is I, the "identity combinator". There are a ton of different ways to write this. In lambda calculus, it looks like this: I = λx.x
That's right, the S and K definitions above are a complete system for universal computation.
Actually, it's even crazier than that. You don't even need I because that, too, can be created with S and K .
But the real point is this: S and K are special for one very interesting reason. Together with I , they form the "SKI calculus" and just these three combinators are all you need to perform any computation in the known universe.
(By the way, the y combinator above should not be confused with the Y combinator. Do you remember that arcane lambda calculus artifact projected over that head with the third eye way up near the beginning of this page? That thing was the Y combinator! It turns out, it's all, like, connected, you know?)
How about S ? I'll leave implementing that in JavaScript as an exercise for the reader. It's clearly much more complicated since it has three levels of "function that yields a combinator" on the left and the result is an equally complicated combinator that first applies parameter z to combinator y .
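For reference, here is the textbook definition of S, written in the same style as the K definition quoted elsewhere on this page:

    (S x y z) = ((x z) (y z))

In words: S applied to x, y, and z applies x to z, applies y to z, and then applies the first result to the second.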
See how the result of K("hello") is a function that returns "hello" no matter what you give it as input?
K is super easy to write in a language like JavaScript, which is also a nice choice because you can play with it right in the browser console like I just did:
The result is that K makes a combinator that throws away any input and just returns x . Weird, right? But it turns out to be useful.
(Programmers familiar with the concept of currying will see that this is like the partial application of a function, where a new function is "pre-baked" with the argument x . The term "currying" is named in honor of mathematician Haskell Curry (wikipedia.org), after whom the Haskell programming language is also named.)
The way to read " (K x y) = x " is: " K applied to x yields a combinator , which, when applied to y always evaluates to x ."
Since it's the simpler of the two, let's use the K combinator as an example:
Both of these take more than one parameter of input. But if you're used to Algol-like function syntax, the way this works may be surprising.
But are they worth using in "real" programs? In limited doses, absolutely!
Okay, we get it, combinators are a crazy way to compute.
Figuring out the output of the S combinator once was enough to keep me occupied for a while. It boggles my mind to imagine feeding it another S as input on paper, let alone discovering these particular combinators in the first place.
Combinators is also Wolfram's ode to the discoverer of combinatory logic, Moses Schönfinkel (wikipedia.org) who, like so many of the giants in the field of computer science, did his work on paper decades before the first digital electronic computers beeped their first boops.
I think it is wild and fun to see someone play with a subject like Wolfram does in this book. Each page is saying, "Look at what is possible!"
Wolfram demonstrates combinators that keep producing different output for a gazillion iterations and then get stuck in a loop. Some of them produce regular patterns for a while and then start producing different patterns. Some just loop forever at the outset. As in other universal systems, there is no end to the complexity produced by these two simple constructs. It is infinite. And all of this is just S and K combinators taking combinators as input and returning combinators as output.
It starts with a (much too) terse introduction to the SKI combinator calculus and then launches into page after page of visualizations of S and K combinators being fed into each other. Like fractals or automata, simple inputs can produce patterns of surprising sophistication.
The book shown here is another from my bookshelf. It's Combinators: A Centennial View by Stephen Wolfram.
is the same as this much longer statement:
Mind you, it's very easy to go overboard with this stuff and write something far less readable than some simple procedural code. (Gee, ask me how I know this.)
Once I started discovering how combinators and curried functions could eliminate big old chunks of code, I was hooked! The old, dreary procedural code became a new fun puzzle!
My personal history with exploring higher order functions in a production setting is through the Ramda (ramdajs.com) JavaScript library, which I discovered from the talk Hey Underscore, You're Doing It Wrong! (youtube.com) by Brian Lonsdorf, which is fantastic.
I think map() is a great example of the power of combinators to clean up a program with abstraction. Once you start using simple combinators like this to abstract away the boilerplate logic of yet another loop over a list of items, it's hard to go back.
Both of those pieces of JavaScript give us the result of applying the function bar() to an array foo .
First of all, Forth does have facilities for dealing with combinators:
Okay, so we've gone pretty deep into this concatenative programming and combinator thing. How does this actually relate to Forth?
It's debatable which of these two are more readable because the measure of readability is in the eye of the beholder. But I think you can imagine getting good at reading the Joy example.
Note that the Joy example is not just shorter and free of variable names; it has also abstracted away the mechanics of recursion. All we're left with is the logic specific to the factorial problem itself.
As shown in the Joy factorial definition above, linrec is a "linear recursion" combinator. It takes 4 parameters, each of which is a quoted program. null is a predicate which tests for zero. dup is the same as in Forth. pred is an operator which yields a number's predecessor (given 4, yields 3). " * " multiplies two numbers, just like you'd expect. Given these pieces, perhaps you can take a guess at how linrec works?
Computing a factorial requires a cumulative result. Without recursion, you need an explicit variable to hold the intermediate result as you loop through the numbers.
Computing the factorial of a number is often used as an example of recursion. The final answer is the input number multiplied by the previous number multiplied by the previous number multiplied by... the rest of the numbers all the way down to 1.
Even different forms of recursion can be completely handled for you by combinators in Joy thanks to the uniformly unary functions.
Joy uses combinators to "factor out" all sorts of logic.
For some compact higher-order function definitions in Forth, check out this Gist by Adolfo Perez Alvarez (github.com).
Anyway, now we're armed with Forth's combinatorial ability: Treating functions ("words") as values so other functions can take them as input. This allows us to define combinators in Forth.
Perhaps it would be helpful to see that this silly statement:
EXECUTE gets an address from the stack and runs whatever word is found at that address.
" @ " is a standard Forth word that loads the value from the given address and puts that value on the stack.
Remember, the variable hello-token leaves its address on the stack when it is called.
So the code above simply reads, "Store the address of hello in the variable hello-token ."
This part will look super cryptic if you're new to Forth:
This creates a new variable called hello-token which will store the "execution token" for the hello word.
(We'll learn how compiling words actually works later. For now, please just gracefully accept what you're seeing.)
This is Forth for, "Compile a word called hello that prints the string Hello."
First, let's see how EXECUTE works. The syntax will be alien to non-Forth programmers, but the concept will be no problem for anyone used to using first class functions.
With this, you can very compactly define combinatorial words such as MAP , FOLD , and REDUCE .
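As a concrete sketch of that idea (my own toy example, not something from a standard library), here's a tiny MAP-like word that applies an execution token to every cell of an array, in place. The names map-cells, nums, and square are all hypothetical:

    : map-cells  ( addr len xt -- )
      rot rot                 \ ( xt addr len )
      cells over + swap       \ ( xt end-addr start-addr )
      ?do  i @ over execute i !  cell +loop
      drop ;

    create nums 1 , 2 , 3 , 4 ,
    : square  dup * ;
    nums 4 ' square map-cells   \ nums now holds 1 4 9 16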
This will run the word returned by the word FOO :
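That is, something like FOO EXECUTE, where FOO is assumed to be a word that leaves an execution token on the stack. Pulling the pieces described above together, here's a minimal sketch of the hello example in standard Forth (hello and hello-token are the names used in the prose; the exact output formatting depends on your Forth):

    : hello ." Hello" ;       \ compile a word named hello that prints Hello
    variable hello-token      \ make a variable to hold hello's execution token
    ' hello hello-token !     \ ' (tick) gets hello's token; ! stores it in the variable
    hello-token @ execute     \ @ fetches the token; EXECUTE runs it, printing Hello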
Let's look at this from a historical perspective. First, the notions of postfix syntax (RPN) and a data stack as the basis of the language:
The point I'm making is that Forth may accommodate the abstract point of view, if the developer chooses to take it. But Forth is not based on abstract concatenative computing principles or combinatory logic.
To Joy, it may be the same mechanical process under the hood, but the language itself sees these tokens more like a mathematical expression. It's a much more abstract outlook.
Forth's only concern (as a language) is to process these three tokens and act upon them according to some simple rules. (If the token is in the dictionary, execute it. If it's a number, put it on the stack.)
While both languages share a cosmetically similar syntax, and both produce the same result for this expression, there is a fundamental difference between how the two languages "think" about the expression because they arrived at this place in completely different ways.
Joy: "The composition of the functions 2, 3, and + is identical to the function 5 ."
Forth: "Push 2 and then 3 on the stack; add them; push result 5 on the stack."
Nevertheless, I do not believe studying "concatenative programming" in general or Joy specifically is a good way to understand the history and genesis of Forth!
So yes, Forth is concatenative. It implicitly passes values from one function invocation to the next. And it supports higher-order functions.
Anyway, so the simple mechanics of RPN and stack-based operation are very natural for digital computing machines and their use goes back to the very beginning.
I got most of this information from this excellent paper by Raul Rojas: Konrad Zuse's Legacy: The Architecture of the Z1 and Z3 (PDF) (ed-thelen.org).
Fun fact: The control unit used special control wheels to encode microsequences. If the microsequence wasn't programmed correctly, it could short-circuit the machine and destroy the hardware!
It is also a stack machine. Again, this is with a mere two registers, which get juggled in a particular sequence as you load and store values.
As mentioned above, it can be said to use RPN, though there are only two registers and nine instructions. Opcodes were encoded in eight bits. The computer is programmable via punched paper tape (you can see the tape device to the right of the control console, though it's a bit of a scribble in my drawing).
The Z3 could do addition in less than a second and multiplication in three seconds. It had 64 words of 22 bits each and worked with the equivalent of modern floating-point numbers.
(By the way, a certain amount of electro-mechanical logic is still used in modern nuclear reactor safety systems because the big mechanical components are not as vulnerable to nuclear radiation as semiconductors!)
The drawing of the computer labeled "Z3" on the right is of the Z3 computer (wikipedia.org) designed by engineer and computer scientist Konrad Zuse. This is widely considered to be the first programmable digital computer ! It used electro-mechanical relays like the telegraph networks of the day.
So I think it's reasonable to assume that RPN syntax and use of stacks are a historically accurate way to examine Forth's "origin story."
Stacks were known and used in the time of Forth's origins, though they were generally limited to 2-4 items in registers.
Postfix notation was definitely in the air when Chuck Moore created Forth.
As we'll soon see, Forth really is about the "nuts and bolts". You bring your own theories with you.
So I must conclude that understanding concatenative programming is super cool, but it doesn't actually help us understand the true nature of Forth because it doesn't describe how Forth came to be. It is not part of Forth's "origin story."
Concatenative programming, with its emphasis on combinators (and immutable data structures, which we haven't talked about), doesn't have the same historic grounding for Forth the way that RPN syntax and stack-based programming do.
It's important to remember that "coding", the actual act of turning an abstract program into machine code, was long ago considered to be a mere secretarial skill, not far removed from typing and other forms of data entry. This is why some people (including myself) refer to themselves as "programmers" rather than "coders".
Until then, programming was "close to the metal." Even the idea of "structured programming" with programming language concepts like if/else or while/for loops was once considered novel! Until then, everything was done with address jumps or GOTO .
While the ideas of combinators and other types of universal computation were well known in certain mathematical and computational circles, I would argue they were not very amenable to existing computer hardware until much later when computers became fast enough to support "functional programming" styles and abstractions.
But Joy and the term " concatenative programming " come from the 1980s.
I think the answers to the "why" questions are best answered by looking at when .
" Why does Forth work this way?"
" Why does Forth have this syntax?"
There's nothing wrong with thinking about Forth in these terms, but it doesn't answer the "why" questions:
So while all these descriptions of the Forth language are true (RPN, stack-based, concatenative), they all describe the language Forth from the vantage of hindsight .
If this image doesn't make any sense to you, citizen of the future, it's from the iconic movie poster by Drew Struzan for Back to the Future (1985) (wikipedia.org).
It's difficult to imagine now, but changing parameters for a program, re-compiling it, and running it again could take a day (assuming you didn't make any mistakes).
It was very modern for the time, but...
There were switches for each register on the control console, but programs could be written to and read from paper punch cards.
The 704 was a fully programmable "modern" computer with magnetic-core memory, multiple registers, a 36-bit instruction set, and 36-bit words ("word" as in native memory size for the processor, not "word" as in Forth functions).
The computer in question is the IBM 704 (wikipedia.org). It was one of those room-filling vacuum-tube computers with tape drives the size of refrigerators.
(Note: Chuck mentions the Smithsonian Astrophysical Observatory (SAO) and the Massachusetts Institute of Technology (MIT) in roughly the same time period, and it's a bit difficult to be entirely sure which part is talking about which organization. But if you look at a map, SAO is at Harvard University. Harvard and MIT are about a mile apart in Cambridge, Massachusetts. It's basically a singular point if you zoom out a bit. So that helps explain the overlap.)
In Forth - The Early Years (PDF) (worrydream.com), Chuck Moore recites a fairly terse history of Forth, from the earliest pre-Forths to the creation of the language standard.
These "programming the program" statements in Moore's simple interpreter did not use keywords. They were statement numbers encoded on a punchcard.
It was also faster and more compact.
His free-form input format turned out, ironically, to be more reliable for human use than Fortran, which required formatted columns. (At the time, any mis-aligned columns in Fortran punchcard input would require a re-run of the program!)
Here's a quote from The Evolution of Forth (forth.com):
Moore made an interactive interpreter on a computer with nothing we would recognize today as an interactive terminal.
If it had stopped there, it would have been a clever trick and perhaps worthy of a footnote in history.
Already, Moore has exhibited the defining combination of traits shared by great programmers around the world: Inventive and allergic to tedium.
So, at last, we have discovered the true origin of the Forth language : Moore wrote a simple interpreter to reduce waste and tedium.
Free-form input was unusual at the time. It's obviously a super nice alternative to recompiling your calculation program every time you want to change some numbers!
According to Moore, the interpreter's statement numbers would have been roughly equivalent to these Forth words:
This is the origin of the system that would eventually be named Forth .
And what exactly did Chuck Moore do with that B5500 machine?
In fact, the Burroughs Large Systems engineers were transistor computer pioneers. And the B5000 series was a pioneering system.
The B5500 (or "B 5500" - the official manual puts a space between the B and the number) was a solid-state computer. It was part of the "second-generation" of computers (wikipedia.org). These computers had discrete transistors on circuit boards. By contrast, the first generation before them used vacuum tubes (like the aforementioned IBM 704) and the third generation after them used integrated circuits.
Now we head from Massachusetts to California, where Moore started graduate school at Stanford University after receiving his BA in Physics. He worked with Stanford's Burroughs B5500 .
Truly, now we have the beginnings of a fully-fledged programming language. It's not named Forth yet, but we're getting closer.
As Moore demonstrated with CURVE, a powerful, extensible interpreter is a huge time-saver (certainly when compared to re-compiling the program!) and allows the user of the program to add to the program's functionality on the fly. It's difficult to overstate how powerful this can be.
Aside: At this point, I also think it's interesting to compare Moore's budding interpreter language with another interpreter created specifically to be embedded in larger programs for controlling them: The Tcl programming language (wikipedia.org). 27 years after Moore started his work, John Ousterhout created Tcl out of frustration with ad-hoc, half-baked solutions in 1988 at Berkeley. The name comes from "Tool Command Language". But the comparison goes deeper than just the shared motivation. Tcl and Forth have similar levels of syntactical purity and flexibility. Everything in Tcl is a string! Both languages give the user the power to define fundamental parts of the system, such as new control structures, in the language itself. If this sounds interesting, you owe it to yourself to play with Tcl for a while. It is extremely clever and extremely capable. The main implementation has been well cared-for and can be found on most Unix-like systems, often installed by default.
This made the interpreter much more flexible and capable.
The CURVE program was even more "programmable" than his Fortran program at SAO. He took those ideas and expanded them to include the idea of a parameter stack and the ability to define new procedures.
Moore worked on the Stanford Linear Accelerator as a programmer. His focus was on steering the beam of the electron accelerator.
(As we'll see, symbols like "+" and "-" are words in Forth.)
It contained a much more sophisticated interpreter this time with a data stack and control flow operators.
As each new ability emerged, Forth became increasingly interactive.
But as time went on, computer interactivity increased. They became less like typewriters and more like the machines we use today.
The computers of this era and earlier were paper manipulators. They were kind of like really complicated typewriters. They displayed their output on paper, they were programmed with paper, and they kept long-term storage on paper!
Donald Murray didn't invent the concept of perforated paper tape for data storage, but his system used it for the encoding of transmitted messages from the keyboard. It doesn't seem like a stretch to trace the origins of this storage method to Murray's system.
The Teletype Model 33 (wikipedia.org) I drew above was one of the most popular teletypes used with computers. It was created by The Teletype Corporation in 1963, which means it shares a birth year with the ASCII standard! It remained popular until the mid-1970s when video terminals finally came down in price enough to push printer teletypes aside. In fact, Teletype Co. made the Model 33 until 1981, which is much later than I would have guessed!
Teletype machines started as point-to-point text communication tools (like the telegraph), but they were later used over switched networks like the world-wide Telex system which used pulse dialing to automatically route a connection through the network.
The existing Baudot code (from which we also get the term "baud") was modified by Murray into something that very much resembles what we still use today. Murray also introduced the concept of control characters, which still clearly retain their typewriter origins in the names: CR (carriage return) and LF (line feed).
In the late 1800s, the concept of a typewriter which operated over telegraph lines had been explored and existed in a variety of forms. But the transmission code, paper tape, and typewriter system devised by Donald Murray (oztypewriter.blogspot.com) is the one that won out. And it was arguably Murray's choice of QWERTY keyboard that cemented it as the standard around the world.
The Latin "tele-" prefix means "far" or "at a distance". These machines trace a direct lineage from telegraphs and Morse code.
First, let's talk about what "TTY" means in 1965. Teleprinters (wikipedia.org) or "teletypewriters" or just "teletype" were all printer devices. They printed to continuous sheets of paper fan-folded to fit into boxes.
"With the TTY came paper-tape and some of the most un-friendly software imaginable - hours of editing and punching and loading and assembling and printing and loading and testing and repeating."
Aside: If you want to see an awesome demonstration of interactive computer usage on paper, check out this demonstration by Bob Spence: APL demonstration 1975 (youtube.com). Bob Spence (wikipedia.org) is best known for his own contributions, including a number of early clever computer interaction ideas that are worth re-examining today. Bob's demo is extremely pleasant to watch and brilliantly presented in split screen. Notice how paper output lets you mark up stuff with a pen - pretty nice feature! And APL (wikipedia.org) is a whole other rabbit hole which has interesting intersections with the point-free and higher-order function programming we've encountered earlier.
If not presaging it, Moore was certainly on the bleeding edge of interactive computer usage!
This would have been right around the time that the original LISP REPL (Read-eval-print loop) (wikipedia.org) was created in 1964 on a PDP-1.
( Fun fact: A "second generation" time-sharing operating system called Multics (multicians.org) was the spiritual ancestor of Unix and the source of its name: Brian Kernighan's joke was that Unix was "one of whatever Multics was many of".)
Specifically, the invention of timesharing (stanford.edu) was a huge shift away from the "batch processing" style of computing that had come before (like with input via punchcard).
In addition to the reduction in size, the other emerging change was direct interactive use of a computer via teletype.
In the mid-1960s, "mini-computers" came out. They were still huge by today's standards, but no longer required a large room of their own.
Moore's complete system is now kind of like an integrated development environment and kind of like an operating system.
Now you can edit the program within the program.
In addition to its funny new name, Forth had also gained new abilities:
It turns out the IBM 1130 was hugely influential to a bunch of early big-name programmers in addition to Moore. Something was in the air.
As you may have gathered by now, Chuck Moore is a pretty extraordinary computer programmer.
As mentioned, at this time, we're still interacting with the computer (mostly) via paper, but these minis brought the idea of interactive computing to "the masses" because they were so much smaller, cheaper, and more reliable than the sorts of computers that had come before.
But, wow, imagine having disk drive cartridges with 512 KB of storage at your disposal. What would you do with all that space?
As noted, the unit Chuck Moore worked on had a disk drive, which would have bumped up the price an additional $9,000. That would be the equivalent of buying an above-average house and adding a couple brand-new 1965 cars in the driveway.
And it was affordable! The base model was as low as $32,000. Compare that to $20,000, the median price for a house in the U.S. in 1965. Just think of that: If you could afford a house, you were well on your way to being able to afford a computer!
The IBM 1130 (wikipedia.org) is one of those new-fangled "minicomputers" we've talked about. Gosh, it was so small, the CPU weighed less than a car!
Yup, this really is the origin of the name, "Forth". Funny how temporary things tend to stick and last forever, isn't it?
With disks, now we can have file names!
(We'll get to the specifics of the syntax soon. That's another vital part of understanding Forth.)
A return stack makes this possible. Without a return stack, we have no way of telling the computer how to "get back" to the place in QUAD where we left off after DOUBLE is done.
A second word called QUAD uses the previous definition by calling DOUBLE twice, quadrupling the number in a rather amusing way.
In the example above, we've defined a word called DOUBLE which duplicates the number on the top of the stack and adds the two numbers together.
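In later Forth syntax (a sketch; the original 1968 code isn't shown here), those two definitions and a quick test would look something like this:

    : DOUBLE  DUP + ;           \ copy the number on top of the stack and add the copies
    : QUAD    DOUBLE DOUBLE ;   \ doubling twice quadruples it
    5 QUAD .                    \ prints 20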
It's not just the name that makes this the first real Forth: A dictionary of named words which can be called interactively or recursively in the definitions of other words is one of the defining features of Forth. The ability to use words as building blocks is the Forth language's primary abstraction.
(This book will be mentioned again later.)
He also wrote a book (unpublished) at this time called Programming a Problem-Oriented Language. It's written in typical Moore fashion, without superfluous words or exposition. Feel free to contrast this with the article you're reading now.
When this project was abandoned by the employer, Moore was upset by the whole situation, particularly the way business software was increasing in complexity. This won't be the last time we see this theme crop up.
You don't have to read between the lines to see Moore's obvious distaste of COBOL (wikipedia.org), the COmmon Business-Oriented Language. What's impressive is that he managed to still use Forth while also using the required COBOL modules.
Anyway, the Univac was even more powerful and modern than Moore's previous computer and he took advantage of it.
If you're not familiar with the technique, you should know that wire-wrapped (wikipedia.org) connections are extremely high quality. Wire is wrapped with great force around a post, making a gas-tight connection that will not corrode (corrosion can occur outside the connection, of course). A little bit of the insulation gets wrapped in the last turns, which provides flexibility and strain relief. There are NASA guidelines for making a perfect wire-wrap connection.
As was also the trend at the time, the CPU was constructed of discrete cards connected together by a wire-wrapped backplane.
Anyway, the UNIVAC 1108 is an even more modern computer than the IBM 1130. Now we're moving into using integrated circuits for everything, including the register storage. (Speaking of registers, the 1108 had 128 of them and must have been interesting to program!)
That's a teletypewriter built into the desk of the console. I presume the tractor-feed paper would have spooled to and from containers behind the sleek facade.
You have to wonder: Did the sci-fi art of the time drive the design of these computers or did the computers and industrial design of the time inform the art? Or, more likely, did they both feed off of each other in the classic cycle of, "life imitates art imitates life?"
When these computers cost more than a house, it makes perfect sense that they were constructed into beautiful custom furniture that made them look like space ships.
First of all, the UNIVAC 1108 (wikipedia.org) is a great example of the awesome "retro-futuristic" design of these old machines. Just look at the sweeping angles in my drawing of the console. That's a cool computer console!
A new port of Forth was written in assembler and could call COBOL modules, because that's what the corporate suits wanted in 1970.
But on the Forth language front, there was another development...
As you can see, the work itself was extremely interesting and cutting-edge. But how Moore went about it was also very interesting, which a series of computer drawings will demonstrate in a moment.
"It did advance the state-of-the-art in on-line data reduction. Astronomers used it to discover and map inter-stellar molecules just as that became hot research."
"There were two modes of observing, continuum and spectral-line. Spectral-line was the most fun, for I could display spectra as they were collected and fit line-shapes with least-squares."
NRAO had a policy of using Fortran on its minicomputers, but based on the success of his previous work, Moore was begrudgingly given permission to use Forth instead. I couldn't possibly do justice to summarizing it, so here are Chuck's own words describing the software he wrote for the NRAO (also from Forth - The Early Years ):
(Note that Moore stayed at the NRAO headquarters in Virginia and was not on-site at Kitt Peak.)
The 36ft scope was used for millimeter-wavelength molecular astronomy. This is the range above "microwaves" and these telescopes pretty much have to be constructed at dry, high altitude sites because water vapor in the air can interfere with the radio waves.
But the scope for which Moore wrote software was a single 36ft (11 meter) dish at Kitt Peak in Arizona called The 36-Foot Telescope . It was constructed in 1967 and continued working until it was replaced with a slightly larger and more accurate dish in 2013.
By using radio interferometry, arrays of telescopes can be treated as essentially one huge telescope with the diameter of the array (missing the sensitivity a dish of that size would have).
My drawing above is of the Very Large Array (wikipedia.org) in New Mexico. NRAO is also a partner in a huge international array in Chile.
The 300ft dish had been the world's largest radio telescope when it went active in 1962 at the NRAO site in West Virginia.
Here is the official website of the National Radio Astronomy Observatory (nrao.edu). But for a better summary, the Wikipedia entry (wikipedia.org) is the way to go. Be sure to scroll down to the incredible image and description from 1988 of the collapsed 300ft radio telescope:
The visible part of the spectrum is very small by comparison. It starts at 420 THz (terahertz) and ends at 720 THz. The familiar rainbow of colors captured in the mnemonics "Roy G. Biv" or "Richard of York Gave Battle in Vain" (ROYGBIV) lists colors in order of lowest frequency (Red) to highest (Violet).
(Speaking of gigahertz, apparently Intel Core i9 processors can run at clock speeds up to 6 GHz, but most CPU designs top out at around 4 GHz. This may be important for Forth for reasons I explain later.)
Radio telescopes can work with everything from 1 kHz, which is just below the uses of "radio" as we think of it for navigation, communication, and entertainment, to 30 GHz, which is still well under the visible portion of the electromagnetic spectrum. Consumer microwave ovens operate at about 2.45 GHz.
Radio telescopes are like visual telescopes, but they collect lower frequency waves. Thanks to the magic of computers, we can process these signals to see what the radio telescopes see.
Aside: I believe Stallman was right. There's absolutely nothing wrong with writing programs for money or selling software. But using the law to prevent people from truly owning that software by limiting how or where to run it, or even preventing people from writing their own similar software, if they are capable, is an abominable practice and should be countered at every step.
It wasn't until 1980 that Stallman finally got fed up enough with the state of proprietary and legally encumbered software to start the "free-as-in-freedom" software revolution. Companies were increasingly using copyright to prevent modification, improvement, or duplication by the end user. Stallman, being a pretty incredible programmer, wrote free clones of such programs. He announced the GNU project (wikipedia.org) in 1983.
To put things in context, in the summer of 1970, Richard Stallman (wikipedia.org) was just out of high school and was writing his first programs in Fortran (which he hated) and then APL.
We take it for granted now that "free" or "open" software unencumbered by patents and restrictive corporate licenses is a good thing. But this was absolutely not a mainstream position in the early 1970s.
At this time, there was talk of patenting Forth.
As mentioned, Forth was ported to a bunch of different computers at NRAO.
It includes a high-level description of the system with examples of interactive Forth usage and a neat diagram on the first page, which you can see in the screenshot.
There's a great overview paper of the whole NRAO system by Moore and Rather in a 1973 Proceedings of the IEEE : The FORTH Program for Spectral Line Observing (PDF) (iae.nl).
Later, Elizabeth "Bess" Rather (wikipedia.org) became the co-founder of FORTH, Inc with Chuck and remained one of the leading experts and promoters of the Forth language until her retirement in 2006.
Rather went on to write the first Forth manual in 1972 and write papers about it for the NRAO and other astronomical organizations.
By the way, after her initial reaction of shock and horror, Elizabeth Rather embraced Forth. From The Evolution of Forth (forth.com):
While Moore was at NRAO, he also wrote software to point the telescope. Elizabeth Rather (Moore credits her as Bess Rather in his paper) was hired for support and they worked together on at least one port. The Forth system migrated across multiple machines at NRAO which, as we'll see, highlights one of the technological strengths of the standard Forth implementation.
I think that when you get to the heart of what Forth is all about, Moore's displeasure with the ANSI standardization suddenly makes tons of sense. In short, the whole point of Forth is to create your own toolkit. Having an all-inclusive language standard is great for making sure Forths are interchangeable. Unfortunately, it's also antithetical to adapting the language to your specific hardware and software needs.
Quote from a section in this cool collection of Forth quotes (ultratechnology.com) by Jeff Fox.
"All of my fears of the standard and none of the advantages of the standard have come to pass. Any spirit of innovation has been thoroughly quelched. Underground Forths are still needed. I said I thought the standard should be a publication standard but they wanted an execution standard."
Oh, and it wasn't just the instruction set that was compatible. The 360 computers also had standardized peripheral interfaces, which were compatible between machines. There was a huge market for peripheral devices. IBM themselves made 54 different devices such as memory, printers, card readers, etc. The 360 also spawned a whole third-party peripheral industry, much like the IBM PC-compatible era that started in 1981 and continues to the desktop computer I'm typing on right now in 2023.
System/360 computers were a "big bet" (5 billion dollars according to IBM themselves: System 360: From Computers to Computer Systems (ibm.com)) that nearly destroyed the company. The bet clearly paid off because they made these machines from 1964 to 1978.
Until microcode came along, if you bought a "cheap" computer to get started and then upgraded to a more powerful computer, you would have to re-write your programs in a new instruction set. (If you happen to have written your programs in a high-level language like Fortran, you would still have to re-compile your programs from punchcards, and you would need the Fortran compilers on both computers to be perfectly compatible!) It's easy to see why being able to upgrade without changing your software would have been appealing.
The cheaper 360 computers used microcode while the more expensive and powerful machines had hard-wired logic. NASA even had some one-off models of IBM 360 made just for them.
The System/360 (or S/360) computers were extremely successful, largely because of availability, longevity, and compatibility. IBM claims to be the first company to use microcode (wikipedia.org) to provide a compatible instruction set across all S/360 computers despite the hardware differences between models.
Moore mentions first having ported his Forth system to the IBM 360/50 (wikipedia.org).
As usual, Moore was right on the very crest of computing with his ultra-flexible Forth system.
This real-time control and analysis was basically a luxury available on no other system at the time. Even Honeywell, the creator of these computers, had only been able to achieve the most primitive concurrency for them and it was nothing like this.
As is typical for a Chuck Moore endeavor, this telescope application pushed other new boundaries: The system actually ran across two computers (we're about to see the second one) and gave real-time access to multiple astronomers. Because it spread the load the way it did, there were no issues with concurrency, which is something we programmers struggle with to this day.
The implementation of Forth on the H316 is considered to be the first complete, stand-alone implementation because it was actually programmed on the computer itself and was used to create other Forths. It is at this point that Moore has achieved a fully ascendant system.
You can see a tiny scan of the original ad with a woman admiring her new Honeywell Kitchen Computer that barely fits in her kitchen here (wikipedia.org).
The ad for the Honeywell Kitchen Computer was in full "Mad Men" mode and was extremely patronizing, as was unfortunately typical for the time. But if you can look past that, the whole thing is quite funny:
Beyond just its appearance, this particular console has a really wild history. The extravagant gift company, Neiman Marcus, actually offered the Honeywell H316 with this pedestal as a "kitchen computer". It cost $10,000 and would have come with a two-week course to learn how to input recipes and balance a checkbook using toggle switches and lights to indicate binary data! (As far as anyone knows, none of these were actually sold.)
Just look at the space-age lines on that thing! It looks straight out of a Star Trek set. Sadly, there's basically no chance the one Moore actually worked on had this console. Less than 20 of them were sold. But thanks to my drawing, we can pretend.
You can see the Honeywell 316 (wikipedia.org) and the brochure (wikimedia.org) image from which I made my drawing.
I drew Chuck behind the system in this one because I couldn't bring myself to obscure an inch of that glorious pedestal console.
As you can see in the drawing, Chuck Moore began to grow in power as his system evolved and this manifested in additional arms ! Or maybe I started to get a little loopy while drawing old computers for these slides in the final evenings before I was due to give my talk? I'll let you decide what is real.
The DDP-116 was a 16-bit computer (the first available for purchase), but still part of that "second generation" of computers we've mentioned before, with individual transistors and components wire-wrapped together on huge circuit boards. (Check out the pictures on the DDP-116 link above for all sorts of excellent views of the insides and outsides of an example machine and its peripheral devices!) It happens to have also been a pretty rare computer. It didn't sell in vast quantities like the IBM systems.
(The DDP-116 was originally manufactured by Computer Control Company in 1965, but CCC was sold to Honeywell in 1966 and became its Computer Controls division.)
As mentioned above, the Forth system was also ported to the DDP-116 (t-larchive.org) and used with its "parent" system on the H316 featured above.
You'll also note that Chuck Moore has gained his fourth and final arm in my drawing above ("fourth," ha ha). This may or may not reflect actual events. Also, I'm not sure if Moore would have been using a video terminal at that time. It's possible. DEC's first video terminal was the VT05 (columbia.edu), which came out in 1970.
It was also the birthplace of the C (wikipedia.org) programming language. Dennis Ritchie ported Ken Thompson's B language to the PDP-11 to take advantage of its abilities. Unix was then re-written in C starting with Version 4. So the Unix we know today and a large portion of the command line utilities that are standard with a Unix-like system were programmed on the PDP-11. (And you can thank Richard Stallman's GNU project for freeing those for the masses. GNU stands for "GNU's Not Unix!")
It would be difficult to overstate the impact of this machine. Probably the most famous piece of software released on the PDP-11 was the first version of Unix (wikipedia.org) that actually bore the name "Unix".
It's hard to see in my drawing, but the PDP-11 front panel is one of the most iconic computer interfaces ever made. Hobbyists make working models, including ridiculously cute and awesome miniature versions. Here are two model versions - click on them to go to the original wikipedia.org files, where you can admire their full beauty:
The original machines were made starting in 1970 with wire-wrapped backplanes and discrete logic gates. Later models introduced "large-scale integration," which is a term we'll see later, so hold that question! These later versions of the PDP-11 were still being made twenty years later in 1990! There are apparently still PDP-11s performing crucial tasks today, with nuclear power plants being one of the most prominent examples.
All told, these conveniences made the PDP-11 fun to program! Assembly language programmers rejoiced. The ideas in the PDP-11 spread rapidly and are to be found in the most popular architectures in use today. Compared to what came before it, PDP-11 assembly language will look surprisingly familiar to modern assembly programmers.
It was a 16-bit machine and had an orthogonal instruction set (meaning the same instruction could be used in multiple ways depending on the operand. This makes the mnemonics of the instruction set smaller and more logical and much easier to memorize). This was even more powerful because I/O was memory-mapped, so the same instructions used to move values around in memory and registers could also be used to transfer data to and from devices.
The PDP-11 (wikipedia.org) was by some measures the most popular minicomputer ever.
All of this porting of Forth to new machines is possible because of indirect threaded code .
But before we address that, I'll try to briefly explain how threaded code is stored and executed.
"Hey, wait!" I hear you saying. "If Chuck hates complexity so much, why did he use such a complex method for Forth?"
And indirect threaded code is even more complicated (and harder to explain).
Yeah, there's no way around it, threaded code is complicated.
That's true. The big difference is that threaded code (wikipedia.org) in this sense doesn't actually contain the instructions to call the subroutines. It stores just the addresses. Therefore another routine is responsible for advancing a pointer over the address list and executing the subroutines.
"Threaded code" in this usage is not related to concurrency, i.e. "multi-threaded programming".
Anyway, this is direct and it's not threaded. Just jump to an address.
Alternatively, many processors have a more advanced call instruction. A call is more complicated because it has to do additional work behind the scenes. It must store a return address on "the stack" before jumping to the specified address. Then a ret (return) instruction at the end of the called routine can use the stored address to resume the execution just after the "call site" where the call was first made. Why are return addresses stored on a stack? That's because you can nest calls. Pushing addresses as you jump and popping them in reverse order as you return keeps things nice and neat. This "the stack" is not what Forth refers to as "the stack". Forth's main stack is better known as "the parameter stack". Many Forth implementations also have a return stack!
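Sketched in NASM-style syntax (the labels are made up for illustration):

    call greet          ; push the return address, then jump to greet
    ; ...execution resumes here after greet returns

greet:
    call get_name       ; calls nest: another return address is pushed
    ret                 ; ret pops the most recent return address and jumps back

get_name:
    ret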
This is the simplest type of "call" to store in a program. We simply have the jmp (jump) instruction followed by the address to jump to. Here I show both a hard-coded address ( 0x0804000 ) and a register ( eax ). Both of these are "direct" for our purposes.
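In NASM-style syntax, that's just one of these (address and register chosen arbitrarily):

    jmp 0x0804000       ; direct: jump to a hard-coded address
    jmp eax             ; also "direct" for our purposes: jump to the address held in eax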
So now we have the "indirect" part of "indirect threaded code." But what's the "threaded" part?
So jmp [eax] means "jump to the address stored at the address stored in register eax."
For this example to make sense, you need to know that the square brackets around the register ( [eax] ) are a common assembly language convention meaning "the value at the memory address that is stored in register eax".
To complete our definition of "indirect threaded" code, we just need to put both concepts together:
Another way to look at the list of addresses above is that, conceptually, threaded code is basically a list of subroutines.
There are two consequences of storing code like this:
Instead of containing the actual instructions to jump or call subroutines:
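the body of a threaded routine is conceptually just a list like this (a sketch; the names are arbitrary Forth-flavored subroutines):

    address of DUP
    address of SWAP
    address of +
    address of DROP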
None of the rest of the material requires understanding any of the above, so please don't feel you need to fully grok (wikipedia.org) it before continuing. Indirect threading is an important part of Forth's history, but there are plenty of Forths that do not use it.
I personally would not have understood that explanation at all until much later in my journey (I know this because similar - probably better - explanations flew right over my head). No doubt you're faster than me at apprehending this stuff and are already halfway through implementing your own Forth based on these descriptions.
In Forth , indirect threaded code is a list of addresses pointing to the "inner interpreter" portions of words, which execute the rest of the word. What types of inner interpreters could we have, anyway? Well, for example, we might have one kind of word that stores a string in memory and another that executes machine code. But the only limit is your imagination.
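To make that concrete, here's a sketch of what one compiled colon word might look like in memory. (The word DOUBLE is made up for this example; DOCOL is the name JonesForth-style Forths give the colon-word inner interpreter.)

DOUBLE:
    address of DOCOL    ( the inner interpreter that runs colon words )
    address of DUP
    address of +
    address of EXIT     ( a word that returns to the caller )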
But what calls all of these inner interpreters? An outer interpreter, of course! The outer interpreter is the part we actually interact with when we sit down to type at a Forth terminal.
Instead of pointing directly at subroutines, these addresses point at interpreters. Talk about ultimate flexibility - every subroutine in an indirect threaded program can have its own custom interpreter for the rest of its instructions...each of which can also be threaded...or indirectly threaded!
Well, this allows us to store a separate "code interpreter" (or "inner interpreter") for different kinds of subroutines!
This is where it gets pretty crazy. So now we've got a second level of indirection. Why on Earth would we do this?
The important thing is that we've now fully traced the origins of Forth from a simple command interpreter to the full-blown interactive language, editor, operating system, and method of code storage and execution it became.
We'll revisit this topic from another angle soon. But if you're interested in these mechanics (and want to see the origin of the boxes and arrows drawings at the top of this section), check out this multi-part article series for The Computer Journal, MOVING FORTH Part 1: Design Decisions in the Forth Kernel (bradrodriguez.com), by Brad Rodriguez.
So the memory and performance improvements of this style of subroutine call were potentially very great indeed. This is one of the reasons for Forth's legendary reputation for high performance .
In addition to its compact storage, threaded code would have been even more efficient on the contemporary machines during Forth's gestation because calling subroutines often wasn't as simple as the call instruction found on "modern" architectures.
It is very dense, compact on disk and in memory.
Threaded code was much more common in the days of yore.
But even knowing all of this, I was still a long way off from a true understanding of how this all comes together in an actual working system. I didn't really understand how it worked. And I didn't understand what Forth was actually like to use. In other words, I still didn't understand Forth as a programming language.
Reading about Forth's history is a wonderful way to understand what makes Forth special and what it's about .
None of this was planned. Chuck didn't sit down at a terminal in 1958 and conjure up Forth. Instead, he grew a system to serve his needs and to make use of new hardware as it was made available.
Now that we have everything in historical context, I think it's much clearer why Forth exists and why it takes the peculiar form that it does.
Forth is extremely compact because machines at the time had limited memory and this gave Forth an edge on other interpreters (and even compiled languages!) on mainframes and mini-computers.
Forth is self-hosting because you can bootstrap a Forth implementation from a handful of words implemented in assembly and then write the rest in Forth;
Forth is interpreted because that is interactive and allows the programmer to make fast changes on the fly or simply "play" with the system. This is part of Forth's adaptability and flexibility;
Forth is concatenative because building a language that can operate as a string of words is incredibly flexible and can adapt to just about any programming style without any help from the language itself. (And it turns out this is especially true when you throw in higher-order functions);
Forth is stack oriented because that's a compact and convenient way to store values without needing to add variables or name things;
Forth is postfix because that's a natural order for the computer and lends itself to an incredibly minimalistic interpreter implementation: get the values, operate on them;
At last! Now we can put it all together:
This gives us the why .
Having no other clear course of study, I decided to heed the wisdom of the crowd.
Well, what else could I do?
As we've established, Chuck Moore believes a Forth system is best when it is custom-tailored to the system and task at hand. So it should come as little surprise that writing your own Forth or Forth-like is entirely "par for the course" in any would-be-Forther's quest to discover the True Meaning of the language and enter the mystical realm where All is Revealed.
I've mentioned it before, but I'll point it out again. Notice the phrasing "implement a Forth."
"To understand Forth, you have to implement a Forth."
Somewhere along the way, I came across these quotes...
I did a tiny bit almost every night. A lot of it was debugging in GDB.
Then I spent roughly a year porting JonesForth into a complete working copy in NASM assembler. (Yes, that's a "port" from one flavor of i386 asm to another.)
I prepared myself for dealing with the JonesForth source (i386 assembly language in the GNU GAS assembler) by learning some assembly and Linux ABI basics. JonesForth is 32-bit only and uses the Linux system call ("syscall") ABI directly.
I wrote Assembly Nights when I realized how much I was enjoying myself:
To really get to know it, I took Forth to bed with me.
Especially if the x86 assembly language tricks are new to you like they were for me.
And I learned that it takes time to absorb such a twisty-turny method of code execution.
In the process of writing the port, I learned how a traditional indirect threaded Forth works.
So, of course, I thought maybe I could do better...
Richard W.M. Jones does an excellent job of walking you through the workings of the interpreter and explaining the i386 instruction set features he uses.
One of the first things you encounter when you open up the jonesforth.S (a single file which contains the assembly language portion of JonesForth) are many ASCII art diagrams.
But when we say that Forth is an interpreted language, this is not what we're talking about. There's also the "outer interpreter" that the programmer interacts with.
Perhaps you recall from the section about indirect threaded code above that the second level of indirection allows different "interpreter" routines to execute different types of threaded subroutines? Well, that's all those two ASCII diagrams are trying to show.
Both ASCII art diagrams above are just part of the complete indirect threaded execution system. They're just showing how the "inner interpreter" works to execute Forth words.
With the benefit of the distance of time, it is clear to me that these things only make sense once you already understand them to some degree. But the act of making them is extremely useful for solidifying your understanding.
After I was done with my port, I tried to make an ASCII art diagram of my own to capture my new understanding. In fact, this is one of several.
Ah, for that we have this:
But even knowing this only helps to explain how code starts executing. How does this type of Forth know what to run after a word is complete?
If you take anything from this image, it's that INTERPRET looks up words (functions) by name and calls them by executing the interpreter routine whose address is stored in the word (again, this is the indirect threading part). In turn, there may be any number of interpreters, but the three main types used in JonesForth are:
In the vector image I made above for nasmjf, I attempted to map out the whole thing in my own words.
The indirect threaded code is just the tip of the iceberg!
Okay, so now the eax register points to the next threaded subroutine address in memory. The jmp starts executing whatever that points to, which will be the "inner interpreter" for that subroutine.
(Rant: And a "double" is 32 bits on Intel chips for the really annoying reason that they kept the definition of "word" at 16 bits even as the platform moved to 32 and then 64-bit architectures. So "word" on Intel architectures is a completely meaningless thing that you just have to memorize as "16 bits" even though "word" is supposed to be the native data size of the architecture. And what's worse is that the tools for working with programs on Intel chips, like GDB, then refer to everything by the corresponding C names, which naturally assumed that the architecture names would be based on reality. But they aren't. So terms like "double" and "long" are basically just absolutely worthless legacy garbage to memorize, useful only to C and Intel architecture veterans.)
This implementation uses the i386 lodsd instruction to take care of two operations in one: move a "double word" from memory into a register, and then update another register so that it points to the next "double" spot in memory.
Remember the list of addresses in the explanation of "indirect threaded" code? This is how we execute them sequentially.
Every code word has this macro at the end. (Some Forths actually call a subroutine for this. JonesForth uses this two-line macro because the action is so efficient in i386 machine code.)
Notice the term "code word". That's the Forth term for words written in pure assembly language.
To get from one code word to another requires a bit of assembly pasted at the end of each one. This is the NEXT macro. Here it is from nasmjf :
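Paraphrasing slightly, it's essentially these two instructions wrapped in a NASM macro:

%macro NEXT 0
    lodsd               ; load the address esi points at into eax, then advance esi
    jmp [eax]           ; jump to whatever that address points at (an "inner interpreter")
%endmacro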
I do have another attempt to explain how this all nests in a sort of indented pseudocode:
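Roughly, and very much simplified:

QUIT                     the outer interpreter loop (itself built from code words)
  INTERPRET              read the next word of input, look it up, run it
    colon word           its inner interpreter saves esi on the return stack,
                         then points esi at the word's body
      code word          runs its machine code, ends in NEXT
      code word          runs its machine code, ends in NEXT
      EXIT               pops the return stack back into esi
    NEXT                 resumes the caller at the next address in its body
  ...and around we go for the next word of input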
If you weren't lost before, surely this will do the trick?
The EXIT macro handles the return stack. Then there will be a NEXT after that to conclude whatever code word primitive we were in (we're always in at least one because the "outer-most" interpreter is a code word primitive!), so the process we described above will automatically start where we left off at the "call site" of the word we just finished executing.
These "colon" words (so-named because they are assembled via the "COLON" compiler, which we'll talk about in a moment), all end in the so-called EXIT macro.
Remember, there's two fundamental types of words in a traditional Forth like JonesForth: "Code" words and "colon" words. Code words are primitives written in machine code. Colon words are the "regular" words actually written in the Forth language.
To get from one colon word to another uses a bit of assembly pasted at the end of each in a chunk called the EXIT macro. Here it is from nasmjf :
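In essence (paraphrased; POPRSP here stands for the little helper macro that pops the top of the return stack into a register):

    POPRSP esi          ; restore esi from the return stack (back to the calling word's body)
    NEXT                ; ...and carry on executing the caller's thread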
(So, yeah, we have EXIT and QUIT , neither of which leave Forth... Hey, it was the 1960s. Things were different then.)
You'll notice we didn't even talk about QUIT . Other than the name, that one's not nearly as bad - it's really just the end of the outer interpreter loop.
I'm sure every Forth implementer has their own mental model.
This nested view of the process is as close as I've ever been to explaining (to myself) what the entire execution flow looks like at a high level.
My comment in nasmjf attempting to explain the execution of indirect threaded code as a nested sequence of NEXT and EXIT and QUIT :
Absolutely nothing else drives the flow of an indirect threaded Forth application: It's addresses stored in registers, a return stack, and a handful of assembly instructions at the end of each machine code word jumping to the next instruction.
You, citizen of the distant future, will not have recognized this parody, but at least now you can look it up.
Historical note: The above "Crazy Chuck" drawing is a parody of a popular meme with actor Charlie Day's character in the episode "Sweet Dee Has a Heart Attack" from the show It's Always Sunny in Philadelphia :
Don't you see how simple it is?
Let's do the same. I've thrown the terms "code word" and "colon word" around a lot. I've explained them a bit, but we've never given a proper introduction.
When growing a system like this, most of us would have thought bigger; Moore thought smaller.
Ultimate flexibility and simplicity at the lowest level of the implementation comes at the cost of easy understanding at higher levels.
I think that's the genius of Forth: That all of these little pieces can work together to make a running system and yet still remain independent . You can learn each of these in isolation. You can replace them in isolation.
And each was added over a period of time as the need arose.
Each of these tiny parts is extremely simple on its own.
As we've seen, Charles H. Moore did not create Forth all at once in a single lightning bolt of inspiration. It began as a simple command interpreter and executor and grew from there. It has always consisted of tiny little parts, working together.
This is where the historical context is, once again, very revealing:
"Hey, wait! But if Chuck hates complexity so much, why did he use such a complex method for Forth?"
I'll repeat your question from before so you don't have to:
Forth is complex when taken as a whole. But it is made of tiny pieces, each of which is very simple. The concept was created over a period of years on very constrained systems. Each part created only as needed.
Let's see some real code words so we can de-mystify them once and for all. These are extremely simple and extremely concrete examples of actual NASM assembly language source from my nasmjf port of JonesForth:
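Paraphrased lightly below - the real source wraps each body in the DEFCODE macro (more on that in a moment) and I've trimmed the housekeeping:

; SWAP - exchange the top two values on the parameter stack
    pop eax             ; eax = old top
    pop ebx             ; ebx = old second
    push eax            ; old top goes back on first...
    push ebx            ; ...so the old second ends up on top
    NEXT

; DUP - duplicate the value on top of the parameter stack
    mov eax, [esp]      ; read the top of the stack without popping it
    push eax            ; push a copy
    NEXT

; DROP - throw away the value on top of the parameter stack
    pop eax             ; pop it and simply ignore it
    NEXT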
Again, Code words are primitives written in machine language supplied by the Forth implementation.
Notice the NEXT macro we talked about previously? Remember, that's just another two lines of assembly pasted at the end of this routine.
The DEFCODE macro is housekeeping - it creates the entry's header in the Forth word dictionary.
(JonesForth uses the i386 call/return stack as a Forth parameter stack so we can use the native "pop" and "push" to make these operations easy. In exchange, we lose the ability to use "call" and "ret" for subroutines.)
Is that really SWAP? Yes, it really is! We're just telling the CPU to pop the two most recent values from the stack and then push them back in the opposite order.
Which means that esp points to the current top of the parameter stack. So pushing that value on the stack duplicates the top value. (This could also have been written more clearly with three instructions: one "pop" and two "push"es.)
To understand why this duplicates the top item on the stack, you need to know how the esp register is used. Here's the relevant comment from the JonesForth source:
We're down to just two instructions now! We move the value pointed at by the esp register into eax and then push it onto the stack.
Now let's see these three words in action in a real Forth program that moves some real numbers around in memory...
Now we have an entire Forth word defined as a single instruction! DROP just "removes" the top value from the stack. In this case, we pop it into the eax register and then don't do anything with it, essentially throwing it away. (Alternatively, we could have decremented the esp register, but in this case, the "pop" is both shorter and clearer.)
I think there's something pretty magical about realizing that typing these instructions is running specific machine code sequences exactly as they were entered. In this implementation, there's no optimizing compiler or virtual machine acting as middle-man. You really are communicating directly with the processor.
Again, these instructions could exist in the definition of a word or you could type them interactively in the running Forth interpreter. The result is the same.
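For example, a little made-up session (input on the left, resulting parameter stack on the right):

8 7            8 7
SWAP           7 8
DROP           7
DUP            7 7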
Apart from pushing the two numbers on the stack ( 8 7 ) , we've now seen the assembly language code for the entire program shown above. That makes this pretty "bare metal" stuff, right?
The above example shows what it might be like to use these three primitives right at the keyboard. The column on the right shows the state of the parameter stack after each line of input.
You can also call them interactively in the interpreter.
The code word primitives we've just defined are used by the rest of the Forth implementation to define colon words in the language itself. If you write Forth applications, your own colon words will probably use these heavily.
How about something a little more realistic?
Which reminds me, did you know there is such a thing as a one-instruction set computer (wikipedia.org)? And of course you can run Forth on them: 16-bit SUBLEQ eForth (github.com).
Check out this amazing article by Frank Sergeant: A 3-INSTRUCTION FORTH FOR EMBEDDED SYSTEMS WORK (utoh.org).
There are some theoretical minimums. But as you get down to an absurdly small number of instructions, the Forth code written with those primitives (to implement the rest of the language) becomes absurdly convoluted, testing the limits of both programmer ergonomics and computational efficiency.
If you weren't already wondering, perhaps you are now: How many Forth words need to be defined in machine code to have a "bootstrappable" Forth system?
Now let's see what colon words are all about...
Okay, so we've talked about code words that are just chunks of machine code that can be called upon at any time.
Note: The statement about Maxwell's equations surely refers to Alan Kay's famous quote about LISP from A Conversation with Alan Kay (acm.org):
The author's posting of the project to the Forth sub-reddit (reddit.com) has additional insight:
See? There's Usenet again. It wasn't just me reading all that lore.
(You can also have strings, which look like data...but those are just input that happens to follow one of the special words, e.g. ." (dot quote), that knows how to handle the input!)
As this example demonstrates, colon words are defined entirely by other words, which may be code words or other colon words. You can also have numeric values, e.g. 8 and 7, which are handled by the interpreter.
The example colon word definition above creates a new word called SDD that is a composition of the three code words we defined earlier: SWAP , DROP , and DUP . Perhaps the word "composition" brings to mind the concatenative terminology we explored earlier in this quest?
A colon word is so-named because its definition begins with the " : " character.
8 7            8 7
SDD            7 7
The effect of calling our new SDD word is, of course, identical to calling the three separate words SWAP, DROP, and DUP in sequence. In indirect threaded code terms, this colon word has been "compiled" into the addresses of the "inner interpreters" for each of the three code words. But feel free to ignore this detail! Let's demystify this further because the Forth "compiler" is probably much, much simpler than you'd think:
How ":" works Here's what really happens when we enter this: : SDD SWAP DROP DUP ; Colon ( : ) fetches the word name (SDD) and sets "compile mode". Semicolon ( ; ) completes the word's entry in the dictionary and unsets "compile mode". It might still be surprising that ":" is a Forth word. It looks like the sort of thing we would call "syntax" in other programming languages, but it really isn't. It's a word. You can even replace ":" with your own definition to extend or alter Forth to do your bidding! It may be hard to fully grasp for a while, but Forth's only syntax is the whitespace between tokens of input. Tokens are tokenized by a word called "WORD", which is an incredibly confusing overload of the term. Sorry. (You'll also notice I've mentioned the term "dictionary" a couple times now. It's kind of obvious that a dictionary can hold words, but I haven't properly explained the Forth dictionary yet. Don't worry, we're almost there.) Okay, so " : " switches the "outer interpreter" into compile mode and ; switches it back. But what does that mean?
"Compiling" in Forth means putting one of two things into memory: The address of a word, or
A value literal and a bit of code that pushes it on the stack At its simplest, compiling is just like executing, but we're storing addresses instead of jumping to them. Actually, that's understating the elegance and simplicity of how this works, which is one of the most mind-blowing things in Forth. Forth uses the same interpreter to both compile and execute code! In a traditional Forth, the interpreter executes words as you enter them. Unless you're in "compile mode", then it is compiling those words as addresses into memory on the fly as you enter them. It's straight from the keyboard to memory. To make this concrete, let's step through the example. Here's our definition again: : SDD SWAP DROP DUP ; In "normal mode", the interpreter is executing everything as we enter it. When the interpreter encouters the " : " word, we're still in "normal mode", so it looks " : " up in the dictionary, finds it, and executes the word. The definiton of " : " will collect the name "SDD" and turn on the "compile mode" switch. Now when the interpreter hits the " SWAP " word, it will look up its address in the dictionary as usual, find it, and store the address in the next available memory slot where we compile new words (a very important built-in variable called " HERE " keeps track of this memory position). The same thing happens for " DROP " and " DUP ". We're compiling as fast as we can type! Then a bunch of really interesting things happen when the interpreter gets to " ; " (SEMICOLON). First, " ; " is looked up and found in the dictionary and then...Hey, wait! Why isn't the address of the " ; " word also compiled into our new definition? That's a great question! Time for another trick. One of the flags stored in a word's dictionary entry is the "immediate" flag. When this flag is turned on, the word is always executed immediately even in compile mode. The " ; " word is an immediate word, so it executes instead of being compiled. (Ready to have your head turned inside-out? There are also tricks for compiling immediate words into word definitions! It's simple enough, but still pretty mind-bending stuff when you first encounter it.) The definition of " ; " turns off compile mode. Then it does some housekeeping to complete the entry of the new SDD word in the dictionary. As soon as " ; " returns control to the outer interpreter, we're now sitting in normal mode again and our new SDD word is available to be called directly or compiled into other words. See what I mean? It's all made of these tiny little parts. Each part is incredibly simple, but trying to explain how the parts fit together takes paragraphs of text. Speaking of simple...
Almost no syntax = simple interpreter and extreme extensibility

The tiny set of rules that govern the interpreter:

- WORD gets a token.
- Is it in the dictionary? (And are we compiling?)
- Is it a numeric literal? (And are we compiling?)
- Otherwise, error!

Let's look at our example code again. The first line runs, the second line compiles:

8 7 SWAP DUP +
: SDP SWAP DUP + ;  8 7 SDP

It would be annoyingly redundant to walk through the two lines of Forth above step-by-step because they are nearly identical. The only difference is that the first line simply executes each word as it is encountered (SWAP, DUP, +). The second line compiles those three words into a new word called SDP (for "Swap Dup Plus") and then calls it. The result of both lines is the same (7 and 16 on the stack).

Only the numbers (8 and 7) and the spaces separating words have any special meaning to Forth's "outer" interpreter. Everything else is looked up in the dictionary.

Ah, but did you notice the order of the bullet points above? We check to see if a token is in the dictionary before we check to see if it is a numeric literal. Yes, even numbers are looked up in the dictionary first! Does that perhaps give you any ideas about that magic trick I promised at the start of this article? Don't worry, the trick is forthcoming.

Furthermore, input is not returned to the main Forth "outer" interpreter until a dictionary word completes executing. So there is absolutely no limit to the types of domain-specific language (wikipedia.org) you can create.

And if that weren't enough, you can also replace every single piece of the Forth interpreter itself. Remember, they're all independent little cogs in the machine. Forth is the ultimate freedom.

I've alluded to this in several different ways above, but I'll make a bold claim: Forth has the simplest syntax and therefore the simplest parser, interpreter, and compiler ever used in a "mainstream" general-purpose programming language.

Two other languages previously mentioned, Lisp and Tcl, are also famously syntactically minimalistic languages. People have written incredibly tiny implementations of each:

- Lisp: sectorlisp, a 512-byte implementation of LISP (github.com/jart)
- Tcl: picol, a Tcl interpreter in 550 lines of C code (antirez.com)

Mind you, both of these people (Justine "jart" Tunney and Salvatore "antirez" Sanfilippo) are incredible programmers, but these examples hint at what is possible. But Forth surely takes the cake. Even a certified non-genius like myself can write an entire Forth interpreter in a couple hundred assembly instructions. (See "Meow5" below.)

Because of its extreme simplicity, tokenizing Forth can be done in a mere handful of assembly instructions on many processors. And as mentioned, once you've written a Forth interpreter, you're well on your way to a working Forth compiler.

I've alluded to Forth's flexibility and extensibility on several different occasions now. But this is no mere party trick. Forth relies on the fact that you can do anything in Forth. In the next example, we'll see how Forth implements control structures.
The definition of IF...THEN from jonesforth.f:

: IF IMMEDIATE ' 0BRANCH , HERE @ 0 , ;
: THEN IMMEDIATE DUP HERE @ SWAP - SWAP ! ;

This right here is one of the most mind-blowing things about Forth, and a solid reason to title this, "The programming language that writes itself."

Even something as fundamental as IF is defined in the language! Forth is not the only language that can do this, but few languages invite the programmer to participate so thoroughly in the inner workings as often or as joyfully as Forth.

Figuring out how the IF and THEN definitions above actually work is left as an exercise for the reader, but here's a brief explanation of the new words they use:

'        - gets the address of the word that follows, puts it on the stack
0BRANCH  - branch to the next value if the top of the stack is 0
,        - "compile" the current stack value to the memory at HERE
@        - fetch value from address on stack, put value on stack
!        - store to memory (stack contains address, then value)

(By the way, I'll go on the record to say this: the early parts of bootstrapping Forth in Forth (at least the top 25% of jonesforth.f) are significantly more mind-bending than implementing the low-level code word definitions written in assembly language. In fact, any time I needed to return to the assembly, it was like a comforting blanket of simplicity compared to the logic puzzle of those Forth-in-Forth primitives!)

But even seeing control structures like IF..THEN implemented in the language may not have prepared you for seeing this next trick. This should drive home the fact that Forth has almost no native syntax:
The definition of ( ) nested comments from jonesforth.f:

: ( IMMEDIATE
        1
        BEGIN
                KEY
                DUP '(' = IF
                        DROP 1+
                ELSE
                        ')' = IF 1- THEN
                THEN
        DUP 0= UNTIL
        DROP
;

( From now on we can use ( ... ) for comments. ...

Yeah, you read that right. Even comments are implemented in the language! And you can re-define them or add your own kind of comments!

Some of you are soiling yourselves in excitement right now. Some of you are soiling yourselves in fear. We're all just sitting here in our own filth now.

And now, at last, we are ready to discuss the power of the Forth dictionary.
The Dictionary

A Forth dictionary traditionally uses a linked list. Word matching is done starting from the end (most recent entries) first, so:

- You can redefine any word, even the ones originally defined in assembly!
- Words depending on previous definitions of redefined words won't break because the compiled addresses still point to the original word, not the new definition!
- You are in complete control!
- Again, Forth = freedom!

It's not just minimalistic syntax. Arguably, the real reason Forth is so extensible is because of the dictionary.

As mentioned in the points above, more recent word definitions override older ones with the same name - the interpreter stops at the first match. But existing compiled words that use the old definitions are not affected because they haven't stored the name of the old word, they've stored the address. The address of the old word still points to the old word.

You don't have to strictly replace. You can extend words by calling the original word from a new one with the same name!

You are perhaps wondering what happens if you attempt to make a recursive word. By default, ":" (COLON) marks the word currently being compiled into the dictionary as hidden or disabled so that previous definitions can be called, as mentioned. This is why we have a word called RECURSE, which inserts a call to the current word within itself. Because all information in Forth is global (including the address of the current word being compiled), defining RECURSE is incredibly simple (just four words in the JonesForth definition).

Besides making new control structures or other types of extensions to the language, what else can we do with these abilities?
It's not just the language itself that is unusually malleable. Your program written in Forth can be flexible too. Here is an example lifted and paraphrased from Thinking Forth by Leo Brodie.

Say we create a variable to hold a number of apples:

VARIABLE APPLES
20 APPLES !
APPLES ?
20

Forth variables put addresses on the stack.

Note: I have a physical copy of Thinking Forth because I think it's great. But the publishers have kindly made it available for free online: Thinking Forth (PDF) (forth.com)

Let's walk through the three lines above. Here's the first line:

VARIABLE APPLES

The VARIABLE word creates a new spot in free memory. Then it creates a new word in the dictionary called APPLES that pushes that particular memory address on the stack when it is called.

(Note that like ":", "VARIABLE" is grabbing the next token of input for use as a new dictionary name. This is possible because "the little cogs in the Forth machine" are available for any use you can think of. And one of those cogs is the word WORD, which gets the next token from the input stream. Both ":" and "VARIABLE" use WORD to do this, just like Forth's own outer interpreter!)

Okay, so we have a variable named APPLES now. The next line is:

20 APPLES !

This puts the value 20 on the stack, then the address for APPLES. The "!" (STORE) word stores the value 20 at the APPLES address. (In other words, "!" takes two values as input: an address and a value. It stores the value at that address.) Conceptually, you can think of the above as APPLES = 20 in "normal" programming syntax.

And now the third line:

APPLES ?

This line prints the value stored at APPLES. The word "?" fetches a numeric value from an address and prints it (which pops the value off the stack again). Again, APPLES puts its address on the stack. So "?" simply takes an address from the stack as input for printing.

By the way, here's the entire definition of "?" in JonesForth:

: ? @ . ;

Look at how small that is! The only thing you need to know to understand this definition is that "@" (FETCH) pops an address from the stack, fetches the value stored at that address, and puts the value on the stack. "." (DOT) pops a value from the stack and prints it as a number.

Okay, on with our example. We're about to be dealt a terrible blow...
We pepper our program with this APPLES variable. The application works perfectly for a couple years. Then we are told that we must now keep track of two different kinds of apples: red and green. What to do?

Unfortunately, this is exactly the sort of conundrum we see in real life software all the time. You knowingly prepared for all sorts of different quantities of apples, but it never occurred to anyone that we would need to track different types of apples.

This problem seems very bad. Do we have to completely re-write our application?

(Well, outside of this example, the correct answer might be "yes". Maybe this changes the whole "theory" of the program, in the Programming as Theory Building (ratfactor.com) sense. In which case, a re-write or big refactor of our apple counting program is likely the right answer. But for this example, we're assuming that we have thousands of lines of apple-handling functionality that will not need to change. We'll say that grouping the apples by color here is just an essential surface detail.)

All right, obviously we can't store two values in one variable and expect all of the existing code to still work. So what could we possibly do? Here's a very clever and very Forth solution:
A new variable will store the current type of apples.

VARIABLE COLOR

As with "APPLES" above, VARIABLE creates a memory space and a new word called "COLOR" that puts the address of the memory space on the stack when it is called.

Next, we'll create a second new variable and a new colon word.
"REDS" will count red apples. Colon word "RED" sets the current type of apple to red: COLOR = REDS: VARIABLE REDS : RED REDS COLOR ! ; Remember, variables are also words in the dictionary, so we've created three additional words so far: COLOR, REDS, and RED. (Only one of these, RED, is recognizably a function. But really all three of them are.) As you may recall from earlier, "!" (STORE) takes two parameters, a value and an address, and stores