DOES> overwrites that address so that executing the word, instead of doing the default thing, now runs some different code, namely the code that you supply after the DOES>.
This is something of a kludge because the usual implementation stores something (the default semantics) in that cell when you run CREATE, then later overwrites it when you run DOES>. Since lots of Forth targets today are microcontrollers whose code storage is in flash memory, overwriting individual already-written cells in code space is not nice.
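Concretely (a sketch of the usual RAM-based layout; details vary):

\ CREATE FOO    ->  FOO: [ code field: default "push data address" ] [ data ... ]
\ DOES> later   ->  FOO: [ code field: overwritten to run the DOES> code ] [ data ... ]
\
\ Two writes to the same cell: harmless in RAM, painful in flash, which
\ typically cannot rewrite an already-programmed cell without a page erase.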
Early Forths had <BUILDS ... DOES> instead of CREATE ... DOES>. You can see how the angle brackets originally looked symmetrical, but after things changed, the bracket only appeared on DOES>, and that may be part of why people find it confusing.
<BUILDS didn't install any default action into the newly created word. It left it uninitialized, to be filled in when DOES> came along. CREATE ... DOES> was sort of an optimization, since CREATE already existed anyway, making <BUILDS unnecessary. So they got rid of <BUILDS during standardization, back in the minicomputer era when this stuff always generated code in RAM (or maybe magnetic core) rather than flash. That optimization has in turn bitten some implementers in the butt, so <BUILDS DOES> has come back into usage in some MCU implementations, like FlashForth. FlashForth is pretty nice, by the way.
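For reference, a defining word in a <BUILDS-style system reads the same apart from the opener (a sketch; `con` is an arbitrary name, chosen to avoid the built-in CONSTANT):

: con ( x "name" -- )
  <builds ,    \ <builds lays down the header but leaves the action cell unwritten
  does> @ ;    \ does> fills it in exactly once -- no overwrite of written flash
42 con answer
answer .       \ prints 42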
Well I didn't mean to type that much, but I hope it helps.
Imagine typing

CREATE FOO 42 , DOES> @ ;
into an interpreter to create the constant. Then if placed inside a definition, there would be two semicolons: : CONSTANT CREATE , DOES> @ ; ;
It's an extra nesting which makes it clear you have a definition that makes a definition. You could even put words between the semicolons, which would just become part of the definition of CONSTANT.

It feels as if this DOES> thing is a kludge that activates within definitions and kind of "hijacks" the rest of their instructions. Without DOES>, the material after it would be part of the definition of CONSTANT and not part of the definition of the word produced by CREATE. The switcheroo feels hacky.
It’s nice to see deep dives into Forth internals hitting the front page. Great article.
I hate DOES>. I was implementing it well after 1am last night and I hate it. I have this feeling that as something gets harder to implement, it means it's not right; but I know DOES> is right, so it's me, I just couldn't implement it well. It was super frustrating. But now I feel better :)
I am new to Forth, but it feels like `create does>` has to be replaced with some new construct. I just want word code to operate on its data. I need to gain more experience to find out; for now `create does>` will do.
https://github.com/dan4thewin/FreeForth2/blob/master/ff.asm
which uses a double loop to look up first macro words, and then immediate words
https://github.com/ablevm/able-forth/blob/current/forth.scr
Ableforth implements a defer/expand operation \ to effectively quote words. The basic loop is then simple: parse a text word, look it up, and execute it.
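That basic loop, sketched in standard-ish Forth (not Ableforth's actual code; number conversion is elided behind a comment):

: interpret-loop ( -- )
  begin
    bl word dup c@                 \ parse the next blank-delimited word
  while
    find if execute                \ found: run it
    else ( try number... ) drop then
  repeat
  drop ;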
Both make use of macros (code generators) to implement deferred behaviour, as well as code inlining. Ultimately all these operations implement defer by manipulating the execution flow, something that algebraic effects also do.
I have a feeling that algebraic effects can be used in a Forth to implement DOES>.
: index-array ( i a -- addr ) swap cells + ;
create a1 10 cells allot
create a2 20 cells allot
: array1 ( i -- addr ) a1 index-array ;
: array2 ( i -- addr ) a2 index-array ;
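For comparison, the same arrays factored through CREATE ... DOES> might look like this (a sketch; `array` is a hypothetical defining word):

: array ( n "name" -- ) create cells allot does> ( i a -- addr ) swap cells + ;
10 array array1
20 array array2
7 array1 @ .   \ fetch cell 7 of the first array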
I used to use the "implementation-dependent" trick of popping the return address (in e.g. index-array) to get the data. Less verbose, a bit more efficient. But my implementation doesn't permit it anymore.

Recently I've found that implementing a "self/this" pseudo-value and pseudo-method calls is much more useful. The relation between this and "create does>" is that the latter can be seen as a poor man's closure, or a poor man's object [1].
[1] https://stackoverflow.com/questions/2497801/closures-are-poo...
: foo 1+ does> . ;
42 foo bar
bar \ prints 43
If you want, you can add the "traditional" indirection in the initialization part, for a similar effect.
So, not quite the same, but almost, and I think it echoes the intuition you have, which is also mine.
With Forth's stack-based nature, how is it possible to ever have performant data structures? If I only ever do things by pushing and popping from the stack, then I would think that data structures are inherently limited to linear access times. But there are libraries implementing arrays and so on. I don't understand how this is possible in a performant way, or how those structures are made so that they can be accessed efficiently. Or perhaps Forth is really seriously lacking performant data structures? But that seems crazy unlikely.
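(In practice the stack only shuttles addresses around; the data itself lives in ordinary memory, so access is constant-time address arithmetic. A minimal sketch, with hypothetical names:)

create buf 100 cells allot          \ the data lives in ordinary memory
: buf! ( x i -- ) cells buf + ! ;   \ store x at index i: multiply, add, store
: buf@ ( i -- x ) cells buf + @ ;   \ constant-time fetch by index
42 7 buf!
7 buf@ .                            \ prints 42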
I still don't know what DOES> really does... ;-)
Typical usage is for the “code that will run immediately” to store some data, and for the “code that gets compiled to be run later” to use that data.

Perhaps the simplest example is CONSTANT, which can be defined like this:
: CONSTANT ( w "name" -- )
  CREATE ,
  DOES> ( -- w )
  @ ;
Here, the “code that will run immediately” is CREATE ,
which (a) reads a name from the command line and creates a word with that name, and then (b) takes the top of the stack and stores it directly after the word’s definition.

The “code that gets compiled to be run later” is
@
which fetches the formerly stored value (taking the address of the formerly created word from the stack).

DOES> has to do some shenanigans to make that work, but that’s an implementation detail, and will be dependent on the particular FORTH being used.
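For concreteness, using it (ANSWER being an arbitrary name):

42 CONSTANT ANSWER
ANSWER .   \ prints 42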
: NAME alpha beta ... psi omega ;
what happens is that at compile time, a dictionary entry NAME is created, and then the alpha ... omega words are compiled to be run later.

When DOES> is introduced:
: NAME alpha beta ... DOES> ... psi omega ;
all of the above still holds. We still have a dictionary entry NAME, which denotes all of the words up to the semicolon, including DOES>.

Then, when we execute NAME in a compilation context, because the word sequence contains DOES>, everything to the left of DOES> is specially treated: it is executed immediately in the compilation context and is removed. But that's not all; DOES> doesn't just execute everything to the left and disappear; it leaves something behind: some word which is then combined with the material to the right of DOES> to form the run-time sequence.
In your example, when we run CONSTANT, the part to the left of DOES> fetches a name from the input stream, and creates a word, and then makes the value on the stack the definition.
the accumulation of to-be-run-later words is interrupted, and everything before DOES> is done now, at definition time, and removed from the definition.
The CREATE material, when executed, leaves behind a reference to the word denoting the constant. Then DOES> creates a definition for that word, using the remaining material.
Is that more or less it?
Typically (likely always, as sharing code is the main reason DOES> was invented), it compiles that code once and magically makes “that word” ‘jump’ to that code. That way, when, for example, you use the definition with DOES> multiple times, you only compile “the remaining material” once.
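For example (FOO and BAR being arbitrary names):

42 CONSTANT FOO
17 CONSTANT BAR
\ FOO and BAR each get their own data cell, but both jump to the
\ single compiled copy of the  @  code inside CONSTANT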
> Then, when we execute NAME in a compilation context, because the word sequence contains DOES>, everything to the left of DOES> is specially treated
That would be too magical. If you want a word to be executed when compiling code, you make it IMMEDIATE. For the CONSTANT example I gave, that’s not done, as it is executed in interpretation context, and then creates a word, compiles the number from the top of the stack, and then hooks up the word just created to the code compiled earlier.
But the code after DOES> is repeatedly referenced in new definitions that are the result of executing the word which contains DOES>, like the CONSTANT example.
The original : CONSTANT CREATE , DOES> @ ; could be entirely compiled so that it contains a compiled sequence for the @ part. When DOES> is executed, it patches a pointer to that part into the word produced by CREATE, and then somehow skips the execution of that part.
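Sketched for an indirect-threaded Forth (one plausible layout; details differ per implementation):

\ : CONSTANT CREATE , DOES> @ ;   compiles to roughly:
\
\   CONSTANT:  CREATE  ,  (patch&exit)  @  EXIT
\                             |         ^
\                             +---------+  stores this address into the
\                                          code field of the CREATEd word
\
\ (patch&exit) also returns from CONSTANT, so the trailing  @ EXIT  is
\ skipped at definition time and only runs via the newly created word.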
How do you manage the storage? There has to be some refcounting or garbage collection. What if four words point to the same instruction sequence, and we FORGET three of them?
Ah, but FORGET works in a LIFO discipline; you can't just forget arbitrary entries. If B was defined using parts of A, then B is newer. You cannot forget A without forgetting B first. I think.
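For example:

create A 1 cells allot
: B A @ ;     \ B is newer than A and uses it
forget A      \ trims the dictionary back past A, so B disappears too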
: CREATE WORD CREATEHEAD DODOES , 0 , ;   \ lay down the DODOES codeword plus a placeholder cell for the does-pointer
: DOES> IMMEDIATE   \ compiles, into the defining word: LIT <addr past EXIT> LATEST @ >DFA ! EXIT
  ['] LIT , HERE @ 6 CELLS + , ['] LATEST , ['] @ , ['] >DFA , ['] ! , ['] EXIT , ;
Reading colorForth code, and especially the commentary (https://github.com/Howerd/colorForth), it seems that it refines the concept of staging into colours (does> might correspond to cyan?).

Hopefully someone more knowledgeable will chime in here!
Surprised so few public Forths implement it.
: COUNTER ( n "name" -- )
  CREATE ,                   \ store the initial count
  DOES> DUP 1 SWAP +! @ ;    \ increment the stored count, then fetch it
0 COUNTER PK
PK . \ => 1
PK . \ => 2
A semi-equivalent in JavaScript is:

const counter = init => {
  let x = init;
  return () => { x += 1; return x; };
};
const pk = counter(0);
console.log(pk()); // => 1
console.log(pk()); // => 2
What’s funny is that I used to know how it works; now any time I come across these kinds of articles I get more and more confused and further away from understanding. It’s like reading those convoluted explanations of what a monad is.
It does this by doing something now, and something later (you could read create does> as now later>).
So for
: CONSTANT CREATE , DOES> @ ;
This makes the defining word CONSTANT, which, when run (now), creates a word named by the next token in the input.

So 42 CONSTANT myvar will build the word myvar with 42 stored in it. myvar, when run (later), will get its value and push it to the data stack.
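Reading that with the now/later markers:

42 CONSTANT myvar   \ now: CREATE builds myvar, then , stores 42 after it
myvar .             \ later: the DOES> code runs @ on that cell and prints 42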