Great to see work like this being done. JavaScript is often a "good enough" language, but an efficient Smalltalk (or Self) implementation with support for things like system images, become:, coroutines, and other advanced features would open up a lot of powerful programming techniques: fast portable images, transparent futures, cooperative concurrency without async/await on every call, etc.
On a similar note, you can now run Scheme in the browser via wasm: https://spritely.institute/hoot/

The current release can do coroutines via its delimited continuation support, but the next release will have ready-to-use lightweight threads (aka "fibers") that integrate with JS promises. No async/await-marked functions or calls needed.

Effect-ts is also built on the same principles (fibers) if one wants to stay in TS land.

https://effect.website/docs/guides/runtime
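
For a flavor of what that looks like, here's a minimal TypeScript sketch (assuming a recent Effect release; Effect.gen, Effect.fork, Fiber.join, and Effect.runPromise are Effect's own APIs, the rest is illustrative):

    import { Effect, Fiber } from "effect"

    // A task that "sleeps" cooperatively; note there are no async/await annotations.
    const task = Effect.gen(function* () {
      yield* Effect.sleep("100 millis")
      return 42
    })

    const program = Effect.gen(function* () {
      const fiber = yield* Effect.fork(task)   // runs concurrently on a lightweight fiber
      const result = yield* Fiber.join(fiber)  // cooperative join, still no await
      console.log(result)
    })

    // Only the outermost boundary touches the Promise world.
    Effect.runPromise(program)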

As an Obj-C guy: if I declare a method as input-only with no return value, it can run asynchronously on the run loop.

I assume Smalltalk can do the same?

> I don’t expect the WASM translations to be much (or any) faster at the moment, but I do expect them to get faster over time, as the WASM engines in web browsers improve (just as JS engines have).

I haven't been following WASM progress. Do people generally share this optimism about WASM performance improving over time, relative to JS? I was under the vague impression that all the "obvious" optimizations had already been done (i.e. JIT, and more recently, SIMD support)

I think it depends on what you are doing... There are huge opportunities in interop performance, as well as in optimized paths for certain workloads. I'm not that deep on the technical side, more of an avid observer, but it seems that as long as many things in WASM are slower than, say, Java or C#, there is definitely room to improve.

As an example, the built-in garbage collection support that's being fleshed out will improve languages that rely on GC (C#, Java, Go, etc.) without each having to bundle its own GC implementation with the runtime.

Another area with potential for massive gains is browser UI interop. There's been a lot of effort to work within/around the current limitations, but there's obviously room to improve.

dgb23 · 3 months ago
There seems to be a lot of work going into WASM itself, much of it performance-related:

https://github.com/WebAssembly/proposals

mjhay · 3 months ago
If only you could directly access the DOM or other browser APIs. I understand the GC proposal might help with this (?).
dgb23 · 3 months ago
I found a discussion with some explanations of why that is and what other solutions could be looked at:

https://github.com/WebAssembly/design/issues/1184

“The main reason is that direct access to the DOM requires the ability to pass references to DOM/JS objects through Wasm. Consequently, when GC happens for JavaScript, the collector must be able to find and update such references on the live Wasm stack, in Wasm globals, or in other places controlled by the Wasm engine. Hence the engine effectively needs to support GC in some form.

However, the new proposal for reference types that we split off from the GC proposal tries to give a more nuanced answer to that. It introduces reference types without any functionality for allocating anything within Wasm itself. In an embedding where host references are garbage-collected that still requires a Wasm implementation to understand GC. But in other embeddings it does not need to.”
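
To make that concrete, here's a minimal sketch of the JS/TS side of passing a DOM reference through Wasm with reference types (the module "highlight.wasm" and its exports/imports are hypothetical; only the WebAssembly JS API calls are real):

    // Assume a module compiled with reference types enabled that exports
    // highlight(el: externref, hue: i32) and imports dom.setColor. Wasm only
    // shuttles the externref around and never dereferences the DOM object,
    // so the JS GC stays in charge of the element's lifetime.
    const imports = {
      dom: {
        setColor: (el: HTMLElement, hue: number) => {
          el.style.backgroundColor = `hsl(${hue}, 80%, 80%)`;  // actual DOM work stays in JS
        },
      },
    };

    const { instance } = await WebAssembly.instantiateStreaming(
      fetch("highlight.wasm"),  // hypothetical module
      imports
    );

    // A live DOM reference crosses the Wasm boundary as an opaque externref.
    (instance.exports as any).highlight(document.body, 200);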

mjhay · 3 months ago
Thank you. That helps with my confusion a lot.
As someone who wants WASM to free us all from JS: is this even worth it if DOM bindings for WASM never get released? I’ve been following this since 2018 and it seems like we’re still at square one with DOM bindings. Incredibly frustrating.
Oreb · 3 months ago
> I was under the vague impression that all the "obvious" optimizations had already been done (i.e. JIT, and more recently, SIMD support)

Aren’t threads still missing? That seems like a pretty major optimization, now that almost any CPU you can buy has multiple cores.

Yes, but they're very close to being standardized and have multiple existing implementations among different browser engines and also non-browser runtimes. (Of course, I don't think the threading proposal is exactly what OP had in mind as a general WASM perf improvement, but you are right that in practice it is a big performance barrier in the bigger scheme of things.)
Shared memory is standardized; you can run WASM threads in the browser through web workers. WASI threads standardization is in limbo because apparently the component model is very important and nobody knows how threads will interact with it.
Yes, I should have been more explicit: the raw WASM proposal only really defines how shared memory and cross-thread atomics work, assuming each thread runs a wasm module (with shared memory regions that are appropriately mapped). It does not specify how the host environment actually spawns or manages threads, or what hostcalls are available for that.

That said, I think in the browser something like Emscripten can do something akin to "use web workers with shared array buffers" to back it all up so that threading APIs roughly work, but yes, WASI currently has nothing for this.
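
Roughly, the browser-side plumbing looks like this (a minimal sketch; "worker.js" and "app.wasm" are made up, while WebAssembly.Memory with shared: true, SharedArrayBuffer-backed postMessage, and Atomics are the standard pieces):

    // main.ts -- create a shared linear memory and hand it to a worker.
    const memory = new WebAssembly.Memory({ initial: 16, maximum: 256, shared: true });

    const worker = new Worker("worker.js");   // hypothetical worker script
    worker.postMessage({ memory });           // backed by a SharedArrayBuffer, not copied

    // Both sides can coordinate through atomics on the same buffer.
    const flags = new Int32Array(memory.buffer, 0, 1);
    Atomics.store(flags, 0, 1);
    Atomics.notify(flags, 0);

    // worker.js (sketch) -- instantiate the same module against the shared memory:
    // self.onmessage = async ({ data }) => {
    //   const { instance } = await WebAssembly.instantiateStreaming(
    //     fetch("app.wasm"),                  // hypothetical module
    //     { env: { memory: data.memory } }
    //   );
    //   // each worker is then one "wasm thread" running against shared memory
    // };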

3 months ago
From my understanding, most of the WASM performance improvement expectations have been around the cost of calling browser APIs from within WASM. The compute performance itself is basically near native speed, with a small overhead for bounds checking/wrapping the memory block. Last I checked, most WASM apps are calling DOM APIs through a JavaScript middleman, which obviously sucks for performance. But native importing of DOM APIs is something that I believe was being worked on and could be here soon?
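
A minimal sketch of that JavaScript-middleman path (the module and import names are made up): every DOM call bounces through a JS shim, and string arguments get copied out of linear memory on each call, which is where much of the overhead lives.

    let memory: WebAssembly.Memory;
    const decoder = new TextDecoder();

    const imports = {
      env: {
        // Wasm can't touch the DOM, so it calls this JS shim with (pointer, length)
        // into its linear memory; the string is decoded (copied) on every call.
        set_title: (ptr: number, len: number) => {
          const bytes = new Uint8Array(memory.buffer, ptr, len);
          document.title = decoder.decode(bytes);
        },
      },
    };

    const { instance } = await WebAssembly.instantiateStreaming(
      fetch("app.wasm"),  // hypothetical module
      imports
    );
    memory = instance.exports.memory as WebAssembly.Memory;
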
Yeah, I'd be surprised. I'm sure there are improvements to be made out there (e.g. maybe the sandbox could be faster) but the crazy leaps we've seen in JS performance are primarily because the complexity in the language makes for a very complex implementation with plenty of inefficiencies. By comparison WASM is already pretty close to the metal.
3 months ago
Craig Latta's Caffeine work live coding with Smalltalk and SqueakJS is amazing.

https://observablehq.com/@ccrraaiigg/caffeine

>Caffeine integrates SqueakJS, a JavaScript implementation of the Squeak Smalltalk virtual machine, with several JavaScript runtime environments, including web frontends (web browsers, with DOM, DevTools, and Observable integration), backends (NodeJS), and Web Workers.

https://github.com/ccrraaiigg

Craig Latta - Caffeine - 26 May 2021:

https://vimeo.com/591827638

>Caffeine ( caffeine.js.org ) is a livecoded integration of the SqueakJS Smalltalk virtual machine with the Web platform and its many frameworks. Craig Latta will show the current state of Caffeine development through live manipulation and combination of those frameworks. The primary vehicle is a Caffeine app called Worldly, combining the A-Frame VR framework, screen-sharing, and the Chrome Debugging Protocol into an immersive virtual-reality workspace.

>Craig Latta ( blackpagedigital.com ) is a livecoding composer from California. He studied music at Berkeley, where he learned Smalltalk as an improvisation practice. He has worked as a research computer scientist at Atari Games, IBM's Watson lab, and Lam Research. In 2016 he began combining Smalltalk technologies with the Web platform, with an emphasis on spatial computing. He is currently exploring spatial audio for immersive workspaces.

SqueakJS – A Squeak VM in JavaScript (squeak.js.org) | 115 points by gjvc on Oct 27, 2021 | 24 comments

https://news.ycombinator.com/item?id=29018465

DonHopkins on Oct 27, 2021 | prev | next [–]

One thing that's amazing about SqueakJS (and one reason this VM inside another VM runs so fast) is the way Vanessa Freudenberg elegantly and efficiently created a hybrid Smalltalk garbage collector that works with the JavaScript garbage collector.

SqueakJS: A Modern and Practical Smalltalk That Runs in Any Browser

https://freudenbergs.de/vanessa/publications/Freudenberg-201...

>The fact that SqueakJS represents Squeak objects as plain JavaScript objects and integrates with the JavaScript garbage collection (GC) allows existing JavaScript code to interact with Squeak objects. This has proven useful during development as we could re-use existing JavaScript tools to inspect and manipulate Squeak objects as they appear in the VM. This means that SqueakJS is not only a “Squeak in the browser”, but also that it provides practical support for using Smalltalk in a JavaScript environment.

>[...] a hybrid garbage collection scheme to allow Squeak object enumeration without a dedicated object table, while delegating as much work as possible to the JavaScript GC, [...]

>2.3 Cleaning up Garbage

>Many core functions in Squeak depend on the ability to enumerate objects of a specific class using the firstInstance and nextInstance primitive methods. In Squeak, this is easily implemented since all objects are contiguous in memory, so one can simply scan from the beginning and return the next available instance. This is not possible in a hosted implementation where the host does not provide enumeration, as is the case for Java and JavaScript. Potato used a weak-key object table to keep track of objects to enumerate them. Other implementations, like the R/SqueakVM, use the host garbage collector to trigger a full GC and yield all objects of a certain type. These are then temporarily kept in a list for enumeration. In JavaScript, neither weak references, nor access to the GC is generally available, so neither option was possible for SqueakJS. Instead, we designed a hybrid GC scheme that provides enumeration while not requiring weak pointer support, and still retaining the benefit of the native host GC.

>SqueakJS manages objects in an old and new space, akin to a semi-space GC. When an image is loaded, all objects are created in the old space. Because an image is just a snapshot of the object memory when it was saved, all objects are consecutive in the image. When we convert them into JavaScript objects, we create a linked list of all objects. This means, that as long as an object is in the SqueakJS old-space, it cannot be garbage collected by the JavaScript VM. New objects are created in a virtual new space. However, this space does not really exist for the SqueakJS VM, because it simply consists of Squeak objects that are not part of the old-space linked list. New objects that are dereferenced are simply collected by the JavaScript GC.

>When full GC is triggered in SqueakJS (for example because the nextInstance primitive has been called on an object that does not have a next link) a two-phase collection is started. In the first pass, any new objects that are referenced from surviving objects are added to the end of the linked list, and thus become part of the old space. In a second pass, any objects that are already in the linked list, but were not referenced from surviving objects are removed from the list, and thus become eligible for ordinary JavaScript GC. Note also, that we append objects to the old list in the order of their creation, simply by ordering them by their object identifiers (IDs). In Squeak, these are the memory offsets of the object. To be able to save images that can again be opened with the standard Squeak VM, we generate object IDs that correspond to the offset the object would have in an image. This way, we can serialize our old object space and thus save binary compatible Squeak images from SqueakJS.

>To implement Squeak’s weak references, a similar scheme can be employed: any weak container is simply added to a special list of root objects that do not let their references survive. If, during a full GC, a Squeak object is found to be only referenced from one of those weak roots, that reference is removed, and the Squeak object is again garbage collected by the JavaScript GC.
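
A toy TypeScript sketch of that scheme, just to make the old-space/new-space split concrete (heavily simplified and not SqueakJS's actual code; in particular the real VM keeps old space ordered by object ID so it can still write out binary-compatible images):

    // Old space is a singly linked list of objects; anything on the list is kept
    // alive (and enumerable). "New" objects are ordinary JS objects with no next
    // link, which the JS GC may reclaim once unreferenced.
    interface SqObject {
      pointers: SqObject[];            // outgoing references
      nextObject: SqObject | null;     // linked only while in old space
    }

    class HybridObjectSpace {
      firstOld: SqObject | null = null;
      lastOld: SqObject | null = null;

      fullGC(roots: SqObject[]): void {
        // Mark: find everything reachable from the roots.
        const live = new Set<SqObject>();
        const stack = [...roots];
        while (stack.length > 0) {
          const obj = stack.pop()!;
          if (live.has(obj)) continue;
          live.add(obj);
          stack.push(...obj.pointers);
        }
        // Phase 1: tenure surviving new objects (not yet linked) into old space.
        for (const obj of live) {
          const inOldSpace =
            obj.nextObject !== null || obj === this.lastOld || obj === this.firstOld;
          if (!inOldSpace) this.append(obj);
        }
        // Phase 2: unlink dead old objects so the JS GC can reclaim them.
        let prev: SqObject | null = null;
        for (let o = this.firstOld; o !== null; ) {
          const next: SqObject | null = o.nextObject;
          if (!live.has(o)) {
            if (prev) prev.nextObject = next; else this.firstOld = next;
            if (o === this.lastOld) this.lastOld = prev;
            o.nextObject = null;
          } else {
            prev = o;
          }
          o = next;
        }
      }

      private append(obj: SqObject): void {
        if (this.lastOld) this.lastOld.nextObject = obj; else this.firstOld = obj;
        this.lastOld = obj;
      }
    }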

DonHopkins on Oct 27, 2021 | parent | next [–]

Also: The Evolution of Smalltalk: From Smalltalk-72 through Squeak. DANIEL INGALLS, Independent Consultant, USA

https://smalltalkzoo.thechm.org/papers/EvolutionOfSmalltalk....

>A.5 Squeak

>Although Squeak is still available for most computers, SqueakJS has become the easiest way to run Squeak for most users. It runs in just about any web browser, which helps in schools that do not allow the installation of non-standard software.

>The germ of the SqueakJS project began not long after I was hired at Sun Microsystems. I felt I should learn Java; casting about for a suitable project, I naturally chose to implement a Squeak VM. This I did; the result still appears to run at http://weather-dimensions.com/Dan/SqueakOnJava.jar .

>This VM is known in the Squeak community as "Potato" because of some difficulty clearing names with the trademark people at Sun. Much later, when I got the Smalltalk-72 interpreter running in JavaScript, Bert and I were both surprised at how fast it ran. Bert said, "Hmm, I wonder if it’s time to consider trying to run Squeak in JavaScript." I responded with "Hey, JavaScript is pretty similar to Java; you could just start with my Potato code and have something running in no time."

>"No time" turned into a bit more than a week, but the result was enough to get Bert excited. The main weakness in Potato had been the memory model, and Bert came up with a beautiful scheme to leverage the native JavaScript storage management while providing the kind of control that was needed in the Squeak VM. Anyone interested in hosting a managed-memory language system in JavaScript should read his paper on SqueakJS, presented at the Dynamic Languages Symposium [Freudenberg et al. 2014].

>From there on Bert has continued to put more attention on performance and reliability, and SqueakJS now boasts the ability to run every Squeak image since the first release in 1996. To run the system live, visit this url: https://smalltalkzoo.thechm.org/HOPL-Squeak.html?launch

codefrau on Nov 5, 2021 | root | parent | next [–]

Dan published an updated version of that paper here:

https://smalltalkzoo.thechm.org/papers/EvolutionOfSmalltalk....

Would be great if you could cite that one next time. The main improvement for me is not being deadnamed. There are other corrections as well.

"In JavaScript, neither weak references... is generally available". I think that was true with the old weak collation classes, but doesn't the newer JS WeakRef provide proper weak references?
WeakRefs landed around 2020 (in Chrome 84), but SqueakJS was started in 2013 -- "November 2013 Project started (after seeing Dan's Smalltalk-72 emulator at Hackers)" -- and that paper was written in 2014.

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Refe...

https://freudenbergs.de/vanessa/publications/Freudenberg-201...

https://squeak.js.org/
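
For reference, the now-standard API looks like this (a minimal sketch, nothing SqueakJS-specific):

    // WeakRef holds a reference that doesn't keep its target alive;
    // FinalizationRegistry reports (eventually) when the target was collected.
    let obj: { payload: string } | null = { payload: "squeak" };

    const weak = new WeakRef(obj);
    const registry = new FinalizationRegistry((label: string) => {
      console.log(`${label} was garbage collected`);
    });
    registry.register(obj, "obj");

    obj = null;                  // drop the only strong reference
    // Some time later, after a GC cycle (timing is unspecified):
    console.log(weak.deref());   // the object, or undefined once collected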

Here's some stuff Vanessa and I discussed about Self and her SqueakJS paper:

DonHopkins 6 months ago | parent | on: Croquet: Live, network-transparent 3D gaming

Excellent article -- Liam Proven does it again! Speaking of a big Plate of Shrimp -- https://www.youtube.com/watch?v=rJE2gPQ_Yp8 ...

The incredible Smalltalk developer Vanessa Freudenberg -- who, besides being Croquet's devops person, also developed Squeak Smalltalk, EToys, Croquet, and the SqueakJS VM written in JavaScript, and worked extensively with Alan Kay -- was just tweeting (yeah, it's ok to deadname Twitter!) about reviving Croquet from 20 years ago:

https://twitter.com/codefrau/status/1738778761104068754

Vanessa Freudenberg @codefrau

I've been having fun reviving the Croquet from 20 years ago using @SqueakJS . It's not perfect yet, but a lot of the old demos work (sans collaboration, so far). This is pretty close to the version Alan Kay used to give his Turing Award lecture in 2004:

https://github.com/codefrau/jasmine

Live version: https://codefrau.github.io/jasmine

This is a version of Croquet Jasmine running on the SqueakJS virtual machine. Here is an early demo of the system from 2003. Alan Kay used it for his Turing Award lecture in 2004. While working on that demo, David Smith posted some blog entries (1, 2, 3, 4, 5), with screenshots uploaded to his Flickr album.

This is work-in-progress. Contributions are very welcome.

— Vanessa Freudenberg, December 2023

Dan Ingalls @daningalls

Yay Vanessa! This is awesome. These are mileposts in our history that now live again!

https://twitter.com/codefrau/status/1526618670134308864

Vanessa Freudenberg @codefrau 7:40 PM · May 17, 2022

My company @CroquetIO announced #MicroverseBuilder today.

Each microverse is "just" a static web page that you can deploy anywhere, but it is fully 3D multiplayer, and can be live-coded. Portals show and link to other developer's worlds.

This is our vision of the #DemocratizedMetaverse as opposed to the "Megaverses" owned by Big Tech.

It runs on #CroquetOS inside your browser, which provides the client-side real-time synchronized JS VMs that you already know from my other posts.

#MicroverseBuilder is in closed alpha right now because we don't have enough #devrel people yet (we're hiring!) but you can join our Discord in the mean time and the open beta is not far away.

We are also looking for summer interns! #internships

https://www.youtube.com/watch?v=CvvuAbjh11U

And of course #CroquetOS itself is already available for you to build multiplayer apps, as is our #WorldcoreEngine, the game engine underlying #MicroverseBuilder.

Learn more at https://croquet.io/docs/ and let's get hacking :)

And as of today, #MicroverseBuilder is Open Source!

lproven 6 months ago | next [–]

Thanks Don! This is my original submission from back at the time:

https://news.ycombinator.com/item?id=35302162

HN really needs a better automatic-deduplication engine, e.g. if the same link is posted again months later, mark the original post as new again with an upvote, and the caption (if changed) as a comment...

codefrau 6 months ago | prev [–]

Haha, thanks for the plug, Don!

I just fleshed out the README for my Croquet resurrection yesterday so others may have an easier time trying it, or maybe even contributing :)

https://github.com/codefrau/jasmine

DonHopkins 6 months ago | parent [–]

Vanessa, it has always amazed me how you managed to square the circle and pull a rabbit out of a hat by the way you got garbage collection to work efficiently in SqueakJS, making Smalltalk and JavaScript cooperate without ending up with two competing garbage collectors battling it out. (Since you can't enumerate "pointers" with JavaScript references by just incrementing them.)

https://freudenbergs.de/vanessa/publications/Freudenberg-201...

>The fact that SqueakJS represents Squeak objects as plain JavaScript objects and integrates with the JavaScript garbage collection (GC) allows existing JavaScript code to interact with Squeak objects. [...]

>• a hybrid garbage collection scheme to allow Squeak object enumeration without a dedicated object table, while delegating as much work as possible to the JavaScript GC,

Have you ever thought about implementing a Smalltalk VM in WebAssembly, and how you could use the new reference types for that?

https://bytecodealliance.org/articles/reference-types-in-was...

codefrau 6 months ago | root | parent [–]

I would like to speed up some parts of SqueakJS using web assembly. For example BitBlt would be a prime target. For the overall VM, however, I’ll leave that to others (I know Craig Latta has been making progress).

I just love coding and debugging in a dynamic high-level language. The only thing we could potentially gain from WASM is speed, but we would lose a lot in readability, flexibility, and to be honest, fun.

I’d much rather make the SqueakJS JIT produce code that the JavaScript JIT can optimize well. That would potentially give us more speed than even WASM.

Peep my brain dumps and experiments at https://squeak.js.org/docs/jit.md.html
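
Purely as an illustration of that idea (this is not SqueakJS's actual generated code, and vm.sendMessage is a made-up stand-in for the interpreter's full send): the generated function keeps a monomorphic fast path the JS JIT can inline, and bails out to a real Smalltalk send otherwise.

    // Hypothetical shape of JIT output for a method doing SmallInteger #+ :
    function compiled_plus(vm: any, rcvr: unknown, arg: unknown) {
      // Fast path: both operands are immediate SmallIntegers represented as JS numbers.
      if (typeof rcvr === "number" && typeof arg === "number") {
        const result = rcvr + arg;
        if ((result | 0) === result) return result;  // still a small integer (simplified)
      }
      // Slow path: fall back to a full Smalltalk message send (hypothetical API).
      return vm.sendMessage(rcvr, "+", [arg]);
    }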

DonHopkins 6 months ago | root | parent | next [–]

>Where this scheme gets interesting is when the execution progressed somewhat deep into a nested call chain and we then need to deal with contexts. It could be that execution is interrupted by a process switch, or that the code reads some fields of thisContext, or worse, writes into a field of thisContext. Other “interesting” occasions are garbage collections, or when we want to snapshot the image. Let's look at these in turn.

This sounds similar to Self's "dynamic deoptimization", which it uses to forge virtual stack frames representing calls into inlined code, so that the debugger can show you the return stack you would have had were the functions not inlined.

I always thought that should be called "dynamic pessimization".

Debugging Optimized Code with Dynamic Deoptimization. Urs Hölzle, Craig Chambers, and David Ungar, SIGPLAN Notices 27(7), July, 1992.

https://bibliography.selflanguage.org/dynamic-deoptimization...

That paper really blew my mind and cemented my respect for Self, in how they were able to deliver on such idealistic promises of simplicity and performance, and then oh by the way, you can also debug it too.

codefrau 6 months ago | root | parent | next [–]

Absolutely. And you know Lars Bak went from Self to Strongtalk to Sun’s Java Hotspot VM to Google’s V8 JavaScript engine. My plan is to do as little as necessary to leverage the enormous engineering achievements in modern JS runtimes.

DonHopkins 6 months ago | root | parent | prev [–]

Glad I asked! Fun holiday reading to curl up with a cat and read. Thanks!

I love Caffeine, and I use Craig's table every day! Not a look-up table, more like a big desk, which I bought from him when he left Amsterdam. ;)

---

Vanessa> Our guiding principle will be to keep our own optimizations to a minimum in order to have quick compiles, but structure the generated code in a way so that the host JIT can perform its own optimizations well.

Don> That's the beautiful thing about layering the SqueakJS VM on top of the JS VM: you've already paid for it, it works really well, so you might as well use it to its full extent!

Very different set of trade-offs than implementing Self in C++.

Vanessa> Precisely. My plan is to do as little as necessary to leverage the enormous engineering achievements in modern JS runtimes.

I always appreciated Dan Ingalls' Smalltalk Zoo:

https://smalltalkzoo.thechm.org