






A Tour of the Squeak Object Engine, by Tim Rowledge
Current draft: rowledge.pdf
Promised reviewers:
Ed Luwish (eluwish@uswest.com or ed@luwish.com)
Dwight Hughes (dwighth@ipa.net)
Helge Horch (heho@gmx.de)
Comments:
I just signed on as a reviewer, and will have lengthy comments to add very soon. It would be easier to mark it up, but PDF doesn't allow that. A few general comments first, after a first read -
1. Of course, it needs to be completed. The diagrams will be very important, the futures need to be fleshed out.
2. Whether the formatting will be retained, or changed by the publisher, there should be better indication of the section hierarchy of the article.
3. I'm not sure I agree fully with Tim's definition of a Virtual Machine, even though I know he is a leader in the field - it's a bit too broad in my view (we'll fight it out in a friendly manner) and I am not sure which of VM or Object Engine is the more inclusive term, or even if there might be some non-overlap between the two.
4. Bytecodes are not considered. If the VM were simply a matter of executing primitives it would be a threaded interpretive engine like Forth, and it's more than that. The distinction I see is that primitives are for platform-specific stuff or acceleration, while the bytecodes properly are for the control of the VM itself. I know that this distinction has been breached numerous times since the Blue Book.
5. The absence of an OOP-to-pointer table is important enough to merit an explanation of why it was considered more efficient, and certainly a description of how all those object references are manipulated by the GC when an object is moved.
6. I will be reviewing the grammar and sentence structure as well, and would like to preserve Tim's wonderful English (as opposed to American) voice, but some of the peculiarities might actually be mistakes (heaven forbid!). I call for a Brit to volunteer to look at this, since my knowledge of English usage, while excellent (haha) is strictly American.
Overall, I love the article, which is why I chose to review it. Anyone who knows me is aware that the only use I have for complex systems is to figure out how they work. Practically the only Smalltalk I've ever written has been inspector refinements to better examine Smalltalk itself. [Although I may branch out a bit with Squeak and actually use it to make something useful]
Ed Luwish
Location and reason | Copied text | Suggested correction
Page 1 para 2 line 4, spelling | the langauge environment | language
Page 1 para 5 line 1, grammar | Many languages and some even applications | even some applications
Page 1 para 5 line 2, word choice | the most numerous VM. | most prevalent (?)
Page 1 para 8 line 1, grammar | All the rest is simply implementation details; however, those details are | Decide whether you want to use a singular or a plural and stick with it.
T.M.N.Irish@herts.ac.uk
2000 Mr 11 Sa
Helge Horch is fascinated by this "Inside Squeak" chapter. Its outline and focus are just right, and it's a good opportunity to set the perspective and to provide pointers for getting to know the object engine (VM, whatever).
He realizes that it's not easy to hit a moving target. We could provide more code references, snippets even, but risk being out of date when published.
He will be posting his - ever so slowly - accumulating comments and detailed remarks later. (As well as forward them to Tim, of course.)
Intriguing work, love to see more of it! Go, Tim, go!
Thanks for all the helpful comments, folks. I have been working at incorporating Helge's suggestions over the last week or so and will get to the others asap. I particularly apologise for overlooking bytecodes. How embarrassing.
Nice chapter with lots of useful information.
The only additional information I would like to see is an explanation of why some primitives are numbered instead of named, and some hints on the creation of primitives. Also, is there a way to control garbage collection so that it does not run during the execution of time-critical code?
Jerome E. Garcia
Mark's Review of "A Tour of the Squeak Object Engine"
Tim, I found your chapter really useful personally when I was preparing lectures on the Squeak VM. I particularly liked your descriptions at the beginning about what all the pieces were.
The main weaknesses I see of the chapter:
- Missing a few key ideas, like bytecodes, and perhaps a bit more analysis (e.g., VM vs. image level stuff, Java vs. Squeak VM design considerations)
- A few figures could go a long way in explaining this stuff. For example, I find that drawing memory "pictures" helps in explaining the GC algorithms.
Issues
- I loved the start and description of what a Virtual Machine is.
- What do you think about adding somewhere the contrast of what goes into the image (easily changeable but slow) and what goes into the VM (additional complexity of porting, fast)?
- Would some timings make sense to add somewhere? Just how fast/slow is the Squeak Object Engine with the current design? I know my students' jaws drop when I show them the bytecodes/sec and messages/sec figures for the Squeak VM. They tend to get hung up on "That's awfully inefficient – you've got two extra message sends there!" Then I show them the number, in hundreds-of-thousands, of messages/second that the VM handles, and they're willing to get back into talking about DESIGN before efficiency...
- "Message Sending" - I was expecting to see something about Bytecodes in here, to contrast with messages as primitives.
- "How the Squeak VM differs from the Blue Book Design" But our readers won't have read the Blue Book. You don't have to repeat the whole Blue Book design. Rather, I would describe the Blue Book design for the elements that are clearly DIFFERENT/CONTRASTING.
- CompiledMethod Format - "NCM, intended for Image 3" means what?
- I'm not sure how much you need to get into the generated/non-generated parts of the VM. Take a look at Ian's porting chapter – I think he deals with some of this. With Andy's chapter on Slang, I think the opening is there for you to focus on design level stuff – why was it done in this way?
- I do hope that you can get into the new pluginized stuff and how it differs
- What do you think about saying a word or two about the standard Java VMs and how they differ from Squeak's VM? When I was digging into it, I found some of the tradeoffs quite interesting. For example, exceptions are handled in the Java VM, which means that it's fast but continuing after an exception is impossible – all the contexts have been thrown away in the search for a handler. Also, the issue of placing contexts on the stack vs. inside the context is one that the Dolphin people found was really important for speed improvements. Would it be worthwhile to go into?
From Tim
Newer version in html is at http://sumeru.stanford.edu/tim/pooters/OE-Tour.html
Still needs diagrams and a distressing amount of other stuff, but it should answer some of the points offered by reviewers
tim
From Jörn Eyrich:
Note: I had to use http://sumeru.stanford.edu/tim/pooters/OE-Tour.HTML to access it
A very engaging read. I'm looking forward to the addition of the first figures.
Details
- 3. "all we do is...": don't we also assign variables, for example?
- 3. "calling primitives ro perform": _t_o perform
- 3.1. "three kinds of object": object_s_, but aren't there more? indexable vs. fixed pointers, combined, etc. (not to forget there are still those special compiled methods...)
- 3.2. (6.) "stack pointer (sp)": was it mentioned before that the VM uses a stack?
- 3.2. (6.) "sender and home": please explain these
- 3.2. (6.) " ... the new stack..": one period too much
- 3.2. reprobing in the method cache: I don't understand this; maybe you could elaborate on the example with the alternating Small- & LargeInteger collection
- 3.2. "dynamically translated methods": has this been explained before? in 3.3 you mention it; I would generally prefer to have the Bytecode part before the Messagesend part because I'm very much a Bottom-Up person; most people seem to prefer learning top-down though, so maybe you should keep it that way
- 3.4.1. "sq{platform}Windows.c": I would leave that out in this place, as you explain it in context later
- 3.4.3. "named instance ... and ... indexed variables": you should introduce these terms in 3.1 where you list all possible object types
- 3.4.3. "at forst glance": first
- 3.4.3. "you might expcet": expect
- 3.4.4. "...protocol for examples": for example.
- 4.1. "Java": hm, my impression was that Java does not have the problems you describe above. it uses runtime type information to check if a cast is valid and throws an exception if not
- 4.3. headline: please use "and" instead of "&"; I would prefer "Threads and control structures _are_ programmer-accessible" or "programmer-accessible threads and control structures", or even "access to t & cs"
- 5. maybe you should mention what the Blue Book is (maybe in 3.5, where you reference it)
- 5.1. "Three word": please also state the format of the third word (30b size, 2b header format according to "Back to the future")
- 5.5. "BLT": I think Blt looks nicer
- 5.5. can also handle alpha blending in suitable depth bitmaps_, for example_ to give anti-aliasing: I think you can use it for other stuff, too
- 5.5 "Alice": add a reference to the appropriate chapter
- 6.1. "the Ballon 2 and 3D graphics primitives are generated from avariety of classes in the Balloon3D hierarchy": the Balloon 2D stuff is generated from there, too? hm, maybe! just wondering...
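The reprobing question raised under 3.2 can be pictured with a small sketch. It is only an illustration under stated assumptions: the table size, the hash (selector XOR class), the probe count, and every name in it are hypothetical and are not taken from the Squeak sources. It shows why a second probe lets two receivers of the same selector, such as alternating small and large integers, both stay cached instead of evicting each other on every send.

```python
# Hypothetical sketch of a fixed-size method lookup cache with reprobing.
# CACHE_SIZE, MAX_PROBES and the hash are illustrative assumptions,
# not the values used by the Squeak VM.

CACHE_SIZE = 8   # a power of two, so that (CACHE_SIZE - 1) works as a mask
MAX_PROBES = 3   # how many slots to try before giving up


class MethodCache:
    def __init__(self):
        # Each slot holds (selector_id, class_id, method) or None.
        self.slots = [None] * CACHE_SIZE

    def _probe_indices(self, selector_id, class_id):
        # Hash both keys together, then try progressively shifted variants.
        h = selector_id ^ class_id
        return [(h >> probe) & (CACHE_SIZE - 1) for probe in range(MAX_PROBES)]

    def lookup(self, selector_id, class_id):
        for i in self._probe_indices(selector_id, class_id):
            entry = self.slots[i]
            if entry is not None and entry[:2] == (selector_id, class_id):
                return entry[2]
        return None  # miss: the caller would do the full class-hierarchy lookup

    def insert(self, selector_id, class_id, method):
        indices = self._probe_indices(selector_id, class_id)
        for i in indices:
            entry = self.slots[i]
            if entry is None or entry[:2] == (selector_id, class_id):
                self.slots[i] = (selector_id, class_id, method)
                return
        # Every probed slot is occupied by another entry: evict the first.
        self.slots[indices[0]] = (selector_id, class_id, method)


# Two classes whose first probe lands in the same slot for selector 16.
cache = MethodCache()
cache.insert(16, 3, "SmallInteger>>+")
cache.insert(16, 11, "LargePositiveInteger>>+")
```

With a single probe, the second insert would overwrite the first, and a collection alternating between the two classes would miss on every send. With reprobing, the second entry moves to its next candidate slot and both lookups hit.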
Tuesday July 25 - newer version of OE-Tour.html is online. No pictures yet.
From Ed Luwish:
It is the last minute, I know.
Fortunately, I found the document excellent - any suggestions I make can be safely ignored for the first edition of the book if it is too late to incorporate them.
The chapter is completely understandable. I am not knowledgeable enough to determine whether it is accurate or complete. Its completeness is dependent on the reader's reason for reading it, I suppose. It is more than adequate for someone who wants to understand the VM and basic object creation/destruction and message passing mechanisms. It seems to give enough pointers to Squeak sources for those who want to actually modify, debug or extend the VM.
Specific comments, musings and suggestions:
1) I confess that I do not understand the word "reify", which may or may not be a commonly understood term in OO circles. In context, I think it refers to the exact bidirectional mapping of internal structures to their humanly understandable models in the source. This enables the reconstruction of source from internal reality without going outside the system (e.g., without resorting to reverse engineering tools). Thus debuggers written in Squeak have access to the atomic structure of the system. If this is a complete misinterpretation of "reify", and if it is not commonly known to the likely reader of this book, you may wish to expand it a bit or use different wording.
2) The diagrams are great. Some of them seem to be taken directly from the Blue Book or the Squeak version of same, and contain more detail than is explained in the text. It is a bit annoying that the Large Context flag is never explained in the text, but I would neither add an explanation nor edit the diagram - maybe a footnote that some details are irrelevant to the chapter but are described in some cited source.
3) Musing - could the Smalltalk debugger someday be able to debug the VM?
4) I would like to know where the chain of MethodContexts begins - I have always been fascinated by the bootstrap process. I know that Smalltalk is started by freezing its contexts prior to shutdown, with a fixed object pointing to the currently executing MethodContext - but I would still like to know where it all begins. I loved Dan's (I think) section in the Green Book that describes the SystemTracer and how generations of Smalltalks bootstrapped themselves. Is there a chapter in the Squeak book that covers the startup/shutdown/snapshot process, complete with the known objects in "low core" and the ProcessScheduler, and how it all brings things up? If not I would like to know enough to start my own investigation - your chapter hints at it - every MethodContext has a pointer back to its sender's MethodContext, but what's at the end of the rainbow?
5) In 3.3.1 you refer to Interpreter class > initializeBytecodeTable. Is there a corresponding method that dumps the primitive codes?
6) Musing - has anyone determined the minimum set of primitives necessary to implement all the rest of Smalltalk? Many are simply there to accelerate frequently executed methods.
7) In 3.4.5 the sentence beginning with "Since changing structures such as..." does not seem to be grammatical, although I know exactly what you were trying to say.
8) Is there a chapter on the Foreign Function Interface? Or ImageSegments and loadable plugins for that matter?
9) General observation: I love all your references to methods in the source code - it saves a lot of writing on your part, is more accurate in this changing Squeak universe, and gets the reader in the habit of reading Smalltalk source. It gives the reader starting points for further study - the body of code is overwhelming without these.
10) In 3.5.1 it took me a while to figure out that "over-written objects" really means storing a different oop in an object's instance variable slot. Obviously you can't decrement the count of an object that doesn't exist anymore. The rest of the section, including the diagrams, makes everything clear.
11) Your section on Garbage Collection, in general, is the best I've ever read.
12) In Generation Scavenging, is an "old" object (i.e., a candidate for the Remembered Table) one that is resident in the Oldspace? Or can it be an object in a Survivor space? Does the Remembered Table simply document all objects that are older than the ones they reference? It seems that (unless combined with marking or reference counting) objects in Eden space can be scavenged while still in use if the criteria for membership in the Remembered Table are too restrictive. I wouldn't have asked this question if you had not written the section as well as you did, so take this as a compliment rather than a complaint.
13) Musing - is there any good reason why the header type code is duplicated in each word of the header? It seems that it only needs to be contained in the first word (the one that is -1 word offset from the oop). I doubt that there will ever be an object big enough or an image with so many classes that the additional two bits will ever be needed, but the redundancy bothers me.
14) After the introductory section on Garbage Collection, the discussion of Squeak's implementation was absolutely incredible. All I could come up with was praise for you and praise for the developers of this elegant system. It never occurred to me that interactivity generates more garbage, hence is a good time to run the GC more frequently - I only thought it was opportune because the pauses will not be as noticeable, as, say, when playing a movie or soundbite. The only question I ask (I think I know the answer) is whether the old/new boundary is reset when an image is saved/restarted, or is it fixed in some way.
15) The "Things for the future" section was very thought provoking. Naturally I zeroed in on the Smaller/Embedded systems subsection [I am still attempting an EPOC port, which is requiring me to learn more C++ than I ever wanted to, hence the long delay]. You may want to cite the chapter in the Green Book on the SystemTracer, which is not documented anywhere else except the code, to the best of my knowledge. Also, where does InterpreterProxy (or is it InterpreterSimulator?) come in - isn't it capable of manipulating an image other than the running image? Not that I am averse to "more sophisticated tools" - fun stuff to develop (at least to me).
a) For smaller systems, I would also want to remove unused primitives! It not only makes the VM smaller, but makes it easier to port and debug in an embedded environment.
b) The renaming of message selectors as integers (classes and named variables as well) has an additional benefit to the application developer - it provides a small but meaningful level of encryption to the code, making it less likely to be reverse-engineered.
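The reading of "over-written objects" in item 10 can be made concrete with a minimal reference-counting sketch. It is only an illustration of the general technique: the class and function names are hypothetical and nothing here is drawn from the chapter or the Squeak sources. The point is that the count being decremented belongs to the object whose oop used to sit in the slot, not to the holder.

```python
# Minimal reference-counting sketch. Obj, store and release are
# hypothetical names used for illustration only.

class Obj:
    def __init__(self, name, slot_count=0):
        self.name = name
        self.ref_count = 0
        self.slots = [None] * slot_count  # instance variable slots
        self.freed = False


def release(obj):
    """Drop one reference; reclaim the object when none remain."""
    obj.ref_count -= 1
    if obj.ref_count == 0:
        obj.freed = True
        # A reclaimed object no longer refers to anything, so each
        # object it pointed to loses one reference as well.
        for i, child in enumerate(obj.slots):
            obj.slots[i] = None
            if child is not None:
                release(child)


def store(holder, index, new_value):
    """Overwrite holder.slots[index], keeping reference counts in step."""
    old_value = holder.slots[index]
    # Count the new referent first, so that storing the same object back
    # into the slot cannot drop its count to zero part-way through.
    if new_value is not None:
        new_value.ref_count += 1
    holder.slots[index] = new_value
    if old_value is not None:
        release(old_value)


holder = Obj("holder", slot_count=1)
a = Obj("a")
b = Obj("b")
store(holder, 0, a)   # a is now referenced once
store(holder, 0, b)   # a's oop is over-written: a loses its only reference
```

After the second store, `a` has no remaining references and is reclaimed, while `b` carries the single reference from the slot.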
At this point, all I can say is "Bravo!" (I attended an Opera performance yesterday). You wrote an outstanding contribution to VM literature, and I hope to see more of the same in the future. Furthermore, it contributed to my understanding of a confusing but fascinating part of the Smalltalk universe. Since I am an inside-out type, it also increased my grasp of messaging and therefore OO in general. Your next mission, should you accept it, is to explain the Class/Metaclass conundrum - I think I understand it, but I couldn't communicate it as clearly as you.
Thank you for your contribution to the literature of Smalltalk.
Ed