Sunday, December 27, 2020

How to build a laser printer from nothing at all

Foreword

This is completely from my perspective. Of course there was a lot more going on than just what I did and I don't want to diminish anybody else's experience. This was an accomplishment of a lifetime for a whole lot of people I suspect, and it's not my intention to take anything from that. Also: my timeline memory is horrible. There are bound to be mistakes.

The 1794 Laser Printer Controller

Gordian is born

In 1985 Gregg got a contract with a company named Talaris down in San Diego to build a laser printer controller. Gregg, Andy, another named Paul and I then formed a new company called Gordian. We had a Microvax-1 and that was about it. Gregg and Andy were hardware engineers, and Paul was mainly absent so it was just me holding down the software front. The design was pretty ambitious: a dual processor design, with a main processor to interpret printer languages (HP PCL, PostScript...) into graphics primitives, a graphics processor to render those primitives into the bitmap, and enough video ram to feed it to the printer. It was full of IO devices including a parallel port, a serial port and most importantly a SCSI bus which allowed us to attach things like disks and at first a SCSI ethernet port. That misadventure eventually led to adding a LANCE chip for native ethernet. There were also two ports on the printer to attach storage in the form of a card full of eproms which Talaris used to ship packs of fonts.

So there was a lot going on here and it's just me at first. Well, not exactly just me because the agreement we had is that we'd provide the infrastructure and API for the hardware and graphics, and they'd provide the printer language interpreters. There were no deep discussions between us hammering out APIs or anything like that. I mostly decided what I was going to do, and they wrote to that. I'd end up in San Diego to talk with some of their engineers about what was going on -- usually Rick and Barry, but I knew quite a lot of them, and of course Cal the CEO was the tester and torturer in chief. It was fairly hard for them to argue on the main processor side though because I decided quite organically that the easiest thing to do was to just emulate libc and Unix IO, so that was something of a no-brainer. The graphics kernel was a different story: there was no standard graphics API at the time, though X Windows was starting to come into the picture. I didn't know enough about it though, and X11 was a couple of years away. Looking back, it seems hard to imagine a medium sized company would put so much trust into a barely formed startup and its one software guy to invent an API that they could write to with not much back and forth, but that's I guess just how it was back then: we were all just making things up as we went along, none of us quite grasping that what we were doing was actually hard and would cause bigger companies to go into insane Mythical Man Month mode.

So Gregg and Andy went about designing the hardware while I, as the title states, started from nothing at all. We had compilers for the two processors (but just barely for the GSP since TI was still working on it), but other than that not a line of code that could be imported, and no internet to beg, borrow, or steal from. Another thing to realize is that the debugger, Punix, graphics kernel, and storage were the original deliverables and they were -- get this because it never happens -- on a tight deadline. I forget how long we were given, but a year sounds about right including designing and debugging the hardware as well as Andy designing a gate array. Gate arrays weren't what they are these days where you can rent one out by the hour on AWS. It kept Andy up many many nights. Both Andy and I were working 100 hour weeks. I don't know how we managed any amount of life at all.

The Debugger, monXX

While the hardware was being designed, I decided that I had a lot of work ahead of me and that I could use some of the lag time before board bring up to build a debugger since the tool chain obviously didn't have one. The obvious choice was to make use of the serial line. I went on to invent a protocol for debugging remotely for things like downloading code, examining memory, setting breakpoints, calling functions, performance analysis and the like. Most importantly I figured out how to digest the symbol table so that you could print not only variables, but entire structures using normal C syntax, like foo->bar.baz.
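To give a feel for what printing foo->bar.baz involves, here is a minimal sketch in Python. It is not the monXX code: the TYPES and SYMBOLS tables, the offsets, the 32-bit little-endian pointers and the local bytearray standing in for target memory are all assumptions made for illustration. In the real debugger each memory read would have gone over the serial line.

```python
import re
import struct

# Hypothetical type table digested from a symbol table:
# struct name -> {member name: (byte offset, member type)}
TYPES = {
    "struct foo": {"bar": (4, "struct bar")},
    "struct bar": {"count": (0, "int"), "baz": (4, "int")},
}

# Hypothetical symbol table: variable name -> (address, type).
# Here "foo" is a pointer variable stored at address 0x10.
SYMBOLS = {"foo": (0x10, "struct foo *")}


def read_u32(memory, addr):
    """Read a 32-bit little-endian word from the (simulated) target memory."""
    return struct.unpack_from("<I", memory, addr)[0]


def resolve(expr, memory):
    """Return (address, type) for an expression such as 'foo->bar.baz'."""
    tokens = re.findall(r"->|\.|\w+", expr)
    addr, ctype = SYMBOLS[tokens[0]]
    for op, member in zip(tokens[1::2], tokens[2::2]):
        if op == "->":
            addr = read_u32(memory, addr)   # follow the pointer stored at addr
            ctype = ctype.rstrip(" *")      # "struct foo *" -> "struct foo"
        offset, member_type = TYPES[ctype][member]
        addr += offset
        ctype = member_type
    return addr, ctype
```

Once the address and type are known, printing the value is just one more memory read of the right size.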

It turned out to be immensely helpful not just for me, but for Talaris too since they were in the same boat as me trying to figure out how to bootstrap themselves to write embedded code, which to my knowledge their engineers had not done before. There were actually two distinct debuggers, one for the 32000 (mon32) and one for the GSP (monGSP). Traffic for the GSP had to be gatewayed through the 32000. This all happened behind the scenes on the serial line, as it was still an operational serial port used to send the printer pages to be printed.

The debugger was ported to many other processor architectures over time including the Mot 68k and the MIPS architecture. The most significant improvement, however, was using the network. At first I just used my own raw ethernet frames because that was the quickest and easiest way to do something on Vaxes running VMS. After a while, however, we got MIPS workstations which ran Unix and inexplicably they didn't support raw ethernet access which pissed me off. So I was forced to port it all over to IP as well. The first time I remotely debugged something over the internet -- I think it was in New Zealand -- was nothing short of amazing. Today you'd call that a backdoor, but there was no such thing as security back then.

Punix, the little OS that could

The National 32000

Every programmer wants to write their own multitasking operating system, right? Well at least back then it was on the todo list of many a programmer looking for a feather in their cap. What does Punix mean? It stands for Puny-Unix: everything a normal Unix has that doesn't require an MMU (memory management unit). MMUs were rare in embedded software generally, and not terribly helpful when there is no guarantee of a disk. It would have been handy for the debugger for watchpoints, but we managed.

As I said above, I chose a Unix environment because it already had documentation, and even better allowed Talaris to start writing their code: if it was in libc, it was in Punix. That meant that I had to write libc from scratch. All of the string functions, malloc/free/alloca, stdio, ctype and the like. It wasn't a full implementation of libc since for one we didn't have an FPU (floating point unit), but it had quite a bit of it. If I or Talaris needed something, I'd divert whatever I was doing and write it. Suffice it to say that it was enough for our purposes, though I suspect that if Talaris found something they needed they'd often just roll their own instead of bugging me. Remember: no internet, no git pull request.

At the time I started writing Punix, I was actually not familiar with the actual Unix operating system. All of my experience was on VAX/VMS. So in the kernel itself, it looked a lot more like VMS than Unix but for the most part it looked like something entirely different and entirely me. The main thing I took from VMS was their class driver/port driver architecture. Port drivers drove the physical hardware for DMA, interrupts, etc and provided a standardized interface to the class drivers. A class driver on the other hand used port drivers to do the low level work, and they did the hardware independent work. Think file systems as an example. The little font cartridges used the same file system that I laid down on the SCSI disks, for example. The main class drivers are called out later.
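The split can be sketched in a few lines of Python. This is only an illustration of the class driver/port driver idea, not Punix code: RamPort, FlatFileSystem, the one-block-per-file layout and the 512 byte block size are all hypothetical.

```python
class RamPort:
    """Port driver: only knows how to move raw blocks on one device."""

    def __init__(self, num_blocks, block_size=512):
        self.block_size = block_size
        self.blocks = [bytes(block_size) for _ in range(num_blocks)]

    def read_block(self, n):
        return self.blocks[n]

    def write_block(self, n, data):
        if len(data) > self.block_size:
            raise ValueError("data larger than one block")
        self.blocks[n] = data.ljust(self.block_size, b"\0")


class FlatFileSystem:
    """Class driver: hardware independent, talks only to the port interface.

    Simplifying assumption: each file occupies exactly one block.
    """

    def __init__(self, port):
        self.port = port
        self.directory = {}   # file name -> (block number, length)

    def write_file(self, name, data):
        # Reuse the file's block if it exists, otherwise take the next one.
        block = self.directory.get(name, (len(self.directory), 0))[0]
        self.port.write_block(block, data)
        self.directory[name] = (block, len(data))

    def read_file(self, name):
        block, length = self.directory[name]
        return self.port.read_block(block)[:length]
```

The same FlatFileSystem class can sit on top of any object that offers read_block and write_block, which is the point: one file system, several kinds of storage underneath.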

VMS had another thing that I latched on to which was that they used process based helpers for networking. It was the perfect excuse to make the OS multi-tasking, or what today we would call multi-threaded since it was all running in the same address space. Threads, the term, had not been invented yet. It was a fully preemptive OS in that there was a current process, a run queue, and a wait queue. Since there was no thread interface to emulate, I just emulated fork(2). This may sound sensible, but remember there was no MMU and thus the need to clone enough of the stack in the child process to allow the child process to return. The first time implementing this definitely caused many an explosion until I got it right. The remaining difficulty was context switching. To get into the wait queue, i.e. to block, there was a function called sched() that backed up the registers into the process header and selected the appropriate process from the run queue to make it the current process. When an interrupt came in from a device the port driver would know who was waiting for that device and move it over to the run queue to possibly be context switched. The last thing was the timer interrupts to switch out processes at equal priority so they could get cpu time too.
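The bookkeeping described above can be modelled roughly as follows. This Python sketch is an assumption-laden toy, not Punix: it tracks only process names, ignores priorities, and does not model saving registers or cloning stacks. The Scheduler class and its method names other than sched are invented for the example.

```python
from collections import deque


class Scheduler:
    def __init__(self):
        self.run_queue = deque()   # runnable processes, oldest first
        self.wait_queue = {}       # device name -> process blocked on it
        self.current = None

    def spawn(self, name):
        self.run_queue.append(name)

    def sched(self, block_on=None):
        """Give up the CPU; optionally block the current process on a device."""
        if self.current is not None:
            if block_on is None:
                self.run_queue.append(self.current)       # still runnable
            else:
                self.wait_queue[block_on] = self.current  # blocked
        self.current = self.run_queue.popleft() if self.run_queue else None
        return self.current

    def interrupt(self, device):
        """Device interrupt: whoever waited on the device becomes runnable."""
        proc = self.wait_queue.pop(device, None)
        if proc is not None:
            self.run_queue.append(proc)

    def timer_tick(self):
        """Time slice expired: rotate to the next runnable process."""
        return self.sched()
```

A process that blocks stays off the run queue until its device interrupts, while the timer tick simply rotates whoever is runnable.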

The last notable part is that it had resource locking (ResLock) which implemented a mutex around a critical section of code. I get the impression that a lot of kernels use very coarse mutexes while in kernel mode to lock out other processes, but Punix was very fine grained in that each driver would have its own read and write mutex. That and we didn't have an MMU, so there was no kernel mode in the first place. This would go on to cause quite a bit of hilarity with race conditions but fortunately I have a knack for visualizing race conditions so they were usually dispatched relatively easily. Would that that were the case for memory corruption and the Heisenbugs it caused. Those were the source of much, much anguish and are a reason I would rather deal with garbage collected systems, and strings that don't overflow these days.

The Graphics Kernel

The TI GSP

The GSP (Texas Instruments Graphic System Processor) was an odd beast. First off it was bit addressable rather than the more customary byte addressable. It also had a bunch of graphic primitives in its microcode which were quite convenient like being able to move a character in a font onto the bitmap with one instruction, clipped and masked if needed. This was tremendously helpful for me as I had absolutely no background in graphics of any kind. It was also impressively fast for the times: a 50 MHz clock which caused Gregg to speculate that surely this madness cannot keep going on because there are laws of physics to be obeyed. Since TI expected it to be used as a co-processor mainly, it had a host interface where another processor could reach inside its memory space and read and write it as well as interrupt the GSP to tell it that something happened, which is what the 32000 did. For bitmap graphics you need to draw the pixels into something so it had a bank of memory the size of the bitmap, with the width rounded up to the next power of two. Since the GSP was bit addressed, you could treat the bitmap as a literal rectangle and the processor would figure out how that translates to actual ram addresses.

Did I mention that I had no clue about graphics? Paul -- the one who was mostly absent -- had a bunch of literature on raster graphics and we would sometimes talk when he showed up, but I was mostly on my own. I had to figure out and write Bresenham's algorithm (drawing lines) and the various algorithms for curves like ellipses and circles. Getting to understand non-zero winding numbers for filling objects with patterns was another new thing, and of course just understanding the tools that the GSP brought to bear. But the base level graphics were only part of the problem.
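For readers who have not met it, Bresenham's line algorithm walks from one endpoint to the other using only integer adds and compares. The Python version below is the common all-octant textbook form, shown as an illustration; it is not the GSP code, and it returns a list of points rather than setting pixels.

```python
def bresenham(x0, y0, x1, y1):
    """Return the integer pixel coordinates on the line (x0, y0)-(x1, y1)."""
    points = []
    dx = abs(x1 - x0)
    dy = -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1   # step direction in x
    sy = 1 if y0 < y1 else -1   # step direction in y
    err = dx + dy               # running error term
    while True:
        points.append((x0, y0))
        if x0 == x1 and y0 == y1:
            break
        e2 = 2 * err
        if e2 >= dy:            # error says: step in x
            err += dy
            x0 += sx
        if e2 <= dx:            # error says: step in y
            err += dx
            y0 += sy
    return points
```

The appeal on hardware of that era is that the inner loop needs no multiplication, division or floating point.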

The first order of business was to get commands and data in and out of the GSP. Remote Procedure Calls à la Sun's NFS were all the rage at the time so I'm like what the heck. So I designed a queuing mechanism for the RPC calls that were being generated by the Talaris designed language interpreters. They'd keep piling data into the queue and the GSP code would busily empty the queue by executing the RPCs on the queue. If the queue got too full the 32000 would back off and wait for the GSP to tell it that it was ok to start sending again. This led to an ominous debugging message "Queue Stalled", but all working as intended -- it just meant that they were sending me complicated stuff that takes a while to render. The front end of the RPCs was the actual API I made for Talaris to work against. There were a lot of primitives like line, polygon, text string, circle, change fonts, etc, etc, that would be immediately obvious to somebody who's worked on an HTML Canvas element.
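The back-off behaviour can be sketched like this in Python. It is a toy model under stated assumptions: the CommandQueue class, the high and low water marks, and the handler table are all hypothetical, and both sides run in one process here rather than on two processors with an interrupt between them.

```python
from collections import deque


class CommandQueue:
    def __init__(self, high_water=4, low_water=2):
        self.items = deque()
        self.high_water = high_water   # assumed threshold for stalling
        self.low_water = low_water     # assumed threshold for resuming
        self.stalled = False

    def post(self, op, *args):
        """Host side: queue one call. Returns False while the queue is stalled."""
        if self.stalled or len(self.items) >= self.high_water:
            self.stalled = True        # the "Queue Stalled" condition
            return False
        self.items.append((op, args))
        return True

    def service(self, handlers):
        """Graphics side: execute one queued call, resume the host if drained."""
        if not self.items:
            return None
        op, args = self.items.popleft()
        result = handlers[op](*args)
        if self.stalled and len(self.items) <= self.low_water:
            self.stalled = False       # would be an interrupt back to the host
        return result
```

A caller that gets False back from post simply waits and retries, which is all "Queue Stalled" amounted to.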

Another aspect of the design involved fonts. Fonts can take up a lot of memory and you need to have them around in order to draw with them. For this, I designed a faulting mechanism where the GSP determined whether it had a font loaded and, if not, interrupted the 32000 to fetch the font from whatever storage it was stored on, either the eprom disks or a hard drive. This worked really well as the fonts on the GSP were really just a cache and it could decide if it wanted to toss a font to make room for another. Keeping track of that on the 32000 side would have been a nightmare regardless of who wrote it, which most likely would have been me, so it saved me a lot of work.
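As a rough illustration of the faulting idea, here is a small Python cache that calls back to a fetch function on a miss. The FontCache class is hypothetical, and the least-recently-used eviction policy is an assumption: the actual policy the GSP code used to pick a font to toss is not recorded here.

```python
from collections import OrderedDict


class FontCache:
    def __init__(self, capacity_bytes, fetch):
        self.capacity = capacity_bytes
        self.fetch = fetch           # stands in for interrupting the host
        self.fonts = OrderedDict()   # font name -> data, least recent first
        self.used = 0
        self.faults = 0

    def get(self, name):
        """Return font data, faulting it in from storage if it is not cached."""
        if name in self.fonts:
            self.fonts.move_to_end(name)   # mark as recently used
            return self.fonts[name]
        self.faults += 1
        data = self.fetch(name)
        # Toss old fonts until the new one fits (assumed LRU policy).
        while self.fonts and self.used + len(data) > self.capacity:
            _, evicted = self.fonts.popitem(last=False)
            self.used -= len(evicted)
        self.fonts[name] = data
        self.used += len(data)
        return data
```

The caller never needs to know where a font lives or whether it was already loaded; it just asks for it.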

One of the cute side effects of the GSP is that since it treated memory for the bitmap as a rectangle, there was memory available on the side of the page depending on the width of the paper, since the actual dimensions were rounded to the next power of 2. This turned out to be an ideal place to stash the fonts and came to be known as "Condo Memory", a name I believe Barry at Talaris coined. It could hold a lot of fonts at once and was perfectly suited for the GSP architecture to blit the characters into the bitmap at very high speed.
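The arithmetic behind the spare strip is easy to work through. The Python below is a sketch with assumed numbers: the post does not state the printer resolution, so the 300 dpi letter page in the usage note is purely illustrative, and the function names are invented.

```python
def next_power_of_two(n):
    """Smallest power of two that is >= n (for n >= 1)."""
    return 1 << (n - 1).bit_length()


def condo_layout(page_width_px, page_height_px, bits_per_pixel=1):
    """Work out the row pitch and the spare memory beside the page."""
    row_bits = page_width_px * bits_per_pixel
    pitch_bits = next_power_of_two(row_bits)
    condo_bits_per_row = pitch_bits - row_bits
    return {
        "pitch_bits": pitch_bits,
        "condo_bits_per_row": condo_bits_per_row,
        "condo_bytes": condo_bits_per_row * page_height_px // 8,
    }


def bit_address(base, pitch_bits, x, y, bits_per_pixel=1):
    """Bit address of pixel (x, y) in a rectangle with the given row pitch."""
    return base + y * pitch_bits + x * bits_per_pixel
```

For an assumed 2550 by 3300 pixel page at one bit per pixel, condo_layout(2550, 3300) gives a pitch of 4096 bits, leaving 1546 unused bits on every row, a bit over 600 KB in total.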

Probably the most clever thing about the design was something I called the band manager. With a laser printer, you send each line bit by bit to the printer serially. There is a kind of ram called video ram that unsurprisingly does that for video displays so the deal was that you build the bitmap in regular memory and transfer a band at a time via the video memory which then serializes it and sends it to the laser's beam. Normally for high speeds you'd want two bitmaps so that when you're outputting the current page you're working on the next page. But memory was expensive in those days so that wasn't an option.

I had gone skiing alone for my sanity one day and after I came back I decided to go to a bar and unwind. I started thinking about the double buffering problem (or lack thereof) and came upon the idea that I could actually write behind the beam of the laser printer to speed things up a lot. I literally used a cocktail napkin to sketch out the design so that I didn't forget the next day. It went like this: when a graphic drawing command from the queue was executed, the first thing it did was figure out the extent of the Y beginning and ending and queued it on the topmost band for drawing. If it ended up in a band that was closed because the laser hadn't got there yet, it just waited until the beam was past. The other key thing it did is that it didn't wait until it could draw the whole object, but instead drew what it could when a band became open as it was transferred to the video ram for output. It did this by setting up a clipping window that matched the part of the bitmap that was open to being drawn in. If it didn't complete, it was queued up on the next closed band to be executed again. This may sound wasteful but it wasn't: the first thing you need to do is clear the bitmap from the previous page, and if you waited until you could do that all at once it would defeat the entire point of writing behind the beam as it were. This sped the printer up enormously and is why it could drive a 100 page per minute printer at full speed.
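Here is a minimal Python model of that scheme, written from the description above rather than from the original code. The BandManager class, its method names and the band sizes are assumptions, and instead of re-running a command under a clipping window it records the clipped Y range each pass would have drawn.

```python
class BandManager:
    def __init__(self, page_height, band_height):
        self.band_height = band_height
        num_bands = -(-page_height // band_height)    # ceiling division
        self.pending = [[] for _ in range(num_bands)]
        self.open_bands = 0   # bands [0, open_bands) are behind the beam
        self.drawn = []       # log of (command, first row, last row) passes

    def submit(self, name, y0, y1):
        """Queue a command covering rows y0..y1 starting at its topmost band."""
        self._run(name, y0, y1)

    def _run(self, name, y0, y1):
        band = y0 // self.band_height
        if band >= self.open_bands:
            self.pending[band].append((name, y0, y1))  # beam not past yet
            return
        # Clip to the part of the bitmap that is open for drawing.
        clip_end = min(y1, self.open_bands * self.band_height - 1)
        self.drawn.append((name, y0, clip_end))
        if clip_end < y1:
            # Not finished: park the remainder on the next closed band.
            self.pending[self.open_bands].append((name, clip_end + 1, y1))

    def band_sent(self):
        """The beam has finished a band, so it is now open for the next page."""
        band = self.open_bands
        self.open_bands += 1
        work, self.pending[band] = self.pending[band], []
        for name, y0, y1 in work:
            self._run(name, y0, y1)
```

A command spanning three bands ends up drawn in three passes, one as each band opens, so work on the next page starts as soon as the first band has gone out.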

So yeah, the GSP was a nifty little package and full of cool tools, but there is a lot more than just some graphics primitives to designing a graphics kernel on a co-processor. All of those I needed to work out on my own, and while I had help later for a bunch of other tasks, the GSP code was a no-go zone as far as everybody else was concerned. That was my baby and my problem alone.

Storage

The storage subsystem was relatively simple. Since it was mainly for fonts and things like that, speed wasn't really much of an issue. The file system I created was a straightforward implementation of a hash table for the directory -- I didn't care about sorting because nobody was doing an ls -- and I don't think it allowed for sub directories. The main challenge was getting the SCSI driver to work. SCSI has what is called an initiator and target. The initiator is the OS code, and the target is the device. I had this notion that the initiator should be in control of the flow and that if the device didn't flow as expected something was wrong. It took me a long time to understand that the target was the one that controls the flow and that the initiator had to follow it. It wasn't until the printer was well into production that I figured out my error and things became a lot better. I did put some effort into speeding it up, but it wasn't much of a priority. The only thing I didn't do is boot up from the disks. Maybe I'm misremembering with the flash based disk I designed, but a firmware upgrade involved putting a fresh set of eproms on the motherboard.
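The lesson about who drives the bus can be shown with a small Python sketch. It is a simulation only: run_command is a hypothetical function, the target is just a list of (phase, payload) pairs, and none of the real bus signalling is modelled. The one thing it tries to get right is that the initiator never assumes what comes next and simply services whatever phase the target presents.

```python
COMMAND_COMPLETE = 0x00   # SCSI message code that ends a command


def run_command(target_phases, cdb, data_out=b""):
    """Follow the phases chosen by the target until COMMAND COMPLETE.

    target_phases: iterable of (phase, payload) pairs decided by the target.
    Returns (status, data read from the target, list of what we sent).
    """
    data_in = bytearray()
    sent = []
    status = None
    for phase, payload in target_phases:
        if phase == "COMMAND":
            sent.append(("cdb", cdb))        # target asks for the command
        elif phase == "DATA_OUT":
            sent.append(("data", data_out))  # target asks for our data
        elif phase == "DATA_IN":
            data_in += payload               # target hands us data
        elif phase == "STATUS":
            status = payload
        elif phase == "MESSAGE_IN":
            if payload == COMMAND_COMPLETE:
                break
            # Any other message is tolerated; we keep following the target.
        else:
            raise ValueError("unexpected phase: " + phase)
    return status, bytes(data_in), sent
```

An initiator written the other way round, expecting a fixed order of phases, breaks as soon as a target splits a transfer or slips a message in between.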

Networking

My old foe: the Kinetics Box
 

The really cool thing about the Talaris laser printer was the networking aspect. We designed it with networking in mind from the very beginning, and if we weren't the first laser printer on the block with ethernet, we were the second. For all I know, we were the first example of the Internet of Things. The networking part started as a next phase as I recall, but it was very soon after the initial shipment that we started working on it. We didn't have an onboard ethernet solution at the time, but we found this box from a company called Kinetics that was a SCSI ethernet adapter about the size of a shoe box, with about the same amount of charm. It got the job done, but given my wrong interpretation of how SCSI is supposed to flow, it gave me fits. I'm sure it was buggy in its own right but I was not helping the situation. We eventually designed a version of the controller with the AMD LANCE chip onboard and much rejoicing ensued.

One of the interesting concepts I came up with was the idea of borrowing buffers from the OS. We were very focused on speed of the networking because although our initial targeted printer was 15 pages a minute we were planning to target a 100 page a minute printer that needed to be fed at a rate that could keep up with the printer. So the networking needed to be very fast and efficient so the interpreters had enough CPU ticks to get their job done too. Remember that the 32000 was basically a 1 MIPS box in a world of 500,000 MIPS processors now. Drivers would have a set of buffers big enough to receive an ethernet packet (1536 bytes), and you'd need to keep enough of them around to buffer any burst. The naive Unix way to transfer the payload is using read(2), but that incurs a copy of the buffer to the user's buffer, and of course the allocation of the user's buffer altogether. For the network drivers that were just ordinary processes, that seemed needlessly wasteful since they were just an intermediary. So in come the readsysbf and freesysbf ioctls to the rescue. The network driver would borrow the buffer to do its protocol things and give it back when it was properly queued up for delivery to the process doing the printer interpreter job.
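The difference between copying and borrowing can be sketched in Python. Only the names readsysbf and freesysbf come from the text above; the BufferPool class, its other methods and the use of a memoryview to stand in for a borrowed buffer are assumptions for illustration.

```python
class BufferPool:
    BUFFER_SIZE = 1536   # big enough for one ethernet packet

    def __init__(self, count):
        self.free = [bytearray(self.BUFFER_SIZE) for _ in range(count)]
        self.filled = []   # (buffer, length) pairs waiting for a reader

    def receive(self, frame):
        """Driver side: store a frame, or drop it if no buffer is free."""
        if not self.free or len(frame) > self.BUFFER_SIZE:
            return False
        buf = self.free.pop()
        buf[:len(frame)] = frame
        self.filled.append((buf, len(frame)))
        return True

    def read(self):
        """The copying path, like read(2): the caller gets its own copy."""
        buf, length = self.filled.pop(0)
        data = bytes(buf[:length])
        self.free.append(buf)
        return data

    def readsysbf(self):
        """Borrow the system buffer itself: a view, not a copy."""
        buf, length = self.filled.pop(0)
        return memoryview(buf)[:length]

    def freesysbf(self, view):
        """Give a borrowed buffer back to the pool."""
        buf = view.obj
        view.release()
        self.free.append(buf)
```

While a buffer is on loan it is in neither list, so the driver cannot overwrite it until the borrower hands it back.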

One interesting thing is that apparently neither I nor Talaris was aware of BSD sockets, so firing up connections and listening on a socket were done in completely different ways. Had I known about connect, listen, and bind I would have emulated them, but I didn't.

TLAP -- The Talaris Printer Protocol

Let's say I was sort of naive at first. Years before I had written networking for a point of sale terminal and its controller and knew that it required retransmit logic because the serial drivers (rs429, rest in hell) were flaky. For some reason I had forgotten that and we went some time before realizing that, golly, ethernet can drop packets too! Oops. I don't really remember a lot about the protocol other than it being pretty simple minded. Since I don't have the source any more, I don't have much to jog my memory. It did require protocol agreement between me and the engineers at Talaris since they were writing the host part of the protocol which attached into the print queuing systems for VMS and Unix. If I recall correctly, this was one of those cases where I mostly told them what I did and let them have at it after. Since there was no internet or email at the time, meetings required hours-long trips to San Diego, so we tried to keep those as minimal as possible, usually once a month or so. Again, in a world today where everybody gets their $.02 worth of kibitzing, it's pretty surprising how much I was empowered to figure things out. The protocol ended up working just fine and drove that 100ppm laser printer like a champ.

TCP/IP

By the time we started working on TCP/IP there really was a we. We had several customers at the time beyond Talaris and they wanted TCP for their application, in particular a terminal server we were working on. A terminal server is a box that takes user keystrokes on a keyboard of a computer terminal and sends them to the host, which sends back the output, so you can do remote logins. Talaris didn't need that, but they were interested in compatibility with lpr which is the Unix command to send files to printers, and had a way to do that remotely using TCP. Lpr as a network protocol was rather clunky and not especially great at its task. I thought very hard about bringing this up at a newfangled thing called an IETF meeting which was happening at USC around the time we were working on it. I very nearly drove up there, but for reasons I don't remember -- inertia most likely -- I didn't go. I have no idea how receptive folks would have been at the time because there was so much to do. But it was pretty novel for a device that wasn't a host to be on the internet so they may well have been intrigued. Of course, IETF hadn't even got around to designing PPP and instead had SLIP for IP over serial lines so we're talking about a metric shitton of things that needed to be engineered, and fast.

My other engineers did a lot of the initial work, but I got my fingers in it too trying to deal with speed and many other problems. So it was definitely a collaborative affair which was sort of new for me. There was one memorable bug we couldn't reproduce, so I got flown out to San Antonio to see it for myself. What was memorable is that the place was basically an airline retrofit shop and one of the retrofits in the hangar was one of the 747s that shuttled the space shuttle. I was very impressed. The bug, however, was less impressive: some edge case with a TCP checksum as I recall.

DEC LAT 

DEC's LAT protocol was mainly designed for terminal servers which we were working on at the time for another company (our terminal servers still sit in data centers to this day apparently). LAT was really optimized for terminals and the way they behaved, but it also had provisions for printers, so this was a win for Talaris as they could natively support VMS without any need for a protocol driver as was the case for TLAP. LAT wasn't especially fast though because terminals maxed out at about 19.2k baud so speed wasn't a priority for it. TLAP still had a big advantage when you needed speed so it remained.

SMB, Novell Netware, and Appletalk

We built a lot of network stacks. I was involved with all of them, but the Novell one was mostly mine. Novell was a pretty simple minded protocol as I recall and it had an actual protocol to talk to printers which I implemented. I can't remember whether Talaris implemented it -- they must have because I don't think we'd do it on spec -- but the thing I do remember is that we licensed our code to Apple for their laserwriters. I was never quite clear whether that made it into shipping product.

Conclusion

It's sort of amazing what you can get done in a year or two when you are given free rein and few meetings. The first three pieces -- monXX, Punix and the graphics kernel -- were mostly up and running and out the door by about 1987 as I recall. There were a lot of things beyond the laser printer going on that took up my time including the terminal server I spoke of, but also a graphical terminal that was a knock off of a DEC VT340. The industry was transitioning at the time from terminals to X Window terminals, but very quickly to workstations so that ended up being a huge amount of effort without much to show for it. The laser printer did really well though and sustained Talaris for years to come. Not bad for starting out with nothing at all.

