Wednesday, December 30, 2020

Are Mailing Lists Toast?

Definitely Toast
 

From the very beginning, when IIM (Cisco's email authentication draft) was merged with DK (Yahoo!'s DomainKeys) to become DKIM, we both envisioned a sender signing policy module which allowed a domain to say "we sign all of our mail, so if you get unsigned mail, or mail with a broken signature, that's purportedly from us, that's not cool". Since we were all experienced with internet standards it was plain as day that there was a serious deployment problem: mailing lists mangle messages and thus break signatures. Mailing lists would thus throw a wrench into the policy gears. This was 16 years ago.

Our effort at Cisco was driven primarily by phishing, and spear phishing in particular. We had heard tell of some exec at another company falling for a spear phishing attack, and we didn't figure our execs were any more clueful so that was pretty frightening. Since Cisco had exactly no presence in email at all, it also gave some plausible deniability as to what we were up to. We weren't looking to get into the email biz, but we weren't not looking either. 

We formed a group which included Jim Fenton and me, and created a project to sign all of the mail at Cisco with a goal of being able to annotate suspicious email purporting to come from Cisco employees. This required finding and signing all legitimate Cisco email. So off we went trying to find all of the sources of unsigned email in the company so that we could route it through the DKIM signing infrastructure. We didn't have the nifty reporting feature of DMARC so it wasn't the easiest thing to figure out. It was made much worse because Cisco had tons of acquisitions, so there was a lot of legacy infrastructure left over from them, and who knew whether their mail servers were still in use or not. This was very slow going to say the least.

Most of the DKIM working group was pretty cavalier and dismissive about the mailing list problem. A DKIM signature from the mailing list would somehow solve the problem. We just needed to somehow trust that mailing list. Coming back after a dozen years and a lot of skiing under my belt, it seems that the previously unsolved problem remains unsolved: nobody knows how to "trust" a mailing list.

When I was still at Cisco I used a bunch of heuristics trying to answer the question of whether the originating domain signature could actually be reconstructed and verified. It was met with a lot of derision and hysteria from the usual attack poodles, but I didn't care and kept trying to improve my recovery rate, ultimately achieving north of 90% recovery. It was interesting and tantalizing that the false positive rate was low enough that marking suspicious mail up with warnings was worth considering. The next step would have been to differentiate what that broken signature traffic was, where it was coming from, what it was doing that was not reversible, and ultimately whether we cared about that case enough. There was no silver bullet that we could find, and we definitely didn't know how to "trust" mailing lists.

So as we slogged on with hunting down our infrastructure and me still hacking on the heuristics, Cisco decided that it was interested in the email security angle that we had pioneered. We did diligence on a bunch of companies and settled on Ironport, just down the peninsula from me. Cisco bought them, and the group I was with decided -- without my input -- that they were switching to some wacky telephony thing that I had no interest in. When Ironport wouldn't transfer me, I wandered around a while and then decided it was time to quit and ski. The one thing I regret is that we didn't get a chance to finish off the research side of the problem, especially with mailing lists: separating out the size of the remaining problem, which seemingly nobody knows to this day.

Fast forward 12 years and I found these curious creatures called ARC-Message-Signature and ARC-Seal in my email headers. The signature looked pretty much identical to a DKIM signature, which I found very odd, and it also had an ARC version of the Authentication-Results header. So what's going on? I found that ARC is an experimental RFC (RFC 8617) which was seemingly trying to solve the mailing list problem. Again. But using even more machinery in the form of the ARC-Seal. What is the Seal's purpose? I determined it is so that it can bind the ARC auth-res to the ARC signature. Why are they doing that? Because it's supposed to be of some value to the receiving domain to know what the original assessment of the originating DKIM signature was. But DKIM can already do that if a mailing list re-signs the email. It was all a mystery.
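
For reference, a single hop's ARC set looks roughly like this -- the domains, selectors, and signature values here are made up:

    ARC-Authentication-Results: i=1; mx.example.org;
          spf=pass smtp.mailfrom=author.example.com;
          dkim=pass header.d=author.example.com
    ARC-Message-Signature: i=1; a=rsa-sha256; d=lists.example.net; s=arc;
          h=from:to:subject:date; bh=...; b=...
    ARC-Seal: i=1; a=rsa-sha256; d=lists.example.net; s=arc; cv=none; b=...

The i= tag numbers the hop in the chain, and the Seal's cv= tag records whether the chain validated up to that point (cv=none for the first hop).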

So wondering whether there was some secret sauce that I had completely overlooked all those years ago, I posted on the DMARC working group list for the first time asking what that secret sauce might be. "It requires the receiver to trust the mailing list" I was told. I said that you can do that right now if the mailing list re-signs the mail with DKIM, which I assume most do at this point (and screw them if they don't). Why does this require a different signature than a DKIM signature? They wanted to bind the ARC auth-res to the signature, I was told. Why can't you just add a new tag to DKIM, assuming that is a problem at all, which I am not very convinced it is? Never got a good answer for that. And most importantly, why does the receiver care about the original auth-res in the first place? Never got a good answer for that either.

So this all boils down to trusting a mailing list at its heart. It's not clear to me whether some receivers are using the list's DKIM signature to bind it to some whitelist or reputation service. Somebody as big as Google could easily roll their own and just not tell anybody about it, since it's completely proprietary just like the rest of the spam filtering. So on the outside we really don't know whether mailing list reputation is a Thing or not, but the assumption that the working group seems to be operating on is that it is not a Thing and remains a previously unsolved problem. I am willing to believe that assumption since it seemed like a hard problem all those years ago too. That and Google itself is participating with ARC, so that suggests they aren't any better off than anybody else. But who knows? That's part of the problem of being opaque: nobody on the outside can scrutinize whether there is some magical thinking going on, or whether there is actually some there there.

So here we are over a decade and a half after DKIM's inception, right back where we started. As far as I can tell, ARC doesn't bring much of anything new to the table, and the original auth-res doesn't address the fundamental problem of trusting the mailing list or not. Whether the original signature verified at the list or not seems completely beside the point: if I trust the mailing list, why do I even care whether it verified or not? If they are forwarding messages without valid originating signatures, that is a very good reason to not trust them for starters. Any reputation system needs to take into account a lot of factors, and requiring signature verification at the mailing list seems like table stakes.

Mailing lists. Again. We are completely wrapped around the axle: we can provide pretty good forgery protection for mail from malicious domains, but domains can't seem to pull the trigger on asking receiving domains to toss mail without valid signatures because it would cause mailing list traffic to be discarded as well. There have been other drafts beyond my experiment on signature recovery that have not been well received, and have languished as probably they should. The worst part of this problem is that there is no way to determine what success looks like. How many false positives are acceptable? How can we assess quality and quantity? Cisco, for example, would grind to a halt if its upper level engineers working on IETF standards stopped getting list mail because it implemented a p=reject policy (I just checked: it's p=quarantine with pct=0, which is basically p=none with a little bit of attitude).
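
For the curious, that kind of policy is just a DNS TXT record; a p=quarantine policy applied to 0% of failing mail looks something like this (the domain and report address are illustrative):

    _dmarc.example.com.  IN  TXT  "v=DMARC1; p=quarantine; pct=0; rua=mailto:dmarc-reports@example.com"

With pct=0 the receiver applies the quarantine policy to none of the failing mail, but the domain still gets the aggregate reports -- hence "p=none with a little bit of attitude".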

Been toast a little too long

So pretty much we're in the same trenches as we were from the beginning, with no forward progress and no forward progress on the horizon. It then caused me to question the unthinkable: are mailing lists actually worth saving? They are ancient technology that never had security constraints from the beginning, and they are operating in a world where security is a Must requirement. Likewise, it's not like there aren't alternatives to mailing lists. Web based forums have been around for decades, so it doesn't even require new infrastructure. Lots of things that once were a Thing are now gone. Take Usenet for example. Usenet was revolutionary in many ways and provided the first thing we would recognize as social media on the nascent internet. But it's been sidelined by the new social media companies, and one of its biggest problems was that it couldn't adapt to spam for whatever reason. Probably not technical and more likely neglect, but it is basically dead now. The world moved on, and now it's just a relic of the past. It's not even that the new tech has to be particularly good: Reddit is the most similar to Usenet of the social media platforms and it is a terrible and buggy reimplementation of Usenet, yet it is popular and Usenet is done.

So are mailing lists a relic of the past too? Going by the total volume of email, it sounds like they're down in the noise from what I've heard -- like 1% -- but it would be good if some of the big providers stepped up with some concrete numbers. For many companies, perhaps most, losing access to external mailing lists wouldn't even be missed at all. Those of us in the internet community are profoundly attached to mailing lists because that's how business is done. But let's be serious here: we are complete outliers. Changing over to some off-the-shelf forum software, while not trivial, is certainly well within the capability of the internet community if it needed to. The same for other lists. It's quite possible that a hybrid system could even be run in transition where mail is gatewayed to the forums or some such.

In conclusion, the bad security properties of mailing lists and the like are causing better security to not be deployed. So yes, we should just ignore mailing lists and let the people who run them adapt however they see fit. After 15+ years since the advent of DKIM-like technologies and the ability to determine who is sending mail and discover what that domain desires on receipt of mail that doesn't verify, we need to just move on and accept that the desire for verifiable mail from domains is more important than a legacy technology with ample alternatives.

This is not to say that mailing lists need to be burned to the ground or anything drastic. I've noticed that some lists have taken to rewriting From headers, seemingly for mail from domains with a DMARC policy other than "none". This is pretty awful from a security standpoint because it further trains people to rely on the pretty name rather than the email address, but to be fair that is just building on an existing problem, and MUAs are the real culprit of a lot of this since they do little to help people know who they are actually talking to. That said, none of this would be necessary if we used a more modern technology.

Sorry mailing lists. But it's come to this.








Sunday, December 27, 2020

How to build a laser printer from nothing at all

Foreword

This is completely from my perspective. Of course there was a lot more going on than just what I did and I don't want to diminish anybody else's experience. This was an accomplishment of a lifetime for a whole lot of people I suspect, and it's not my intention to take anything from that. Also: my timeline memory is horrible. There are bound to be mistakes.

The 1794 Laser Printer Controller

Gordian is born

In 1985 Gregg got a contract with a company named Talaris down in San Diego to build a laser printer controller. Gregg, Andy, another engineer named Paul, and I then formed a new company called Gordian. We had a MicroVAX I and that was about it. Gregg and Andy were hardware engineers, and Paul was mainly absent, so it was just me holding down the software front. The design was pretty ambitious: dual processors, with a main processor to interpret printer languages (HP PCL, PostScript...) into graphics primitives and a graphics processor to render the graphics into the bitmap, plus enough video RAM to feed it to the printer. It was full of IO devices including a parallel port, a serial port and, most importantly, a SCSI bus which allowed us to attach things like disks and, at first, a SCSI ethernet port. That misadventure eventually led to adding a LANCE chip for native ethernet. There were also two ports on the printer to attach storage in the form of a card full of EPROMs, which Talaris used to ship packs of fonts.

So there was a lot going on here and it's just me at first. Well, not exactly just me, because the agreement we had was that we'd provide the infrastructure and API for the hardware and graphics, and they'd provide the printer language interpreters. There were no deep discussions between us hammering out APIs or anything like that. I mostly decided what I was going to do, and they wrote to that. I'd end up in San Diego to talk with some of their engineers about what was going on -- usually Rick and Barry, but I knew quite a lot of them, and of course Cal the CEO was the tester and torturer in chief. It was fairly hard for them to argue on the main processor side though, because I decided quite organically that the easiest thing to do was to just emulate libc and Unix IO, so that was something of a no-brainer. The graphics kernel was a different story: there was no standard graphics API at the time, though X Windows was starting to come into the picture. I didn't know enough about it though, and X11 was a couple of years away. Looking back, it seems hard to imagine a medium sized company would put so much trust into a barely formed startup and its one software guy to invent an API that they could write to with not much back and forth, but I guess that's just how it was back then: we were all just making things up as we went along, none of us quite grasping that what we were doing was actually hard and would cause bigger companies to go into insane Mythical Man Month mode.

So Gregg and Andy went about designing the hardware while I, as the title states, started from nothing at all. We had compilers for the two processors (but just barely for the GSP since TI was still working on it), but other than that not a line of code that could be imported, and no internet to beg, borrow, or steal from. Another thing to realize is that the debugger, Punix, graphics kernel, and storage were the original deliverables and they were -- get this because it never happens -- on a tight deadline. I forget how long we were given, but a year sounds about right including designing and debugging the hardware as well as Andy designing a gate array. Gate arrays weren't what they are these days where you can rent one out by the hour on AWS. It kept Andy up many many nights. Both Andy and I were working 100 hour weeks. I don't know how we managed any amount of life at all.

The Debugger, monXX

While the hardware was being designed, I decided that I had a lot of work ahead of me and that I could use some of the lag time before board bring-up to build a debugger, since the tool chain obviously didn't have one. The obvious choice was to make use of the serial line. I went on to invent a protocol for debugging remotely, for things like downloading code, examining memory, setting breakpoints, calling functions, performance analysis and the like. Most importantly, I figured out how to digest the symbol table so that you could print not only variables but entire structures using normal C syntax, like foo->bar.baz.

It turned out to be immensely helpful not just for me, but for Talaris too, since they were in the same boat as me trying to figure out how to bootstrap themselves to write embedded code, which their engineers had not done before to my knowledge. There were actually two distinct debuggers, one for the 32000 (mon32) and one for the GSP (monGSP). Traffic for the GSP had to be gatewayed through the 32000. This all happened behind the scenes on the serial line, which was still an operational serial port used to send the printer pages to be printed.

The debugger was ported to many other processor architectures over time, including the Motorola 68k and the MIPS architecture. The most significant improvement, however, was using the network. At first I just used my own raw ethernet frames because that was the quickest and easiest way to do something on VAXes running VMS. After a while, however, we got MIPS workstations which ran Unix, and inexplicably they didn't support raw ethernet access, which pissed me off. So I was forced to port it all over to IP as well. The first time I remotely debugged something over the internet -- I think it was in New Zealand -- was nothing short of amazing. Today you'd call that a backdoor, but there was no such thing as security back then.

Punix, the little OS that could

The National 32000

Every programmer wants to write their own multitasking operating system, right? Well, at least back then it was on the todo list of many a programmer looking for a feather in their cap. What does Punix mean? It stands for Puny-Unix: everything a normal Unix has that doesn't require an MMU (memory management unit). MMUs were rare in embedded systems generally, and not terribly helpful when there is no guarantee of a disk. One would have been handy for the debugger for watchpoints, but we managed.

As I said above, I chose a Unix environment because it already had documentation, and even better it allowed Talaris to start writing their code: if it was in libc, it was in Punix. That meant that I had to write libc from scratch. All of the string functions, malloc/free/alloca, stdio, ctype and the like. It wasn't a full implementation of libc -- for one, we didn't have an FPU (floating point unit) -- but it had quite a bit of it. If I or Talaris needed something, I'd divert from whatever I was doing and write it. Suffice it to say that it was enough for our purposes, though I suspect that if Talaris found something they needed they'd often just roll their own instead of bugging me. Remember: no internet, no git pull requests.

At the time I started writing Punix, I was not actually familiar with the real Unix operating system. All of my experience was on VAX/VMS. So the kernel itself looked a lot more like VMS than Unix, but for the most part it looked like something entirely different and entirely me. The main thing I took from VMS was their class driver/port driver architecture. Port drivers drove the physical hardware -- DMA, interrupts, etc -- and provided a standardized interface to the class drivers. A class driver, on the other hand, used port drivers to do the low level work and handled the hardware independent work. Think file systems as an example. The little font cartridges used the same file system that I laid down on the SCSI disks, for example. The main class drivers are called out later.
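
To make the split concrete, here's a minimal sketch of the idea in modern C idiom -- the actual Punix interfaces are long gone, so all of the names and signatures here are illustrative:

    struct iorb;                          /* an I/O request block */

    struct port_ops {                     /* port driver: owns the hardware */
        int  (*start)(struct iorb *rb);   /* kick off DMA for a request */
        void (*intr)(void);               /* service the device interrupt */
    };

    struct class_driver {                 /* class driver: hardware independent */
        const char            *name;      /* e.g. a file system or tty class */
        const struct port_ops *port;      /* EPROM card, SCSI disk, ... */
        int (*read)(void *buf, unsigned len);
        int (*write)(const void *buf, unsigned len);
    };

The payoff was that the same file system class driver could sit on top of the EPROM cartridge port driver or the SCSI disk port driver, which is exactly what the font cartridges exploited.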

VMS had another thing that I latched on to, which was that it used process based helpers for networking. It was the perfect excuse to make the OS multi-tasking, or what today we would call multi-threaded since it was all running in the same address space. Threads, the term, had not been invented yet. It was a fully preemptive OS in that there was a current process, a run queue, and a wait queue. Since there was no thread interface to emulate, I just emulated fork(3). This may sound sensible, but remember there was no MMU and thus the need to clone enough of the stack in the child process to allow the child process to return. The first time implementing this definitely caused many an explosion until I got it right. The remaining difficulty was context switching. To get into the wait queue, e.g. to block, there was a function called sched() that backed up the registers into the process header and selected the appropriate process from the run queue to make it the current process. When an interrupt came in from a device, the port driver would know who was waiting for that device and move it over to the run queue to possibly be context switched. The last thing was the timer interrupt to switch out processes at equal priority so they could get CPU time too.
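
A minimal sketch of the idea behind sched(), using setjmp/longjmp as a stand-in for the real save-the-registers-into-the-process-header code (names here are illustrative, not the actual Punix internals):

    #include <setjmp.h>

    struct proc {
        jmp_buf      ctx;     /* saved register state */
        struct proc *next;    /* run queue link */
    };

    struct proc *current;     /* the running process */
    struct proc *run_queue;   /* processes ready to run */

    /* Block the caller: save its context and resume the next runnable
     * process.  A port driver's interrupt handler later moves the blocked
     * process back onto the run queue, and it wakes up inside setjmp. */
    void sched(void)
    {
        if (setjmp(current->ctx) != 0)
            return;                   /* we were switched back in */
        current = run_queue;          /* take the head of the run queue */
        run_queue = run_queue->next;
        longjmp(current->ctx, 1);     /* resume it where it blocked */
    }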

The last notable part is that it had resource locking (ResLock), which implemented a mutex around a critical section of code. I get the impression that a lot of kernels use very unspecific mutexes to lock out other processes while in kernel mode, but Punix was very fine grained in that each driver would have its own read and write mutex. That and we didn't have an MMU, so there was no kernel mode in the first place. This would go on to cause quite a bit of hilarity with race conditions, but fortunately I have a knack for visualizing race conditions so they were usually dispatched relatively easily. Would that were the case for memory corruption and the Heisenbugs it caused. They were the source of much, much anguish and are a reason I would rather deal with garbage collected systems, and strings that don't overflow, these days.
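
A ResLock, in the same sketchy spirit (current and sched() are from the sketch above; the queue helpers are hypothetical stand-ins):

    struct reslock {
        struct proc *owner;      /* NULL when the resource is free */
        struct proc *waiters;    /* processes blocked on this resource */
    };

    /* wait_enqueue/wait_dequeue/run_enqueue are stand-in queue helpers */
    void res_lock(struct reslock *l)
    {
        while (l->owner != NULL) {               /* busy: block ourselves */
            wait_enqueue(&l->waiters, current);
            sched();                             /* switch away until woken */
        }
        l->owner = current;
    }

    void res_unlock(struct reslock *l)
    {
        l->owner = NULL;
        if (l->waiters != NULL)
            run_enqueue(wait_dequeue(&l->waiters));  /* wake the first waiter */
    }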

The Graphics Kernel

The TI GSP

The GSP (Texas Instruments Graphics System Processor) was an odd beast. First off, it was bit addressable rather than the more customary byte addressable. It also had a bunch of graphics primitives in its microcode which were quite convenient, like being able to move a character in a font onto the bitmap with one instruction, clipped and masked if needed. This was tremendously helpful for me as I had absolutely no background in graphics of any kind. It was also impressively fast for the times: a 50 MHz clock, which caused Gregg to speculate that surely this madness cannot keep going on because there are laws of physics to be obeyed. Since TI expected it to be used mainly as a co-processor, it had a host interface where another processor could reach inside its memory space and read and write it, as well as interrupt the GSP to tell it that something happened, which is what the 32000 did. For bitmap graphics you need to draw the pixels into something, so it had a bank of memory the size of the bitmap rounded up to the next power of two. Since the GSP was bit addressed, you could treat the bitmap as a literal rectangle and the processor would figure out how that translated to actual RAM addresses.

Did I mention that I had no clue about graphics? Paul -- the one who was mostly absent -- had a bunch of literature on raster graphics and we would sometimes talk when he showed up, but I was mostly on my own. I had to figure out and write Bresenham's algorithm (drawing lines) and the various algorithms for curves like ellipses and circles. Getting to understand non-zero winding numbers for filling objects with patterns was another new thing, and of course just understanding the tools that the GSP brought to bear. But the base level graphics were only part of the problem.

The first order of business was to get commands and data in and out of the GSP. Remote Procedure Calls a la Sun's NFS were all the rage at the time, so I'm like what the heck. So I designed a queuing mechanism for the RPC calls that were being generated by the Talaris designed language interpreters. They'd keep piling data into the queue and the GSP code would busily empty the queue by executing the RPCs on it. If the queue got too full, the 32000 would back off and wait for the GSP to tell it that it was ok to start sending again. This led to an ominous debugging message, "Queue Stalled", but all was working as intended -- it just meant that they were sending me complicated stuff that takes a while to render. The front end of the RPCs was the actual API I made for Talaris to work against. There were a lot of primitives like line, polygon, text string, circle, change fonts, etc, etc, that would be immediately obvious to somebody who's worked on an HTML canvas element.
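
The shape of it was roughly a ring buffer with a high-water mark. This is a reconstruction -- the sizes, names, and the wait_for_gsp() notification below are all made up -- but it shows the backpressure idea:

    #define QSIZE      4096        /* must be a power of two */
    #define HIGH_WATER (QSIZE - 512)

    struct rpc_queue {
        unsigned      head, tail;  /* producer / consumer indexes */
        unsigned char buf[QSIZE];  /* marshalled RPC records */
    };

    extern void wait_for_gsp(void);   /* stand-in: sleep until the GSP drains */

    static unsigned q_used(const struct rpc_queue *q)
    {
        return (q->head - q->tail) % QSIZE;
    }

    /* Producer side on the 32000: stall when the GSP falls behind. */
    void rpc_send(struct rpc_queue *q, const unsigned char *rec, unsigned len)
    {
        while (q_used(q) + len > HIGH_WATER)
            wait_for_gsp();            /* "Queue Stalled": working as intended */
        for (unsigned i = 0; i < len; i++)
            q->buf[(q->head + i) % QSIZE] = rec[i];
        q->head = (q->head + len) % QSIZE;
    }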

Another aspect of the design involved fonts. Fonts can take up a lot of memory and you need to have them around so as to execute them. For this, I designed a faulting mechanism where the GSP determined if it didn't have font loaded and if not interrupt the 32000 to fetch the font from whatever storage it was stored on, either the eprom disks, or on a hard drive. This worked really well as the fonts on the GSP were really just a cache and it could decide if it wanted to toss a font to make room for another. Keeping track of that on the 32000 side would have been a nightmare regardless of who wrote it which most likely would have been me, so it saved me a lot of work.
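
The fault path on the GSP side amounted to something like this (a sketch; all of the names are illustrative):

    struct font;
    extern struct font *font_cache_lookup(int font_id);  /* stand-ins */
    extern void host_interrupt(int reason, int arg);
    extern void wait_for_host(void);
    #define FONT_FAULT 1

    /* Get a font from the cache, faulting it in from the 32000 if it
     * isn't resident.  The cache is free to evict fonts to make room. */
    struct font *get_font(int font_id)
    {
        struct font *f = font_cache_lookup(font_id);
        if (f == NULL) {
            host_interrupt(FONT_FAULT, font_id);  /* ask the 32000 to fetch it */
            wait_for_host();                      /* block until it's loaded */
            f = font_cache_lookup(font_id);       /* now resident */
        }
        return f;
    }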

One of the cute side effects of the GSP is that since it treated memory for the bitmap as a rectangle, there was memory available off to the side of the page, depending on the width of the paper, since the actual dimensions were rounded to the next power of 2. This turned out to be an ideal place to stash the fonts, and it came to be known as "Condo Memory", a term I believe Barry at Talaris coined. It could hold a lot of fonts at once and was perfectly suited to the GSP architecture, which could blit the characters into the bitmap at very high speed.
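
To put illustrative numbers on it: at 300 dpi (an assumption -- I don't know the actual engine resolution), a letter-size page is 2550 bits wide; rounded up to 4096, that leaves roughly 1500 spare bits beside each of the 3300 scan lines, or around 600KB of condo space hiding alongside a single page's bitmap.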

Probably the most clever thing about the design was something I called the band manager. With a laser printer, you send each line bit by bit to the printer serially. There is a kind of RAM called video RAM that, unsurprisingly, does exactly that for video displays. So the deal was that you build the bitmap in regular memory and transfer a band at a time through the video memory, which then serializes it and sends it to the laser's beam. Normally for high speeds you'd want two bitmaps so that while you're outputting the current page you're working on the next one. But memory was expensive in those days, so that wasn't an option.

I had gone skiing alone for my sanity one day, and after I came back I decided to go to a bar (a gay bar even! I was still single and mingling) and unwind. I started thinking about the double buffering problem (or lack thereof) and came upon the idea that I could actually write behind the beam of the laser printer to speed things up a lot. I literally used a cocktail napkin to sketch out the design so that I didn't forget the next day. It went like this: when a graphic drawing command from the queue was executed, the first thing it did was figure out its Y extent, beginning and end, and queue it on the topmost band it touched for drawing. If that band was closed because the laser hadn't got there yet, it just waited until the beam was past. The other key thing is that it didn't wait until it could draw the whole object, but instead drew what it could as each band became open when it was transferred to the video RAM for output. It did this by setting up a clipping window that matched the part of the bitmap that was open to being drawn in. If the object didn't complete, it was queued up on the next still-closed band to be executed again. This may sound wasteful but it wasn't: the first thing you need to do is clear the bitmap from the previous page, and if you waited until you could do that all at once it would defeat the entire point of writing behind the beam, as it were. This sped the printer up enormously and is why it could drive a 100 page per minute printer at full speed.
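
In pseudo-C, the band manager's inner loop looked conceptually like this -- a reconstruction with illustrative names, not the original GSP code:

    struct object {                  /* one queued drawing command */
        int y_top, y_bottom;         /* its vertical extent on the page */
        void (*draw)(struct object *o, int clip_top, int clip_bottom);
        struct object *next;
    };

    struct band {
        int top, bottom;             /* scan lines covered by this band */
        struct object *pending;      /* objects waiting on this band */
        struct band *below;          /* the next band down the page */
    };

    extern void clear_band(struct band *b);                 /* stand-ins */
    extern void requeue(struct object *o, struct band *b);

    /* Called as the beam finishes outputting a band: the band is now open
     * and can be reused for the next page. */
    void band_open(struct band *b)
    {
        struct object *o, *next;

        clear_band(b);                       /* erase the previous page here */
        for (o = b->pending; o != NULL; o = next) {
            next = o->next;
            o->draw(o, b->top, b->bottom);   /* clip to just the open band */
            if (o->y_bottom > b->bottom)
                requeue(o, b->below);        /* finish in a later band */
        }
        b->pending = NULL;
    }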

So yeah, the GSP was a nifty little package and full of cool tools, but there is a lot more to designing a graphics kernel on a co-processor than just some graphics primitives. All of that I needed to work out on my own, and while I had help later for a bunch of other tasks, the GSP code was a no-go zone as far as everybody else was concerned. That was my baby and my problem alone.

Storage

The storage subsystem was relatively simple. Since it was mainly for fonts and things like that, speed wasn't really much of an issue. The file system I created was a straightforward implementation of a hash table for the directory -- I didn't care about sorting because nobody was doing an ls -- and I don't think it allowed for subdirectories. The main challenge was getting the SCSI driver to work. SCSI has what is called an initiator and a target. The initiator is the OS code, and the target is the device. I had this notion that the initiator should be in control of the flow and that if the device didn't flow as expected something was wrong. It took me a long time to understand that the target was the one that controls the flow and that the initiator had to follow it. It wasn't until the printer was well into production that I figured out my error, and things became a lot better. I did put some effort into speeding it up, but it wasn't much of a priority. The only thing I didn't do is boot up from the disks. Maybe I'm misremembering with the flash based disk I designed, but a firmware upgrade involved putting a fresh set of EPROMs on the motherboard.

Networking

My old foe: the Kinetics Box
 

The really cool thing about the Talaris laser printer was the networking aspect. We designed it with networking in mind from the very beginning, and we were the first laser printer that could be accessed by ethernet. For all I know, we were the first example of the Internet of Things (IoT). The networking part started as a next phase as I recall, but it was very soon after the initial shipment that we started working on it. We didn't have an onboard ethernet solution at the time, but we found this box from a company called Kinetics that was a SCSI ethernet adapter about the size of a shoe box, with about the same amount of charm. It got the job done, but given my wrong interpretation of how SCSI is supposed to flow, it gave me fits. I'm sure it was buggy in its own right, but I was not helping the situation. We eventually designed a version of the controller with the AMD LANCE chip onboard, and much rejoicing ensued.

One of the interesting concepts I came up with was the idea of borrowing buffers from the OS. We were very focused on the speed of the networking because although our initial target was a 15 page a minute printer, we were planning to target a 100 page a minute printer that needed to be fed at a rate that could keep up with it. So the networking needed to be very fast and efficient so the interpreters had enough CPU ticks to get their job done too. Remember that the 32000 was basically a 1 MIPS box, in a world of 500,000 MIPS processors now. Drivers would have a set of buffers big enough to receive an ethernet packet (1536 bytes), and you'd need to keep enough of them around to buffer any burst. The naive Unix way to transfer the payload is using read(2), but that incurs a copy into the user's buffer, and of course the allocation of the user's buffer altogether. For the network drivers, which were just ordinary processes, that seemed needlessly wasteful since they were just an intermediary. So in come the readsysbf and freesysbf ioctls to the rescue. The network driver would borrow the buffer to do its protocol things and give it back when it was properly queued up for delivery to the process doing the printer interpreter job.
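
From the network process's side, the pattern looked roughly like this. The readsysbf/freesysbf names are the real ones as I remember them; the structure layout, ioctl numbers, and everything else here is illustrative:

    struct sysbf {
        void    *data;    /* the driver's own receive buffer, on loan */
        unsigned len;     /* bytes of packet in it */
    };

    extern int  ioctl(int fd, int req, void *arg);       /* Punix-style */
    extern void handle_packet(void *pkt, unsigned len);  /* stand-in */
    #define READSYSBF 1   /* illustrative ioctl numbers */
    #define FREESYSBF 2

    void network_process(int ether_fd)
    {
        struct sysbf bf;

        for (;;) {
            ioctl(ether_fd, READSYSBF, &bf);   /* borrow: no copy, no alloc */
            handle_packet(bf.data, bf.len);    /* protocol work, in place */
            ioctl(ether_fd, FREESYSBF, &bf);   /* hand it back to the driver */
        }
    }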

One interesting thing is that apparently neither I nor Talaris was aware of BSD sockets, so firing up connections and listening on a socket were done in completely different ways. Had I known about connect, listen, and bind I would have emulated them, but I didn't.

TLAP -- The Talaris Printer Protocol

TLAP was the world's first ethernet protocol to drive a printer. Let's say I was sort of naive at first. Years before, I had written networking for a point of sale terminal and its controller and knew that it required retransmit logic because the serial drivers (rs429, rest in hell) were flaky. For some reason I had forgotten that, and we went some time before realizing that, golly, ethernet can drop packets too! Oops. I don't really remember a lot about the protocol other than it being pretty simple minded. Since I don't have the source any more, I don't have much to jog my memory. It did require protocol agreement between me and the engineers at Talaris since they were writing the host part of the protocol, which attached into the print queuing systems for VMS and Unix. If I recall correctly, this was one of those cases where I mostly told them what I did and let them have at it after. Since there was no internet or email at the time, meetings required hour long trips to San Diego, so we tried to keep those to a minimum, usually once a month or so. Again, in a world today where everybody gets their $.02 worth of kibitzing in, it's pretty surprising how much I was empowered to figure things out. The protocol ended up working just fine and drove that 100ppm laser printer like a champ.

TCP/IP

By the time we started working on TCP/IP, there really was a we. We had several customers at the time beyond Talaris and they wanted TCP for their applications, in particular a terminal server we were working on. A terminal server is a box that takes user keystrokes from a computer terminal's keyboard and sends them to the host, which sends back the output, so you can do remote logins. Talaris didn't need that, but they were interested in compatibility with lpr, the Unix command to send files to printers, which had a way to do that remotely using TCP. Lpr as a network protocol was rather clunky and not especially great at its task. I thought very hard about bringing this up at a newfangled thing called an IETF meeting which was happening at USC around the time we were working on it. I very nearly drove up there, but for reasons I don't remember -- inertia most likely -- I didn't go. I have no idea how receptive folks would have been at the time because there was so much to do. But it was pretty novel for a device that wasn't a host to be on the internet, so they may well have been intrigued. Of course, the IETF hadn't even got around to designing PPP and instead had SLIP for IP over serial lines, so we're talking about a metric shitton of things that needed to be engineered, and fast.

My other engineers did a lot of the initial work, but I got my fingers in it too, trying to deal with speed and many other problems. So it was definitely a collaborative affair, which was sort of new for me. There was one memorable bug we couldn't reproduce, so I got flown out to San Antonio to see it for myself. What was memorable is that the place was basically an airline retrofit shop, and one of the retrofits in the hangar was one of the 747s that shuttled the space shuttle. I was very impressed. The bug, however, was less impressive: some edge case with a TCP checksum as I recall.

DEC LAT 

DEC's LAT protocol was mainly designed for terminal servers, which we were working on at the time for another company (our terminal servers still sit in data centers to this day, apparently). LAT was really optimized for terminals and the way they behaved, but it also had provisions for printers, so this was a win for Talaris as they could natively support VMS without any need for a protocol driver as was the case for TLAP. LAT wasn't especially fast though, because terminals maxed out at about 19.2k baud, so speed wasn't a priority. TLAP still had a big advantage when you needed speed, so it remained.

SMB, Novell Netware, and Appletalk

We built a lot of network stacks. I was involved with all of them, but the Novell one was mostly mine. Novell was a pretty simple minded protocol as I recall, and it had an actual protocol to talk to printers, which I implemented. I can't remember whether Talaris implemented it -- they must have, because I don't think we'd do it on spec -- but the thing I do remember is that we licensed our code to Apple for their LaserWriters. I was never quite clear whether that made it into a shipping product.

Conclusion

It's sort of amazing what you can get done in a year or two when you are given free rein and few meetings. The first three sections of monXX, Punix and the graphics kernel were mostly up and running and out the door by about 1987 as I recall (we started in June 1985, so not bad all things considered). There were a lot of things beyond the laser printer going on that took up my time, including the terminal server I spoke of, but also a graphical terminal that was a knockoff of a DEC VT340. The industry was transitioning at the time from terminals to X Window terminals, but very quickly to workstations, so that ended up being a huge amount of effort without much to show for it. The laser printer did really well though and sustained Talaris for years to come. Not bad for starting out with nothing at all.


Thursday, May 7, 2020

HOBA Revisited with WebCrypto


The Hoba Meteorite in Namibia


Here's a direct link to the running demo, which explains what's going on in much more detail than this post: the HOBA Demo Site

Years ago, I got really pissed off about LinkedIn doing something incredibly stupid, especially for a big company: storing unsalted password hashes on their servers, followed by their subsequent leak. That got me to thinking about getting rid of passwords on the wire if at all possible. This led to my work on a prototype that used public key crypto to join, log in, and enroll new devices. You can see my original posts here and here, along with the resulting experimental HOBA RFC (RFC 7486).


I was really excited when I heard about the W3C WebAuthn work, hoping it was the successor to our experiment. The reality was that when I tried to get WebAuthn to work, it seemed regrettably difficult to get up and running, especially without an external signing dongle. It is quite possible that my problems were completely wrapped up with not wanting to require a signing dongle. Chrome doesn't support local key stores at all with WebAuthn, and Firefox does so only by fiddling with about: flags. This is a real shame, as I really hoped that WebAuthn could finally bend the curve against passwords being transmitted over the wire, which is still a huge problem. Since HOBA was written, a lot has changed. WebCrypto now provides solid crypto functionality to browsers, in contrast to the horrible javascript hackery that I used in the original HOBA RFC. Another thing that has changed for the better is that it is much more common for servers to require out of band verification (email, SMS) to enroll new devices. This was one of my big worries at the time, because HOBA required those out of band mechanisms for enrolling new devices. Thankfully I don't have to fight that social problem too... lots of somebodies have done that for me in the meantime.

So I decided to give my prototype another look, and see if I could make it into what I had hoped WebAuthn was. Happily, all of the algorithms and backend code from my prototype are still relevant; it was just a matter of replacing the javascript versions of the crypto with the more civilized WebCrypto version. Most of the effort was just dusting the cobwebs off the code and stripping it to a bare minimum. In fact, refactoring the crypto code to allow both to run side by side as well as actually writing the WebCrypto driver took all of one day, and a lot of that was due to some wacky to/from PEM conversion that was getting me wrapped around the axle and had nothing to do with WebCrypto at all. I've put both versions of the code up on GitHub as an example of how this problem space can be attacked in a much more straightforward way if you don't need the added security of crypto dongles. The server code is written in PHP. Sue me. It could trivially be ported to any other language, and the key issue is integration with your own enrollment and login code in the backend, so it serves only as an example in the first place. The HOBA-related code is actually very small and pretty easy to understand. The new device enrollment is probably the hardest to understand, but the main takeaway is that out of band verification of ownership of email, phone numbers, etc is pretty common these days, so lots of sites have experience with deploying that. When I first wrote my HOBA code, that was much less prevalent.

There are two pretty big open issues. The first is the most straightforward, which is whether it should be using a nonce from the server to validate freshness instead of time. My guess is that the answer is yes, with an implementation of a Digest-like challenge (RFC 7616) alongside the original time based replay protection. The second is how to get enough review to actually believe that it works and doesn't have holes. I've been thinking about writing an internet draft and floating it at IETF, but I'm not sure they'd want to take it because the client and server code are definitionally controlled by the same entity, so it would mainly be for security review, not protocol agreement across different vendors.

I have created a site to demo HOBA as well as a GitHub repo. Give it a spin and take a look. The demo is stupidly simple: join the site, log out, log back in. If you want to enroll a new device, either find another one or just use another browser and try to log in with your username. The backend will send mail to verify the new login. The only difference with all of this is that there aren't any passwords.

Friday, May 1, 2020

The Water Cooler Problem

 

Telecommuting Works, but is Different


Having telecommuted for almost a couple of decades off and on (mostly on), I can say the technology has come a long way. As a software developer and a networking geek, it's pretty trivial to work from home these days. There are undeniable benefits to working from home, like saving the time, frustration and money of commuting, the obvious environmental benefits, and not having to be planted in a god-awful "open plan" row crop. Even if telecommuting were a net neutral on the productivity front, it would be a massive win for everybody, even if it were just a day or two a week.

Telecommuting can work, and can be way more productive. It does take discipline and is an acquired skill, but it can be learned. There are lots of people who say that telecommuting doesn't work, but that raises the question of whether the reason telecommuting fails is because the companies are shit to begin with. Telecommuting does give you more ways to give them the finger, after all. The very act of not trusting their workers is its own problem and says way more about the people who distrust than the ones who stand accused. There is a simple solution for people who goof off: you reprimand them, or fire them. Same as going into the office. Being there physically does not mean they are actually working. Their output tells you that, both in the office and at home. If you think you can tell just by watching people in person, you are deluded.

It is true that some people have a hard time telecommuting though, and not all jobs are as easy from home as being an anti-social software engineer. But telecommuting need not be an either-or proposition. I would often telecommute in the mornings and then arrive late in the office with much less traffic, and then leave relatively early to beat traffic again, so as to have 4 or so hours to interact in person. In other situations, I would almost completely telecommute unless there was some particular reason to come in, like a coworker flying in to have some high bandwidth time with.

That said, telecommuting has its share of problems, both social and technical. In the following sections I will attempt to create a non-exhaustive list of problems, which I'll bundle together as the "Water Cooler Problem". I'm not trying to solve any of these, just enumerate what they are. This is especially relevant given the giant social experiment happening right now with the covid-19 pandemic. They say that things like pandemics cause huge changes. Like the previous one I've lived through, I expect this to be no exception. This is telecommuting's debutante ball, for better or worse.

The Actual Water Cooler  


The Water Cooler is an abstraction for a place that coworkers can casually meet. It could be the lunch room, kitchen or by an actual water cooler. Quite a bit has been written about the benefits of the Water Cooler such as making and keeping social bonds which are not necessarily work related. I'm not entirely sure I buy into this though, at least for the non-work related stuff. When I was at Cisco, I rarely talked about outside life with coworkers unless they were a close friend as well. I barely knew who had kids or not and what outside interests they had. It just wasn't relevant or important. Even over lunch it was more about what was going on work wise. Same with other places I've worked.

That said, there is definite worth in having lunch together and informally talking about what the current problems are, various shitshows around the company, and other comings and goings. Virtually, this doesn't happen that I'm aware of. There isn't much to prevent it given current technology: you could just have a standing meeting called "Lunch?" at noon every work day for your group, and you can gross out virtually at them chomping down a hideous pb&j sandwich. It's not the same, but it wouldn't be especially different. Given the current lockdown, maybe it will become more popular. When I mainly telecommuted, I really didn't miss it particularly. Other people may value it highly. So I have mixed feelings about this entire form of interaction.

The Over the Cubicle Phenomenon


Sticking your head over your cubicle wall, or hanging out in the doorway of a coworker, is a classic example of how workflow happens when you are in the office. You ask a quick question and get a quick answer within a few exchanges, or it can turn into something bigger because you realize that you are both clueless and need to hash it out.

The main aspect of the over the cubicle problem is that it cannot require a classic three-way handshake before information flows, which is what current meeting tech requires (e.g., a meeting invite...). Information must flow from the start of your presence over the cubicle, and not after a response. This implies that some amount of gate keeping needs to be available to limit who can look into your cubicle, based on "location", hierarchical distance, and social availability, so as not to subject people to interruption denial of service attacks.

Another factor is appropriate interruption. Interrupting people has been measured to impose a significant amount of time to get back to the previous task. A common mode in real life is to give a quick "give me a sec" or "can you come back in a few" so they can finish up what they are currently working on. The other thing that real life gives you are hints that they are not available at all and that you should come back later, like if they are already with somebody, or they look like they are really busy. That is a much more difficult task remotely. As in, I have no clue how you'd do that.

Chat is Not a Substitute


Chat is sort of a half-way point between email and a meeting. And while it's often good enough to hash out a problem, it's not a panacea. The main thing I see is that it is rare for chat to upgrade to face to face style interaction when that is actually appropriate. Maybe it's just me being an old geezer and not knowing that the younguns do this all the time, but it's definitely not been my experience. It needs to be extremely seamless to work correctly. Like one click from the initiator and with little or no effort on the recipient's side.

Which points to the second problem: there seem to be social barriers to doing that. I really don't know why, but it would just never occur to me to change to face to face midstream. If I am the norm, that means that it will take some amount of social training to make that an acceptable thing to do.

Spontaneity and Brainstorming


Part of the supposed benefit of the Water Cooler is spontaneity, so that you can brainstorm. The chance meeting that turns into something bigger. Sometimes much bigger. When I was working at Cisco, a small group of upper level engineers were tasked with figuring out what Cisco could do to help with the spam problem. A chance meeting between myself and one of the others (Eliot Lear) allowed me to talk about something I was thinking about without having to show my cards to everybody else in case it was insane or useless. I might have done the same by email, but it was better in real life because the back and forth is faster. He ended up liking parts of it, and was dubious about other parts, which gave me confidence to explain what I was thinking to the larger group. The rest, as they say, is RFC 4871 (DKIM), which signs almost every piece of email on the internet these days.

So that was definitely a success story, and I'm sure there are zillions of others just like that. It's hard to say that it was the thing though: I'm pretty brave at revealing my ignorance so it would likely have gotten out one way or the other. Others are not as brave (or reckless) as I am, so bouncing ideas off of others privately can be really important. The current state of tech with respect to conferencing is definitely not conducive to this kind of interaction. The main question is how much it matters.

Before and After Meetings


Before and after formal meetings is a place for the Water Cooler effect to happen. On conference calls, allowing people to chat amongst the other participants before the meeting is pretty common, but it seems like either it's off by default or meeting owners choose to disable that feature, which is puzzling to me. After conference calls, however, is completely different: people just hang up. In real life, it is much more nuanced. As people leave, they may linger either in the conference room or out in the hall, and chat either one on one or in smaller groups. There may be more than one of these groups. It may be social, or more often it's a followup to the meeting itself but in a smaller setting. These are completely spontaneous and often more informative than the structured meeting itself. It is also a very good place for junior and senior workers to mix, and especially for more junior workers to be able to be more open with their ignorance than in an open meeting setting. It is super important that junior workers be comfortable knowing that they don't know it all and just ask and learn. That is a serious problem for telecommuting.

This to me is probably the most important problem. On the bright side, it seems like it could be amenable to technical solutions. But there are always the social implications that make it hard. How do I know if they have to jump onto another call? How do I know that in real life? It's pretty obvious if they are scurrying to their next meeting. If they hang out a bit finishing up, that probably means they are amenable. So it's pretty easy to gauge in real life. In virtual life? I don't know.

Mixing of Junior and Senior 


Cisco was/is a complete creature of mailing lists. It had a pretty unique mailing list though, called clueless. It was populated by junior engineers all the way up to fellows. Though it was much more technically oriented, it had lots of participation from high level execs -- often geeks turned suits, but not all. I'm sure that Cisco is not unique with this kind of interaction, but it was a very curious creature in a way. It allowed younger engineers to actually interact with people who are gods in the networking industry and see them as people rather than just technical specs and delivered products. This one is cranky, that one is surprisingly social and very accessible. You find out what's going on around the company, what people are interested in, what people's hobby horses are, etc. Likewise, the senior engineers get to see the up and coming engineers and what their talents are, and how they might be worth helping along to grow that talent.

For a large company that is almost impossible to do in meat space, so here virtual is actually a win in the large. In the small, however, virtual doesn't work as well. In person, it's much easier to spot somebody who probably knows what you need to know and ask them, getting in a few minutes what otherwise may have taken hours or days. Email works to a degree, but in person is better, especially if a whiteboard is helpful (which it often is). And there is the social cuing that makes it easier to ask somebody something in person who you barely know, than sending off email that may go unread for a long time.

Whiteboards and Casual Meets


Often you'll meet up with somebody either by chance or by interrupting them, and find the need for a whiteboard. I've never used one of these virtual whiteboards in formal conference calls, but I am extremely dubious that using a mouse to draw something is in any way a substitute for a marker. I suppose you could get an e-pen, but that's just one more thing on my desk. Maybe phones and/or tablets with touchscreens make this easier. But that too is problematic, because one of the advantages of a whiteboard is that it is physically big, and phones and tablets are tiny. I use a 4k monitor which is attached to my Mac, so I suppose that might be a possible compromise, but I'm not sure how it would work. Happily, this post is not about solution space. The problem remains a problem, and this post remains something to point problems out.

Customs are Different Online?


People do not seem to like video on conference calls. It's been available in formal meetings for ages, but people don't seem to use it from what I've seen. Which is very peculiar, because obviously in real life you can't do otherwise. There may be something deeper going on here though. When I am in a real life meeting there is no way I can forget what I'm doing when I'm constantly bombarded with the feedback of seeing and hearing the people around me. I had one mortifying incident happen to me as I was driving to San Jose from San Francisco on a conference call. Somebody cut me off, and as normal I screamed at him (even though he couldn't hear, of course). It was not on mute. Oops! That would never happen in real life, so there may be a rational reason we may not want to show our mugs: to make certain we don't floss while sitting in a meeting.

It's all the more peculiar because humans are social animals and visual cues are extremely important. There seems to be a big difference between a bunch of people staring blankly at their computers and real life. In real life, attention (or not!) is given to the person who is talking, but can veer off to look at somebody else to judge what their reaction is, or to signal to another that a response is necessary, etc. Since you can't tell what the person on the other side is actually looking at remotely, those sorts of interactions are not possible.

Lack of visual cues makes another problem almost insufferable: blowhards can speak forever and you can't get a word in edgewise. I mean, if I wanted a lecture, record it and I'll listen to it when I feel like it. If at all. This is in fact a technical problem largely of our own making, but it is also a social one in that the blowhard may believe that his prolific words of wisdom have captivated the audience. This simply doesn't go down that way in real life. Exasperation is a readily observable social cue, as is just signaling that you want to talk. Which isn't to say that they can't blowhard in real life, but there they get non-verbal feedback too.

Etiquette  


There are a lot of questions about etiquette in the virtual realm, and it's not clear that they map 1:1 to real life. Or at the very least, they present new dimensions to how you map real life etiquette onto a virtual one. Suppose I wanted to transition from text to a live conference. If I offer and you decline, that is bad. If I offer and you accept unwillingly because you are socially required to, that is bad too. This is especially true of meetings and chance encounters. In real life, you can see somebody and guess whether they are really busy or whether they are coming up for a breath of air. Or at least it's a lot easier, if imperfect. Likewise, even if you intend to interrupt, you can let your presence be known and let them tell you when it is OK to interrupt.

There are no doubt lots of other areas of etiquette which may well be different or new in a virtual setting. As anybody who knows me can attest, I am treading on thin ice with anything pertaining to etiquette, so I'll stop while I'm behind.

Adding it Up: How Important is the Water Cooler?


As I've said, I've telecommuted a lot in the last 20 years. Even though I have a long, if incomplete, list of differences, they are not all equally important. It is manifestly the case that you can do good work completely from home without any of the real life features outlined above. Or that you can get by mostly virtually, with the occasional meetup for high bandwidth interaction. That tells me that they are much more nice-to-have features than must-have features. Some nice-to-haves are much more important than others, though. The End of Meeting problem is a pretty serious deficiency, especially for more junior coworkers. That is when the items discussed are freshest on people's minds, but just hanging up abruptly stops those interchanges dead in their tracks.

The Over the Cubicle problem is also pretty serious. For years, I didn't have anybody to just geek out with on the spur of the moment. Once you schedule an actual meeting, you've already lost any momentum toward just getting past something you're working on -- formality is the enemy. That I really missed. The Actual Water Cooler problem is much more meh for me. I'm willing to be convinced that I'm the outlier, but I'm suspicious that it is a solution in search of a problem. I'm more than willing to not know about other coworkers' lives if it saves me from being planted as an open office row crop. It is deeply satisfying that study after study shows that the collaboration benefit of open offices is in fact negative, and that it was always about saving money with a post hoc rationalization about the benefits.

Telecommuting can definitely be done successfully, but we are still quite a ways off from solving some serious downsides. We'll be learning for years to come given the Great Covid Experiment in Telecommuting. For one, we'll find out about the validity of baked-in biases, given that this was widespread and without the usual self-selection problem. We should also be able to gauge what the productivity impact is, and how it balances against all of the upsides. Since a large slice of the people who telecommuted are new to this, they can feed back what their hurdles were. It should also inform the industry what technical problems are out there for the new telecommuters, and what problems remain for old-hat telecommuters.

Interesting times we live in, fer shure.

Sunday, April 26, 2020

The Toxicity of Interview Programming Tests, Pinché Cabróns



When I was at Cisco, my last project was about what Cisco could do about the email spam problem. Cisco had exactly no presence with email in any form, so this was as greenfield as it could get within the confines of $MEGACORP. We got chartered by Dave Rossetti and got together a bunch of senior engineers, where we immediately started tapping our white canes in the email universe. I remember talking to Eliot Lear one day about how maybe we could affix a signature to each piece of email from a stable private key from the sender and let the magic of Bayesian filtering do its job. I don't think that Eliot was overly optimistic about unanchored keys -- although I don't think he laughed out loud either. I floated the idea with the rest of the group after that. I'm struggling to remember whether Jim Fenton (Jim, help me?) had been thinking down similar lines or not, but the end result is that our white canes were now tapping at a furious rate at what would ultimately be called Identified Internet Mail (IIM). IIM had a shiny new thing we called a key distribution server (KDS) which bound the key to a given domain, and used HTTP to transport the keys to the receiving domain to verify the signature, so I'm sure that Eliot was assuaged.

We wrote an internet draft and started socializing it. In the meantime, I hacked up a sendmail milter (code that sits in the mail flow pipeline and can munge the email) and hashed out a lot of the boring on-the-wire message syntax, mainly by needing to get down to that level to be able to code it up. Jim and I were much more interested in the semantics, after all. After having a working prototype, along with our socialization outside Cisco, we found out that Mark Delany at Yahoo down the street from us was working on his Domain Keys draft/implementation, which looked different, but eerily similar too. We finally got together and made our cases to each other -- we were looking from the vantage point of the enterprise, and Mark understandably was thinking service provider. After some soul searching, Jim and I decided that the main differences were the syntax in which the signature parameters were sent, canonicalization, and the use of DNS vs. HTTP for key lookup. Truly yawn inducing details, thinking back about it.

So DKIM was born -- Domain Keys (Mark) Identified Mail (Jim and I). This led to a very large push outside with lots of IETF folks. One of the remarkable things about the experience is that the eventual working group had rough consensus and running code in spades, and well before the actual working group was spun up. I managed to eke out a small victory in being the first one wanting to interop code with others, with Murray Kucherawy, then at Sendmail, following like the next day. Murray's code worked a lot better than mine, but he had written the DK milter, so he was at a big advantage.

During the journey, we started talking to a company called Ironport, who were also participating and knew what Murray and I had done. Jim and I were part of the due diligence team that vetted Ironport, which Cisco went on to buy. In the meantime, internally we had started our own effort, and my DKIM code was put into Cisco's mail pipeline on a racked-up box (Cisco is a hardware company... it's what you do). So not only did I write the code for DKIM, it was running in a Fortune 500 company's mail infrastructure, and for a company that lived and died by email, that was no small thing. It never had a hiccup.

So what does this have to do with Toxic Programming Tests, you quite reasonably ask? All of the above should show to any idiot that I'm quite capable of writing solid code in short order. Once Ironport was part of Cisco, it was obvious that our project was done, so I decided to try to jump over to the Ironport acquisition. When I finally interviewed, they gave me a programming test -- strstr as I recall. I wrote a shitty version of it but said that in real life I'd get out Knuth and lean on his genius. I mean, I had been out of school for 25 years by then... these algorithms are not on the tip of my tongue. Afterward, I was told that the universal reaction was that I couldn't write code. I'm like what in the fucking fuck? They reduced all of the evidence to the contrary to a single coding test that I wasn't even expecting! In another interview later, I was vetoed by a shitty junior engineer because I couldn't recall off the top of my head the Java keyword for a constant (final). I know lots of languages and it takes a little bit of time to swap them in and out. FWIW, I knew the answer but just couldn't remember it in the interview.
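For the record, the "shitty version" you scribble under interview pressure is more or less the naive quadratic scan below (a sketch in Python rather than the C they would have asked for). Anything smarter -- Knuth-Morris-Pratt, say -- is exactly the sort of thing you crib from Knuth rather than re-derive at a whiteboard.

    def naive_strstr(haystack: str, needle: str) -> int:
        """Return the index of the first occurrence of needle in haystack,
        or -1 if absent. O(n*m) worst case; production libraries use
        smarter algorithms like Knuth-Morris-Pratt instead."""
        if not needle:
            return 0
        n, m = len(haystack), len(needle)
        for i in range(n - m + 1):
            if haystack[i:i + m] == needle:
                return i
        return -1

    assert naive_strstr("mailing lists are toast", "toast") == 18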

That is toxic. Coding tests have always been pretty close to useless because different people like to code in different ways. I like to be holed away and absolutely loathe people staring over my shoulder. But guess what, that's exactly what coding tests force on you! So by all means, ignore your lying eyes and base everything on 25 year old memories of algorithms which in real life you'd be fired for rolling your own. Rinse, repeat. Over and over. Interviewers are completely convinced that if you don't know whatever obscure algorithm they are throwing at you, you can't code. Research by Google of all people -- because they are the absolute best at this toxicity -- showed that coding tests and a lot of the other toxic interviewing they did were not only useless, but actively harmful. I interviewed once at Google before this revelation and got the same idiotic treatment and rejection. For years they would call back asking me to interview again, and every time I said no and gave the reason why. I finally started telling recruiters that if it involved programming tests, I wasn't interested.

The thing that's most stunning about all of this is that they never want to talk about what you've done in the past. The excuse that I've been given is that it could all be a lie. But memorizing algorithms is its own sort of lie. In my opinion, your past is an excellent place to quiz the interviewee because they had better be able to speak to the architecture, design and implementation with authority. One thing I would look for is the subconscious "we"'s, which can mean that they are embellishing their part in the project. But that's a different rant.

After doing some research on this admitted hobby horse of mine, my conclusion is that a Fizzbuzz-like test might be ok, but treating even mid-career engineers the same as fresh college grads is lunacy. As I've written before, a lot of these interviewers are really just looking to have their penis extended, so they are doing these kinds of interviews in bad faith in the first place. Yet way too many companies seem to rely on these kinds of tests as if they were delivered wisdom. Sorry, no I'm not going to re-read Knuth to get a job at your shitty-ass company that I don't know a damn thing about, let alone what I might be doing.
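And just to calibrate what "Fizzbuzz-like" means, this is the entire bar such a screen sets -- a trivial sketch in Python; it weeds out people who flatly cannot program and tells you nothing beyond that:

    def fizzbuzz(n: int) -> str:
        # The classic screen: multiples of 3 print "Fizz", multiples of 5
        # print "Buzz", multiples of both print "FizzBuzz", anything else
        # prints the number itself.
        out = ""
        if n % 3 == 0:
            out += "Fizz"
        if n % 5 == 0:
            out += "Buzz"
        return out or str(n)

    print(", ".join(fizzbuzz(i) for i in range(1, 16)))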


Tests, to god-damned hell with tests! We have no tests. In fact, we don't need tests. I don't have to show you any stinking tests, you god-damned cabrón and chinga tu madre!

Weaponizing PC Aspirations from Poorly Trained AI's [frank language]

I was banned from Reddit after a short stint of posting to r/askgaybros recently. The person to whom I was responding (1234ideclareworldwar) had just got done telling me that I either had AIDS-related dementia or was mentally retarded because I somehow had a chip on my shoulder. I have no clue how those even relate to each other. He had previously said that he wouldn't date somebody who was HIV positive because they were in effect reckless barebackers, including all of the people who died at the beginning of the pandemic. My crime was to point out that there was no such concept of "barebacking" back then -- it was just gay men having sex with each other -- and that he'd either be the type of person who abandoned his friends as social pariahs to die a painful death alone, or he could be one of those who died a painful death alone himself.

Poof! That was it. Presumably enough of the people of his persuasion (and there are lots of hateful young gay incels just like him) reported me and that was that. The content, while somewhat graphic, was certainly not harassment -- it was the literal truth. I was only trying to explain in a pointed way what the situation actually was to somebody who was clearly Monday morning quarterbacking, and full of the yummy privilege of hindsight.

The coup de grâce, however, was that when I retold this story on Facebook as a comment on a friend's posting, I was put in Facebook jail as well. I had just described what happened and my experience with the legions of gay incels that seem to populate that subreddit and their clueless hatefulness. Apparently as a gay man I am not allowed to use the F word (and I probably can't even say it here because Google's AI's are probably no better) in any context, even though as a gay man I have been actively trying to reclaim that word as our word. It's not as easy to tell with Facebook, but I doubt that any of the original poster's friends reported me. This was most likely Facebook acting as net.nanny on its own. When I appealed, it said that the appeal might not get reviewed because of the Covid-19 pandemic, but in fact it was "reviewed" a few minutes later with the same result. That says that it was not, in fact, a human but some poorly trained bot (read: egrep) making the decision.

Ok, enough of the pity party; it's just a concrete example of something that is without doubt happening on a widespread basis. The larger problem is that these poorly trained bots (I hesitate to even call them AI's because they seem to be at the level of egrep) allow people with bad intentions to game the system. These bots are in fact punishing the very people they are intended to protect. Since they do not have the capability of understanding context -- and even human moderators generally just do peephole scanning -- they are enabling people to use that lack of context to retaliate against speech they do not like.

I should point out that this is fine for moderated groups/subreddits which have their own rules. Moderators can be a pissy bunch, but in the end it is their group to be pissy about. You are always free to create your own group with its own rules. The problem is with platform-wide moderation, where it is painfully obvious that it is not up to the task of providing a fair and even moderation service. In the Reddit example, the user whom I supposedly harassed is still posting away with complete impunity. I was dished up more vile and harassing -ist (fill in the blank) abuse in those 4 months by young gay men than I ever was by homophobes on Usenet's unmoderated soc.motss in the many years I participated. While Reddit does not disclose its moderation algorithms (security by obscurity!), it's pretty obvious that it is heavily influenced by the number of reports. While that may seem reasonable -- homophobes coming into a gay group is not very desirable -- it can have the perverse effect that the young gay men in my example who reflexively dislike older gay men -- this is common as dirt -- can game the system to get rid of them. The platform-wide bots that enforce this are clearly not up to the task. Yet enforce it they do anyway -- poorly and unevenly.

In the case of Facebook in particular, it is even more egregious. When a marginalized group cannot talk about their marginalization in frank terms, the platform is reinforcing that marginalization. As far as I can tell, anybody can report a comment if they can see it. While that is good for catching actual bad actors, it can be weaponized by bad actors to report content in retaliation against people they dislike, often precisely because of the victim's marginalization. Facebook is particularly awful because you can't even try to give context while appealing the punishment. I suspect that is because either the bots cannot do anything useful with it, or it makes human moderation too costly. Reddit has pretty much admitted the latter. So basically this moderation is nothing more than a glorified egrep for the most widely used social media platforms on the planet.

Topically, I can almost guarantee that people have already been put into Facebook jail for making fun of Trump's dangerous and insane suggestion that people ingest cleaning products to protect or cure themselves from Covid-19. Since they can be trivially reported by Trump supporters as incitement to harm or fake news, it is up to the bots to detect irony. Irony is extremely context sensitive, and on Facebook, writing on your own or a friend's wall, it often comes down to actually knowing the parties to the conversation and whether it's irony or not: "of course he doesn't mean it literally, it's $FRIEND". Bots or even human moderators surely have no clue. Since the jail message to me mentioned Covid-19 as a reason for a possible delay in review, I'll bet a buck that it is because their bots cannot distinguish people rightfully lampooning a dangerous charlatan president from the morons who actually take his idiocy at face value and pass it along in all seriousness. Putting even one person in Facebook jail for spreading the word about yet another dangerous and incompetent thing that Trump is touting that should be avoided is bad. Very bad. Forbidding this kind of speech is an existential threat to our democracy as it gives the bad actors a trivial way to game the system by silencing the very people the platform claims to protect. Just as I am not allowed to call out ageism in the gay community on Reddit, people who fear that our democracy is coming apart in real time are silenced on Facebook by the very people who cheer that on.

And that gets to the biggest problem of all. Platform-wide moderation is a cost center. There is little incentive to do anything to it other than reduce its cost. Being accurate and fair is almost certainly way down the list of priorities. Good moderation is extremely expensive because you have to hire and train people who are then given an endless supply of judgement calls -- a huge number of which they are absolutely unqualified to make. Do you think that people at moderation centers in Morocco have any clue about the subtleties of gay male culture in the US? Of course they don't, and that is putting aside the cultural biases of the moderators. Since even bad human moderation is expensive, social media has been deploying even more clueless "AI's" to keep costs down. The "AI's" deployed are even less equipped to deal with the subtleties of human speech and interaction. For all the hype, AI's are dumber than shit.

This sets up a huge dilemma: cyber-security -- and maybe security in general -- is asymmetric, and the bad guys have a huge upper hand. Bruce Schneier wrote a great blog post about exactly that asymmetry. It is trivial for attackers to slice and dice up Facebook's population -- that's the service they make their ad money from, after all -- and target them for reporting. Even assuming that there isn't an API to automate the reporting task, there is a huge effort difference between, say, a human given a list of general things to report the target for, and the moderation task itself. Perversely, the more virtue the social media platforms try to project, the easier it is for attackers to subvert their moderation, since the bar is much lower, sweeping more and more people into the false positive pile.

The long and short of this is that while punishment for harassment might be a good idea in theory, as with Potter Stewart's famous quip about pornography -- "I know it when I see it" -- "seeing it" does not scale to internet scale. It's also clear that we have no clue how to solve that any time soon. Given that it is trivial to subvert on a small scale, it should cause people to shudder at the thought of censoring weapons being used at a nation-state scale, either on a state's own population, on its rivals' populations, or more likely both. It would be horribly ironic if the go-to way to stifle dissent turns out to be using the tools of virtue as a weapon, wielded by those who have none.

The silver lining of all of this for me is that I have been cut off from the horrible people I have been dealing with, and it's feeling pretty good thus far. Fuck you Facebook. Fuck you Reddit. I am not your product anymore. I have no need for you. I have no use for your enabling hateful Trumpanzees who are the poster children for Dunning-Kruger Syndrome. Nor do I have any use for hateful young ageist incel gay boys who think that it's a good thing we died of AIDS as they bask in the moral superiority of hindsight. The joke is on them: you'll end up being old, gay, and hated and wonder what happened. And best of all, the Trumpanzees will all be dead from Evolution in Action as they infect each other with the Covids, and munch on Clorox Chewables as a cure. Life is good.

Friday, April 24, 2020

On Second Thought... SIP Security

I have argued here that SIP's STIR/SHAKEN is misguided and is probably solving the wrong problem, and that the "right" problem is in fact the sip:mike@mtcc.com problem. But what if we are both wrong? The most obvious question is whether there is going to be anything resembling the PSTN at all in the future. Phones are increasingly not phones at all, but instead devices to access internet services. While email is probably bumping along at the same clip or growing, actually talking on a telephone is distinctly in decline, especially among the youngins. They certainly use SMS texting, but there are any number of wholesale replacements for SMS-like texting. Given the lack of end-to-end privacy of SMS, apps like Whatsapp fill that void and are very popular from everything I've heard. Given the heavily regulated PSTN and the tension with law enforcement, it seems highly unlikely that SMS will ever provide that sort of privacy.

So the obvious question here is whether in, oh say, 10 years legacy telephony (regardless of how it's transported) will be very important. My bet is that as a means of communication the answer is "no". Sure, old geezers like moi will continue to use the old fangled things, but for younger generations the decline will surely accelerate. Lest anybody think that I'm saying that in 10 years' time the PSTN will evaporate, I'm definitely not saying that. But my suspicion is that its raison d'être will largely be overtaken by new technologies. Given that telephony is almost 150 years old, there are definitely a lot of legacy things baked into everyday life that will still be needed for decades to come. But those needs are increasingly around the edges, and they are slowly but surely getting internet-enabled analogs.

What that implies to me is that more and more people are going to just turn the telephony functionality off, or at least find ways to not have it annoy them. Even in my geezerhood, I am sorely tempted to do exactly that given the spam problem. All of this puts the telephants into an interesting situation: having to provide an expensive and heavily regulated service that is in free fall. Long gone are the days when telephony was a profit center. Mobile providers haven't charged for telephony in ages, and landlines are becoming jokes used to outwit clueless teenagers. One thing we can be sure of though: if something ain't a profit center but you can't get rid of it, you put exactly as little investment into it as possible.

The other thing that has happened since I wrote the original post is the Covid-19 pandemic. They say these kinds of things have a way of really reshaping society. It was certainly true of the previous pandemic, especially for gay people. HIV and the corrupt and incompetent response to it shaped a generation of activists who had no other choice but to take things into their own hands to effect change. It also forced several generations' worth of tireless work on anti-retrovirals and pushed the envelope of biology in general. We are surely reaping the rewards of all of that work, including the possibility that HIV drugs like Lopinavir may be helpful for Covid-19 too.

Since Covid-19 affects everybody, it is likely that the change is going to be enormous. Working at home, as well as using things like Zoom for social interaction, has become a major change in daily life. It is highly likely that this petri dish we've been thrown into is going to force us to take a hard look at why we need to go into the office every day of every week. I could be wrong, but telephony is probably not the go-to answer for either telework or social interaction. This further contributes to its downward spiral in relevance.

While it seems to be a pretty safe bet to say that telephony qua telephony is in decline, it's still an open question in my mind whether that also applies to SIP qua SIP. The work on E.164 identities seems to me to be a lot of work for little long term gain. But I really don't know whether SIP is used much outside of telephony. Most of the new communication services don't seem to have any inter-provider needs, so SIP isn't a requirement. And if you take the inter-provider problem off the table, the spam problem is reduced to the more tractable intra-provider problem.

So is there actually a DKIM-like analog problem in SIP beyond telephony? I think that it's an open question. Centralization has become the watchword for the last several decades. On the other hand, centralization is starting to create backlash as nations and governments watch it warily. A Bell-like breakup of, say, Facebook could happen. Or nations might take back messaging and video services and we'll need inter-provider connectivity after all. Who knows? I sure don't.

As always, one engineer, three opinions.

Saturday, February 15, 2020

SIP: what about the From: header? No love?

I posted a while ago with questions about SIP's STIR/SHAKEN stuff (RFC 7340 has a very good problem statement worth reading) that I became aware of. Well, it turns out that it's really trying to shore up the miserable P-Asserted-Identity mess. I actually kind of like saying I told you so, so if that makes me a horrible person I'll own it. For SIP, P-Asserted-Identity is really nothing more than passing on the PSTN identities (caller-id, e.164 addresses). Which raises the question of whether you can trust their contents. The answer now is the same as the answer 15 years ago, and that answer is... no. The only real surprise is how long it took for $EVIL to figure this out.

For many, many reasons trying to give some guarantees about whether somebody is allowed to assert a given e.164 address is a very hard problem. The new standards have had to deal with this and it's not pretty. Not a knock on the work, it's just that the problem is really, really awful and hard: the PSTN never, ever envisioned the sort of trust model that has become common, or the financial incentives to not care about the problem. This is going to take a great amount of effort to roll out and that's just the beginning. $EVIL is not a static thing, and if I understand correctly there are some pretty significant holes that can't really be plugged.

Which got me to thinking. Why in the hell do I even care about e.164 addresses in this day and age? They are, actually, quite a nuisance. I can barely remember my own phone number, let alone anybody else's. SIP from the very beginning didn't really envision co-existing with the PSTN. It was a new way to use internet mechanisms instead of the inadequate PSTN standards. SIP was just like email in that it had headers, one of which is a From: header whose addresses are identical in form to email addresses. The idea is that if you wanted to email me, you'd use mike@mtcc.com. If you wanted to call me, you'd use mike@mtcc.com. Simple. It wasn't until telcos started getting interested in SIP that PSTN integration started to rear its ugly head. And hence the sorry situation we're in today.
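To make that concrete, here is roughly what a minimal SIP INVITE looks like on the wire (addresses and tags made up, headers trimmed to the interesting ones). The From: header is just an email-style address; the P-Asserted-Identity line is the bolted-on PSTN identity that STIR/SHAKEN is trying to shore up:

    INVITE sip:mike@mtcc.com SIP/2.0
    Via: SIP/2.0/UDP host.example.com;branch=z9hG4bK776asdhds
    From: Alice <sip:alice@example.com>;tag=1928301774
    To: Mike <sip:mike@mtcc.com>
    Call-ID: a84b4c76e66710@host.example.com
    CSeq: 314159 INVITE
    P-Asserted-Identity: "Alice" <tel:+14085551212>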

So let's go retro for a moment. Maybe the original idea of using From: addresses wasn't so bad, and it is certainly widely in use today. A lot has changed since the telcos butted into the VoIP world. For one thing, the PSTN is practically extinct. If it weren't for cell phones and the last mile it probably would be extinct. From what I can tell, it's IP the second it hits telco equipment. My little provider here in the Sierra has a gadget that terminates POTS and sends it out as SIP and RTP, has a DSLAM, backhauls IP over fiber, and is battery backed up from the CO. Pretty nifty that stuff I've worked on is a block or two away. I think it's pretty much the same for the cellular RAN networks. Since POTS is pretty much dead, that just leaves cellular. And if you believe the hype about 5G, it will be pretty redundant since 5G supposedly deals with the jitter, latency and other things that make VoIP a little dodgy on 4G. I'm not sure of the exact details, but I'll take them at their word that VoIP will be pretty acceptable on 5G. Update: found out that VoLTE is a Thing. So PSTN stuff is now almost completely redundant.

So it's a pretty SIP-y world, and it's about to get a lot more so. If I have a SIP UA on my phone, I can completely decouple who provides the bits from who provides the rendezvous services. And I can guarantee you that the telcos are not going to be my first choice. So I may well get my wish that the From: address becomes what people expect on an incoming call, not PSTN anachronisms. So all is good, right? Well, no. Not quite. We still have the problem of spoofed addresses, but now it's the From: header being spoofed instead of the P-Asserted-Identity header.

As far as I can tell (and I could be wrong because there's a mountain of SIP RFC's), there's really not a viable end-to-end or end-to-middle or middle-to-middle kind of way of asserting identity. Yes, I know there is an RFC for S/MIME, but client certs have never seen any wide adoption, and probably never will. And S/MIME is really about end-to-end crypto which, while useful, is not exactly the problem that SIP's version of the "caller id" problem is trying to solve.

What we learned with email back in the DKIM days is that end-to-end authentication is a hopeless task. Domain based aggregation, on the other hand, seemed quite tractable. That is, a domain can claim responsibility for a particular message (an email in DKIM's case) as having come from or passed through its infrastructure. The way we characterized it is that DKIM is a "blame me" mechanism if something malicious happens with one of its users. The tradeoff that DKIM made, however, is that you really don't know if the user part of the email address is who they say they are. But for the purposes of reporting abuse that's not necessary: it's really the sending provider's problem to figure that out. As it turned out, a lot of providers and probably all of the major providers nowadays require SMTP auth. I'm not sure if there was any cause and effect from DKIM to adoption of SMTP auth, but it was certainly in the air at the time.
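As a refresher, the entire "blame me" assertion rides in one header that the signing domain adds (values here elided and purely illustrative):

    DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=mtcc.com;
            s=selector1; h=from:to:subject:date; bh=<body-hash>; b=<signature>

The d= domain is the party claiming responsibility; a verifier fetches the public key from selector1._domainkey.mtcc.com in DNS and checks the b= signature over the listed headers. Nothing in that transaction proves who the user is -- only that mtcc.com vouches for having handled the message, which is exactly the "blame me" property.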

Now back to SIP. Given the spam we're seeing, it sure would be nice to have a "blame me" mechanism to see who injected a particular piece of voice spam into the SIP legs of the INVITE. Reputations can be aggregated at a domain level, and signing policies can be advertised for evaluation of the message. While I might not trust my provider on every front, our interests are aligned when dealing with spam and misuse. Even if I can't verify the incoming INVITE directly (say, you're on a 4G phone), I do trust that my provider can verify it on my behalf, and they could stuff the verified message's From: into the caller-id, or somesuch. With VoLTE they're using SIP, so you wouldn't even need to do anything heroic: just show the From: address.

A nice property of this is that the unpluggable holes with e.164 address security aren't a problem in a world that is becoming more and more native SIP. We should be looking forward to that future in addition to any backward looking legacy problems. DKIM has been amazingly successful and extremely widely deployed at tremendous volumes. And since email message structure is the template for many protocols including SIP, it should be pretty easily transferable. In fact, back in the day I actually wrote a SIP DKIM signer and verifier just for fun, so mechanically there aren't any problems.
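That signer is long gone, but a sketch of the mechanics is easy enough. This is a hypothetical toy, not the original code: it uses Python's cryptography package, signs a made-up set of identifying headers, and substitutes a crude strip-and-join for DKIM's carefully specified canonicalization:

    import base64
    from cryptography.hazmat.primitives import hashes
    from cryptography.hazmat.primitives.asymmetric import padding, rsa

    # Hypothetical choice of which SIP headers identify the caller.
    SIGNED_HEADERS = ["from", "to", "call-id", "cseq"]

    def sign_invite(headers, key, domain, selector):
        # Crude stand-in for DKIM's relaxed canonicalization.
        canon = "\r\n".join(f"{h}:{headers[h].strip()}" for h in SIGNED_HEADERS)
        sig = key.sign(canon.encode(), padding.PKCS1v15(), hashes.SHA256())
        b64 = base64.b64encode(sig).decode()
        # A verifier would fetch the public key from DNS at
        # {selector}._domainkey.{domain}, exactly as with email DKIM.
        return (f"DKIM-Signature: v=1; a=rsa-sha256; d={domain}; s={selector}; "
                f"h={':'.join(SIGNED_HEADERS)}; b={b64}")

    key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
    invite = {
        "from": "Alice <sip:alice@example.com>;tag=1928301774",
        "to": "Mike <sip:mike@mtcc.com>",
        "call-id": "a84b4c76e66710@host.example.com",
        "cseq": "314159 INVITE",
    }
    print(sign_invite(invite, key, "example.com", "s1"))

The point is that nothing about the mechanism cares that the payload is a SIP INVITE rather than an email; all the hard parts (canonicalization through middleboxes, key rollover, policy) are the same ones DKIM already worked through.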

There are definitely questions to be answered though: Should verbs other than INVITE be signed? Should the replies? I'm not sure what benefit there would be to signing REGISTER, for example, but it may be just as well to sign everything regardless of whether it's useful. And then there is the ever-present problem of B2BUA's (back-to-back UA's). Honestly, these aren't entirely different from the Mailing List Problem with DKIM. The answer there is that the entity in the middle that breaks the signature should re-sign it. And it's probably not as bad a problem as with mailing lists because, if I understand correctly, B2BUA's are mostly being used as session border controllers, which are typically in the same domain as the sender, which is typically not the case with mailing lists.

In conclusion, while it might be worthwhile to solve the E.164 problem, we definitely need to look to a future where it eventually shrivels up and dies. The future is being able to verify the sending domain of SIP messages, and especially knowing whether the From: address checks out, which should be the case for a large percentage of signaling traffic. That would greatly help the voice spam problem since we would be able to reliably blame the sending domain.