(Attempting to) Read Papers: Times, Clocks and Ordering of Events

DISCLAIMER: This is not a detailed explanation of the paper, but rather just my thoughts on it.

So I’ve been trying to get started on reading CS papers with a mediocre amount of success. I started with the Byzantine Generals paper but that sort of melted my brain. It was a case of me going “Holy shit, I get it!” followed by “Goddamnit, I don’t get it” again and again before I threw it away. I need more practice at reading papers and not allow myself to frustrated, is what I’m saying. I’ll go back and read the paper eventually. (This person has a nice explanation of Byzantine Generals that walks through and explains the algorithm. So if you’re like me, you might want to read it).

So the next attempt (and the subject of this post) was Times, Clocks and the Ordering of Events in a Distributed System by Leslie Lamport. I fared a bit better here, in that I got the main ideas the paper was trying to convey. But there were small pieces that I spent way too much time on, which I’ll get into.

A brief outline of the paper:

  • We need a way to sequentially order events occuring in different, independent processes (AKA a distributed system).
  • Hey, here’s this algorithm we can use. It uses logical clocks (really a counter that counts up for every event). It’s super simple! :)
  • But wait, events can end up out of order anyway because we didn’t take into account the things that happen outside of the distributed system. :(
  • Hey, we can fix that if we replace all the logical clocks with real world physical clocks. Problem solved! :)
  • But wait, these clocks tend to drift apart because they are not perfect. Things are going wrong again! :(
  • Hey, we can use a specialized version of our previous algorithm to synchronize the clocks. The clocks will still drift but we now know by how much it can drift. Now we can use physical clocks again to solve our original problem. Hurray! :) pops open the champagne

(Also there’s this theorem and proof at the end that relates the amount of drift can occur between two physical clocks to the distance between the clocks. I just took the theorem at face value and didn’t read the proof because by this time I had spent way too long and I was so done with the paper that I just wanted to finish).

The big ideas of the paper were conveyed easily I think and I was able to grasp them. But there were a couple of mathematical equations here and there that sniped me and I spent a long time banging my head on. I don’t know how it normally is with math in published papers. Maybe it’s because of the word or space limit but in some places Lamport would be like: “Here’s this thing. And from this we can get this other thing.” and I’m like: “Whoa, whoa there! Hold up! How’d we get to this other thing? Show your work, dude!“. I had to sit down with a pen and paper and try to figure out how he got the other thing, doing proper math and stuff (Hello calculus! Hello discrete math! Didn’t think I’d ever see you again). Even then, there was one thing that I still couldn’t derive so I just gave up, took Lamport’s word for it, and moved on. If I’m going to continue reading papers (a given considering that I’m starting my Masters soon) I can probably expect to see more of this. Oh boy!

This paper has a reputation of being one of the most cited papers in CS and with good reason. The ideas in this paper influenced every idea in distributed systems that followed it. That’s my belief anyway. I don’t have enough experience or knowledge in this field to back that claim so don’t hit me. I’m just going with the opinion of the crowd.

I wish I could say say something more profound or interesting or insightful about it. I can tell you that the impact it had on me was that it left me very tired. Maybe someday in the future, when I am a computery boffin who can do computers good, I’ll have something more interesting to say. Maybe not. We’ll see.