Intro
At the turn of the year 2010/11 it may be late to comment on a paper I found in a pile of IEEE Computer magazines; the article dates back to 2006. It is available on the net as a report from the University of California at Berkeley:
The Problem with Threads
Edward A. Lee
Electrical Engineering and Computer Sciences
University of California at Berkeley
Technical Report No. UCB/EECS-2006-1
http://www.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-1.html (pdf)
January 10, 2006
IEEE Computer 39(5):33-42, May 2006
That issue of IEEE Computer was themed "Future Operating Systems (and) Programming Paradigms", and the article was a cover article.
The quotes I use here are from the report (IEEE Computer has edited it a little).
I advise you to read the well-written article. It's outspoken and fun reading!
Disclaimer
This note's title uses the plural "problems" - partly so the two titles differ, and partly because this note is a mix between a review and my own personal views and experience.
Over some years I have published along the line of that article, see http://www.teigfam.net/oyvind/pub/pub.html. I also have some posts here about the same theme, see http://oyvteig.blogspot.com/2009/07/013-overview.html.
"It is widely acknowledged that concurrent programming is difficult" (Lee)
The magazine article says "Concurrent programming is difficult,..". This is what caused my interest, in addition to the title. But this first sentence seems to contradict the title. Why doesn't he say "..that programming with threads is difficult"?
I have attended the WoTUG- and now CPA- series of conferences for quite some years. What I have learned over the years is the opposite:
"Contrary to the belief out there, concurrent programming is easy" (WoTUG/CPA)
Here is the semantics of that statement: "Programming with threads is difficult, but programming with processes is easy". Later in his paper, I think this is also what Lee says. But he is bleak about the future, while the conferences (and Lee's paper) try to do something about it.
When I programmed in occam 2 for some ten years, concurrent programming was indeed easy. I had no interrupts, only processes, so asynchronous I/O was tamed before it could overwrite common data. I had fully encapsulated processes, with parallel-usage rules checked by the compiler. Processes communicated only over zero-buffered, synchronous, unidirectional, named channels. Buffering had to be coded explicitly. Functions were side-effect free, as they were not allowed to use channels. And the compiler checked for aliasing errors (so linked lists were not possible..). Occam 2 has been developed further since it died as a commercial language, and now holds a plethora of new primitives.
Yes, I did have to learn the art of the trade. Some have called it Process Oriented Programming.
"The problem is that we have chosen concurrent abstractions that do not even vaguely resemble the concurrency of the physical world." (Lee)
Even though "not even vaguely" is Lee's strong qualifier here, I would disagree.
The physical world is one axis, concretizations of abstract descriptions another. If my concurrent abstraction may be given a name so that I understand what it basically does, and may be seen as a cognitive entity, then I'm ok. I think this is what Wittgenstein proposed in his Tractatus in 1921.
"Channels", "rendezvous", "port", "protocol", "process" and "alternation" are taken from daily vocabulary. These are good terms that I grasp, but they fit more or less well to their physical (if any) counterparts. They are "vague" terms, really - so Lee may have chosen his words well.
The idea that "object" as used in "object orientation" (OO) should resemble physical-world objects is not necessary for me. It has been used to describe OO ("object car has wheels"), but beyond such examples, just forget it: "window has buttons", or worse, "window has drop-down menu", makes little sense for real windows.
By the way: "thread" is perhaps a good term! I have seen the result when a cat has played with a single long thread of yarn. Maybe they should have used better weaving terms? Weaving could be seen as quite formal! And if the terms chosen had been better, maybe the solution would have been better - recursing into Lee's statement?
Process model
In my opinion the CSP "process model" is much easier to understand than "object model". The definition of an object these days emphasizes encapsulation more than before. Some years ago, inheritance and polymorphism were more "modern" to include in the object model.
"Nondeterminism of threads" (Freely from Lee)
I'll fill in with an example. The libcsp library by R. Beton furnished POSIX lightweight-thread programmers with a clean CSP API. However, the struct CSP_Channel_t contained three pthread_mutex_t and two pthread_cond_t, in addition to a CSP_Alt_t that contained one of each. Reading it out aloud:
one channel needed at most 4 mutexes and 3 condition variables (*)
This, just to attempt to tame the "nondeterminism of threads", as Lee calls it.
(*) I don't know how many of these are there to handle asynchronous I/O (like interrupt) or preemption.
You could argue that it's the channel construct that's wrong, but then bear in mind that it is possible to build a "naked" CSP library without any mutexes, condition variables or critical regions. Again, asynchronous I/O of course needs to be handled. However, preemption has not been needed in any CSP-type embedded system I have seen - but I have learnt that it is necessary for an experimental language called Toc, which handles deadline scheduling; see this paper by Korsgaard and Hendseth.
Back to nondeterminism. I learned something from Lee's description. I have always thought of nondeterminism as what (should) happen in the occam ALT construct (as opposed to PRI ALT), or what happens with Promela's :: operator. These nondeterminisms are wanted, and would explicitly read "if more than one choice is ready, select any one of them". If that behaviour is not what I want, I instead use the deterministic construct.
But here Lee points out that it's unwanted altogether in the threaded world. (Later on he goes on to describe "judicious nondeterminism" - which is the kind I am used to.)
WYSIWYG semantics (Welch)
"In fact, we have to know about all other threads that might execute (something that may not itself be well defined), and we would have to analyze all possible interleavings. We conclude that with threads, there is no useful theory of equivalence." (Lee)
Welch has introduced the term WYSIWYG semantics. Lee in effect shows that threading does not have this type of semantics.
...
"Threads, on the other hand, are wildly nondeterministic. The job of the programmer is to prune away that nondeterminism. We have, of course, developed tools to assist in the pruning. Semaphores, monitors, and more modern overlays on threads (discussed in the following section) offer the programmer ever more effective pruning. But pruning a wild mass of brambles rarely yields a satisfactory hedge." (Lee)
I have discussed this in note [007 - Synchronous and asynchronous]. Here is a quote from Welch et.al:
"One crucial benefit of CSP is that its thread semantics are compositional (i.e. WYSIWYG), whereas monitor thread semantics are context-sensitive (i.e. non-WYSIWYG and that's why they hurt!). Example: to write and understand one synchronized method in a (Java) class, we need to write and understand all the synchronized methods in that class at the same time -- we can't knock them off one-by-one! This does not scale!! We have a combinatorial explosion of complexity!!!"
(Letter to Edward A. Parrish, The Editor, IEEE Computer. Peter Welch (University of Kent, UK) et al. http://www.csp-consortium.org/images/IEEE_Computer.pdf (1997))
Observe that in note 007 I discuss this also relative to asynchronous "send and forget" message-based systems.
"This pattern can be made to work robustly in Java" (Lee)
Lee shows how easy it is to write erroneous multithreaded code in Java. He shows the observer pattern for one thread, and then the obvious thread-safe version. Only, the "thread safe" version is just an attempt, as Lee shows: there is a potential deadlock lurking.
A couple of memories spring to my mind.
First, before the WoTUG-19 conference I thought it might be a good idea to present the commstime program (a very simple but effective benchmark for concurrency) in the new Java language. I did, but concluded that "Java 1.0's PipedInput/OutputStream classes seem impractical to use because of an implementation error at Sun". I had my fingers burnt on trial one.
But here was a language with some built-in mechanisms for concurrency, so the community started building CSP libraries in Java. Commercial occam was dying, and so was the transputer. There was a void. So JCSP (Communicating Sequential Processes for Java) was developed at the University of Kent at Canterbury. JCSP has been updated over the years and has become a viable concurrency platform for Java; I note that the last public release is from August 2010 (this is written in Dec. 2010).
There were all these experts, learning all the intricacies of synchronized and Java's monitor. By 1998 all seemed to work fine. Until a student, some two years and who knows how many uses later, found a simple case where it deadlocked. The result was that parts of JCSP were formally verified and then fixed. Read about this exciting adventure here ("Formal Analysis of Concurrent Java Systems" by Peter H. Welch and Jeremy M.R. Martin). The error was subtle, but the fix was simple.
Second, at about the same time, Per Brinch Hansen wrote a letter that floated around on the internet and later became seminal. It is very interesting reading, because what he's basically saying is that the Java designers at Sun hadn't done their homework. Here's his abstract:
"The author examines the synchronization features of Java and finds that they are insecure variants of his earliest ideas in parallel programming published in 1972–73. The claim that Java supports monitors is shown to be false. The author concludes that Java ignores the last twenty-five years of research in parallel programming languages."
Per Brinch Hansen then goes on to describe exactly what's wrong with Java. Observe that Brinch Hansen in 1993 invented the SuperPascal language, whose concurrency features are based on a subset of occam 2!-)
(Hansen, Per Brinch, "Java’s insecure parallelism" in ACM SIGPLAN Notices, V.34(4) pp.38-45, April 1999. http://brinch-hansen.net/papers/1999b.pdf)
"Fixing threads by more aggressive pruning" (Lee)
Lee describes how the Ptolemy project ("an ongoing project aimed at modeling, simulating, and designing concurrent, real-time, embedded systems") used several established software engineering practices: a new "code maturity rating system", design reviews, code reviews, nightly builds, regression tests, and automated code coverage metrics.
"The strategy was to use Java threads with monitors." (Lee)
I wonder if they had access to Per Brinch Hansen's paper, or the Welch and Martin paper. Surely they must have been familiar with the problems.
Even with all this going on for them, Lee describes that "No problems were observed until the code deadlocked on April 26, 2004, four years later." Lee goes on to say that:
"Regrettably, I have to conclude that testing may never reveal all the problems in nontrivial multithreaded code" (Lee)
When a new version of some software is released, there has to be thorough testing. Case closed. But are we sometimes testing too much? Relying on testing where relying on other means would have been better?
Lee points out that avoiding deadlocks in a system based on semaphores is supposedly just a matter of following some well-known rules:
"Of course, there are tantalizingly simple rules for avoiding deadlock. For example, always acquire locks in the same order [32]. However, this rule is very difficult to apply in practice because no method signature in any widely used programming language indicates what locks the method acquires. You need to examine the source code of all methods that you call, and all methods that those methods call, in order to confidently invoke a method. Even if we fix this language problem by making locks part of the method signature, this rule makes it extremely difficult to implement symmetric accesses (where interactions can originate from either end). And no such fix gets around the problem that reasoning about mutual exclusion locks is extremely difficult. If programmers cannot understand their code, then the code will not be reliable" (Lee, italics by me)
In other words: semaphores don't have WYSIWYG semantics!
Use of software patterns
Lee also discusses patterns. I have myself used patterns like "The 'knock-come' deadlock free pattern", described in Note 007. But Lee has a point when he states that:
"Programmers will make errors, and there are no scalable techniques for automatically checking compliance of implementations to patterns." (Lee)
This is very true. But the shortest programs that work well with a pattern could become part of a repository for it. Real coding undermines patterns! Lee goes on:
"More importantly, the patterns can be difficult to combine. Their properties are not typically composable, and hence nontrivial programs that require use of more than one pattern are unlikely to be understandable" (Lee)
This is less universally true, I guess. If a pattern is based on composable components with WYSIWYG semantics, then combining these patterns should be ok semantically. Sometimes one wants to swap the Master and Slave processes in knock-come. Doing this has no side effects other than the wanted ones. In a real-time system we need to know the timing constraints. If any process decides not to listen on a channel for some time, say 100 ms, then anyone sending on that channel would have to block for 100 ms. This blocking is no worse than an asynchronous message not being handled by some state for 100 ms. In fact it's better: the blocking doesn't fiddle with the cogwheels, since the receiver would not even see a message that it would otherwise have to schedule for itself some time later.
WYSIWYG semantics and timing constraints
This takes me to a point I am not so sure about. Will WYSIWYG semantics not include timing constraints? In such a system, does the protocol between processes describe all I need to know? No, I don't think so. I think I need to describe (as a comment or whatever) what the maximum blocking time is. Or really, the maximum time between any send and its response (in any message-based system).
Usually one just dismisses this problem by stating that an event-action takes zero time. For most systems this is ok. But if we may block for 100 ms, this needs to be propagated, just like in the Toc language by Korsgaard and Hendseth mentioned above.
If a protocol is decorated with timing constraints and those are propagated, then You Can See What You Get. But only then.
Time messes up any good argument!
Real-time Java
Around the turn of the century my company (Navia/Autronica) was a member of the J Consortium - Real-Time Java™ Working Group. It was set up by NIST and backed by HP, Microsoft, Newmonics and others. These days www.j-consortium.com honours its ancestry, but now it's about "real-time medical applications".
I was the one doing the work for Navia/Autronica, in the Real-Time Java Working Group (RTJWG).
I remember the endless discussions about how to handle "asynchronous I/O". I had been working with occam for 10 years by then, and there the problem was non-existent. As mentioned earlier, interrupt handlers were proper occam processes, where the interrupt vector was mapped as a channel, possible to use in an ALT (so outputs could be received as well). Results were preferably sent on a channel that would never block in steady state (by use of an "overflow buffer" process (two processes in series)). This was so nice, and there was no new thinking needed for an interrupt process versus any other process. They were born equal. The process model stayed.
"Alternatives to threads" (Lee)
Lee discusses several alternatives, but for me the most interesting is the fact that the Ptolemy II project also has a "rendezvous domain CSP" API. Here is their class hierarchy. They state that it's based on C.A.R. Hoare's 1978 version of CSP. A process in the 1978 version sends to a named process, whereas in the 1985 version a process sends to a named channel, as in occam. I believe Hoare changed it after the Ada experience, in which his 1978 solution hindered the building of precompiled (?) libraries. (Please send me a mail if Ptolemy II really sends on a named channel, so that I can fix this.)
Using a CSP class library is only A Quite Good Thing. This criticism applies to Ptolemy as well as to JCSP (mentioned earlier). If a channel (or whatever well thought out concurrency communication primitive..) is not a "first class citizen" of the language, then CSP (or..) will "never" succeed, if a library is all that's supplied in the future.
In a way it's "pruning of nondeterministic" cognitive thinking, where each time something is read it has to be understood through a class hierarchy. Every time! Not a nice thing to say about OO, which has its benefits and uses! Also, a compiler would not know what a process is, so it can't do parallel-usage checks. At best that would be an add-on tool.
"Coordination languages" (Lee)
Finally, Lee talks about coordination languages, which would sit on top of a more standard language (with new keywords?). The Ptolemy II rendezvous CSP "framework is properly viewed as a coordination language". There, nondeterminism is explicit when wanted; otherwise it's deterministic.
After Lee's 2006 article: "The Go programming language" (Google)
But A Good Thing, arrived after Lee's paper, is Google's Go programming language. Its concurrency is based on the CSP model and uses channels for interprocess communication. Even if it "does not provide any built-in notion of safe or verifiable concurrency", it is a step in the right direction. I let the Go people have the final words in this note:
"Why build concurrency on the ideas of CSP?
Concurrency and multi-threaded programming have a reputation for difficulty. We believe the problem is due partly to complex designs such as pthreads and partly to overemphasis on low-level details such as mutexes, condition variables, and even memory barriers. Higher-level interfaces enable much simpler code, even if there are still mutexes and such under the covers.
One of the most successful models for providing high-level linguistic support for concurrency comes from Hoare's Communicating Sequential Processes, or CSP. Occam and Erlang are two well known languages that stem from CSP. Go's concurrency primitives derive from a different part of the family tree whose main contribution is the powerful notion of channels as first class objects."
(Go programming language faq)