Last edit: 3.Jan.2012
On the airplane home, the other day, I came upon an idea that I thought would make life easier. When I told a guy at work, he said, "oh, it's already in Linux".
I assume that you're somewhat familiar with "channels". In this respect I was raised with the occam language. (You would find Wikipedia-articles for most of the names I'll throw here. So I won't get too deep into some details.)
Here, a channel is based on CSP (later), and it is one-directional, non-buffered and blocking. It's even one-to-one. It connects tight processes to each other. It also allows threads to be equally tight. So, the first process that comes to a channel (be it sender or receiver), must be descheduled and not ever run again before the second process arrives on the channel. At that time there's a memcpy of the correct data from the sender's private context, to the receiver's matching field. It is possible to make it type safe. The message is passed across, while both sender and receiver are still. In Ada it's called a rendezvous. In Go it has no name. When the memcpy is done, usually the second on the channel just runs on, while the first is put in the ready-to-be-scheduled queue.
There is a mechanism called on ALT or a "select", where a process may wait for a number of (input) channels. A channel may be with data, without data, or timeouts. In my world (opposite to Linux select), there is exactly one cause that may trigger the SELECT. I will use capitalized "SELECT" here to cover any such idiom, to differentiate from the Linux "select". The first cause that triggers the SELECT will be stored and cause a later rescheduling, but any potential causes that "lost", will not be allowed to cause an additional triggering. Only one cause per scheduling of a process. Very different from Linux select.
Said differently: the SELECT will put all the channels in the set to "first was here". When the first "second" contender on a channel in the set arrives, all the "first" attributes are removed from the rest of the set. This is a beautiful algorithm, that I think was first implemented in microcode on the first transputer processer in 1984. Inmos in Bristol, UK, made it, and David May and Tony Hoare (the father of CSP, "Communicating Sequential Processes" process algebra) were the main architects. The transputer basically was a machine designed to run the subset of CSP that was called occam. It's 28 years ago. Ada based its concurrency model on CSP. And 26 years after occam, Google designed the Go programming language, which also has based its concurrency model on CSP, just like the same guys had done at Lucent some 10 years before with their Limbo language. Except, at that time they didn't state the roots of their "chan of protocol". And transputer designer David May now is active with a company called XMOS, and a language called XC, based on the same ideas. So, CSP-based concurrency seems to be increasingly coming.
With this scheme communication equals synchronization.
In CSP they talk about "choice". It may be deterministic or "external" or nondeterministic or "internal" (often written as [] and |~|).
The SELECT is for the teacher who assigns work to his students. He then falls to sleep (or does other things) and is awaked when the first student is finished. The student is then served, and the SELECT is entered again. The new set may include the first student, when some fair scheme is used to avoid him spam the teacher. Or the SELECT may be non-deterministic. Read in the literature, f.ex. the Promela Wiki-article (more below).
With this scheme, the teacher may say that one of the students is not allowed to deliver his result before after two hours. So, there may be timers in the SELECT set. This would also make busy polling possible, should this be needed at the application level. But the run-time scheduler NEVER needs busy-polling. Another way to say this is that a process always has a cause when it's scheduled. Just think about Java's notify and notifyall, which may schedule a process for no reason!
Also, there may be an INPUT GUARD for each channel (applied to our special student for two hours). This is usually implemented as a conditional to each member of the SELECT set. Think of the SELECT GUARD set as a bit-mask, one bit per element in the set, just like in Linux select.
Now I'm getting closer to the theme of this note: Output guard vs. "channel ready" channel.
Some times it's required to buffer a message on a channel. In occam we used to do this by inserting a single buffer process, or a couple of processes to make one composite process called an "overflow buffer", which made it possible to detect a too eager producer and associated too slow consumer - at this application level. One time I needed a buffer of a hundred buffer positions, and I just started 100 buffers. Easy! The downside was that there would be 101 memory copies, so most often a ring buffer was administered inside the first of the two processes in the composite overflow buffer process.
The alternative could be for the eager producer (it has no idea itself how eager it is) to have something called an OUTPUT GUARD available. If you want to see it, have a look at the Promela process meta language (Wikipedia). This way, both inputs and outputs may be collected in the SELECT. So, having just one channel, plus an "else", then the potential receiver, if it is able to receive, will inform through the output guard if it's ok to send. If not, the else will be taken. The SELECT with both input and output looks like the Linux select, but that one doesn't have the synchronized mechanism in the bottom.
It's a nice mechanism. I have never had it in my world. It was not implemented on the transputer, since it was a multi-core programming language. And implementing it across a network (links) presumably was not worth it. There was another mechanism to get the same functionality. If you know what mechanism, please comment.
An output guard mechanism is good for detecting a broken line. To detected a connected line, there would have to be polling. Busy polling is most often necessary for this (assuming no hw edge interrupt). But the output guard seems not so relevant to handle the fast producer, slow consumer scenario - between internal processes. The same broken / connected scenario also would go for detecting that the other process has stopped, for some reason. Perhaps enough for a new blog note here.
So, here's the solution, to avoid the output guard, to avoid busy polling. The solution that also almost exists in Linux.
A standard channel has zero buffers. Have a look at Promela and Go. They both support buffered channels. So, why not add them to [1]? What should happen when such a channel is full? The channel send just returns a flag saying that it’s full, and the channel would not block in that situation. Observe that blocking is perfectly ok and does not impact what a system is able to do. With enough "parallel slackness" to get all the I/O going at required speed, having a process block (waiting in a non-busy way, for the second to arrive at the channel). But for a driver, close to the "asynchronous world" it's nice not to block to get rid of incoming messages. And it's nice not to send it into an asynchronous message system that may plainly overflow, crash and restart. We need a better system. We need the driver process to not block in that situation and itself handle overflows.
So, in the suggestion, after the return of the buffer full flag, the process knows this and if any incoming message arrives from the external asynchronous world, it may set a hold line, send a hold message or throw away the message - or whatever it decides to. It's up to the process.
In this state, the process will SELECT around a timeout channel and this channel that's triggered by the run-time system, that arrives when there is room in the channel. The receiver has picked out one message or all, that semantics is not so important here.
When this "channel ready" channel fires, the process can send off the last message, which may also be tagged with an overflow bit, on the channel. And it's guaranteed to succeed (unlike in Linux).
I was rather satisfied by inventing this on the plane home. I didn't like that it was already in Linux write and select. I have tried to describe some of the dissimilarities. See [2] for write, EAGAIN or EWOULDBLOCK and [3] for select (or pselect) and fd_set *writefds. It's in the context of writing to a Linux pipe.
But please, could anyone help me with describing how my solution is much "thread safer" than Linux? I need more than this:
I have more of a language, since I would do as described and reasoned above. Linux calls and Linux processes and threads are not understood by C or C++. Up to C++11 at least. And only so much understood by Linux. A C++ library for Linux needed several mutexes and semaphors to implement an ALT/SELECT in the CSP context [4]. So, are write and select usages in Linux really thread safe? I have a scheduler that is driven by the channels, and it's not preemptive. And why do Linux programmers brag that they "program single-threaded"? Is it an indication that the Linux process model is too coarse or difficult or expensive? So that they haven't learned to decompose into processes, when there is state or roles to hide away into a separate process? After object-orientation or OO, it's time for process-orientation as an additional idiom.
Should have been fun to set up a comparison table of the CSP type SELECT with channel ready channel and the similar Linux solution. Even if they are both fruits, I fear that one is apple and the other bananas?
A Linux pipe vs. channel trait
Observe that a Linux pipe is byte-wise. You send so many bytes and then so many bytes, and they are concatenated in the pipe. There has to be a common view of how to pick out each message or token.
This implies that you could read past a token, since a sender may have to break up a message into several chunks. There is no foldback, you cannot push anything back again. So, the receiver has to keep track of these things.
A channel is message based. A full message per chunk. Finito.
Common usage of Linux select
A collegue told me that the common usage of a Linux select is to remove active polling for events from remote machines from the application level, in say a http server (router). He said that he always used select to wait for events on already open sockets. One socket per bit in the select bit map.
Internally between threads he never had used select.
And he rarely used ip addresses internally between processes, so no select there either.
"My idea" - again
I found an interesting sentence in [5]. Here is its process-oriented specification of a buffer: If a buffer is empty it must be ready for input. If it contains some messages, then the buffer must be ready to output the next message required. It is also possible that it will accept further input, but it does not have to. (my comment: is this buffer full?) But here is the interesting sentence: "If further events are to be possible (such as a channel which can report on whether or not the channel is empty), then...". This is "my" channel! I am even more convinced now, that "my" idea was quite good!
---
[1] - http://www.teigfam.net/oyvind/pub/pub_details.html#NewALT
"New ALT for Application Timers and Synchronisation Point Scheduling"
Øyvind Teig and Per Johan Vannebo
Communicating Process Architectures 2009 (CPA-2009)
Peter H. Welch et. al. (Eds.)
IOS Press, 2009, ISBN 978-1-60750-065-0
[2] - http://linux.die.net/man/2/write
[3] - http://linux.die.net/man/2/select
[4] - Search for "C++CSP2"
[5] - Concurrent and real-time systems. The CSP approach. Steve Schneider. Wiley, 2000. Page 210, example about buffers (7.4.1).
.
.
Tech posts by aclassifier
Also see Designer's note page
http://oyvteig.blogspot.com
Full overview: here
Archive by month and about me: here
Last edit: 27.Feb.2012
Full overview: here
Archive by month and about me: here
Last edit: 27.Feb.2012
Saturday, December 3, 2011
Subscribe to:
Post Comments (Atom)
Popular Posts
-
Canal Digital og hjemmenett ("bredbånd") Dette er et lite notat om hvordan du kan konfigurere ruteren din dersom du får internett fra Cana...
-
Updated 5Nov2011. This post is a follow-up from my post 018 - From concrete CDs to abstract iTunes files. And now the old amp broke . Thin...
-
Goal This post is for you who want to move all your music, video, AppStore programs etc. - or rather, your iTunes library - to a network d...
-
1. From Apple Pages to iPhoto and your own paper book I have now for the second time used Apple Pages and iPhoto to make my own books. App...
Øyvind Teig
- aclassifier
- Trondheim, Norway
- Overview of all blog notes here (same number series across all)
My technology blog is at
http://oyvteig.blogspot.com
handicraft blog is at
http://oyvteig-2.blogspot.com
behind the collection blog is at
http://oyvteig-3.blogspot.com
home page is at
http://www.teigfam.net/oyvind (publications)
PS: just call me "Oyvind"
0 comments:
Post a Comment