Post by Ersek, Laszlo
Dear List,
wrt. the discussion started under [0], I respectfully ask for help
interpreting the following statement from the pthread_cond_broadcast()
specification [1]:
----v----
if predictable scheduling behavior is required, then that mutex shall be
locked by the thread calling pthread_cond_broadcast() or
pthread_cond_signal()
----^----
What does "predictable scheduling behavior" mean? What scheduling
characteristics may not hold if the mutex is not locked at the time of
signalling or broadcasting the condition variable?
Thank you very much,
Laszlo Ersek
[0] http://groups.google.com/group/comp.programming.threads/browse_thread/thread/c045203eae78d6ff/2e6b78fd2d43ce68#2e6b78fd2d43ce68
[1] http://www.opengroup.org/onlinepubs/9699919799/functions/pthread_cond_broadcast.html
Basically, it was there due to some tension between the realtime and
threads "gangs" in the original 1003.4 working group that developed
POSIX threads.
The problem (from the realtime side) with condition variables is that if
you can signal/broadcast without holding the mutex, and any thread
currently running can acquire an unlocked mutex and check a predicate
without reference to the condition variable, then you can have an
indirect priority inversion.
That is, high priority consumer thread A blocks on a condition variable
because there are no entries on the queue; while lower priority consumer
thread B continues processing some request. Producer thread C now adds
an entry to the queue, unlocks the mutex, and signals the associated
condition variable to unblock the highest priority waiter. But before
the thread can be unblocked, scheduled, and contend for the mutex,
thread B completes its work, acquires the mutex (which has no waiters),
and removes the next work item. Higher priority thread A, when it is
scheduled, either finds the mutex still locked by thread B and needs to
reblock, or finds the mutex unlocked and "its" queue item gone. (On top
of this, it may have preempted thread B to find this out, so that no
work is getting done while we do the scheduling dance.) Even on a strict
realtime scheduling uniprocessor this sequence is perfectly possible; there
might be a timeslice (e.g., producer and consumer B are SCHED_RR of
equal priority) between the unlock and the signal or broadcast; the
higher priority thread remains blocked and can't be considered by the
scheduler. (Many realtime designers would consider this bad design; and
in many ways it certainly is; but take it as just a simplified
conceptual sketch. ;-) )
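The scenario above can be sketched in C with POSIX threads. This is my own
illustration, not code from the thread, with invented names and a trivial
counter standing in for the work queue; queue_put_unlock_first() is producer
C's pattern, dropping the mutex before signalling and thereby opening the
window in which consumer B can take the item:

```c
#include <pthread.h>
#include <assert.h>

/* Trivial stand-in for a work queue: a counter of pending items.
 * (Illustration only; all names here are invented for this sketch.) */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  nonempty;
    int             items;
} work_queue;

void queue_init(work_queue *q)
{
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->nonempty, NULL);
    q->items = 0;
}

/* Producer C in the scenario: unlock FIRST, then signal.  Between the
 * unlock and the signal, an already-running consumer (thread B) can
 * acquire the now-uncontended mutex, see items > 0, and take the work
 * item -- all before the high priority waiter (thread A) is even
 * unblocked. */
void queue_put_unlock_first(work_queue *q)
{
    pthread_mutex_lock(&q->lock);
    q->items++;
    pthread_mutex_unlock(&q->lock);
    /* <-- the window described above opens here */
    pthread_cond_signal(&q->nonempty);
}

/* Either consumer: the standard predicate loop around cond_wait. */
void queue_get(work_queue *q)
{
    pthread_mutex_lock(&q->lock);
    while (q->items == 0)
        pthread_cond_wait(&q->nonempty, &q->lock);
    q->items--;
    pthread_mutex_unlock(&q->lock);
}
```

Note that nothing here is incorrect in the POSIX sense; the predicate loop
makes the code safe either way. The issue is purely which consumer wins,
that is, scheduling predictability.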
Now, in "high performance thread" terms, this whole concept is generally
bad design; if the threads were symmetric and priority was associated
with the work (that is, higher priority work was simply at the head of
the queue to be handled by whatever consumer gets to it next), it
wouldn't matter that "the wrong thread" had the work item. (There is no
"wrong thread", only "wrong queued work"; and that can easily be managed
at the application level.) Still, we can imagine cases where this might
not meet the intent of the application designer. (Almost everyone would
agree "thread per client" is a bad model; yet it's amazingly pervasive
and generates an enormous amount of confusion, along with most of the
poorly performing code purporting to demonstrate that threads don't work.)
The quoted statement simply indicates that if the producer continues to
hold the mutex while waking consumer thread A, it will be able to run
and block on the mutex in priority order before consumer thread B can
acquire the mutex. (With wait morphing, the condition variable operation
immediately transfers the highest priority waiter to the mutex with
minimal overhead; but on any strict realtime implementation the act of
unblocking a high priority thread will immediately preempt a lower
priority thread and allow it to block on the mutex immediately.) Now,
consumer thread B may not contend for the mutex before the unlock, in
which case the "right thread" goes next, or it may contend anywhere
during this sequence and be sorted into the proper priority order along
with consumer thread A; in either case, the higher priority thread will
have the first chance at the queue.
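In code, the "signal while holding the mutex" variant differs from the
earlier pattern only in the order of the producer's last two operations.
Again a sketch of my own, with invented names and a counter for the queue:

```c
#include <pthread.h>
#include <assert.h>

/* Trivial stand-in for a work queue (illustration only). */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  nonempty;
    int             items;
} work_queue;

void queue_init(work_queue *q)
{
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->nonempty, NULL);
    q->items = 0;
}

/* Producer: signal BEFORE unlocking.  On a priority-aware implementation
 * (or one with wait morphing) the highest priority waiter is transferred
 * to the mutex while the producer still holds it, so a lower priority
 * consumer cannot slip in ahead of it. */
void queue_put_signal_first(work_queue *q)
{
    pthread_mutex_lock(&q->lock);
    q->items++;
    pthread_cond_signal(&q->nonempty);
    pthread_mutex_unlock(&q->lock);
}

/* Consumer: standard predicate loop around cond_wait. */
void queue_get(work_queue *q)
{
    pthread_mutex_lock(&q->lock);
    while (q->items == 0)
        pthread_cond_wait(&q->nonempty, &q->lock);
    q->items--;
    pthread_mutex_unlock(&q->lock);
}
```

Without wait morphing, signalling under the mutex can instead cost an extra
wakeup-and-block cycle (the woken thread immediately blocks on the still-held
mutex), which is why the spec phrases this as a recommendation for
predictability rather than a requirement.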
That's what "predictable scheduling behavior" means in this context. In
more pragmatic terms, I'm pretty sure this means that someone on the
realtime side of the virtual aisle thought it was a bad idea to
recommend signaling "outside" the mutex, and the working group
compromised by adding that statement, which was sufficient to ward off
(or possibly to remove) a formal objection to the draft standard. This
is how a lot of the text in the standard grew.