Matthew Dempsky
2014-07-17 21:14:20 UTC
In just the past week, I've been pulled into two separate discussions
that hinged on how POSIX behaves under C11/C++11's memory model, both
of which ended somewhat unsatisfactorily by having to make
"reasonable" extrapolations from the current definitions.
I seem to recall reading somewhere that the plan is for a future
version of POSIX to align with C11, and it also seems like that will
require defining interactions between various POSIX functionality and
C11's atomic primitives. If so, I think there would be benefits to
starting on that process now so that implementations and applications
can start preparing for that future POSIX version.
What do others think? And if this is worth doing, what would be the
proper way to proceed? E.g., just discuss on this mailing list, or
maybe try to set up a subgroup to focus on this work area, or something
else altogether?
Some concrete examples of issues that have been brought up:
1. If one thread (successfully) calls mmap() and then passes the
return value pointer to another thread via relaxed atomic store/load
operations, what guarantees does the second thread have (if any) about
accessing the newly mapped memory? I reasoned that mmap() should be
thought of as a non-atomic memory write operation to the affected
pages, so the allocating thread needs to use a store-release operation
(or stronger), and the accessing thread needs a load-consume operation
(or stronger), for the second thread to safely access the mapped
memory. However, I can also imagine mmap() might guarantee that when
it has returned, the newly mapped pages (and any implied memory
initialization) must be globally visible to the process.
[This came up because LLVM's ThreadSanitizer uses the first
interpretation, whereas TCMalloc in some cases may mmap() some memory
and then communicate it to other threads via relaxed atomics assuming
the second interpretation.]
2. If a multi-threaded application calls fork(), what affordances are
allowed when reasoning about the state of the child
process's address space? I'd reason that any private objects (e.g.,
objects in memory mmap()'d with MAP_PRIVATE) that are being
concurrently non-atomically modified when fork() is called will be
left in an unspecified state in the child process; but still those
objects will now refer to new memory locations, so there should be no
"conflict" (per C11 definition) by simply storing to them in the child
process.
[This came up because LibreSSL portable uses a pthread_atfork() hook
to mark its random number generator state as requiring a re-seed in
child processes, and arguably this could be seen as a data race
because the object being written to in the child handler might be
concurrently written to in the parent process when fork() was
invoked.]
3. If an application has concurrent calls to fork() and
pthread_atfork(), are memory operations performed prior to the
pthread_atfork() call guaranteed to be visible within the context of
the atfork callbacks? Looking more closely now, it actually seems
that currently POSIX doesn't make *any* guarantees about concurrent
fork() and pthread_atfork() being safe, which seems surprising.
Perhaps worth fixing, but then pthread_atfork() is scheduled for
possible deprecation anyway.
[Again, came up because of LibreSSL portable.]
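For issue 1 above, the conservative reading (mmap() treated as an ordinary non-atomic write to the pages) can be sketched roughly as follows. This is only an illustration under stated assumptions, not code from TCMalloc or ThreadSanitizer: the names shared_page, producer, consumer, publish_and_read and PAGE_BYTES are hypothetical, MAP_ANONYMOUS is assumed to be available, and acquire is used in place of consume because acquire is "or stronger" (GCC and Clang currently promote consume to acquire anyway).

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* assumption: exposes MAP_ANONYMOUS on glibc */
#endif
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/mman.h>

#define PAGE_BYTES 4096 /* hypothetical mapping size */

/* Pointer handed from the mapping thread to the reading thread. */
static _Atomic(int *) shared_page;

static void *producer(void *arg)
{
    (void)arg;
    int *p = mmap(NULL, PAGE_BYTES, PROT_READ | PROT_WRITE,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p != MAP_FAILED)
        p[0] = 42; /* plain, non-atomic write into the new mapping */

    /* Release: the mmap() and the write above are ordered before the
     * pointer becomes visible to an acquiring reader. On failure the
     * MAP_FAILED value itself is published so the reader can stop. */
    atomic_store_explicit(&shared_page, p, memory_order_release);
    return NULL;
}

static void *consumer(void *arg)
{
    int *seen = arg;
    int *p;

    /* Acquire pairs with the release store in producer(). */
    while ((p = atomic_load_explicit(&shared_page,
                                     memory_order_acquire)) == (int *)NULL)
        ; /* spin until a pointer is published */

    *seen = (p == MAP_FAILED) ? -1 : p[0];
    return NULL;
}

/* Returns the value read through the published pointer, or -1 on error. */
int publish_and_read(void)
{
    pthread_t prod, cons;
    int seen = -1;

    atomic_store(&shared_page, (int *)NULL);
    if (pthread_create(&prod, NULL, producer, NULL) != 0)
        return -1;
    int have_cons = pthread_create(&cons, NULL, consumer, &seen) == 0;

    pthread_join(prod, NULL);
    if (have_cons)
        pthread_join(cons, NULL);

    int *p = atomic_load(&shared_page);
    if (p != (int *)NULL && p != MAP_FAILED)
        munmap(p, PAGE_BYTES);
    return seen;
}
```

Under the second interpretation, both memory orders could be weakened to memory_order_relaxed. Whether that is actually permitted is exactly the open question, so the sketch sticks to the ordering that is safe under either reading.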
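For issue 2 above, the general shape of the pattern can be sketched as a pthread_atfork() child handler that performs a plain store to a private object. This is a minimal sketch, not LibreSSL's actual code: rng_needs_reseed, mark_reseed_in_child and child_sees_reseed_flag are hypothetical names, and the example has no concurrent writer in the parent, so it shows only the store in the child and not the racing modification that makes the question interesting.

```c
#include <assert.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for the RNG state; an ordinary private object. */
static int rng_needs_reseed = 0;

/* Child-side atfork handler: a plain store to the child's own copy. */
static void mark_reseed_in_child(void)
{
    rng_needs_reseed = 1;
}

/* Returns 1 if the child saw the flag set while the parent's copy
 * stayed clear, 0 if not, and -1 on error. */
int child_sees_reseed_flag(void)
{
    static int registered = 0;
    int status;

    if (!registered) {
        if (pthread_atfork(NULL, NULL, mark_reseed_in_child) != 0)
            return -1;
        registered = 1;
    }

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(rng_needs_reseed ? 0 : 1); /* report the child's view */

    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;

    /* The handler ran only in the child, so the parent's copy is unchanged. */
    return WEXITSTATUS(status) == 0 && rng_needs_reseed == 0;
}
```

The store in the handler lands in the child's copy of the object, which is the basis for the argument that it cannot conflict with a write in the parent.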
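For issue 3 above, one way an application could sidestep the missing guarantee is to serialize its own fork() and pthread_atfork() calls under a mutex, so that registration and everything written before it either happen entirely before the fork or are not seen by it at all. This is only a sketch of such a workaround, not something POSIX prescribes; fork_lock, handler_config, child_observed, registrar and fork_with_serialized_registration are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Application-level lock covering both fork() and pthread_atfork(). */
static pthread_mutex_t fork_lock = PTHREAD_MUTEX_INITIALIZER;

static int handler_config = 0;  /* plain data written before registration */
static int child_observed = -1; /* what the child handler saw, if it ran */

static void child_handler(void)
{
    child_observed = handler_config;
}

static void *registrar(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&fork_lock);
    handler_config = 7; /* must be visible to the handler */
    (void)pthread_atfork(NULL, NULL, child_handler);
    pthread_mutex_unlock(&fork_lock);
    return NULL;
}

/* Returns 0 if the fork happened before registration, 7 if the handler
 * ran and saw the configured value, and -1 on error. */
int fork_with_serialized_registration(void)
{
    pthread_t t;
    int status = 0;

    if (pthread_create(&t, NULL, registrar, NULL) != 0)
        return -1;

    /* The mutex orders this fork() either wholly before or wholly
     * after the registrar's critical section. */
    pthread_mutex_lock(&fork_lock);
    pid_t pid = fork();
    if (pid == 0)
        _exit(child_observed == -1 ? 0 : child_observed);
    pthread_mutex_unlock(&fork_lock);

    pthread_join(t, NULL);
    if (pid < 0 || waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

With the lock in place the child never observes a registered handler together with a stale handler_config. Without it, nothing in the current text appears to rule that outcome out.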