Discussion: Use of LC_GLOBAL_LOCALE with *_l() functions
Geoff Clare
2010-07-08 11:18:06 UTC
It's not clear to me whether the standard allows LC_GLOBAL_LOCALE
to be passed to the *_l() functions (either directly or by using the
return value from uselocale(NULL)).

On the locale.h page it says:

The <locale.h> header shall define LC_GLOBAL_LOCALE, a special
locale object descriptor used by the uselocale() function.

This implies that LC_GLOBAL_LOCALE should not be passed directly
to the *_l() functions, only to uselocale(), but that still leaves
the possibility of passing the return value of uselocale(NULL) to
a *_l() function when the value happens to be LC_GLOBAL_LOCALE.

If applications are not supposed to do that, I would expect to
see statements on all the *_l() pages saying that if the locale
object has the value LC_GLOBAL_LOCALE the behaviour is undefined,
like there is on the newlocale() page.

Regardless of what the original intention was, I dislike forbidding
applications from doing this. Why should every application that
uses *_l() functions with a locale_t obtained from uselocale(NULL)
have to do things like:

if (locale == LC_GLOBAL_LOCALE)
... call isspace() ...
else
... call isspace_l() ...

everywhere, when the implementation could simply handle this inside
the *_l() functions?
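
Concretely, the kind of wrapper I mean might look like this (just a
sketch; my_isspace() is a made-up name, not anything in the standard):

#include <ctype.h>
#include <locale.h>

/* Made-up wrapper: dispatch on whether the locale_t obtained from
   uselocale(NULL) is LC_GLOBAL_LOCALE. */
static int my_isspace(int c, locale_t loc)
{
    if (loc == LC_GLOBAL_LOCALE)
        return isspace(c);       /* current global locale */
    return isspace_l(c, loc);
}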
--
Geoff Clare <g.clare-7882/***@public.gmane.org>
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England
Ulrich Drepper
2010-07-08 14:01:46 UTC
Post by Geoff Clare
Regardless of what the original intention was, I dislike forbidding
applications from doing this. Why should every application that
uses *_l() functions with a locale_t obtained from uselocale(NULL)
if (locale == LC_GLOBAL_LOCALE)
... call isspace() ...
else
... call isspace_l() ...
everywhere, when the implementation could simply handle this inside
the *_l() functions?
Because this would slow down code using these interfaces in all
situations by factors of 10 or more. The is*() interfaces are always
assumed to be direct memory accesses. This is how programs are written;
the interfaces are used in inner loops. If LC_GLOBAL_LOCALE could be
passed to them, this wouldn't (in general) be possible.
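
Roughly speaking (an illustration only, not any real libc's internals),
the fast path programs rely on is a single table load per character:

/* Illustration only: a table-driven classifier costs one memory load
   per character, which is why is*() stays cheap in inner loops. */
static const unsigned char space_table[256] = {
    [' '] = 1, ['\t'] = 1, ['\n'] = 1, ['\v'] = 1, ['\f'] = 1, ['\r'] = 1,
};
#define TABLE_ISSPACE(c) (space_table[(unsigned char)(c)])

/* A version that must first test for LC_GLOBAL_LOCALE and fall back to
   the global locale adds a branch and an indirection to every call. */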

You're also making up a story here. I have never seen the need for code
like the above. Either your entire code block wants to use a specific,
user-provided locale or you use the currently set locale. You don't
conditionalize every call.

I also don't agree that anything has to be changed in the text. The
definition of LC_GLOBAL_LOCALE clearly says (p 284):

The <locale.h> header shall define LC_GLOBAL_LOCALE, a special
locale object descriptor used by the uselocale() function.

How much clearer does it have to be?


--
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
Geoff Clare
2010-07-08 14:44:32 UTC
Post by Ulrich Drepper
Post by Geoff Clare
Regardless of what the original intention was, I dislike forbidding
applications from doing this. Why should every application that
uses *_l() functions with a locale_t obtained from uselocale(NULL)
if (locale == LC_GLOBAL_LOCALE)
... call isspace() ...
else
... call isspace_l() ...
everywhere, when the implementation could simply handle this inside
the *_l() functions?
Because this would slow down code using these interfaces in all
situations by factors of 10 or more. The is*() interfaces are always
assumed to be direct memory accesses. This is how programs are written;
the interfaces are used in inner loops. If LC_GLOBAL_LOCALE could be
passed to them, this wouldn't (in general) be possible.
Yes, I can see that it would have an effect on speed, although a
factor of 10 is surprising.
Post by Ulrich Drepper
You're also making up a story here. I have never seen the need for code
like the above. Either your entire code block wants to use a specific,
user-provided locale or you use the currently set locale. You don't
conditionalize every call.
I would have thought a likely application design would be to have a
bunch of functions that do locale-dependent things and take a locale_t
argument to tell them which locale to use. These might then be called
from some places with the locale_t returned by uselocale(NULL) and in
others with a locale_t obtained from newlocale(). (Perhaps not in one
program, but there might be multiple programs that use those same
functions from a library.)
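
For instance, a library routine like this sketch (the names are made up)
would naturally be handed locale_t values from both sources:

#include <ctype.h>
#include <locale.h>
#include <stddef.h>

/* Made-up library routine: counts whitespace-separated words in
   whatever locale the caller supplies. */
size_t count_words(const char *s, locale_t loc)
{
    size_t n = 0;
    int in_word = 0;

    for (; *s != '\0'; s++) {
        if (isspace_l((unsigned char)*s, loc))  /* what if loc == LC_GLOBAL_LOCALE? */
            in_word = 0;
        else if (!in_word) {
            in_word = 1;
            n++;
        }
    }
    return n;
}

/* One caller might pass uselocale((locale_t)0), another a handle from
   newlocale(LC_ALL_MASK, "POSIX", (locale_t)0). */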
Post by Ulrich Drepper
I also don't agree that anything has to be changed in the text. The
The <locale.h> header shall define LC_GLOBAL_LOCALE, a special
locale object descriptor used by the uselocale() function.
How much clearer does it have to be?
The problem is that uselocale() can also return LC_GLOBAL_LOCALE.
Thus applications can get that value in a locale_t without
actually having _used_ LC_GLOBAL_LOCALE.

It is not clear that applications can't just pass the value they
get from uselocale() to a *_l() function without checking that the
value is not LC_GLOBAL_LOCALE.
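
For example (a sketch):

#include <locale.h>

/* Sketch: the query form of uselocale() returns the current locale
   without changing it.  If no thread-local locale has been installed,
   that value is LC_GLOBAL_LOCALE, even though the application never
   wrote LC_GLOBAL_LOCALE anywhere itself. */
void example(void)
{
    locale_t loc = uselocale((locale_t)0);
    /* loc may be LC_GLOBAL_LOCALE here; is passing it to isspace_l()
       and friends permitted, or not? */
    (void)loc;
}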
--
Geoff Clare <g.clare-7882/***@public.gmane.org>
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England
Glenn Fowler
2010-07-19 15:34:40 UTC
I agree with Geoff's scenario

applications and library providers could work around the LC_GLOBAL_LOCALE
anomaly if there were a way to convert LC_GLOBAL_LOCALE to a usable locale_t

is there a way?

I didn't see one, but
usable_lc_global_locale = duplocale(0)
or
usable_lc_global_locale = duplocale(LC_GLOBAL_LOCALE)
would seem a likely place
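
e.g. (sketch only, assuming duplocale(LC_GLOBAL_LOCALE) were defined to
copy the global locale, which the standard does not currently say)

#include <locale.h>

/* sketch: get a locale_t the *_l() functions will definitely accept,
   assuming the hypothetical duplocale(LC_GLOBAL_LOCALE) behaviour */
locale_t usable_current_locale(void)
{
    locale_t loc = uselocale((locale_t)0);
    if (loc == LC_GLOBAL_LOCALE)
        loc = duplocale(LC_GLOBAL_LOCALE);   /* hypothetical */
    /* caller would need to know whether to freelocale() the result */
    return loc;
}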

-- Glenn Fowler -- at&t Research, Florham Park NJ --
Post by Geoff Clare
Post by Ulrich Drepper
Post by Geoff Clare
Regardless of what the original intention was, I dislike forbidding
applications from doing this. Why should every application that
uses *_l() functions with a locale_t obtained from uselocale(NULL)
if (locale == LC_GLOBAL_LOCALE)
... call isspace() ...
else
... call isspace_l() ...
everywhere, when the implementation could simply handle this inside
the *_l() functions?
Because this would slow down code using these interfaces in all
situations by factors of 10 or more. The is*() interfaces are always
assumed to be direct memory accesses. This is how programs are written;
the interfaces are used in inner loops. If LC_GLOBAL_LOCALE could be
passed to them, this wouldn't (in general) be possible.
Yes, I can see that it would have an effect on speed, although a
factor of 10 is surprising.
Post by Ulrich Drepper
You're also making up a story here. I have never seen the need for code
like the above. Either your entire code block wants to use a specific,
user-provided locale or you use the currently set locale. You don't
conditionalize every call.
I would have thought a likely application design would be to have a
bunch of functions that do locale-dependent things and take a locale_t
argument to tell them which locale to use. These might then be called
from some places with the locale_t returned by uselocale(NULL) and in
others with a locale_t obtained from newlocale(). (Perhaps not in one
program, but there might be multiple programs that use those same
functions from a library.)
Post by Ulrich Drepper
I also don't agree that anything has to be changed in the text. The
The <locale.h> header shall define LC_GLOBAL_LOCALE, a special
locale object descriptor used by the uselocale() function.
How much clearer does it have to be?
The problem is that uselocale() can also return LC_GLOBAL_LOCALE.
Thus applications can get that value in a locale_t without
actually having _used_ LC_GLOBAL_LOCALE.
It is not clear that applications can't just pass the value they
get from uselocale() to a *_l() function without checking that the
value is not LC_GLOBAL_LOCALE.
--
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England
Geoff Clare
2010-07-19 16:44:51 UTC
Post by Glenn Fowler
applications and library providers could work around the LC_GLOBAL_LOCALE
anomaly if there were a way to convert LC_GLOBAL_LOCALE to a usable locale_t
is there a way?
This ought to work, although it's rather cumbersome:

currloc = setlocale(LC_CTYPE, NULL);
locale = newlocale(LC_CTYPE_MASK, currloc, 0);
currloc = setlocale(LC_COLLATE, NULL);
locale = newlocale(LC_COLLATE_MASK, currloc, locale);
... repeat for the other categories ...

and I don't think the standard guarantees that what is returned
by setlocale() is a locale name acceptable to newlocale().
(The value returned by setlocale(LC_ALL, NULL) is certainly not
required to be acceptable since it has to encode multiple locale
names.)
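
Written out more fully, the workaround might look like the sketch below
(it still assumes, as noted, that the names setlocale() returns are
acceptable to newlocale()):

#include <locale.h>
#include <stddef.h>

/* Sketch: build a locale_t matching the current global locale by
   querying each standard category with setlocale() and feeding the
   names to newlocale().  Error cleanup is omitted for brevity. */
locale_t global_as_locale_t(void)
{
    static const struct { int cat; int mask; } cats[] = {
        { LC_COLLATE,  LC_COLLATE_MASK  },
        { LC_CTYPE,    LC_CTYPE_MASK    },
        { LC_MESSAGES, LC_MESSAGES_MASK },
        { LC_MONETARY, LC_MONETARY_MASK },
        { LC_NUMERIC,  LC_NUMERIC_MASK  },
        { LC_TIME,     LC_TIME_MASK     },
    };
    locale_t loc = (locale_t)0;
    size_t i;

    for (i = 0; i < sizeof cats / sizeof cats[0]; i++) {
        const char *name = setlocale(cats[i].cat, NULL);
        if (name == NULL)
            return (locale_t)0;
        loc = newlocale(cats[i].mask, name, loc);
        if (loc == (locale_t)0)
            return (locale_t)0;
    }
    return loc;
}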
Post by Glenn Fowler
I didn't see one, but
usable_lc_global_locale = duplocale(0)
or
usable_lc_global_locale = duplocale(LC_GLOBAL_LOCALE)
would seem a likely place
Using duplocale() would be much simpler than all those setlocale()
and newlocale() calls.
--
Geoff Clare <g.clare-7882/***@public.gmane.org>
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England
Glenn Fowler
2010-07-19 20:03:29 UTC
Post by Geoff Clare
Using duplocale() would be much simpler than all those setlocale()
and newlocale() calls.
since an implementation may have extensions with new categories
there should be a way to dup a locale and make modifications
without specific mention of those extensions, otherwise a supposedly
portable library could end up clobbering any extensions in the duped locale
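
for example, the dup-and-modify pattern I have in mind (sketch; the
helper name is made up):

#include <locale.h>

/* sketch: copy a caller's locale and override just LC_NUMERIC, leaving
   every other category, including any implementation extensions, as it
   was in the original */
locale_t with_c_numeric(locale_t loc)
{
    locale_t copy = duplocale(loc);   /* the anomaly again if loc == LC_GLOBAL_LOCALE */
    if (copy == (locale_t)0)
        return (locale_t)0;
    return newlocale(LC_NUMERIC_MASK, "C", copy);
}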

and it must be a mistake that a setlocale() category locale string could
be different from the newlocale() locale string
Post by Geoff Clare
Post by Glenn Fowler
applications and library providers could work around the LC_GLOBAL_LOCALE
anomaly if there were a way to convert LC_GLOBAL_LOCALE to a usable locale_t
is there a way?
currloc = setlocale(LC_CTYPE, NULL);
locale = newlocale(LC_CTYPE_MASK, currloc, 0);
currloc = setlocale(LC_COLLATE, NULL);
locale = newlocale(LC_COLLATE_MASK, currloc, locale);
... repeat for the other categories ...
and I don't think the standard guarantees that what is returned
by setlocale() is a locale name acceptable to newlocale().
(The value returned by setlocale(LC_ALL, NULL) is certainly not
required to be acceptable since it has to encode multiple locale
names.)
Post by Glenn Fowler
I didn't see one, but
usable_lc_global_locale = duplocale(0)
or
usable_lc_global_locale = duplocale(LC_GLOBAL_LOCALE)
would seem a likely place
Using duplocale() would be much simpler than all those setlocale()
and newlocale() calls.