Discussion:
date %N
Rob Landley
2014-05-22 12:26:43 UTC
So I'm implementing a fresh set of posix commands
(http://landley.net/toybox/roadmap.html) and everything does nanoseconds
these days, and the gnu/dammit version has %N in the date format... but
they didn't extend strptime(). (Because why would you do a generic
solution when you can hack in a special case if you're the FSF?)

The fundamental problem seems to be there isn't a nanoseconds field in
struct tm. It's easy enough to append one, but if you then appended
something else in a future version it would break.

Just thought I'd mention it. I keep hitting fun little things like
wanting to make tr support UTF-8 but there's no way to get a count or
list of isspace() or isalpha() entries for things like:

echo "one two three" | tr [:space:][:alpha:] abcdefghijklmnopqrstuvwxyz
zzzfzzzfzzzzz
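
For a single-byte locale, about the only portable way to get such a list seems to be brute force over every unsigned char value. A minimal sketch, assuming the "C" locale; the helper name list_class is made up for illustration, and this does not extend gracefully to UTF-8, where every code point would have to be walked with the isw*() functions:

```c
#include <ctype.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: store every unsigned char value that satisfies
 * pred() (e.g. isspace or isalpha) in out[], in ascending order, and
 * return how many were found. out[] must have room for UCHAR_MAX + 1
 * entries. */
size_t list_class(int (*pred)(int), unsigned char *out)
{
    size_t n = 0;

    for (int c = 0; c <= UCHAR_MAX; c++)
        if (pred(c))
            out[n++] = (unsigned char)c;
    return n;
}
```

In the "C" locale this finds 6 space characters (tab, newline, VT, FF, CR, then space) and 52 letters. That is consistent with the output above: the space is the sixth member of the first set and maps to "f", while every letter lands past the end of the 26-character second set and apparently gets its last character, "z".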

Or the way Linux started using cpio heavily about the time you removed
it from the standard (it's the basis of RPM and initramfs) but _they_
use the version extended to 8 hex digits instead of 6 (which is still a
32 bit time_t but it's unsigned so won't blow up for most of a
century... Then again, you standardized "pax" which nobody uses, but not
"tar" which everybody does, so what posix has to say about archivers is
sort of a moot point really.)

And what _does_ happen to getline()'s lineptr field when you feed it
NULL requesting it allocate memory, but the read fails? Does it
reliably stay NULL or does it return a memory allocation you need to
free even in the failure case? (In theory it could even change the
pointer but free the memory itself so you DON'T free it for failure. I'd
say "no libc author is that sloppy" but dietlibc exists. Then again, I
don't care about supporting dietlibc...)
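
Until that is pinned down, a defensive pattern is to start from NULL and free whatever comes back on failure, since free(NULL) is a no-op. A minimal sketch; the wrapper name read_line is hypothetical, and it assumes that on failure getline() leaves the pointer either NULL or pointing at memory the caller still owns:

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical wrapper: read one line from fp into a freshly allocated
 * buffer. On success returns the length and stores the buffer in *out
 * (caller frees). On failure or EOF returns -1 and stores NULL in *out.
 *
 * Assumption: on failure getline() leaves *lineptr either NULL or
 * pointing at memory we still own, so free() is always safe. A libc that
 * freed the buffer but left the pointer dangling would break this. */
ssize_t read_line(FILE *fp, char **out)
{
    char *line = NULL;
    size_t cap = 0;
    ssize_t len = getline(&line, &cap, fp);

    if (len < 0) {
        free(line);   /* free(NULL) is a no-op, so this covers both cases */
        line = NULL;
    }
    *out = line;
    return len;
}
```

This handles the "stays NULL" and the "returns an allocation" cases alike, but not the third possibility of a changed pointer to already-freed memory.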

Rob

(Oh, speaking of date: I'm implementing 50 years back/forward from today
for 2 digit years, not from some magic date. I'm aware this is a
deviation from the standard, and am documenting it as such, but the
standard itself says it's going to change, so...)
Szabolcs Nagy
2014-05-23 10:24:44 UTC
Post by Rob Landley
So I'm implementing a fresh set of posix commands
(http://landley.net/toybox/roadmap.html) and everything does nanoseconds
these days, and the gnu/dammit version has %N in the date format... but
they didn't extend strptime(). (Because why would you do a generic
solution when you can hack in a special case if you're the FSF?)
The fundamental problem seems to be there isn't a nanoseconds field in
struct tm. It's easy enough to append one, but if you then appended
something else in a future version it would break.
strptime seems to have underspecified behaviour in various cases so
nanoseconds is not the most interesting issue here

what if %C is negative and how should %C and %y be combined exactly?

is overflow ub or failure when a number with large field width is parsed?
(eg. "%999C" or "%999Y")

the F specifier is referenced, but that's not defined for strptime:
".. for any conversion specifier other than C, F, or Y."

what if contradicting information is parsed?

how to update tm struct fields for %W and %U?

there is no exception for 'implementation defined' specifiers, but such
extensions are widely used (eg glibc supports scanning %s, %z, %Z, %F)

and there is no way to parse timezone information or nanoseconds
Post by Rob Landley
(Oh, speaking of date: I'm implementing 50 years back/forward from today
for 2 digit years, not from some magic date. I'm aware this is a
deviation from the standard, and am documenting it as such, but the
standard itself says it's going to change, so...)
why would you do that?
0..49 / 50..99 split is not easier to code or use than a 0..68 / 69..99 one
Rob Landley
2014-05-23 13:28:19 UTC
Post by Szabolcs Nagy
Post by Rob Landley
(Oh, speaking of date: I'm implementing 50 years back/forward from today
for 2 digit years, not from some magic date. I'm aware this is a
deviation from the standard, and am documenting it as such, but the
standard itself says it's going to change, so...)
why would you do that?
0..49 / 50..99 split is not easier to code or use than a 0..68 / 69..99 one
Why I would do it "from today"? Because in 2014 it would be 1964 to
2063. In 2015 it would be 1965 to 2064. Without me having to change the
code again.

Rob

P.S. Because negative integers go one farther than zero from positives,
so it's slightly less surprising to break the tie that way.
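
The sliding window described here can be sketched in a few lines of C. This is only an illustration of the arithmetic, not toybox's actual code; the function name expand_two_digit_year is hypothetical, and the caller is assumed to supply the current year:

```c
/* Hypothetical helper: map a two-digit year yy (0..99) to the full year
 * inside the 100-year window [current_year - 50, current_year + 49].
 * Assumes current_year >= 50 so the arithmetic stays positive. */
int expand_two_digit_year(int yy, int current_year)
{
    int low = current_year - 50;        /* oldest year in the window */
    int year = low - low % 100 + yy;    /* candidate in low's century */

    if (year < low)
        year += 100;                    /* wrap into the next century */
    return year;
}
```

With current_year set to 2014, "64" maps to 1964 and "63" to 2063; with 2015 the same inputs map to 2064 and 2063, so the window moves by itself.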
Dan Douglas
2014-05-23 14:05:39 UTC
Post by Szabolcs Nagy
there is no exception for 'implementation defined' specifiers, but such
extensions are widely used (eg glibc supports scanning %s, %z, %Z, %F)
and there is no way to parse timezone information or nanoseconds
I would like to see both %N and %z in part because they are required to
produce and parse some valid RFC3339 and the other closely related formats
like w3cdtf.

Sadly there are already contradictory implementations. RFC3339 needs a
delimiter inserted before the timezone.

# AST date using the `%_z` modifier
$ ksh -c 'builtin date; date "+%Y-%m-%d %H:%M:%S%_z"'
2014-05-23 08:47:32-05:00

# GNU date failing at this, because it uses `%:z` instead
$ date "+%Y-%m-%d %H:%M:%S%_z"
2014-05-23 08:47:59 -500

# GNU
$ date "+%Y-%m-%d %H:%M:%S%:z"
2014-05-23 08:48:10-05:00

# AST (fail)
$ ksh -c 'builtin date; date "+%Y-%m-%d %H:%M:%S%:z"'
2014-05-23 08:49:12%:z

Granted I'm using date(1) to demonstrate and both these implementations have a
special --rfc3339 option just for this, but that doesn't help you with
strftime / strptime. I imagine C libraries are very inconsistent on these
extensions if they support them.
--
Dan Douglas
Joerg Schilling
2014-05-23 15:06:50 UTC
Post by Dan Douglas
# AST (fail)
ksh -c 'builtin date; date "+%Y-%m-%d %H:%M:%S%:z"'
2014-05-23 08:49:12%:z
looks like a typo....

ksh93 -c 'builtin date; date "+%Y-%m-%d %H:%M:%S%z"'
2014-05-23 17:06:10+0200

Jörg
--
EMail:joerg-3Qm2Liu6aU2sY6utFDHCwYAplN+***@public.gmane.org (home) Jörg Schilling D-13353 Berlin
js-CFLBMwTPW48UNGrzBIF7/***@public.gmane.org (uni)
joerg.schilling-8LS2qeF34IpklNlQbfROjRvVK+***@public.gmane.org (work) Blog: http://schily.blogspot.com/
URL: http://cdrecord.berlios.de/private/ ftp://ftp.berlios.de/pub/schily
Dan Douglas
2014-05-23 15:43:02 UTC
Post by Joerg Schilling
Post by Dan Douglas
# AST (fail)
ksh -c 'builtin date; date "+%Y-%m-%d %H:%M:%S%:z"'
2014-05-23 08:49:12%:z
looks like a typo....
ksh93 -c 'builtin date; date "+%Y-%m-%d %H:%M:%S%z"'
2014-05-23 17:06:10+0200
That was just to demonstrate that one supports only "_" and the other supports
only ":" as a modifier to %z to include the colon delimiter between the hours
and minutes of the local time offset.

According to the spec, all punctuation is required as it was designed to allow
using strcmp() in the C locale to sort date-times chronologically, so you
actually need that modifier to produce or parse a valid time offset.
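
Without either modifier, one portable workaround is to format with the standard %z (which gives +hhmm or -hhmm) and patch the colon in afterwards. A minimal sketch in C; the helper name insert_offset_colon is made up, and it assumes the offset is the last thing in the buffer:

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: buf holds a timestamp that ends in a numeric
 * offset as produced by strftime()'s %z, e.g. "...-0500". Rewrite the
 * tail in place as "-05:00". size is the capacity of buf. Returns 0 on
 * success, -1 if the tail is not +hhmm/-hhmm or there is no room for
 * the extra character. */
int insert_offset_colon(char *buf, size_t size)
{
    size_t len = strlen(buf);
    char *off;

    if (len < 5 || len + 2 > size)
        return -1;
    off = buf + len - 5;                /* points at the sign */
    if (off[0] != '+' && off[0] != '-')
        return -1;
    for (int i = 1; i <= 4; i++)
        if (!isdigit((unsigned char)off[i]))
            return -1;
    memmove(off + 4, off + 3, 3);       /* shift "mm\0" right by one */
    off[3] = ':';
    return 0;
}
```

Parsing would need the reverse step before handing the string to strptime(), which is exactly the kind of per-application patching a standard modifier would avoid.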
--
Dan Douglas
Rich Felker
2014-05-24 15:51:25 UTC
Post by Szabolcs Nagy
strptime seems to have underspecified behaviour in various cases so
nanoseconds is not the most interesting issue here
what if %C is negative and how should %C and %y be combined exactly?
As I read the spec now, it's as if the characters accepted by %C and
those accepted by %y are concatenated and processed by %Y.
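
Taken literally, that reading could be sketched as below. This is one interpretation only, not something the spec spells out; the function name year_from_century_and_yy is hypothetical, and its two string arguments stand for the characters that %C and %y matched:

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: century and yy are the character sequences that
 * %C and %y matched. Concatenate them and read the result as one
 * decimal year, as %Y would. Returns 0 and stores the year, or -1 if
 * the combined text does not fit the buffer. */
int year_from_century_and_yy(const char *century, const char *yy, long *year)
{
    char buf[32];
    int n = snprintf(buf, sizeof buf, "%s%s", century, yy);

    if (n < 0 || (size_t)n >= sizeof buf)
        return -1;
    *year = strtol(buf, NULL, 10);
    return 0;
}
```

Under this reading "20" and "14" give 2014, and a negative century "-1" with "05" gives -105 rather than the -95 that century * 100 + yy would produce.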

Rich

Geoff Clare
2014-05-23 11:40:14 UTC
Post by Rob Landley
So I'm implementing a fresh set of posix commands
(http://landley.net/toybox/roadmap.html) and everything does nanoseconds
these days, and the gnu/dammit version has %N in the date format... but
they didn't extend strptime(). (Because why would you do a generic
solution when you can hack in a special case if you're the FSF?)
The fundamental problem seems to be there isn't a nanoseconds field in
struct tm. It's easy enough to append one, but if you then appended
something else in a future version it would break.
Adding a nanoseconds field to struct tm makes no sense unless you also
add variants of localtime(), localtime_r(), gmtime(), gmtime_r() and
mktime() that operate on struct timespec instead of time_t.

The way to use strftime() with a struct timespec is to convert the
tv_sec to a struct tm and pass that to strftime() with a format that
ends in "%S", and then append the decimal point and nanoseconds (from
tv_nsec, formatted with "%09ld") to the string. Doing the inverse
with strptime() would mean finding the decimal point in the seconds
independently of strptime(), so it seems to me that what's lacking is
any information from strptime() about where in the input string it
matched up to (i.e. something like strtol()'s endptr parameter).
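
The strftime() half of this can be sketched as follows. It is a minimal illustration, assuming UTC via gmtime_r() and a fixed "%Y-%m-%d %H:%M:%S" layout; the function name format_timespec is hypothetical:

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <time.h>

/* Hypothetical helper: write ts as "YYYY-MM-DD HH:MM:SS.nnnnnnnnn" (UTC)
 * into buf. Returns the number of characters written, or 0 if buf is
 * too small or the conversion fails. */
size_t format_timespec(char *buf, size_t size, const struct timespec *ts)
{
    struct tm tm;
    size_t len;
    int n;

    if (!gmtime_r(&ts->tv_sec, &tm))
        return 0;
    /* The format ends in %S so the fraction can be appended directly. */
    len = strftime(buf, size, "%Y-%m-%d %H:%M:%S", &tm);
    if (len == 0)
        return 0;
    n = snprintf(buf + len, size - len, ".%09ld", (long)ts->tv_nsec);
    if (n < 0 || (size_t)n >= size - len)
        return 0;
    return len + (size_t)n;
}
```

This only works when the seconds are the last field of the format; a %N-style specifier would be needed to place the fraction anywhere else.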
--
Geoff Clare <g.clare-7882/***@public.gmane.org>
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England
Szabolcs Nagy
2014-05-23 12:12:18 UTC
Post by Geoff Clare
ends in "%S", and then append the decimal point and nanoseconds (from
tv_nsec, formatted with "%09ld") to the string. Doing the inverse
with strptime() would mean finding the decimal point in the seconds
independently of strptime(), so it seems to me that what's lacking is
any information from strptime() about where in the input string it
matched up to (i.e. something like strtol()'s endptr parameter).
strptime returns the position where the matching ends so that's fine
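
A minimal sketch of using that return value to pick up a fractional part by hand, assuming a fixed "%Y-%m-%d %H:%M:%S" layout with an optional ".digits" suffix; the function name parse_with_nanos is hypothetical:

```c
#define _XOPEN_SOURCE 700
#include <ctype.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical helper: parse "YYYY-MM-DD HH:MM:SS" with strptime(), then
 * an optional ".fraction" of 1 to 9 digits by hand. Stores the fraction
 * in *nsec (0 if absent). Returns a pointer to the first unparsed
 * character, or NULL on a parse error. */
const char *parse_with_nanos(const char *s, struct tm *tm, long *nsec)
{
    const char *p = strptime(s, "%Y-%m-%d %H:%M:%S", tm);
    long scale = 100000000L;            /* value of the first digit */

    *nsec = 0;
    if (!p)
        return NULL;
    if (*p != '.')
        return p;                       /* no fractional part */
    p++;
    if (!isdigit((unsigned char)*p))
        return NULL;                    /* "." must be followed by digits */
    for (; isdigit((unsigned char)*p) && scale > 0; p++, scale /= 10)
        *nsec += (*p - '0') * scale;
    return p;
}
```

As with the formatting direction, this only helps when the fraction directly follows the last field that strptime() consumed.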
Rob Landley
2014-05-23 13:31:07 UTC
Post by Szabolcs Nagy
Post by Geoff Clare
ends in "%S", and then append the decimal point and nanoseconds (from
tv_nsec, formatted with "%09ld") to the string. Doing the inverse
with strptime() would mean finding the decimal point in the seconds
independently of strptime(), so it seems to me that what's lacking is
any information from strptime() about where in the input string it
matched up to (i.e. something like strtol()'s endptr parameter).
strptime returns the position where the matching ends so that's fine
Which implies that if I feed the gnu/dammit version a literal "%N" in
the date string where the pattern has a %N, it would accept it and not
set the field?

Rob
Geoff Clare
2014-05-23 13:32:52 UTC
Post by Szabolcs Nagy
Post by Geoff Clare
ends in "%S", and then append the decimal point and nanoseconds (from
tv_nsec, formatted with "%09ld") to the string. Doing the inverse
with strptime() would mean finding the decimal point in the seconds
independently of strptime(), so it seems to me that what's lacking is
any information from strptime() about where in the input string it
matched up to (i.e. something like strtol()'s endptr parameter).
strptime returns the position where the matching ends so that's fine
Doh! So it does. Shows how often I use it :-)
--
Geoff Clare <g.clare-7882/***@public.gmane.org>
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England