Discussion:
use of ;; as terminator, request for grammar help
Eric Blake
2014-04-03 16:18:54 UTC
Hello GNU awk readers,

On today's Austin Group call (the people in charge of POSIX), we visited
http://austingroupbugs.net/view.php?id=226.

This is in regard to the POSIX awk specification at:
http://pubs.opengroup.org/onlinepubs/9699919799/utilities/awk.html

Among other things, there were two action items pointed out that this
list might be able to help with:

1. GNU awk has a bug regarding ;; as a terminator. The POSIX grammar
allows for:
awk '{print};;{print}'
but gawk rejects this case. This was deemed to be a bug in gawk, since
POSIX was based on the nawk behavior at the time POSIX was standardized,
and nawk has always supported this. Remember, the grammar specifies:
terminator : terminator ';'
| terminator NEWLINE
| ';'
| NEWLINE
;
which allows two ';' in a row, and nothing else in the normative text
mentions that an empty statement must be rejected.
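
The left-recursive terminator rule quoted above derives any non-empty run of ';' and NEWLINE tokens, which is why ';;' is derivable from it. A minimal sketch of that equivalence in shell follows; the helper name is_terminator is hypothetical, and the sketch ignores the blanks a real awk lexer would skip between tokens:

```shell
# Hypothetical helper: succeed if $1 is a non-empty string made up only of
# ';' and newline characters, i.e. something derivable from
#   terminator : terminator ';' | terminator NEWLINE | ';' | NEWLINE
is_terminator() {
    # Non-empty, and nothing is left after deleting every ';' and newline.
    [ -n "$1" ] && [ -z "$(printf '%s' "$1" | tr -d ';\n')" ]
}

is_terminator ';;' && echo "';;' is a valid terminator under this rule"
```

The check treats the rule as the regular language "one or more of ';' or NEWLINE", which is all the left recursion expresses.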

2. Based on existing implementations, there is consensus that the POSIX
grammar is overly restrictive, and that we should change it to permit:
awk '{print} {print}'
and:
awk '/foo/; {print}'

since existing implementations all support it. But to do that, we need
someone with help in writing grammars to propose the changes to the one
appearing on the POSIX page. Any input would be appreciated.
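
For readers who want to see how their own awk treats the cases above, a small probe along these lines may help. It is only a sketch: probe_awk is a hypothetical helper name, and the result depends entirely on which awk is first on PATH, so no particular outcome is claimed for the first three programs.

```shell
# Hypothetical helper: report whether the awk on PATH accepts a program text.
# A syntax error makes awk exit non-zero, which is reported as "rejected".
probe_awk() {
    if printf 'x\n' | awk "$1" >/dev/null 2>&1; then
        echo "accepted: $1"
    else
        echo "rejected: $1"
    fi
}

probe_awk '{print};;{print}'    # item 1: ';;' as a terminator
probe_awk '{print} {print}'     # item 2: two rules with no separator
probe_awk '/foo/; {print}'      # item 2: actionless pattern, then ';'
probe_awk '{print}; {print}'    # single ';' separator, for comparison
```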
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
Richard Hansen
2014-04-11 22:38:33 UTC
Post by Eric Blake
2. Based on existing implementations, there is consensus that the POSIX
grammar is overly restrictive, and that we should change it to permit:
awk '{print} {print}'
and:
awk '/foo/; {print}'
since existing implementations all support it. But to do that, we need
someone with help in writing grammars to propose the changes to the one
appearing on the POSIX page. Any input would be appreciated.

Attached are two files:
* awk-posix.y: the current POSIX awk grammar (Issue 7 2013 Edition)
* awk-proposed.y: my attempt at modifying the grammar to accept the
above examples and to reject actionless BEGIN and END patterns (to
match the normative text)

For convenience you will find a unified diff of the two files below.

I am not an expert in either awk or yacc, so reviews would be appreciated.

Thanks,
Richard


diff --git a/awk.y b/awk.y
index b12ecd9..21f7357 100644
--- a/awk.y
+++ b/awk.y
@@ -49,23 +49,18 @@


program : item_list
- | actionless_item_list
+ | item_list actionless_item
;


-item_list : newline_opt
- | actionless_item_list item terminator
- | item_list item terminator
- | item_list action terminator
+item_list : terminator_opt
+ | item_list item terminator_opt
+ | item_list actionless_item terminator
;


-actionless_item_list : item_list pattern terminator
- | actionless_item_list pattern terminator
- ;
-
-
-item : pattern action
+item : action
+ | pattern action
| Function NAME '(' param_list_opt ')'
newline_opt action
| Function FUNC_NAME '(' param_list_opt ')'
@@ -73,6 +68,10 @@ item : pattern action
;


+actionless_item : normal_pattern
+ ;
+
+
param_list_opt : /* empty */
| param_list
;
@@ -83,13 +82,20 @@ param_list : NAME
;


-pattern : Begin
- | End
- | expr
+pattern : normal_pattern
+ | special_pattern
+ ;
+
+normal_pattern : expr
| expr ',' newline_opt expr
;


+special_pattern : Begin
+ | End
+ ;
+
+
action : '{' newline_opt '}'
| '{' newline_opt terminated_statement_list '}'
| '{' newline_opt unterminated_statement_list '}'
@@ -103,6 +109,11 @@ terminator : terminator ';'
;


+terminator_opt : /* empty */
+ | terminator
+ ;
+
+
terminated_statement_list : terminated_statement
| terminated_statement_list terminated_statement
;
Aharon Robbins
2014-04-17 08:29:55 UTC
Hi Eric and Austin Group folks,

I apologize for the delay in replying. Real Life(tm) gets in the way
of these things.

I am cc'ing Brian Kernighan for his opinion on these issues as well.

Post by Eric Blake
1. GNU awk has a bug regarding ;; as a terminator. The POSIX grammar
allows for:
awk '{print};;{print}'
but gawk rejects this case. This was deemed to be a bug in gawk, since
POSIX was based on the nawk behavior at the time POSIX was standardized,
and nawk has always supported this.

I'm not convinced this is a real bug. In particular, accidents of the
Unix awk implementation should not necessarily be formally codified
in the standard. mawk, which was written based on the 1988 awk book,
also does not support this.

If there are awk programs that use this, they should best be changed to
have only one ';', in my humble opinion; there's no real added value
to codifying this into the language.

Post by Eric Blake
2. Based on existing implementations, there is consensus that the POSIX
grammar is overly restrictive, and that we should change it to permit:
awk '{print} {print}'
and:
awk '/foo/; {print}'
since existing implementations all support it. But to do that, we need
someone with help in writing grammars to propose the changes to the one
appearing on the POSIX page. Any input would be appreciated.

I disagree with the first desired change. The ground I'm standing on here is
firmer. The 1988 awk book disallowed rules without any separators, on the
grounds that rules and statements within them should be syntactically
consistent (a semicolon is required when multiple Xs [rules or statements] appear
on one line). And the very early released versions of nawk in fact enforced
this rule. (I remember testing against it.)

Later on, after the awk book, Brian changed his awk. If you look at his FIXES
file, you will see:

Nov 27, 1988:
With fear and trembling, modified the grammar to permit
multiple pattern-action statements on one line without
an explicit separator. By definition, this capitulation
to the ghost of ancient implementations remains undefined
and thus subject to change without notice or apology.
DO NOT COUNT ON IT.

The sentiment here is quite clear - while it might work, it should
not be formalized.

The gawk documentation follows this example, documenting clearly that
a semicolon is required between multiple rules on one line, and NOT
documenting that it can be left off. I do not plan to change this, either.

The second change (awk '/foo/; { print }') should be supported by the POSIX
grammar, since that is clearly two different rules.
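
To illustrate why this reads as two rules, here is a minimal sketch; two_rules is a hypothetical helper name. The first rule, /foo/ with no action, defaults to printing each matching record, and the second rule, {print}, prints every record. With an awk that accepts this form, the line "foo" should therefore appear twice and "bar" once.

```shell
# Hypothetical helper: feed two records to a two-rule program.
#   rule 1: /foo/    (no action, so matching records are printed)
#   rule 2: {print}  (no pattern, so every record is printed)
two_rules() {
    printf 'foo\nbar\n' | awk '/foo/; {print}'
}

two_rules
```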

As an aside, there are one or two other areas where gawk implements
undocumented (= unspecified) behavior for compatibility with Unix awk,
but those remain purposely undocumented in the gawk manual; the case
I'm thinking about even has this comment in the code:

/*
* A simple_stmt exists to satisfy a constraint in the POSIX
* grammar allowing them to occur as the 1st and 3rd parts
* in a `for (...;...;...)' loop. This is a historical oddity
* inherited from Unix awk, not at all documented in the AK&W
* awk book. We support it, as this was reported as a bug.
* We don't bother to document it though. So there.
*/

In my humble opinion, the ';;' issue is so trivial that it's not even worth
the effort I put in for simple statements in for loops.

I hope all this helps. Further discussion is welcome.

Arnold
Andrew Josey
2014-04-17 13:14:47 UTC
hi,
This did not make it to the list ….
regards
Andrew
--------
Andrew Josey The Open Group
Austin Group Chair Apex Plaza, Forbury Road
Email: a.josey-7882/***@public.gmane.org Reading,Berks.RG1 1AX,England
Tel:+44 118 9023044 US fax: +1 415 276 3760
Mobile:+44 774 015 5794 UK fax: +44 870 131 0418
Brian Kernighan
2014-04-18 13:03:51 UTC
Hi, all --

Arnold kindly linked me in on this conversation, since we talk about
compatibility issues regularly.

Should multiple semicolons be legal between pattern-action
statements? They are legal in my current version of Awk, but it's
entirely an artifact of implementation; I'm pretty sure that Al and
Peter and I would never have written an Awk program to use that
flexibility. And it seems unlikely that typical Awk programmers
would write code that way either; one semicolon seems like just the
right number.

Should one semicolon be required between pattern-action statements? I
agree strongly with Arnold on this one: yes. The language is already
entirely too sloppy in how it uses adjacency to mean something, and
adding more cases seems like a bad idea. The explicit semicolon between
p-a statements is consistent with the action language, and makes it
clear what's going on when code is written on a single line (as in a
short command-line sequence). The FIXES note from 1988 makes it clear
that we were very uneasy about allowing an optional separator at the
time; if I were faced with the same decision today, I would require a
single semicolon.

Hope this helps your deliberations a bit. Thanks for all your good work
on the standardization effort.

Brian