[libre-riscv-dev] [Bug 280] POWER spec parser needs to support fall-through cases

bugzilla-daemon at libre-riscv.org
Sun Apr 5 23:12:26 BST 2020


http://bugs.libre-riscv.org/show_bug.cgi?id=280

--- Comment #5 from Luke Kenneth Casson Leighton <lkcl at lkcl.net> ---
(In reply to Jacob Lifshay from comment #4)

> Um, I don't think the pseudocode in the Power spec was ever supposed to be
> Python,

it's not, however it turns out to be a whitespace-indented language, probably
through expediency and a desire for compact clarity.

i *began* with GardenSnake.py and rapidly morphed it to match the PDF
pseudocode grammar.

it *outputs* python AST because that was what GardenSnake did, and it's worked
well.

however python lacks switch/case, about which i always wondered "wth?", and it
makes sense now: case fall-through seriously messes with an already complex
three-pass process.

> so I have no issues with changing the grammar file to more
> accurately match the spec pdf even if the grammar doesn't match Python as
> closely.

the "hard" way is to try and mess with one of the existing stages.

the pseudo syntax does this:

do while test
    foo

by recognising the keyword "while" i managed to insert an extra token ":" when
the newline occurs.

this trick means that the tokens *look* like they were this:

do while test:
    foo

and of course that now looks exactly like python "while" and it was easy to
add.

i therefore have absolutely no problem messing with the token stream to "get
the desired outcome" :)
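
a minimal sketch of that token-stream trick, for illustration only. this is
not the parser's actual code: the (type, value) tuples, the token type names
and the function name insert_colon_after_while are all made up here, and a
real ply lexer would work on LexToken objects instead.

```python
def insert_colon_after_while(tokens):
    """Yield the token stream with a synthetic COLON added before the
    NEWLINE that ends any line containing a WHILE keyword.

    tokens is an iterable of (type, value) pairs (hypothetical format).
    """
    seen_while = False
    for tok_type, value in tokens:
        if tok_type == "WHILE":
            # remember that this line is a "do while test" header
            seen_while = True
        elif tok_type == "NEWLINE" and seen_while:
            # ghost token: the grammar now sees "do while test:"
            yield ("COLON", ":")
            seen_while = False
        yield (tok_type, value)


tokens = [
    ("DO", "do"), ("WHILE", "while"), ("NAME", "test"), ("NEWLINE", "\n"),
    ("NAME", "foo"), ("NEWLINE", "\n"),
]
result = list(insert_colon_after_while(tokens))
```

the second line ("foo") is left alone because no "while" was seen on it, so
only the loop header gains the extra ":".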


> > i've made quite a few minor changes, some of them necessary (to support
> > syntax such as CR0[x] rather than CR0_subscript_x), some of them, well,
> > not being lazy, just "trying to get it working fast"
> > 
> > yet-another-preprocessor-stage - even if it is line-based rather than
> > ply-lexer-based, looking for 2 "case" statements (or case followed by
> > default) which are lined up, and inserting the ghost word "fallthrough"
> > would "do the trick"
> 
> I think having the grammar correctly reflect the actual syntax that is used
> is probably a better way to go than adding preprocessing kludges to add
> `fallthrough` everywhere.

i'd put it as an actual keyword, which is silently added by the lexer and
silently removed by the yacc parser, if it made any difference.

however by matching against NAME it makes life simpler and easier.

i think what is happening is that

case(5): somename

is matching against "small single line suite".

by then detecting `len(statements) == 1 and statements[0] == "fallthrough"`,
that does the trick.

i know - it's awful.  
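
a rough sketch of how the two halves could fit together, assuming a
line-based preprocessor stage. none of this is the real parser code: the
regex, the example switch/case layout and the names add_fallthrough and
is_fallthrough are hypothetical, and statements are modelled as plain strings
rather than AST nodes.

```python
import re

# a "case(...)" or "default" label, capturing its indent and whatever
# follows the colon (assumed label syntax, for illustration)
LABEL_RE = re.compile(r"^(\s*)(?:case\s*\(.*?\)|default)\s*:(.*)$")


def add_fallthrough(lines):
    """Append the ghost word "fallthrough" to an empty case label that is
    immediately followed by another label at the same indentation."""
    out = []
    for i, line in enumerate(lines):
        this = LABEL_RE.match(line)
        nxt = LABEL_RE.match(lines[i + 1]) if i + 1 < len(lines) else None
        if (this and nxt
                and this.group(2).strip() == ""      # nothing after the colon
                and this.group(1) == nxt.group(1)):  # labels are lined up
            line = line.rstrip() + " fallthrough"
        out.append(line)
    return out


def is_fallthrough(statements):
    """The check described above: a suite holding only the ghost NAME."""
    return len(statements) == 1 and statements[0] == "fallthrough"


source = [
    "switch(n)",
    "    case(1):",
    "    case(2): x <- 1",
    "    default: x <- 0",
]
patched = add_fallthrough(source)
```

after this stage "case(1):" becomes "case(1): fallthrough", which then parses
as a small single-line suite containing one NAME.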

> The space-counting done by the lexer translates the spaces into the INDENT
> and DEDENT tokens.

yes.  the person who wrote GardenSnake.py copied the algorithm from the python
c interpreter and transliterated it to a python ply lexer.

i had a loovely time getting it to recognise whitespace properly.  changed the
actual lexer context to trick it into swapping from whitespace-aware to
non-aware :)


> The following algorithm should work to translate lines:
> 
> def lexer(lines):
>     """ lines is a iterator that returns each line
>         of text without the trailing newline """

unfortunately, stripping the newlines removes vital information, particularly
when discerning small statements after a colon from "suites".

a suite is defined as the following tokens

NEWLINE INDENT stmts DEDENT

where the algorithm you wrote (minus NEWLINE stripping) identifies indent and
dedent.
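
for illustration, a small sketch of indentation tracking that keeps the
NEWLINE, so a suite shows up as NEWLINE INDENT stmts DEDENT. this is a
simplified stand-in and not GardenSnake's or the parser's real code: each line
is collapsed into a single made-up LINE token, only spaces are counted, and
the function name indent_tokens is hypothetical.

```python
def indent_tokens(lines):
    """Yield (type, value) tokens, emitting INDENT/DEDENT from the leading
    spaces of each line and a NEWLINE after every non-blank line."""
    stack = [0]  # indentation widths of the currently open blocks
    for line in lines:
        text = line.rstrip("\n")
        if not text.strip():
            continue  # blank lines do not change the indentation level
        width = len(text) - len(text.lstrip(" "))
        if width > stack[-1]:
            stack.append(width)
            yield ("INDENT", width)
        while width < stack[-1]:
            stack.pop()
            yield ("DEDENT", width)
        if width != stack[-1]:
            raise IndentationError("dedent does not match any outer level")
        yield ("LINE", text.strip())
        yield ("NEWLINE", "\n")  # kept: it separates a header from its suite
    while len(stack) > 1:  # close any blocks still open at end of input
        stack.pop()
        yield ("DEDENT", 0)


types = [t for t, _ in indent_tokens(["do while test:", "    foo", "bar"])]
```

here the header line is followed by NEWLINE INDENT, then the "foo" statement,
then DEDENT before "bar", which is the suite shape given above.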

oh... er, 1 sec...

>         yield from tokenize(line)
>         yield NL

ah interesting.  so the NL *isn't* removed?
