[libre-riscv-dev] [Bug 280] POWER spec parser needs to support fall-through cases

Sun Apr 5 22:43:12 BST 2020

http://bugs.libre-riscv.org/show_bug.cgi?id=280

--- Comment #4 from Jacob Lifshay <programmerjake at gmail.com> ---
(In reply to Luke Kenneth Casson Leighton from comment #3)
> (In reply to Jacob Lifshay from comment #2)
> > (In reply to Luke Kenneth Casson Leighton from comment #0)
> > > this is very tricky to get working at the *lexer* level and still support
> > > whitespace indentation.
> > > 
> > > switch (n)
> > >     case(1): x <- 5
> > >     case(2): # here
> > >     case(3):
> > >         x <- 3
> > >     default:
> > >         x <- 9
> > > print (5)
> > 
> > How about switching the grammar to parse a case-sequence instead of a single
> > case, that way multiple cases before a statement block would be correctly
> > handled.
> 
> annoyingly, the more changes that are made to the grammar, the
> less "like the spec" the grammar becomes, with the implication
> that further manual intervention stages are required when it
> comes to verifying against the 3.0B spec and, in future, against
> 3.0C and other releases.

Um, I don't think the pseudocode in the Power spec was ever supposed to be
Python, so I have no issues with changing the grammar file to more accurately
match the spec pdf even if the grammar doesn't match Python as closely.

> i've made quite a few minor changes, some of them necessary (to support
> syntax such as CR0[x] rather than CR0_subscript_x, some of them, well,
> not being lazy, just "trying to get it working fast"
> 
> yet-another-preprocessor-stage - even if it is line-based rather than
> ply-lexer-based, looking for 2 "case" statements (or case followed by
> default) which are lined up, and inserting the ghost word "fallthrough"
> would "do the trick"

I think having the grammar correctly reflect the actual syntax that is used is
probably a better way to go than adding preprocessing kludges to insert
`fallthrough` everywhere.
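
To make the case-sequence idea concrete, here is a rough hand-written sketch
(not the project's ply grammar) that groups consecutive case labels in front of
one statement block. It assumes the lexer has already turned indentation into
INDENT/DEDENT/NL tokens; the string sentinels, `is_label` and
`parse_case_groups` are all made-up names for illustration only.

```python
# Hypothetical token sentinels; a real lexer would use its own token types.
INDENT, DEDENT, NL = "INDENT", "DEDENT", "NL"


def is_label(tok):
    # assumption: a label arrives as a single token, "case(N):" or "default:"
    return tok == "default:" or (tok.startswith("case(") and tok.endswith("):"))


def parse_case_groups(tokens):
    """Group the tokens of a switch body into (labels, body) pairs.

    tokens is the flat token list between the switch body's INDENT and
    its matching DEDENT.  A label with no statement of its own joins the
    next label, so fall-through needs no extra keyword.
    """
    groups, labels, pos = [], [], 0
    while pos < len(tokens):
        if not is_label(tokens[pos]):
            raise SyntaxError(f"expected case label, got {tokens[pos]!r}")
        labels.append(tokens[pos])
        pos += 1
        if pos < len(tokens) and tokens[pos] != NL:
            # inline statement: runs to the end of the line
            end = tokens.index(NL, pos)
            body = tokens[pos:end]
            pos = end + 1
        else:
            pos += 1  # skip the NL after the label
            if pos < len(tokens) and tokens[pos] == INDENT:
                # indented block: runs to the matching DEDENT
                depth, end = 1, pos + 1
                while depth:
                    depth += (tokens[end] == INDENT) - (tokens[end] == DEDENT)
                    end += 1
                body = tokens[pos + 1:end - 1]
                pos = end
            else:
                continue  # no body yet: keep collecting labels
        groups.append((labels, body))
        labels = []
    if labels:
        raise SyntaxError("case label without a statement block")
    return groups


# Token stream for the body of the switch example from comment #0.
body_tokens = [
    "case(1):", "x", "<-", "5", NL,
    "case(2):", NL,
    "case(3):", NL,
    INDENT, "x", "<-", "3", NL, DEDENT,
    "default:", NL,
    INDENT, "x", "<-", "9", NL, DEDENT,
]
for labels, body in parse_case_groups(body_tokens):
    print(labels, body)
```

Here case(2) and case(3) end up in the same group sharing the `x <- 3` block,
which is the behaviour a case-sequence rule in the grammar would give.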

The space-counting done by the lexer translates the spaces into the INDENT and
DEDENT tokens.

The following algorithm should work to translate lines:

def lexer(lines):
    """ lines is an iterator that yields each line
        of text without the trailing newline """

    indent_depth_stack = [0]
    for line in lines:
        # assume we don't have to worry about tabs in string literals
        expanded_line = line.expandtabs()
        line = expanded_line.lstrip()
        # count indent depth
        depth = len(expanded_line) - len(line)
        if line == "" or line[0] == "#":
            # empty lines don't have to match depth
            # don't yield repeated NL tokens
            continue
        if depth > indent_depth_stack[-1]:
            yield INDENT
            indent_depth_stack.append(depth)
        else:
            while depth < indent_depth_stack[-1]:
                yield DEDENT
                indent_depth_stack.pop()
            if depth > indent_depth_stack[-1]:
                raise IndentDepthMismatch("indent depth doesn't match!")
        yield from tokenize(line)
        yield NL
    while indent_depth_stack[-1] != 0:
        yield DEDENT
        indent_depth_stack.pop()
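
For anyone who wants to try it, here is a self-contained variant that runs on
the example from comment #0. The token sentinels, the exception class and the
whitespace-splitting `tokenize` are stand-ins I made up so the sketch runs on
its own; the real lexer would supply its own versions.

```python
# Stand-ins (assumptions): plain strings as token sentinels.
INDENT, DEDENT, NL = "INDENT", "DEDENT", "NL"


class IndentDepthMismatch(Exception):
    """Raised when a dedent lands between two enclosing indent levels."""


def tokenize(line):
    # placeholder tokenizer: drop a trailing comment, split on whitespace
    yield from line.split("#", 1)[0].split()


def lexer(lines):
    indent_depth_stack = [0]
    for line in lines:
        expanded_line = line.expandtabs()
        line = expanded_line.lstrip()
        depth = len(expanded_line) - len(line)
        if line == "" or line[0] == "#":
            continue  # blank and comment-only lines produce no tokens
        if depth > indent_depth_stack[-1]:
            yield INDENT
            indent_depth_stack.append(depth)
        else:
            while depth < indent_depth_stack[-1]:
                yield DEDENT
                indent_depth_stack.pop()
            if depth > indent_depth_stack[-1]:
                raise IndentDepthMismatch("indent depth doesn't match!")
        yield from tokenize(line)
        yield NL
    # close any blocks still open at end of input
    while indent_depth_stack[-1] != 0:
        yield DEDENT
        indent_depth_stack.pop()


example = """\
switch (n)
    case(1): x <- 5
    case(2): # here
    case(3):
        x <- 3
    default:
        x <- 9
print (5)
"""
tokens = list(lexer(example.splitlines()))
print(tokens)
```

Note that `case(2):` and `case(3):` come out back to back with only an NL
between them and no INDENT, so the parser sees two labels in a row before the
indented block.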
