ungron: accept tabs as whitespace alongside spaces #132

Open
ChrisJr404 wants to merge 1 commit into tomnomnom:master from ChrisJr404:ungron-accept-tabs

@ChrisJr404

Closes #129.

The reporter wants ungron to handle input where statements are aligned with tabs, e.g. output from a tool that pads keys to a common column:

foo.bar    = "val1";
foo.hijklm = "val2";
foo.quxz   = "val3";

On master, ungronning that input fails: the tab between the path and = makes lexStatement's switch fall through to its default case, which emits typError, because the dispatch only matches ' ' and '='.

This PR:

  • Adds '\t' to the lexStatement dispatch case so a path followed by a tab routes to lexValue (the same way a space-separated path does).
  • Changes the three l.acceptRun(" ") whitespace-skip calls in lexValue to l.acceptRun(" \t") so tabs around = and before ; are also consumed.
  • Updates the grammar comment from Path Space* "=" Space* to Path WS* "=" WS* and defines WS ::= ' ' | '\t'.
  • Adds two TestLex table cases: a tab-only "json\t=\t1;" and a mixed-whitespace "json.foo \t = \t \"bar\"\t;".
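
For readers who have not looked at the lexer, the whitespace handling described above can be sketched roughly as follows. This is a standalone illustration, not gron's actual code: miniLexer, acceptUntil, acceptValue, and splitStatement are hypothetical names, and only the idea of calling acceptRun(" \t") at the three whitespace-skip points is taken from this PR. It also assumes paths contain no whitespace, which is not true of every gron path (bracketed keys can contain spaces).

```go
package main

import (
	"fmt"
	"strings"
)

// ws mirrors the grammar comment in this PR: WS ::= ' ' | '\t'.
const ws = " \t"

// miniLexer is a hypothetical, stripped-down stand-in for a statement lexer.
type miniLexer struct {
	text string
	pos  int
}

// acceptRun advances past every consecutive byte that appears in valid.
func (l *miniLexer) acceptRun(valid string) {
	for l.pos < len(l.text) && strings.IndexByte(valid, l.text[l.pos]) >= 0 {
		l.pos++
	}
}

// acceptUntil advances to the first byte found in delims (or to the end of
// the input) and returns the text consumed on the way.
func (l *miniLexer) acceptUntil(delims string) string {
	start := l.pos
	for l.pos < len(l.text) && strings.IndexByte(delims, l.text[l.pos]) < 0 {
		l.pos++
	}
	return l.text[start:l.pos]
}

// acceptValue consumes either a double-quoted string (honouring backslash
// escapes) or a bare token that ends at whitespace or ';'.
func (l *miniLexer) acceptValue() string {
	start := l.pos
	if l.pos < len(l.text) && l.text[l.pos] == '"' {
		l.pos++ // opening quote
		for l.pos < len(l.text) {
			c := l.text[l.pos]
			l.pos++
			if c == '\\' && l.pos < len(l.text) {
				l.pos++ // skip the escaped byte
			} else if c == '"' {
				break // closing quote
			}
		}
		return l.text[start:l.pos]
	}
	l.acceptUntil(ws + ";")
	return l.text[start:l.pos]
}

// splitStatement parses: Path WS* "=" WS* Value WS* ";"
func splitStatement(s string) (path, value string, err error) {
	l := &miniLexer{text: s}
	path = l.acceptUntil(ws + "=")
	if path == "" {
		return "", "", fmt.Errorf("missing path in %q", s)
	}
	l.acceptRun(ws) // whitespace-skip point 1: before '='
	if l.pos >= len(l.text) || l.text[l.pos] != '=' {
		return "", "", fmt.Errorf("expected '=' in %q", s)
	}
	l.pos++
	l.acceptRun(ws) // whitespace-skip point 2: after '='
	value = l.acceptValue()
	if value == "" {
		return "", "", fmt.Errorf("missing value in %q", s)
	}
	l.acceptRun(ws) // whitespace-skip point 3: before ';'
	if l.pos >= len(l.text) || l.text[l.pos] != ';' {
		return "", "", fmt.Errorf("expected ';' in %q", s)
	}
	return path, value, nil
}

func main() {
	// The same two inputs as the new TestLex cases.
	for _, in := range []string{"json\t=\t1;", "json.foo \t = \t \"bar\"\t;"} {
		path, value, err := splitStatement(in)
		fmt.Printf("%q -> path=%q value=%q err=%v\n", in, path, value, err)
	}
}
```

If ws is changed back to " " in this sketch, the tab-only input is rejected at the "expected '='" check, which loosely mirrors the typError behaviour on master.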

Verified with ./script/test and a manual round-trip on the reporter's example:

$ printf 'foo.bar\t= "val1";\nfoo.hijklm\t= "val2";\nfoo.quxz\t= "val3";\n' | ./gron -u
{
  "foo": {
    "bar": "val1",
    "hijklm": "val2",
    "quxz": "val3"
  }
}

Issue tomnomnom#129 asks for ungron to accept input where the equals sign is
preceded (or surrounded) by tabs instead of plain spaces, so output from
tools that align statements like

    foo.bar    = "val1";
    foo.hijklm = "val2";
    foo.quxz   = "val3";

can be round-tripped.

Update the three acceptRun(" ") whitespace-skip points in lexValue to
also accept '\t', and update the lexStatement dispatch so a path
followed by a tab routes to lexValue (otherwise the lexer emits typError
on the tab before ever reaching lexValue). Update the grammar comment
to document that whitespace is "WS ::= ' ' | '\t'".

Add lex test cases for tab-only and mixed-whitespace inputs.