11. Dealing with invalid input

So far we have focused on parsing valid user input. However, users of our parsers will make mistakes, and we should help them find the source of the problem without making the process too painful.

The major difficulty in error reporting is that we have no direct way of showing error messages to the user. The parsers are template metaprograms: when they detect that the input is invalid, they can make the compilation fail and have the compiler (which runs the metaprogram) display an error message. What we can do is keep those error messages short, make them contain all the information about the parsing error, and make that information easy to find in whatever the compiler displays.

So let's try to parse an invalid expression and see what happens:

> exp_parser19::apply<BOOST_METAPARSE_STRING("hello")>::type
<< compilation error >>

You will get a lot of error messages (although if you have seen error messages coming from template metaprograms before, you know that this is not a lot). Take a closer look. The output contains this:

x__________________PARSING_FAILED__________________x<
  1, 1,
  boost::metaparse::v1::error::literal_expected<'('>
>

The version above is formatted for readability; the real output contains no line breaks. It is relatively easy to spot (thanks to the ____________ part) and answers the main questions one has when parsing fails:

where is the error? It is column 1 in line 1 (inside BOOST_METAPARSE_STRING). This is the 1, 1 part.
what is the problem? literal_expected<'('>. This is a bit misleading, as it contains only a part of the problem. An open paren is not the only acceptable token here, a number would also be fine. This misleading error message is our fault: we (the parser authors) need to make the parsing errors more descriptive.

11.1. Improving the error messages

So how can we improve the error messages? Let's look at what went wrong in the previous case:

The input was hello.
plus_exp2 tried to parse it.
plus_exp2 tried to parse it using mult_exp5 (assuming that this is the initial mult_exp in the list of + / - separated mult_exps).
so mult_exp5 tried to parse it.
mult_exp5 tried to parse it using unary_exp2 (assuming that this is the initial unary_exp in the list of * / / separated unary_exps).
so unary_exp2 tried to parse it.
unary_exp2 parsed all of the - symbols using minus_token. There were none of them (the input started with an h character).
so unary_exp2 tried to parse it using primary_exp2.
primary_exp2 is: one_of<int_token, paren_exp2>. It tried parsing the input with int_token (which failed) and then with paren_exp2 (which failed as well). Since one_of could not parse the input with any of the choices, it failed too. In such situations one_of checks which parser made the most progress (consumed the most characters of the input) before failing, assumes that this is the parser the user intended to use, and returns the error message coming from that parser. In this example none of the parsers could make any progress, in which case one_of returns the error coming from the last parser in the list. This was paren_exp2, which expects the expression to start with an open paren. This is where the error message came from. The rest of the layers did not change or improve this error message, so this was the error message displayed to the user.
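
The error-selection rule described in the last step can be modelled in a few lines of ordinary run-time C++. This is only a sketch of the rule as stated above, not how Metaparse implements one_of (which works at compile time on types). The names failure, error_kind and pick_error are hypothetical and exist only for this illustration, as do the two error kinds, which stand in for whatever int_token and paren_exp2 actually report.

```cpp
#include <cstddef>
#include <initializer_list>

// Hypothetical error kinds standing in for the errors of the two alternatives.
enum class error_kind { digit_expected, open_paren_expected };

// Hypothetical model of one failed alternative: how many characters it
// consumed before failing, and which error it reported.
struct failure {
    std::size_t consumed;
    error_kind kind;
};

// Chooses the error to report when every alternative has failed:
// the alternative that made the most progress wins, and on a tie the
// later alternative in the list wins (hence >= rather than >).
// Precondition: the list is not empty.
inline failure pick_error(std::initializer_list<failure> failures) {
    failure best = *failures.begin();
    for (const failure& f : failures) {
        if (f.consumed >= best.consumed) {
            best = f;
        }
    }
    return best;
}
```

For the input hello both alternatives consume nothing, so pick_error({{0, error_kind::digit_expected}, {0, error_kind::open_paren_expected}}) selects the second entry, which mirrors why the user saw the open paren error rather than anything about numbers.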

We, the parser authors, know that we expect a primary expression there. When one_of fails, it means that none was found.