/wakamoleguy

Advent of Code 2022 - Day 4 Bonus: Parsing

In Day 4, I had a strong hunch that the parseRange function could be improved. For me, parsing tends to be one of the slowest parts of the Advent of Code challenges, and so I'm always looking for ways to build a better intuition for parsing in Haskell.

Here's the original:

parseRangePair :: GenParser Char u (Range, Range)
parseRangePair = do
  r1 <- parseRange
  _ <- char ','
  r2 <- parseRange
  return (r1, r2)
  where
    parseRange = do
      a <- read <$> many1 digit
      _ <- char '-'
      b <- read <$> many1 digit
      return $ Range a b

Parsec defines parser combinators that can be used to compose powerful grammars. My original code does not use much composition, so maybe I'm doing something wrong!

Use Your Words, Human

One trick that helps in these situations is to try describing what we want in plain language. (This apparently works with computers, too.)

We want:

  • Two Ranges, separated by a comma
  • Each Range is two Ints, separated by a hyphen

Following this, we actually come across the sepBy and sepBy1 combinators:

parseRanges :: GenParser Char u [Range]
parseRanges = parseRange `sepBy1` (char ',')
  where parseRange = ...

While this looks promising, it doesn't compile. sepBy1 parses a list. Our Range constructor expects exactly 2 Ints, and our part1, part2 functions expect exactly 2 ranges. Let's go a different way, but I'll remember sepBy in later challenges.

Do or Do Not

Using do notation is resulting in verbose code. We have to bind the separator characters just to discard them, and we need a separate line just to return the final result.

Monads, Applicatives, and Functors are often described as being values wrapped in some abstract 'box'. In our last Day 3 bonus post, our box was functions; we wanted to compose values that still needed a String passed to them. Now we want to compose values that haven't yet been parsed. They are still in the Parser box, if you will.

We can "lift" our combining functions to work on boxed values. To discard the separator in between, we can use the operator (>>). This operator is like bind (>>=) except that it discards the first first result value.

Putting it all together, we have:

parseRangePair :: GenParser Char u (Range, Range)
parseRangePair =
  let parseInt = read <$> many1 digit
      parseRange = liftA2 Range parseInt (char '-' >> parseInt)
   in liftA2 (,) parseRange (char ',' >> parseRange)

Further refactoring?

Not right now, thank you. parseInt definitely belongs in my Input module, and the internal structure of parseRange and parseRangePair could also be useful to reuse. But that's a problem for a future day. Maybe even tomorrow!

Advent of Code 2022 Series

This post is part of a series describing my Haskell solutions to Advent of Code 2022.

Main Post: Day 4: Camp Cleanup

Cheers!