Advent of Code 2022 - Day 4 Bonus: Parsing
In Day 4, I had a strong hunch that the parseRange
function could be improved. For me, parsing tends to be one of the slowest parts of the Advent of Code challenges, and so I'm always looking for ways to build a better intuition for parsing in Haskell.
Here's the original:
parseRangePair :: GenParser Char u (Range, Range)
parseRangePair = do
r1 <- parseRange
_ <- char ','
r2 <- parseRange
return (r1, r2)
where
parseRange = do
a <- read <$> many1 digit
_ <- char '-'
b <- read <$> many1 digit
return $ Range a b
Parsec defines parser combinators that can be used to compose powerful grammars. My original code does not use much composition, so maybe I'm doing something wrong!
Use Your Words, Human
One trick that helps in these situations is to try describing what we want in plain language. (This apparently works with computers, too.)
We want:
- Two Ranges, separated by a comma
- Each Range is two Ints, separated by a hyphen
Following this, we actually come across the sepBy
and sepBy1
combinators:
parseRanges :: GenParser Char u [Range]
parseRanges = parseRange `sepBy1` (char ',')
where parseRange = ...
While this looks promising, it doesn't compile. sepBy1
parses a list. Our Range
constructor expects exactly 2 Ints, and our part1, part2
functions expect exactly 2 ranges. Let's go a different way, but I'll remember sepBy
in later challenges.
Do or Do Not
Using do
notation is resulting in verbose code. We have to bind the separator characters just to discard them, and we need a separate line just to return the final result.
Monads, Applicatives, and Functors are often described as being values wrapped in some abstract 'box'. In our last Day 3 bonus post, our box was functions; we wanted to compose values that still needed a String passed to them. Now we want to compose values that haven't yet been parsed. They are still in the Parser box, if you will.
We can "lift" our combining functions to work on boxed values. To discard the separator in between, we can use the operator (>>). This operator is like bind (>>=) except that it discards the first first result value.
Putting it all together, we have:
parseRangePair :: GenParser Char u (Range, Range)
parseRangePair =
let parseInt = read <$> many1 digit
parseRange = liftA2 Range parseInt (char '-' >> parseInt)
in liftA2 (,) parseRange (char ',' >> parseRange)
Further refactoring?
Not right now, thank you. parseInt
definitely belongs in my Input module, and the internal structure of parseRange
and parseRangePair
could also be useful to reuse. But that's a problem for a future day. Maybe even tomorrow!
Advent of Code 2022 Series
This post is part of a series describing my Haskell solutions to Advent of Code 2022.
Main Post: Day 4: Camp Cleanup
Cheers!