Advent of Code 2022 - Day 7: No Space Left On Device
I took a break for a few days before tackling Day 7: No Space Left On Device. In this challenge, we are tasked with reconstructing a file system structure from a list of terminal output. I decided I wanted to solve this with two rounds of parsing:
- Parse the raw lines into structured TerminalOutput commands.
- Parse the list of TerminalOutput commands into a FileSystem tree.
First Parse - String to Terminal Output
First things first, we needed to convert the strings into something more useful:
data TerminalOutput
= LS
| Cd String
| CdUp
| LSDir String
| LSFile Int String
deriving (Eq, Show)
parseLine :: ParsecT [Char] u Identity TerminalOutput
parseLine = parseCdUp <|> parseCd <|> parseDir <|> parseFile <|> parseLS
where
parseCdUp = try $ string "$ cd .." $> CdUp
parseCd = try $ fmap Cd (string "$ cd " >> many1 (noneOf "\n"))
parseDir = try $ fmap LSDir (string "dir " >> many1 (noneOf "\n"))
parseFile = try $ liftA2 LSFile (parseInt <* string " ") (many1 (noneOf "\n"))
parseLS = try $ string "$ ls" $> LS
The mishmash of punctuation and parentheses isn't the easiest to read, but it generally parses the line left-to-right, discarding any cruft.
Parsing the Terminal Output Again?
With a list of TerminalOutput commands, I now wanted to parse a FileSystem tree. Sure, I could have just used recursion, but I didn't want to pass around the state of what directory I was in. Plus, the point of this is to learn, right?
To write a custom Parsec parser, I looked to the Text.Parsec.Prim
module, specifically tokenPrim
. We can define a combinator for each of our tokens like so:
ls :: ParsecT [TerminalOutput] u Identity ()
ls = tokenPrim show nextPos test
where
test LS = Just ()
test _ = Nothing
cdup :: ParsecT [TerminalOutput] u Identity ()
cdup = tokenPrim show nextPos test
where
test CdUp = Just ()
test _ = Nothing
cd :: ParsecT [TerminalOutput] u Identity TerminalOutput
cd = tokenPrim show nextPos test
where
test (Cd s) = Just (Cd s)
test _ = Nothing
lsdir :: ParsecT [TerminalOutput] u Identity TerminalOutput
lsdir = tokenPrim show nextPos test
where
test (LSDir s) = Just (LSDir s)
test _ = Nothing
lsfile :: ParsecT [TerminalOutput] u Identity TerminalOutput
lsfile = tokenPrim show nextPos test
where
test (LSFile i s) = Just (LSFile i s)
test _ = Nothing
Note that ls, cdup
parse a unit ()
type. They could return the TerminalOutput, but we want to ignore them in our final FileSystem anyways.
nextPos
is a function that updates our "source position", stepping through our list of TerminalOutput commands as we parse. Since all of our tokens are a single command, they can share the same implementation:nextPos :: SourcePos -> b1 -> b2 -> SourcePos
nextPos p = const $ const $ newPos (sourceName p) (sourceLine p) (sourceColumn p + 1)
Using Our Parsers to Make a FileSystem
Now that we have our primitives, we can define a parser for a FileSystem. A FileSystem is either a single File (with a name and size) or a Directory (with a name and a list of files or sub-directories).
data FileSystem = Directory String [FileSystem] | File String Int deriving (Eq, Show)
We can parse it like so:
parseFiles :: ParsecT [TerminalOutput] u Identity [FileSystem]
parseFiles = mapMaybe parseListing <$> many (try lsdir <|> try lsfile)
where
parseListing (LSFile i s) = Just $ File s i
parseListing _ = Nothing
parseFileSystem :: ParsecT [TerminalOutput] u Identity FileSystem
parseFileSystem = do
Cd name <- cd
ls
files <- parseFiles
subdirs <- many parseFileSystem
cdup <|> eof
return $ Directory name (files ++ subdirs)
Note that we terminate our parse with an option of cdup <|> eof
. This sidesteps the issue where you may not return to the root directory at the end of the input.
Wait, What Was The Question?
After this really-way-too-involved setup, we can now go about solving Part 1 and Part 2. These are both pretty simple functions now that we have our FileSystem object.
size :: FileSystem -> Int
size (File _ i) = i
size (Directory _ fs) = sum $ map size fs
listAll :: FileSystem -> [FileSystem]
listAll f@(File _ _) = [f]
listAll d@(Directory _ fs) = d : concatMap listAll fs
isDir :: FileSystem -> Bool
isDir (Directory _ _) = True
isDir _ = False
part1 :: FileSystem -> Int
part1 = sum . filter (<= 100000) . fmap size . filter isDir . listAll
diskSize, spaceNeeded :: Int
diskSize = 70000000
spaceNeeded = 30000000
part2 :: FileSystem -> Int
part2 root = minimum $ filter (>= spaceToFree) $ fmap size $ filter isDir $ listAll root
where
spaceToFree = spaceNeeded + size root - diskSize
This is doing a lot of extra work in calculating the size for each of the directories and sub-directories. A more efficient approach would precalculate the size of each directory when parsing it, so that it can be retrieved in constant time and avoid recomputing the size of subtrees.
Full Code
Here's the full code! The imports are more verbose than usual, to disambiguate which module the Parsec and Applicative functions came from.
module AOC2022.Day07 (spec) where
import Control.Applicative (liftA2)
import Data.Functor (($>))
import Data.Functor.Identity (Identity)
import Data.Maybe (mapMaybe)
import Input (parseInt, readDay)
import Test.Hspec (describe, hspec, it, shouldBe)
import Text.Parsec.Pos
( SourcePos,
newPos,
sourceColumn,
sourceLine,
sourceName,
)
import Text.Parsec.Prim
( ParsecT,
many,
parse,
tokenPrim,
try,
(<|>),
)
import Text.ParserCombinators.Parsec
( ParseError,
eof,
many1,
noneOf,
string,
)
input :: IO (Either ParseError FileSystem)
input = do
raw <- lines <$> readDay 2022 7
return $ do
output <- traverse (parse parseLine "") raw
parse parseFileSystem [] output
spec :: IO ()
spec = hspec $ do
describe "Part 1" $ do
it "runs a test" $ do
myInput <- input
fmap part1 myInput `shouldBe` Right 0 -- redacted
describe "Part 2" $ do
it "runs a test" $ do
myInput <- input
fmap part2 myInput `shouldBe` Right 0 -- redacted
data TerminalOutput
= LS
| Cd String
| CdUp
| LSDir String
| LSFile Int String
deriving (Eq, Show)
parseLine :: ParsecT [Char] u Identity TerminalOutput
parseLine = parseCdUp <|> parseCd <|> parseDir <|> parseFile <|> parseLS
where
parseCdUp = try $ string "$ cd .." $> CdUp
parseCd = try $ fmap Cd (string "$ cd " >> many1 (noneOf "\n"))
parseDir = try $ fmap LSDir (string "dir " >> many1 (noneOf "\n"))
parseFile = try $ liftA2 LSFile (parseInt <* string " ") (many1 (noneOf "\n"))
parseLS = try $ string "$ ls" $> LS
data FileSystem = Directory String [FileSystem] | File String Int deriving (Eq, Show)
nextPos :: SourcePos -> b1 -> b2 -> SourcePos
nextPos p = const $ const $ newPos (sourceName p) (sourceLine p) (sourceColumn p + 1)
ls :: ParsecT [TerminalOutput] u Identity ()
ls = tokenPrim show nextPos test
where
test LS = Just ()
test _ = Nothing
cdup :: ParsecT [TerminalOutput] u Identity ()
cdup = tokenPrim show nextPos test
where
test CdUp = Just ()
test _ = Nothing
cd :: ParsecT [TerminalOutput] u Identity TerminalOutput
cd = tokenPrim show nextPos test
where
test (Cd s) = Just (Cd s)
test _ = Nothing
lsdir :: ParsecT [TerminalOutput] u Identity TerminalOutput
lsdir = tokenPrim show nextPos test
where
test (LSDir s) = Just (LSDir s)
test _ = Nothing
lsfile :: ParsecT [TerminalOutput] u Identity TerminalOutput
lsfile = tokenPrim show nextPos test
where
test (LSFile i s) = Just (LSFile i s)
test _ = Nothing
parseFiles :: ParsecT [TerminalOutput] u Identity [FileSystem]
parseFiles = mapMaybe parseListing <$> many (try lsdir <|> try lsfile)
where
parseListing (LSFile i s) = Just $ File s i
parseListing _ = Nothing
parseFileSystem :: ParsecT [TerminalOutput] u Identity FileSystem
parseFileSystem = do
Cd name <- cd
ls
files <- parseFiles
subdirs <- many parseFileSystem
cdup <|> eof
return $ Directory name (files ++ subdirs)
size :: FileSystem -> Int
size (File _ i) = i
size (Directory _ fs) = sum $ map size fs
listAll :: FileSystem -> [FileSystem]
listAll f@(File _ _) = [f]
listAll d@(Directory _ fs) = d : concatMap listAll fs
isDir :: FileSystem -> Bool
isDir (Directory _ _) = True
isDir _ = False
part1 :: FileSystem -> Int
part1 = sum . filter (<= 100000) . fmap size . filter isDir . listAll
diskSize, spaceNeeded :: Int
diskSize = 70000000
spaceNeeded = 30000000
part2 :: FileSystem -> Int
part2 root = minimum $ filter (>= spaceToFree) $ fmap size $ filter isDir $ listAll root
where
spaceToFree = spaceNeeded + size root - diskSize
Advent of Code 2022 Series
This post is part of a series describing my Haskell solutions to Advent of Code 2022.
Next: Day 8: Treetop Tree House Previous: Day 6: Tuning Trouble
Cheers!