/wakamoleguy

Advent of Code 2022 - Day 7: No Space Left On Device

I took a break for a few days before tackling Day 7: No Space Left On Device. In this challenge, we are tasked with reconstructing a file system structure from a list of terminal output. I decided I wanted to solve this with two rounds of parsing:

  1. Parse the raw lines into structured TerminalOutput commands.
  2. Parse the list of TerminalOutput commands into a FileSystem tree.

First Parse - String to Terminal Output

First things first, we needed to convert the strings into something more useful:

data TerminalOutput
  = LS
  | Cd String
  | CdUp
  | LSDir String
  | LSFile Int String
  deriving (Eq, Show)

parseLine :: ParsecT [Char] u Identity TerminalOutput
parseLine = parseCdUp <|> parseCd <|> parseDir <|> parseFile <|> parseLS
  where
    parseCdUp = try $ string "$ cd .." $> CdUp
    parseCd = try $ fmap Cd (string "$ cd " >> many1 (noneOf "\n"))
    parseDir = try $ fmap LSDir (string "dir " >> many1 (noneOf "\n"))
    parseFile = try $ liftA2 LSFile (parseInt <* string " ") (many1 (noneOf "\n"))
    parseLS = try $ string "$ ls" $> LS

The mishmash of punctuation and parentheses isn't the easiest to read, but it generally parses the line left-to-right, discarding any cruft.

Parsing the Terminal Output Again?

With a list of TerminalOutput commands, I now wanted to parse a FileSystem tree. Sure, I could have just used recursion, but I didn't want to pass around the state of what directory I was in. Plus, the point of this is to learn, right?

To write a custom Parsec parser, I looked to the Text.Parsec.Prim module, specifically tokenPrim. We can define a combinator for each of our tokens like so:

ls :: ParsecT [TerminalOutput] u Identity ()
ls = tokenPrim show nextPos test
  where
    test LS = Just ()
    test _ = Nothing

cdup :: ParsecT [TerminalOutput] u Identity ()
cdup = tokenPrim show nextPos test
  where
    test CdUp = Just ()
    test _ = Nothing

cd :: ParsecT [TerminalOutput] u Identity TerminalOutput
cd = tokenPrim show nextPos test
  where
    test (Cd s) = Just (Cd s)
    test _ = Nothing

lsdir :: ParsecT [TerminalOutput] u Identity TerminalOutput
lsdir = tokenPrim show nextPos test
  where
    test (LSDir s) = Just (LSDir s)
    test _ = Nothing

lsfile :: ParsecT [TerminalOutput] u Identity TerminalOutput
lsfile = tokenPrim show nextPos test
  where
    test (LSFile i s) = Just (LSFile i s)
    test _ = Nothing

Note that ls, cdup parse a unit () type. They could return the TerminalOutput, but we want to ignore them in our final FileSystem anyways.

nextPos is a function that updates our "source position", stepping through our list of TerminalOutput commands as we parse. Since all of our tokens are a single command, they can share the same implementation:
nextPos :: SourcePos -> b1 -> b2 -> SourcePos
nextPos p = const $ const $ newPos (sourceName p) (sourceLine p) (sourceColumn p + 1)

Using Our Parsers to Make a FileSystem

Now that we have our primitives, we can define a parser for a FileSystem. A FileSystem is either a single File (with a name and size) or a Directory (with a name and a list of files or sub-directories).

data FileSystem = Directory String [FileSystem] | File String Int deriving (Eq, Show)

We can parse it like so:

parseFiles :: ParsecT [TerminalOutput] u Identity [FileSystem]
parseFiles = mapMaybe parseListing <$> many (try lsdir <|> try lsfile)
  where
    parseListing (LSFile i s) = Just $ File s i
    parseListing _ = Nothing

parseFileSystem :: ParsecT [TerminalOutput] u Identity FileSystem
parseFileSystem = do
  Cd name <- cd
  ls
  files <- parseFiles
  subdirs <- many parseFileSystem
  cdup <|> eof
  return $ Directory name (files ++ subdirs)

Note that we terminate our parse with an option of cdup <|> eof. This sidesteps the issue where you may not return to the root directory at the end of the input.

Wait, What Was The Question?

After this really-way-too-involved setup, we can now go about solving Part 1 and Part 2. These are both pretty simple functions now that we have our FileSystem object.

size :: FileSystem -> Int
size (File _ i) = i
size (Directory _ fs) = sum $ map size fs

listAll :: FileSystem -> [FileSystem]
listAll f@(File _ _) = [f]
listAll d@(Directory _ fs) = d : concatMap listAll fs

isDir :: FileSystem -> Bool
isDir (Directory _ _) = True
isDir _ = False

part1 :: FileSystem -> Int
part1 = sum . filter (<= 100000) . fmap size . filter isDir . listAll

diskSize, spaceNeeded :: Int
diskSize = 70000000
spaceNeeded = 30000000

part2 :: FileSystem -> Int
part2 root = minimum $ filter (>= spaceToFree) $ fmap size $ filter isDir $ listAll root
  where
    spaceToFree = spaceNeeded + size root - diskSize

This is doing a lot of extra work in calculating the size for each of the directories and sub-directories. A more efficient approach would precalculate the size of each directory when parsing it, so that it can be retrieved in constant time and avoid recomputing the size of subtrees.

Full Code

Here's the full code! The imports are more verbose than usual, to disambiguate which module the Parsec and Applicative functions came from.

module AOC2022.Day07 (spec) where

import Control.Applicative (liftA2)
import Data.Functor (($>))
import Data.Functor.Identity (Identity)
import Data.Maybe (mapMaybe)
import Input (parseInt, readDay)
import Test.Hspec (describe, hspec, it, shouldBe)
import Text.Parsec.Pos
  ( SourcePos,
    newPos,
    sourceColumn,
    sourceLine,
    sourceName,
  )
import Text.Parsec.Prim
  ( ParsecT,
    many,
    parse,
    tokenPrim,
    try,
    (<|>),
  )
import Text.ParserCombinators.Parsec
  ( ParseError,
    eof,
    many1,
    noneOf,
    string,
  )

input :: IO (Either ParseError FileSystem)
input = do
  raw <- lines <$> readDay 2022 7
  return $ do
    output <- traverse (parse parseLine "") raw
    parse parseFileSystem [] output

spec :: IO ()
spec = hspec $ do
  describe "Part 1" $ do
    it "runs a test" $ do
      myInput <- input
      fmap part1 myInput `shouldBe` Right 0 -- redacted
  describe "Part 2" $ do
    it "runs a test" $ do
      myInput <- input
      fmap part2 myInput `shouldBe` Right 0 -- redacted

data TerminalOutput
  = LS
  | Cd String
  | CdUp
  | LSDir String
  | LSFile Int String
  deriving (Eq, Show)

parseLine :: ParsecT [Char] u Identity TerminalOutput
parseLine = parseCdUp <|> parseCd <|> parseDir <|> parseFile <|> parseLS
  where
    parseCdUp = try $ string "$ cd .." $> CdUp
    parseCd = try $ fmap Cd (string "$ cd " >> many1 (noneOf "\n"))
    parseDir = try $ fmap LSDir (string "dir " >> many1 (noneOf "\n"))
    parseFile = try $ liftA2 LSFile (parseInt <* string " ") (many1 (noneOf "\n"))
    parseLS = try $ string "$ ls" $> LS

data FileSystem = Directory String [FileSystem] | File String Int deriving (Eq, Show)

nextPos :: SourcePos -> b1 -> b2 -> SourcePos
nextPos p = const $ const $ newPos (sourceName p) (sourceLine p) (sourceColumn p + 1)

ls :: ParsecT [TerminalOutput] u Identity ()
ls = tokenPrim show nextPos test
  where
    test LS = Just ()
    test _ = Nothing

cdup :: ParsecT [TerminalOutput] u Identity ()
cdup = tokenPrim show nextPos test
  where
    test CdUp = Just ()
    test _ = Nothing

cd :: ParsecT [TerminalOutput] u Identity TerminalOutput
cd = tokenPrim show nextPos test
  where
    test (Cd s) = Just (Cd s)
    test _ = Nothing

lsdir :: ParsecT [TerminalOutput] u Identity TerminalOutput
lsdir = tokenPrim show nextPos test
  where
    test (LSDir s) = Just (LSDir s)
    test _ = Nothing

lsfile :: ParsecT [TerminalOutput] u Identity TerminalOutput
lsfile = tokenPrim show nextPos test
  where
    test (LSFile i s) = Just (LSFile i s)
    test _ = Nothing

parseFiles :: ParsecT [TerminalOutput] u Identity [FileSystem]
parseFiles = mapMaybe parseListing <$> many (try lsdir <|> try lsfile)
  where
    parseListing (LSFile i s) = Just $ File s i
    parseListing _ = Nothing

parseFileSystem :: ParsecT [TerminalOutput] u Identity FileSystem
parseFileSystem = do
  Cd name <- cd
  ls
  files <- parseFiles
  subdirs <- many parseFileSystem
  cdup <|> eof
  return $ Directory name (files ++ subdirs)

size :: FileSystem -> Int
size (File _ i) = i
size (Directory _ fs) = sum $ map size fs

listAll :: FileSystem -> [FileSystem]
listAll f@(File _ _) = [f]
listAll d@(Directory _ fs) = d : concatMap listAll fs

isDir :: FileSystem -> Bool
isDir (Directory _ _) = True
isDir _ = False

part1 :: FileSystem -> Int
part1 = sum . filter (<= 100000) . fmap size . filter isDir . listAll

diskSize, spaceNeeded :: Int
diskSize = 70000000
spaceNeeded = 30000000

part2 :: FileSystem -> Int
part2 root = minimum $ filter (>= spaceToFree) $ fmap size $ filter isDir $ listAll root
  where
    spaceToFree = spaceNeeded + size root - diskSize

Advent of Code 2022 Series

This post is part of a series describing my Haskell solutions to Advent of Code 2022.

Next: Day 8: Treetop Tree House Previous: Day 6: Tuning Trouble

Cheers!