Zipping Trees, Part 2
haskell zipper
Lastmod: 2020-03-23

In part 1 of our tree zipping series, we identified three "problems" with our solution:

  1. Using unnamed tuples and product types with no record syntax can obscure the meaning of each field and requires some extra typing in each function for elements of the zipper that remain unchanged during a particular transformation.

  2. The Tree type from containers uses lists, which is useful for infinite trees. However, our trees are guaranteed to be finite. Moreover, we have to append to the end of a list or drop its last element when visiting or inserting nodes, which takes linear time in the size of the list. We are told that any given node will have at most 10 children, so this isn't a huge issue, but it would be a serious performance problem with a larger branching factor.

  3. All of the visit functions are partial and will crash if there is an invalid operation in the instruction list, such as visiting a child of a leaf node or the parent of the tree root. HackerRank guarantees that all operations will be valid, so crashing on what should be unreachable cases is reasonable. We could return Maybe Zipper from all of our functions, but that would be more cumbersome to deal with. Perhaps the best solution would be to call error explicitly with an informative message rather than getting a generic complaint about "irrefutable patterns."

Initially, I was going to address problem 2 using sequences from Data.Sequence, but as u/ryani pointed out on Reddit, we should be storing the left siblings reversed, since we always care most about the immediate left sibling, not the left-most one. This is also how zippers on lists work, so it is only natural to use this representation in a blog series about zippers. How does this change our navigation functions?
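
As a reminder, the rest of this post assumes roughly the following shapes for the types from part 1 (the original definitions, built on the Tree type from containers, live in that post):

import Data.Tree (Tree(..))

-- A crumb stores the parent's label plus the siblings to the left and
-- right of the focus; from now on the left siblings are kept reversed,
-- nearest sibling first.
data Crumb = Crumb Int [Tree Int] [Tree Int]

-- The zipper pairs the focused subtree with the trail of crumbs back to the root.
type Zipper = (Tree Int, [Crumb])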

In visitChild, we have to reverse the left portion of the current focus's children when packing them into a Crumb. We could call reverse on the first half returned by splitAt, but instead we write our own splitRev helper, which accumulates the left siblings in reversed order as it walks the list.

splitRev :: Int -> [a] -> ([a], a, [a])
splitRev n = loop n [] where
  loop 0 left (focus   : right) = (left, focus, right)
  loop m left (sibling : right) = loop (m - 1) (sibling : left) right
  loop _ _ [] = error "splitRev called with too large an index"
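
For example (note that the left part comes back reversed, nearest sibling first):

splitRev 2 [10, 11, 12, 13, 14]  ==  ([11, 10], 12, [13, 14])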

visitChild now becomes

visitChild :: Int -> Zipper -> Zipper
visitChild n (Node x children, crumbs) =
  let (left, focus, right) = splitRev n children
  in  (focus, Crumb x left right : crumbs)
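
To see the reversal in action, focus on child 2 of a four-child root (visitChild, as written, counts children from 0):

visitChild 2 (Node 0 [Node 1 [], Node 2 [], Node 3 [], Node 4 []], [])
  ==  (Node 3 [], [Crumb 0 [Node 2 [], Node 1 []] [Node 4 []]])

The immediate left sibling, Node 2 [], now sits at the head of the left list.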

When visiting the parent node, instead of simply concatenating the left and right siblings, we have to make sure to re-reverse the left ones. We can use the Report Prelude's foldl definition of reverse[1], but with the focus and the right siblings as the starting accumulator instead of the empty list.

visitParent :: Zipper -> Zipper
visitParent (focus, Crumb parent left right : cs) =
  (Node parent (foldl' (flip (:)) (focus:right) left), cs)
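
Here foldl' (flip (:)) acc xs conses the elements of xs onto acc one at a time, i.e. it reverses xs onto the front of acc:

foldl' (flip (:)) [0, 4, 5] [3, 2, 1]  ==  [1, 2, 3, 0, 4, 5]

In our case xs is the reversed left siblings and acc starts as the focus followed by the right siblings, so the parent's children come out in their original order.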

As before, deleting the current focus is almost identical to visiting the parent; we simply leave the focus out when rebuilding the parent's children.

delete :: Zipper -> Zipper
delete (_, Crumb parent left right : cs) =
  (Node parent (foldl' (flip (:)) right left), cs)

Visiting the left or right sibling is now an O(1) operation, as is inserting a new node at either position.

visitLeft :: Zipper -> Zipper
visitLeft (focus, Crumb parent (l : ls) right : cs) =
  (l, Crumb parent ls (focus : right) : cs)

visitRight :: Zipper -> Zipper
visitRight (focus, Crumb parent left (r : rs) : cs) =
  (r, Crumb parent (focus : left) rs : cs)

insertLeft :: Int -> Zipper -> Zipper
insertLeft x (focus, Crumb parent left right : cs) =
  (focus, Crumb parent (Node x [] : left) right : cs)

insertRight :: Int -> Zipper -> Zipper
insertRight x (focus, Crumb parent left right : cs) =
  (focus, Crumb parent left (Node x [] : right) : cs)
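
A quick sanity check on these four: inserting on the left and then visiting left should land on the freshly inserted node, with the old focus as its new immediate right sibling. Starting from a focus with one sibling on each side:

visitLeft (insertLeft 9 (Node 2 [], [Crumb 0 [Node 1 []] [Node 3 []]]))
  ==  (Node 9 [], [Crumb 0 [Node 1 []] [Node 2 [], Node 3 []]])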

change, insertChild, and main are all unchanged from part 1.
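
That's because none of them touch the sibling lists in the crumbs. As a reminder (this is just a sketch; the real definitions are in part 1), change only relabels the focus:

change :: Int -> Zipper -> Zipper
change x (Node _ children, cs) = (Node x children, cs)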

The only operation that doesn't run in constant time now is visitParent, which takes time linear in the number of left siblings. Since that number is never more than 9, it's still effectively a constant. Given the small branching factor, it would be interesting to see what impact using sequences or vectors in place of lists would have on performance: Data.Sequence guarantees O(log(min(|left|, |right|))) concatenation, which is slightly better than the O(|left|) we get with lists, while unboxed vectors have such good cache locality that the copying they would require could be negligible on modern CPUs. Stay tuned for future posts, where we'll actually benchmark these ideas and tackle problems 1 and 3 from part 1!
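
For a rough idea of what the Data.Sequence version might look like (hypothetical and unbenchmarked; the names below are made up for illustration), the crumb would keep the left siblings in their original order, since Seq is cheap to work with at both ends:

import Data.Sequence (Seq, (<|), (><))

-- Hypothetical Seq-based variants: labels stay Int, children become a Seq.
data TreeS  = NodeS Int (Seq TreeS)
data CrumbS = CrumbS Int (Seq TreeS) (Seq TreeS)  -- left siblings in order
type ZipperS = (TreeS, [CrumbS])

-- Rebuilding the parent costs O(log(min(|left|, |right|))) for the (><)
-- instead of the O(|left|) re-reversal we pay with lists.
visitParentS :: ZipperS -> ZipperS
visitParentS (focus, CrumbS parent left right : cs) =
  (NodeS parent (left >< (focus <| right)), cs)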


[1] Actually, we're going to use foldl' from Data.List, which is semantically the same but performs better because it is stricter.