# Cover trees

Cover a tree with paths or caterpillars.

## Cover with descending paths

Fix an arbitrary root in the given tree. Now for every vertex $v$ we have the notion of its *subtree* $T_v$ rooted at $v$.

We say that a path is descending, if for one of its extreme points $v$, the path is contained in $T_v$.

If we want to cover a tree with descending paths, then this is quite easy. For example there is the trivial solution consisting for every vertex $v$ of the singleton vertex path ${v}$. Alternatively we could choose for every inner vertex $v$, an arbitrary descendent $u$. The set of selected edges $(u,v)$ would form again a solution.

So usually we impose some additional conditions.

## Heavy-light decomposition

The original definition is slightly different, but this one is easier to implement, and has the same desired property.

Here we want to choose for every inner vertex $v$, a descendent $u$ which maximizes $|T_u|$. Ties can be broken arbitrarily. These edges are called *heavy*. All other edges are called *light*. Now we have the property, that for every pair of vertices $u,v$ with lowest common ancestor $a$, both paths $u-a$ and $a-v$ traverse at most $\log n$ heavy paths. This is because every light edge $(u,v)$ — with $u$ descendant of $v$ — has the property that $T_u$ is at most half as big as $T_v$. Hence each of the two paths can contain at most a logarithmic number of light edges.

This is interesting when we have to maintain a datastructure on the tree, such vertices are labeled with numbers, and we want to add a value $k$ to every vertex along the path between two given vertices $u,v$, or return the sum of these labels along the path. If the tree were a line graph, we could use a segment tree for this purpose. Here we use a segment tree for each heavy path. In fact we can use one big segment tree, with portions of it corresponding to heavy paths.

Here is an excellent detailed explanation.

## Maximizing total path length

Here we want to remove a minimum number of edges in the tree, such that the result consists of a collection of paths. In other words, for every vertex $v$ of degree $d$ at least $3$, we have to remove $d-2$ adjacent edges. Call this number the *deletion number* of vertex $v$.

When removing an edge, it could be between to vertex of positive deletion number, or adjacent only to one. Ideally we prefer the first type of edges. But we cannot remove them greedily, see the example below.

```
(a) (b) (c)
*---1---1---1---1---*
| | | |
* * * *
```

If we remove the middle edge (b), then we need to remove 2 addition edges. However the optimum here is to remove edges (a) and (c).

We can solve the problem greedily using a different approach. For every vertex $v$, let $A_v$ be the maximum number of edges in a covering of $T_v$ with paths. Let $B_v$ be the same number but with the restriction, that $v$ is the extreme point of a path. This includes the case when $v$ is covered by a singleton vertex path.

Now we say that vertex $u$ is *interesting* if $A_u=B_u$.

For a leaf $v$ we have $A_v=B_v=0$, and $v$ is interesting.

Consider a vertex $v$ and all its descendants $u$.

- If none of the descendants is interesting, then $A_v=B_v=\sum A_u$, and $v$ is interesting.
- If a single descendant $u_0$ is interesting, then $A_v=B_v=1+\sum A_u$, and again $v$ is interesting.
- If there are at least two interesting descendants, then $A_v=2+\sum A_u$, $B_v=1+\sum B_u$, and $v$ is not interesting.

We observe that $B_v$ is either $A_v$ or $A_v-1$.

## Cover a tree with caterpillars

A caterpillar is a tree, which consists of a single path, with leafs attached to it. In other words, every vertex of degree at least $3$, can have at most 2 non-leaf neighbors. The goal is to cover the tree with caterpillars, maximizing the total number of edges in these caterpillars.

The optimal covering with caterpillars can be computed with dynamic programming. But we could only find a tedious solution. Which we sketch here.