# Suffix Array

Given a string s, sort all cyclic shifts of s. Formally produce a table p such that p[j]=i if s[i:]+s[:i] has rank j among all cyclic shifts.

## Example

On input “bobocel”, here are all cyclic shifts, together with by how much they are shifted.

```
bobocel 0
obocelb 1
bocelbo 2
ocelbob 3
celbobo 4
elboboc 5
lboboce 6
```

And the same list, but lexicographically sorted. Together with the rank in the new order.

```
0 bobocel 0
1 bocelbo 2
2 celbobo 4
3 elboboc 5
4 lboboce 6
5 obocelb 1
6 ocelbob 3
```

Hence on this example we should output `p=[0, 2, 4, 5, 6, 1, 3]`

.

## Complexity

The naïve algorithm for this problem stores all cyclic shifts in a list and sorts it. This takes time $O(n^2 \log n)$, because the list has size $n$, and comparing two strings of length $n$ takes time $O(n)$.

The problem can be solved in time $O(n)$, under some conditions on the alphabet. But we present an $O(n \log^2 n)$ implementation, which is good enough for most programming contests.

## The key operation

The algorithm relies on a simple sorting function `sort_class`

, which not only sorts a given string or list `s`

, but also returns additional informations. It returns two tables `p`

and `c`

such that

- p[j]=i if s[i] has rank j in
`sorted(s)`

. - c[j]=i if s[i] has rank j in
`sorted(set(s))`

.

Note that the second table groups identical elements in s, and gives a rank only to the equivalence classes. For example

```
index = 0123456
input s = "bobocel"
```

after sorting s we have for example (because identical letters can be ordered arbitrarily within each other)

```
original index = 0245613
sorted s = "bbceloo"
```

If we rank all distinct letters of the input string we have

```
rank = 0 1 2 3 4
sorted(set(s)) = b c e l o
```

Hence our function returns

```
p = [0, 2, 4, 5, 6, 1, 3]
c = [0, 4, 0, 4, 1, 2, 3]
s = b o b o c e l # for comparison
```

## The cyclic version

We present here the cyclic version of the problem. If we want to sort the suffixes of a given string s, then we can just sort the cyclic shifts of s + special, where special is a dummy character, smaller than all characters in s.

## Iteration

The idea is that for K being every integer power of 2, we want to sort the K-lengths prefixes of all cyclic shifts. For example for K=2 and s=”bobocel” we want to sort the following strings.

```
bo
ob
bo
oc
ce
el
lb
```

But we already have the order and equivalence classes of all (K/2)-length prefixes of all cyclic shifts. Now every K-lengths prefix is the concatenation of two (K/2)-length prefixes, say `xy`

. Let i be the equivalence class of `x`

and j be the equivalence class of `y`

. Then the pair (i,j) represents the equivalence class of `xy`

. We can use `sort_class`

to translate the pairs into rank integers.

The strings of the example above correspond to the following pairs, where the table `c`

from the previous iteration makes the correspondance.

```
(0, 4)
(4, 0)
(0, 4)
(4, 1)
(1, e)
(e, 3)
(3, 0)
```

In total we have a logarithmic number of outer iterations, each costs `O(n log n)`

, leading to the claimed complexity.

## Implementation

## References

- an O(nlogn) implementation in CP-algorithms – describes applications
- Visualization in action