# Searching a substring

## Statement

A basic problem in string algorithms is to locate a given substring `needle`

in a given string `haystack`

. Formally the goal is to find an index i such that the j-th character in `needle`

is equivalent to the (i+j)-th character in `haystack`

.

A naïve implementation would have worst case time complexity $\Theta(n m)$ where $n,m$ are the respective sizes of the given strings. Here is a tight example.

Given positive integer n, locate the string $a^nb$ in the string $a^{2n}$.

The naïve algorithm would try $\Theta(n)$ locations (candidates for the index $i$), compare the characters one by one and only at the end observe the inequality $a \neq b$ and realize a fail. Hence the naïve algorithm has running time $\Theta(n^2)$ on this example.

However there are algorthms of complexity $O(n + m)$, the most famous being Knuth-Morris-Pratt, see Rechercher des mots dans un texte par l’algorithme de Knuth-Morris-Pratt or tryalgo.knuth_morris_pratt.

## Experiments

We did experiments in Java, Python and C++. The latter languages implement the linear time search algorithm, but Java implements the naïve quadratic time algorithm. This has been confirmed by looking at the code of the `String.indexOf(String)`

method. Experiments have been done with various versions of Java, up to version 10. Here is the code we used.

```
class Test_str_find {
public static void main(String[] args) {
int n = Integer.parseInt(args[0]);
int n2 = 2*n;
StringBuilder needle = new StringBuilder();
StringBuilder haystack = new StringBuilder();
for (int i=0; i<n; i++)
needle.append('a');
needle.append('b');
for (int i=0; i<n2; i++) {
haystack.append('a');
}
String s_needle = needle.toString();
String s_haystack = haystack.toString();
s_haystack.indexOf(s_needle);
}
}
```

## Workaround

In Java however there is the possibility to search a regular expression in a given string, and this implementation is efficient. Hence you can use it to search for a substring. The following code has linear time complexity.

```
public int my_indexOf(String needle, String haystack) {
Pattern p = Pattern.compile(String.quote(needle));
Matcher m = p.matcher(haystack);
if (m.find())
return m.start();
else
return -1;
}
```