Multiplying polynomials

Oct 25, 2022 • arithmetics • Christoph Dürr
Related problems: [spoj:SWERC14C]

You are given two polynomials $P$ and $Q$ and want to compute their product. The polynomials are given in form of an array with their coefficients.

Application

This can be used to compute two big numbers, represented as arrays with their digits. Just observe that 1234 equals $1\cdot 10^3 + 2\cdot 10^2 + 3\cdot 10 + 4$. Hence we could represent the numbers by polynomials (evaluated at $x=10$) using exactly the same array of digits playing the role of coefficients. Once we computed the product of $P$ with $Q$, we need to normalize the resulting array, to make sure that every digit is between 0 and 9 (or whatever base you use.) This normalization can be done in linear time, by transporting a carry over to higher digits. Special care has to be taken to normalize negative digits.

The naive way

If we write $P(x) = \sum_{i=0}^n p_i x^i$ and $Q(x) = \sum_{i=0}^n q_i x^i$, then the product is

\[\sum_{i=0}^{2n} (\sum_{j=0}^i p_j q_{i-j}) x^i.\]

Implementing the above formula in a straightforward manner gives the time complexity $O(n^2)$.

The anecdote

In the 1960s, Andreï Kolmogorov wanted to show that this is best possible. He hold a seminar on this question, and had prepared some exercises that should help to show that claim. A 23 year old student Anatolii Karatsuba, told him a week after the start of the seminar, that he found a better way to compute the product. Kolmogorov presented the method during the next meeting and announced the end of the seminar. This generated a intensive research on this question. At the time of writing, the best known multiplication of two $n$ digit number runs in time $O(n \log n 4^{\log^* n})$, where $\log^ * $ is the iterated logarithm, and can be safely assumed to be a constant for reasonable values of $n$. It is due to Harvey and van der Hoeven in 2021. The multiplication of integers is conjectured to have complexity $\Omega(n\log n)$.

Divide and conquer

Assume that both $P$ and $Q$ have degree at least 2, otherwise the multiplication can be done in linear time. The trick is to divide each polynomial into two parts of roughly equal size. This could be done for example, by splitting the polynomials according to the parity of the coefficients. It might be easier to implement, if we split a polynomial into a part containing only powers of $x$ until $n/2$ and a part containing powers of $x$ strictly larger than $n/2$. Hence we set $k=\lceil n / 2\rceil$ and consider polynomials $a,b,c,d$ of degree at most $k$ such that

\[P = a + b x^k \\ Q = c + d x^k.\]

The trick is to reformulate the product of $P$ with $Q$ in such a way that we only have to make 3 recursive calls to the multiplication of polynomials.

\[P\cdot Q = ac + (ad+cb)x^k + bd x^{2k} \\ = ac + (ac+bd - (a-b)(c-d))x^k + bd x^{2k}.\]

With the first expression we would need to make 4 multplications, namely $ac,ad,cb,bd$, while the second one contains only the 3 multiplications $ac, bd, (a-b)(c-d)$. So clever! The complexity of this method is

\[T(n) = 3 T(\lceil n/2\rceil) + O(n) = O(n^{\log_2 3}) = O(n^{1.585}).\]

Implementation

Since we represent polynomials by lists of their coefficients, adding two polynomials is done by member-wise addition. Special care has to be done if the polynomials do not have exactly the same degree, or to be precise if the lists representing them do not have the same length. One can simply make a copy of the longer one, and accumulate the coefficients of the shorter into this copy.

For the subtraction we make use of the equation $P-Q = P + (-Q)$.

def add_poly(P, Q):
    if len(P) < len(Q):
        P, Q = Q, P     # add the shorter to the longer vector
    R = P[::]           # make a copy
    for i, qi in enumerate(Q):
        R[i] += qi      # accumulate Q into R
    return R 


def sub_poly(P, Q):
    return add_poly(P, [-qi for qi in Q])

If a polynomial $P$ is represented by a list $L$ of its coefficients, then $P\cdot x^k$ is represented by $L$ appended to a list consisting of $k$ zeros. We use this in the following implementation.

def mul_poly(P, Q):                   
    if not P or not Q:  # case one of P, Q is the constant zero
        return []
    if len(P) == 1:
        return [qi * P[0] for qi in Q]
    elif len(Q) == 1:
        return [pi * Q[0] for pi in P]
    k = max(len(P), len(Q)) // 2
    xk = [0] * k
    a = P[:k]           # split: P = a + b * x**k
    b = P[k:]
    c = Q[:k]           # split: Q = c + d * x**k
    d = Q[k:]
    a_b = sub_poly(a, b)
    c_d = sub_poly(c, d)
    ac = mul_poly(a, c)
    bd = mul_poly(b, d)
    abcd = mul_poly(a_b, c_d)
    ad_bc = sub_poly(add_poly(ac, bd), abcd)    # = ad - bc
    # result is ac + [ac + bd - (a - b)(c - d)]*x**k + bd*x**(2k)
    return add_poly(ac, xk + add_poly(ad_bc, xk + bd))

Fast Fourier Transformation

When you have a polynomial $P$ of degree $n-1$, there are essentially two ways to represent it in a computer.

The representation by coefficients, where you describe $P$ by a coefficients array $p$ such that $P[x] = p_0 + p_1 x + \ldots + p_{n-1} x^{n-1}.$
The representation by samples, where you describe $P$ by a sample value array $y$ and a sample location array $x$ (all different) such that $P$ is the unique polynomial of degree $n-1$ satisfying $P[x_0]=y_0, \ldots, P[x_{n-1}] = y_{n-1}.$

The first representation has the advantage that you can evaluate a polynomial in linear time, as well as add two polynomials. However multiplying is a bit more complicated in this representation. You could use Karatsuba’s algorithm for example, running in time $O(n^{1.585})$.

The second representation has the advantage that you can add two polynomials in linear time (add the sample values), and multiply two polynomials in linear time. First extend the samples in each polynomial, so to have $2n-1$ samples, and then multiply the sample values.

It would be nice to convert from one representation to the other one, so to benefit from the advantages of both representations. And this is exactly what the fast Fourier transformation (FFT for short) does.

First it assumes that $n$ is a power of two. This simplifies the recursion. And then it considers particular sample locations $(x_j)$, which are the complex numbers $x_j = exp(-2i \pi j / n).$ Here $i$ stands for the imaginary number satisfying $i^2=-1$.

We refer to the excellent lecture notes by Jeff Erickson, who explains the method in detail. It is a divide and conquer method, first dividing the given coefficient array $p$ into even and odd indexed parts. Recurse on each part and recombining the result.

def fft(x):
    """ Fast Fourier Transformation
    :input x: list of coefficients. Length n has to be a power of 2.
    :returns: list of sample values.
    :complexity: O(n log n).
    """
    n2 = len(x) // 2
    if n2 == 0:
        return x
    assert(2 * n2 == len(x))        # need to split evenly
    even = fft(x[0::2])
    odd  = fft(x[1::2])
    T = [cmath.exp(-1j * math.pi * k / n2) * odd[k] for k in range(n2)]
    return [even[k] + T[k] for k in range(n2)] + \
           [even[k] - T[k] for k in range(n2)]

This function can be inverted fairly easily. We need to run the original FFT on the conjugate input vector, and do some scaling at the end.

def inv_fft(y):
    n = len(y)
    p = fft([yi.conjugate() for yi in y])
    return [pi.conjugate() / n for pi in p]

There are more efficient implementations, which do the transformation in-place in the given array, and also which avoid recursive calls. But the above implementation has the advantage of being rather short and this can be an advantage during a competition.

So how can we use the FFT to multiply two polynomials $P,Q$ given by their coefficient arrays $p,q$? Well first we pad with zeros the arrays so they have the same length and a length which is a power of 2. Moreover the resulting length should be at least the total original length. This is needed, because the degree of $P\cdot Q$ is the sum of the degrees of $P$ and $Q$. Then we compute the FFT for each array, resulting in arrays $y$ and $z$. From this we compute a resulting array $r$ satisfying $r[i]=y[i] * z[i]$. To complete we apply the inverse FFT on $r$.

If the given polynomials have integer coefficients, it might be a good idea to round the result. The FFT manipulates complex numbers and will generate rounding errors.

def pad(x):
    """ pad array x with zeros to make its length a power of two
    """
    n = 1   
    while n < len(x):
        n <<= 1
    x += [0] * (n - len(x))

    
def mul_poly_fft(p, q):
    """Multiply two polynomials in integer coefficient representation
    :complexity: O(n log n)
    """
    n = (len(p) + len(q))   # make them of same and enough large size
    p += [0] * (n - len(p))
    q += [0] * (n - len(q))
    pad(p)
    y = fft(p)
    pad(q)
    z = fft(q)
    n = len(y)  # the padding might have increased the size n
    r = [y[i] * z[i] for i in range(n)]
    R = inv_fft(r)
    return [int(round(ri.real)) for ri in R]

Discussion on Golf Bot, a problem from ICPC/SWERC 2014

In this problem, essentially we receive two lists $K$ and $D$, and ask for how many $d\in D$ either $d\in K$ or there are $j,k\in K$ with $j+k=d$. We can assume that $0$ belongs to $K$, and then we only need to count the intersection of $D$ and $K+K$, which is $ \{ j+k:j,k\in K \} $.

Imagine a polynomial $P[x]=\sum_{j\in K}x^j$, and compute its product $Q=P\cdot P$. Now it is just a matter of counting the number of $d\in D$, such that $x^d$ has a positive coefficient in $Q$.

With the implementation of FFT presented above in Python, our solution is too slow for the judges. It takes 2 seconds on the largest test case on my machine at the time of writing, and the judges have a limit of one second.