We defined the FFT as:
If N is even, the above sum can be split into 'top' (n=0..N/2-1) and 'bottom' (n=N/2..N-1) halves and re-arranged as follows:
If we now consider the form of this result for even and odd valued k separately, we can see how this result enables us to express an N point FFT in terms of 2 (N/2) point FFT's.
Even k, k=2k' (k'=0..N/2-1):
or equivalently..
Odd k, k=2k'+1 (k'=0..N/2-1):
or equivalently..
The process of dividing the frequency components into even and odd parts is what gives this algorithm its name 'Decimation In Frequency'. If N is a regular power of 2, we can apply this method recursively until we get to the trivial 1 point transform. The factors TN are conventionally referred to as 'twiddle factors'.
Given the above results, we can now have a 'first stab' at a recursive routine to implement this algorithm (in a hypothetical Pascal like programming language which supports complex numbers and allows arrays as function arguments and results):
FUNCTION DIF(N,f);
LOCAL N',n,fe,fo,Fe,Fo,k',F;
IF N==1
THEN RETURN(f); {trivial if N==1}
ELSE BEGIN {perform 2 sub-transforms}
N':=N/2; {size of sub-transforms}
FOR n:=0 TO (N'-1) DO {perform N' DIF 'butterflies'}
BEGIN
fe[n]:= f[n]+f[n+N']; {even subset}
fo[n]:=(f[n]-f[n+N'])*T(N,n); {odd subset}
END;
Fe:=DIF(N',fe); {even k}
Fo:=DIF(N',fo); {odd k}
FOR k':=0 TO (N'-1) DO
BEGIN
F[2*k' ]:= Fe[k']; {even k}
F[2*k'+1]:= Fo[k']; {odd k}
END;
RETURN(F);
END;
This is simplest form of DIF implementation and directly reflects the mathematical derivation of the algorithm. The process of calculating the fe and fo components is conventionally referred to as a (DIF) 'butterfly', and is the basic primitive operation of the FFT. (Apparently, this name was aquired from certain diagramatic representations of this operation which resemble butterflies.)
The algorithmic complexity of an FFT is usually quantified in terms of the total number of butterfly operations performed. Let this number be C(p) for a 2p point transform. Looking at the DIF routine above, it is easy to see that C(p) must satisfy the following recurrence relation:
This has solution:
or, in terms of N (=2p):
Dropping the constant scaling factors (including the log base) we get an algorithmic complexity of O(N.logN)
If you coded something like the above routine in a real programming language it would work fine. However, from an efficiency point of view it is somewhat less than ideal. Allocating local arrays on the stack will make this implementation fairly memory hungry, and the last loop in the above routine performs no useful numerical computation, it simply re-arranges values previously calculated.
What if the last loop were somehow replaced by a simple catenation the results of the two sub-transforms? Say we put all the even k samples in the top half of the result and all the odd samples in the bottom half of the result. Treating k as a binary number, the least significant bit of k determines which 'half' of the result array the corresponding value resides in. The least significant bit of k becomes the most significant bit of the corresponding (binary) array index. Because the function is recursive, this bit swapping will occur for each sub-transform etc.. I.E. The array index is determined by bit reversing k. Doing this won't save any time, unless we can arrange that the catenation operation is actually performed implicitly by each sub-transform. This suggests that an 'in place' algorithm should be used (one which performs all necessary operations in a single array, rather than allocating temporary arrays on the stack). Looking at the form of the first loop in the above routine indicates how to do this, put the 'even' subset values back in f[n] (the top half) and put the 'odd' subset values back in f[n+N'] (the bottom half), then perform the DIF operation on each half. Pulling all these ideas together, we get a more efficient (recursive) in-place DIF routine:
{Perform in place DIF of N points starting at position BaseE
DIF(0,N,f) performs DIF FFT on entire array f, N= size of f
}
PROCEDURE DIF(BaseE,N, VAR f); {f is an external array}
LOCAL N',BaseO,n,e,o;
IF N==1
THEN {do nothing}
ELSE BEGIN
N':=N>>1; {shift right to get size of sub-transforms}
BaseO:=BaseE+N'; {split block into 2 halves}
FOR n:=0 TO (N'-1) DO
BEGIN
e:= f[BaseE+n]+f[BaseO+n];
o:=(f[BaseE+n]-f[BaseO+n])*T(N,n);
f[BaseE+n]:=e;
f[BaseO+n]:=o;
END;
DIF(BaseE,N',f); {even sub-transform}
DIF(BaseO,N',f); {odd sub-transform}
END;
This version of the DIF routine is a little simpler and more efficient than the first, but has the disadvantage that the output is in 'jumbly' (bit reversed) order. This may on may not be a serious problem, depending on what the output is to be used for and whether or not your processor/programming language supports bit reversed addressing (most DSP's do). If bit reversed addressing is not available then you may need to produce a bit reversed index look up table to access the output.
It is worth noting that this is the simplest formulation of the 'in-place' DIF algorithm. It takes normal order input and generates bit reversed order output. However, this is not fundamental to the algorithm, it is merely a consequence of the simplest implementation. It is possible to write a reverse order DIF algorithm FFT which takes bit reversed order input and generates normal order output. This requires the DIF FFT routine to be amended in two ways:
This modified algorithm could be used to implement an inverse DIF FFT, but it's probably simpler to use the DIT algorithm presented in the next section.
For many processors/languages a recursive routine is not attractive, because of the overhead incurred by a procedure call. Be careful though, in reality an aversion to recursion could cost performance (see the note about cache efficiency). There is also a particular problem with DSP's which often have small hardware stacks, so deeply recursive routines will cause stack overflow. It is fairly easy to see that the above routine can be flattened into nested loops, to yield something resembling the 'classic' (non recursive) DIF FFT. Each DIF uses 2 'half size' DIF's, which in turn will use 4 'quarter size' DIF's, etc.. the last (not trivial) DIF will be for size 2. So, keeping notation consistent with the above recursive DIF, we get 3 nested loops:
{Perform in place DIF of 2^p points (=size of f)}
PROCEDURE DIF(p,VAR f);
LOCAL Bp,Np,Np',P,b,n,BaseE,BaseO,e,o;
BEGIN {DIF}
{initialise pass parameters}
Bp:=1; {No. of blocks}
Np:=1<<p; {No. of points in each block}
{perform p passes}
FOR P:=0 TO (p-1) DO
BEGIN {pass loop}
Np':=Np>>1; {No. of butterflies}
BaseE:=0; {Reset even base index}
FOR b:=0 TO (Bp-1) DO
BEGIN {block loop}
BaseO:=BaseE+Np'; {calc odd base index}
FOR n:=0 TO (Np'-1) DO
BEGIN {butterfly loop}
e:= f[BaseE+n]+f[BaseO+n];
o:=(f[BaseE+n]-f[BaseO+n])*T(Np,n);
f[BaseE+n]:=e;
f[BaseO+n]:=o;
END; {butterfly loop}
BaseE:=BaseE+Np; {start of next block}
END; {block loop}
{calc parameters for next pass}
Bp:=Bp<<1; {twice as many blocks}
Np:=Np>>1; {half as many points in each block}
END; {pass loop}
END; {DIF}
That's as far as we'll go on DIF code. Of course, if you look, there's still plenty of scope for further optimisation of this routine. We won't do it here for fear of obscuring the code even more. The remainder of this section is devoted to a few observations about the practical details of the twiddle factors, which so far have been largely neglected.
Recall the form of the twiddle factors:
In practice, calculating these will require time consuming COS's and SIN's, so you certainly don't want to do this for every butterfly (if you do your FFT will be anything but fast). So what you need is a look up table (an array of size N/2) which is calculated just once. The important thing to realise is that you don't need a separate table for each DIF pass (N halves each pass). Instead you use the same table each pass and introduce an additional 'twiddle_step_size' parameter which starts at 1 and doubles after each pass. The twiddle factor used has index n*twiddle_step_size. (The array bounds will never be exceeded, because N halves on each pass and the maximum value of n is N/2-1.)
The next trick is to notice that often the twiddle factor will be either +1 or -j (it will never be -1 or +j). In such cases there is no need to do a full blown complex multiply (remember this requires 4 real multiplies and 2 real adds). Simple addition and subtraction will suffice. We shall call butterflies that use on or other of these twiddle factors 'trivial butterflies'. Specifically:
How often does this occur? Consider the last DIF FFT pass (N=2). In this case T2(0)=1 is the only twiddle factor used. So the entire last pass consists of trivial butterflies. Similarly, the first butterfly of of any sub-block will use TN(0)=1. In general, in pass P (P=0..p-1), there are BP (=2P) sub-blocks and therefore 2P butterflies which use a twiddle factor of +1.
In the penultimate (N=4) pass the butterflies will use either T4(0)=1 or T4(1)=-j. So this pass also uses only trivial butterflies. As in the above case, in each of the previous (N>4, P<p-2) passes, there are BP (=2P) sub-blocks and therefore 2P butterflies which use a twiddle factor of -j (at n=N/4) .
We can now calculate exactly how many of these trivial butterflies Triv(p) there are in a typical DIF FFT for sizes >4 (p>2). This is the sum of the number of butterflies in the last 2 passes (the easy bit) and the number of trivial butterflies in all the other passes (the hard bit, but the result in Annex B is useful here):
The total number of butterflies was calculated earlier:
So for a 2p point DIF FFT, the proportion of trivial butterflies is about 3/p. If this shortcut is exploited, then large FFT's benefit less than small FFT's. For example, in a 512 point (p=9) DIF FFT about 1/3 of the butterflies will be trivial.
The non-recursive routine presented above can be modified to exploit trivial butterflies by doing something like this:
Of course, exactly how much time this will really save in practice is greatly dependent on the processor used.
As a final note, we can observe that there are further opportunities to exploit which we haven't considered:
These aren't quite so trivial (should call them semi-trivial?), but there are still potentially faster ways of using these than full blown complex multiplication.
©1999 - Engineering Productivity Tools Ltd.