Speedup tricks
EMD is inherently slow, with little room for improving its performance. This is mainly because it is a serial method, both within an IMF stage (iterative sifting) and between IMFs (each IMF depends on the previous one). On top of that, the common EMD configuration uses the natural cubic spline to span envelopes, which further decreases performance since the spline depends on all extrema in the signal.
Since EMD is the basis for other methods like EEMD and CEEMDAN, these suffer from the same problem. What’s more, these two methods perform the EMD many (hundreds of) times, which significantly amplifies any inefficiency. An EEMD/CEEMDAN run with default settings on a signal of 10k+ samples with “medium complexity” can be expected to take more than a minute. There are, however, a couple of tweaks that make the computation finish sooner.
The sections below describe tweaks that improve the performance of the EMD. In short, these changes are:
Parallel execution (enabled by default for EEMD/CEEMDAN)
Change data type (downscale)
Change spline method to piecewise
Decrease number of trials
Limit number of output IMFs
Parallel execution
This is enabled by default since version 1.8.0.
EEMD and CEEMDAN can take advantage of multiple CPU cores to speed up computation significantly. Both methods run multiple independent EMD decompositions on noise-perturbed signals, making them well-suited for parallel execution.
As of version 1.8.0, both EEMD and CEEMDAN have parallel=True by default and will automatically use all available CPU cores. This typically provides near-linear speedup with the number of cores.
To explicitly control parallelization:
from PyEMD import EEMD, CEEMDAN
# Use all available CPUs (default behavior)
eemd = EEMD(trials=100)
# Use specific number of processes
eemd = EEMD(trials=100, processes=4)
# Disable parallelization (for reproducibility with seeds or debugging)
eemd = EEMD(trials=100, parallel=False)
Note
When using noise_seed() for reproducible results, you must set parallel=False.
Parallel execution does not guarantee deterministic ordering of results.
When to disable parallelization:
When you need reproducible results using noise_seed()
For very short signals (< 100 samples) where multiprocessing overhead exceeds the benefit
For very few trials (< 4) where overhead isn’t amortized
When debugging or profiling
Change data type
Many programming frameworks cast numerical values to their largest data type by default. In the case of Python’s NumPy, that is numpy.float64. It’s unlikely that one needs such resolution when using EMD [*]. A suggestion is to downcast your data, e.g. to float16. PyEMD should preserve the input data type without upcasting, but a specific data type can additionally be enforced. To enable data type enforcement, pass the DTYPE parameter, i.e.
import numpy as np
from PyEMD import EMD
emd = EMD(DTYPE=np.float16)
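To get a feel for what downcasting saves and what it costs, the sketch below compares the memory footprint and rounding error of the three common float types. It uses only NumPy and makes no PyEMD calls; the test signal is made up for illustration.

```python
import numpy as np

# Hypothetical test signal; NumPy creates float64 arrays by default.
t = np.linspace(0, 1, 10_000)
signal = np.sin(2 * np.pi * 5 * t) + 0.5 * np.cos(2 * np.pi * 20 * t)

for dtype in (np.float64, np.float32, np.float16):
    cast = signal.astype(dtype)
    # Largest rounding error introduced by the cast, measured in float64.
    max_err = float(np.max(np.abs(cast.astype(np.float64) - signal)))
    eps = float(np.finfo(dtype).eps)
    print(f"{np.dtype(dtype).name}: {cast.nbytes} bytes, "
          f"machine eps {eps:.1e}, max cast error {max_err:.1e}")
```

Note that float16 carries only about three significant decimal digits and overflows above roughly 65504, so float32 may be a safer middle ground when the signal has a large dynamic range.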
Change spline method
EMD was originally presented with the natural cubic spline to span envelopes, and that’s the default option in PyEMD. It works well for signals with few extrema, but it’s not recommended for longer or more complex signals. The suggestion is to change the spline method to a piecewise spline such as ‘Akima’ or ‘piecewise cubic’.
Example:
from PyEMD import EEMD
eemd = EEMD(spline_kind='akima')
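The difference between a global and a piecewise spline can be seen without running EMD at all. The sketch below uses SciPy interpolators as stand-ins (whether PyEMD uses these exact classes internally is an assumption) and made-up extrema. It moves one distant extremum and measures how much the envelope changes far away from it.

```python
import numpy as np
from scipy.interpolate import Akima1DInterpolator, CubicSpline

# Hypothetical extrema positions and values (not produced by PyEMD).
rng = np.random.default_rng(0)
x = np.arange(20, dtype=float)
y = rng.normal(size=x.size)

# Same extrema, except that one value far to the right is moved.
y_moved = y.copy()
y_moved[15] += 1.0

# Evaluate the envelopes on a stretch well away from the moved extremum.
probe = np.linspace(4.0, 6.0, 201)

def max_change(make_spline):
    """Largest difference on `probe` caused by moving the distant extremum."""
    before = make_spline(x, y)(probe)
    after = make_spline(x, y_moved)(probe)
    return float(np.max(np.abs(before - after)))

natural_change = max_change(lambda xs, ys: CubicSpline(xs, ys, bc_type="natural"))
akima_change = max_change(Akima1DInterpolator)

print(f"natural cubic spline: {natural_change:.2e}")
print(f"Akima spline:         {akima_change:.2e}")
```

The natural cubic spline reacts, if only slightly, to an extremum ten knots away because its coefficients come from one linear system over all knots. The Akima spline only looks at a handful of neighbouring points, so the probed stretch is unaffected.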
Decrease number of trials
This mostly concerns EEMD and CEEMDAN, since they perform the EMD multiple times on a slightly modified signal. It’s difficult to choose the right number of iterations, and the choice depends on the signal in question. The more iterations, the more confident one can be that the solution has converged, but there is likely a point beyond which more evaluations change little. On the other hand, the sooner we get the output, the sooner we can use it.
In PyEMD, the number of iterations is set by trials, an explicit parameter of EEMD and CEEMDAN. The default value was selected arbitrarily and is most likely not the right one for your signal. An example of changing it:
from PyEMD import CEEMDAN
ceemdan = CEEMDAN(trials=20)
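A rough way to reason about diminishing returns is to look at how quickly the added noise averages out. The sketch below does not run EMD. It only averages independent zero-mean Gaussian noise realisations, with an illustrative noise standard deviation of 0.05 chosen for this example, and reports what is left in the ensemble mean.

```python
import numpy as np

rng = np.random.default_rng(42)
n_samples = 2_000
noise_width = 0.05  # illustrative noise standard deviation (assumed here)

residuals = {}
for trials in (10, 100, 1000):
    # One independent noise realisation per trial, averaged like an ensemble.
    noise = rng.normal(scale=noise_width, size=(trials, n_samples))
    residuals[trials] = float(noise.mean(axis=0).std())
    expected = noise_width / np.sqrt(trials)
    print(f"trials={trials:5d}  residual noise std={residuals[trials]:.5f}  "
          f"expected={expected:.5f}")
```

For independent noise the residual shrinks with the square root of the number of trials, so quadrupling the trials (and roughly the runtime) only halves the leftover noise. How this translates to the quality of the IMFs still depends on the signal.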
Limit number of output IMFs
Each method, by default, performs the decomposition until all components are returned. However, many use cases only require the first few components. One can limit the number of returned components by passing the max_imf argument with the desired value when running the decomposition.
Example:
import numpy as np
from PyEMD import EEMD
eemd = EEMD()
imfs = eemd(np.random.random(1000), max_imf=2)  # example input signal