Class SpectralFeaturePipelines


  • public final class SpectralFeaturePipelines
    extends Object
Extracts low-level audio features based on frequency-domain values. EXPERIMENTAL!
    Author:
    Hendrik Schreiber
    • Field Detail

      • MAGNITUDE_STANDARD_DEVIATION

        public static AggregateFunction<AudioBuffer,​Float> MAGNITUDE_STANDARD_DEVIATION
        Standard deviation of magnitudes, a.k.a. variability or flatness.
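The aggregation this field performs can be illustrated in plain Java. This is a sketch of the standard statistical definition, not the library's actual `AggregateFunction` implementation:

```java
// Illustration of MAGNITUDE_STANDARD_DEVIATION: the (population) standard
// deviation over a buffer of magnitude values. Sketch only; the library's
// AggregateFunction may differ in detail (e.g. sample vs. population).
public class MagnitudeStdDev {
    public static float standardDeviation(float[] magnitudes) {
        double mean = 0;
        for (float m : magnitudes) mean += m;
        mean /= magnitudes.length;
        double variance = 0;
        for (float m : magnitudes) variance += (m - mean) * (m - mean);
        variance /= magnitudes.length;
        return (float) Math.sqrt(variance);
    }
}
```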
    • Method Detail

      • createOnsetStrengthProcessors

        public static SignalProcessor[] createOnsetStrengthProcessors​(int startTime,
                                                                      int duration,
                                                                      SignalProcessor... tail)
Creates an array of processors that can be used to form a SignalPipeline that delivers OnsetStrength values. The signal is decimated to 11025 Hz, Hamming windowed (window size 1024, hopsize 512), Fourier transformed, band-pass filtered to 30–720 Hz, and then the onset strength values are determined.
        Parameters:
        startTime - start time in seconds
        duration - duration in seconds
        tail - signal processors to append to the produced processor array
        Returns:
        array of processors
      • createOnsetStrengthProcessors

        public static SignalProcessor[] createOnsetStrengthProcessors​(int startTime,
                                                                      int duration,
                                                                      int lowFrequency,
                                                                      int highFrequency,
                                                                      float onsetFactor,
                                                                      SignalProcessor... tail)
Creates an array of processors that can be used to form a SignalPipeline that delivers OnsetStrength values. The signal is decimated to 11025 Hz, Hamming windowed (window size 1024, hopsize 512), Fourier transformed, band-pass filtered to lowFrequency–highFrequency, and then the onset strength values are determined.
        Parameters:
        startTime - start time in seconds
        duration - duration in seconds
lowFrequency - lower boundary of the bandpass in Hz (e.g. 30 Hz)
highFrequency - upper boundary of the bandpass in Hz (e.g. 720 Hz)
        onsetFactor - the factor by which a power value has to be greater than the power value for the previous frame (e.g. 1.76f)
        tail - signal processors to append to the produced processor array
        Returns:
        array of processors
        See Also:
        OnsetStrength
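The role of onsetFactor can be sketched in plain Java. This is a simplified assumption about how the threshold is applied; the library's OnsetStrength processor operates on band-wise power spectra and may count or weight onsets differently:

```java
// Sketch of the onsetFactor parameter: a frame contributes to the onset
// strength only when its power exceeds the previous frame's power by the
// given factor (e.g. 1.76f). Hypothetical illustration, not library code.
public class OnsetFactorExample {
    public static int countOnsets(float[] framePowers, float onsetFactor) {
        int onsets = 0;
        for (int i = 1; i < framePowers.length; i++) {
            if (framePowers[i] > onsetFactor * framePowers[i - 1]) onsets++;
        }
        return onsets;
    }
}
```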
      • createAverageSpectralFlatnessPipeline

        public static SignalPipeline<AudioBuffer,​Float> createAverageSpectralFlatnessPipeline​(String id,
                                                                                                    int windowSize,
                                                                                                    int hopsize,
                                                                                                    int maxFramesToProcess)
Average spectral flatness over maxFramesToProcess frames, similar to MPEG-7 ASF. The default frequency range is 250 Hz–16 kHz (n=-8, bands=24). The window size may be equal to the hopsize. MPEG-7 recommends a window length of 30 ms (for a 44.1 kHz sample rate, that corresponds to a window size of 1323 samples).
        Parameters:
        id - result id
        windowSize - window size
        hopsize - hopsize
        maxFramesToProcess - max frames to process
        Returns:
        pipeline
        See Also:
        Spectral Flatness on Wikipedia, AudioSpectrumFunctions.createSpectralFlatnessFunction(int, int)
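The core quantity can be sketched as follows. This is the textbook definition (geometric mean over arithmetic mean); the MPEG-7 ASF variant referenced above additionally evaluates it per log-spaced band, which is omitted here:

```java
// Spectral flatness: geometric mean divided by arithmetic mean of the
// magnitude spectrum. 1.0 for a perfectly flat (noise-like) spectrum,
// values near 0 for a peaky (tonal) one. Simplified sketch.
public class SpectralFlatnessExample {
    public static double flatness(double[] magnitudes) {
        double logSum = 0, sum = 0;
        for (double m : magnitudes) {
            logSum += Math.log(m); // log-domain avoids overflow in the product
            sum += m;
        }
        double geometricMean = Math.exp(logSum / magnitudes.length);
        double arithmeticMean = sum / magnitudes.length;
        return geometricMean / arithmeticMean;
    }
}
```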
      • createAverageSpectralCentroidPipeline

        public static SignalPipeline<AudioBuffer,​Float> createAverageSpectralCentroidPipeline​(String id,
                                                                                                    int windowSize,
                                                                                                    int hopsize,
                                                                                                    int maxFramesToProcess)
Average of the spectral centroids computed for individual windows of the given length and hopsize.
        Parameters:
        id - result id
        windowSize - window size
        hopsize - hopsize
        maxFramesToProcess - max frames to process
        Returns:
        pipeline
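The per-window quantity being averaged can be sketched as follows, using the standard definition of the spectral centroid (a magnitude-weighted mean frequency):

```java
// Spectral centroid: magnitude-weighted mean of the bin center frequencies.
// frequencies[i] is the center frequency of bin i in Hz. Sketch of the
// standard definition; the library's per-window computation is assumed to
// follow it.
public class SpectralCentroidExample {
    public static double centroid(double[] frequencies, double[] magnitudes) {
        double weighted = 0, total = 0;
        for (int i = 0; i < magnitudes.length; i++) {
            weighted += frequencies[i] * magnitudes[i];
            total += magnitudes[i];
        }
        return weighted / total;
    }
}
```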
      • createAverageSpectralSpreadPipeline

        public static SignalPipeline<AudioBuffer,​Float> createAverageSpectralSpreadPipeline​(String id,
                                                                                                  int windowSize,
                                                                                                  int hopsize,
                                                                                                  int maxFramesToProcess)
      • createAverageSpectralFluxPipeline

        public static SignalPipeline<AudioBuffer,​Float> createAverageSpectralFluxPipeline​(String id,
                                                                                                int windowSize,
                                                                                                int hopsize,
                                                                                                int maxFramesToProcess)
      • createAverageSpectralVariabilityPipeline

        public static SignalPipeline<AudioBuffer,​Float> createAverageSpectralVariabilityPipeline​(String id,
                                                                                                       int windowSize,
                                                                                                       int hopsize,
                                                                                                       int maxFramesToProcess)
      • createStandardDeviationSpectralVariabilityPipeline

        public static SignalPipeline<AudioBuffer,​Float> createStandardDeviationSpectralVariabilityPipeline​(String id,
                                                                                                                 int windowSize,
                                                                                                                 int hopsize,
                                                                                                                 int maxFramesToProcess)
      • createAverageSpectralRollOffPipeline

        public static SignalPipeline<AudioBuffer,​Float> createAverageSpectralRollOffPipeline​(String id,
                                                                                                   int windowSize,
                                                                                                   int hopsize,
                                                                                                   float threshold,
                                                                                                   int maxFramesToProcess)
        Parameters:
        id - id to collect the result
        windowSize - window size
        hopsize - hopsize
threshold - threshold as a fraction of the total energy, typically 0.85 or 0.95
maxFramesToProcess - maximum number of frames to process; subsequent frames are ignored
        Returns:
        average roll off frequency
        See Also:
        AudioSpectrumFunctions.createRollOffFunction(float)
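The idea behind the roll-off computation can be sketched as follows. This illustrates the standard definition (the frequency below which the given fraction of total energy lies); it is an assumption that `createRollOffFunction(float)` follows it exactly:

```java
// Roll-off frequency: the bin frequency at which the cumulative power
// first reaches `threshold` (e.g. 0.85) of the total power. Sketch of the
// standard definition, not the library's implementation.
public class SpectralRollOffExample {
    public static double rollOff(double[] frequencies, double[] powers, float threshold) {
        double total = 0;
        for (double p : powers) total += p;
        double cumulative = 0;
        for (int i = 0; i < powers.length; i++) {
            cumulative += powers[i];
            if (cumulative >= threshold * total) return frequencies[i];
        }
        return frequencies[frequencies.length - 1];
    }
}
```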
      • createAverageSpectralBrightnessPipeline

        public static SignalPipeline<AudioBuffer,​Float> createAverageSpectralBrightnessPipeline​(String id,
                                                                                                      int windowSize,
                                                                                                      int hopsize,
                                                                                                      float cutOffFrequency,
                                                                                                      int maxFramesToProcess)
        Parameters:
        id - id to collect the result
        windowSize - window size
        hopsize - hopsize
        cutOffFrequency - cut off frequency in Hz
maxFramesToProcess - maximum number of frames to process; subsequent frames are ignored
Returns:
average spectral brightness (brightness values are computed for each frame and then averaged)
        See Also:
        AudioSpectrumFunctions.createBrightnessFunction(float)
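Brightness is commonly defined as the fraction of spectral energy above the cut-off frequency; the sketch below assumes that definition for `createBrightnessFunction(float)`:

```java
// Brightness (assumed common definition): fraction of total spectral
// energy at or above the cut-off frequency. Sketch only; the library's
// exact formula is not documented on this page.
public class SpectralBrightnessExample {
    public static double brightness(double[] frequencies, double[] powers, float cutOffFrequency) {
        double above = 0, total = 0;
        for (int i = 0; i < powers.length; i++) {
            total += powers[i];
            if (frequencies[i] >= cutOffFrequency) above += powers[i];
        }
        return above / total;
    }
}
```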
      • createAverageRelativeSpectralEntropyPipeline

        public static SignalPipeline<AudioBuffer,​Float> createAverageRelativeSpectralEntropyPipeline​(String id,
                                                                                                           int windowSize,
                                                                                                           int hopsize,
                                                                                                           int maxFramesToProcess)

Creates a pipeline that converts the signal to mono, applies the given window (a window length of 65536 and a hopsize of 32768 are recommended for audio with a 44.1 kHz sample rate), performs an FFT (Hamming window), maps the resulting spectrum onto the cent scale, wraps the result into a single octave, smooths it, and then calculates the relative entropy for every window. These entropy values are then averaged.

This is similar (but not identical) to the MIRToolbox expression mirentropy(mirspectrum(x,'Collapsed','Min',40,'Smooth',70,'Frame',1.5,.5)). The main difference lies in the length of the FFT. The MIRToolbox version ensures that the FFT delivers a bandwidth slightly smaller than 1 cent at the minimum frequency (40 Hz), which leads to excessively large FFTs (2,097,152 samples at a sample rate of 44.1 kHz).

This version does not ensure a sufficiently small bandwidth, but simply uses the given window length. Depending on the window length, this leads to inaccurate results in the lower cent bins, but in the end it does not affect the overall relative entropy much. The cent spectrum itself is also computed differently: this version does not attempt any interpolation.

        Parameters:
        id - id
        windowSize - window size, 65536 recommended for 44.1kHz sample rate (roughly 1.5s)
        hopsize - hopsize, half the window size is recommended
        maxFramesToProcess - max frames to process
        Returns:
        averaged relative entropy
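The per-window entropy step can be sketched as follows. It is an assumption that "relative entropy" here means Shannon entropy normalized by the maximum possible entropy log(n), which is the convention MIRToolbox's mirentropy uses:

```java
// Relative entropy of a (wrapped, smoothed) spectrum: Shannon entropy of
// the normalized magnitude distribution, divided by log(n) so the result
// lies in [0, 1]. Assumed convention; not the library's actual code.
public class RelativeEntropyExample {
    public static double relativeEntropy(double[] magnitudes) {
        double sum = 0;
        for (double m : magnitudes) sum += m;
        double entropy = 0;
        for (double m : magnitudes) {
            if (m > 0) {
                double p = m / sum;
                entropy -= p * Math.log(p);
            }
        }
        return entropy / Math.log(magnitudes.length);
    }
}
```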
      • createFrameSummarizedSpectralFluctuationPipeline

        public static SignalPipeline<AudioBuffer,​LinearFrequencySpectrum> createFrameSummarizedSpectralFluctuationPipeline​(String id,
                                                                                                                                 int maxFramesToProcess)

Creates a pipeline that converts the signal to mono, applies a 1024-sample window with a hopsize of 512, then applies a Hamming window followed by an FFT. Then: the Terhardt outer-ear model, grouping into Bark bands, masking using the Schroeder et al. spreading function, conversion to dB, and an FFT along the Bark bands. The resulting spectra for all bands in one window are summed.

        Parameters:
        id - id
        maxFramesToProcess - max audio frames to process
        Returns:
        a LinearFrequencySpectrum that contains the mentioned sums
        See Also:
        OuterEarModel.Terhardt, AuditoryMasking.SCHROEDER_MAGNITUDE_MASK, BandSplit, MultiBand