shithub: audio-stretch

--- /dev/null

+++ b/.gitignore

@@ -1,0 +1,4 @@

+output

+audio-stretch

+samples/*.wav

--- a/README

+++ b/README

@@ -1,7 +1,7 @@

 ////////////////////////////////////////////////////////////////////////////

 //                        **** AUDIO-STRETCH ****                         //

 //                      Time Domain Harmonic Scaler                       //

-//                    Copyright (c) 2019 David Bryant                     //

+//                    Copyright (c) 2022 David Bryant                     //

 //                          All Rights Reserved.                          //

 //      Distributed under the BSD Software License (see license.txt)      //

 ////////////////////////////////////////////////////////////////////////////

@@ -38,31 +38,56 @@

 non-standard sampling rates will probably result. Many programs will still

 properly play these files, and audio editing programs will likely import

 them correctly (by resampling), but it is possible that some applications

-will barf on them.

+will barf on them. They can also be resampled using the audio resampling

+tool available here: https://github.com/dbry/audio-resampler

-For version 0.2 a new option was added to cycle through the full possible

-ratio range in a sinusoidal pattern, starting at 1.0, and either going

-up (-c) or down (-cc) first. In this case any specified ratio is ignored

-(except if the -s option is also specified to scale the sampling rate).

-The total period is fixed at 2π seconds, at which point the output will

-again be exactly aligned with the input.

+There's an option to cycle through the full possible ratio range in a

+sinusoidal pattern, starting at 1.0, and either going up (-c) or down

+(-cc) first. In this case any specified ratio is ignored (except if the

+-s option is also specified to scale the sampling rate). The total period

+is fixed at 2π seconds, at which point the output will again be exactly

+aligned with the input.

-To build the demo app:

+                *** Version 0.4 Enhancements ***

-    $ gcc -O2 *.c -lm -o audio-stretch

+Version 0.4 adds two useful features. The first is the ability to

+cascade two instances of the stretcher. This is enabled by including

+the flag STRETCH_DUAL_FLAG when initializing the stretcher, and it

+doubles the ratio range of the regular code (i.e., now 0.25X to 4.00X).

+Note that audio quality degrades somewhat when slowed beyond 2X, and

+voice generally becomes unintelligible when sped up beyond 2X. These

+values may still be useful for some applications, though, and the very

+high speed values are specifically useful for silence gaps (see the

+next feature).

-The "help" display from the demo app:

+The other feature added is the ability to detect silence gaps in the

+audio and apply a different (likely lower) stretch ratio to these areas.

+This is currently performed not in the library itself but in the demo

+command-line program, where it is highly configurable; it should be

+relatively easy to copy the functionality into another application. If

+I get requests for it, I will consider moving it into the library.

- AUDIO-STRETCH  Time Domain Harmonic Scaling Demo  Version 0.2

- Copyright (c) 2019 David Bryant. All Rights Reserved.

+There is a script to build the demo app on Linux (build.sh), and this also

+allows building the app to test for UB (undefined behavior) and ASAN (bad

+addressing). Also, some artificial test signals (both mono and stereo) and

+a script (test.sh) for running them at various ratios have been added.

+The current "help" display from the demo app:

+ AUDIO-STRETCH  Time Domain Harmonic Scaling Demo  Version 0.4

+ Copyright (c) 2022 David Bryant. All Rights Reserved.

  Usage:     AUDIO-STRETCH [-options] infile.wav outfile.wav

- Options:  -r<n.n> = stretch ratio (0.5 to 2.0, default = 1.0)

+ Options:  -r<n.n> = stretch ratio (0.25 to 4.0, default = 1.0)

+           -g<n.n> = gap/silence stretch ratio (if different)

            -u<n>   = upper freq period limit (default = 333 Hz)

            -l<n>   = lower freq period limit (default = 55 Hz)

+           -b<n>   = audio buffer/window length (ms, default = 25)

+           -t<n>   = gap/silence threshold (dB re FS, default = -40)

            -c      = cycle through all ratios, starting higher

            -cc     = cycle through all ratios, starting lower

+           -d      = force dual instance even for shallow ratios

            -s      = scale rate to preserve duration (not pitch)

            -f      = fast pitch detection (default >= 32 kHz)

            -n      = normal pitch detection (default < 32 kHz)
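
The README describes silence-gap handling only in prose, so here is a rough sketch of the idea. It is not the demo program's code: `block_is_gap`, `choose_ratio`, and the `FULL_SCALE` reference of 32768 are hypothetical names and assumptions. The threshold is passed as a linear fraction of full scale; the `-t` default of -40 dB corresponds to 10^(-40/20) = 0.01.

```c
#include <assert.h>
#include <stdint.h>

#define FULL_SCALE 32768.0   /* assumed reference level for 16-bit audio */

/* Hypothetical helper: returns 1 if the block's RMS level is below
 * threshold_amplitude, given as a fraction of full scale
 * (e.g. 0.01 for -40 dB, since 10^(-40/20) = 0.01). */
int block_is_gap (const int16_t *samples, int count, double threshold_amplitude)
{
    double sum_squares = 0.0, limit = threshold_amplitude * FULL_SCALE;
    int i;

    assert (samples && count > 0);

    for (i = 0; i < count; ++i)
        sum_squares += (double) samples [i] * samples [i];

    /* compare the mean square against the squared limit to avoid a square root */
    return sum_squares / count < limit * limit;
}

/* Hypothetical helper: gap_ratio for quiet blocks, the normal ratio otherwise. */
float choose_ratio (const int16_t *samples, int count, double threshold_amplitude,
                    float ratio, float gap_ratio)
{
    return block_is_gap (samples, count, threshold_amplitude) ? gap_ratio : ratio;
}
```

For interleaved stereo, `count` here would be the total number of values in the block. A caller could evaluate one window at a time (the `-b` option suggests 25 ms by default) and pass the chosen ratio to the stretcher for that window.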

--- /dev/null

+++ b/build.sh

@@ -1,0 +1,18 @@

+#!/bin/bash

+if [ -z "$1" ] || [ "$1" = "rel" ]; then

+  echo "building release .."

+  gcc -Ofast main.c stretch.c -lm -o audio-stretch

+elif [ "$1" = "dbg" ]; then

+  echo "building debug .."

+  gcc -O0 -g main.c stretch.c -lm -o audio-stretch

+elif [ "$1" = "ubsan" ]; then

+  echo "building debug with undefined behaviour sanitizer .."

+  gcc -O0 -g main.c stretch.c -fsanitize=undefined -lm -o audio-stretch

+elif [ "$1" = "asan" ]; then

+  echo "building debug with address sanitizer .."

+  gcc -O0 -g main.c stretch.c -fsanitize=address -lm -o audio-stretch

+else

+  echo "error: unknown option '$1'"

+fi

--- a/stretch.c

+++ b/stretch.c

@@ -19,8 +19,6 @@

 // and should contain approximately similar content.

 // For independent channels, prefer using multiple StretchHandle-instances.

 // see https://github.com/dbry/audio-stretch/issues/6

-// Multiple instances, of course, will consume more CPU load.

-// In addition, different output amounts need to be handled.

 #include <stdio.h>

@@ -63,7 +61,12 @@

  * are specified here. The longest period determines the lowest fundamental frequency

  * that can be handled correctly. Note that higher frequencies can be handled than the

  * shortest period would suggest because multiple periods can be combined, and the

- * worst-case performance will suffer if too short a period is selected.

+ * worst-case performance will suffer if too short a period is selected. The flags are:

+ *

+ * STRETCH_FAST_FLAG    0x1     Use the "fast" version of the period calculation

+ *

+ * STRETCH_DUAL_FLAG    0x2     Cascade two instances of the stretcher to expand

+ *                              available ratios to 0.25X to 4.00X

  */

 StretchHandle stretch_init (int shortest_period, int longest_period, int num_channels, int flags)

@@ -131,8 +134,8 @@

 /*

  * Determine how many samples (per channel) should be reserved in 'output'-array

- * for stretch_samples() and stretch_flush(). max_num_samples is the maximum for

- * 'num_samples' when calling stretch_samples().

+ * for stretch_samples() and stretch_flush(). max_num_samples and max_ratio are the

+ * maximum values that will be passed to stretch_samples().

  */

 int stretch_output_capacity (StretchHandle handle, int max_num_samples, float max_ratio)

@@ -165,12 +168,19 @@

 /*

- * Process the specified samples with the given ratio (which is clipped to the

- * range 0.5 to 2.0). Note that the number of samples refers to total samples for

- * both channels in stereo and can be as large as desired (samples are buffered

- * here). The exact number of samples output is not possible to determine in

- * advance, but the maximum will be the number of input samples times the ratio

- * plus 3X the longest period (or 4X the longest period in "fast" mode).

+ * Process the specified samples with the given ratio (which is normally clipped to

+ * the range 0.5 to 2.0, or 0.25 to 4.00 for the "dual" mode). Note that in stereo

+ * the number of samples refers to the samples for one channel (i.e., not the total

+ * number of values passed) and can be as large as desired (samples are buffered here).

+ * The ratio may change between calls, but there is some latency to consider because

+ * audio is buffered here and a new ratio may be applied to previously sent samples.

+ *

+ * The exact number of samples output is not easy to determine in advance, so a function

+ * is provided (stretch_output_capacity()) that calculates the maximum number of samples

+ * that can be generated from a single call to this function (or stretch_flush()) given

+ * a number of samples and maximum ratio. It is recommended that that function be used

+ * after initialization to allocate in advance the buffer size required. Be sure to

+ * multiply the return value by the number of channels!

  */

 int stretch_samples (StretchHandle handle, const int16_t *samples, int num_samples, int16_t *output, float ratio)

@@ -234,8 +244,6 @@

             else

                 period = cnxt->longest;

-            // printf ("%d\n", period / cnxt->num_chans);

             /*

              * Once we have calculated the best-match period, there are 4 possible transformations

              * available to convert the input samples to output samples. Obviously we can simply

@@ -265,7 +273,7 @@

                 if (ratio != 1.0)

                     cnxt->outsamples_error += (period * 2.0) - (period * 2.0 * ratio);

                 else

-                    cnxt->outsamples_error = 0;

+                    cnxt->outsamples_error = 0; /* if the ratio is 1.0, we can never cancel the error, so just do it now */

                 out_samples += period * 2;

                 cnxt->tail += period * 2;

@@ -345,7 +353,9 @@

 /*

  * Flush any leftover samples out at normal speed. For cascaded dual instances this must be called

- * twice to completely flush, or simply call it until it returns zero samples

+ * twice to completely flush, or simply call it until it returns zero samples. The maximum number

+ * of samples that can be returned from each call of this function can be determined in advance with

+ * stretch_output_capacity().

  */

 int stretch_flush (StretchHandle handle, int16_t *output)

@@ -367,6 +377,8 @@

     cnxt->tail = cnxt->head;

+    memset (cnxt->inbuff, 0, cnxt->tail * sizeof (*cnxt->inbuff));

     return samples_flushed;
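
The comments above say STRETCH_DUAL_FLAG cascades two instances to reach 0.25X to 4.00X. The arithmetic behind that range is that two stages, each limited to 0.5 to 2.0, multiply to 0.25 at one extreme and 4.0 at the other. The sketch below shows one plausible way to divide a requested ratio between two such stages. `split_dual_ratio` is a hypothetical helper, and this diff does not show how the library itself divides the work, so treat it as an illustration only.

```c
#include <assert.h>

/* Hypothetical helper, not part of the library: divide a combined ratio
 * between two stages that are each limited to 0.5 .. 2.0. The first stage
 * takes as much of the ratio as it can; the second takes the remainder. */
void split_dual_ratio (float ratio, float *first, float *second)
{
    assert (first && second);

    /* clip to the combined range of two cascaded stages */
    if (ratio < 0.25f) ratio = 0.25f;
    if (ratio > 4.0f) ratio = 4.0f;

    if (ratio > 2.0f) {
        *first = 2.0f;
        *second = ratio / 2.0f;
    }
    else if (ratio < 0.5f) {
        *first = 0.5f;
        *second = ratio / 0.5f;
    }
    else {
        *first = ratio;     /* a single stage is enough */
        *second = 1.0f;
    }
}
```

With this split, a request for 3.0 becomes 2.0 followed by 1.5, and the product of the two stage ratios always equals the clipped request.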

--- a/stretch.h

+++ b/stretch.h

@@ -1,7 +1,7 @@

 ////////////////////////////////////////////////////////////////////////////

 //                        **** AUDIO-STRETCH ****                         //

 //                      Time Domain Harmonic Scaler                       //

-//                    Copyright (c) 2019 David Bryant                     //

+//                    Copyright (c) 2022 David Bryant                     //

 //                          All Rights Reserved.                          //

 //      Distributed under the BSD Software License (see license.txt)      //

 ////////////////////////////////////////////////////////////////////////////

@@ -19,8 +19,6 @@

 // and should contain approximately similar content.

 // For independent channels, prefer using multiple StretchHandle-instances.

 // see https://github.com/dbry/audio-stretch/issues/6

-// Multiple instances, of course, will consume more CPU load.

-// In addition, different output amounts need to be handled.

 #ifndef STRETCH_H

 #define STRETCH_H

@@ -27,8 +25,8 @@

 #include <stdint.h>

-#define STRETCH_FAST_FLAG    0x1

-#define STRETCH_DUAL_FLAG    0x2

+#define STRETCH_FAST_FLAG    0x1    // use "fast" version of period determination code

+#define STRETCH_DUAL_FLAG    0x2    // cascade two instances (doubles usable ratio range)

 #ifdef __cplusplus

 extern "C" {

--- /dev/null

+++ b/test.sh

@@ -1,0 +1,93 @@

+#!/bin/bash

+if [ ! -d output ]; then

+  echo "creating directory output"

+  mkdir output

+fi

+if [ ! -f samples/mono.wav ] || [ ! -f samples/stereo.wav ]; then

+  WVUNPACK=$(which wvunpack)

+  if [ -z "$WVUNPACK" ]; then

+    echo "please build/install WavPack with wvunpack to convert .wv samples to .wav"

+    exit 1

+  fi

+  $WVUNPACK samples/mono.wv

+  $WVUNPACK samples/stereo.wv

+fi

+STARTER=""

+if [ "$1" = "gdb" ]; then

+  STARTER="gdb -q -ex run -ex quit --args"

+  shift

+fi

+EXAMPLE="mono"

+if [ "$1" = "mono" ]; then

+  EXAMPLE="$1"

+  shift

+fi

+if [ "$1" = "stereo" ]; then

+  EXAMPLE="$1"

+  shift

+fi

+if [ -z "$1" ] && [ -z "$2" ]; then

+  echo "usage: $0 [mono|stereo] [f|n] [s|x]"

+  echo "  'f': fast pitch detection"

+  echo "  'n': normal pitch detection"

+  echo "  's': simple range for ratio: 0.5 .. 2.0"

+  echo "  'x': extended range for ratio: 0.25 .. 4.0"

+  echo ""

+fi

+if [ -z "$1" ] || [ "$1" = "f" ]; then

+  echo "testing with fast pitch detection"

+  FO="-f"

+  FN="f"

+else

+  echo "testing with normal pitch detection"

+  FO="-n"

+  FN="n"

+fi

+if [ -z "$2" ] || [ "$2" = "s" ]; then

+  echo ""

+  echo "testing normal range 0.5 .. 2.0"

+  echo "x2.0"

+  $STARTER ./audio-stretch -q -y $FO -r0.5   samples/${EXAMPLE}.wav output/out_${EXAMPLE}_${FN}_r050_x200.wav

+  echo "x1.75"

+  $STARTER ./audio-stretch -q -y $FO -r0.571 samples/${EXAMPLE}.wav output/out_${EXAMPLE}_${FN}_r057_x175.wav

+  echo "x1.5"

+  $STARTER ./audio-stretch -q -y $FO -r0.666 samples/${EXAMPLE}.wav output/out_${EXAMPLE}_${FN}_r066_x150.wav

+  echo "x1.25"

+  $STARTER ./audio-stretch -q -y $FO -r0.8   samples/${EXAMPLE}.wav output/out_${EXAMPLE}_${FN}_r080_x125.wav

+  echo "x1.0"

+  $STARTER ./audio-stretch -q -y $FO -r1.0   samples/${EXAMPLE}.wav output/out_${EXAMPLE}_${FN}_r100_x100.wav

+  echo "x0.75"

+  $STARTER ./audio-stretch -q -y $FO -r1.333 samples/${EXAMPLE}.wav output/out_${EXAMPLE}_${FN}_r133_x075.wav

+  echo "x0.5"

+  $STARTER ./audio-stretch -q -y $FO -r2.0   samples/${EXAMPLE}.wav output/out_${EXAMPLE}_${FN}_r200_x050.wav

+fi

+if [ -z "$2" ] || [ "$2" = "x" ]; then

+  echo ""

+  echo "testing extended range 0.25 .. 0.5 and 2.0 .. 4.0"

+  echo "x4.0"

+  $STARTER ./audio-stretch -q -y $FO -r0.25  samples/${EXAMPLE}.wav output/out_${EXAMPLE}_${FN}_r025_x400.wav

+  echo "x3.5"

+  $STARTER ./audio-stretch -q -y $FO -r0.285 samples/${EXAMPLE}.wav output/out_${EXAMPLE}_${FN}_r028_x350.wav

+  echo "x3.0"

+  $STARTER ./audio-stretch -q -y $FO -r0.333 samples/${EXAMPLE}.wav output/out_${EXAMPLE}_${FN}_r033_x300.wav

+  echo "x2.5"

+  $STARTER ./audio-stretch -q -y $FO -r0.4   samples/${EXAMPLE}.wav output/out_${EXAMPLE}_${FN}_r040_x250.wav

+  echo "x0.4"

+  $STARTER ./audio-stretch -q -y $FO -r2.5   samples/${EXAMPLE}.wav output/out_${EXAMPLE}_${FN}_r250_x040.wav

+  echo "x0.333"

+  $STARTER ./audio-stretch -q -y $FO -r3.0   samples/${EXAMPLE}.wav output/out_${EXAMPLE}_${FN}_r300_x033.wav

+  echo "x0.285"

+  $STARTER ./audio-stretch -q -y $FO -r3.5   samples/${EXAMPLE}.wav output/out_${EXAMPLE}_${FN}_r350_x028.wav

+  echo "x0.25"

+  $STARTER ./audio-stretch -q -y $FO -r4.0   samples/${EXAMPLE}.wav output/out_${EXAMPLE}_${FN}_r400_x025.wav

+fi
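
The "x" labels echoed by test.sh appear to be playback speed factors, which are the reciprocal of the `-r` stretch ratio (for example, x1.75 is run with -r0.571, and x0.75 with -r1.333). A minimal sketch of that relationship follows; `ratio_for_speed` is a hypothetical name, not something defined in the repository.

```c
#include <assert.h>

/* Hypothetical helper: stretch ratio (the -r value) for a given speed factor.
 * A speed of 2.0 halves the duration, so the ratio is 0.5. */
double ratio_for_speed (double speed)
{
    assert (speed > 0.0);
    return 1.0 / speed;
}
```

The script passes these values truncated to three decimals, so the exact reciprocal differs slightly from the value on the command line.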
