classify_audio¶
Classify real-time audio from a development board’s or PC’s microphone.
Additional Documentation¶
Audio Utilities: https://github.com/ReRAM-Labs/yzlite/docs/audio/audio_utilities
Usage¶
Usage: yzlite classify_audio [OPTIONS] <model>
Classify keywords/events detected in a microphone's streaming audio
NOTE: This command is experimental. Use at your own risk!
This command runs an audio classification application on either the local PC OR
on an embedded target. The audio classification application loads the given
audio classification ML model (e.g. Keyword Spotting) and streams real-time audio
from the local PC's/embedded target's microphone into the ML model.
System Dataflow:
Microphone -> AudioFeatureGenerator -> ML Model -> Command Recognizer -> Local Terminal
Refer to the yzlite.models.tflite_micro.tflite_micro_speech model for a reference on how to train
an ML model that works with the audio classification application.
For more details see:
https://github.com/ReRAM-Labs/yzlite/docs/audio/audio_utilities
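The dataflow above amounts to a simple processing loop. The following Python sketch is purely illustrative: the microphone, feature generator, model, and recognizer objects are hypothetical stand-ins, not part of the yzlite API.
# Illustrative sketch of the audio classification loop (hypothetical helper objects,
# NOT the actual yzlite API). Each iteration mirrors one "audio loop" of the app.
import time

def classify_audio_loop(microphone, feature_generator, model, recognizer, latency_ms=200):
    while True:
        chunk = microphone.read()                       # raw PCM samples from the mic
        spectrogram = feature_generator.process(chunk)  # AudioFeatureGenerator -> spectrogram
        probabilities = model.predict(spectrogram)      # ML model inference (one 0-255 score per class)
        command = recognizer.process(probabilities)     # smoothing / thresholding (Command Recognizer)
        if command is not None:
            print(f'Detected: {command}')               # printed to the local terminal
        time.sleep(latency_ms / 1000)                   # simulated per-loop latency (see --latency)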
----------
Examples
----------
# Classify audio on local PC using tflite_micro_speech model
# Simulate the audio loop latency to be 200ms
# i.e. If the app was running on an embedded target, it would take 200ms per audio loop
# Also enable verbose logs
yzlite classify_audio tflite_micro_speech --latency 200 --verbose
# Classify audio on an embedded target using model: ~/workspace/my_model.tflite
# and the following classifier settings:
# - Set the averaging window to 1200ms (i.e. drop samples older than <now> minus window)
# - Set the minimum sample count to 3 (i.e. must have at least 3 samples before classifying)
# - Set the threshold to 175 (i.e. the average of the inference results within the averaging window must be at least 175 out of 255)
# - Set the suppression to 750ms (i.e. once a keyword is detected, wait 750ms before detecting more keywords)
# (see the recognizer sketch after these examples for how these settings interact)
yzlite classify_audio /home/john/my_model.tflite --device --window 1200ms --count 3 --threshold 175 --suppression 750
# Classify audio and also dump the captured raw audio and spectrograms
yzlite classify_audio tflite_micro_speech --dump-audio --dump-spectrograms
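The --window_duration, --count, --threshold, and --suppression options used above all configure the command recognizer stage of the dataflow. The following is a minimal sketch of how such a recognizer could combine them; it describes assumed behavior, not the yzlite implementation.
# Illustrative command recognizer logic (NOT the yzlite implementation).
# Keeps a rolling window of inference results and reports a class only when the
# averaged score clears the threshold, then suppresses repeat detections.
import time
from collections import deque

class CommandRecognizer:
    def __init__(self, window_ms=1200, min_count=3, threshold=175, suppression_ms=750):
        self.window_ms = window_ms
        self.min_count = min_count
        self.threshold = threshold
        self.suppression_ms = suppression_ms
        self.results = deque()       # (timestamp, per-class scores) pairs
        self.last_detection = 0.0

    def process(self, scores):
        now = time.time()
        self.results.append((now, scores))
        # Drop samples older than <now> minus the averaging window (--window_duration)
        while self.results and (now - self.results[0][0]) * 1000 > self.window_ms:
            self.results.popleft()
        if len(self.results) < self.min_count:
            return None              # not enough samples to average yet (--count)
        if (now - self.last_detection) * 1000 < self.suppression_ms:
            return None              # still inside the suppression period (--suppression)
        # Average each class's score across the window and pick the best class
        n_classes = len(scores)
        averages = [sum(r[c] for _, r in self.results) / len(self.results) for c in range(n_classes)]
        best = max(range(n_classes), key=lambda c: averages[c])
        if averages[best] >= self.threshold:   # --threshold, 0-255
            self.last_detection = now
            return best              # index of the detected keyword class
        return None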
Arguments
* model <model> One of the following: [default: None] [required]
- YZLITE model name
- Path to .tflite file
- Path to model archive file (.yzlite.zip)
NOTE: The model must have been previously trained for keyword spotting
Options
--accelerator -a <name> Name of accelerator to use while executing the audio classification ML model. [default: None]
If omitted, then use the reference kernels
NOTE: It is recommended to NOT use an accelerator if running on the PC since the HW simulator can be slow.
--device -d If provided, then run the keyword spotting model on an embedded device, otherwise use the PC's local microphone.
If this option is provided, then the device must be locally connected
--port <port> Serial COM port of a locally connected embedded device. [default: None]
This is only used with the --device option.
If omitted, then attempt to automatically determine the serial COM port
--verbose -v Enable verbose console logs
--window_duration -w <duration ms> Controls the smoothing. Drop all inference results that are older than <now> minus window_duration. [default: None]
Longer durations (in milliseconds) will give a higher confidence that the results are correct, but may miss some commands
--count -c <count> The *minimum* number of inference results to average when calculating the detection value. Set to 0 to disable averaging [default: None]
--threshold -t <threshold> Minimum averaged model output threshold for a class to be considered detected, 0-255. Higher values increase precision at the cost of recall [default: None]
--suppression -s <suppression ms> Amount of milliseconds to wait after a keyword is detected before detecting new keywords [default: None]
--latency -l <latency ms> This is the amount of time in milliseconds between processing loops [default: None]
--microphone -m <name> For non-embedded, this specifies the name of the PC microphone to use [default: None]
--volume -u <volume gain> Set the volume gain scaler (i.e. amplitude) to apply to the microphone data. If 0 or omitted, no scaler is applied [default: None]
--dump-audio -x Dump the raw microphone audio and generate a corresponding .wav file
--dump-raw-spectrograms -w Dump the raw (i.e. unquantized) generated spectrograms to .jpg images and .mp4 video
--dump-spectrograms -z Dump the quantized generated spectrograms to .jpg images and .mp4 video
--sensitivity -i FLOAT Sensitivity of the activity indicator LED. Values much less than 1.0 give higher sensitivity [default: None]
--app <path> By default, the audio_classifier app is automatically downloaded. [default: None]
This option allows for overriding with a custom built app.
Alternatively, if using the --device option, set this option to "none" to NOT program the audio_classifier app to the device.
In this case, ONLY the .tflite will be programmed and the existing audio_classifier app will be re-used.
--test Run as a unit test
--help Show this message and exit.