ref: 153e43aa447d8167f2d70bcc4ea61b82d5b95129
parent: 7fdca7f01dc1c5974b38c0ad77e0174a2c010577
author: Jean-Marc Valin <jmvalin@amazon.com>
date: Thu Oct 21 06:13:45 EDT 2021
update instructions
--- a/dnn/README.md
+++ b/dnn/README.md
@@ -65,13 +65,13 @@
1. Generate training data:
```
- ./dump_data -train input.s16 features.f32 data.u8
+ ./dump_data -train input.s16 features.f32 data.s16
```
where the first file contains 16 kHz 16-bit raw PCM audio (no header) and the other files are output files. This program makes several passes over the data with different filters to generate a large amount of training data.
1. Now that you have your files, train with:
```
- ./training_tf2/train_lpcnet.py features.f32 data.u8 model_name
+ ./training_tf2/train_lpcnet.py features.f32 data.s16 model_name
```
and it will generate an h5 file for each iteration, with model\_name as prefix. If it stops with a
"Failed to allocate RNN reserve space" message try specifying a smaller --batch-size for train\_lpcnet.py.
@@ -79,14 +79,13 @@
1. You can synthesise speech with Python and your GPU card (very slow):
```
./dump_data -test test_input.s16 test_features.f32
- ./training_tf2/test_lpcnet.py test_features.f32 test.s16
+ ./training_tf2/test_lpcnet.py lpcnet_model_name.h5 test_features.f32 test.s16
```
- Note the .h5 is hard coded in test_lpcnet.py, modify for your .h5 file.
1. Or with C on a CPU (C inference is much faster):
First extract the model files nnet_data.h and nnet_data.c
```
- ./dump_lpcnet.py lpcnet15_384_10_G16_64.h5
+ ./dump_lpcnet.py lpcnet_model_name.h5
```
and move the generated nnet_data.* files to the src/ directory.
Then you just need to rebuild the software and use lpcnet_demo as explained above.
--
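The switch above from `data.u8` to `data.s16` suggests the training data file now holds raw signed 16-bit samples rather than 8-bit values. A minimal round-trip sketch of that assumed layout (raw little-endian int16, no header; the sample values here are illustrative, not taken from dump_data):

```python
import struct

# Pack a few signed 16-bit samples the way an .s16 file is assumed
# to be laid out: raw little-endian int16 values, no header.
samples = [0, 1000, -1000, 32767, -32768]
raw = struct.pack("<%dh" % len(samples), *samples)

# Reading the file back is the inverse: two bytes per sample.
decoded = list(struct.unpack("<%dh" % (len(raw) // 2), raw))
print(decoded)  # the round trip recovers the original samples
```

Note the signed range (-32768..32767) matches 16-bit PCM, whereas the old `.u8` data was limited to 8-bit values.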