Does Okular have a speech engine inside...?

brontosaurusrex · 2024-05-31 18:50:08

Actually piper can 'stream', so playback starts even before synthesis is finished (which makes my book-reading script actually useful!).

This is working (aplay)

cat "$file" | piper -m "${voicespath}/${voices[$rand]}" --output-raw  2>/dev/null | aplay -i -r 22050 -f S16_LE -t raw - 2>/dev/null || exit

And this as well (ffmpeg/mpv)

cat "$file" | piper -m "${voicespath}/${voices[$rand]}" --output-raw  2>/dev/null | ffmpeg -f s16le -ar 22050 -ac 1 -i - -f wav - 2>/dev/null | mpv --no-resume-playback --msg-level=all=no --no-video - 2>/dev/null

And mpv only version

cat "$file" | piper -m "${voicespath}/${voices[$rand]}" --output-raw  2>/dev/null | mpv --demuxer=rawaudio --demuxer-rawaudio-format=s16le --demuxer-rawaudio-rate=22050 --audio-samplerate=22050 --demuxer-rawaudio-channels=1 --no-resume-playback --msg-level=all=no --no-video --cache=no -

Mpv is interesting since it will catch space for pause or q for quit. Note: Some voices marked as 'low' will have a lower sample rate, so playing them back at 22050 makes all this sound like Mickey Mouse.

johnraff · 2024-06-03 08:09:31

^Yes, with a longish piece of text, the raw output started talking much sooner. Thanks! Where did you get aplay's sampling rate of 22050 from btw?

I found Alan's delivery a bit slow, so your suggestion of piper's --length_scale helped, turning it down to 0.85.

brontosaurusrex · 2024-06-03 08:26:32

johnraff wrote:

Where did you get aplay's sampling rate of 22050 from btw?

It is in the included onnx.json, where the number of speakers is also defined (the aplay example is also on the piper git page). Voices marked as 'low' may have a different sampling rate.
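A minimal sketch of reading the rate instead of hardcoding it. It assumes the config file sits next to the model as "<model>.onnx.json" and holds a numeric "sample_rate" field; voice_rate is a made-up helper name, and the sed pattern is a crude stand-in for a real JSON parser such as jq.

```shell
#!/bin/bash
# voice_rate MODEL: print the sample rate found in MODEL.json,
# falling back to 22050 if the file or the field is missing.
# (Hypothetical helper; assumes a "sample_rate": <number> entry.)
voice_rate() {
    local cfg="$1.json" rate
    rate=$(sed -n 's/.*"sample_rate"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$cfg" 2>/dev/null | head -n 1)
    printf '%s\n' "${rate:-22050}"
}

# Possible use with the aplay pipeline from this thread:
#   model="${voicespath}/${voices[$rand]}"
#   piper -m "$model" --output-raw < "$file" | aplay -r "$(voice_rate "$model")" -f S16_LE -t raw -
```

With this, a 'low' voice should play at its own rate rather than at 22050.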

Latest/greatest cli is now (just hardcoding 22050 sample rate)

piper -s "$speaker" -m "${voicespath}/${voices[$rand]}" --output-raw < "$file" 2>/dev/null | mpv --demuxer=rawaudio --demuxer-rawaudio-format=s16le --demuxer-rawaudio-rate=22050 --audio-samplerate=22050 --demuxer-rawaudio-channels=1 --no-resume-playback --no-video --no-input-default-bindings --input-conf="$storeroot/mpv_keybindings.conf" --msg-level=all=no - || exit 1

Constructing the mpv command line was a slow and painful process btw, and even now there is a chance of getting the playback chain into a weird state (when pausing mpv), where it can neither continue nor abort cleanly (possibly related to running all this in wsl2). The workaround is to use ctrl+z for pause.

Anyway my lil bookreader is in kinda useful state now.

johnraff · 2024-06-04 04:13:49

Our use cases are diverging a bit, and your focus on external control during the readout is important for reading a book or so, but for feedback from scripts I'm more interested in fine-tuning intelligibility and impact.

When my backup shutdown script says "Would you like to backup to internal hard disk?" with the alan voice, it sounds as if it's offering something quite salacious like "Would you like to try the special service?"

Hilarious, but I might also try the other voices you mentioned, while playing with playback speed and the other piper options.

Last edited by johnraff (2024-06-04 07:52:32)

brontosaurusrex · 2024-06-04 07:27:37

Maybe both hfc voices (female/male) are the most neutral/'hi-fi' I heard so far.

johnraff · 2024-10-23 06:25:54

Follow-up: I'm basically happy enough with Piper's "Alan" voice and '--length_scale 0.85' not to search for alternatives, but I have to wait too long the first time piper is called - several seconds. The delay goes right down the second or third time, even with a quite different string to say, so I'm wondering if you might know any way to preload piper in memory, and keep it there?

I'll try a "startup" message but I expect piper will be pushed out of RAM when newer tasks arrive...

brontosaurusrex · 2024-10-23 08:29:49

No idea at the moment, but will post solution if one presents itself ^^.
p.s. Are strings known in advance? Or how far in advance are they known? (If so, you can obviously just bake to wavs).

DeepDayze · 2024-10-23 16:48:04

This all sounds like fun to play with and perhaps with time you can use something like espeak and friends to read books in text.

brontosaurusrex · 2024-10-23 17:10:16

@DeepDayze, Actually piper is good enough for that and I 'read' plenty of books that way.

DeepDayze · 2024-10-23 18:20:19

brontosaurusrex wrote:

@DeepDayze, Actually piper is good enough for that and I 'read' plenty of books that way.

That sounds good literally.

johnraff · 2024-10-24 02:48:57

brontosaurusrex wrote:

Are strings known in advance? Or how far in advance are they known? (If so, you can obviously just bake to wavs).

Hey! Great idea - many of them are indeed fixed strings, like the one that annoys me the most "Would you like to backup to hard disk?" that comes just before shutdown, usually several seconds after I've already decided what to do...

And a little wav-baker script might be good in cases where a variable is known before the notification is needed.

Thanks!

brontosaurusrex · 2024-10-24 04:11:45

So $hash would be the md5sum (or whatever is fashionable these days) of voice+string, and if $hash.wav doesn't exist, piper plays and generates one, else just play $hash.wav. And maybe a function to keep the wav dir at a reasonable size. Or how would you construct such a script?
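One way the idea above might look, as a rough sketch only. The function names (cache_key, synth_to_wav, say_cached), the cache path and the default voice are all assumptions for illustration, and it assumes piper is on PATH with $voicespath pointing at the voices. For simplicity it generates the wav first and then plays it, rather than doing both at once.

```shell
#!/bin/bash
# Sketch of a hash-keyed wav cache. Paths and voice name are assumptions.
wavdir="${wavdir:-$HOME/.cache/piper/wavs}"
voice="${voice:-en_GB-alan-medium.onnx}"

# Cache key: md5 of voice name + string, so each voice gets its own wav.
cache_key() {    # usage: cache_key VOICE "string"
    printf '%s\n%s' "$1" "$2" | md5sum | cut -d' ' -f1
}

# Synthesise a string to a wav file (assumes piper on PATH, $voicespath set).
synth_to_wav() {    # usage: synth_to_wav "string" OUT.wav
    piper -m "${voicespath}/${voice}" --output_file "$2" <<<"$1" >/dev/null 2>&1
}

# Play the cached wav for a string, generating it on the first call.
say_cached() {    # usage: say_cached some string
    local text="$*" wav
    wav="$wavdir/$(cache_key "$voice" "$text").wav"
    mkdir -p "$wavdir"
    if [[ ! -r $wav ]]; then
        # Write under a temporary name so an interrupted run leaves no broken wav.
        if ! synth_to_wav "$text" "$wav.part"; then
            rm -f "$wav.part"
            return 1
        fi
        mv "$wav.part" "$wav"
    fi
    aplay --quiet "$wav"
}
```

The first call to say_cached "some string" pays the synthesis delay; later calls with the same voice and string only run aplay.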

johnraff · 2024-10-24 07:01:14

^depending on how complicated it turns out to be, I might just incorporate it in my "say" script, which calls whatever synthesiser I'm using (currently piper) to say the string it's passed, plus some bash queueing for multiple calls.

Checking via $hash would add some flexibility yes. I was just thinking of an associative array holding a fixed list of strings I have cached + path to wav file, and carrying on to the synthesiser if the called string is not there. I would add strings to the array as I ran into the need for them. But haven't touched this at all yet.

johnraff · 2024-10-24 07:56:17

johnraff wrote:

I was just thinking of an associative array holding a fixed list of strings I have cached + path to wav file...

Of course that array would have to be stored somewhere which would be possible but maybe unwieldy. I'll try your hash idea.

johnraff · 2024-10-24 08:29:26

Rough sketch.
Run 'say --record "some string"' to put a hash-labeled wav file in the cache.
Next time you run 'say "some string"' if it finds a wav file named with the hash it gets from the string it will use that instead of piper.
I don't want to automatically cache new strings in case they were dynamically generated.

Main snag I can see so far is that admin has no easy way of telling which strings have already been cached.

To make the hash I found crc32 was already on my system (came with libarchive-zip-perl) and makes nice short 8-character hashes.

Provisional code, with old comments etc etc:

#!/bin/bash
#say (customized espeak, or other synthesiser)

pdir="$HOME/Downloads/executables/piper/"
wavdir="$HOME/.cache/piper/wavs"
voice=en_GB-alan-medium.onnx

mkdir -p "$wavdir"

[[ $1 = '--record' ]] && {
    shift
    hash=$( crc32 <(printf '%s' "$*") )
    "$pdir"/piper <<<"$*" --length_scale 0.85 --quiet --model "$pdir"/voices/"$voice" --output_file "$wavdir/$hash.wav" >/dev/null
    exit
}

# https://blog.skbali.com/2019/03/queue-up-multiple-instances-of-a-shell-script/
# https://stackoverflow.com/a/17030546
# https://jdimpson.livejournal.com/5685.html
#mkdir -p "$HOME/tmp/say"
lock="/tmp/say.lock"
exec {file_desc}>"$lock"
flock --timeout 60 "$file_desc" || exit 1

#hash espeak-ng || {
#    echo "$0: needs espeak" >&2
#    exit 1
#}
#hash flite || {
#    echo "$0: needs flite" >&2
#    exit 1
#}

# also try festival some day?
# echo "Hi, Welcome to Circuit Digest Tutorial" | festival --tts

# also
# ln -s /dev/stdout ~/.cache/pico2wave/pico.wav
# pico2wave --wave=/home/john/.cache/pico2wave/pico.wav "everything has Transpired according to my design." | aplay


if [[ -n $1 ]]
then
    hash=$( crc32 <(printf '%s' "$*") )
    if [[ -r "$wavdir/$hash.wav" ]]
    then
        aplay --quiet "$wavdir/$hash.wav"
        exit
    fi
fi

[[ -x "$pdir"/piper ]] || { echo "${0}: needs a piper executable." >&2 ; exit 1;}

if [[ -z $1 ]]
then
#    "$pdir"/piper --quiet --model "$pdir"/voices/"$voice" --output_file - | aplay > /dev/null 2>&1
    "$pdir"/piper --length_scale 0.85 --quiet --model "$pdir"/voices/"$voice" --output-raw  2>/dev/null | aplay -r 22050 -f S16_LE -t raw - 2>/dev/null
else
#    "$pdir"/piper <<<"$*" --quiet --model "$pdir"/voices/"$voice" --output_file - | aplay > /dev/null 2>&1
    "$pdir"/piper <<<"$*" --length_scale 0.85 --quiet --model "$pdir"/voices/"$voice" --output-raw  2>/dev/null | aplay -r 22050 -f S16_LE -t raw - 2>/dev/null
fi
#if [[ -z $1 ]]
#then
#    espeak-ng -k10 -s150 # --stdout | aplay > /dev/null 2>&1
#else
#    espeak-ng -k10 -s150 "$*" # --stdout | aplay > /dev/null 2>&1 # "( )$*" is a hack to attempt to workaround truncated start of word
#fi
# truncation hack not needed if snd_hda_intel power saving is turned off: https://major.io/p/stop-audio-pops-on-intel-hd-audio/

#if [[ -z $1 ]]
#then
#    pico2wave --wave=/home/john/.cache/pico2wave/pico.wav | aplay > /dev/null 2>&1
#else
#    pico2wave --wave=/home/john/.cache/pico2wave/pico.wav "$1" | aplay > /dev/null 2>&1
#fi

#if [[ -z $1 ]]
#then
#    flite -voice slt
#else
#    flite -voice slt -t "$1"
#fi

exit

brontosaurusrex · 2024-10-24 11:38:15

Main snag I can see so far is that admin has no easy way of telling which strings have already been cached.

I'd probably try something like generating $hash.txt next to $hash.wav with data like this inside:

hash   First 5 or 10 words of the string...

then you can cat/grep them when needed. But that adds extra complexity.
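A possible shape for that, purely as a sketch: index_entry and list_cached are invented names, the cache path is an assumption, and only the first line of a multi-line string ends up in the index.

```shell
#!/bin/bash
# Sketch of a per-wav text index. Cache path is an assumption.
wavdir="${wavdir:-$HOME/.cache/piper/wavs}"

# Write $hash.txt holding the hash, a tab, and the first 10 words of the string.
index_entry() {    # usage: index_entry HASH "full string"
    local hash=$1 words
    read -r -a words <<<"$2"
    printf '%s\t%s\n' "$hash" "${words[*]:0:10}" > "$wavdir/$hash.txt"
}

# Show all index entries, or only those matching a (case-insensitive) pattern.
list_cached() {    # usage: list_cached [pattern]
    cat "$wavdir"/*.txt 2>/dev/null | grep -i -- "${1:-}"
}
```

Calling index_entry from the '--record' branch would keep the index in step with the wavs, and list_cached backup would then show which backup-related strings are already cached.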

Another way to keep wavdir in check would be to see if files were used in last 14 days or similar (not exactly sure how to write that) and delete the unused ones.

Also you can use lossy audio:

chatGPT:

For a 45 kbps Opus file to reach 1 GB in size on disk, the file would need to be approximately 49 hours, 22 minutes, and 58 seconds long.

opus at 45kbps should be good enough for mono/voice. (opusenc/opusdec should be in repos).
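The quoted figure can be checked with shell integer arithmetic. This sketch assumes 1 GB means 10^9 bytes and a constant 45000 bits per second, ignoring container overhead; audio_seconds is just an illustrative name.

```shell
#!/bin/bash
# Seconds of audio that fit into BYTES at a constant bitrate (integer division).
audio_seconds() {    # usage: audio_seconds BYTES BITS_PER_SECOND
    echo $(( $1 * 8 / $2 ))
}

secs=$(audio_seconds 1000000000 45000)
printf '%dh %dm %ds\n' $(( secs / 3600 )) $(( secs % 3600 / 60 )) $(( secs % 60 ))
# prints: 49h 22m 57s

# Encoding one cached wav at that bitrate would be something like:
#   opusenc --bitrate 45 "$hash.wav" "$hash.opus"
```

The result is one second short of the quoted 58 only because integer division truncates 57.78.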

p.s. this one is 40.5 kbps https://brontosaurusrex.github.io/audio/neumann.opus

brontosaurusrex · 2024-10-24 14:19:49

p.s. Cloud ai says this would work for deleting files older than 14 days, completely UNTESTED by me:

#!/bin/bash

# Specify the directory to clean up (default to current directory)
TARGET_DIR="${1:-.}"

# First show what will be deleted
echo "The following files have not been accessed for more than 14 days:"
find "$TARGET_DIR" -type f -atime +14 -print

# Ask for confirmation
read -r -p "Do you want to delete these files? (y/N): " confirm

if [ "$confirm" = "y" ] || [ "$confirm" = "Y" ]; then
    # Delete the files
    find "$TARGET_DIR" -type f -atime +14 -delete
    echo "Files deleted successfully."
else
    echo "Operation cancelled."
fi

p.s. Access time may be unreliable and it depends on the mount options.

mount | grep ' / '

To avoid/overcome that use touch $hash.wav before/after play $hash.wav, since touch by default should update mtime and atime as well.
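A sketch of that touch-then-prune approach, equally untested against a real cache. play_cached and prune_cache are hypothetical names and the cache path is an assumption. Since touch refreshes mtime on every play, the cleanup can test mtime and no longer depends on how atime is handled by the mount options.

```shell
#!/bin/bash
# Sketch: refresh timestamps on play, prune by mtime. Cache path is an assumption.
wavdir="${wavdir:-$HOME/.cache/piper/wavs}"

# Play a cached wav and mark it as recently used.
play_cached() {    # usage: play_cached HASH
    local wav="$wavdir/$1.wav"
    [[ -r $wav ]] || return 1
    touch "$wav"
    aplay --quiet "$wav"
}

# Delete wavs not played for more than N days (default 14).
# Note that find counts whole 24-hour periods, so +14 means 15 days or older.
prune_cache() {    # usage: prune_cache [DAYS]
    find "$wavdir" -type f -name '*.wav' -mtime +"${1:-14}" -delete
}
```

Swapping -delete for -print first shows what would be removed.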

johnraff · 2024-10-25 05:04:02

brontosaurusrex wrote:

Main snag I can see so far is that admin has no easy way of telling which strings have already been cached.
I'd probably try something like generating $hash.txt next to $hash.wav with a data inside:
hash   First 5 or 10 words of the string...

Not a bad idea. Either individual $hash.txt files or one hashes.txt with all of them. Separate files are easier to edit or remove, one list may be easier to grep or read manually.

Another way to keep wavdir in check...

In my own use case - short strings which are added manually - dir size is not likely to be a problem.

But for you, sure...

I might think about some way of auto-adding a hashed wav every time 'say' is called, but that would need a way to exclude dynamically generated strings. A popup "add this string?" every time a new one arrived would likely get very annoying. Or else add them all and use your method of weeding out strings that haven't been used in the last month or whatever.

For now it probably suits my purpose well enough to just add selected strings manually with '--record'.

In fact it's only in a few cases that the delay before speech is output is annoying.
