#21 2024-05-31 18:50:08

brontosaurusrex
Middle Office
Registered: 2015-09-29
Posts: 2,737

Re: Does Okular have a speech engine inside...?

Actually piper can 'stream', so playback starts even before synthesis has finished (which makes my book-reading script actually useful!).

This is working (aplay)

cat "$file" | piper -m "${voicespath}/${voices[$rand]}" --output-raw  2>/dev/null | aplay -i -r 22050 -f S16_LE -t raw - 2>/dev/null || exit

And this as well (ffmpeg/mpv)

cat "$file" | piper -m "${voicespath}/${voices[$rand]}" --output-raw  2>/dev/null | ffmpeg -f s16le -ar 22050 -ac 1 -i - -f wav - 2>/dev/null | mpv --no-resume-playback --msg-level=all=no --no-video - 2>/dev/null

And mpv only version

cat "$file" | piper -m "${voicespath}/${voices[$rand]}" --output-raw  2>/dev/null | mpv --demuxer=rawaudio --demuxer-rawaudio-format=s16le --demuxer-rawaudio-rate=22050 --audio-samplerate=22050 --demuxer-rawaudio-channels=1 --no-resume-playback --msg-level=all=no --no-video --cache=no -

mpv is interesting since it will catch space for pause or q for quit. Note: some voices marked as 'low' have a lower sample rate, so playing them at a hardcoded 22050 makes all this sound like Mickey Mouse.

Offline

#22 2024-06-03 08:09:31

johnraff
nullglob
From: Nagoya, Japan
Registered: 2015-09-09
Posts: 12,550
Website

Re: Does Okular have a speech engine inside...?

^Yes, with a longish piece of text, the raw output started talking much sooner. Thanks! Where did you get aplay's sampling rate of 22050 from btw?

I found Alan's delivery a bit slow, so your suggestion of piper's --length_scale helped, turning it down to 0.85.


...elevator in the Brain Hotel, broken down but just as well...
( a boring Japan blog (currently paused), now on Bluesky, there's also some GitStuff )

Introduction to the Bunsenlabs Boron Desktop

Online

#23 2024-06-03 08:26:32

brontosaurusrex
Middle Office
Registered: 2015-09-29
Posts: 2,737

Re: Does Okular have a speech engine inside...?

johnraff wrote:

Where did you get aplay's sampling rate of 22050 from btw?

It is in the included .onnx.json file; the number of speakers is defined there too (an aplay example is also on the piper git page). Voices marked as 'low' may have a different sampling rate.

Latest/greatest cli is now (just hardcoding 22050 sample rate)

piper -s "$speaker" -m "${voicespath}/${voices[$rand]}" --output-raw < "$file" 2>/dev/null | mpv --demuxer=rawaudio --demuxer-rawaudio-format=s16le --demuxer-rawaudio-rate=22050 --audio-samplerate=22050 --demuxer-rawaudio-channels=1 --no-resume-playback --no-video --no-input-default-bindings --input-conf="$storeroot/mpv_keybindings.conf" --msg-level=all=no - || exit 1
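Rather than hardcoding 22050, the rate could be read from the voice's .onnx.json. A minimal sketch, assuming the config contains a "sample_rate" entry (as piper voice configs normally do); voice_rate is a made-up helper name and the paths in the usage comment are placeholders:

```shell
#!/bin/bash
# Sketch only: voice_rate is a hypothetical helper, not part of piper.
# It looks for   "sample_rate": <number>   in the given voice config and
# falls back to 22050 if the file is missing or has no such entry.
voice_rate() {
    local cfg=$1 rate
    rate=$(grep -Eo '"sample_rate"[[:space:]]*:[[:space:]]*[0-9]+' "$cfg" 2>/dev/null \
        | grep -Eo '[0-9]+$' | head -n1)
    printf '%s\n' "${rate:-22050}"
}

# Usage (placeholder paths):
# rate=$(voice_rate "${voicespath}/en_GB-alan-medium.onnx.json")
# piper -m "${voicespath}/en_GB-alan-medium.onnx" --output-raw < "$file" |
#     aplay -r "$rate" -f S16_LE -t raw -
```

grep is used here only to avoid an extra dependency; a real JSON parser such as jq would be more robust if it is installed.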

Constructing the mpv command line was a slow and painful process btw, and even now there is a chance of getting the playback chain into a weird state (when pausing mpv), where it can neither continue nor abort cleanly (possibly related to running all this in WSL2). The workaround is to use ctrl+z for pause.

Anyway my lil bookreader is in kinda useful state now.

Offline

#24 2024-06-04 04:13:49

johnraff
nullglob
From: Nagoya, Japan
Registered: 2015-09-09
Posts: 12,550
Website

Re: Does Okular have a speech engine inside...?

Our use cases are diverging a bit, and your focus on external control during the readout is important for reading a book or so, but for feedback from scripts I'm more interested in fine-tuning intelligibility and impact.

When my backup shutdown script says "Would you like to backup to internal hard disk?" with the alan voice, it sounds as if it's offering something quite salacious like "Would you like to try the special service?"

Hilarious, but I might also try the other voices you mentioned, while playing with playback speed and the other piper options.

Last edited by johnraff (2024-06-04 07:52:32)



Online

#25 2024-06-04 07:27:37

brontosaurusrex
Middle Office
Registered: 2015-09-29
Posts: 2,737

Re: Does Okular have a speech engine inside...?

Maybe both hfc voices (female/male) are the most neutral/'hi-fi' I've heard so far.

Offline

#26 2024-10-23 06:25:54

johnraff
nullglob
From: Nagoya, Japan
Registered: 2015-09-09
Posts: 12,550
Website

Re: Does Okular have a speech engine inside...?

Follow-up: I'm basically happy enough with Piper's "Alan" voice and '--length_scale 0.85' not to search for alternatives, but I have to wait too long the first time piper is called - several seconds. The delay goes right down the second or third time, even with a quite different string to say, so I'm wondering if you might know any way to preload piper in memory, and keep it there?

I'll try a "startup" message but I expect piper will be pushed out of RAM when newer tasks arrive...



Online

#27 2024-10-23 08:29:49

brontosaurusrex
Middle Office
Registered: 2015-09-29
Posts: 2,737

Re: Does Okular have a speech engine inside...?

No idea at the moment, but will post solution if one presents itself ^^.
p.s. Are strings known in advance? Or how far in advance are they known? (If so, you can obviously just bake to wavs).

Offline

#28 2024-10-23 16:48:04

DeepDayze
Like sands through an hourglass...
From: In Linux Land
Registered: 2017-05-28
Posts: 1,897

Re: Does Okular have a speech engine inside...?

This all sounds like fun to play with and perhaps with time you can use something like espeak and friends to read books in text.


Real Men Use Linux

Offline

#29 2024-10-23 17:10:16

brontosaurusrex
Middle Office
Registered: 2015-09-29
Posts: 2,737

Re: Does Okular have a speech engine inside...?

@DeepDayze, Actually piper is good enough for that and I 'read' plenty of books that way.

Offline

#30 2024-10-23 18:20:19

DeepDayze
Like sands through an hourglass...
From: In Linux Land
Registered: 2017-05-28
Posts: 1,897

Re: Does Okular have a speech engine inside...?

brontosaurusrex wrote:

@DeepDayze, Actually piper is good enough for that and I 'read' plenty of books that way.

That sounds good literally.



Offline

#31 2024-10-24 02:48:57

johnraff
nullglob
From: Nagoya, Japan
Registered: 2015-09-09
Posts: 12,550
Website

Re: Does Okular have a speech engine inside...?

brontosaurusrex wrote:

Are strings known in advance? Or how far in advance are they known? (If so, you can obviously just bake to wavs).

Hey! Great idea - many of them are indeed fixed strings, like the one that annoys me the most "Would you like to backup to hard disk?" that comes just before shutdown, usually several seconds after I've already decided what to do...

And a little wav-baker script might be good in cases where a variable is known before the notification is needed.

Thanks!



Online

#32 2024-10-24 04:11:45

brontosaurusrex
Middle Office
Registered: 2015-09-29
Posts: 2,737

Re: Does Okular have a speech engine inside...?

So $hash would be the md5sum (or whatever is fashionable these days) of voice+string, and if $hash.wav doesn't exist, piper plays and generates one, else just play $hash.wav. And maybe a function to keep the wav dir at a reasonable size. Or how would you construct such a script?
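A rough sketch of that idea, not anybody's actual script. The names cache_dir, cache_key and say_cached are invented for illustration, piper is assumed to be on PATH, and $1 is assumed to be the path to a voice .onnx file. For simplicity it generates the wav first and then plays it, instead of playing while generating:

```shell
#!/bin/bash
# Sketch only: cache_dir, cache_key and say_cached are hypothetical names.
cache_dir="${XDG_CACHE_HOME:-$HOME/.cache}/piper/wavs"

# Hash of voice+string; the newline separator keeps "ab"+"c" distinct from "a"+"bc".
cache_key() {
    printf '%s\n%s' "$1" "$2" | md5sum | cut -d' ' -f1
}

# $1 = path to voice model, $2 = text to say
say_cached() {
    local voice=$1 text=$2 wav
    wav="$cache_dir/$(cache_key "$voice" "$text").wav"
    if [[ ! -r $wav ]]; then
        mkdir -p "$cache_dir"
        # Generate the wav; remove a partial file if piper fails.
        piper --model "$voice" --output_file "$wav" <<<"$text" >/dev/null \
            || { rm -f "$wav"; return 1; }
    fi
    aplay --quiet "$wav"
}
```

Including the voice in the hash means switching voices never replays a stale recording made with the old one.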

Offline

#33 2024-10-24 07:01:14

johnraff
nullglob
From: Nagoya, Japan
Registered: 2015-09-09
Posts: 12,550
Website

Re: Does Okular have a speech engine inside...?

^depending on how complicated it turns out to be, I might just incorporate it in my "say" script, which calls whatever synthesiser I'm using (currently piper) to say the string it's passed, plus some bash queueing for multiple calls.

Checking via $hash would add some flexibility yes. I was just thinking of an associative array holding a fixed list of strings I have cached + path to wav file, and carrying on to the synthesiser if the called string is not there. I would add strings to the array as I ran into the need for them. But haven't touched this at all yet.



Online

#34 2024-10-24 07:56:17

johnraff
nullglob
From: Nagoya, Japan
Registered: 2015-09-09
Posts: 12,550
Website

Re: Does Okular have a speech engine inside...?

johnraff wrote:

I was just thinking of an associative array holding a fixed list of strings I have cached + path to wav file...

Of course that array would have to be stored somewhere which would be possible but maybe unwieldy. I'll try your hash idea.
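For comparison, a sketch of how such an array might be stored: a tab-separated index file that is read back into a bash associative array on startup. All names here (index_file, cached, cache_load, cache_add) are invented, and it assumes the cached strings are single-line and non-empty:

```shell
#!/bin/bash
# Sketch only: index_file, cached, cache_load and cache_add are hypothetical.
index_file="${XDG_CACHE_HOME:-$HOME/.cache}/piper/index.tsv"
declare -A cached=()

# Read "wav<TAB>text" lines from the index into the array.
cache_load() {
    local wav text
    [[ -r $index_file ]] || return 0
    while IFS=$'\t' read -r wav text; do
        [[ -n $text ]] && cached[$text]=$wav
    done < "$index_file"
    return 0
}

# $1 = text, $2 = wav path: append to the index and to the in-memory array.
cache_add() {
    mkdir -p "$(dirname "$index_file")"
    printf '%s\t%s\n' "$2" "$1" >> "$index_file"
    cached[$1]=$2
}

# Usage:
# cache_load
# wav=${cached["Would you like to backup to hard disk?"]:-}
# [[ -n $wav ]] && aplay --quiet "$wav"
```

The index doubles as a human-readable list of what has been cached, at the price of one more file to keep in sync with the wav directory.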



Online

#35 2024-10-24 08:29:26

johnraff
nullglob
From: Nagoya, Japan
Registered: 2015-09-09
Posts: 12,550
Website

Re: Does Okular have a speech engine inside...?

Rough sketch.
Run 'say --record "some string"' to put a hash-labeled wav file in the cache.
Next time you run 'say "some string"', if it finds a wav file named with the hash of that string, it will play that instead of calling piper.
I don't want to automatically cache new strings in case they were dynamically generated.

Main snag I can see so far is that admin has no easy way of telling which strings have already been cached.

To make the hash I found crc32 was already on my system (came with libarchive-zip-perl) and makes nice short 8-character hashes.
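If crc32 isn't installed, something similar can be had from cksum, which is in coreutils. This is only a sketch with a made-up helper name (short_hash); note that cksum uses a different CRC variant, so its values will not match those from the crc32 tool:

```shell
#!/bin/bash
# Sketch only: short_hash is a hypothetical helper.
# cksum prints "<crc in decimal> <byte count>"; keep the CRC and
# format it as 8 hex digits to get a short file name.
short_hash() {
    local sum
    sum=$(printf '%s' "$*" | cksum | cut -d' ' -f1)
    printf '%08x\n' "$sum"
}

# Usage:
# hash=$(short_hash "Would you like to backup to hard disk?")
```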

Provisional code, with old comments etc etc:

#!/bin/bash
#say (customized espeak, or other synthesiser)

pdir="$HOME/Downloads/executables/piper/"
wavdir="$HOME/.cache/piper/wavs"
voice=en_GB-alan-medium.onnx

mkdir -p "$wavdir"

[[ $1 = '--record' ]] && {
    shift
    hash=$( crc32 <(printf '%s' "$*") )
    "$pdir"/piper <<<"$*" --length_scale 0.85 --quiet --model "$pdir"/voices/"$voice" --output_file "$wavdir/$hash.wav" >/dev/null
    exit
}

# https://blog.skbali.com/2019/03/queue-up-multiple-instances-of-a-shell-script/
# https://stackoverflow.com/a/17030546
# https://jdimpson.livejournal.com/5685.html
#mkdir -p "$HOME/tmp/say"
lock="/tmp/say.lock"
exec {file_desc}>"$lock"
flock --timeout 60 "$file_desc" || exit 1

#hash espeak-ng || {
#    echo "$0: needs espeak" >&2
#    exit 1
#}
#hash flite || {
#    echo "$0: needs flite" >&2
#    exit 1
#}

# also try festival some day?
# echo "Hi, Welcome to Circuit Digest Tutorial" | festival --tts

# also
# ln -s /dev/stdout ~/.cache/pico2wave/pico.wav
# pico2wave --wave=/home/john/.cache/pico2wave/pico.wav "everything has Transpired according to my design." | aplay


if [[ -n $1 ]]
then
    hash=$( crc32 <(printf '%s' "$*") )
    if [[ -r "$wavdir/$hash.wav" ]]
    then
        aplay --quiet "$wavdir/$hash.wav"
        exit
    fi
fi

[[ -x "$pdir"/piper ]] || { echo "${0}: needs a piper executable." >&2 ; exit 1;}

if [[ -z $1 ]]
then
#    "$pdir"/piper --quiet --model "$pdir"/voices/"$voice" --output_file - | aplay > /dev/null 2>&1
    "$pdir"/piper --length_scale 0.85 --quiet --model "$pdir"/voices/"$voice" --output-raw  2>/dev/null | aplay -r 22050 -f S16_LE -t raw - 2>/dev/null
else
#    "$pdir"/piper <<<"$*" --quiet --model "$pdir"/voices/"$voice" --output_file - | aplay > /dev/null 2>&1
    "$pdir"/piper <<<"$*" --length_scale 0.85 --quiet --model "$pdir"/voices/"$voice" --output-raw  2>/dev/null | aplay -r 22050 -f S16_LE -t raw - 2>/dev/null
fi
#if [[ -z $1 ]]
#then
#    espeak-ng -k10 -s150 # --stdout | aplay > /dev/null 2>&1
#else
#    espeak-ng -k10 -s150 "$*" # --stdout | aplay > /dev/null 2>&1 # "( )$*" is a hack to attempt to workaround truncated start of word
#fi
# truncation hack not needed if snd_hda_intel power saving is turned off: https://major.io/p/stop-audio-pops-on-intel-hd-audio/

#if [[ -z $1 ]]
#then
#    pico2wave --wave=/home/john/.cache/pico2wave/pico.wav | aplay > /dev/null 2>&1
#else
#    pico2wave --wave=/home/john/.cache/pico2wave/pico.wav "$1" | aplay > /dev/null 2>&1
#fi

#if [[ -z $1 ]]
#then
#    flite -voice slt
#else
#    flite -voice slt -t "$1"
#fi

exit


Online

#36 2024-10-24 11:38:15

brontosaurusrex
Middle Office
Registered: 2015-09-29
Posts: 2,737

Re: Does Okular have a speech engine inside...?

Main snag I can see so far is that admin has no easy way of telling which strings have already been cached.

I'd probably try something like generating $hash.txt next to $hash.wav, with data like this inside:

hash   First 5 or 10 words of the string...

then you can cat/grep them when needed. But that adds extra complexity.
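A sketch of what that could look like, with invented helper names (note_cached, list_cached) and wavdir assumed to be the same cache directory as in the 'say' script:

```shell
#!/bin/bash
# Sketch only: note_cached and list_cached are hypothetical helpers.
wavdir="${wavdir:-$HOME/.cache/piper/wavs}"

# $1 = hash, remaining args = the cached string.
# Writes "<hash><TAB><first 10 words>" to $wavdir/<hash>.txt
note_cached() {
    local hash=$1
    shift
    mkdir -p "$wavdir"
    printf '%s\t%s\n' "$hash" "$(printf '%s' "$*" | cut -d' ' -f1-10)" > "$wavdir/$hash.txt"
}

# List all cached strings, or only those matching an optional pattern.
list_cached() {
    cat "$wavdir"/*.txt 2>/dev/null | grep -i -- "${1:-}"
}

# Usage:
# note_cached "$hash" "$*"      # right after recording $hash.wav
# list_cached backup            # which backup-related strings are cached?
```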

Another way to keep wavdir in check would be to see if files were used in last 14 days or similar (not exactly sure how to write that) and delete the unused ones.

Also you can use lossy audio:

chatGPT:

For a 45 kbps Opus file to reach 1 GB in size on disk, the file would need to be approximately 49 hours, 22 minutes, and 58 seconds long.

opus at 45kbps should be good enough for mono/voice. (opusenc/opusdec should be in repos).
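The figure above is easy to check with shell arithmetic. A small sketch, taking 1 GB as 10^9 bytes and treating the bitrate as constant; fits_duration is a made-up helper name, and the opusenc/opusdec lines in the comments are untested suggestions:

```shell
#!/bin/bash
# Sketch only: fits_duration is a hypothetical helper.
# $1 = size in bytes, $2 = bitrate in bits per second.
fits_duration() {
    local secs=$(( $1 * 8 / $2 ))
    printf '%dh %dm %ds\n' $(( secs / 3600 )) $(( secs % 3600 / 60 )) $(( secs % 60 ))
}

fits_duration 1000000000 45000   # prints 49h 22m 57s (seconds are truncated)

# Encoding a cached wav might then look like this (untested here):
# opusenc --bitrate 45 "$hash.wav" "$hash.opus"
# opusdec "$hash.opus" - | aplay --quiet
```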

p.s. this one is 40.5 kbps https://brontosaurusrex.github.io/audio/neumann.opus

Offline

#37 2024-10-24 14:19:49

brontosaurusrex
Middle Office
Registered: 2015-09-29
Posts: 2,737

Re: Does Okular have a speech engine inside...?

p.s. Cloud ai says this would work for deleting files that have not been accessed for more than 14 days, completely UNTESTED by me:

#!/bin/bash

# Specify the directory to clean up (default to current directory)
TARGET_DIR="${1:-.}"

# First show what will be deleted
echo "The following files have not been accessed for more than 14 days:"
find "$TARGET_DIR" -type f -atime +14 -print

# Ask for confirmation
read -r -p "Do you want to delete these files? (y/N): " confirm

if [ "$confirm" = "y" ] || [ "$confirm" = "Y" ]; then
    # Delete the files
    find "$TARGET_DIR" -type f -atime +14 -delete
    echo "Files deleted successfully."
else
    echo "Operation cancelled."
fi

p.s. Access time may be unreliable, as it depends on the mount options (with noatime, for example, it is not updated on reads).

mount | grep ' / '

To avoid/overcome that use touch $hash.wav before/after play $hash.wav, since touch by default should update mtime and atime as well.
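Put together, the touch trick and the cleanup could be sketched like this. The function names (play_cached, prune_cache) are invented, and it prunes on mtime rather than atime precisely because touch keeps mtime current regardless of mount options:

```shell
#!/bin/bash
# Sketch only: play_cached and prune_cache are hypothetical helpers.

# $1 = path to a cached wav. touch marks it as "used just now".
play_cached() {
    touch -- "$1"
    aplay --quiet "$1"
}

# $1 = cache dir, $2 = max age in days (default 14).
# Deletes wavs whose mtime is older than that many days.
prune_cache() {
    find "$1" -type f -name '*.wav' -mtime +"${2:-14}" -delete
}

# Usage:
# play_cached "$wavdir/$hash.wav"
# prune_cache "$wavdir" 14
```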

Offline

#38 2024-10-25 05:04:02

johnraff
nullglob
From: Nagoya, Japan
Registered: 2015-09-09
Posts: 12,550
Website

Re: Does Okular have a speech engine inside...?

brontosaurusrex wrote:

Main snag I can see so far is that admin has no easy way of telling which strings have already been cached.

I'd probably try something like generating $hash.txt next to $hash.wav with a data inside:

hash   First 5 or 10 words of the string...

Not a bad idea. Either individual $hash.txt files or one hashes.txt with all of them. Separate files are easier to edit or remove, one list maybe easier to grep or read manually.

Another way to keep wavdir in check...

In my own use case - short strings which are added manually - dir size is not likely to be a problem.

But for you, sure...

I might think about some way of auto-adding a hashed wav every time 'say' is called, but that would need a way to exclude dynamically generated strings. A popup "add this string?" every time a new one arrived would likely get very annoying. Or else add them all and use your method of weeding out strings that haven't been used in the last month or whatever.

For now it probably suits my purpose well enough to just add selected strings manually with '--record'.

In fact it's only in a few cases that the delay before speech is output is annoying.



Online
