data wrangling, replace some lines and bypass other

brontosaurusrex · 2020-08-25 17:41:26

data.txt for testing (seconds are irrelevant)

some
12:45:00
other
14:45:00
data
is
here
00:09:00
23:02:00
and here as well

I would like to convert the time lines to H.MM format, and I came up with this:

#!/bin/bash

while read -r line
do

    # line by line, if it looks like time, convert, else bypass
    if (echo "$line" | grep -E '^[0-9]{2}' &> /dev/null ) ; then

        echo -n "$line > " # debug
        echo "$line" | date "+%k.%M" -f -

    else # bypass

        echo "$line < bypass"

    fi

done < data.txt

A better way, a simpler way? A magic one-line awk/sed?

Last edited by brontosaurusrex (2020-08-25 17:42:17)

ohnonot · 2020-08-25 19:55:39

Possibly:

#!/bin/bash

while read -r line
do
    if [[ "$line" == [0-2][0-9]:[0-5][0-9]:[0-5][0-9] ]]; then
        line="${line%:*}"
        echo "${line/0/ }"
    else
        echo "$line"
    fi
done < data.txt

Not tested.

brontosaurusrex · 2020-08-25 20:17:25

^ Nice. (Except 23:02 becomes 23. 2).

Last edited by brontosaurusrex (2020-08-25 20:20:44)
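The stray space comes from `${line/0/ }`, which replaces the first 0 anywhere in the string, not only a leading one. Here is a minimal sketch of one way around it, assuming the hour should lose only its leading zero and the separator should become a dot. The function name to_hmm is made up for illustration, and it uses case instead of [[ ]] only so that it also runs under plain sh:

```shell
#!/bin/sh
# Sketch: turn HH:MM:SS lines into H.MM and pass every other line through.
# to_hmm is a hypothetical helper name, not something from the thread.
to_hmm() {
    while IFS= read -r line; do
        case $line in
            [0-2][0-9]:[0-5][0-9]:[0-5][0-9])
                hours=${line%%:*}      # text before the first colon, e.g. "23"
                rest=${line#*:}        # drop the hours, e.g. "02:00"
                minutes=${rest%%:*}    # drop the seconds, e.g. "02"
                # ${hours#0} strips one leading zero only: 00 -> 0, 09 -> 9
                printf '%s.%s\n' "${hours#0}" "$minutes"
                ;;
            *)
                printf '%s\n' "$line"  # bypass
                ;;
        esac
    done
}

printf '%s\n' some 12:45:00 00:09:00 23:02:00 'and here as well' | to_hmm
```

To run it over the test file, call `to_hmm < data.txt`.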

Sector11 · 2020-08-25 20:22:42

One line:

 25 Aug 20 @ 17:18:42 ~
   $ sed 's@\(..\):\(..\):\(..\)@\1:\2@' ~/time.data.txt
some
12:45
other
14:45
data
is
here
00:09
23:02 
 25 Aug 20 @ 17:19:16 ~
   $ sed 's@\(..\):\(..\):\(..\)@\1:\2@' ~/time.data.txt >~/tdata.txt

new file ~/tdata.txt

EDIT: Not me, I'm not that smart. Google is though.
Check #4

Last edited by Sector11 (2020-08-25 20:24:50)
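The sed command above yields HH:MM, while the first post asked for H.MM. A hedged sketch of a variant follows, assuming a sed that supports -E (GNU and BSD sed both do); the wrapper name sed_hmm is hypothetical. It relies on an unmatched group expanding to nothing, so a leading 0 in the hour is dropped while a leading 1 or 2 is kept:

```shell
#!/bin/sh
# sed_hmm is a made-up wrapper name; it filters stdin to stdout.
sed_hmm() {
    # Group 1 is either a literal 0 (discarded) or [12], captured as group 2.
    # Group 3 is the second hour digit, group 4 the minutes.
    # Lines that are not exactly HH:MM:SS do not match and pass through unchanged.
    sed -E 's/^(0|([12]))([0-9]):([0-5][0-9]):[0-5][0-9]$/\2\3.\4/'
}

printf '%s\n' some 12:45:00 00:09:00 23:02:00 | sed_hmm
```

Because the pattern is anchored at both ends, a line with trailing text after the seconds is left alone rather than partly rewritten.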

johnraff · 2020-08-26 02:10:01

Borrowing part of ohnonot's regex, an alternative sed one-liner:

sed -nr 's/^([0-2][0-9]:[0-5][0-9]).*$/\1/p' <<<"$data"

That also accepts lines with extra text after the HH:MM:SS part (and deletes it), which may or may not be what you want.
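One thing to keep in mind with that one-liner: -n together with the p flag prints only the lines where the substitution succeeded, so the non-time lines are dropped rather than bypassed. Here is a small sketch of the difference, using made-up sample input and -E (the same as -r in GNU sed); the function names are hypothetical:

```shell
#!/bin/sh
# Same regex as the one-liner above, with and without -n ... p.
only_times() {
    # -n suppresses automatic printing; the p flag prints substituted lines only
    sed -nE 's/^([0-2][0-9]:[0-5][0-9]).*$/\1/p'
}
all_lines() {
    # no -n, no p: every line is printed, whether it was rewritten or not
    sed -E 's/^([0-2][0-9]:[0-5][0-9]).*$/\1/'
}

# Hypothetical sample input, including one line with trailing text
sample() { printf '%s\n' some '12:45:00 extra' other 00:09:00; }

sample | only_times
echo '--'
sample | all_lines
```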

nobody · 2020-08-26 20:30:27

Hey, the OP explicitly asked for H.MM format, NOT HH.MM or HH:MM.

I think this is the clearest code in this thread yet:

gawk -NF: '
  /^[0-9]+:[0-9]+:[0-9]+$/ {
    printf "%d.%02d\n", $1, $2;
  }
' data.txt

This one is more verbose, but it makes it easier to deal with the "bypassed" lines if you want to do something with those too, because you can just write an else branch:

gawk -NF: '
  {
    if ($0 ~ /^[0-9]+:[0-9]+:[0-9]+$/) {
      printf "%d.%02d\n", $1, $2;
    } else {
      print "Woo";
    }
  }
' data.txt

One-liners are not always best. Hard to read and slow to understand...
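If the else branch should simply pass the other lines through, as in the first post, it can be a bare print. This is a sketch under the assumption that plain awk is good enough here (the -N flag is gawk-specific and left out); awk_hmm is a hypothetical wrapper name:

```shell
#!/bin/sh
# awk_hmm is a made-up wrapper name; it filters stdin to stdout.
awk_hmm() {
    awk -F: '
      {
        if ($0 ~ /^[0-9]+:[0-9]+:[0-9]+$/) {
          # %d turns "00" into 0 and "09" into 9; %02d keeps two-digit minutes
          printf "%d.%02d\n", $1, $2
        } else {
          print   # bypass: emit the line unchanged
        }
      }
    '
}

printf '%s\n' some 12:45:00 00:09:00 23:02:00 'and here as well' | awk_hmm
```

For the test file that would be `awk_hmm < data.txt`.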

brontosaurusrex · 2020-08-27 16:03:18

@Sector11, @johnraff, @nobody, thanks, I will test everything.

"One-liners are not always best. Hard to read and slow to understand..."

^ Certainly one to remember.

Last edited by brontosaurusrex (2020-08-27 16:03:47)
