
#1 2015-10-29 12:24:36

shot-in-the-head
Member
Registered: 2015-10-28
Posts: 61

what's an easy way to remove pictures from website just leaving text

I know you can go to printfriendly.com, but is there a way to do it locally, preferably with a CLI program/command?
Basically, sometimes I want just the text from a web page for printing, putting into a PDF, etc.

Edit: just found Printliminator, which does what I want:
https://css-tricks.github.io/The-Printliminator/

Last edited by shot-in-the-head (2015-10-29 12:39:30)

Offline

#2 2015-10-29 12:50:52

vasa1
Member
Registered: 2015-09-29
Posts: 187

Re: what's an easy way to remove pictures from website just leaving text

Try

w3m -dump url > ~/Destination/filename

You'll need to install w3m.


Using the Openbox (3.5.2) session of Lubuntu 14.04 LTS but very interested in BL :)

Offline

#3 2015-10-29 13:16:59

shot-in-the-head
Member
Registered: 2015-10-28
Posts: 61

Re: what's an easy way to remove pictures from website just leaving text

Tried w3m, but it was too brutal and didn't produce anything readable :) Printliminator and another JavaScript bookmarklet I found work fine, though.

Offline

#4 2015-10-29 15:22:13

twoion
ほやほや
Registered: 2015-08-10
Posts: 2,934

Re: what's an easy way to remove pictures from website just leaving text

Depending on your use case, you can write a tailored script using python-beautifulsoup. Otherwise, html2text does nice things:

html2text --ignore-links --ignore-images --no-automatic-links --ignore-emphasis $file

Play with the options (see html2text -h).
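
For the "tailored" route, here is a minimal sketch of the idea. Note that it swaps in Python's standard-library html.parser instead of BeautifulSoup so it runs without installing anything; the names TextOnly and html_to_text are made up for illustration. It keeps visible text, drops script/style content, and ignores images simply because tags like <img> carry no text data.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collect text nodes; skip the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []      # text chunks in document order
        self.skip_depth = 0  # > 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        # <img> and friends land here too, but contribute no text.
        if tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self.skip_depth:
            self.parts.append(text)


def html_to_text(html):
    """Return the text chunks of an HTML string, one per line."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


if __name__ == "__main__":
    import sys
    # Usage (assumed file name): python3 textonly.py < page.html > page.txt
    sys.stdout.write(html_to_text(sys.stdin.read()) + "\n")
```

This is deliberately crude: every text node ends up on its own line and there is no paragraph reflow, so for general pages html2text will likely give nicer output. The point is only that a site-specific filter can be a few dozen lines.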


Per aspera ad astra.

Online

#5 2015-10-29 15:39:30

shot-in-the-head
Member
Registered: 2015-10-28
Posts: 61

Re: what's an easy way to remove pictures from website just leaving text

Yep, html2text does a good job on the sites I've tried so far :)

Offline

#6 2015-10-30 05:13:41

seppalta
Member
Registered: 2015-10-02
Posts: 41
Website

Re: what's an easy way to remove pictures from website just leaving text

Here is a handy table that shows several file changing commands:  Openbox Guide - Splitters, Joiners, Converters, etc
You can extract text with either "html2text  xx.html | tee ~/xx.tex", as has already been identified, or "vilistextum -rcn xx.html xx.tex"

Offline

#7 2015-10-30 07:10:36

ohnonot
...again
Registered: 2015-09-29
Posts: 4,877
Website

Re: what's an easy way to remove pictures from website just leaving text

If you just want plain text, it's enough to highlight the desired text and copy-paste it (Ctrl+C, Ctrl+V) into a Geany window:

seppalta
    Member
    Registered: 2015-10-02
    Posts: 3
    Email PM Website

Re: what's an easy way to remove pictures from website just leaving text

Here is a handy table that shows several file changing commands:  Openbox Guide - Splitters, Joiners, Converters, etc
You can extract text with either "html2text  xx.html | tee ~/xx.tex", as has already been identified, or "vilistextum -rcn xx.html xx.tex"

Offline

    Report Quote 

Pages: 1

Post reply

BL quote proposals to this thread please.
how to ask smart questions | my repos / my repos | my blog
---
Thank you for posting direct image links!

Offline

#8 2015-10-30 09:04:02

shot-in-the-head
Member
Registered: 2015-10-28
Posts: 61

Re: what's an easy way to remove pictures from website just leaving text

Thanks for all the interesting and helpful suggestions. I'm happy now :) But don't let me stop anyone from posting more ideas that others can read and learn from.

Offline
