
#1 2020-07-16 20:41:22

kozimodo
Member
Registered: 2015-10-04
Posts: 67

Python Craigslist scraper

It's taken a while as I've been really busy but here is the scraper I mentioned in this thread.

It assumes you are running it on a mail server that you have access to so that signing in to the SMTP server is not necessary.  It keeps track of the 200 most recent posts so that it does not keep sending the same posts over and over. (If you live in a big urban area, you may need to increase this.)

The basic steps to get it running are:

  1. If Python 3 and MechanicalSoup are not already installed, install them:

    sudo apt install python3
    sudo apt install python3-mechanicalsoup
  2. If "mechanicalsoup" is not packaged in your distribution (it is in Buster), the second command above will fail, so install it with pip instead:

    sudo apt install python3-pip
    pip3 install mechanicalsoup
  3. Create a "craigslist-scraper" directory in your home directory

  4. Copy https://pastebin.com/Vw0b9hHw into the "craigslist-scraper" directory as "scraper.py" (this is the path the crontab entry below expects).

  5. Modify "name" using your own name and "email" using your own email address (lines 13 and 14).

  6. Modify "url" with the URL for your local craigslist. The easiest way to find this is to do the search you want in a browser and then copy and paste the resulting URL.

  7. On line 49, the 'bik/d' (this is for bicycles) needs to be changed to reflect what you are searching for. To find the right replacement, do your search manually in a browser and hover over an item to reveal the post URL. The replacement string is the 5 characters directly following "craigslist.org/".

  8. chmod +x scraper.py

  9. Add a crontab entry to run the script periodically. E.g.,

    */15 * * * * /home/$USER/craigslist-scraper/scraper.py

    replacing $USER with your own username.
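
To illustrate what the 'bik/d' string in step 7 is doing, here is a minimal sketch of filtering post links out of a page by that marker. It is not the code from the pastebin: it uses the standard library's html.parser instead of mechanicalsoup so it runs without extra packages, and the names PostLinkParser and find_post_links are made up for the example.

```python
from html.parser import HTMLParser


class PostLinkParser(HTMLParser):
    """Collect href values of <a> tags whose URL contains a marker string."""

    def __init__(self, marker):
        super().__init__()
        self.marker = marker
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Keep only links that contain the marker, skipping duplicates.
        if href and self.marker in href and href not in self.links:
            self.links.append(href)


def find_post_links(html, marker="bik/d"):
    """Return post links found in html, in page order (hypothetical helper)."""
    parser = PostLinkParser(marker)
    parser.feed(html)
    return parser.links
```

Changing the marker argument to the 5 characters for your own category plays the same role as the edit described in step 7.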

To send yourself email through a mail server that you do not have access to (i.e., the script is not running on it), change 'localhost' in line 61 to the domain name of the SMTP server you want to use and insert the line

smtpObj.login('user', 'password')

on the next line, changing "user" and "password" to your own email credentials. The SMTP server will likely reject this if these are not the username and password for the email address you supplied in step 5.
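
For reference, a rough sketch of what sending through a remote SMTP server can look like with Python's smtplib. This is an illustration rather than the pastebin code: the function names are invented, and port 587 with STARTTLS is only an assumption about the server (many providers require it before login, but check your provider's settings).

```python
import smtplib
from email.message import EmailMessage


def build_message(sender, recipient, subject, body):
    """Assemble a plain-text email (hypothetical helper)."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send_via_remote_smtp(msg, host, user, password, port=587):
    """Log in to a remote SMTP server and send msg.

    Port 587 plus STARTTLS is an assumed, commonly used setup.
    """
    with smtplib.SMTP(host, port) as smtp_obj:
        smtp_obj.starttls()              # upgrade to TLS before sending credentials
        smtp_obj.login(user, password)   # same call as the line shown above
        smtp_obj.send_message(msg)
```

Keeping the message building separate from the sending makes it possible to check the email contents without touching a real server.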

I'm sure there are lots of improvements and tweaks that can be made but it works for my purposes.

Enjoy!


#2 2020-07-17 07:47:03

ohnonot
...again
Registered: 2015-09-29
Posts: 4,926

Re: Python Craigslist scraper

The URL is hardcoded.
It just scrapes all links from that web page; actually craigslist itself does the heavy lifting, yes? Like deciding which posts are new to you?
Even so, good stuff, adaptable, thanks for sharing!


BL quote proposals to this thread please.
my repos / my repos


#3 2020-07-17 13:38:53

kozimodo
Member
Registered: 2015-10-04
Posts: 67

Re: Python Craigslist scraper

Yeah, it's hardcoded, and yeah, Craigslist does all the heavy lifting, but it's not much lifting.  I believe Craigslist just spits out everything that hasn't been deleted or expired and fits your search criteria.  The script itself keeps track of what posts have been seen in a text file.
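
A minimal sketch of that kind of bookkeeping, assuming one link per line in the text file and the cap of 200 mentioned in the first post. The function names and file layout are illustrative and may differ from what the pastebin script actually does.

```python
from pathlib import Path

MAX_SEEN = 200  # cap from the first post; raise it for busy areas


def load_seen(path):
    """Return previously seen links, oldest first (empty if no file yet)."""
    p = Path(path)
    if not p.exists():
        return []
    return p.read_text().splitlines()


def filter_new(links, seen):
    """Return the links not already seen, preserving order, without repeats."""
    seen_set = set(seen)
    new = []
    for link in links:
        if link not in seen_set:
            seen_set.add(link)
            new.append(link)
    return new


def save_seen(path, seen, new, limit=MAX_SEEN):
    """Append new links and keep only the most recent `limit` entries."""
    combined = (seen + new)[-limit:]
    Path(path).write_text("".join(link + "\n" for link in combined))
```

Each run would load the file, email only what filter_new returns, and then save the updated list.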

It could certainly be made more general and interactive, with lists of region codes, search categories, etc.  It could also easily look beyond the first page of posts, but my thinking is that stuff beyond the first page will likely no longer be available.
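
As a starting point for that kind of generalization, here is a small sketch that builds the search URL from a region code and a category instead of hardcoding it. The URL layout (region subdomain, /search/ plus category, optional "query" parameter) is an assumption; compare it against a URL copied from your own browser search before relying on it. build_search_url is a hypothetical name.

```python
from urllib.parse import urlencode


def build_search_url(region, category, query=None):
    """Build a search URL from a region code and category code.

    The layout used here is assumed, not taken from the script.
    """
    base = f"https://{region}.craigslist.org/search/{category}"
    if query:
        # urlencode escapes spaces and special characters in the search text.
        return base + "?" + urlencode({"query": query})
    return base
```

A list of (region, category) pairs could then be looped over, with each resulting URL fed to the existing scraping code.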

All of that said, if one has a modicum of tech ability and only occasionally needs to search Craigslist, the script with minor edits will do everything you need.


#4 2020-07-17 18:15:02

ohnonot
...again
Registered: 2015-09-29
Posts: 4,926

Re: Python Craigslist scraper

I was kinda looking for a simple example of Python's web scraping abilities, again thanks for sharing.

