Spider built with Scrapy. Scrapes/Extract Reviews from yellowpages.com and gets all review with relevant basic information available of this YP Profile. From here it is possible to get results into csv, json and xml files that Scrapy can generate.
Export in CSV file, use this command
scrapy crawl yp_reviews -o sampleDataYp.csv
Export in JSON file, use this command
scrapy crawl yp_reviews -o sampleDataYp.json
Further scope is develop pipelines and to add back-end SQL DB tables [ Example:
master_reviews&master_surveys] OR MongoDB and Log files in "logs" folder , log info table to track activities inlog_historytable, email notification etc features can be added.
- name
- total_review
- ratings
- reviewer_name
- reviews
- review_date
- hash_key
- sourceURL
git clone https://github.com/azambd/yellowpages-review.git
cd yellowpages
scrapy crawl yp_reviews
→ tree -l -v
.
├── LICENSE
├── README.md
├── sampleDataYp.csv
├── sampleDataYp.json
├── scrapy.cfg
└── yellowpages
├── __init__.py
├── items.py
├── middlewares.py
├── pipelines.py
├── settings.py
└── spiders
├── __init__.py
└── yp-reviews.py
2 directories, 12 files
If you need any help to upgrade this spider to a production version, shoot an email at [azam@wscraper.com] - I'll help you.