# FundaScraper

[![Project Status: Active – The project has reached a stable, usable state and is being actively developed.](https://www.repostatus.org/badges/latest/active.svg)](https://www.repostatus.org/#active)
[![Build Status](https://app.travis-ci.com/whchien/funda-scraper.svg?branch=main)](https://app.travis-ci.com/whchien/funda-scraper)
[![codecov](https://codecov.io/gh/whchien/funda-scraper/branch/main/graph/badge.svg?token=QUKTDyeUqp)](https://codecov.io/gh/whchien/funda-scraper)
[![Downloads](https://static.pepy.tech/badge/funda-scraper)](https://pepy.tech/project/funda-scraper)
[![PyPI version](https://img.shields.io/pypi/v/funda-scraper)](https://pypi.org/project/funda-scraper/)
[![PEP8](https://img.shields.io/badge/code%20style-pep8-orange.svg)](https://www.python.org/dev/peps/pep-0008/)

`FundaScraper` provides an easy way to scrape listings from Funda, the Dutch housing website.
You can find houses either for sale or for rent, and historical data from the past few years is also available.

Please note:
1. Scraping this website is only allowed for personal use (as per Funda's Terms and Conditions).
2. Any commercial use of this Python package is prohibited. The author holds no liability for any misuse of the package.

## Install

1. The easiest way is to install with pip:
```
pip install funda-scraper
```
2. You can also clone the repository to your local machine with:
```
git clone https://github.com/whchien/funda-scraper.git
cd funda-scraper
export PYTHONPATH=${PWD}
python funda_scraper/scrape.py --area amsterdam --want_to rent --find_past False --page_start 1 --n_pages 3
```

## Quickstart

```
from funda_scraper import FundaScraper

scraper = FundaScraper(area="amsterdam", want_to="rent", find_past=False, page_start=1, n_pages=3)
df = scraper.run(raw_data=False, save=True, filepath="test.csv", min_price=500, max_price=2000)
df.head()
```
![image](https://i.imgur.com/mmN9mjQ.png)

You can pass several arguments to `FundaScraper()` for customized scraping:
- `area`: The city or specific area to search, e.g. Amsterdam, Utrecht, Rotterdam.
- `want_to`: Either `buy` or `rent`, to find houses for sale or for rent respectively.
- `find_past`: Whether to scrape past listings instead of those currently on the market. If `True`, only historical data will be scraped. The default is `False`.
- `page_start`: The page at which to start scraping. The default is `1`.
- `n_pages`: The number of pages to scrape. The default is `1`.
- `min_price`: The lowest price in your budget.
- `max_price`: The highest price in your budget.

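The arguments above can be collected and sanity-checked before a scraper is created. The following is a minimal sketch, not part of `funda-scraper`: `ScrapeOptions` and `as_kwargs` are hypothetical names, and the validation rules are assumptions derived only from the descriptions and defaults listed above.

```python
from dataclasses import dataclass, asdict

VALID_WANT_TO = {"buy", "rent"}  # the two values documented above


@dataclass(frozen=True)
class ScrapeOptions:
    """Hypothetical helper (not part of funda-scraper) holding constructor arguments."""

    area: str
    want_to: str
    find_past: bool = False  # documented default
    page_start: int = 1      # documented default
    n_pages: int = 1         # documented default

    def __post_init__(self):
        # Assumed rules: these checks are illustrative, not the library's own.
        if not self.area.strip():
            raise ValueError("area must be a non-empty city or area name")
        if self.want_to not in VALID_WANT_TO:
            raise ValueError(f"want_to must be one of {sorted(VALID_WANT_TO)}")
        if self.page_start < 1 or self.n_pages < 1:
            raise ValueError("page_start and n_pages must be at least 1")

    def as_kwargs(self):
        """Return a plain dict of the validated options."""
        return asdict(self)


options = ScrapeOptions(area="amsterdam", want_to="rent", n_pages=3)
kwargs = options.as_kwargs()
print(kwargs)
```

With the package installed, the resulting dict could be unpacked as `FundaScraper(**kwargs)`, mirroring the keyword arguments used in the Quickstart call.
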
The scraped raw result contains the following information:
- url
- price
- address
- description
- listed_since
- zip_code
- size
- year_built
- living_area
- kind_of_house
- building_type
- num_of_rooms
- num_of_bathrooms
- layout
- energy_label
- insulation
- heating
- ownership
- exteriors
- parking
- neighborhood_name
- date_list
- date_sold
- term
- price_sold
- last_ask_price
- last_ask_price_m2
- city

You can use `scraper.run(raw_data=True)` to fetch the data without preprocessing.

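A result saved with `save=True` can also be inspected with the standard library alone. The sketch below assumes that the saved file is a CSV whose header uses field names from the list above and that `price` holds a plain integer after preprocessing; both are assumptions, and the sample rows and `filter_by_price` helper are hypothetical.

```python
import csv
import io

# Hypothetical sample standing in for a saved file; real values and columns may differ.
SAMPLE_CSV = """url,price,city,living_area
https://example.com/listing-1,1500,amsterdam,55
https://example.com/listing-2,2400,amsterdam,80
https://example.com/listing-3,950,amsterdam,30
"""


def filter_by_price(csv_file, min_price, max_price):
    """Return rows whose integer `price` lies within [min_price, max_price]."""
    rows = []
    for row in csv.DictReader(csv_file):
        try:
            price = int(row["price"])
        except (KeyError, TypeError, ValueError):
            continue  # skip rows with a missing or non-numeric price
        if min_price <= price <= max_price:
            rows.append(row)
    return rows


affordable = filter_by_price(io.StringIO(SAMPLE_CSV), min_price=500, max_price=2000)
print([row["url"] for row in affordable])
```

To run this against a real file, replace `io.StringIO(SAMPLE_CSV)` with an open file handle, for example `open("test.csv", newline="")`.
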
## More information

You can check the [example notebook](https://colab.research.google.com/drive/1hNzJJRWxD59lrbeDpfY1OUpBz0NktmfW?usp=sharing) for further details.
Please give me a [star](https://github.com/whchien/funda-scraper) if you find this project helpful.