Eljakim Herrewijnen 2023-12-27 22:45:02 +01:00
parent 7389497e82
commit eec47adff0
3 changed files with 47 additions and 4 deletions

scrape/Readme.md Normal file

@@ -0,0 +1,12 @@
# Scrape Usse
A Funda scraper that automatically calculates the distance from each listing to several points in the Netherlands. The project relies heavily on a slightly modified version of funda-scraper, which is available on GitHub.
## Usage
Set the ``URL`` parameter to your filtered search URL from Funda.
```python
URL = "https://www.funda.nl/zoeken/koop?selected_area=%5B%22utrecht,15km%22%5D&price=%22-400000%22&object_type=%5B%22house%22%5D"
```
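If you would rather build the filtered URL in code than copy it from the browser, the sketch below reproduces the example above. It assumes that each Funda filter is a JSON value that is percent-encoded with commas left as-is; that is inferred only from the example URL, not from any documented Funda API, and the helper name `build_funda_url` is hypothetical.

```python
import json
from urllib.parse import quote

# Search endpoint taken from the example URL above.
BASE_URL = "https://www.funda.nl/zoeken/koop"


def build_funda_url(filters, base_url=BASE_URL):
    """Build a search URL from a dict of filters (hypothetical helper).

    Assumption: every filter value is serialised as compact JSON and then
    percent-encoded, keeping commas unescaped as in the example URL.
    """
    parts = []
    for key, value in filters.items():
        encoded = quote(json.dumps(value, separators=(",", ":")), safe=",")
        parts.append(f"{key}={encoded}")
    return f"{base_url}?{'&'.join(parts)}"


URL = build_funda_url({
    "selected_area": ["utrecht,15km"],  # area plus search radius
    "price": "-400000",                 # no minimum, maximum of 400000
    "object_type": ["house"],
})
print(URL)
```

Other filters that Funda supports may not follow the same encoding, so check the generated URL in a browser before scraping with it.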
You should then be able to scrape the data from Funda. See the documentation on Read the Docs (RTD) for how to set up OSRM and use the results.
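To give an idea of what the distance step can look like, here is a minimal sketch that builds a request URL for the OSRM `table` service, which returns a matrix between sources and destinations. It is not this project's code: the function name `osrm_table_url`, the server address `http://localhost:5000`, and the coordinates are assumptions for illustration. OSRM expects coordinates as `longitude,latitude` pairs.

```python
def osrm_table_url(houses, targets, base_url="http://localhost:5000", profile="driving"):
    """Build an OSRM table-service URL (hypothetical helper).

    houses and targets are sequences of (longitude, latitude) tuples.
    Assumption: an OSRM server is reachable at base_url.
    """
    points = list(houses) + list(targets)
    # OSRM separates coordinate pairs with ';' and uses lon,lat order.
    coords = ";".join(f"{lon},{lat}" for lon, lat in points)
    # Houses are the sources, the fixed points are the destinations.
    sources = ";".join(str(i) for i in range(len(houses)))
    destinations = ";".join(str(i) for i in range(len(houses), len(points)))
    return (
        f"{base_url}/table/v1/{profile}/{coords}"
        f"?sources={sources}&destinations={destinations}&annotations=distance"
    )


# Illustrative coordinates only (roughly Utrecht, Amsterdam and Rotterdam).
url = osrm_table_url(
    houses=[(5.1214, 52.0907)],
    targets=[(4.9041, 52.3676), (4.4777, 51.9244)],
)
print(url)
```

Fetching this URL from a running OSRM server should return JSON with a `distances` matrix in metres, one row per house and one column per target.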

scrape/scrape_README.md Normal file

@@ -0,0 +1,34 @@
# FundaScraper
`FundaScraper` provides an easy way to scrape Funda, the Dutch housing website.
You can find listings from either the buyer or the rental market, as well as historical data from the past few years.
## Install
```
pip install funda-scraper
```
## Quickstart
```python
from funda_scraper import FundaScraper
scraper = FundaScraper(area="amsterdam", want_to="rent", find_past=False)
df = scraper.run()
df.head()
```
![image](https://i.imgur.com/mmN9mjQ.png)
You can pass several arguments to `FundaScraper()` for customized scraping:
- `area`: Specify the city or area you want to search, e.g. Amsterdam, Utrecht, Rotterdam, etc.
- `want_to`: You can choose either `buy` or `rent`, which finds houses either for sale or for rent.
- `find_past`: Specify whether you want to check the historical data. The default is `False`.
- `n_pages`: Indicate how many pages you want to look up. The default is `1`.
## Advanced usage
You can check the [example notebook](https://colab.research.google.com/drive/1hNzJJRWxD59lrbeDpfY1OUpBz0NktmfW?usp=sharing) for further details.
Please give me a [star](https://github.com/whchien/funda-scraper) if you find this project helpful.

@@ -56,13 +56,10 @@ def generate_json(houses):
 destination_location = res.point
 destination_location = [destination_location.longitude, destination_location.latitude]
-# distance_matrix = gmaps.distance_matrix(origin_locations['nfi_location'], destination_location, mode = 'driving')
 out_dict['name'] = f"{address}_{count}" # Fix for duplicate names in dictionary.
 out_dict['position'] = destination_location
 for key in houses.keys():
     out_dict[key] = houses.__getattr__(key).get(i)