# Scrape Usse
Funda scraper that automatically calculates the distance to several points in the Netherlands. The project relies heavily on a slightly modified version of funda-scraper, which is available on GitHub.
## Install
Create a virtual environment and install the dependencies:
```bash
python3 -m venv venv/
source venv/bin/activate
pip3 install -r requirements.txt
```
lxml is also required for BeautifulSoup to run:
```bash
sudo apt-get install python3-lxml
```
## Usage
Update the ``URL`` parameter to your own filtered search URL from Funda:
```python
URL = "https://www.funda.nl/zoeken/koop?selected_area=%5B%22utrecht,15km%22%5D&price=%22-400000%22&object_type=%5B%22house%22%5D"
```
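To double-check which filters a search URL actually encodes before scraping, the query string can be decoded with the standard library. This is only a sketch: `decode_filters` is a hypothetical helper, not part of this project, and it assumes each query value is JSON-encoded, as it appears to be in the example URL above. Funda may change its URL format.

```python
import json
from urllib.parse import urlsplit, parse_qs

URL = "https://www.funda.nl/zoeken/koop?selected_area=%5B%22utrecht,15km%22%5D&price=%22-400000%22&object_type=%5B%22house%22%5D"


def decode_filters(url):
    """Hypothetical helper: return the search filters encoded in a Funda URL.

    Assumes every query value is a JSON literal (true for the example URL).
    """
    query = urlsplit(url).query
    # parse_qs percent-decodes the values and returns a list per key.
    return {key: json.loads(values[0]) for key, values in parse_qs(query).items()}


filters = decode_filters(URL)
print(filters)
# {'selected_area': ['utrecht,15km'], 'price': '-400000', 'object_type': ['house']}
```

Here the search covers houses for sale within 15 km of Utrecht, with a maximum price of 400000.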
Next, you should be able to scrape the data from Funda. See the Read the Docs (RTD) pages for more documentation on how to set up OSRM and use the results.
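For orientation, a distance lookup against OSRM's HTTP route service could look roughly like the sketch below. It is not taken from this project: the helper names, the `http://localhost:5000` host (the default port of `osrm-routed`) and the coordinates are assumptions for illustration only. The RTD pages remain the reference for how the scraper actually talks to OSRM.

```python
import json
from urllib.parse import urlencode


def osrm_route_url(origin, destination, host="http://localhost:5000", profile="driving"):
    """Build an OSRM route request URL (hypothetical helper).

    origin and destination are (lat, lon) tuples; the host is an assumption.
    """
    # OSRM expects coordinates as lon,lat pairs separated by semicolons.
    coords = ";".join(f"{lon},{lat}" for lat, lon in (origin, destination))
    return f"{host}/route/v1/{profile}/{coords}?{urlencode({'overview': 'false'})}"


def parse_route(payload):
    """Return (distance_km, duration_min) from an OSRM route response body."""
    body = json.loads(payload)
    if body.get("code") != "Ok":
        raise ValueError(f"OSRM error: {body.get('code')}")
    route = body["routes"][0]
    # OSRM reports distance in meters and duration in seconds.
    return route["distance"] / 1000, route["duration"] / 60


# Approximate, illustrative coordinates: Utrecht -> Amsterdam.
url = osrm_route_url((52.0907, 5.1214), (52.3676, 4.9041))
print(url)
```

Fetching `url` (for example with `urllib.request.urlopen`) and passing the response body to `parse_route` would give the driving distance in kilometers and the duration in minutes, provided an OSRM server with Dutch map data is running at that host.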
## Pandas
To interact with the pandas DataFrame directly:
```python
import pickle

# Load the pickled DataFrame (only unpickle files you trust).
with open('panda_dump.bin', 'rb') as f:
    data = pickle.load(f)

print(type(data))
# <class 'pandas.core.frame.DataFrame'>
print(data.descrip.get(0))
# Aan de rand van de populaire woonwijk 'De Hagen' te Vianen staat deze fijne tussenwoning met groenstrook en water voor de deur. De buurt straalt een gemoedelijke sfeer uit en[..]
```