<!DOCTYPE html>
<html class="writer-html5" lang="en" >
<head>
<meta charset="utf-8" /><meta name="generator" content="Docutils 0.17.1: http://docutils.sourceforge.net/" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Scraping Funda &mdash; Usse 1 documentation</title>
<link rel="stylesheet" href="_static/pygments.css" type="text/css" />
<link rel="stylesheet" href="_static/css/theme.css" type="text/css" />
<!--[if lt IE 9]>
<script src="_static/js/html5shiv.min.js"></script>
<![endif]-->
<script data-url_root="./" id="documentation_options" src="_static/documentation_options.js"></script>
<script src="_static/jquery.js"></script>
<script src="_static/underscore.js"></script>
<script src="_static/_sphinx_javascript_frameworks_compat.js"></script>
<script src="_static/doctools.js"></script>
<script src="_static/sphinx_highlight.js"></script>
<script src="_static/js/theme.js"></script>
<link rel="index" title="Index" href="genindex.html" />
<link rel="search" title="Search" href="search.html" />
<link rel="next" title="Google Maps API" href="googlemaps.html" />
<link rel="prev" title="Project Usse" href="index.html" />
</head>
<body class="wy-body-for-nav">
<div class="wy-grid-for-nav">
<nav data-toggle="wy-nav-shift" class="wy-nav-side">
<div class="wy-side-scroll">
<div class="wy-side-nav-search" >
<a href="index.html" class="icon icon-home"> Usse
</a>
<div role="search">
<form id="rtd-search-form" class="wy-form" action="search.html" method="get">
<input type="text" name="q" placeholder="Search docs" />
<input type="hidden" name="check_keywords" value="yes" />
<input type="hidden" name="area" value="default" />
</form>
</div>
</div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu">
<p class="caption" role="heading"><span class="caption-text">Contents:</span></p>
<ul class="current">
<li class="toctree-l1 current"><a class="current reference internal" href="#">Scraping Funda</a></li>
<li class="toctree-l1"><a class="reference internal" href="googlemaps.html">Google Maps API</a></li>
<li class="toctree-l1"><a class="reference internal" href="osm.html">Open Street Maps</a></li>
</ul>
</div>
</div>
</nav>
<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap"><nav class="wy-nav-top" aria-label="Mobile navigation menu" >
<i data-toggle="wy-nav-top" class="fa fa-bars"></i>
<a href="index.html">Usse</a>
</nav>
<div class="wy-nav-content">
<div class="rst-content">
<div role="navigation" aria-label="Page navigation">
<ul class="wy-breadcrumbs">
<li><a href="index.html" class="icon icon-home"></a></li>
<li class="breadcrumb-item active">Scraping Funda</li>
<li class="wy-breadcrumbs-aside">
<a href="_sources/funda.rst.txt" rel="nofollow"> View page source</a>
</li>
</ul>
<hr/>
</div>
<div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
<div itemprop="articleBody">
<section id="scraping-funda">
<h1>Scraping Funda<a class="headerlink" href="#scraping-funda" title="Permalink to this heading"></a></h1>
<p><code class="docutils literal notranslate"><span class="pre">Funda</span></code> is a Dutch real estate listing site that tries to keep track of all houses that are currently for sale.
Scraping is not allowed, but several projects on GitHub still attempt it.</p>
<p>A quick test of several GitHub projects led us to <a class="reference external" href="https://github.com/whchien/funda-scraper">this project</a>.</p>
<p>This project still works, but its filtering options are very limited.
A few patches to the code allow us to inject a URL that is used as-is, with no other filters applied.
We can then set up a basic filter in the browser and copy the resulting URL to drive the scraper.</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">url</span> <span class="o">!=</span> <span class="s2">&quot;&quot;</span><span class="p">:</span>
    <span class="c1"># Browser URL: https://www.funda.nl/koop/gemeente-huizen/0-350000/tuin/+10km/</span>
    <span class="c1"># self.url holds the part after /koop/: gemeente-huizen/0-350000/tuin/+10km/</span>
    <span class="k">return</span> <span class="p">{</span>
        <span class="s2">&quot;close&quot;</span><span class="p">:</span> <span class="sa">f</span><span class="s2">&quot;</span><span class="si">{</span><span class="bp">self</span><span class="o">.</span><span class="n">base_url</span><span class="si">}</span><span class="s2">/koop/verkocht/</span><span class="si">{</span><span class="bp">self</span><span class="o">.</span><span class="n">url</span><span class="si">}</span><span class="s2">/&quot;</span><span class="p">,</span>
        <span class="s2">&quot;open&quot;</span><span class="p">:</span> <span class="sa">f</span><span class="s2">&quot;</span><span class="si">{</span><span class="bp">self</span><span class="o">.</span><span class="n">base_url</span><span class="si">}</span><span class="s2">/koop/</span><span class="si">{</span><span class="bp">self</span><span class="o">.</span><span class="n">url</span><span class="si">}</span><span class="s2">/&quot;</span><span class="p">,</span>
    <span class="p">}</span>
</pre></div>
</div>
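<p>The patched branch above expects only the part of the browser URL that follows <code class="docutils literal notranslate"><span class="pre">/koop/</span></code>. As a minimal sketch of that step, the helpers below cut the fragment out of a copied browser URL and rebuild the two search URLs. The names <code class="docutils literal notranslate"><span class="pre">fragment_from_browser_url</span></code>, <code class="docutils literal notranslate"><span class="pre">build_urls</span></code> and <code class="docutils literal notranslate"><span class="pre">BASE_URL</span></code> are hypothetical and not part of the funda-scraper project, and the value of <code class="docutils literal notranslate"><span class="pre">BASE_URL</span></code> is an assumption about what <code class="docutils literal notranslate"><span class="pre">self.base_url</span></code> contains.</p>

```python
from urllib.parse import urlparse

# Assumption: this is the value self.base_url holds in the patched scraper.
BASE_URL = "https://www.funda.nl"


def fragment_from_browser_url(browser_url: str) -> str:
    """Return the filter path after /koop/, without leading or trailing slashes."""
    path = urlparse(browser_url).path
    parts = [p for p in path.split("/") if p]  # drop empty segments
    if not parts or parts[0] != "koop":
        raise ValueError(f"not a funda /koop/ URL: {browser_url}")
    return "/".join(parts[1:])


def build_urls(fragment: str) -> dict:
    """Mirror the patched branch: one URL for sold listings, one for open ones."""
    fragment = fragment.strip("/")  # avoid a doubled slash at the end
    return {
        "close": f"{BASE_URL}/koop/verkocht/{fragment}/",
        "open": f"{BASE_URL}/koop/{fragment}/",
    }


if __name__ == "__main__":
    copied = "https://www.funda.nl/koop/gemeente-huizen/0-350000/tuin/+10km/"
    print(build_urls(fragment_from_browser_url(copied)))
```

<p>Stripping the slashes before formatting is a design choice of this sketch: the patched code appends its own trailing slash, so a fragment that already ends in one would otherwise produce a doubled slash.</p>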
<p>Scrape Funda with a URL:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">funda_scraper</span> <span class="kn">import</span> <span class="n">FundaScraper</span>  <span class="c1"># the patched copy that accepts url=</span>


<span class="k">def</span> <span class="nf">get_funda_data</span><span class="p">():</span>
    <span class="c1"># url is the filter path copied from the browser, i.e. the part after /koop/</span>
    <span class="n">scraper</span> <span class="o">=</span> <span class="n">FundaScraper</span><span class="p">(</span><span class="n">url</span><span class="o">=</span><span class="s2">&quot;nijkerk/beschikbaar/100000-400000/woonhuis/tuin/eengezinswoning/landhuis/+30km/&quot;</span><span class="p">,</span> <span class="n">find_past</span><span class="o">=</span><span class="kc">False</span><span class="p">,</span> <span class="n">n_pages</span><span class="o">=</span><span class="mi">81</span><span class="p">)</span>
    <span class="n">df</span> <span class="o">=</span> <span class="n">scraper</span><span class="o">.</span><span class="n">run</span><span class="p">()</span>
    <span class="k">return</span> <span class="n">df</span>
</pre></div>
</div>
</section>
</div>
</div>
<footer><div class="rst-footer-buttons" role="navigation" aria-label="Footer">
<a href="index.html" class="btn btn-neutral float-left" title="Project Usse" accesskey="p" rel="prev"><span class="fa fa-arrow-circle-left" aria-hidden="true"></span> Previous</a>
<a href="googlemaps.html" class="btn btn-neutral float-right" title="Google Maps API" accesskey="n" rel="next">Next <span class="fa fa-arrow-circle-right" aria-hidden="true"></span></a>
</div>
<hr/>
<div role="contentinfo">
<p>&#169; Copyright 2023, Eljakim.</p>
</div>
Built with <a href="https://www.sphinx-doc.org/">Sphinx</a> using a
<a href="https://github.com/readthedocs/sphinx_rtd_theme">theme</a>
provided by <a href="https://readthedocs.org">Read the Docs</a>.
</footer>
</div>
</div>
</section>
</div>
<script>
jQuery(function () {
SphinxRtdTheme.Navigation.enable(true);
});
</script>
</body>
</html>