More plots
17380
Dataset.csv
Normal file
BIN
Description.pdf
Normal file
BIN
IML_Assignment_2122.zip
Normal file
110
assignment.tex
Normal file
@ -0,0 +1,110 @@
% This is samplepaper.tex, a sample chapter demonstrating the
% LLNCS macro package for Springer Computer Science proceedings;
% Version 2.20 of 2017/10/04
%
\documentclass[runningheads]{llncs}
%
% A lot of package loading
\usepackage[pdftex]{graphicx}
\usepackage{geometry}
\usepackage[cmex10]{amsmath}
\usepackage{array, algpseudocode}
\usepackage{amsmath, amssymb, amsfonts, parskip, graphicx, verbatim}
\usepackage{url, hyperref}
\usepackage{bm, rotating, adjustbox, latexsym}
\usepackage{tabularx, booktabs}
\newcolumntype{Y}{>{\centering\arraybackslash}X}
\usepackage{float, setspace, mdframed}
\usepackage{color, contour, placeins, subfig, cite}
\usepackage[mathscr]{euscript}
\usepackage[osf]{mathpazo}
\usepackage{pgf, tikz, microtype, algorithm}
\usetikzlibrary{shapes,backgrounds,calc,arrows}
\usepackage{xcolor, colortbl, dsfont}


% If you use the hyperref package, please uncomment the following line
% to display URLs in blue roman font according to Springer's eBook style:
\renewcommand\UrlFont{\color{blue}\rmfamily}

\graphicspath{{figures/}}

\begin{document}
%
\title{Assignment report, group number}
%
%\titlerunning{Abbreviated paper title}
% If the paper title is too long for the running head, you can set
% an abbreviated paper title here
%
\author{Your Names Go Here}
%
\authorrunning{Short Author Names}
% First names are abbreviated in the running head.
% If there are more than two authors, 'et al.' is used.
%
\institute{Leiden Institute of Advanced Computer Science, The Netherlands}
%
\maketitle % typeset the header of the contribution
%
\begin{abstract}
This document contains the format for the report required for submission of the practical assignment for the course Introduction to Machine Learning. The tasks for this assignment are provided in Appendices~\ref{app:p1}--\ref{app:grade}.
\end{abstract}


\section{Introduction}
This document serves as a \textit{description of the practical assignment} for the course Introduction to Machine Learning. For this assignment, you are provided with a data set that you should analyze using some of the algorithms discussed during the lectures of this course. The assignment report should be written as a \textit{scientific paper} and submitted together with the code (in Python, mainly using the scikit-learn library~\cite{scikit-learn}).

To help you structure your report, we provide you with a \textit{brief report outline} in this document. Please complete the following sections with your own results, explanations and conclusions. This includes the abstract and this introduction!

Appendices~\ref{app:p1}--\ref{app:grade} contain the \textit{specification of the tasks of the assignment}. Do not include them in your report.

\section{Data Set} \label{sect:dataset}
The data set (available on Brightspace) contains data about bike rentals in a large European city. The main learning task for this data set is predicting the number of bikes rented (by subscribers to the service and by non-subscribers) based on the other features in the data set, but we will also define some additional tasks during this assignment.

In the remainder of this section, please add your own description of the data set you have been given.

\subsection{Problem formulation}\label{lab:problem}
Please add problem description here.

\section{Experiments}\label{lab:exp}
This is the main section of your report. All methods, experiment descriptions and results should be included here.

\section{Conclusion and future work}
Summarize your most important findings and what you can learn from them. Identify points that could be improved in the future, or areas where other algorithms might be useful.


\bibliographystyle{splncs04}
\bibliography{bibliography}

\appendix
\section{Content of part 1 of the assignment, deadline of 18.10.2021}\label{app:p1}
\begin{enumerate}
\item Identify which variables are present in the data set, what type they are, and how they are distributed. Apply pre-processing where this is needed to make the data usable\footnote{Hint: Look at the variable types. Any strings should be converted to numeric values, and simple categorical variables may be better represented as binary features (look into one-hot encoding),\dots. You might also want to exclude the \texttt{date} feature.}. Make use of different ways to visualize the data, and look at the correlations between different features\footnote{Hint: For some inspiration on the kind of plots you can create, you can look at the practicums, or go to \url{https://seaborn.pydata.org/examples/index.html}}. (This should be part of Section~\ref{sect:dataset} of your report.)
\item Formulate the problem of predicting the number of bikes rented based on the other features present. Use the terminology that has been used in the lectures. (This should be part of the problem formulation, Section~\ref{lab:problem} of your report.)
\item Split the data into two sets: train and test. Train a linear regression model to predict the total number of bikes rented based on the data in the training set, and verify the performance of this regressor on the test set. Identify how its performance varies with the size of the training set (visualize this, for example using matplotlib or similar packages). For this linear regressor, experiment with different transformations of the target, as this often has a large impact on the R-squared metric. Explain why this is the case! This part, and all of the following tasks, should be part of the experiments section (Section~\ref{lab:exp}).
\item Create a decision tree regressor to predict the number of bikes rented by subscribers. For this algorithm, identify which parameter settings you can modify, and explain what these parameters control. Select the one that has the most impact on the test performance, create a plot showing how this parameter affects both train and test error, and identify the ideal setting based on this plot. Then, apply these same parameter settings to predicting the number of bikes rented by non-subscribers. Are these settings optimal in this case as well? Clearly motivate your answers.
\end{enumerate}
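
The pre-processing and linear regression steps of tasks 1 and 3 could be sketched in Python as follows. This is only a minimal illustration under stated assumptions, not a reference solution: the synthetic data frame stands in for the real data set, the column names \texttt{season}, \texttt{temperature}, \texttt{humidity} and \texttt{Total} and all numeric values are made up for the example, and the logarithm is just one possible target transformation.

```python
import numpy as np
import pandas as pd
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for Dataset.csv; column names and values are assumptions.
rng = np.random.default_rng(0)
n = 500
data = pd.DataFrame({
    "season": rng.choice(["spring", "summer", "autumn", "winter"], size=n),
    "temperature": rng.uniform(0, 35, size=n),
    "humidity": rng.uniform(20, 100, size=n),
})
# Right-skewed, strictly positive target, roughly shaped like rental counts.
data["Total"] = np.exp(
    3 + 0.05 * data["temperature"] - 0.01 * data["humidity"]
    + rng.normal(0, 0.3, size=n)
).round()

# One-hot encode the categorical column; numeric columns are left unchanged.
X = pd.get_dummies(data.drop(columns="Total"), columns=["season"],
                   drop_first=True, dtype=float)
y = data["Total"]

# Correlation of every (encoded) feature with the target.
corr_with_target = pd.concat([X, y], axis=1).corr()["Total"].drop("Total")

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Linear regression on the raw target and on a log-transformed target.
plain = LinearRegression().fit(X_train, y_train)
logged = TransformedTargetRegressor(
    regressor=LinearRegression(), func=np.log1p, inverse_func=np.expm1
).fit(X_train, y_train)

# Both scores are computed on the original scale of the target.
r2_plain = r2_score(y_test, plain.predict(X_test))
r2_logged = r2_score(y_test, logged.predict(X_test))

# Test performance as a function of the fraction of training data used.
fractions = [0.1, 0.25, 0.5, 0.75, 1.0]
scores = []
for frac in fractions:
    n_sub = int(frac * len(X_train))
    model = LinearRegression().fit(X_train.iloc[:n_sub], y_train.iloc[:n_sub])
    scores.append(r2_score(y_test, model.predict(X_test)))
```

Comparing \texttt{r2\_plain} with \texttt{r2\_logged}, and plotting \texttt{scores} against \texttt{fractions}, gives a starting point for the discussion asked for in task 3. On the real data the numbers will differ.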

For your report, make sure you explain the working principles of the methods you use and explain why they lead to the results you found. Use relevant visualizations and explain what is being shown (every figure needs to have a caption, and be referenced in the text). The reasoning and discussion about the methods used is key in showing that you understand the concepts, and is thus the most important part in deciding your assignment grade. Since this is a scientific report, make sure to cite all references you use (papers, books,\dots)!
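
For the decision tree task in the list above, a parameter sweep might look like the following sketch. It rests on assumptions: the data are synthetic rather than the bike rental data, \texttt{max\_depth} is picked only as an example of a parameter to vary (whether it is the most influential one on the real data is for you to establish), and mean squared error is one of several possible error measures.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression problem standing in for the real features and target.
rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=(600, 3))
y = 100 * np.sin(4 * X[:, 0]) + 50 * X[:, 1] + rng.normal(0, 10, size=600)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=1)

# Fit one tree per depth and record the error on both sets.
depths = list(range(1, 16))
train_err, test_err = [], []
for depth in depths:
    tree = DecisionTreeRegressor(max_depth=depth, random_state=1)
    tree.fit(X_train, y_train)
    train_err.append(mean_squared_error(y_train, tree.predict(X_train)))
    test_err.append(mean_squared_error(y_test, tree.predict(X_test)))

# Depth with the lowest test error in this particular split.
best_depth = depths[int(np.argmin(test_err))]
```

Plotting \texttt{train\_err} and \texttt{test\_err} against \texttt{depths} in one figure shows where the two curves start to diverge. Repeating the sweep with the non-subscriber target, using the same split, is one way to check whether the selected setting carries over.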


\subsection{Submission of assignment part 1}\label{sec:submission1}\label{app:s1}
Each group should submit:
\begin{enumerate}
\item The Python code used to generate all data and figures used in this report. This should be structured clearly, so it can be easily run by reviewers. Pay attention to your coding style and use enough comments (in English). You should use a Jupyter notebook for this, as this allows you to give a clear structure to your code. Your code should be one file only!\footnote{Hint: You can use \url{https://git.liacs.nl} to host and share code with your teammates. You can log in using your ULCN username and password.}
\item A PDF file of the report, typeset in \LaTeX\footnote{Hint: You can use the free version of \url{https://overleaf.com} to edit your report as a group.}, following the format outlined in this document.
\end{enumerate}

Submission of this part of the assignment is mandatory and must be made via Brightspace. You will not receive a grade for this, but you will be given feedback via Brightspace. This submission serves as a basis for part 2 of the assignment.

\section{Content of part 2 of the assignment, deadline of 06.12.2021}\label{app:p2}
The second part of the assignment will be made available at a later date (middle of October).

\section{Peer review}\label{app:review}
As part of the assignment, each student individually reviews assignment reports from other groups.

\section{Grading}\label{app:grade}
The first part of the assignment will not be graded, but you will receive some feedback on it to improve the final submission. At the final deadline for part 2, the content of part 1 will be graded as well, taking into account how you incorporated the feedback you received.

\end{document}
12
bibliography.bib
Normal file
@ -0,0 +1,12 @@
@article{scikit-learn,
 title={Scikit-learn: Machine Learning in {P}ython},
 author={Pedregosa, F. and Varoquaux, G. and Gramfort, A. and Michel, V.
         and Thirion, B. and Grisel, O. and Blondel, M. and Prettenhofer, P.
         and Weiss, R. and Dubourg, V. and Vanderplas, J. and Passos, A. and
         Cournapeau, D. and Brucher, M. and Perrot, M. and Duchesnay, E.},
 journal={Journal of Machine Learning Research},
 volume={12},
 pages={2825--2830},
 year={2011}
}
BIN
humidity-total-lineplot.png
Normal file
After Width: | Height: | Size: 126 KiB |
59
main.py
Normal file
@ -0,0 +1,59 @@
import sklearn
import pandas as pd
import matplotlib
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt

data = pd.read_csv("Dataset.csv")

# Windspeed vs. subscribed rentals: joint scatter plot with marginal rug plots.
# JointGrid creates its own figure, so no plt.figure() call is needed here.
sns.set_theme(style="white", color_codes=True)
g = sns.JointGrid(data=data, x="Subscribed", y="windspeed", space=0, ratio=17)
# sizes only takes effect once a size variable is passed, e.g. size=data["hour"]
g.plot_joint(sns.scatterplot, sizes=(30, 120), color="g", alpha=.6, legend=False)
g.plot_marginals(sns.rugplot, height=1, color="g", alpha=.6)

# Seasons: box plot of total rentals per season
plt.figure()
sns.set_theme(style="ticks", palette="pastel")
sns.boxplot(x="season", y="Total", palette=["m", "g"], data=data)
sns.despine(offset=10, trim=True)

# Total, subscribed and non-subscribed rentals per month
plt.figure()
sns.lineplot(x="month", y="Total", data=data, color='purple', label='Total')
sns.lineplot(x="month", y="Subscribed", data=data, color='blue', label='Subscribed')
sns.lineplot(x="month", y="Non-subscribed", data=data, color='red', label='Non-subscribed')
plt.legend(title='Type', loc='lower right')

# Distribution plot per month with weather type annotated?
# displot is figure-level and creates its own figure.
sns.displot(data=data, x="month", y="Total", hue="weather")

# Season - weather point plot
plt.figure()
sns.pointplot(x="Total", y="season", hue="weather", data=data, dodge=.8 - .8 / 3, join=False, palette="dark", markers="d", scale=.75, ci=None)
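
# By day -> Holidays
# NOTE: hue="smoker" and its Yes/No palette come from the seaborn "tips"
# gallery example. The column name "holiday" below is an assumption; replace
# it with the two-level holiday column of Dataset.csv (split=True needs
# exactly two hue levels).
plt.figure()
sns.violinplot(data=data, x="day", y="Total", hue="holiday", split=True, inner="quart", linewidth=1)
sns.despine(left=True)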

# Temperature and feeling temperature vs. total rentals
plt.figure()
sns.lineplot(x="temperature", y="Total", data=data, label='Temperature')
sns.lineplot(x="feeling_temperature", y="Total", data=data, label='Feeling temperature')
plt.legend(title='Type', loc='upper left')

# Humidity vs. total rentals
plt.figure()
sns.lineplot(x="humidity", y="Total", data=data)

# Windspeed vs. total rentals
plt.figure()
sns.lineplot(x="windspeed", y="Total", data=data)

# Box plot of total rentals per weather type
# (dodge only matters when a hue variable is used, so it is dropped here)
plt.figure()
sns.boxplot(x="weather", y="Total", data=data, palette="dark")
BIN
month-weather-total-barchart.png
Normal file
After Width: | Height: | Size: 72 KiB |
BIN
season-total-boxplot.png
Normal file
After Width: | Height: | Size: 51 KiB |
1548
splncs04.bst
Normal file
BIN
subsribed-total-lineplot.png
Normal file
After Width: | Height: | Size: 160 KiB |
BIN
temperature-rentals-lineplot.png
Normal file
After Width: | Height: | Size: 148 KiB |
BIN
weather-total-scatterplot.png
Normal file
After Width: | Height: | Size: 63 KiB |
BIN
weathertype-total-boxplot.png
Normal file
After Width: | Height: | Size: 58 KiB |
BIN
windspeed-subscribed.png
Normal file
After Width: | Height: | Size: 63 KiB |
BIN
windspeed-total-lineplot.png
Normal file
After Width: | Height: | Size: 89 KiB |