Author Archives: Captain Haddock

Some kinds of plots

Distribution (Histogram)

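import matplotlib.pyplot as plt
# df is a pandas DataFrame that has already been loaded
df.hist(bins=50, figsize=(20, 15))  # one 50-bin histogram per numeric column
plt.show()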
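What `bins=50` means: for each numeric column, the range between its minimum and maximum is cut into 50 equal-width bins, and the values falling in each bin are counted. Here is a minimal sketch of that counting step with `numpy.histogram`, on made-up data. The column names `age` and `bmi` and the helper `column_histograms` are hypothetical, and the sketch only computes the counts without drawing anything:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for `df` (not from the original post)
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "age": rng.normal(40, 12, size=500),
    "bmi": rng.normal(32, 7, size=500),
})

def column_histograms(frame, bins=50):
    """Return {column: (counts, edges)} using equal-width bins for each numeric column."""
    result = {}
    for col in frame.select_dtypes("number").columns:
        values = frame[col].dropna().to_numpy()
        # Equal-width bins spanning this column's own min..max range
        counts, edges = np.histogram(values, bins=bins)
        result[col] = (counts, edges)
    return result

hists = column_histograms(df, bins=50)
counts, edges = hists["age"]
print(len(counts), len(edges))  # 50 bins are delimited by 51 edges
print(counts.sum())             # every value lands in exactly one bin
```

Each column gets its own bin edges, which is why the x-axes of the subplots differ from one feature to the next.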

Distribution (Density)

A density plot is a smoothed, continuous version of a histogram estimated from the data.

# kind='density' needs SciPy; layout=(8, 8) fits up to 64 columns
df.plot(kind='density', subplots=True, layout=(8, 8), sharex=False, legend=False, figsize=(12, 12))
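Under the hood, pandas' `kind='density'` relies on SciPy's `gaussian_kde`: a small Gaussian bump is placed on every observation and the bumps are averaged, f(x) = 1/(n h) Σ K((x − x_i)/h). Below is a hand-rolled NumPy sketch of that idea, with the bandwidth h chosen by Scott's rule (assumed here to match SciPy's default). `gaussian_kde_1d` is a hypothetical helper and the samples are synthetic:

```python
import numpy as np

def gaussian_kde_1d(samples, grid, bandwidth=None):
    """Evaluate a Gaussian kernel density estimate of `samples` at the `grid` points."""
    samples = np.asarray(samples, dtype=float)
    n = samples.size
    if bandwidth is None:
        # Scott's rule of thumb in one dimension: h = std * n^(-1/5)
        bandwidth = samples.std(ddof=1) * n ** (-1 / 5)
    # Standardised distance from every grid point to every sample, shape (len(grid), n)
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    # Average the kernels and rescale so the curve integrates to 1
    return kernels.sum(axis=1) / (n * bandwidth)

rng = np.random.default_rng(1)
samples = rng.normal(0.0, 1.0, size=200)  # hypothetical data
grid = np.linspace(-6, 6, 1201)
density = gaussian_kde_1d(samples, grid)
# Riemann-sum approximation of the area under the curve (should be very close to 1)
area = density.sum() * (grid[1] - grid[0])
print(area)
```

A larger bandwidth gives a smoother but blurrier curve; a smaller one follows the individual points more closely, approaching a spiky histogram.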


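Heatmap (needs correlations)

%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns
# Pearson correlation matrix of the numeric columns (numeric_only needs pandas >= 1.5)
corr = train_df.corr(numeric_only=True)
plt.figure(figsize=(16, 8))
sns.heatmap(corr, annot=True)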
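`DataFrame.corr()` defaults to Pearson correlation: each cell is the covariance of two columns divided by the product of their standard deviations, so the heatmap values always lie between -1 and 1. A sketch that recomputes that matrix by hand with NumPy and checks it against pandas, on a hypothetical toy frame (it assumes no missing values and no constant columns; `pearson_matrix` is a hypothetical helper):

```python
import numpy as np
import pandas as pd

def pearson_matrix(frame):
    """Pearson correlation matrix of the numeric columns, computed by hand."""
    numeric = frame.select_dtypes("number")   # non-numeric columns are dropped
    x = numeric.to_numpy(dtype=float)
    centred = x - x.mean(axis=0)              # subtract each column's mean
    cov = centred.T @ centred                 # unnormalised covariance; the scale cancels below
    std = np.sqrt(np.diag(cov))
    return pd.DataFrame(cov / np.outer(std, std),
                        index=numeric.columns, columns=numeric.columns)

# Hypothetical toy data
toy = pd.DataFrame({
    "a": [1, 2, 3, 4, 5],
    "b": [2, 1, 4, 3, 6],
    "c": [5, 4, 3, 2, 1],
    "label": list("vwxyz"),
})
manual = pearson_matrix(toy)
print(manual.round(3))
```

Because the values are bounded, passing `vmin=-1, vmax=1` (and optionally `center=0`) to `sns.heatmap` keeps the colour scale comparable between datasets.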


Histogram plot of selected variable(s)

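from matplotlib import pyplot
# iloc is 0-based: these are the 2nd and 3rd columns of train_X, overlaid on one set of axes
pyplot.hist(train_X.iloc[:, 1], alpha=0.5, label=str(train_X.columns[1]))
pyplot.hist(train_X.iloc[:, 2], alpha=0.5, label=str(train_X.columns[2]))
pyplot.legend()
pyplot.show()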
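Each `pyplot.hist` call picks its own bin edges from its own column, so two overlaid histograms may not line up bar for bar. One way around this (a suggestion, not part of the original snippet) is to compute a single set of edges with `numpy.histogram_bin_edges` over both columns and reuse it. The helper `shared_histograms` and the toy `train_X` below are hypothetical:

```python
import numpy as np
import pandas as pd

def shared_histograms(frame, positions, bins=20):
    """Histogram several columns (selected by position) on one common set of bin edges."""
    data = [frame.iloc[:, p].dropna().to_numpy(dtype=float) for p in positions]
    # Edges computed over all selected columns together, so every histogram shares them
    edges = np.histogram_bin_edges(np.concatenate(data), bins=bins)
    counts = [np.histogram(d, bins=edges)[0] for d in data]
    return edges, counts

# Hypothetical stand-in for train_X
train_X = pd.DataFrame({
    "id": range(6),
    "x1": [1, 2, 2, 3, 3, 3],
    "x2": [4, 5, 5, 6, 6, 8],
})
edges, (c1, c2) = shared_histograms(train_X, [1, 2], bins=7)
print(edges)
print(c1, c2)
```

The returned `edges` can then be passed to both plotting calls, e.g. `pyplot.hist(train_X.iloc[:, 1], bins=edges, alpha=0.5)`.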

Machine Learning: Pima Indians Diabetes

Visualise the Dataset

Visualising the data is an important step of data analysis. A graphical view of the data gives us a better understanding of how each feature's values are distributed: for example, how old most of the people in the dataset are, or what their typical BMI is. We could limit our inspection to the raw table, but then we might miss patterns (such as skewed distributions or outliers) that can affect our model's accuracy.

import matplotlib.pyplot as plt
# dataset: the Pima Indians Diabetes DataFrame
dataset.hist(bins=50, figsize=(20, 15))
plt.show()

Source: Machine Learning: Pima Indians Diabetes
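To make the point about the table view concrete: a single summary number such as the mean can hide the shape of a distribution. In this sketch two made-up features share exactly the same mean, yet their histogram counts (computed with `numpy.histogram` over the same bins) look nothing alike:

```python
import numpy as np

# Two hypothetical features with the same mean (30) but very different shapes
features = {
    "symmetric": np.array([28, 29, 30, 30, 31, 32], dtype=float),
    "bimodal": np.array([20, 20, 20, 40, 40, 40], dtype=float),
}

summaries = {}
for name, values in features.items():
    # Same four bins for both features: [20,25), [25,30), [30,35), [35,40]
    counts, _ = np.histogram(values, bins=4, range=(20, 40))
    summaries[name] = (values.mean(), counts.tolist())
    print(name, summaries[name])
```

Looking only at the means, the two columns seem identical; the histogram immediately shows that one has no values near its own average.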

Get Started: 3 Ways to Load CSV files into Colab – Towards Data Science

To upload from your local drive, start with the following code:

from google.colab import files
uploaded = files.upload()

It will prompt you to select a file. Click “Choose Files”, then select and upload the file. Wait for the file to be 100% uploaded. You should see the name of the file once Colab has uploaded it.

Finally, type in the following code to import it into a dataframe (make sure the filename matches the name of the uploaded file).

import io
import pandas as pd
# Replace 'Filename.csv' with the exact name of the uploaded file
df2 = pd.read_csv(io.BytesIO(uploaded['Filename.csv']))

The dataset is now stored in a pandas DataFrame.

Source: Get Started: 3 Ways to Load CSV files into Colab – Towards Data Science
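`files.upload()` returns a dictionary that maps each uploaded filename to its raw bytes, which is why the snippet wraps the value in `io.BytesIO` before handing it to `pd.read_csv`. A small sketch that loads every uploaded CSV without typing the filename by hand; `uploaded_to_dataframes` is a hypothetical helper, and the `uploaded` dict (with made-up contents) is simulated so the code also runs outside Colab:

```python
import io
import pandas as pd

def uploaded_to_dataframes(uploaded):
    """Turn a {filename: bytes} dict, as returned by files.upload(), into DataFrames."""
    return {
        name: pd.read_csv(io.BytesIO(content))
        for name, content in uploaded.items()
        if name.lower().endswith(".csv")  # skip anything that is not a CSV
    }

# Outside Colab, simulate an upload with a hypothetical in-memory file
uploaded = {"Filename.csv": b"col_a,col_b\n1,2\n3,4\n"}
frames = uploaded_to_dataframes(uploaded)
df2 = frames["Filename.csv"]
print(df2.shape)  # (2, 2)
```

In Colab, the simulated dict would simply be replaced by the real `uploaded = files.upload()`.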