Understand what cleaning data means.
Terms
- Data Cleaning is the process of fixing any errors or mistakes in a dataset.
[MUSIC]
0:00
Hi, my name is Megan, and
I'm a teacher here at Treehouse.
0:09
In this course, I'll teach you
how to prepare data for analysis.
0:13
It's unlikely you will always get
a perfect dataset without any mistakes,
0:18
errors, or missing information.
0:23
You will most likely need to clean the
data in order to prepare it for analysis.
0:25
An analysis is only as
good as your data is.
0:31
Data cleaning is the process of fixing
any errors or mistakes in a dataset.
0:38
You've probably seen data in one
of its common forms, like a table.
0:45
The columns inform you of the type
of data this table contains, and
0:49
the rows hold data points.
0:54
For example,
here is a table filled with Pokemon.
0:56
This table is clean because all of
the data types are the same, and
1:01
there isn't any missing information.
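As a rough sketch of what "clean" means here, assuming pandas and a small made-up table (the column names and values are illustrative, not the table shown in the video), one could check that nothing is missing and that the numeric columns really hold floats:

```python
import pandas as pd

# Hypothetical table; column names and values are illustrative only.
clean = pd.DataFrame({
    "name": ["Bulbasaur", "Ekans", "Pikachu"],
    "height_m": [0.7, 2.0, 0.4],
    "weight_kg": [6.9, 6.9, 6.0],
})

# True if any cell anywhere in the table is missing.
has_missing = bool(clean.isna().any().any())

# Numeric columns should load as floats, not as generic "object" text.
numeric_ok = all(
    pd.api.types.is_float_dtype(clean[col]) for col in ["height_m", "weight_kg"]
)

print(has_missing, numeric_ok)
```

A column that mixes numbers with text such as "15 lbs" would typically load as text instead of float, which is one quick signal that cleaning is needed.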
1:06
Now let's look at a dirty
version of this table.
1:09
This table now has instances of missing
data, like in the first row and the Ekans row.
1:13
There is also data in the incorrect format:
one is written in feet and inches,
1:19
another includes the notation for
pounds and
1:24
is also a whole number instead
of a decimal or float.
1:27
And lastly, this one's items are broken
up by dashes instead of commas.
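The format problems described above could be normalized roughly as follows. This is a minimal sketch, not the course's own solution: the sample values, the column names, the helpers `height_to_meters` and `weight_to_kg`, and the choice of meters and kilograms as target units are all assumptions made for illustration.

```python
import re

import pandas as pd

# Hypothetical dirty values mirroring the problems described above.
dirty = pd.DataFrame({
    "height": ["0.7", "6'7\"", "0.4"],   # one value in feet and inches
    "weight": ["6.9", "15 lbs", "6"],    # one with "lbs", one whole number
    "items": ["a, b", "c-d", "e"],       # one list split by a dash
})


def height_to_meters(text):
    """Convert a value like 6'7" to meters; otherwise parse it as a float."""
    match = re.fullmatch(r"(\d+)'\s*(\d+)\"", text.strip())
    if match:
        feet, inches = int(match.group(1)), int(match.group(2))
        return round(feet * 0.3048 + inches * 0.0254, 2)
    return float(text)


def weight_to_kg(text):
    """Convert a value like '15 lbs' to kilograms; otherwise parse as float."""
    text = text.strip()
    if text.endswith("lbs"):
        return round(float(text[:-3]) * 0.45359237, 2)
    return float(text)


tidy = pd.DataFrame({
    "height_m": dirty["height"].map(height_to_meters),
    "weight_kg": dirty["weight"].map(weight_to_kg),
    # Use the same separator everywhere: a comma followed by a space.
    "items": dirty["items"].str.replace("-", ", ", regex=False),
})

print(tidy)
```

Parsing every value through `float` also turns the whole number into a decimal, so the column ends up with a single numeric type.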
1:31
While cleaning data, you'll need to make
decisions about whether to discard rows,
1:36
which format to use for
the column, and more.
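One way to make a discard decision explicit, sketched with pandas on a made-up table (the names and values are assumptions), is to drop incomplete rows and keep a count of what was removed:

```python
import pandas as pd

# Hypothetical table with gaps; None marks a missing value.
table = pd.DataFrame({
    "name": [None, "Ekans", "Pikachu"],
    "weight_kg": [6.9, None, 6.0],
})

# Discard every row that has at least one missing value.
dropped = table.dropna()

# Record how much data the decision cost, so it can be reported later.
rows_removed = len(table) - len(dropped)
print(f"Dropped {rows_removed} of {len(table)} rows with missing values")
```

Whether dropping is acceptable depends on the dataset; the count gives a concrete number to mention alongside the analysis.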
1:41
Depending on the amount of cleaning you
need to do, these decisions will be
1:44
important to share with stakeholders
when you share your analysis.
1:49
We'll be working with spreadsheets
using Google Sheets and
1:53
then Python's pandas library.
1:56
If you aren't familiar with either topic,
1:58
I would suggest taking the prerequisites
for this course before continuing.
2:01
Throughout the course, don't forget to
check the teacher's notes below each video
2:05
for additional information.
2:10
Let's dig in.
2:11