Heads up! To view this whole video, sign in with your Courses account or enroll in your free 7-day trial. Sign In Enroll
Well done!
You have completed Data Analysis Basics!
You have completed Data Analysis Basics!
Preview
It's not always obvious how you should analyze your data. In this video we'll analyze whether some ages are over-represented at the Boston Marathon.
Related Discussions
Have questions about this video? Start a discussion with the community and Treehouse staff.
Sign upRelated Discussions
Have questions about this video? Start a discussion with the community and Treehouse staff.
Sign up
All right, we've got our lines in.
0:00
Now it's time to find out if any
of our data crosses the line.
0:02
Let's start by creating
a table of ages and
0:06
how many runners of each age ran the race.
0:08
Let's leave a space below Max Difference
and then in row 6 let's add two labels,
0:12
Age and Count.
0:18
And let's go ahead and bold our labels.
0:22
Then, below our age label,
let's add 18, and then 19.
0:25
Then let's highlight both of those cells,
and drag down until we have an entry for
0:31
each age between the min at 18 and
the max at 84.
0:38
Next, under the count label,
0:51
we need to count how many runners
were the age reported in column A.
0:54
To do this,
we'll use the Count if function,
1:00
which counts all the values in a range,
but only if they meet certain criteria.
1:03
So, under our Count label,
let's type =CountIf and hit Enter.
1:07
And then, we need to specify
the range we are interested in,
1:14
which in this case is
the ages of all the runners.
1:17
We already have that range and
the formulas up here.
1:21
So I'll take a second to copy that.
1:24
And then down here,
looks like I lost my CountIf,
1:32
but we can bring it back,
and paste in the range.
1:36
Then, for the second parameter,
1:40
we need to check that the value is
equal to the number in column A.
1:42
And remember,
this needs to be represented as a string.
1:46
So let's type, in quotation marks,
an equal sign, and then
1:50
let's concatenate that equal sign with the
cell in column A by typing an ampersand.
1:55
And then clicking over here
to select the cell and
2:02
hit Enter to finish our account statement.
2:05
Finally let's the drag list formula
down to page 84 to give us the counts.
2:07
And notice that if we
click on this last count,
2:25
the range of ages is locked
in as C2 to C26,411.
2:30
You can get some pretty hard to
find errors if you forget to lock
2:36
in these rangers by
using the dollar signs.
2:39
So, if you want to be using the exact
same range in a bunch of different cells,
2:42
make sure you're specifying your
range with those dollar signs.
2:47
Now that we've got all our accounts,
lets go back to the top and
2:52
add another column to keep track of
the difference between the two ages.
2:55
Let's call it difference and
3:00
then quickly bold it, and
then let's skip row 7 and
3:04
then row 8, let's type =B8-B7.
3:10
Then let's drag from there all the way
to the bottom to give us all of our
3:14
differences.
3:18
We can already tell that some
of these values are over 200.
3:21
But let's try and make them standout
by using some conditional formatting.
3:26
Conditional formatting is
formatting that only happens
3:30
once a certain condition is met.
3:33
For example, if you wanted to highlight
all the positive cells in green and
3:35
all the negative cells in red,
you could use conditional formatting.
3:40
To add conditional formatting
to a Difference column,
3:43
let's select all of the differences And
3:46
then choose Format,
Conditional formatting.
3:52
On the right we have options for
our formatting.
3:57
Let's change the condition from if
Cell is not empty to if Greater than
4:00
And add a value of 200.
4:10
Finally, let's change the formatting
style to have a red background.
4:13
And there we go.
4:18
It looks like we've got one, two, three,
4:19
four instances where
the difference is more than 200.
4:24
However, we really shouldn't
be hard coding this 200 value.
4:30
It should be set to the value
of our max difference.
4:33
So, instead of 200, let's type =,
and this is cell B4 and
4:38
we're not allowed click it,
which means we have to type it in.
4:44
Well that doesn't look quite right.
4:51
What's going on here?
4:53
Well, it turns out that when we type
B4 like this, it starts with C8 and
4:54
it compares it to B4, but then it moves
on to C9 and it compares it to B5 and
5:01
C10 with B6, and so on.
5:07
If we want this to compare
each cell with B4,
5:12
we need to use dollar signs to
make this an absolute reference.
5:15
That's better.
5:23
Then let's click done, and
5:24
by our criteria, we can now answer
the question we set out to solve.
5:27
Some ages do have an easier
time qualifying than others.
5:32
Though before we move on,
5:36
we should briefly talk about
statistical significance.
5:38
Imagine you just created a new drug.
5:41
Well, before you'd be able to sell that
drug, you'd need to make sure it was safe.
5:44
You'd want to test it on
a small subset of people
5:49
before releasing it to
the whole population.
5:51
But how many people would
you need to test it on?
5:54
If you tried it on five people and
they all turned out fine,
5:57
would you say that it's safe for
the whole population?
6:00
No way, but then,
how many people do you need?
6:03
1,000?
6:07
10,000?
6:08
Either way,
at some point your sample is big enough,
6:10
that you can start to drag conclusions
about the whole population.
6:12
This is what's statistical
significance is all about.
6:17
It tells us if our findings can be safely
extrapolated to the whole population.
6:20
If you'd like to read more
about statistical significance,
6:26
check out the teacher's notes below.
6:28
Coming up we'll look at how we can
present our findings to management.
6:30
You need to sign up for Treehouse in order to download course files.
Sign upYou need to sign up for Treehouse in order to set up Workspace
Sign up