  Before we can write a classifier, we need something to classify. That is, we need a dataset.
Resources
- Iris flower dataset | Wikipedia
- load_iris() | scikit-learn Documentation
- Treehouse Workshop: Introducing Text Editors
- Which Text Editor Should I Use? | Treehouse Blog
- A Beginner’s Guide To The Windows Command Line
Python Code
from sklearn.datasets import load_iris

# Load the bundled Iris dataset, then print its three species labels.
iris = load_iris()
print(list(iris.target_names))
                      Before we can write a classifier,
we need something to classify,
                      0:00
                    
                    
                      that is, we need a data set.
                      0:03
                    
                    
                      One of the most classic data sets in all
of machine learning is the Iris data set
                      0:06
                    
                    
                      which is a set of 150 examples
of three different types of
                      0:12
                    
                    
                      Iris flowers: Setosa,
Versicolor, and Virginica.
                      0:17
                    
                    
                      In fact, the iris flower data set
even has its own Wikipedia page,
                      0:23
                    
                    
                      to which you can find a link in
the notes associated with this video.
                      0:28
                    
                    
                      The Iris flower data set is like
the Hello World program of data sets.
                      0:32
                    
                    
                      It's not meant to be used in practical
applications, but it's good for testing
                      0:38
                    
                    
                      machine learning techniques, particularly
ones that involve classification.
                      0:42
                    
                    
                      If you scroll down to the data set section
and click the show button next to data,
                      0:47
                    
                    
                      you can see that this data
set has four features.
                      0:56
                    
                    
                      The length and width of each sepal and
the length and width of each petal.
                      1:00
                    
                    
                      After these four features, there's a label,
                      1:07
                    
                    
                      which is the species of the iris flower:
                      1:13
                    
                    
                      setosa, versicolor, and virginica.
                      1:18
                    
                    
                      Each of these three labels has
50 examples in the data set for
                      1:23
                    
                    
                      a total of 150 examples.
                      1:28
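The counts described above can be checked directly once scikit-learn is installed. What follows is a minimal sketch and not part of the video's code. The names `n_examples`, `n_features`, and `counts` are chosen here for illustration, and the sketch assumes NumPy is available alongside scikit-learn.

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()

# Names of the four measured features (sepal and petal length and width).
print(iris.feature_names)

# iris.data holds one row per flower and one column per feature.
n_examples, n_features = iris.data.shape
print(n_examples, n_features)

# iris.target stores each label as an integer index into iris.target_names,
# so bincount gives the number of examples for each species.
counts = np.bincount(iris.target)
for name, count in zip(iris.target_names, counts):
    print(name, count)
```

If the dataset loaded as described, the shape should come out as 150 rows by 4 columns, with 50 examples for each of the three species.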
                    
                    
                      Let's look at another page, this one from
the Sklearn documentation,
                      1:30
                    
                    
                      which you can also find a link to in
the notes associated with this video.
                      1:36
                    
                    
                      Sklearn has a number of small datasets
                      1:40
                    
                    
                      built in to demonstrate the different
tools available in Sklearn.
                      1:44
                    
                    
                      And one of them happens to
be the Iris flower dataset.
                      1:48
                    
                    
                      This dataset is too small for
real machine learning analysis, but
                      1:53
                    
                    
                      it's still useful for testing things
out, in this case classification.
                      1:57
                    
                    
                      We're going to load this data
set into a Python program and
                      2:02
                    
                    
                      then make a new example and
try to predict the label.
                      2:05
                    
                    
                      First, open your favorite text editor.
                      2:11
                    
                    
                      In these lessons, I'm going to use Atom,
which is available on Mac and
                      2:14
                    
                    
                      PC, but any plain text
editor should work the same.
                      2:19
                    
                    
                      If you're not sure which to use, check
the notes associated with this video.
                      2:23
                    
                    
                      Next, create a new file if
you haven't already done so.
                      2:28
                    
                    
                      And save it as ml.py.
                      2:31
                    
                    
                      I already have an ml.py, but
I'm just going to save over it.
                      2:40
                    
                    
                      The ml stands for
machine learning and py means Python.
                      2:46
                    
                    
                      You can actually name the file whatever
you would like, as long as it ends in .py.
                      2:53
                    
                    
                      Make sure you remember where you're
saving this one on your computer,
                      2:57
                    
                    
                      because you need to access it
later from a command line console.
                      3:02
                    
                    
                      Now, I am going to start by
importing the Iris dataset,
                      3:07
                    
                    
                      so we'll say from sklearn.datasets and
                      3:15
                    
                    
                      then another space.
                      3:20
                    
                    
                      I'll type import and then another space,
                      3:23
                    
                    
                      and we'll type load_iris.
                      3:29
                    
                    
                      The data set isn't quite ready to use yet;
                      3:36
                    
                    
                      we have to load it and assign it to a variable
in our code, like this.
                      3:39
                    
                    
                      I'll type iris and an equal sign, and
                      3:44
                    
                    
                      then call the function, load_iris().
                      3:48
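For a sense of what `load_iris()` hands back: it returns a dictionary-like object, so its contents can be inspected before printing anything large. This is a small sketch that goes beyond what the video types, and the name `first_rows` is made up for illustration.

```python
from sklearn.datasets import load_iris

iris = load_iris()

# The returned object behaves like a dictionary; list its top-level keys.
print(sorted(iris.keys()))

# Peek at the first three examples instead of printing all 150 rows.
first_rows = iris.data[:3]
print(first_rows)
```

The exact set of keys can vary between scikit-learn versions, but `data`, `target`, and `target_names` are the ones this lesson relies on.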
                    
                    
                      Now we could print the entire data set,
but that's going to look pretty ugly
                      3:54
                    
                    
                      on the console and won't really
be all that useful to us anyway.
                      3:59
                    
                    
                      Instead, let's just print the labels,
otherwise known as target names,
                      4:04
                    
                    
                      just to make sure that we've
loaded the dataset correctly.
                      4:09
                    
                    
                      We can do that by using
the print function and
                      4:13
                    
                    
                      converting the target names
into a list like this.
                      4:17
                    
                    
                      So we'll type print and
some parentheses, and
                      4:22
                    
                    
                      inside we'll type list
which is a function.
                      4:26
                    
                    
                      And inside the list function,
                      4:31
                    
                    
                      we'll use the iris variable that
we created followed by a dot.
                      4:35
                    
                    
                      And we'll type target_names.
                      4:41
                    
                    
                      And that will list and
print out the target names or
                      4:44
                    
                    
                      the labels in the Iris dataset.
                      4:49
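`iris.target_names` is a NumPy array rather than a plain Python list, which is why the code wraps it in `list()`. Depending on the installed NumPy version, the printed list may show each label wrapped in a NumPy string type instead of as a bare string. As an optional alternative that is not used in the video, the array's `tolist()` method converts the labels to ordinary Python strings. The name `labels` is chosen here for illustration.

```python
from sklearn.datasets import load_iris

iris = load_iris()

# The version from the video: wrap the NumPy array in a list and print it.
print(list(iris.target_names))

# Optional alternative: tolist() converts each label to a built-in str.
labels = iris.target_names.tolist()
print(labels)
```

Either form is enough to confirm that the dataset loaded; the second just keeps the console output free of NumPy type wrappers.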
                    
                    
                      Now make sure you've typed everything
carefully and then save the file.
                      4:53
                    
                    
                      Now go back to Anaconda Navigator and
                      4:59
                    
                    
                      make sure you're in your machine
learning basics environment.
                      5:04
                    
                    
                      And click the play button,
and choose Open Terminal.
                      5:08
                    
                    
                      We could use the interactive
Python command line, but
                      5:15
                    
                    
                      using the terminal will be a little
easier for running files like this.
                      5:18
                    
                    
                      If you're on Windows, your terminal will
obviously look different than on a Mac.
                      5:24
                    
                    
                      But the general principles
should remain the same.
                      5:29
                    
                    
                      Next, you'll need to navigate to
the directory where you stored your file.
                      5:33
                    
                    
                      So in my case, I know it's in my home
directory inside my Dropbox folder,
                      5:38
                    
                    
                      under treehouse, courses,
machine learning, basics.
                      5:47
                    
                    
                      So now I've changed to that directory,
and I will list out its contents.
                      5:53
                    
                    
                      And like I said, this is a little
different on Mac and Windows.
                      5:59
                    
                    
                      So if you do need some additional help,
pause this video and check out the notes.
                      6:03
                    
                    
                      Once you've navigated to the folder
where your Python file is saved,
                      6:09
                    
                    
                      type the word python followed by a space,
                      6:14
                    
                    
                      followed by the name of your program,
ml.py and then hit enter.
                      6:19
                    
                    
                      You should see the three labels
in the data set, setosa,
                      6:28
                    
                    
                      versicolor and virginica.
                      6:32
                    
                    
                      If you get an error, go
back to your code and
                      6:35
                    
                    
                      make sure it's exactly the same as mine.
                      6:38
                    
                    
                      It's easy to miss a parenthesis or
make a small typo, so check carefully.
                      6:42
                    
                    
                      If you need help, check out the notes
for this video for the exact code.
                      6:47
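If the error says that `sklearn` cannot be found, the terminal is probably not using the environment where scikit-learn is installed. One quick way to check, offered here as a suggestion and not as a step from the video, is to run a tiny script like the one below from the same terminal. The file name `check_env.py` and the variable names are hypothetical.

```python
import sys

import sklearn

# Path of the Python interpreter that is running this script.
interpreter = sys.executable
print(interpreter)

# Installed scikit-learn version; reaching this line means the import worked.
version = sklearn.__version__
print(version)
```

Save it as `check_env.py` next to `ml.py` and run it with `python check_env.py`. If the import fails here too, the problem is with the environment and not with the code in `ml.py`.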
                    
                    
                      Great. Now that we've loaded a dataset,
we'll use it next to make predictions.
                      6:53
                    
              