Welcome to the Treehouse Community
    Mark Chesney
11,747 Points
ValueError: could not convert string to float: 'sepal_length'
Hi. Ken's code executes perfectly, while my code returns this error:
from itertools import groupby
import csv
import matplotlib.pyplot as plt
input_file = "data/iris.csv"
with open(input_file, 'r') as iris_data:
    irises = list(csv.reader(iris_data))
colors = {"Iris-setosa": "#2B5B84", "Iris-versicolor": "g", "Iris-virginica": "purple"}
irises.pop()  # because the list includes an extra unneeded item
for species, group in groupby(irises, lambda i: i[4]):
    categorized_irises = list(group)
    sepal_lengths = [float(iris[0]) for iris in categorized_irises]
    sepal_widths = [float(iris[1]) for iris in categorized_irises]
    plt.scatter(sepal_lengths, sepal_widths, s=10, c=colors[species], label=species)  # s=10 sets the marker size
-------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-2-106afecb7f6d> in <module>()
     16 
     17     categorized_irises = list(group)
---> 18     sepal_lengths = [float(iris[0]) for iris in categorized_irises]
     19     sepal_widths = [float(iris[1]) for iris in categorized_irises]
     20     plt.scatter(sepal_lengths,sepal_widths,s=10,c=colors[species],label=species)
ValueError: could not convert string to float: 'sepal_length'
For reference, there's a similar thread, but the responses there unfortunately did not solve my error.
Thanks in advance, everyone!
 
    Mark Chesney
11,747 Points
Hi Cheo, I updated the post with the error I'm seeing.
 
    Cameron Stewart
18,050 Points
Try removing the last row of the data set AND the first row (the header):
irises.pop()   # last row
irises.pop(0)  # first row (header)
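A minimal sketch of Cameron's suggestion, assuming the file has exactly one header row at the top and one blank line at the end (the blank line is what the original `irises.pop()` comment calls the "extra unneeded item"). The function names `load_irises` and `sepal_measurements`, and the use of `io.StringIO` for sample data, are illustrative only; the plotting call is left out so the parsing can be checked on its own.

```python
import csv
import io
from itertools import groupby


def load_irises(text):
    """Parse iris CSV text, dropping the header row and a trailing blank row."""
    irises = list(csv.reader(io.StringIO(text)))
    irises.pop()   # last row: the empty line at the end of the file
    irises.pop(0)  # first row: the header ('sepal_length', ...)
    return irises


def sepal_measurements(irises):
    """Map each species to its (sepal_lengths, sepal_widths) lists."""
    result = {}
    for species, group in groupby(irises, lambda i: i[4]):
        categorized_irises = list(group)
        sepal_lengths = [float(iris[0]) for iris in categorized_irises]
        sepal_widths = [float(iris[1]) for iris in categorized_irises]
        result[species] = (sepal_lengths, sepal_widths)
    return result


# Hypothetical sample mimicking a file with a header and a blank last line.
sample = (
    "sepal_length,sepal_width,petal_length,petal_width,species\n"
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "4.9,3.0,1.4,0.2,Iris-setosa\n"
    "7.0,3.2,4.7,1.4,Iris-versicolor\n"
    "\n"
)
print(sepal_measurements(load_irises(sample)))
```

Each `(sepal_lengths, sepal_widths)` pair can then go straight into `plt.scatter(...)` as in the original loop. If your copy of the file has no blank last line, the first `pop()` would throw away a real data row, so it is worth checking the file before relying on fixed positions.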
4 Answers
 
    ewelina krawczak
5,707 Points
Hi again!
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
4.6,3.1,1.5,0.2,Iris-setosa
Those are the first 4 lines of my CSV file, with a comma separating the "columns". "Iris-setosa" has index 4, i.e. it's in the 5th column of the CSV. Does your file look the same?
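To illustrate the layout described above, a small sketch that parses the four sample lines from the post with `csv.reader` (which accepts any iterable of strings). Each row splits into five fields, the species sits at index 4, and the first two fields convert cleanly with `float()` because this sample has no header row.

```python
import csv

# The four sample lines from the post; note that there is no header row.
lines = [
    "5.1,3.5,1.4,0.2,Iris-setosa",
    "4.9,3.0,1.4,0.2,Iris-setosa",
    "4.7,3.2,1.3,0.2,Iris-setosa",
    "4.6,3.1,1.5,0.2,Iris-setosa",
]

rows = list(csv.reader(lines))
for row in rows:
    # Indices 0-3 are measurements; index 4 is the species.
    print(len(row), row[4], float(row[0]), float(row[1]))
```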
I would recommend the following. Just after:
with open(input_file, 'r') as iris_data:
    irises = list(csv.reader(iris_data))
check what irises contains, line by line:
for i in irises:
    print(i)
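Building on that suggestion, a sketch that prints only the first and last few rows together with their indices, so a header at index 0 or an empty trailing row stands out without scrolling through all 150 lines. The helper `show_rows` and its `limit` parameter are hypothetical, and the sample text stands in for the real file.

```python
import csv
import io


def show_rows(irises, limit=3):
    """Print the first and last `limit` rows with their indices; return the indices shown."""
    n = len(irises)
    first = set(range(min(limit, n)))
    last = set(range(max(n - limit, 0), n))
    shown = sorted(first | last)
    for i in shown:
        print(i, irises[i])
    return shown


# Hypothetical stand-in for iris.csv: a header, three data rows, a blank line.
sample = (
    "sepal_length,sepal_width,petal_length,petal_width,species\n"
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "4.9,3.0,1.4,0.2,Iris-setosa\n"
    "7.0,3.2,4.7,1.4,Iris-versicolor\n"
    "\n"
)
irises = list(csv.reader(io.StringIO(sample)))
show_rows(irises, limit=2)
```

With this sample, index 0 prints the column names and the last index prints `[]`, which are exactly the two rows the `float()` loop cannot handle.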
 
    Mark Chesney
11,747 Points
Yes, mine matches your results. Thanks anyway for your help; I really appreciate it.
 
    Mark Chesney
11,747 Points
Hi Ewelina, my problem is that the header row is caught in the loop's first iteration:
>>> irises[0]
['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species']
sepal_lengths = [float(iris[0]) for iris in categorized_irises]
ValueError: could not convert string to float: 'sepal_length'
I'm wondering why, in the instructor's code, groupby seems to keep this header row out of the loop, while my code treats the header row as data.
I tried removing the header row, but of course, it's needed the way Ken writes the loop. If anyone knows another workaround, please share; I'm completely stumped!
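Some background on why the header reaches the loop: `itertools.groupby` does not filter anything. It starts a new group each time the key changes between consecutive items. With the key `lambda i: i[4]`, a header row becomes a one-row group keyed `'species'`, and that group is processed first. Presumably Ken's copy of the file, like ewelina's sample, simply has no header row, so there is nothing for groupby to skip. A sketch with made-up rows:

```python
from itertools import groupby

# Hypothetical rows: a header followed by data sorted by species.
irises = [
    ["sepal_length", "sepal_width", "petal_length", "petal_width", "species"],
    ["5.1", "3.5", "1.4", "0.2", "Iris-setosa"],
    ["4.9", "3.0", "1.4", "0.2", "Iris-setosa"],
    ["7.0", "3.2", "4.7", "1.4", "Iris-versicolor"],
]

# groupby only merges *consecutive* rows that share a key.
groups = [(species, len(list(group))) for species, group in groupby(irises, lambda i: i[4])]
print(groups)  # [('species', 1), ('Iris-setosa', 2), ('Iris-versicolor', 1)]
```

Because the header's group comes first, `float('sepal_length')` is the very first conversion attempted, which matches the traceback. Also note that groupby relies on the rows already being sorted by species; an unsorted file would produce several separate groups for the same species.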
 
    ewelina krawczak
5,707 Points
Have you checked the CSV file and its structure? Does it have the last unnecessary field? Maybe the separator is different? Have you tried looping through the irises list to check that it runs without a problem and returns all the lines with the data in the correct order?
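One way to answer these questions without a reference file is to summarize the CSV's structure programmatically: count the fields in each row, check whether the first row is numeric, and list any empty rows. A sketch; `describe_csv` and `is_numeric` are hypothetical helpers, and the header check is only a heuristic (a first field that does not parse as a number).

```python
import csv
import io
from collections import Counter


def is_numeric(field):
    """Return True if the field parses as a float."""
    try:
        float(field)
    except ValueError:
        return False
    return True


def describe_csv(text, delimiter=","):
    """Summarize a CSV's shape: row count, fields per row, header guess, blank rows."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    return {
        "rows": len(rows),
        # How many rows have each field count; a clean iris file gives {5: n}.
        "field_counts": Counter(len(row) for row in rows),
        "has_header": bool(rows) and bool(rows[0]) and not is_numeric(rows[0][0]),
        "blank_rows": [i for i, row in enumerate(rows) if not row],
    }


sample = (
    "sepal_length,sepal_width,petal_length,petal_width,species\n"
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "\n"
)
print(describe_csv(sample))
```

A different separator shows up in `field_counts`: a semicolon-separated file read with the default comma delimiter yields one field per row instead of five.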
 
    Mark Chesney
11,747 Points
Thanks for your questions, ewelina krawczak. Checking the CSV is a fine idea, but I need a benchmark to check it against. I'm not sure whether mine has the "last unnecessary field"; if it doesn't, I won't know what it looks like. Same with the separator: if your separator makes the code work, I'd love to see what it is. Otherwise I won't know what "different" would look like.
I found the Iris data on GitHub. Someone else asked in a different thread where to get it, too.
Here's some code from scikit-learn's website that I looked at for obtaining the iris data (I don't believe I used it):
from sklearn import datasets
iris = datasets.load_iris()
 
    Mustafa Başaran
28,046 Points
Hi Mark,
I've seen your other thread as well. The code below works fine in my local environment (Jupyter Notebook on Anaconda 1.8.7). The input_file path will of course differ depending on where you store the iris.csv file.
import csv
import matplotlib.pyplot as plt
from itertools import groupby 
input_file = "/Users/mustafabasaran/Desktop/iris.csv"
with open(input_file, 'r') as iris_data:
    irises = list(csv.reader(iris_data))
colors = {"Iris-setosa": "#2B5B84", "Iris-versicolor": "g", "Iris-virginica": "purple"}
irises.pop()
for species, group in groupby(irises, lambda i:i[4]):
    categorized_irises = list(group)
    sepal_lengths = [float(iris[0]) for iris in categorized_irises]
    sepal_widths = [float(iris[1]) for iris in categorized_irises]
    plt.scatter(sepal_lengths,sepal_widths,s=10,c=colors[species],label=species)
plt.title("Iris Data Set", fontsize=12)
plt.xlabel("sepal length (cm)",fontsize=10)
plt.ylabel("sepal width (cm)",fontsize=10)
plt.legend(loc="upper right")
plt.show()
I hope this helps.
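A likely reason the same code behaves differently on two machines is that the two iris.csv files differ: a copy whose first line is a header (like the `irises[0]` output shown earlier in the thread) trips the `float()` conversion, while a headerless copy (like ewelina's) does not. A header-tolerant sketch that keeps only rows whose first field is numeric, so it works with either file. This filtering approach and the name `numeric_rows` are assumptions, not the course's code.

```python
import csv
import io
from itertools import groupby


def numeric_rows(rows):
    """Keep rows whose first field parses as a float (drops headers and blank rows)."""
    kept = []
    for row in rows:
        if not row:
            continue  # blank line, e.g. at the end of the file
        try:
            float(row[0])
        except ValueError:
            continue  # header or other non-data row
        kept.append(row)
    return kept


# Hypothetical versions of the file, with and without a header row.
with_header = (
    "sepal_length,sepal_width,petal_length,petal_width,species\n"
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "\n"
)
without_header = "5.1,3.5,1.4,0.2,Iris-setosa\n"

for text in (with_header, without_header):
    irises = numeric_rows(csv.reader(io.StringIO(text)))
    for species, group in groupby(irises, lambda i: i[4]):
        print(species, [float(iris[0]) for iris in group])
```

Unlike `pop()` and `pop(0)`, this does not assume where the extra rows sit, at the cost of silently skipping any malformed data row.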
 
    Mark Chesney
11,747 Points
Thanks, Mustafa. Your code is functionally identical to mine, so unfortunately the error persists :|
 
    ewelina krawczak
5,707 Points
"I tried removing the header row, but of course, it's needed the way Ken writes the loop"
Why do you think the header row is needed?
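The loop only reads positions 0, 1 and 4, so it never uses the header's text. If you would rather keep the header and make use of it, `csv.DictReader` consumes the first row as field names instead of returning it as data, and it also skips blank lines. A sketch assuming the column names shown earlier in the thread (`sepal_length`, `sepal_width`, `species`); the sample text is made up.

```python
import csv
import io
from itertools import groupby

sample = (
    "sepal_length,sepal_width,petal_length,petal_width,species\n"
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "7.0,3.2,4.7,1.4,Iris-versicolor\n"
    "\n"
)

# DictReader turns the first row into keys, so it is never treated as data,
# and it skips the blank last line, so no pop() calls are needed.
irises = list(csv.DictReader(io.StringIO(sample)))

sepal_lengths_by_species = {}
for species, group in groupby(irises, lambda i: i["species"]):
    sepal_lengths_by_species[species] = [float(iris["sepal_length"]) for iris in group]

print(sepal_lengths_by_species)
```

The trade-off is that this version requires the header: a headerless copy of the file would have its first data row consumed as field names.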
    Cheo R
37,150 Points
Which error are you getting?