Welcome to the Treehouse Community
Want to collaborate on code errors? Have bugs you need feedback on? Looking for an extra set of eyes on your latest project? Get support with fellow developers, designers, and programmers of all backgrounds and skill levels here with the Treehouse Community! While you're at it, check out some resources Treehouse students have shared here.
Looking to learn something new?
Treehouse offers a seven day free trial for new students. Get access to thousands of hours of content and join thousands of Treehouse students and alumni in the community today.
Start your free trialmichaelmugo
3,971 PointsI don't understand the combination of JOIN with GROUP BY
So I understand how GROUP BY works to combine result-sets by columns, and it makes sense coming from the same table: reviews
.
SELECT movie_id, MIN(score) AS minimum_score, MAX(score) AS maximum_score,
AVG(score) AS average,
FROM reviews GROUP BY movie_id;
But later on in the video, we SELECT title to show the results in a more human-understandable form:
SELECT title, MIN(score) AS minimum_score, MAX(score) AS maximum_score,
AVG(score) AS average,
FROM movies JOIN reviews ON movies.id = reviews.movie_id
GROUP BY movie_id;
How exactly are the numeric functions still able to get their data, isn't the movies
table being used where there's no score
?
Maybe I'm not understanding it properly, but as far as I've learnt the numeric functions use the data using the FROM some_given_table
1 Answer
LaVaughn Haynes
12,397 PointsIt looks like you are joining 2 tables: Movies and Reviews. The Movie table has titles and the Reviews table has scores. When you join 2 tables it's like you create a virtual 3rd table. So if you had the tables
Movies
ID | TITLE
-------------------------
1 | The Avengers
-------------------------
2 | The Happening
Reviews
ID | SCORE
-------------------------
1 | 100
-------------------------
2 | 50
If you join them
SELECT title, score FROM movies JOIN reviews ON movies.id = reviews.id
Then you really get a table like this as the result
TITLE | SCORE
-------------------------------------
The Avengers | 100
-------------------------------------
The Happening | 50
So in the original SQL statement from your question, the MAX(), MIN(), AVG() functions are not operating on data from the movies table, it's operating on the data from the reviews table. You just have the data from both tables being queried and returned to that 3rd virtual result table in the same statement. It might be less confusing for you if you used the more specific dot syntax names in your query like this:
SELECT movies.title, MIN(reviews.score) AS minimum_score, MAX(reviews.score) AS maximum_score...
I don't know if that makes sense to you or not?
michaelmugo
3,971 Pointsmichaelmugo
3,971 PointsSo, the numeric functions are dealing with the whole result-set and not just the
movies
table?LaVaughn Haynes
12,397 PointsLaVaughn Haynes
12,397 PointsI would say more that the numeric functions are acting on the rows of the reviews table and pushing that into the result set
SELECT movies.title, AVG(reviews.score)
So above I'm saying, "give me the title from the first table, and an average of all of the scores from the second table"
The only reason that you can write AVG(sore) instead of just AVG(reviews.score) is because there is no score column in the movies table, so you don't have to be specific. But honestly I'm always specific because I feel like it's more clear if I have to look at a query months down the line.