APIs are all around us on the web. Sometimes we can use scraping techniques to interact with them in a meaningful way.
Back in the good old days of the Internet,
if we wanted data,
0:00
we had to view it on Web pages.
0:03
Now, however, many sites provide
a Web API that shares their data.
0:05
Sometimes, we can use these APIs
to directly access information,
0:11
without having to scrape the data.
0:15
I'd recommend checking whether
the site you want to scrape
0:18
offers an API for
the information you need.
0:21
It can be a big time saver.
0:24
Let's take a look at how we can get data
from the World Bank, using their API.
0:27
There are many instances
when using an API is great.
0:32
Sometimes, though, scraping results
from an API is useful as well,
0:36
especially if the API
documentation isn't super helpful.
0:41
Let's take a brief look at one
technique we can use to get and
0:45
process data from an API.
0:49
In this case,
we'll look at The World Bank API.
0:51
It's actually very well documented, which
provides us with some extra knowledge
0:54
as we go about trying to scrape things.
0:59
If we look here, at
the Developer Information overview page,
1:01
it provides information about how to
get started, and what the API provides.
1:04
Let's look here,
1:09
at the Country Queries section, to see
what information we might explore there.
1:10
It looks like we could use this
information to get some generic
1:15
information about
the countries of the world.
1:18
For example, if we wanted to do some
high-level data exploration about
1:21
income level in regions of the world,
let's use this request format here,
1:25
look through some ISO codes, and get
some information that we could explore.
1:31
We won't be doing any actual
exploration of data in this course, but
1:35
check the teachers' notes for
more information.
1:40
Let's take a look at the information we
get from a country with a lot of horses,
1:42
like Ethiopia.
1:46
I know their ISO code is ETH, so
let's put that into the request format.
1:47
So we can copy this. Let's create
1:53
a new tab, and we'll do ETH.
1:58
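The same substitution done in the browser can be sketched in Python with ordinary string formatting. This is only an illustration: the scheme and `api.` host prefix in `BASE_URL` and the helper name `country_url` are assumptions, since the video only reads out the `worldbank.org/v2/countries/` portion of the request format.

```python
# Assumed base URL: the video only reads out "worldbank.org/v2/countries/".
BASE_URL = "http://api.worldbank.org/v2/countries/{}"


def country_url(country_code):
    """Return the country-query URL for an ISO code such as 'ETH'."""
    # strip() guards against stray whitespace around a pasted code
    return BASE_URL.format(country_code.strip())


print(country_url("ETH"))
```

Keeping the format string in one constant means only the country code changes from request to request.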
It looks like we're getting back the same
information as the documentation stated,
2:02
and it's in XML format.
2:06
That's great, we can handle that, we'll
use Beautiful Soup to parse this XML,
2:08
and get the name, region,
and income level.
2:13
This could be used, for
2:16
example, to generate a histogram chart of
regions of the world and income levels.
2:18
Lots of options for
data visualization, here.
2:23
Let's go back to our code, and
create a new world_bank.py file.
2:26
We don't need it inside the spider.
2:31
world_bank.py, and
we'll start with our imports.
2:38
So, from urllib.request import urlopen.
2:42
We're going back to Beautiful Soup,
so from bs4 import BeautifulSoup, and
2:47
we'll be using a csv file of ISO codes,
so we'll want to import csv as well.
2:55
Let's define a function to get
the country information, get_country,
3:02
and we'll pass in our country code, and
3:08
just like we've done with Beautiful Soup
in the past, we define our HTML string.
3:13
It's urlopen, and
3:19
it's that request format string
that we saw just a moment ago,
3:23
worldbank.org/v2/countries/, and
we'll use the string formatter,
3:28
country_code, and
let's bring this down to a new line.
3:39
Next, we define our soup object.
3:44
So, we pass in our HTML,
and for our parser,
3:48
since we're dealing with XML,
we can use an XML parser.
3:52
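Put together, the steps so far might look roughly like the sketch below. It assumes the `lxml` package is installed (Beautiful Soup's `"xml"` parser depends on it), that `BASE_URL` is the full form of the request format, and that a helper named `make_soup` is acceptable; none of these are confirmed by the video. The sample markup is hand-written so the parsing step can be tried without a network call.

```python
from urllib.request import urlopen

from bs4 import BeautifulSoup

# Assumed base URL: the video only reads out "worldbank.org/v2/countries/".
BASE_URL = "http://api.worldbank.org/v2/countries/{}"


def make_soup(xml_markup):
    """Parse XML markup with Beautiful Soup's XML parser (requires lxml)."""
    return BeautifulSoup(xml_markup, "xml")


def get_country(country_code):
    """Fetch one country's record and return it as a soup object."""
    html = urlopen(BASE_URL.format(country_code))
    return make_soup(html)


# Hand-written sample for offline experimenting, not real API output.
SAMPLE = (
    '<wb:countries xmlns:wb="http://www.worldbank.org">'
    '<wb:country id="ETH"><wb:name>Ethiopia</wb:name></wb:country>'
    "</wb:countries>"
)

soup = make_soup(SAMPLE)
print(soup.find("wb:country")["id"])
```

Calling `get_country("ETH")` would perform the real request; the sample string keeps the parsing step testable on its own.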
Scraping XML is pretty
straightforward with Beautiful Soup.
3:58
If we look at the results we got for
Ethiopia, we want to get three fields,
4:02
wb:name, wb:region, and
the wb:incomeLevel.
4:07
Let's go ahead and define those.
4:13
country_name is soup.find('wb:name'),
4:16
region is soup.find('wb:region'),
4:26
and income_level, soup.find(
4:31
'wb:incomelevel' ), and
it was all lowercase.
4:36
Now, let's print that information out.
4:43
Here's a good example of a time when
we can use the get_text method.
4:45
get_text, and we'll print the region,
4:52
get_text, and the income_level.
4:57
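As a rough sketch of the lookup and printing step, the code below runs `find` and `get_text` against a hand-written document. The region and income values are placeholders rather than real API output, `extract_fields` is a hypothetical helper name, and the `lxml` package is assumed to be installed. The tag is spelled `wb:incomeLevel` in camel case, which is the spelling the video settles on later.

```python
from bs4 import BeautifulSoup

# Hand-written sample shaped like the three fields named in the video.
# The region and income values are placeholders, not real API output.
SAMPLE = """
<wb:countries xmlns:wb="http://www.worldbank.org">
  <wb:country id="ETH">
    <wb:name>Ethiopia</wb:name>
    <wb:region>Placeholder region</wb:region>
    <wb:incomeLevel>Placeholder income level</wb:incomeLevel>
  </wb:country>
</wb:countries>
""".strip()


def extract_fields(soup):
    """Return (name, region, income level) as plain strings."""
    country_name = soup.find("wb:name")
    region = soup.find("wb:region")
    income_level = soup.find("wb:incomeLevel")
    return (
        country_name.get_text(),
        region.get_text(),
        income_level.get_text(),
    )


soup = BeautifulSoup(SAMPLE, "xml")
name, region, income = extract_fields(soup)
print(name)
print(region)
print(income)
```

`get_text` returns only the text inside each element, which is all that is needed for printing or for later export.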
Now, we can loop through the ISO codes,
and pass them to our get_country function.
5:06
So, if __name__ == '__main__':
5:12
Let's bring that up on
the screen a little bit,
5:19
I've included a file of ISO codes
that we can open up and read.
5:22
So, file, country_code,
5:27
oop, country_iso_codes.csv,
5:33
want to read that.
5:38
Now, iso_codes, then, will be our reader,
File, and our delimiter is ",".
5:43
Now, we can loop through our file,
and get our information.
5:53
for code in iso_codes, and we want to
pass in our code into our get_country
5:58
method, and
we want the first one from the list.
6:03
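The reading loop can be sketched without the course file. The layout of `country_iso_codes.csv` is not shown in the transcript, so this example assumes the ISO code sits in the first column, uses made-up rows in an in-memory file, and introduces a hypothetical helper named `first_column_codes`.

```python
import csv
import io


def first_column_codes(csv_file):
    """Yield the first field of every non-empty row in an open CSV file."""
    iso_codes = csv.reader(csv_file, delimiter=",")
    for code in iso_codes:
        if code:  # csv.reader returns an empty list for blank lines
            yield code[0]


# Stand-in for open("country_iso_codes.csv"); these rows are made up.
sample_file = io.StringIO("ETH,Ethiopia\nKEN,Kenya\n")

for iso_code in first_column_codes(sample_file):
    # In world_bank.py this is where get_country(iso_code) would be called.
    print(iso_code)
```

Each row from `csv.reader` is a list of strings, which is why the code takes index 0 before passing the value on.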
Now, we can run world_bank.
6:12
And it looks like I made
a mistake back up here,
6:18
it wasn't all lowercase,
it's actually incomeLevel.
6:21
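The mistake comes down to XML tag names being case-sensitive, and Beautiful Soup's XML parser preserves the case it finds. A small sketch, assuming `lxml` is installed and using a hand-written element with a placeholder value:

```python
from bs4 import BeautifulSoup

# Hand-written sample; the income value is a placeholder.
SAMPLE = (
    '<wb:country xmlns:wb="http://www.worldbank.org">'
    "<wb:incomeLevel>Placeholder income level</wb:incomeLevel>"
    "</wb:country>"
)
soup = BeautifulSoup(SAMPLE, "xml")

# The all-lowercase spelling finds nothing, so find() returns None.
lowercase_match = soup.find("wb:incomelevel")
# The camel-case spelling matches the element as written in the XML.
camelcase_match = soup.find("wb:incomeLevel")

print(lowercase_match is None)
print(camelcase_match.get_text())
```

Calling `get_text` on the `None` returned by the lowercase lookup raises an `AttributeError`, so checking for `None` first is one way to spot a misspelled tag name early.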
Let's try it again, and
we get all of our expected data.
6:25
Again, we could do something else here,
6:30
like saving the information
to a csv file or database.
6:32
Check the teachers' notes for
more resources on that.
6:36
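As one possible follow-up, the three fields could be written out with the standard `csv` module. This is a sketch under assumptions: the column names, the helper `write_countries`, and the sample row are all made up for illustration, and an in-memory buffer stands in for a real output file.

```python
import csv
import io


def write_countries(csv_file, rows):
    """Write a header line followed by one line per country tuple."""
    writer = csv.writer(csv_file)
    writer.writerow(["name", "region", "income_level"])
    writer.writerows(rows)


# Placeholder row; real values would come from get_text() on each field.
rows = [("Ethiopia", "Placeholder region", "Placeholder income level")]

# Stand-in for open("countries.csv", "w", newline="").
buffer = io.StringIO()
write_countries(buffer, rows)
print(buffer.getvalue())
```

With a real file, passing `newline=""` to `open` lets the `csv` module control line endings itself.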