Unashe Mutambashora
3,433 Points

No module found error
This is the code I entered in the Spyder IDE:
```python
import scrapy


class HorseSpider(scrapy.Spider):
    name = "ike"

    def start_requests(self):
        urls = [
            "https://treehouse-projects.github.io/horse-land/index.html",
            "https://treehouse-projects.github.io/horse-land/mustang.html",
        ]
        return [scrapy.Request(url=url, callback=self.parse) for url in urls]

    def parse(self, response):
        url = response.url
        page = url.split("/")[-1]
        filename = "horses-{}".format(page)
        print("URL is: {}".format(url))
        with open(filename, "wb") as file:
            file.write(response.body)
        print("Saved file {}".format(filename))
```
But this is the output I am getting in the terminal. Please help
```
C:\Users\User\Desktop\ScrapyTests\AraneaSpyder\AraneaSpyder\spiders>scrapy crawl ike
2021-01-07 20:13:18 [scrapy.utils.log] INFO: Scrapy 2.4.1 started (bot: AraneaSpyder)
2021-01-07 20:13:18 [scrapy.utils.log] INFO: Versions: lxml 4.6.2.0, libxml2 2.9.10, cssselect 1.1.0, parsel 1.5.2, w3lib 1.21.0, Twisted 20.3.0, Python 3.8.5 (default, Sep 3 2020, 21:29:08) [MSC v.1916 64 bit (AMD64)], pyOpenSSL 20.0.1 (OpenSSL 1.1.1i 8 Dec 2020), cryptography 3.3.1, Platform Windows-10-10.0.18362-SP0
2021-01-07 20:13:18 [scrapy.utils.log] DEBUG: Using reactor: twisted.internet.selectreactor.SelectReactor
2021-01-07 20:13:18 [scrapy.crawler] INFO: Overridden settings:
{'BOT_NAME': 'AraneaSpyder',
'NEWSPIDER_MODULE': 'AraneaSpyder.spiders',
'ROBOTSTXT_OBEY': True,
'SPIDER_MODULES': ['AraneaSpyder.spiders']}
2021-01-07 20:13:18 [scrapy.extensions.telnet] INFO: Telnet Password: 14ae6acfbbea13ed
2021-01-07 20:13:18 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.logstats.LogStats']
Unhandled error in Deferred:
2021-01-07 20:13:19 [twisted] CRITICAL: Unhandled error in Deferred:
Traceback (most recent call last):
File "C:\Users\User\anaconda3\envs\Scrapy enviroment\lib\site-packages\scrapy\crawler.py", line 192, in crawl
return self._crawl(crawler, *args, **kwargs)
File "C:\Users\User\anaconda3\envs\Scrapy enviroment\lib\site-packages\scrapy\crawler.py", line 196, in _crawl
d = crawler.crawl(*args, **kwargs)
File "C:\Users\User\anaconda3\envs\Scrapy enviroment\lib\site-packages\twisted\internet\defer.py", line 1613, in unwindGenerator
return _cancellableInlineCallbacks(gen)
File "C:\Users\User\anaconda3\envs\Scrapy enviroment\lib\site-packages\twisted\internet\defer.py", line 1529, in _cancellableInlineCallbacks
_inlineCallbacks(None, g, status)
--- <exception caught here> ---
File "C:\Users\User\anaconda3\envs\Scrapy enviroment\lib\site-packages\twisted\internet\defer.py", line 1418, in _inlineCallbacks
result = g.send(result)
File "C:\Users\User\anaconda3\envs\Scrapy enviroment\lib\site-packages\scrapy\crawler.py", line 87, in crawl
self.engine = self._create_engine()
File "C:\Users\User\anaconda3\envs\Scrapy enviroment\lib\site-packages\scrapy\crawler.py", line 101, in _create_engine
return ExecutionEngine(self, lambda _: self.stop())
File "C:\Users\User\anaconda3\envs\Scrapy enviroment\lib\site-packages\scrapy\core\engine.py", line 69, in __init__
self.downloader = downloader_cls(crawler)
File "C:\Users\User\anaconda3\envs\Scrapy enviroment\lib\site-packages\scrapy\core\downloader\__init__.py", line 83, in __init__
self.middleware = DownloaderMiddlewareManager.from_crawler(crawler)
File "C:\Users\User\anaconda3\envs\Scrapy enviroment\lib\site-packages\scrapy\middleware.py", line 53, in from_crawler
return cls.from_settings(crawler.settings, crawler)
File "C:\Users\User\anaconda3\envs\Scrapy enviroment\lib\site-packages\scrapy\middleware.py", line 35, in from_settings
mw = create_instance(mwcls, settings, crawler)
File "C:\Users\User\anaconda3\envs\Scrapy enviroment\lib\site-packages\scrapy\utils\misc.py", line 167, in create_instance
instance = objcls.from_crawler(crawler, *args, **kwargs)
File "C:\Users\User\anaconda3\envs\Scrapy enviroment\lib\site-packages\scrapy\downloadermiddlewares\robotstxt.py", line 36, in from_crawler
return cls(crawler)
File "C:\Users\User\anaconda3\envs\Scrapy enviroment\lib\site-packages\scrapy\downloadermiddlewares\robotstxt.py", line 32, in __init__
self._parserimpl.from_crawler(self.crawler, b'')
File "C:\Users\User\anaconda3\envs\Scrapy enviroment\lib\site-packages\scrapy\robotstxt.py", line 124, in from_crawler
o = cls(robotstxt_body, spider)
File "C:\Users\User\anaconda3\envs\Scrapy enviroment\lib\site-packages\scrapy\robotstxt.py", line 116, in __init__
from protego import Protego
builtins.ModuleNotFoundError: No module named 'protego'
```
[MOD: added ```python formatting -cf]
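Reading the traceback: the crash happens before any page is fetched. Because `ROBOTSTXT_OBEY` is `True`, Scrapy's robots.txt downloader middleware tries `from protego import Protego` and fails. A minimal sketch of how to check whether protego is visible to that environment, assuming the Anaconda Prompt and the env name "Scrapy enviroment" taken from the paths in the traceback:

```
:: Activate the env that appears in the traceback paths, then test the import.
conda activate "Scrapy enviroment"
python -c "import protego; print(protego.__file__)"
```

If that one-liner raises the same `ModuleNotFoundError`, the package is simply missing from this particular environment.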
1 Answer
ryantalbot2
12,537 Points

Do you have the package already? In PyCharm, go to the top left, open Preferences, click the plus sign, then search for and add protego.
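For anyone not using PyCharm, a command-line equivalent is to install the package into the same environment that runs the crawl. A sketch, assuming the conda env name shown in the traceback and that pip is available there (protego is also packaged on the conda-forge channel):

```
:: Install protego into the env from the traceback (name assumed from the paths).
conda activate "Scrapy enviroment"
pip install protego

:: or, via conda:
conda install -c conda-forge protego
```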
Unashe Mutambashora
3,433 Points

NB: protego is already installed.
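If protego really is installed but the import still fails, it may have been installed into a different environment than the one running `scrapy crawl` (a common pitfall when several conda envs exist). One way to check which interpreter and pip are actually being used, run from the same prompt as `scrapy crawl ike`:

```
:: Show which python/scrapy are on PATH and where pip thinks protego lives.
where python
where scrapy
pip show protego

:: The "Location:" line should point inside ...\envs\Scrapy enviroment\...
```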