

ModuleNotFoundError: No module named 'protego'

This is the code I entered in the Spyder IDE:

```python
import scrapy


class HorseSpider(scrapy.Spider):
    name = "ike"

    def start_requests(self):
        # Pages from the Treehouse horse-land demo site to crawl.
        urls = ["https://treehouse-projects.github.io/horse-land/index.html",
                "https://treehouse-projects.github.io/horse-land/mustang.html"]

        # Schedule a request for each URL; responses are handled by parse().
        return [scrapy.Request(url=url, callback=self.parse) for url in urls]

    def parse(self, response):
        # Name the output file after the last segment of the URL,
        # e.g. "horses-index.html", and save the raw response body.
        url = response.url
        page = url.split("/")[-1]
        filename = "horses-{}".format(page)
        print("URL is: {}".format(url))
        with open(filename, "wb") as file:
            file.write(response.body)
        print("Saved file {}".format(filename))
```

But this is the output I am getting in the terminal. Please help:

C:\Users\User\Desktop\ScrapyTests\AraneaSpyder\AraneaSpyder\spiders>scrapy crawl ike
2021-01-07 20:13:18 [scrapy.utils.log] INFO: Scrapy 2.4.1 started (bot: AraneaSpyder)
2021-01-07 20:13:18 [scrapy.utils.log] INFO: Versions: lxml 4.6.2.0, libxml2 2.9.10, cssselect 1.1.0, parsel 1.5.2, w3lib 1.21.0, Twisted 20.3.0, Python 3.8.5 (default, Sep  3 2020, 21:29:08) [MSC v.1916 64 bit (AMD64)], pyOpenSSL 20.0.1 (OpenSSL 1.1.1i  8 Dec 2020), cryptography 3.3.1, Platform Windows-10-10.0.18362-SP0
2021-01-07 20:13:18 [scrapy.utils.log] DEBUG: Using reactor: twisted.internet.selectreactor.SelectReactor
2021-01-07 20:13:18 [scrapy.crawler] INFO: Overridden settings:
{'BOT_NAME': 'AraneaSpyder',
 'NEWSPIDER_MODULE': 'AraneaSpyder.spiders',
 'ROBOTSTXT_OBEY': True,
 'SPIDER_MODULES': ['AraneaSpyder.spiders']}
2021-01-07 20:13:18 [scrapy.extensions.telnet] INFO: Telnet Password: 14ae6acfbbea13ed
2021-01-07 20:13:18 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.logstats.LogStats']
Unhandled error in Deferred:
2021-01-07 20:13:19 [twisted] CRITICAL: Unhandled error in Deferred:

Traceback (most recent call last):
  File "C:\Users\User\anaconda3\envs\Scrapy enviroment\lib\site-packages\scrapy\crawler.py", line 192, in crawl
    return self._crawl(crawler, *args, **kwargs)
  File "C:\Users\User\anaconda3\envs\Scrapy enviroment\lib\site-packages\scrapy\crawler.py", line 196, in _crawl
    d = crawler.crawl(*args, **kwargs)
  File "C:\Users\User\anaconda3\envs\Scrapy enviroment\lib\site-packages\twisted\internet\defer.py", line 1613, in unwindGenerator
    return _cancellableInlineCallbacks(gen)
  File "C:\Users\User\anaconda3\envs\Scrapy enviroment\lib\site-packages\twisted\internet\defer.py", line 1529, in _cancellableInlineCallbacks
    _inlineCallbacks(None, g, status)
--- <exception caught here> ---
  File "C:\Users\User\anaconda3\envs\Scrapy enviroment\lib\site-packages\twisted\internet\defer.py", line 1418, in _inlineCallbacks
    result = g.send(result)
  File "C:\Users\User\anaconda3\envs\Scrapy enviroment\lib\site-packages\scrapy\crawler.py", line 87, in crawl
    self.engine = self._create_engine()
  File "C:\Users\User\anaconda3\envs\Scrapy enviroment\lib\site-packages\scrapy\crawler.py", line 101, in _create_engine
    return ExecutionEngine(self, lambda _: self.stop())
  File "C:\Users\User\anaconda3\envs\Scrapy enviroment\lib\site-packages\scrapy\core\engine.py", line 69, in __init__
    self.downloader = downloader_cls(crawler)
  File "C:\Users\User\anaconda3\envs\Scrapy enviroment\lib\site-packages\scrapy\core\downloader\__init__.py", line 83, in __init__
    self.middleware = DownloaderMiddlewareManager.from_crawler(crawler)
  File "C:\Users\User\anaconda3\envs\Scrapy enviroment\lib\site-packages\scrapy\middleware.py", line 53, in from_crawler
    return cls.from_settings(crawler.settings, crawler)
  File "C:\Users\User\anaconda3\envs\Scrapy enviroment\lib\site-packages\scrapy\middleware.py", line 35, in from_settings
    mw = create_instance(mwcls, settings, crawler)
  File "C:\Users\User\anaconda3\envs\Scrapy enviroment\lib\site-packages\scrapy\utils\misc.py", line 167, in create_instance
    instance = objcls.from_crawler(crawler, *args, **kwargs)
  File "C:\Users\User\anaconda3\envs\Scrapy enviroment\lib\site-packages\scrapy\downloadermiddlewares\robotstxt.py", line 36, in from_crawler
    return cls(crawler)
  File "C:\Users\User\anaconda3\envs\Scrapy enviroment\lib\site-packages\scrapy\downloadermiddlewares\robotstxt.py", line 32, in __init__
    self._parserimpl.from_crawler(self.crawler, b'')
  File "C:\Users\User\anaconda3\envs\Scrapy enviroment\lib\site-packages\scrapy\robotstxt.py", line 124, in from_crawler
    o = cls(robotstxt_body, spider)
  File "C:\Users\User\anaconda3\envs\Scrapy enviroment\lib\site-packages\scrapy\robotstxt.py", line 116, in __init__
    from protego import Protego
builtins.ModuleNotFoundError: No module named 'protego' 


NB: protego is already installed
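
The last frames of the traceback show Scrapy's robots.txt middleware (enabled by `ROBOTSTXT_OBEY: True` in the settings above) executing `from protego import Protego` at startup, so the error means protego is not importable by the interpreter that is actually running Scrapy, even if it is installed somewhere else on the machine. A minimal sanity check, assuming it is run with the same Python that runs `scrapy crawl`:

```python
# Run this with the same interpreter that runs "scrapy crawl".
# sys.executable shows which Python is active; the import attempt
# shows whether protego is visible from that environment.
import sys

print("Interpreter:", sys.executable)

try:
    import protego
    print("protego found at:", protego.__file__)
except ModuleNotFoundError:
    print("protego is NOT importable from this interpreter")
```

If the import fails here, protego is installed into a different environment than the one Scrapy is using.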

1 Answer

ryantalbot2
12,537 Points

Do you have the package installed already? In PyCharm, open Preferences from the top-left menu, click the plus sign, then search for and add protego.
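
One thing worth noting: the traceback shows Scrapy running from an Anaconda environment named "Scrapy enviroment", not from PyCharm, so (assuming a standard conda setup) the likely fix is to install protego into that specific environment, for example by activating it with `conda activate "Scrapy enviroment"` and then running `python -m pip install protego` there. A package installed into a different environment, such as base or a PyCharm-managed interpreter, will not be visible to this one, which would explain the error even though protego is "already installed".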