DATE: 2018-01-25 15:36:44
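I'm new to Scrapy. At first I assumed the official tutorial was simply wrong, but after checking carefully it turned out I had just been careless.
(Version note: 2018-1-25, Python 2.7.11)
Error: `No module named tutorial.items`
Two things to note when fixing it:
The `tutorial.items` import was not recognized for some reason; it only worked when imported as `..items`.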

    import scrapy


    class DmozItem(scrapy.Item):
        # name = scrapy.Field()
        title = scrapy.Field()
        link = scrapy.Field()
        desc = scrapy.Field()
        pass

Import it correctly and the problem is solved:

    from ..items import *

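
The post does not pin down why the absolute form failed. One property that may explain it: a relative import such as `from ..items import ...` is resolved against the package the spider module lives in, so it keeps working even when the top-level name used in an absolute import cannot be found. A minimal sketch of that behaviour, without Scrapy; the package names `demo_relative`, `demo_absolute`, and `demo_missing` and both helper functions are hypothetical, and the code targets Python 3:

```python
import importlib
import os
import sys
import tempfile


def make_package(root, package_name, spider_source):
    """Write <package_name>/items.py and <package_name>/spiders/dmoz.py under root."""
    spiders_dir = os.path.join(root, package_name, "spiders")
    os.makedirs(spiders_dir)
    sources = {
        os.path.join(root, package_name, "__init__.py"): "",
        os.path.join(root, package_name, "items.py"): "class DmozItem(dict):\n    pass\n",
        os.path.join(spiders_dir, "__init__.py"): "",
        os.path.join(spiders_dir, "dmoz.py"): spider_source,
    }
    for path, source in sources.items():
        with open(path, "w") as handle:
            handle.write(source)


def try_import(module_name):
    """Return True if the module imports cleanly, False on ImportError."""
    try:
        importlib.import_module(module_name)
        return True
    except ImportError:
        return False


root = tempfile.mkdtemp()
sys.path.insert(0, root)

# Same layout twice; only the import line inside spiders/dmoz.py differs.
make_package(root, "demo_relative", "from ..items import DmozItem\n")
# The absolute import names a top-level package that does not exist on sys.path.
make_package(root, "demo_absolute", "from demo_missing.items import DmozItem\n")
importlib.invalidate_caches()

relative_ok = try_import("demo_relative.spiders.dmoz")
absolute_ok = try_import("demo_absolute.spiders.dmoz")
print(relative_ok, absolute_ok)
```

If the absolute import fails in a real project, it is worth checking that the directory name matches the name in the import statement and that the project root is on `sys.path`.
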
Run the following command from the Scrapy project directory:
`\turorial\spiders>scrapy crawl dmoz -o items.json`
The test run returned 200 OK.
![Crawl output showing the 200 response](http://img.blog.csdn.net/20180125153522655?watermark/2/text/aHR0cDovL2Jsb2cuY3Nkbi5uZXQvZHJlYW1zdG9uZV94aWFvcXc=/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70/gravity/SouthEast)
Now look at the contents of `items.json`:
![Contents of items.json](http://img.blog.csdn.net/20180126121141630?watermark/2/text/aHR0cDovL2Jsb2cuY3Nkbi5uZXQvZHJlYW1zdG9uZV94aWFvcXc=/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70/gravity/SouthEast)
The result is as expected.
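
Beyond eyeballing the file, the export can be checked programmatically, since the JSON feed is a single array of item objects. A small sketch under that assumption; `load_items` is a hypothetical helper and the sample record is made up so the snippet runs without a real crawl:

```python
import json
import os
import tempfile


def load_items(path):
    """Load a Scrapy JSON feed (a JSON array of item objects) into a list of dicts."""
    with open(path) as handle:
        return json.load(handle)


# Hypothetical sample standing in for a real items.json; the values are made up.
sample = [
    {"title": ["Example"], "link": ["/example"], "desc": [" a sample entry"]},
]
path = os.path.join(tempfile.mkdtemp(), "items.json")
with open(path, "w") as handle:
    json.dump(sample, handle)

items = load_items(path)
# Flag any record that lacks one of the three fields declared on DmozItem.
missing_fields = [item for item in items if not {"title", "link", "desc"} <= set(item)]
print(len(items), "items;", len(missing_fields), "missing a field")
```

To check a real export, point `load_items` at the `items.json` written by the crawl instead of the temporary file.
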
Appendix: full code

    import scrapy
    from ..items import DmozItem


    class DmozSpider(scrapy.Spider):
        name = "dmoz"
        allowed_domains = ["yixzm.cn"]
        start_urls = ["http://www.yixzm.cn"]

        def parse(self, response):
            for sel in response.xpath('//ul/li'):
                item = DmozItem()
                item['title'] = sel.xpath('a/text()').extract()
                item['link'] = sel.xpath('a/@href').extract()
                item['desc'] = sel.xpath('text()').extract()
                yield item
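
To see what shape the `parse` loop produces without running a crawl, here is a rough standard-library approximation. It swaps Scrapy's selectors for `xml.etree.ElementTree`, which only handles well-formed markup and a subset of XPath, so treat it as an illustration rather than a substitute. `SAMPLE_HTML` and `extract_items` are names made up for this sketch, and the markup is invented:

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample page; not taken from the crawled site.
SAMPLE_HTML = """
<html><body><ul>
<li><a href="/a">First</a> first description</li>
<li><a href="/b">Second</a> second description</li>
</ul></body></html>
""".strip()


def extract_items(markup):
    """Mimic the spider's per-<li> extraction, returning one dict per list item."""
    root = ET.fromstring(markup)
    items = []
    for li in root.findall(".//ul/li"):
        links = li.findall("a")
        # The li's own text nodes: text before the first child, plus each child's tail.
        text_nodes = [li.text] + [child.tail for child in li]
        items.append({
            "title": [a.text for a in links],
            "link": [a.get("href") for a in links],
            "desc": [text for text in text_nodes if text is not None],
        })
    return items


extracted = extract_items(SAMPLE_HTML)
for item in extracted:
    print(item)
```

Each field is a list because `extract()` in the spider returns every match, which is why the exported JSON holds arrays rather than plain strings.
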