Tmall crawler problem: works when debugged on its own, but reports an error at run time

Change one place.
You probably made a typo. I suggest writing your programs in an IDE, which can catch this kind of error.
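One mistake of this kind that an IDE would flag, and that fits the symptom in the title, is a function body reading a name that is not one of its parameters. In the posted code, parsePage(ilt, html) reads soup in its body; whether that is the exact spot the answers had in mind is an assumption. The sketch below uses hypothetical helpers (parse_page_buggy, parse_page_fixed) and a plain string in place of a parsed page. It shows why such code can pass when a module-level soup is left over from interactive debugging, yet fail when soup only exists as a local variable inside main().

```python
import traceback


def parse_page_buggy(ilt, html):
    """Same shape as the question: the parameter is html, the body reads soup."""
    try:
        ilt.append(soup.upper())  # soup is looked up as a global, not a parameter
    except Exception:
        # A bare "except: print(...)" would hide the cause; keep the traceback.
        return traceback.format_exc()
    return None


def parse_page_fixed(ilt, soup):
    """The parameter name now matches the name used in the body."""
    ilt.append(soup.upper())


def main():
    soup = "page"  # local to main(), so parse_page_buggy cannot see it
    rows = []
    error = parse_page_buggy(rows, soup)
    parse_page_fixed(rows, soup)
    return rows, error


# Run as a script: the buggy version fails with a NameError.
rows, error = main()

# Interactive-style debugging: a module-level soup exists, so the bug is masked.
soup = "page"
rows_interactive = []
interactive_error = parse_page_buggy(rows_interactive, None)
```

Returning the traceback text instead of printing a generic message is a design choice for the sketch: it makes the NameError visible, which the question's bare except would otherwise swallow.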
Another suggestion: don't post sensitive information such as your cookie. Most programmers are kind and well-meaning, but there may be bad actors.
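One way to keep a cookie out of pasted source is to read it from the environment at run time. This is a minimal sketch, not part of the original post: the variable name TMALL_COOKIE and the helper build_headers are hypothetical, and the User-Agent value is shortened for illustration.

```python
import os


def build_headers(env=os.environ):
    """Build request headers, taking the cookie from the environment if present."""
    headers = {"User-Agent": "Mozilla/5.0"}
    cookie = env.get("TMALL_COOKIE")  # hypothetical variable name
    if cookie:
        headers["Cookie"] = cookie
    return headers


# Passing a plain dict instead of os.environ makes the behavior easy to check.
with_cookie = build_headers({"TMALL_COOKIE": "a=b"})
without_cookie = build_headers({})
```

The resulting dictionary can be passed as the headers argument of requests.get, so the script can be shared without the secret in it.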

■ Forum user
[Tmall crawler problem: works when debugged on its own, but reports an error at run time] The code is as follows:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# spider_tmall_price_v1.0: targeted Tmall price-comparison crawler (saves to a CSV file)
import requests
import re
import csv
from bs4 import BeautifulSoup
import time

def getHTMLText(url):
    try:
        cookie = '''Thanks for the reminder above; cookie hidden for now'''
        headers = {
            "User-Agent": 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36',
            "Cookie": cookie}
        r = requests.get(url, headers=headers, timeout=30)
        r.raise_for_status()
        r.encoding = r.apparent_encoding
        return r.text
    except:
        return "Fetch failed"

def parsePage(ilt, html):
    try:
        ptprice = soup.find_all(text=re.compile(r"^\d+\.\d{2}$"))  # "productPrice": product price
        pttitle = soup.find_all('p', class_="productTitle")  # "productTitle": product name
        shopname = soup.find_all('a', class_="productShop-name")  # "productShop": shop name
        ptsales = soup.find_all('p', class_='productStatus')  # "productStatus": monthly sales
        ptcomment = soup.find_all('p', class_='productStatus')  # "productStatus": number of reviews
        # The post's rendering dropped square-bracket contents and the split() separator.
        # The [i] indices, the '\n' separator, and the appended list are reconstructed
        # from context and are assumptions.
        for i in range(len(pttitle)):
            goodsprice = list(map(float, ptprice))  # product price
            goodsname = pttitle[i].text.split('\n')  # product name
            goodshop = shopname[i].text.split('\n')  # shop name
            goodsales = ptsales[i].text.split('\n')  # monthly sales
            commentsnum = ptcomment[i].text.split('\n')  # number of reviews
            ilt.append([goodsprice[i], goodsname, goodshop, goodsales, commentsnum])
    except:
        print('Parsing failed')

def printGoodsList(ilt):
    # File name means "Basic data for mattress products on the first 5 Tmall pages"
    with open('天猫前5页床垫产品基础情况.csv', 'w+', newline='') as csvFile:
        writer = csv.writer(csvFile)
        writer.writerow(("No.", "Price", "Product name", "Shop name", "Monthly sales", "Reviews"))
        count = 0
        for g in ilt:
            count = count + 1
            # Indices reconstructed from context (assumption); the post shows only g.
            writer.writerow((count, g[0], g[1], g[2], g[3], g[4]))

def main():
    keycode = 'uYMVLo'  # random code for "mattress"; other random codes also seem to return results for this keyword
    depth = 2
    start_url = 'https://list.tmall.com/search_product.htm?spm=a220m.1000858.0.0.' + keycode + '&s='
    end_url = '&q=%B4%B2%B5%E6&sort=s&style=g&from=mallfp..pc_1_searchbutton&type=pc#J_Filter'
    infoList = []  # the list literal was lost in the post; an empty list is assumed
    for n in range(depth):
        try:
            url = start_url + str(60*n) + end_url
            html = getHTMLText(url)
            time.sleep(5)
            soup = BeautifulSoup(html, 'lxml')
            parsePage(infoList, soup)
        except:
            continue
    printGoodsList(infoList)

main()

