沫言 | Multi(threading/processing) too slow? Scrapy too much hassle? No problem, go async in style (Part 2)


Sending the request:
    # assumes: import asyncio and import aiohttp earlier in the spider
    async def scrape(self, url):
        async with self.semaphore:
            # Create the session with the prepared headers and make sure it is closed when done.
            async with aiohttp.ClientSession(headers=self.header) as session:
                async with session.get(url) as response:
                    # Small delay so we don't hammer the site.
                    await asyncio.sleep(1)
                    return await response.text()

Note: the request headers are included here again. This site checks Cookies strictly, and it cannot be accessed through an IP proxy.
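The scrape() coroutine relies on self.semaphore and self.header, which are defined elsewhere in the spider class. Below is a minimal sketch (not the article's original code) of what that setup might look like, assuming a hypothetical CarSpider class, a CONCURRENCY constant, and a Cookie string you supply yourself:

    import asyncio

    CONCURRENCY = 5  # placeholder: cap on simultaneous requests

    class CarSpider:
        def __init__(self, cookie: str):
            # Limit how many requests are in flight at once.
            self.semaphore = asyncio.Semaphore(CONCURRENCY)
            # The site rejects requests without a valid Cookie, so send it
            # along with an ordinary browser User-Agent.
            self.header = {
                'User-Agent': 'Mozilla/5.0',
                'Cookie': cookie,
            }

The semaphore is what keeps the "async everything at once" approach polite: tasks beyond the limit simply wait at the async with self.semaphore line.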
Parsing the response:
    # assumes: from pyquery import PyQuery as pq, and import logging earlier in the spider
    async def parse(self, html):
        with open('car.csv', 'a+', encoding='utf-8') as f:
            doc = pq(html)
            for message in doc('body > div.list-wrap.js-post > ul > li > a').items():
                # Car title
                car_name = message('h2.t').text()
                # Car details (year, mileage, service)
                car_info = message('div.t-i').text()
                year = car_info[:5]
                mileage = car_info[6:-5]
                service = car_info[13:].replace('|', '')
                # Price: fall back to the struck-through price if the normal one is missing
                try:
                    price = message('div.t-price > p').text()
                except AttributeError:
                    price = message('em.line-through').text()
                # Car image URL
                car_pic = message('img').attr('src')
                data = f'{car_name}, {year}, {mileage}, {service}, {price}\n'
                logging.info(data)
                f.write(data)

Here I do it all in one step: parse the response and save the data at the same time.
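To tie scrape() and parse() together, a driver along the following lines could work. This is a sketch under my own assumptions, not the article's code: the BASE_URL template, PAGE_COUNT, and the crawl() helper are placeholders for illustration.

    import asyncio

    BASE_URL = 'https://example.com/list/page{page}'  # placeholder listing-page URL
    PAGE_COUNT = 10                                   # placeholder number of pages

    async def crawl(spider, page):
        # Fetch one listing page, then parse it and append the rows to car.csv.
        html = await spider.scrape(BASE_URL.format(page=page))
        await spider.parse(html)

    async def main():
        spider = CarSpider(cookie='<your cookie here>')
        # Schedule all pages at once; the semaphore keeps only a few in flight.
        tasks = [asyncio.create_task(crawl(spider, page))
                 for page in range(1, PAGE_COUNT + 1)]
        await asyncio.gather(*tasks)

    if __name__ == '__main__':
        asyncio.run(main())

asyncio.gather waits for every page task to finish, so the script exits only after all rows have been written.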
After running it, you will see the scraped records logged to the console and appended to car.csv.

