Title: Beginner question about a bs4/requests crawler
勤学好问1231
The goal is to scrape the comics from https://turnoff.us.
# -*- coding:utf-8 -*-
import requests
import os
import bs4
url = "https://turnoff.us"
os.makedirs("漫画",True)

while not url.endswith('.prev href[previous]'):
    web = requests.get(url)
    web.raise_for_status()
    web_soup = bs4.BeautifulSoup(web.text)
    comics = web_soup.select('.post-content img[src]')
    if comics == []:
        print("no comic exists")
        continue

comics_linkage = comics[0].get("src")
comics_content = requests.get("https://turnoff.us" +comics_linkage)
comics_content.raise_for_status()
disk_file = os.path.join("漫画",os.path.basename(comics_linkage))
with open(disk_file,"wb") as f:
    for i in comics_content.iter_content(100000):
        f.write(i)
    f.close()

prevs = web_soup.select('.prev a[href]')
if prevs ==[]:
    print("cannot get prev linkage")

url = "http://turnoff.us/"+prevs[0].get('href')
Where did it go wrong? The folder gets created but stays empty, and no images are downloaded.
2018-05-07 16:07
wei_ai_lu
Infinite loop + indentation.
2018-05-30 17:30
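To make that reply concrete: the while condition not url.endswith('.prev href[previous]') can never become false, and everything from comics_linkage = comics[0].get("src") onward has lost its indentation and sits outside the loop body, so the script keeps re-fetching the same page and never reaches the download code; that is why the folder stays empty. In addition, os.makedirs("漫画", True) passes True as the mode argument rather than exist_ok. Below is a minimal corrected sketch; it keeps the original selectors (.post-content img[src] and .prev a[href]) and the original assumption that both the image src and the previous-post href are site-relative paths on turnoff.us, which has not been verified against the live site.

# -*- coding:utf-8 -*-
import os
import requests
import bs4

url = "https://turnoff.us"
os.makedirs("漫画", exist_ok=True)  # exist_ok keyword, not the positional mode argument

while True:
    web = requests.get(url)
    web.raise_for_status()
    web_soup = bs4.BeautifulSoup(web.text, "html.parser")

    # Everything below must stay inside the loop, otherwise it never runs.
    comics = web_soup.select('.post-content img[src]')
    if comics:
        comics_linkage = comics[0].get("src")
        # Assumes src is a site-relative path; adjust if the site serves absolute URLs.
        comics_content = requests.get("https://turnoff.us" + comics_linkage)
        comics_content.raise_for_status()
        disk_file = os.path.join("漫画", os.path.basename(comics_linkage))
        with open(disk_file, "wb") as f:
            for chunk in comics_content.iter_content(100000):
                f.write(chunk)
    else:
        print("no comic exists")

    # Follow the "previous comic" link; stop when there is none.
    prevs = web_soup.select('.prev a[href]')
    if not prevs:
        print("cannot get prev linkage")
        break
    url = "https://turnoff.us" + prevs[0].get('href')

With the download block inside the loop and a break when the prev link is missing, the script walks backwards through the archive instead of spinning on the first page forever.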



To join the discussion, please go to the original thread: https://bbs.bccn.net/thread-486732-1-1.html



