Title: A question about the Pool thread pool
Mungki
Resolved  Points: 10  Replies: 2
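I wrote a web crawler and added a thread pool. It runs without errors, but I noticed a problem, as shown in the screenshot:

The pages are crawled out of order. How can I fix this? Thanks.
Code: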
from multiprocessing import Pool
import re
from lxml import etree
import requests

headers = {
    # The HTTP header name is "User-Agent" (with a hyphen).
    'User-Agent':'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:6.0) Gecko/20100101 Firefox/6.0'
}

def get_html():
    url = "http://www."  # URL is truncated in the original post
    response = requests.get(url, headers=headers)
    print('%s request succeeded' %(url))
    response.encoding = 'utf-8'
    response = etree.HTML(response.text)
    url_list = response.xpath('/html/body/div[7]/div/ul//li/span/a/@href')
    return url_list

def save_html(url):
    response = requests.get(url, headers=headers)
    response.encoding = 'utf-8'
    response = etree.HTML(response.text)
    title = response.xpath('//title/text()')[0].replace('?', '')
    # title = response.xpath('//div[@cLass="bg"]/h1/text()')[0]
    text_list = response.xpath('//div[@class="bg"]/div[@class="content"]//p/text()')
    with open('C:/Users/Administrator/Desktop/斗罗大陆/%s.txt' %(title), 'w', encoding='utf-8') as file:
        file.write(title + '\n')
        for text in text_list:
            file.write('\t%s\n' %(text))
    print('"%s" crawled successfully' %(title))
 
if __name__ == "__main__":
    url_list = get_html()
    # The with-block shuts the pool down once map() has returned.
    with Pool(4) as pool:
        pool.map(save_html, url_list)
2020-11-21 15:13
fall_bernana
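Reply to OP Mungki
This comes down to how quickly each link responds: some return fast and some slowly, so tasks finish in a different order from the one they were submitted in. As long as the final result is the same, what is the problem?
Code: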
from multiprocessing.pool import Pool
import time
def hhh(i):
    # Even numbers sleep for 2 seconds to simulate a slow response.
    if i % 2 == 0:
        time.sleep(2)
    print(i, i % 2, i * 2)


if __name__ == '__main__':
    pool = Pool(4)
    pool.map(hhh, [1, 2, 3,4,5,6,7,8,9])
    #print(hh)
---------------------------------
1 1 2
3 1 6
5 1 10
7 1 14
2 0 4
9 1 18
4 0 8
6 0 12
8 0 16
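
To make the point above concrete: the print order depends on when each task finishes, but the list returned by `map` follows the input order. Below is a minimal sketch, not the original code. It assumes a thread-backed pool (`multiprocessing.pool.ThreadPool`, swapped in here to keep the snippet lightweight; `Pool.map` is documented to return results in input order as well) and a hypothetical `work` function standing in for the download.

```python
import time
from multiprocessing.pool import ThreadPool

finished = []  # records the order in which tasks actually complete


def work(i):
    # Hypothetical stand-in for a download: even inputs are "slow".
    if i % 2 == 0:
        time.sleep(0.2)
    finished.append(i)  # list.append is thread-safe in CPython
    return i * 2


def run():
    with ThreadPool(4) as pool:
        # map() blocks until every task is done and returns the results
        # in the same order as the inputs, not in completion order.
        return pool.map(work, range(1, 10))


if __name__ == "__main__":
    results = run()
    print("completion order:", finished)
    print("result order:    ", results)
```

So if the order matters, one option is to have the worker return its data and do the order-sensitive step (writing, merging, printing) in the main process from the returned list.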

2020-11-23 10:34
phiplato
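Build your own table of contents and link each entry to its chapter file; the table of contents can then be sorted by chapter.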
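
One way to read this suggestion: number each chapter by its position in the link list, put that number in the filename, and write an index file in the same order. This is only a rough sketch, and it assumes the links returned by `get_html` are already in chapter order. The helpers `chapter_filename` and `write_index`, the `index.txt` name and the sample titles are all hypothetical, not from the original code.

```python
import os
import tempfile


def chapter_filename(index, title):
    # Zero-padded prefix so that alphabetical order equals chapter order.
    return "%04d_%s.txt" % (index, title)


def write_index(directory, titles):
    # titles must already be in chapter order (e.g. the order of url_list).
    names = [chapter_filename(i, t) for i, t in enumerate(titles, start=1)]
    index_path = os.path.join(directory, "index.txt")
    with open(index_path, "w", encoding="utf-8") as f:
        for name in names:
            f.write(name + "\n")
    return names


if __name__ == "__main__":
    # Hypothetical titles; a real run would take them from the crawler.
    titles = ["Chapter 1", "Chapter 2", "Chapter 10"]
    with tempfile.TemporaryDirectory() as d:
        for name in write_index(d, titles):
            print(name)
```

In the crawler, each worker would need to know its chapter number, for example by mapping over `list(enumerate(url_list, start=1))` instead of `url_list` and unpacking the pair inside the worker.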
2020-12-26 22:02



To join the discussion, please visit the original thread: https://bbs.bccn.net/thread-503954-1-1.html



