Title: [Urgent] My multithreaded Python crawler hits a "segmentation fault" after running for a while. Can any experts help?
sophiabing
Rank: 1
Level: Newcomer
Posts: 4
Expert points: 0
Registered: 2010-4-6
Thread close rate: 0
Resolved  Question points: 0  Replies: 4
Code attached:
-------------
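import Queue
import linecache
import re
import threading
import urllib

jobs = Queue.Queue()
limit = 20
# compiled pattern: captures everything on the line up to the newline
lj = re.compile('(.*?)\n')


def thread():
    while True:
        i = jobs.get()
        try:
            # linecache counts lines from 1, so job 0 yields '' and is skipped
            line = linecache.getline('nick.txt', i)
            mat = lj.match(line)
            if mat:
                nick = mat.groups()[0]
                f = urllib.urlopen('http://' + nick + '.').read()
                print str(i) + '\t' + nick + '\t' + f[:10]
        except Exception:
            pass  # errors are silently ignored
        finally:
            # always mark the job done so jobs.join() can return
            jobs.task_done()


# start the pool of daemon worker threads
for n in xrange(limit):
    t = threading.Thread(target=thread)
    t.setDaemon(True)
    t.start()

# queue one job per line number
for i in xrange(10000):
    jobs.put(i)

jobs.join()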
----------

The thread() function also has to do some other processing. Where is the problem in this program?
Begging the experts for some pointers!!
2010-04-06 16:44
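A sketch of one way the worker loop above could be restructured, written for Python 3 (`queue` rather than `Queue`), which the original post does not use. It reads the nick file once up front instead of calling `linecache.getline` from every thread, and it records failures instead of discarding them, so that a crash is easier to trace. The names `load_nicks`, `run_pool`, and the `fetch` callback are hypothetical; `fetch` stands in for whatever download and processing step the real crawler performs. This is not claimed to remove the segmentation fault.

```python
import queue
import threading


def load_nicks(path):
    """Read every non-empty line of the nick file once, up front."""
    with open(path, encoding="utf-8") as fh:
        return [line.strip() for line in fh if line.strip()]


def run_pool(nicks, fetch, workers=20):
    """Call fetch(nick) for every nick using a fixed pool of daemon threads.

    Returns a dict mapping each nick to fetch's result, or to the
    exception it raised.
    """
    jobs = queue.Queue()
    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            nick = jobs.get()
            try:
                try:
                    outcome = fetch(nick)
                except Exception as exc:
                    outcome = exc  # keep the failure visible instead of hiding it
                with lock:
                    results[nick] = outcome
            finally:
                jobs.task_done()  # always runs, so jobs.join() cannot hang

    for _ in range(workers):
        threading.Thread(target=worker, daemon=True).start()
    for nick in nicks:
        jobs.put(nick)
    jobs.join()
    return results
```

A caller would pass its own download function as `fetch`, for example one that wraps `urllib.request.urlopen`, and inspect the returned dict for exceptions afterwards.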
外部三电铃
Rank: 16
Location: That Year
Level: VIP
Reputation: 55
Posts: 2004
Expert points: 7306
Registered: 2007-12-17
Score: 20
With only this small snippet of code, there is no way to debug it.

2010-04-06 20:06
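When a segmentation fault leaves no Python traceback, one option on Python 3.3 or later (an assumption, since the posted code targets Python 2) is the standard-library `faulthandler` module, which dumps the stack of every thread when the process receives a fatal signal. The helper name `enable_crash_log` and the log path are hypothetical. A minimal sketch:

```python
import faulthandler


def enable_crash_log(path):
    """Dump every thread's Python traceback to `path` if the process
    receives a fatal signal such as SIGSEGV."""
    log = open(path, "w")
    faulthandler.enable(file=log, all_threads=True)
    # The caller must keep this file object open for as long as the
    # handler is enabled, so it is returned rather than closed here.
    return log
```

Calling `enable_crash_log("crawler_crash.log")` once, before the worker threads start, would show which Python line each thread was executing at the moment of the crash.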
sophiabing
Rank: 1
Level: Newcomer
Posts: 4
Expert points: 0
Registered: 2010-4-6
Score: 0
Here is the full code...
import linecache
import re
import urllib
import threading
import time
import Queue
from xml.sax import make_parser
from xml.sax import ContentHandler


class FriendHandler(ContentHandler):
    # SAX handler that collects FOAF fields; an empty string means "flag off"
    isFriend = ""
    Friend = ""
    mode = ""
    dateCreated = ""
    isBirth = ""
    birth = ""
    interests = ""
    isposted = ""
    yaposted = ""

    def startElement(self, name, attrs):
        # track whether we are in the person record or in a foaf:knows entry
        if name == "rdf:RDF":
            self.mode = "person"
        elif name == "foaf:knows":
            self.mode = "knows"

        if name == "foaf:dateOfBirth":
            self.isBirth = 1

        if name == "foaf:weblog" and self.mode == 'person':
            self.dateCreated = attrs.get('lj:dateCreated')
        elif self.mode == "knows" and name == "foaf:nick":
            self.isFriend = 1
        elif name == "ya:posted":
            self.isposted = 1

    def endElement(self, name):
        # clear the flag that the matching startElement set
        if name == "foaf:nick" and self.mode == "knows":
            self.isFriend = ""
            self.mode = ""
        if name == "foaf:dateOfBirth":
            self.isBirth = ""
        if name == "ya:posted":
            self.isposted = ""

    def characters(self, content):
        # store the text according to whichever flag is currently set
        if self.isFriend:
            self.Friend += content + ','
        elif self.isBirth:
            self.birth = content
        elif self.isposted:
            self.yaposted = content
################

# compiled pattern: captures everything on the line up to the newline
lj = re.compile('(.*?)\n')


def thread():
    while True:
        i = jobs.get()
        try:
            line = linecache.getline('nick50000.txt', i)
            mat = lj.match(line)
            if mat:
                nick = mat.groups()[0]
                saxparser.parse('http://' + nick + '.')
                print i
                f2 = file('foaf.txt', 'a')
                # FriendHandler defines no `data` attribute, so this line raises
                # AttributeError, which the except clause below hides
                f2.write(ch.data)
                f2.close()
                ch.data = ''
        except Exception:
            pass  # errors are silently ignored
        finally:
            # always mark the job done so jobs.join() can return
            jobs.task_done()


jobs = Queue.Queue()
limit = 10
# a single handler and a single parser, shared by all worker threads
ch = FriendHandler()
saxparser = make_parser()
saxparser.setContentHandler(ch)

for n in xrange(limit):
    t = threading.Thread(target=thread)
    t.setDaemon(True)
    t.start()

for i in xrange(1, 1001):
    jobs.put(i)

jobs.join()
2010-04-06 21:49
sophiabing
Rank: 1
Level: Newcomer
Posts: 4
Expert points: 0
Registered: 2010-4-6
Score: 0
Please, anyone!!
2010-04-06 21:50
wangfeng3769
Rank: 1
Level: Newcomer
Posts: 3
Expert points: 0
Registered: 2008-4-29
Score: 0
Could it be that the name thread clashes with import thread?
2010-05-21 17:36
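The posted code imports `threading`, not `thread`, so a function named `thread` does not shadow any module there. Another suspect, not confirmed anywhere in this thread, is that the full listing shares one SAX parser and one handler across all ten workers, and both keep mutable per-parse state. A minimal sketch of the alternative, written for Python 3 and using in-memory documents instead of URLs so it runs offline, gives every job its own parser and handler. `NickHandler` and `parse_all` are hypothetical names, and `NickHandler` is a simplified stand-in for `FriendHandler`.

```python
import io
import queue
import threading
from xml.sax import make_parser
from xml.sax.handler import ContentHandler


class NickHandler(ContentHandler):
    """Collects the text of every <nick> element."""

    def __init__(self):
        super().__init__()
        self.nicks = []
        self._buf = None  # list of text chunks while inside <nick>

    def startElement(self, name, attrs):
        if name == "nick":
            self._buf = []

    def characters(self, content):
        if self._buf is not None:
            self._buf.append(content)

    def endElement(self, name):
        if name == "nick":
            self.nicks.append("".join(self._buf))
            self._buf = None


def parse_all(documents, workers=4):
    """Parse {key: xml_bytes} concurrently; return {key: nicks or exception}."""
    jobs = queue.Queue()
    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            key, xml_bytes = jobs.get()
            try:
                handler = NickHandler()  # fresh handler: no shared state
                parser = make_parser()   # fresh parser for this job only
                parser.setContentHandler(handler)
                parser.parse(io.BytesIO(xml_bytes))
                outcome = handler.nicks
            except Exception as exc:
                outcome = exc  # e.g. SAXParseException for malformed XML
            finally:
                jobs.task_done()
            with lock:
                results[key] = outcome

    for _ in range(workers):
        threading.Thread(target=worker, daemon=True).start()
    for item in documents.items():
        jobs.put(item)
    jobs.join()
    with lock:
        return dict(results)
```

Creating a parser per job costs a little extra allocation, but no thread can then observe or overwrite another thread's half-parsed state.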



To join the discussion, go to the original thread: https://bbs.bccn.net/thread-301876-1-1.html



