Web scraping: HTTP status code problem
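I am learning web scraping, and Tianyancha (https://www.) is proving somewhat difficult; https://aiqicha.baidu.com/ is a similar site.
1. Enter the keyword "华为" (Huawei) and capture the request URL: https://www.; the status code shown is 200.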
2. Code
Program code:
CLEAR
lcWb = '华为'    && keyword
lcWb1 = STRCONV(STRCONV(lcWb, 9), 15)
* Convert to UTF-8 and percent-encode each byte
lcUTF8 = ""
FOR ln = 1 TO LEN(lcWb1) STEP 2
    lcUTF8 = lcUTF8 + "%" + SUBSTR(lcWb1, ln, 2)
ENDFOR
myurl = 'https://www.'    && "https://aiqicha.baidu.com/s?q=&lcUTF8"
oHTTP = CREATEOBJECT("MSXML2.ServerXMLHTTP")
oHTTP.Open("GET", myurl, .F.)
oHTTP.SetRequestHeader("Content-Type", "application/x-www-form-urlencoded")
lcSend = "erectDate=&nothing=&pjname=" + lcUTF8 + "&head=head_620.js&bottom=bottom_591.js"
oHTTP.Send(lcSend)
? oHTTP.Status
IF oHTTP.Status = 200
    lcStr = oHTTP.ResponseText    && store the page content in lcStr
    STRTOFILE(lcStr, 'D:\ex.txt')    && debug statement: save the downloaded page as D:\ex.txt
ENDIF
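Before looking at the request itself, it may help to confirm that the keyword is encoded as intended. The following is a minimal sketch that runs the same STRCONV steps in isolation and compares the result with the UTF-8 percent-encoding of 华为, which is %E5%8D%8E%E4%B8%BA. It assumes VFP is running under a Chinese code page (for example 936) so that STRCONV(..., 9) can interpret the literal; the comparison and the messages are illustrative additions, not part of the posted program.

```foxpro
* Sketch only: checks the percent-encoding step on its own.
* Assumption: the session uses a Chinese code page (e.g. 936).
LOCAL lcWb, lcHex, lcUTF8, ln
lcWb  = '华为'
lcHex = STRCONV(STRCONV(lcWb, 9), 15)    && 9: DBCS -> UTF-8, 15: bytes -> hex text
lcUTF8 = ""
FOR ln = 1 TO LEN(lcHex) STEP 2
    lcUTF8 = lcUTF8 + "%" + SUBSTR(lcHex, ln, 2)    && prefix each hex byte with %
ENDFOR
? lcUTF8
* Compare with the known UTF-8 percent-encoding of the keyword
? IIF(UPPER(lcUTF8) == "%E5%8D%8E%E4%B8%BA", "encoding matches", "encoding differs, check the code page")
```

If the two values differ, the keyword is probably being read under a different code page than expected, and the request URL would carry the wrong bytes.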
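3. The status code actually returned is 418.
4. After a search, the site automatically appends a changing token to the URL, &sessionNo=1674728807.71143526. Is this related to the problem?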
https://www.
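To narrow down where the 418 comes from, one option is to look at everything the server sends back instead of only the status number. The sketch below is a diagnostic under stated assumptions, not a confirmed fix: it uses the aiqicha URL form from the comment in the posted code, sends a plain GET with no body, and sets a browser-like User-Agent on the guess that the server may be rejecting clients that do not look like a browser. The User-Agent string is illustrative, and whether sessionNo plays any role is not established here.

```foxpro
* Diagnostic sketch only: the header value and the URL choice are assumptions.
LOCAL loHTTP, lcUTF8, lcUrl
lcUTF8 = "%E5%8D%8E%E4%B8%BA"                         && UTF-8 percent-encoding of 华为
lcUrl  = "https://aiqicha.baidu.com/s?q=" + lcUTF8    && URL form from the comment in the posted code

loHTTP = CREATEOBJECT("MSXML2.ServerXMLHTTP")
loHTTP.Open("GET", lcUrl, .F.)
* Assumed browser-like header; the exact string is illustrative
loHTTP.setRequestHeader("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64)")
loHTTP.Send()                                         && a GET carries its data in the URL, so no body is sent

? loHTTP.Status, loHTTP.statusText                    && status number and reason text
? loHTTP.getAllResponseHeaders()                      && headers may hint at cookies or redirects
STRTOFILE(loHTTP.responseText, "D:\ex.txt")           && save the body whatever the status
```

Comparing the saved body and headers with what the browser's developer tools show for the same search should indicate whether the difference lies in headers, cookies, or the session token.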