本文要重点介绍的是VBA中的XmlHttp对象(MSXML2.XMLHTTP或MSXML.XMLHTTP),它可以向http服务器发送请求并使用微软XML文档对象模型Microsoft XML Document Object Model (DOM)处理回应。练习抓取的网页例子是。
第一种方法——DOM经典属性:
参考和
Sub Main()ActiveSheet.Cells.ClearUrl = "https://www.qppstudio.net/public-holidays-by-date/month1.htm"Set oHttp = CreateObject("MSXML2.XMLHTTP") '创建一个xmlhttp对象Set odom = CreateObject("htmlfile") '创建一个Dom对象With oHttp'open,创建一个新的http请求,并指定此请求的方法、URL以及验证信息(用户名/密码) 'send,发送请求到http服务器并接收回应 .Open "GET", Url, False '使用Open方法,用get请求,False代表非异步加载 .Open "GET", Url, False '使用Open方法,用get请求,False代表非异步加载 .send '将open方法的信息发送给网页服务器 odom.body.innerHTML = .responseText '将响应网页的HTML赋值给Dom对象,并只需要body标签里面的内容End Withdom (odom)End Sub Sub dom(odom As Object)i = 2For Each Item In odom.all If Item.className = "list-item" Then For Each itemch In Item.Children If itemch.className = "list-item-heading" Then Range("a" & i) = itemch.innerText ElseIf itemch.className = "list-subitem" Then Range("b" & i) = itemch.Children(1).innerText Range("c" & i) = itemch.Children(3).innerText i = i + 1 End If Next Exit For End IfNextEnd Sub
第二种方法——转换为XML并使用XPATH(比较麻烦):
参考
Sub Main()Url = "https://www.qppstudio.net/public-holidays-by-date/month1.htm"Set oHttp = CreateObject("MSXML2.XMLHTTP") '创建一个xmlhttp对象Set odom = CreateObject("htmlfile") '创建一个Dom对象With oHttp'open,创建一个新的http请求,并指定此请求的方法、URL以及验证信息(用户名/密码) 'send,发送请求到http服务器并接收回应 .Open "GET", Url, False '使用Open方法,用get请求,False代表非异步加载 .Open "GET", Url, False '使用Open方法,用get请求,False代表非异步加载 .send '将open方法的信息发送给网页服务器 odom.body.innerHTML = .responseText '将响应网页的HTML赋值给Dom对象,并只需要body标签里面的内容End With'需要先将html文本进行格式化才能写入xmldoc,才能使用自带的xpath,比如节点一定要有开始和结束,节点属性一定要用双引号括起来'例如'sXML = ""'Debug.Print sXMLDim sXML As String, xDoc, a, nodelist, nodeFor Each Item In odom.all If Item.className = "list-item" Then sXML = Item.outerHTML Exit For End IfNextsXML = rr(sXML, " "'sXML = sXML & " "'sXML = sXML & "true "'sXML = sXML & "APCD03 "'sXML = sXML & "OIS "'sXML = sXML & ""'sXML = sXML & " false "'sXML = sXML & "APCD04 "'sXML = sXML & "OIS "'sXML = sXML & "", "")sXML = rr(sXML, "class=.*?>", ">")Set xDoc = CreateObject("MSXML.DOMDocument")a = xDoc.LoadXML(sXML)'a为true时代表写入成功,为false代表写入失败'Debug.Print a'一旦a为false就可以先写入txt再看哪些还不符合xml规范'file = ThisWorkbook.Path & "\test.txt"'Open file For Output As #1'Print #1, sXML'Close #1Set nodelist = xDoc.SelectNodes("//P")Set node = xDoc.SelectSingleNode("//P")'Debug.Print nodelist.LengthFor Each Item In nodelistDebug.Print Item.TextNextEnd SubFunction rr(str As String, pattern As String, repstr As String)Set reg = CreateObject("vbscript.regexp")With reg.Global = True.pattern = patternEnd Withrr = reg.Replace(str, repstr)End Function