Exporting HTML Table Data from a Web Page to Excel (VB.NET)


http://www.autohotkey.com/docs/commands/Send.htm

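This URL is the page used for the experiment.

First, save the page to the desktop with "Save As".

The file is named Send.htm.

Change the file extension to .xls.

Open Send.xls in Excel.

Excel can open HTML documents, so the tables show up as worksheet cells and the data can be edited freely. Newer versions of Excel may first warn that the file format does not match the extension.

That's all there is to it.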
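
The manual steps above can also be scripted. The following is a minimal sketch, assuming a console application; the module name SavePageAsXls and the choice of the desktop as output folder are illustrative. WebClient.DownloadFile writes the response body to disk unchanged, so the result is the raw HTML with an .xls extension, much like the renamed Send.htm.

```vbnet
Imports System
Imports System.IO
Imports System.Net

Module SavePageAsXls
    Sub Main()
        'URL of the experiment page used in this post
        Dim url As String = "http://www.autohotkey.com/docs/commands/Send.htm"
        'Assumed output location: Send.xls on the current user's desktop
        Dim desktop As String = Environment.GetFolderPath(Environment.SpecialFolder.Desktop)
        Dim target As String = Path.Combine(desktop, "Send.xls")

        'Download the HTML unchanged and give it an .xls extension in one step
        Using client As New WebClient()
            client.DownloadFile(url, target)
        End Using

        Console.WriteLine("Saved " & target)
    End Sub
End Module
```

This saves the whole page, not only its tables.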

 


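In VB.NET, the same thing can be done with the source code below.

Clicking Button1 performs the following:

It downloads the page source, uses a regular expression to extract everything from <table> to </table>, and saves the result to c:\test.xls.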


Imports System.Net
Imports System.IO
Imports System.Text
Imports System.Text.RegularExpressions
 
Public Class Form2

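    'TextBox1 holds the downloaded page source; TextBox2 holds the extracted <table> markup

    'Downloads the page at the given URL and returns its HTML source as a string
    Protected Function GetWebPage(ByVal url As String) As String
        'Explicit cast so the code also compiles with Option Strict On
        Dim myRequest As HttpWebRequest = DirectCast(WebRequest.Create(url), HttpWebRequest)

        'Using blocks release the response and the reader even if reading fails
        Using myResponse As WebResponse = myRequest.GetResponse()
            Using streamReader As New StreamReader(myResponse.GetResponseStream(), Encoding.UTF8)
                Return streamReader.ReadToEnd()
            End Using
        End Using
    End Function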


    Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
        'Regular expression based on http://blog.stevenlevithan.com/archives/match-innermost-html-element
        '----------- First, load the page source from the URL into TextBox1

        TextBox1.Text = GetWebPage("http://www.autohotkey.com/docs/commands/Send.htm")
        '----------- Next, use the pattern to extract each table, from its opening tag to </table>
        'The pattern only matches tables whose tag starts with "<table border" and that contain no nested <table>
        Dim ex As New Regex("<table border\b[^>]*>(?:[^<]+|<(?!table\b[^>]*>))*?</table>")
        Dim tables As New StringBuilder()
        For Each m As Match In ex.Matches(TextBox1.Text)
            Debug.Print(m.Value)
            tables.Append(m.Value)
        Next
        TextBox2.Text = tables.ToString()

        '----------- Finally, save the extracted tables as an .xls file
        Dim fileName As String = "c:\test.xls"
        'Disposing the StreamWriter flushes it and closes the underlying file
        Using sw As New StreamWriter(fileName, False, Encoding.GetEncoding("big5"))
            sw.Write(TextBox2.Text)
        End Using

    End Sub

End Class
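
To see what the pattern does and does not match, it can be run against a small inline HTML string. The sketch below is only an illustration and not part of the form above: the module name TableRegexDemo, the helper ExtractTables, and the sample HTML are all made up for this example. The pattern itself is copied from Button1_Click.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module TableRegexDemo
    'Same pattern as in Button1_Click
    Const TablePattern As String = "<table border\b[^>]*>(?:[^<]+|<(?!table\b[^>]*>))*?</table>"

    'Hypothetical helper: returns the markup of every matched table, in document order
    Function ExtractTables(ByVal html As String) As String()
        Dim matches As MatchCollection = Regex.Matches(html, TablePattern)
        Dim result(matches.Count - 1) As String
        For i As Integer = 0 To matches.Count - 1
            result(i) = matches(i).Value
        Next
        Return result
    End Function

    Sub Main()
        'Made-up sample page: one table with a border attribute, one without
        Dim html As String = "<p>intro</p>" & _
            "<table border=""1""><tr><td>A</td></tr></table>" & _
            "<table class=""x""><tr><td>B</td></tr></table>"

        For Each table As String In ExtractTables(html)
            Console.WriteLine(table)
        Next
    End Sub
End Module
```

Only the first table is printed, because the second one's tag does not start with "<table border". The pattern is also case-sensitive, so a page that writes <TABLE BORDER=...> would need RegexOptions.IgnoreCase.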

 


If you spot any mistakes, corrections are welcome.