Monday, October 6, 2014

Perl HTML::TokeParser example

#!/usr/bin/perl
use strict;
use warnings;
use utf8;             # the source contains the literal '查看地圖' ("view map")
use LWP::Simple;      # get(); https URLs also need LWP::Protocol::https
use HTML::TokeParser;

# Hotel listing pages on iyp.com.tw; the page index p runs from 0 to 60.
my $base = "https://www.iyp.com.tw/showroom.php?cate_name_eng_lv1=leisure&cate_name_eng_lv3=Hotels&p=";

# Write UTF-8; use '>:encoding(big5)' instead if the file must open in a Big5 Excel.
open my $out, '>:encoding(UTF-8)', 'output.csv'
    or die "Cannot open output.csv: $!";

for my $page (0 .. 60) {
    # get() returns the page as a string, or undef on failure.
    my $html = get($base . $page);
    unless (defined $html) {
        warn "Failed to fetch page $page, skipping\n";
        next;
    }

    my $stream = HTML::TokeParser->new(\$html);

    # A start-tag token is ['S', $tagname, \%attr, \@attrseq, $text];
    # text, end-tag and other tokens are skipped.
    while (my $token = $stream->get_token) {
        next unless $token->[0] eq 'S';
        my ($tag, $attr) = @{$token}[1, 2];

        # Listing links open in a new window; skip the "more" button.
        # Output fields end with '#' plus a tab: id#<TAB>title#<TAB>address
        if ($tag eq 'a'
            && ($attr->{target} // '') eq '_blank'
            && ($attr->{class}  // '') ne 'more-btn') {
            my ($tel) = ($attr->{href} // '') =~ m/(\d+)/;   # first run of digits in href
            print {$out} ($tel // ''), "#\t", ($attr->{title} // ''), "#\t";
        }

        # The "view map" span carries the address after the last '=' of go-map.
        if ($tag eq 'span' && ($attr->{title} // '') eq '查看地圖') {
            my ($addr) = ($attr->{'go-map'} // '') =~ m{//.*=(.*)};
            print {$out} ($addr // ''), "\n";
        }
    }
}

close $out or die "Cannot close output.csv: $!";
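The token tests can be checked without fetching the site. The sketch below runs the same `a`/`span` checks on a small HTML fragment held in a string. The markup, the `/store/12345` href, the `maps.example.com` URL and the `extract_rows` helper are all hypothetical, not the real iyp.com.tw page. It assumes HTML::TokeParser (from the HTML-Parser distribution) is installed.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;              # '查看地圖' below must be a character string
use HTML::TokeParser;

binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical markup; the real listing pages may be structured differently.
my $sample = <<'HTML';
<div>
  <a href="/store/12345" target="_blank" title="Sample Hotel">Sample Hotel</a>
  <a href="/more" target="_blank" class="more-btn">more</a>
  <span title="查看地圖" go-map="//maps.example.com/?q=Taipei">map</span>
</div>
HTML

# Hypothetical helper: returns one [id, title, address] record per listing,
# pairing each listing link with the next "view map" span.
sub extract_rows {
    my ($html) = @_;
    my $p = HTML::TokeParser->new(\$html);
    my (@rows, $current);
    while (my $t = $p->get_token) {
        next unless $t->[0] eq 'S';          # start tags only
        my ($tag, $attr) = @{$t}[1, 2];
        if ($tag eq 'a'
            && ($attr->{target} // '') eq '_blank'
            && ($attr->{class}  // '') ne 'more-btn') {
            my ($id) = ($attr->{href} // '') =~ /(\d+)/;
            $current = [$id // '', $attr->{title} // ''];
        }
        elsif ($tag eq 'span'
               && ($attr->{title} // '') eq '查看地圖'
               && $current) {
            my ($addr) = ($attr->{'go-map'} // '') =~ m{//.*=(.*)};
            push @rows, [@$current, $addr // ''];
            undef $current;
        }
    }
    return @rows;
}

for my $row (extract_rows($sample)) {
    print join("\t", @$row), "\n";
}
```

On this fragment it prints one tab-separated line with 12345, Sample Hotel and Taipei. Collecting complete records before printing, rather than printing field by field as the script does, avoids a half-written line when a listing has no map span.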