Title: How can a program convert an HTML news article into an XML file?
提拉米苏1899
How can an HTML file found on the web be converted into an XML file? The algorithm is given below.
Input:HTML file from the ChinaTimes website.
Output:XML file.
Method:
Step 1: Get HTML file from URL: http://www.
  Step 1.1: Compare the name of the HTML file with the name of the file processed in the previous iteration; the last file name is saved in 'check.txt'.
  Step 1.2: If they are different, go to Step 2. Otherwise, sleep for several minutes and then go to Step 1.
Step 2: If (data='<li>'), record the URL link which follows '<li>'.
Step 3: Retrieve web page by using the URL link.
Step 4: Compare the retrieved data with the data retrieved last time from the same URL. If they are the same, ignore the currently retrieved data and go to Step 1; otherwise continue.
Step 5: If (data='<tr>'), execute the following substeps:
  Step 5.1: If the data match the format of the Title (like '<Title>...</Title>'), save the title data in the XML file.
  Step 5.2: If the data match the format of the Date (like '<Date>...</Date>'), save the date data in the XML file.
  Step 5.3: If the data match the format of the Reporter (like '<Reporter>...</Reporter>'), save the reporter data in the XML file.
  Step 5.4: If the data match the format of the Location (like '<Location>...</Location>'), save the location data in the XML file.
  Step 5.5: If the data match the format of the e-news (like '<News>...</News>'), save the e-news data in the XML file.
Step 6: Save the file name of current processed HTML file in 'check.txt'.
Step 7: Go to Step 1.
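
The steps above could be sketched in Python roughly as follows. This is only a minimal sketch under stated assumptions, not a tested solution for the ChinaTimes site: it assumes the '<Title>...</Title>' style markers appear literally in the page source as the algorithm describes, it uses regular expressions instead of a full HTML parser, and it skips the '<tr>' check of Step 5. The function names (is_new, extract_links, page_changed, news_to_xml, save), the 'Article' root element, and the SHA-256 comparison in Step 4 are all hypothetical choices made for illustration. Downloading the pages (Steps 1 and 3) and the sleep/retry loop (Steps 1.2 and 7) are left out, so every function takes the HTML as a string.

```python
import hashlib
import re
import xml.etree.ElementTree as ET
from pathlib import Path

# Field markers named in Step 5 (assumed to appear literally in the page).
FIELDS = ["Title", "Date", "Reporter", "Location", "News"]


def is_new(name: str, check_file: Path) -> bool:
    """Step 1.1: True if `name` differs from the name saved in check_file."""
    last = ""
    if check_file.exists():
        last = check_file.read_text(encoding="utf-8").strip()
    return name != last


def extract_links(index_html: str) -> list[str]:
    """Step 2: record the URL of the link that follows each <li>."""
    pattern = re.compile(r'<li\b[^>]*>\s*<a\b[^>]*href="([^"]+)"', re.IGNORECASE)
    return pattern.findall(index_html)


def page_changed(html: str, seen: dict[str, str], url: str) -> bool:
    """Step 4: compare a digest of the page with the one kept for this URL."""
    digest = hashlib.sha256(html.encode("utf-8")).hexdigest()
    if seen.get(url) == digest:
        return False  # same data as last time: ignore it
    seen[url] = digest
    return True


def news_to_xml(article_html: str) -> ET.Element:
    """Step 5: copy every field that is present into an XML element."""
    root = ET.Element("Article")  # hypothetical root element name
    for tag in FIELDS:
        match = re.search(rf"<{tag}>(.*?)</{tag}>", article_html,
                          re.DOTALL | re.IGNORECASE)
        if match:
            ET.SubElement(root, tag).text = match.group(1).strip()
    return root


def save(root: ET.Element, out_path: Path, name: str, check_file: Path) -> None:
    """Steps 5 and 6: write the XML file, then remember the processed name."""
    ET.ElementTree(root).write(str(out_path), encoding="utf-8",
                               xml_declaration=True)
    check_file.write_text(name, encoding="utf-8")
```

A driver loop would fetch the index page, call extract_links, fetch each article, and call news_to_xml and save only when is_new and page_changed both return True. ElementTree escapes characters such as '&' and '<' when writing, so the extracted text does not need manual escaping.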
2009-09-01 14:19



To join the discussion, please go to the original thread: https://bbs.bccn.net/thread-284400-1-1.html



