Title: Classifying the tokens in an XML file
UAPOPPING
Rank: 1
Level: Newbie
Posts: 13
Expert points: 5
Registered: 2015-1-21
Thread closure rate: 100%
Points offered: 0  Replies: 4
I need to write some Java code that decides whether a token in an XML file is a tag or a word.
Here are the requirements:
An XML file is a text file made up of tokens. A token is either a word or it is a tag.

1. A token is a tag if and only if it is bounded by angle brackets, e.g., <para>.
2. If the second character is a forward slash (e.g., </para>), this is a closing tag.
3. If the second-to-last character is a forward slash (e.g., <br/>), this is a self-closing tag.
4. If a tag is neither of the above, it is an opening tag.
5. Except for the slashes mentioned above, the only characters that may appear between the brackets are letters, numerals, and hyphens. This means, for example, that the string <<para>> is neither a tag nor a word. It should not appear in an XML file.
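In Java regex terms, rule 5's character set is the class `[A-Za-z0-9-]` (assuming ASCII letters only; the assignment just says "letters"). That is not the same as `\w`, which the code in this thread uses: Java defines `\w` as `[a-zA-Z_0-9]`, so it accepts an underscore and rejects a hyphen. A small comparison sketch; the class and method names are made up for illustration:

```java
/** Compares \w with the character set that rule 5 actually allows. */
class Rule5NameCheck {
    // The thread's pattern: in Java, \w is [a-zA-Z_0-9] (underscore yes, hyphen no).
    static boolean matchesWordChars(String tag) {
        return tag.matches("<\\w+>");
    }

    // Rule 5, assuming ASCII letters: only letters, digits and hyphens between the brackets.
    static boolean matchesRule5(String tag) {
        return tag.matches("<[A-Za-z0-9-]+>");
    }

    public static void main(String[] args) {
        for (String tag : new String[] {"<para>", "<my-tag>", "<my_tag>", "<<para>>"}) {
            System.out.println(tag + "  \\w: " + matchesWordChars(tag)
                    + "  rule 5: " + matchesRule5(tag));
        }
    }
}
```

For `<my-tag>` the `\w` pattern fails while the rule-5 class matches; for `<my_tag>` it is the other way round. Both reject `<<para>>`.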

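I've worked out how to express rules 1-4, but I'm stuck on rule 5: how do I find a token that is neither a tag nor a word? Please help. Here is the code I've written so far: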

public class XMLToken {
    private String token;
    XMLToken(String token) {token = this.token;} //constructor

    public static boolean isOpeningTag(){ // find an opening tag
        String token = new String();
        return token.matches("<\\w+>");
    }

    public static boolean isClosingTag(){ // find a closing tag
        String token = new String();
        return token.matches("</\\w+>");
    }

    public static boolean isSelfClosingTag(){ // find a self-closing tag
        String token = new String();
        return token.matches("<\\w+/>");
    }

    public static boolean isTag(){ // find a tag
        String token = new String();
        return token.matches("<\\w+>") || token.matches("</\\w+>") || token.matches("<\\w+/>");
    }

    public static boolean malformedTag(){
        // find tokens that are neither a word nor a tag
    }

}
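Two things in the code above keep it from ever seeing the real token: the constructor copies the (still null) field into the parameter instead of the other way round, and each method is `static` and builds a fresh empty `String`, so `matches` always runs against `""` and returns `false`. Below is a minimal sketch of the same idea with instance methods, plus one possible reading of rule 5: a token is neither a tag nor a word if it contains `<` or `>` but is not a well-formed tag. The class name `XMLTokenSketch`, the null check, and the definition of a word are assumptions, not part of the assignment.

```java
/** Hypothetical reworked XMLToken: instance methods read the field set by the constructor. */
class XMLTokenSketch {
    // Rule 5 (ASCII letters assumed): tag names use only letters, digits and hyphens.
    private static final String NAME = "[A-Za-z0-9-]+";

    private final String token;

    XMLTokenSketch(String token) {
        if (token == null) {
            throw new IllegalArgumentException("token must not be null");
        }
        this.token = token; // field = parameter (the question's constructor has this reversed)
    }

    boolean isOpeningTag()     { return token.matches("<" + NAME + ">"); }
    boolean isClosingTag()     { return token.matches("</" + NAME + ">"); }
    boolean isSelfClosingTag() { return token.matches("<" + NAME + "/>"); }

    boolean isTag() {
        return isOpeningTag() || isClosingTag() || isSelfClosingTag();
    }

    /** Neither a tag nor a word: contains '<' or '>' but is not a well-formed tag. */
    boolean isMalformed() {
        return !isTag() && (token.contains("<") || token.contains(">"));
    }

    /** Assumption: any non-blank token that is neither a tag nor malformed is a word. */
    boolean isWord() {
        return !token.trim().isEmpty() && !isTag() && !isMalformed();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"<para>", "</para>", "<br/>", "<<para>>", "text"}) {
            XMLTokenSketch t = new XMLTokenSketch(s);
            System.out.println(s + "  tag=" + t.isTag()
                    + "  malformed=" + t.isMalformed() + "  word=" + t.isWord());
        }
    }
}
```

With this reading, `<<para>>` is reported as malformed (and therefore not a word), while plain text such as `text` is a word.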
2015-02-15 03:05
UAPOPPING
Points awarded: 0
Closing my own thread: I've found a solution.

public class XMLToken {

    private String token;
    XMLToken(String token) {token = this.token;} //constructor

    public boolean isOpeningTag(){
        return token.matches("<\\w+>");
    }

    public boolean isClosingTag() {
        return token.matches("</\\w+>");
    }

    public boolean isSelfClosingTag() {
        return token.matches("<\\w+/>");
    }

    public boolean isTag(){
        return token.matches("<\\w+>") || token.matches("</\\w+>") || token.matches("<\\w+/>");
    }

    public boolean isMalFormedTokens(){
        return token.matches("<^\\w+>") || token.matches("</^\\w+>") || token.matches("<^\\w+/>");
    }

    public boolean isWord(){
        return !isTag() && !isMalFormedTokens();
    }
}
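Even once the constructor is fixed, `isMalFormedTokens()` above can never return `true`: in a Java regex, `^` outside square brackets is an anchor meaning "start of input". After `<` has been matched the engine is no longer at the start, so `<^\w+>`, `</^\w+>` and `<^\w+/>` match nothing, and `isWord()` reduces to `!isTag()`. Negation is written inside a character class instead, e.g. `[^A-Za-z0-9-]`. A sketch of a rule-5 check built on that; the helper `hasIllegalChar` is hypothetical and assumes the token already starts with `<` and ends with `>`:

```java
import java.util.regex.Pattern;

/** Why the thread's caret pattern never matches, and a negated class that does what was intended. */
class CaretDemo {
    // Inside [...], a leading ^ negates the class: one character that is
    // NOT an ASCII letter, digit or hyphen (rule 5's allowed set).
    private static final Pattern ILLEGAL = Pattern.compile("[^A-Za-z0-9-]");

    /** The thread's pattern: here ^ means "start of input", which cannot follow '<'. */
    static boolean threadPattern(String token) {
        return token.matches("<^\\w+>");
    }

    /**
     * Hypothetical rule-5 check for a token that already starts with '<' and ends with '>':
     * drop the brackets and the one slash allowed by rule 2 or rule 3, then look for
     * any character outside the allowed set. An empty name also counts as illegal.
     */
    static boolean hasIllegalChar(String token) {
        String inner = token.substring(1, token.length() - 1);
        if (inner.startsWith("/")) {
            inner = inner.substring(1);                     // rule 2: closing tag
        } else if (inner.endsWith("/")) {
            inner = inner.substring(0, inner.length() - 1); // rule 3: self-closing tag
        }
        return inner.isEmpty() || ILLEGAL.matcher(inner).find();
    }

    public static void main(String[] args) {
        for (String t : new String[] {"<para>", "</para>", "<br/>", "<<para>>", "< Tag/>"}) {
            System.out.println(t + "  thread pattern: " + threadPattern(t)
                    + "  illegal char: " + hasIllegalChar(t));
        }
    }
}
```

`hasIllegalChar("<<para>>")` is `true` because the inner `<` is outside the allowed set, while `<para>`, `</para>` and `<br/>` all give `false`. `find()` is used rather than `matches()` because one offending character anywhere in the name is enough.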
2015-02-15 10:33
kyoseven
Rank: 2
From: Aksu, Xinjiang
Level: Forum wanderer
Posts: 7
Expert points: 26
Registered: 2015-2-15
Points awarded: 0
Impressive, you're a pro! Bump!
2015-02-15 23:33
日知己所无
Rank: 11
Level: VIP
Reputation: 38
Posts: 427
Expert points: 2071
Registered: 2014-3-22
Points awarded: 0
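There were a few problems in the original code, so I made some changes.

It's a good idea to test it thoroughly as well.
The simple way: print the test results directly with System.out.println (sample code below).
The better way: use JUnit together with a tool that reports code coverage automatically.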
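Between eyeballing `println` output and a full JUnit setup, one middle ground is a table of inputs paired with expected answers that counts mismatches itself. A plain-Java sketch with no test framework; the predicate under test (an opening-tag regex using rule 5's character set) and all names are illustrative assumptions:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Predicate;

/** Minimal table-driven self-check: every input carries its expected answer. */
class TagCheckHarness {
    // Hypothetical predicate under test: rule 4 opening tag with rule-5 names, null-safe.
    static final Predicate<String> IS_OPENING_TAG =
            s -> s != null && s.matches("<[A-Za-z0-9-]+>");

    /** Runs every case, prints each mismatch, and returns the number of failures. */
    static int run(Predicate<String> predicate, Map<String, Boolean> cases) {
        int failures = 0;
        for (Map.Entry<String, Boolean> c : cases.entrySet()) {
            boolean actual = predicate.test(c.getKey());
            if (actual != c.getValue()) {
                failures++;
                System.out.println("FAIL: \"" + c.getKey() + "\" expected "
                        + c.getValue() + " but got " + actual);
            }
        }
        return failures;
    }

    static Map<String, Boolean> openingTagCases() {
        Map<String, Boolean> cases = new LinkedHashMap<>(); // keeps insertion order; allows a null key
        cases.put("<para>", true);
        cases.put("<my-tag>", true);
        cases.put("</para>", false);
        cases.put("<br/>", false);
        cases.put("<<para>>", false);
        cases.put("", false);
        cases.put(" ", false);
        cases.put(null, false);
        return cases;
    }

    public static void main(String[] args) {
        int failures = run(IS_OPENING_TAG, openingTagCases());
        System.out.println(failures == 0 ? "all cases passed" : failures + " case(s) failed");
    }
}
```

With JUnit, each row would become its own assertion, and a coverage tool could then show which branches the table never reaches.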

Code:
/*
* An XML file is a text file made up of tokens. A token is either a word or it is a tag.
* 1. A token is a tag if and only if it is bounded by angle brackets, e.g., <para>.
* 2. If the second character is a forward slash (e.g., </para>), this is a closing tag.
* 3. If the second-to-last character is a forward slash (e.g., <br/>), this is a self-closing tag.
* 4. If a tag is neither of the above, it is an opening tag.
* 5. Except for the slashes mentioned above,
* the only characters that may appear between the brackets are letters, numerals, and hyphens.
* This means, for example, that the string <<para>> is neither a tag nor a word. It should not appear in an XML file.*/

public class XMLToken {
    private String m_token;

    XMLToken(String p_token) {
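        // token = this.token; // This line was backwards: the original version leaves the field null and causes a NullPointerException
        this.m_token = p_token; // Better to distinguish member variables from parameters; otherwise it is easy to introduce bugs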
    } //constructor

    public static void main(String[] args) {
        String testArray[] = {null, "", " ", "  ", "\t",
        "<?xml version='1.0' encoding='ISO-8859-1' ?>", // <?xml version='1.0' encoding='ISO-8859-1' ?>
        "<!DOCTYPE log4j:configuration SYSTEM \"log4j.dtd\">", // <!DOCTYPE log4j:configuration SYSTEM "log4j.dtd">
        "<layout class=\"org.apache.log4j.PatternLayout\">", // <layout class="org.apache.log4j.PatternLayout">
        "</layout>",
        "<layout>",
        "<param name=\"ConversionPattern\" value=\"[%7r] %6p - %30.30c - %m \\n\"/>", // <param name="ConversionPattern" value="[%7r] %6p - %30.30c - %m \n"/>
        "<tag id=\"foo\" />", // <tag id="foo" />
        "< Tag/>", // < Tag/>
        };
        for (String testString : testArray) {
            printTestResult(testString);
        }
    }

    public static void printTestResult(String testString) {
        String printString = testString;
        if (printString == null) {
            printString = "null"; // If the argument is null, show it as the string "null" in the output
        } else if (printString.trim().equals("")) {
            printString = "\"" + printString + "\""; // If the argument is blank, show it wrapped in double quotes in the output
        }

        // System.out.println("======" + printString + " Test Start");
        try {
            XMLToken xmlToken = new XMLToken(testString);
            if (xmlToken.isOpeningTag()) {
                System.out.println(printString + ".isOpeningTag:" + xmlToken.isOpeningTag());
            }
            if (xmlToken.isClosingTag()) {
                System.out.println(printString + ".isClosingTag:" + xmlToken.isClosingTag());
            }
            if (xmlToken.isSelfClosingTag()) {
                System.out.println(printString + ".isSelfClosingTag:" + xmlToken.isSelfClosingTag());
            }
            if (xmlToken.isTag()) {
                System.out.println(printString + ".isTag:" + xmlToken.isTag());
            }
            if (xmlToken.isMalFormedTokens()) {
                System.out.println(printString + ".isMalFormedTokens:" + xmlToken.isMalFormedTokens());
            }
            if (xmlToken.isWord()) {
                System.out.println(printString + ".isWord:" + xmlToken.isWord());
            }
        } catch (Exception e) {
            System.out.println(printString + ":" + e);
            // e.printStackTrace(); // Uncomment this to see the full exception details
        }
        // System.out.println("======" + printString + " Test End");
    }

    public boolean isOpeningTag() {
        return m_token.matches("<\\w+>");
    }

    public boolean isClosingTag() {
        return m_token.matches("</\\w+>");
    }

    public boolean isSelfClosingTag() {
        return m_token.matches("<\\w+/>");
    }

    public boolean isTag() {
        return m_token.matches("<\\w+>") || m_token.matches("</\\w+>") || m_token.matches("<\\w+/>");
    }

    public boolean isMalFormedTokens() {
        return m_token.matches("<^\\w+>") || m_token.matches("</^\\w+>") || m_token.matches("<^\\w+/>");
    }

    public boolean isWord() {
        return !isTag() && !isMalFormedTokens();
    }
}


Output:
null:java.lang.NullPointerException // A null input throws a NullPointerException [program bug]
"".isWord:true // An empty string is misclassified as a Word [an empty string should be neither a Word nor a Token, not even an Element (an XML element)] [misclassified]
" ".isWord:true // A blank string has the same problem [one space] [misclassified]
"  ".isWord:true // A blank string has the same problem [several spaces] [misclassified]
"    ".isWord:true // A TAB character has the same problem [I suspect newlines and other special control characters are affected too] [misclassified]
<?xml version='1.0' encoding='ISO-8859-1' ?>.isWord:true [misclassified]
<!DOCTYPE log4j:configuration SYSTEM "log4j.dtd">.isWord:true [misclassified]
<layout class="org.apache.log4j.PatternLayout">.isWord:true [misclassified]
</layout>.isClosingTag:true [correct]
</layout>.isTag:true [correct]
<layout>.isOpeningTag:true [correct]
<layout>.isTag:true [correct]
<param name="ConversionPattern" value="[%7r] %6p - %30.30c - %m \n"/>.isWord:true [misclassified]
<tag id="foo" />.isWord:true [misclassified]
< Tag/>.isWord:true [misclassified]
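Under a strict reading of rules 1-5, most of the inputs marked [misclassified] are neither tags nor words: `<?xml ...?>`, `<!DOCTYPE ...>`, tags carrying attributes, and `< Tag/>` all contain characters (spaces, `?`, `!`, `=`, quotes) that rule 5 forbids between the brackets, even though several of them are legal in real XML. Null and blank strings are arguably not tokens at all. A sketch of a classifier that reports these cases separately; the `NOT_A_TOKEN` category, ASCII-only letters, and the definition of a word (no whitespace, no angle brackets) are assumptions layered on top of the assignment:

```java
import java.util.regex.Pattern;

/** Hypothetical strict classifier for the assignment's rules 1-5. */
class StrictTokenClassifier {
    enum Kind { NOT_A_TOKEN, OPENING_TAG, CLOSING_TAG, SELF_CLOSING_TAG, WORD, MALFORMED }

    private static final String NAME = "[A-Za-z0-9-]+";                       // rule 5, ASCII assumed
    private static final Pattern OPENING = Pattern.compile("<" + NAME + ">");        // rule 4
    private static final Pattern CLOSING = Pattern.compile("</" + NAME + ">");       // rule 2
    private static final Pattern SELF_CLOSING = Pattern.compile("<" + NAME + "/>");  // rule 3
    private static final Pattern WORD = Pattern.compile("[^\\s<>]+");        // assumed word definition

    static Kind classify(String s) {
        // null, "" and whitespace-only strings (trim() strips tabs and newlines too)
        // are rejected up front instead of throwing or falling through to WORD.
        if (s == null || s.trim().isEmpty()) return Kind.NOT_A_TOKEN;
        if (OPENING.matcher(s).matches()) return Kind.OPENING_TAG;
        if (CLOSING.matcher(s).matches()) return Kind.CLOSING_TAG;
        if (SELF_CLOSING.matcher(s).matches()) return Kind.SELF_CLOSING_TAG;
        if (WORD.matcher(s).matches()) return Kind.WORD;
        // Everything else breaks rule 5 under this strict reading,
        // e.g. "<<para>>", "<?xml ...?>", "<tag id=\"foo\" />" or "< Tag/>".
        return Kind.MALFORMED;
    }

    public static void main(String[] args) {
        String[] inputs = {null, "", " ", "\t",
            "<?xml version='1.0' encoding='ISO-8859-1' ?>",
            "<!DOCTYPE log4j:configuration SYSTEM \"log4j.dtd\">",
            "</layout>", "<layout>", "<tag id=\"foo\" />", "< Tag/>", "<br/>", "layout"};
        for (String s : inputs) {
            System.out.println((s == null ? "null" : "\"" + s + "\"") + " -> " + classify(s));
        }
    }
}
```

On these inputs the sketch reports `NOT_A_TOKEN` for `null` and the blank strings, `MALFORMED` for the declaration, the DOCTYPE, the attribute-bearing tag and `< Tag/>`, and the matching tag kinds for `<layout>`, `</layout>` and `<br/>`.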
2015-02-28 13:48
UAPOPPING
Points awarded: 0
Reply to post #4 by 日知己所无
Impressive! Very useful, thanks a lot!
2015-03-11 02:39



To join the discussion, please go to the original thread: https://bbs.bccn.net/thread-442032-1-1.html





Copyright 编程中国. All rights reserved.
Copyright©2004-2024, BCCN.NET, All Rights Reserved