Title: [Help] Could someone with strong English skills help me translate this?
宇风
Rank: 1
Level: Newbie
Posts: 8
Expert points: 0
Registered: 2006-12-13
Question points: 0   Replies: 0

It's a passage about data mining, specifically about decision trees. I'm a newbie here.
Could some expert please help me translate it?
The purpose of the decision tree classifier is to classify instances based on the values of ordinary attributes and a class label attribute. Traditionally, the data set is single-valued and single-labeled. In this data set, each record has many single-valued attributes and a given single-labeled attribute (i.e. the class label attribute), and the class labels, which can have two or more types, are exclusive to one another. Prior-art decision tree classifiers, such as ID3 (Quinlan, 1979, 1986), the Distance-based method (Mantaras, 1991), IC (Agrawal, Ghosh, Imielinski, Iyer, & Swami, 1992), C4.5 (Quinlan, 1993), Fuzzy ID3 (Umano et al., 1994), CART (Steinberg & Colla, 1995), SLIQ (Mehta, Agrawal, & Rissanen, 1996), SPRINT (Shafer, Agrawal, & Mehta, 1996), Rainforest (Gehrke, Ramakrishnan, & Ganti, 1998) and PUBLIC (Rastogi & Shim, 1998), all focus on this single-valued and single-labeled data set.

However, there is multi-valued and multi-labeled data in the real world, as shown in Table 1. Multi-valued data means that a record can have multiple values for an ordinary attribute. Multi-labeled data means that a record can belong to multiple class labels, and the class labels are not exclusive to one another. Readers might have difficulty distinguishing multi-labeled data from the two-classed or multi-classed data mentioned in some related works. To clarify this confusion, we discuss the exclusiveness among classes, the number of classes and the representation of the class label attribute in the related works as follows:

1. Exclusiveness: Each record can belong to only a single class. Classes are exclusive to one another. ID3, Distance-based Method, IC, C4.5, Fuzzy ID3, CART, SLIQ, SPRINT, Rainforest and PUBLIC are such examples.

2. Number of classes: Data with classes classified into two types in the class label attribute is called two-classed data. ID3 and C4.5 are such examples. Data with classes classified into more than two types in the class label attribute is called multi-classed data. IC, CART and Fuzzy ID3 are such examples.

3. Label representation: Data with a single value for the class label attribute is called single-labeled data. ID3, Distance-based Method, IC, C4.5, Fuzzy ID3, CART, SLIQ, SPRINT, Rainforest and PUBLIC are such examples.

According to the discussion above, the multi-valued and multi-labeled data as we define it here can be regarded as non-exclusive, multi-classed and multi-labeled data.
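As a concrete illustration of this distinction (Table 1 is not reproduced in this excerpt, so the attribute and label names below are invented), a multi-valued and multi-labeled record, contrasted with a traditional one, could be written in Python roughly as:

# Hypothetical multi-valued and multi-labeled record
# (names are invented, not taken from the paper's Table 1).
record = {
    "age_group": "young",
    "hobby": ["reading", "hiking"],                 # multi-valued ordinary attribute
    "labels": ["sports_channel", "news_channel"],   # multiple, non-exclusive class labels
}

# A traditional single-valued and single-labeled record, by contrast:
traditional_record = {
    "age_group": "young",
    "hobby": "reading",            # exactly one value per attribute
    "label": "sports_channel",     # exactly one, exclusive class label
}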

In our previous work (Chen, Hsu, & Chou, 2003), we have explained why the traditional classifiers are not capable of handling this multi-valued and multi-labeled data. To solve this multi-valued and multi-labeled classification problem, we have previously designed a decision tree classifier named MMC (Chen et al., 2003). MMC differs from the traditional ones in some major functions, including growing a decision tree, assigning labels to represent a leaf and making a prediction for new data. In the process of growing a tree, MMC proposes a new measure named weighted similarity for selecting a multi-valued attribute to partition a node into child nodes so as to approach perfect grouping. To assign labels, MMC picks the ones whose counts are large enough to represent a leaf. To make a prediction for new data, MMC traverses the tree as usual, and when the traversal reaches several leaf nodes for a record with a multi-valued attribute, MMC unions all the labels of those leaf nodes as the prediction result. Experimental results show that MMC achieves an average predicting accuracy of 62.56%.
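A minimal sketch of the prediction step described above, assuming a simple node structure (the Node class and the way branches are matched here are illustrative assumptions, not MMC's actual data structures): when a multi-valued attribute matches several branches, all of them are followed, and the labels of every leaf reached are unioned.

from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Node:
    attribute: Optional[str] = None                 # splitting attribute; None marks a leaf
    children: dict = field(default_factory=dict)    # attribute value -> child Node
    labels: set = field(default_factory=set)        # labels assigned to a leaf

def predict(node, record):
    """Union the label sets of all leaves reached by the record."""
    if node.attribute is None:                      # leaf node: contribute its labels
        return set(node.labels)
    values = record.get(node.attribute, [])
    if not isinstance(values, (list, set, tuple)):
        values = [values]                           # single-valued data handled uniformly
    result = set()
    for v in values:                                # a multi-valued attribute may follow
        child = node.children.get(v)                # several branches at once
        if child is not None:
            result |= predict(child, record)
    return result

# Example: the multi-valued "hobby" attribute reaches two leaves,
# so the prediction is the union of their labels.
tree = Node("hobby", children={
    "reading": Node(labels={"news_channel"}),
    "hiking":  Node(labels={"sports_channel"}),
})
print(predict(tree, {"hobby": ["reading", "hiking"]}))   # {'news_channel', 'sports_channel'}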

Having developed a decision tree classifier for multi-valued and multi-labeled data, this research steps further to improve the classifier's accuracy. Considering the following over-fitting problems (Han & Kamber, 2001; Russell & Norvig, 1995) of MMC, improvement of its predicting accuracy seems possible. First, MMC neglects to avoid the situation where the data set is too small. Therefore, it may choose some attributes irrelevant to the class labels. Second, MMC appears to prefer the attribute which splits into child nodes with larger similarity among multiple labels. Therefore, MMC exhibits inductive bias (Gordon & Desjardins, 1995).

Trying to minimize the over-fitting problems above, this paper proposes the following solutions: (1) Set a constraint on the size of the data set in each node to avoid the data set becoming too small. (2) Consider not only the average similarity of labels of each child node but also the average appropriateness of labels of each child node to decrease the bias problem of MMC. Based on the propositions above, we have designed a new decision tree classifier to improve the accuracy of MMC. The decision tree classifier, named MMDT (multi-valued and multi-labeled decision tree), can construct a multi-valued and multi-labeled decision tree as Fig. 1 shows.
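To make the two proposed adjustments a bit more concrete, here is a rough sketch of how they could enter attribute selection. The weighted-similarity and appropriateness measures are defined in the MMC/MMDT papers and are not given in this excerpt, so the functions below are placeholders and the threshold value is an assumption.

MIN_NODE_SIZE = 30   # assumed size constraint; the actual threshold is not stated in this excerpt

def should_split(records):
    """Solution (1): stop splitting once the data set at a node is too small."""
    return len(records) >= MIN_NODE_SIZE

def split_score(child_nodes, similarity, appropriateness):
    """Solution (2): score a candidate split by averaging both the label
    similarity and the label appropriateness over its child nodes
    (placeholder combination; the paper's actual measures differ)."""
    sim = sum(similarity(c) for c in child_nodes) / len(child_nodes)
    app = sum(appropriateness(c) for c in child_nodes) / len(child_nodes)
    return (sim + app) / 2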

The rest of the paper is organized as follows. In Section 2, the symbols are introduced first. In Section 3, the tree construction and data prediction algorithms are described. In Section 4, the experiments are presented. Finally, Section 5 gives the summary and conclusions.

2007-02-19 20:26



To join the discussion, please visit the original thread: https://bbs.bccn.net/thread-121269-1-1.html



