Learning Rules by Sequential Covering
Inputs: labeled training dataset D Outputs: ruleset R that covers all instances in D Procedure: Initialize R as the empty set for each class C { while D is nonempty { Construct one rule r that correctly classifies some instances in D that belong to class C and does not incorrectly classify any non-C instances Add rule r to ruleset R Remove from D all instances correctly classified by r } } return R
Inputs: nonempty dataset D, class C Outputs: a single rule r with C as its consequent that covers some instances in D and that does not incorrectly classify any non-C instances in D Procedure: Initialize r as a rule with an empty antecedent, which predicts C without examining any attributes: => C Initialize A as the set of all attributes over D while r incorrectly classifies some non-C instances of D { write r as ant(r) => C for each attribute-value pair (a=v), where a belongs to A and v is a value of a, compute the accuracy of the rule ant(r) and (a=v) => C let (a*=v*) be the attribute-value pair of maximum accuracy over D; in case of a tie, choose the pair that covers the greatest number of instances of D update r by adding (a*=v*) to its antecedent: r = ( ant(r) and (a*=v*) ) => C remove the attribute a* from the set A: A = A - {a*} } return r
% 1. Title: Database for fitting contact lenses % % 2. Sources: % (a) Cendrowska, J. "PRISM: An algorithm for inducing modular rules", % International Journal of Man-Machine Studies, 1987, 27, 349-370 % (b) Donor: Benoit Julien (Julien@ce.cmu.edu) % (c) Date: 1 August 1990 % % 3. Past Usage: % 1. See above. % 2. Witten, I. H. & MacDonald, B. A. (1988). Using concept % learning for knowledge acquisition. International Journal of % Man-Machine Studies, 27, (pp. 349-370). % % Notes: This database is complete (all possible combinations of % attribute-value pairs are represented). % % Each instance is complete and correct. % % 9 rules cover the training set. % % 4. Relevant Information Paragraph: % The examples are complete and noise free. % The examples highly simplified the problem. The attributes do not % fully describe all the factors affecting the decision as to which type, % if any, to fit. % % 5. Number of Instances: 24 % % 6. Number of Attributes: 4 (all nominal) % % 7. Attribute Information: % -- 3 Classes % 1 : the patient should be fitted with hard contact lenses, % 2 : the patient should be fitted with soft contact lenses, % 1 : the patient should not be fitted with contact lenses. % % 1. age of the patient: (1) young, (2) pre-presbyopic, (3) presbyopic % 2. spectacle prescription: (1) myope, (2) hypermetrope % 3. astigmatic: (1) no, (2) yes % 4. tear production rate: (1) reduced, (2) normal % % 8. Number of Missing Attribute Values: 0 % % 9. Class Distribution: % 1. hard contact lenses: 4 % 2. soft contact lenses: 5 % 3. no contact lenses: 15 @relation contact-lenses @attribute age {young, pre-presbyopic, presbyopic} @attribute spectacle-prescrip {myope, hypermetrope} @attribute astigmatism {no, yes} @attribute tear-prod-rate {reduced, normal} @attribute contact-lenses {soft, hard, none} @data % % 24 instances % young,myope,no,reduced,none young,myope,no,normal,soft young,myope,yes,reduced,none young,myope,yes,normal,hard young,hypermetrope,no,reduced,none young,hypermetrope,no,normal,soft young,hypermetrope,yes,reduced,none young,hypermetrope,yes,normal,hard pre-presbyopic,myope,no,reduced,none pre-presbyopic,myope,no,normal,soft pre-presbyopic,myope,yes,reduced,none pre-presbyopic,myope,yes,normal,hard pre-presbyopic,hypermetrope,no,reduced,none pre-presbyopic,hypermetrope,no,normal,soft pre-presbyopic,hypermetrope,yes,reduced,none pre-presbyopic,hypermetrope,yes,normal,none presbyopic,myope,no,reduced,none presbyopic,myope,no,normal,none presbyopic,myope,yes,reduced,none presbyopic,myope,yes,normal,hard presbyopic,hypermetrope,no,reduced,none presbyopic,hypermetrope,no,normal,soft presbyopic,hypermetrope,yes,reduced,none presbyopic,hypermetrope,yes,normal,none
=> hardand all non-class attributes are still being considered, that is,
A = { age, spectacle-prescription, astigmatism, tear-production-rate }
(a = v) => Cwhere (a, v) ranges over all possible attribute-value pairs:
attribute-value pair p/t age=young 2/8 age=pre-presbyopic 1/8 age=presbyopic 1/8 spectacle-prescription=myope 3/12 spectacle-prescription=hypermetrope 1/12 astigmatism=no 0/12 astigmatism=yes 4/12 tear-production-rate=normal 4/12 tear-production-rate=reduced 0/12In each case, the denominator of the fraction p/t on the right indicates the number of instances that contain the given attribute-value pair, and the numerator is the number of these instances that belong to the hard class. For example, there are 8 instances with age=young, and 2 of these 8 are in the hard class. The atribute-value pairs with the largest accuracies are astigmatism=yes and tear-production-rate=normal. Since both of these have the same coverage, one of them is chosen arbitrarily, say astigmatism=yes. That is:
(a*, v*) = (astigmatism, yes)The rule r is then updated by adding this attribute-value pair to the antecedent:
r = (astigmatism=yes) => hardand the set of attributes being considered is reduced by removing the astigmatism attribute:
A = { age, spectacle-prescription, tear-production-rate }Since the current rule r incorrectly classifies some instances by predicting hard for some instances that are actually in the contact-lenses=none class, another pass is made through the loop that refines the rule.
(astigmatism=yes) and (a=v) => hardAccuracies for the candidate attribute-value pairs are:
attribute-value pair p/t age=young 2/4 age=pre-presbyopic 1/4 age=presbyopic 1/4 spectacle-prescription=myope 3/6 spectacle-prescription=hypermetrope 1/6 tear-production-rate=normal 4/6 tear-production-rate=reduced 0/6In computing these acuracies, only instances for which (astigmatism=yes) are considered. Here is the set of all such instances:
young,myope,yes,reduced,none young,myope,yes,normal,hard young,hypermetrope,yes,reduced,none young,hypermetrope,yes,normal,hard pre-presbyopic,myope,yes,reduced,none pre-presbyopic,myope,yes,normal,hard pre-presbyopic,hypermetrope,yes,reduced,none pre-presbyopic,hypermetrope,yes,normal,none presbyopic,myope,yes,reduced,none presbyopic,myope,yes,normal,hard presbyopic,hypermetrope,yes,reduced,none presbyopic,hypermetrope,yes,normal,noneThe attribute-value pair of greatest accuracy is tear-production-rate=normal, with an accuracy p/t = 4/6. The rule r is updated by adding this attribute-value pair to the antecedent:
r = (astigmatism=yes) and (tear-production-rate=normal) => hardA = { age, spectacle-prescription } Some instances are still misclassified by the refined rule r, so another pass is made through the loop.
r = (astigmatism=yes) and (tear-production-rate=normal) and (a=v) => hardAccuracies for the candidate attribute-value pairs are:
attribute-value pair p/t age=young 2/2 age=pre-presbyopic 1/2 age=presbyopic 1/2 spectacle-prescription=myope 3/3 spectacle-prescription=hypermetrope 1/3These are computed over the set of instances that satisfy both constraints that appear in the current antecedent:
young,myope,yes,normal,hard young,hypermetrope,yes,normal,hard pre-presbyopic,myope,yes,normal,hard pre-presbyopic,hypermetrope,yes,normal,none presbyopic,myope,yes,normal,hard presbyopic,hypermetrope,yes,normal,noneThere are now two attribute-value pairs with 100% accuracy, namely (age=young) and (spectacle-prescription=myope). The tie is broken by choosing the one with greater coverage, that is:
(a*, v*) = (spectacle-prescription, myope)The updated rule is
r = (astigmatism=yes) and (tear-production-rate=normal) and (spectacle-prescription=myope) => hardSince this rule has 100% accuracy, it is accepted and returned by the rule construction procedure as the first rule generated for the hard class by the inner loop in the sequential covering pseudocode.
r = (astigmatism=yes) and (tear-production-rate=normal) and (spectacle-prescription=myope) => hardis added to the ruleset R:
R = { (astigmatism=yes) and (tear-production-rate=normal) and (spectacle-prescription=myope) => hard }and the three instances covered by this rule are removed from the dataset D. D now contains 21 instances:
young,myope,no,reduced,none young,myope,no,normal,soft young,myope,yes,reduced,none young,hypermetrope,no,reduced,none young,hypermetrope,no,normal,soft young,hypermetrope,yes,reduced,none young,hypermetrope,yes,normal,hard pre-presbyopic,myope,no,reduced,none pre-presbyopic,myope,no,normal,soft pre-presbyopic,myope,yes,reduced,none pre-presbyopic,hypermetrope,no,reduced,none pre-presbyopic,hypermetrope,no,normal,soft pre-presbyopic,hypermetrope,yes,reduced,none pre-presbyopic,hypermetrope,yes,normal,none presbyopic,myope,no,reduced,none presbyopic,myope,no,normal,none presbyopic,myope,yes,reduced,none presbyopic,hypermetrope,no,reduced,none presbyopic,hypermetrope,no,normal,soft presbyopic,hypermetrope,yes,reduced,none presbyopic,hypermetrope,yes,normal,noneThis completes the first pass through the inner loop in the sequential covering algorithm. Since one instance with contact-lenses=hard still remains, another pass is made in order to generate a second rule that targets the hard class. This second rule is constructed by following the procedure described above. The second rule turns out to be:
r = (age=young) and (astigmatism=yes) and (tear-production-rate=normal) => hardThis rule covers the remaining instance in the hard class, which completes the first pass through the outer loop of the sequential covering algorithm. Subsequent passes generate rules that cover instances belonging to the soft and none classes.