Class SimpleCart

  • All Implemented Interfaces:
    java.io.Serializable, java.lang.Cloneable, AdditionalMeasureProducer, CapabilitiesHandler, OptionHandler, Randomizable, RevisionHandler, TechnicalInformationHandler

    public class SimpleCart
    extends RandomizableClassifier
    implements AdditionalMeasureProducer, TechnicalInformationHandler
    Class implementing minimal cost-complexity pruning.
    Note: missing values are handled with the "fractional instances" method rather than the surrogate split method.
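To illustrate the fractional-instances idea: an instance whose split-attribute value is missing is sent down every branch, with its weight scaled by the share of known-value training weight that reached that branch. The sketch below is a minimal illustration under that assumption. The class and method names (FractionalSplitSketch, splitWeight) are hypothetical, and this is not Weka's implementation.

```java
import java.util.Arrays;

/** Hypothetical sketch of the "fractional instances" treatment of a missing split value. */
final class FractionalSplitSketch {

    /**
     * Splits one instance's weight across the branches of a split, in proportion
     * to the training weight that reached each branch with a known attribute value.
     */
    static double[] splitWeight(double instanceWeight, double[] branchWeights) {
        double[] result = new double[branchWeights.length];
        double total = 0.0;
        for (double w : branchWeights) {
            total += w;
        }
        if (total == 0.0) {
            // Assumed fallback: with no information about the branches, split evenly.
            Arrays.fill(result, instanceWeight / branchWeights.length);
            return result;
        }
        for (int i = 0; i < branchWeights.length; i++) {
            // Each branch receives a fraction of the instance equal to its share of the weight.
            result[i] = instanceWeight * branchWeights[i] / total;
        }
        return result;
    }

    public static void main(String[] args) {
        // 30 units of known-value weight went left, 10 went right.
        System.out.println(Arrays.toString(splitWeight(1.0, new double[] {30.0, 10.0}))); // prints [0.75, 0.25]
    }
}
```

With 30 units of known-value weight on the left and 10 on the right, an instance of weight 1.0 thus contributes 0.75 to the left branch and 0.25 to the right.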

    For more information, see:

    Leo Breiman, Jerome H. Friedman, Richard A. Olshen, Charles J. Stone (1984). Classification and Regression Trees. Wadsworth International Group, Belmont, California.

    BibTeX:

     @book{Breiman1984,
        address = {Belmont, California},
        author = {Leo Breiman and Jerome H. Friedman and Richard A. Olshen and Charles J. Stone},
        publisher = {Wadsworth International Group},
        title = {Classification and Regression Trees},
        year = {1984}
     }
     

    Valid options are:

     -S <num>
      Random number seed.
      (default 1)
     -D
      If set, classifier is run in debug mode and
      may output additional info to the console.
     -M <min no>
      The minimal number of instances at the terminal nodes.
      (default 2)
     -N <num folds>
      The number of folds used in the minimal cost-complexity pruning.
      (default 5)
     -U
      Don't use the minimal cost-complexity pruning.
      (default: pruning is used)
     -H
      Don't use the heuristic method for binary splits.
      (default: the heuristic is used)
     -A
      Use the 1 SE rule to make the pruning decision.
      (default: not used)
     -C
      Percentage of the training data to use, given as a fraction in (0, 1].
      (default 1).
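The flags above mix two conventions: -A switches a feature on, while -U and -H switch default-on features off. The following sketch records the documented defaults and shows how each flag would change them. CartOptionsSketch is hypothetical; it is not part of Weka and does not use Weka's own option-parsing utilities.

```java
/**
 * Hypothetical sketch of the documented SimpleCart flags and their defaults.
 * Not Weka code; Weka parses options with its own utilities.
 */
final class CartOptionsSketch {
    int seed = 1;              // -S <num>
    boolean debug = false;     // -D
    double minNumObj = 2;      // -M <min no>
    int numFoldsPruning = 5;   // -N <num folds>
    boolean usePrune = true;   // -U switches pruning OFF
    boolean heuristic = true;  // -H switches the heuristic OFF
    boolean useOneSE = false;  // -A switches the 1 SE rule ON
    double sizePer = 1.0;      // -C <fraction in (0, 1]>

    /** Returns the value that follows a flag such as -M, or fails if it is missing. */
    private static String valueAfter(String[] options, int flagIndex) {
        if (flagIndex + 1 >= options.length) {
            throw new IllegalArgumentException("Missing value after " + options[flagIndex]);
        }
        return options[flagIndex + 1];
    }

    static CartOptionsSketch parse(String[] options) {
        CartOptionsSketch o = new CartOptionsSketch();
        for (int i = 0; i < options.length; i++) {
            switch (options[i]) {
                case "-S": o.seed = Integer.parseInt(valueAfter(options, i)); i++; break;
                case "-D": o.debug = true; break;
                case "-M": o.minNumObj = Double.parseDouble(valueAfter(options, i)); i++; break;
                case "-N": o.numFoldsPruning = Integer.parseInt(valueAfter(options, i)); i++; break;
                case "-U": o.usePrune = false; break;
                case "-H": o.heuristic = false; break;
                case "-A": o.useOneSE = true; break;
                case "-C": o.sizePer = Double.parseDouble(valueAfter(options, i)); i++; break;
                default: throw new IllegalArgumentException("Unsupported option: " + options[i]);
            }
        }
        // The documented range for -C is (0, 1]: 0 excluded, 1 included.
        if (o.sizePer <= 0 || o.sizePer > 1) {
            throw new IllegalArgumentException("-C must be in (0, 1]");
        }
        return o;
    }
}
```

With no arguments, pruning and the heuristic are on and the 1 SE rule is off; passing -U -H -A flips all three.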
    Version:
    $Revision: 10491 $
    Author:
    Haijian Shi (hs69@cs.waikato.ac.nz)
    See Also:
    Serialized Form
    • Constructor Summary

      Constructors 
      Constructor Description
      SimpleCart()  
    • Method Summary

      Modifier and Type Method Description
      void buildClassifier(Instances data)
      Builds the classifier.
      void calculateAlphas()
      Updates the alpha field for all nodes.
      double[] distributionForInstance(Instance instance)
      Computes class probabilities for an instance using the decision tree.
      java.util.Enumeration enumerateMeasures()
      Returns an enumeration of the measure names.
      Capabilities getCapabilities()
      Returns the default capabilities of the classifier.
      boolean getHeuristic()
      Gets whether heuristic search is used for nominal attributes in multi-class problems.
      double getMeasure(java.lang.String additionalMeasureName)
      Returns the value of the named measure.
      double getMinNumObj()
      Gets the minimal number of instances at the terminal nodes.
      int getNumFoldsPruning()
      Gets the number of folds in the internal cross-validation.
      java.lang.String[] getOptions()
      Gets the current settings of the classifier.
      java.lang.String getRevision()
      Returns the revision string.
      double getSizePer()
      Gets the training set size, as a fraction of the full training data.
      TechnicalInformation getTechnicalInformation()
      Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., the paper or book this class is based on.
      boolean getUseOneSE()
      Gets whether the 1 SE rule is used to choose the final model.
      boolean getUsePrune()
      Gets whether minimal cost-complexity pruning is used.
      java.lang.String globalInfo()
      Returns a description suitable for displaying in the Explorer/Experimenter.
      java.lang.String heuristicTipText()
      Returns the tip text for this property.
      java.util.Enumeration listOptions()
      Returns an enumeration describing the available options.
      static void main(java.lang.String[] args)
      Main method.
      double measureTreeSize()
      Returns the size of the tree.
      java.lang.String minNumObjTipText()
      Returns the tip text for this property.
      void modelErrors()
      Updates the numIncorrectModel field for all nodes, i.e., the training errors made if the subtree rooted at each node were pruned to a leaf.
      java.lang.String numFoldsPruningTipText()
      Returns the tip text for this property.
      int numInnerNodes()
      Counts the number of inner nodes in the tree.
      int numLeaves()
      Computes the number of leaf nodes.
      int numNodes()
      Computes the size of the tree (the total number of nodes).
      void prune(double alpha)
      Prunes the original tree using the CART pruning scheme, given a cost-complexity parameter alpha.
      int prune(double[] alphas, double[] errors, Instances test)
      Performs one fold of the cross-validation for minimal cost-complexity pruning.
      void setHeuristic(boolean value)
      Sets whether heuristic search is used for nominal attributes in multi-class problems.
      void setMinNumObj(double value)
      Sets the minimal number of instances at the terminal nodes.
      void setNumFoldsPruning(int value)
      Sets the number of folds in the internal cross-validation.
      void setOptions(java.lang.String[] options)
      Parses a given list of options.
      void setSizePer(double value)
      Sets the training set size, as a fraction of the full training data.
      void setUseOneSE(boolean value)
      Sets whether the 1 SE rule is used to choose the final model.
      void setUsePrune(boolean value)
      Sets whether minimal cost-complexity pruning is used.
      java.lang.String sizePerTipText()
      Returns the tip text for this property.
      java.lang.String toString()
      Returns a textual description of the decision tree, produced by a protected toString helper.
      void treeErrors()
      Updates the numIncorrectTree field for all nodes.
      java.lang.String useOneSETipText()
      Returns the tip text for this property.
      java.lang.String usePruneTipText()
      Returns the tip text for this property.
      • Methods inherited from class java.lang.Object

        equals, getClass, hashCode, notify, notifyAll, wait, wait, wait
    • Constructor Detail

      • SimpleCart

        public SimpleCart()
    • Method Detail

      • globalInfo

        public java.lang.String globalInfo()
        Returns a description suitable for displaying in the Explorer/Experimenter.
        Returns:
        a description suitable for displaying in the explorer/experimenter
      • getTechnicalInformation

        public TechnicalInformation getTechnicalInformation()
        Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., the paper or book this class is based on.
        Specified by:
        getTechnicalInformation in interface TechnicalInformationHandler
        Returns:
        the technical information about this class
      • buildClassifier

        public void buildClassifier(Instances data)
                             throws java.lang.Exception
        Builds the classifier.
        Specified by:
        buildClassifier in class Classifier
        Parameters:
        data - the training instances
        Throws:
        java.lang.Exception - if something goes wrong
      • prune

        public void prune(double alpha)
                   throws java.lang.Exception
        Prunes the original tree using the CART pruning scheme, given a cost-complexity parameter alpha.
        Parameters:
        alpha - the cost-complexity parameter
        Throws:
        java.lang.Exception - if something goes wrong
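Minimal cost-complexity pruning for a fixed alpha can be pictured as weakest-link pruning: repeatedly collapse the inner node t with the smallest g(t) = (R(t) - R(T_t)) / (|leaves(T_t)| - 1) while g(t) <= alpha, where R(t) is the training error if t were a leaf and R(T_t) is the error of the subtree rooted at t. The sketch below assumes a hypothetical Node type that stores R(t) directly and that every inner node has at least two children; it illustrates the scheme and is not SimpleCart's code.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of weakest-link pruning for a fixed complexity parameter alpha. */
final class WeakestLinkPruneSketch {

    static final class Node {
        final double errorsAsLeaf;                  // R(t): training errors if this node were a leaf
        final List<Node> children = new ArrayList<>();
        Node(double errorsAsLeaf) { this.errorsAsLeaf = errorsAsLeaf; }
        boolean isLeaf() { return children.isEmpty(); }
    }

    /** R(T_t): sum of leaf errors in the subtree rooted at n. */
    static double subtreeErrors(Node n) {
        if (n.isLeaf()) return n.errorsAsLeaf;
        double sum = 0.0;
        for (Node c : n.children) sum += subtreeErrors(c);
        return sum;
    }

    static int leaves(Node n) {
        if (n.isLeaf()) return 1;
        int count = 0;
        for (Node c : n.children) count += leaves(c);
        return count;
    }

    /** g(t) for an inner node (assumes at least two leaves below it). */
    static double linkStrength(Node n) {
        return (n.errorsAsLeaf - subtreeErrors(n)) / (leaves(n) - 1);
    }

    /** Returns the inner node with the smallest g(t), or null if n is a leaf. */
    static Node weakestLink(Node n) {
        if (n.isLeaf()) return null;
        Node best = n;
        for (Node c : n.children) {
            Node candidate = weakestLink(c);
            if (candidate != null && linkStrength(candidate) < linkStrength(best)) best = candidate;
        }
        return best;
    }

    /** Collapses weakest links one at a time while their strength does not exceed alpha. */
    static void prune(Node root, double alpha) {
        Node w;
        while ((w = weakestLink(root)) != null && linkStrength(w) <= alpha) {
            w.children.clear(); // turn the subtree rooted at w into a leaf
        }
    }
}
```

Recomputing g(t) after every collapse matters: removing one subtree changes R(T_t) and the leaf count of all its ancestors.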
      • prune

        public int prune(double[] alphas,
                         double[] errors,
                         Instances test)
                  throws java.lang.Exception
        Performs one fold of the cross-validation for minimal cost-complexity pruning. Generates a sequence of alpha-values with error estimates for the corresponding (partially pruned) trees, given the test set of that fold.
        Parameters:
        alphas - array to hold the generated alpha-values
        errors - array to hold the corresponding error estimates
        test - test set of that fold (to obtain error estimates)
        Returns:
        the iteration of the pruning
        Throws:
        java.lang.Exception - if something goes wrong
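The per-fold arrays can then be combined into a cross-validated error for any candidate alpha. A minimal sketch, assuming each fold's alphas array is ascending, starts with the unpruned tree, and has only its first count entries filled: for a given alpha, each fold contributes the error of the last tree in its sequence whose alpha does not exceed it. The geometric mean of consecutive alphas is the representative value used in the CART book; whether SimpleCart combines folds exactly this way is an assumption, and all names here are hypothetical.

```java
/** Hypothetical sketch: combining per-fold pruning sequences into one CV error estimate. */
final class FoldErrorLookupSketch {

    /**
     * Returns the error of the fold's tree that is optimal for the given alpha:
     * the entry for the largest generated alpha that does not exceed it.
     */
    static double errorAt(double[] alphas, double[] errors, int count, double alpha) {
        int j = 0;
        while (j + 1 < count && alphas[j + 1] <= alpha) {
            j++;
        }
        return errors[j];
    }

    /** Sums the per-fold errors at alpha, giving a cross-validated error count. */
    static double cvError(double[][] foldAlphas, double[][] foldErrors, int[] counts, double alpha) {
        double sum = 0.0;
        for (int k = 0; k < foldAlphas.length; k++) {
            sum += errorAt(foldAlphas[k], foldErrors[k], counts[k], alpha);
        }
        return sum;
    }

    /** CART-style representative value between two consecutive alphas. */
    static double midpoint(double a, double b) {
        return Math.sqrt(a * b);
    }
}
```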
      • modelErrors

        public void modelErrors()
                         throws java.lang.Exception
        Updates the numIncorrectModel field for all nodes, i.e., the training errors made if the subtree rooted at each node were pruned to a leaf. This is needed for calculating the alpha-values.
        Throws:
        java.lang.Exception - if something goes wrong
      • treeErrors

        public void treeErrors()
                        throws java.lang.Exception
        Updates the numIncorrectTree field for all nodes. This is needed for calculating the alpha-values.
        Throws:
        java.lang.Exception - if something goes wrong
      • calculateAlphas

        public void calculateAlphas()
                             throws java.lang.Exception
        Updates the alpha field for all nodes.
        Throws:
        java.lang.Exception - if something goes wrong
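modelErrors, treeErrors and calculateAlphas together produce the quantities behind each node's alpha-value. The sketch below computes them on a hypothetical Node type holding per-class training weights: R(t), analogous to numIncorrectModel, is the weight outside the majority class; R(T_t), analogous to numIncorrectTree, sums the leaf errors of the subtree; and the alpha-value is g(t) = (R(t) - R(T_t)) / (|leaves(T_t)| - 1). It counts errors rather than error rates, which only rescales alpha; that SimpleCart does the same is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the per-node quantities used for cost-complexity pruning. */
final class CostComplexitySketch {

    static final class Node {
        final double[] classCounts;                 // training weight per class at this node
        final List<Node> children = new ArrayList<>();
        Node(double... classCounts) { this.classCounts = classCounts; }
        boolean isLeaf() { return children.isEmpty(); }
    }

    /** R(t): training errors if this node predicted its majority class as a leaf. */
    static double errorsAsLeaf(Node n) {
        double total = 0.0, max = 0.0;
        for (double c : n.classCounts) {
            total += c;
            max = Math.max(max, c);
        }
        return total - max;
    }

    /** R(T_t): training errors of the subtree rooted at n. */
    static double subtreeErrors(Node n) {
        if (n.isLeaf()) return errorsAsLeaf(n);
        double sum = 0.0;
        for (Node c : n.children) sum += subtreeErrors(c);
        return sum;
    }

    static int leaves(Node n) {
        if (n.isLeaf()) return 1;
        int count = 0;
        for (Node c : n.children) count += leaves(c);
        return count;
    }

    /** g(t) = (R(t) - R(T_t)) / (|leaves(T_t)| - 1), defined for inner nodes only. */
    static double alpha(Node n) {
        if (n.isLeaf()) throw new IllegalArgumentException("alpha is undefined for a leaf");
        return (errorsAsLeaf(n) - subtreeErrors(n)) / (leaves(n) - 1);
    }
}
```

For a root with class weights {6, 4} split into leaves {5, 1} and {1, 3}, R(t) = 4, R(T_t) = 2 and the alpha-value is 2.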
      • distributionForInstance

        public double[] distributionForInstance(Instance instance)
                                         throws java.lang.Exception
        Computes class probabilities for an instance using the decision tree.
        Overrides:
        distributionForInstance in class Classifier
        Parameters:
        instance - the instance for which class probabilities are to be computed
        Returns:
        the class probabilities for the given instance
        Throws:
        java.lang.Exception - if something goes wrong
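One way to picture probability estimation with fractional instances: descend the tree to a leaf and return its normalized class counts, and when the split value is missing, take both branches and mix their distributions by the fraction of training weight each received. The node layout, the NaN encoding of missing values, the numeric-only splits and the requirement that every leaf has positive weight are all assumptions of this sketch, not SimpleCart's data structures.

```java
/** Hypothetical tree node used to sketch class-probability estimation with missing values. */
final class DistributionSketch {

    static final class Node {
        int attribute = -1;      // index of the split attribute, -1 for a leaf
        double threshold;        // numeric split: values below the threshold go left
        Node left, right;
        double[] classCounts;    // training weight per class (used at leaves)
        double leftFraction;     // share of known-value training weight that went left

        boolean isLeaf() { return attribute < 0; }
    }

    /** Returns class probabilities; a missing value (NaN) is sent down both branches fractionally. */
    static double[] distribution(Node n, double[] x) {
        if (n.isLeaf()) {
            double total = 0.0;
            for (double c : n.classCounts) total += c;
            double[] p = new double[n.classCounts.length];
            for (int i = 0; i < p.length; i++) p[i] = n.classCounts[i] / total;
            return p;
        }
        double v = x[n.attribute];
        if (Double.isNaN(v)) {
            double[] l = distribution(n.left, x), r = distribution(n.right, x);
            double[] p = new double[l.length];
            for (int i = 0; i < p.length; i++) {
                // Weighted mixture of the two branch distributions.
                p[i] = n.leftFraction * l[i] + (1.0 - n.leftFraction) * r[i];
            }
            return p;
        }
        return distribution(v < n.threshold ? n.left : n.right, x);
    }
}
```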
      • toString

        public java.lang.String toString()
        Returns a textual description of the decision tree, produced by a protected toString helper.
        Overrides:
        toString in class java.lang.Object
        Returns:
        a textual description of the classifier
      • numNodes

        public int numNodes()
        Computes the size of the tree (the total number of nodes).
        Returns:
        size of the tree
      • numInnerNodes

        public int numInnerNodes()
        Counts the number of inner nodes in the tree.
        Returns:
        the number of inner nodes
      • numLeaves

        public int numLeaves()
        Computes the number of leaf nodes.
        Returns:
        number of leaf nodes
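The three size methods are related: every node is either an inner node or a leaf, so numNodes equals numInnerNodes plus numLeaves. A minimal recursive sketch on a hypothetical Node type (not SimpleCart's internal representation):

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical node type for sketching the tree-size methods. */
final class TreeSizeSketch {

    static final class Node {
        final List<Node> children = new ArrayList<>();
        Node add(Node child) { children.add(child); return this; }
    }

    static int numLeaves(Node n) {
        if (n.children.isEmpty()) return 1;
        int count = 0;
        for (Node c : n.children) count += numLeaves(c);
        return count;
    }

    static int numInnerNodes(Node n) {
        if (n.children.isEmpty()) return 0;
        int count = 1; // this node is an inner node
        for (Node c : n.children) count += numInnerNodes(c);
        return count;
    }

    /** Tree size: every node is either an inner node or a leaf. */
    static int numNodes(Node n) {
        return numInnerNodes(n) + numLeaves(n);
    }
}
```

Because CART splits are binary, a tree of this kind also satisfies numLeaves = numInnerNodes + 1.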
      • setOptions

        public void setOptions(java.lang.String[] options)
                        throws java.lang.Exception
        Parses a given list of options.

        Valid options are:

         -S <num>
          Random number seed.
          (default 1)
         -D
          If set, classifier is run in debug mode and
          may output additional info to the console.
         -M <min no>
          The minimal number of instances at the terminal nodes.
          (default 2)
         -N <num folds>
          The number of folds used in the minimal cost-complexity pruning.
          (default 5)
         -U
          Don't use the minimal cost-complexity pruning.
          (default: pruning is used)
         -H
          Don't use the heuristic method for binary splits.
          (default: the heuristic is used)
         -A
          Use the 1 SE rule to make the pruning decision.
          (default: not used)
         -C
          Percentage of the training data to use, given as a fraction in (0, 1].
          (default 1).
        Specified by:
        setOptions in interface OptionHandler
        Overrides:
        setOptions in class RandomizableClassifier
        Parameters:
        options - the list of options as an array of strings
        Throws:
        java.lang.Exception - if an option is not supported
      • enumerateMeasures

        public java.util.Enumeration enumerateMeasures()
        Returns an enumeration of the measure names.
        Specified by:
        enumerateMeasures in interface AdditionalMeasureProducer
        Returns:
        an enumeration of the measure names
      • measureTreeSize

        public double measureTreeSize()
        Returns the size of the tree.
        Returns:
        the size of the tree
      • getMeasure

        public double getMeasure(java.lang.String additionalMeasureName)
        Returns the value of the named measure.
        Specified by:
        getMeasure in interface AdditionalMeasureProducer
        Parameters:
        additionalMeasureName - the name of the measure to query for its value
        Returns:
        the value of the named measure
        Throws:
        java.lang.IllegalArgumentException - if the named measure is not supported
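enumerateMeasures and getMeasure follow a simple name-lookup contract: the enumeration lists the supported names, and getMeasure rejects any other name with an IllegalArgumentException. The sketch below mimics that contract in plain Java. It does not implement Weka's AdditionalMeasureProducer interface, and the measure name measureTreeSize is assumed from the method of that name.

```java
import java.util.Collections;
import java.util.Enumeration;

/** Hypothetical sketch of the measure-lookup contract (not Weka's code). */
final class MeasureProducerSketch {
    private final int treeSize;

    MeasureProducerSketch(int treeSize) { this.treeSize = treeSize; }

    /** Names of the measures this object can report. */
    Enumeration<String> enumerateMeasures() {
        return Collections.enumeration(Collections.singletonList("measureTreeSize"));
    }

    double measureTreeSize() { return treeSize; }

    /** Looks a measure up by name, rejecting names it does not know. */
    double getMeasure(String additionalMeasureName) {
        if (additionalMeasureName.equalsIgnoreCase("measureTreeSize")) {
            return measureTreeSize();
        }
        throw new IllegalArgumentException(additionalMeasureName + " not supported");
    }
}
```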
      • minNumObjTipText

        public java.lang.String minNumObjTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • setMinNumObj

        public void setMinNumObj(double value)
        Sets the minimal number of instances at the terminal nodes.
        Parameters:
        value - the minimal number of instances at the terminal nodes
      • getMinNumObj

        public double getMinNumObj()
        Gets the minimal number of instances at the terminal nodes.
        Returns:
        the minimal number of instances at the terminal nodes
      • numFoldsPruningTipText

        public java.lang.String numFoldsPruningTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • setNumFoldsPruning

        public void setNumFoldsPruning(int value)
        Sets the number of folds in the internal cross-validation.
        Parameters:
        value - the number of folds in the internal cross-validation
      • getNumFoldsPruning

        public int getNumFoldsPruning()
        Gets the number of folds in the internal cross-validation.
        Returns:
        the number of folds in the internal cross-validation
      • usePruneTipText

        public java.lang.String usePruneTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui.
      • setUsePrune

        public void setUsePrune(boolean value)
        Sets whether minimal cost-complexity pruning is used.
        Parameters:
        value - whether to use minimal cost-complexity pruning
      • getUsePrune

        public boolean getUsePrune()
        Gets whether minimal cost-complexity pruning is used.
        Returns:
        whether minimal cost-complexity pruning is used
      • heuristicTipText

        public java.lang.String heuristicTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui.
      • setHeuristic

        public void setHeuristic(boolean value)
        Sets whether heuristic search is used for nominal attributes in multi-class problems.
        Parameters:
        value - whether to use heuristic search for nominal attributes in multi-class problems
      • getHeuristic

        public boolean getHeuristic()
        Gets whether heuristic search is used for nominal attributes in multi-class problems.
        Returns:
        whether heuristic search is used for nominal attributes in multi-class problems
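For intuition about why a heuristic helps: for two-class problems, Breiman et al. (1984) show that the best binary split of a nominal attribute with k values can be found by ordering the values by their proportion of one class and checking only the k - 1 splits along that order, instead of all 2^(k-1) - 1 subsets. The sketch below implements that two-class ordering shortcut with the Gini criterion. How SimpleCart's heuristic extends to multi-class problems is not shown, and all names are hypothetical.

```java
import java.util.Arrays;

/** Hypothetical sketch of the two-class ordering shortcut for nominal binary splits. */
final class NominalSplitSketch {

    /** Node size times its Gini index, so child impurities can simply be added. */
    private static double weightedGini(double[] counts) {
        double total = 0.0, sumSq = 0.0;
        for (double c : counts) total += c;
        if (total == 0.0) return 0.0;
        for (double c : counts) sumSq += (c / total) * (c / total);
        return total * (1.0 - sumSq);
    }

    /** counts[v] = {class0, class1} for nominal value v; returns the values sent left. */
    static Integer[] bestLeftSet(double[][] counts) {
        int k = counts.length;
        double[] keys = new double[k];
        Integer[] order = new Integer[k];
        for (int v = 0; v < k; v++) {
            double total = counts[v][0] + counts[v][1];
            keys[v] = total == 0.0 ? 0.0 : counts[v][0] / total; // proportion of class 0
            order[v] = v;
        }
        Arrays.sort(order, (a, b) -> Double.compare(keys[a], keys[b]));
        double bestScore = Double.POSITIVE_INFINITY;
        int bestCut = 1;
        // Only the k - 1 prefix splits of the sorted order are evaluated.
        for (int cut = 1; cut < k; cut++) {
            double[] left = new double[2], right = new double[2];
            for (int i = 0; i < k; i++) {
                double[] side = i < cut ? left : right;
                side[0] += counts[order[i]][0];
                side[1] += counts[order[i]][1];
            }
            double score = weightedGini(left) + weightedGini(right);
            if (score < bestScore) {
                bestScore = score;
                bestCut = cut;
            }
        }
        return Arrays.copyOf(order, bestCut);
    }
}
```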
      • useOneSETipText

        public java.lang.String useOneSETipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui.
      • setUseOneSE

        public void setUseOneSE(boolean value)
        Sets whether the 1 SE rule is used to choose the final model.
        Parameters:
        value - whether to use the 1 SE rule to choose the final model
      • getUseOneSE

        public boolean getUseOneSE()
        Gets whether the 1 SE rule is used to choose the final model.
        Returns:
        whether the 1 SE rule is used to choose the final model
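The 1 SE rule trades a little estimated accuracy for a smaller tree: among the pruned candidates, pick the most heavily pruned one whose cross-validated error rate is within one standard error of the minimum. This sketch assumes the candidates are ordered from least to most pruned and uses the binomial standard error sqrt(r(1 - r)/N), as in the CART book; the names are hypothetical and SimpleCart's exact bookkeeping may differ.

```java
/** Hypothetical sketch of the 1 SE rule for choosing among pruned trees. */
final class OneSeRuleSketch {

    /**
     * errors[i] is the cross-validated error count of candidate tree i, with candidates
     * ordered from least pruned (i = 0) to most pruned. Returns the chosen index.
     */
    static int choose(double[] errors, int numInstances, boolean useOneSE) {
        int best = 0;
        for (int i = 1; i < errors.length; i++) {
            if (errors[i] < errors[best]) best = i;
        }
        if (!useOneSE) return best;
        double rate = errors[best] / numInstances;
        double se = Math.sqrt(rate * (1.0 - rate) / numInstances); // binomial standard error
        double limit = rate + se;
        // Prefer the most heavily pruned tree whose error rate is within one SE of the minimum.
        for (int i = errors.length - 1; i > best; i--) {
            if (errors[i] / numInstances <= limit) return i;
        }
        return best;
    }
}
```

With 100 instances and error counts {20, 18, 19, 30}, the minimum is candidate 1, but the 1 SE rule selects candidate 2 because 0.19 lies within 0.18 plus one standard error (about 0.038).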
      • sizePerTipText

        public java.lang.String sizePerTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui.
      • setSizePer

        public void setSizePer(double value)
        Sets the training set size, as a fraction of the full training data.
        Parameters:
        value - the training set size, as a fraction in (0, 1]
      • getSizePer

        public double getSizePer()
        Gets the training set size, as a fraction of the full training data.
        Returns:
        training set size
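A fraction sizePer in (0, 1] can be applied by keeping that share of the training instances. The sketch below assumes a seeded shuffle followed by taking the first round(N * sizePer) items; SimpleCart's actual sampling and rounding may differ, and the names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical sketch: keeping a fraction sizePer in (0, 1] of the training data. */
final class SizePerSketch {

    static <T> List<T> subsample(List<T> data, double sizePer, long seed) {
        if (sizePer <= 0 || sizePer > 1) {
            throw new IllegalArgumentException("sizePer must be in (0, 1]");
        }
        List<T> copy = new ArrayList<>(data);
        Collections.shuffle(copy, new Random(seed)); // seeded, so the subset is reproducible
        int keep = (int) Math.round(copy.size() * sizePer);
        return new ArrayList<>(copy.subList(0, keep));
    }
}
```

Tying the shuffle to a seed mirrors the role of the -S option: the same seed yields the same subset.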
      • main

        public static void main(java.lang.String[] args)
        Main method for running this classifier from the command line.
        Parameters:
        args - the options for the classifier