Class ThresholdSelector

  • All Implemented Interfaces:
    java.io.Serializable, java.lang.Cloneable, CapabilitiesHandler, Drawable, OptionHandler, Randomizable, RevisionHandler

    public class ThresholdSelector
    extends RandomizableSingleClassifierEnhancer
    implements OptionHandler, Drawable
    A metaclassifier that selects a mid-point threshold on the probability output by a Classifier. The mid-point threshold is set so that a given performance measure is optimized; by default this is the F-measure. Performance is measured on the training data, on a hold-out set, or using cross-validation. In addition, the probabilities returned by the base learner can have their range expanded so that the output probabilities span the full range from 0 to 1 (this is useful if the scheme normally produces probabilities in a very narrow range).
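As a rough illustration of threshold selection by F-measure, the sketch below scores the mid-point between each pair of adjacent predicted probabilities and keeps the best one. It does not use Weka and is not the Weka implementation: the class `FMeasureThresholdSketch`, its methods, and the rule "predict the designated class when the probability is at least the threshold" are assumptions made for the example.

```java
import java.util.Arrays;

/** Illustrative sketch only; names and the exact search strategy are assumptions. */
final class FMeasureThresholdSketch {

    /** F-measure for the designated class when predicting it iff prob >= threshold. */
    static double fMeasure(double[] probs, boolean[] isDesignated, double threshold) {
        int tp = 0, fp = 0, fn = 0;
        for (int i = 0; i < probs.length; i++) {
            boolean predicted = probs[i] >= threshold;
            if (predicted && isDesignated[i]) tp++;
            else if (predicted) fp++;
            else if (isDesignated[i]) fn++;
        }
        // F = 2TP / (2TP + FP + FN); treated as 0 when there are no true positives.
        return tp == 0 ? 0.0 : 2.0 * tp / (2.0 * tp + fp + fn);
    }

    /** Scores the mid-point between each pair of adjacent distinct probabilities. */
    static double bestThreshold(double[] probs, boolean[] isDesignated) {
        double[] sorted = probs.clone();
        Arrays.sort(sorted);
        double best = 0.5;        // used only when there are no candidate mid-points
        double bestScore = -1.0;  // any real F-measure (>= 0) beats this
        for (int i = 0; i + 1 < sorted.length; i++) {
            if (sorted[i] == sorted[i + 1]) continue; // no mid-point between equal values
            double candidate = (sorted[i] + sorted[i + 1]) / 2.0;
            double score = fMeasure(probs, isDesignated, candidate);
            if (score > bestScore) {
                bestScore = score;
                best = candidate;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] probs = {0.10, 0.20, 0.30, 0.40};
        boolean[] designated = {false, false, true, true};
        System.out.println("Selected threshold: " + bestThreshold(probs, designated));
    }
}
```

In this toy data the two designated-class instances have the two highest probabilities, so the mid-point that separates them from the rest gives a perfect F-measure.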

    Valid options are:

     -C <integer>
      The class for which the threshold is determined. Valid values are:
      1, 2 (for first and second classes, respectively), 3 (for whichever
      class is least frequent), 4 (for whichever class value is most
      frequent), and 5 (for the first class named any of "yes", "pos(itive)",
      "1", or method 3 if no matches). (default 5).
     -X <number of folds>
      Number of folds used for cross validation. If just a
      hold-out set is used, this determines the size of the hold-out set
      (default 3).
     -R <integer>
      Sets whether confidence range correction is applied. This
      can be used to ensure the confidences range from 0 to 1.
      Use 0 for no range correction, 1 for correction based on
      the min/max values seen during threshold selection
      (default 0).
     -E <integer>
      Sets the evaluation mode. Use 0 for
      evaluation using cross-validation,
      1 for evaluation using hold-out set,
      and 2 for evaluation on the
      training data (default 1).
     -M [FMEASURE|ACCURACY|TRUE_POS|TRUE_NEG|TP_RATE|PRECISION|RECALL]
      Measure used for evaluation (default is FMEASURE).
     
     -manual <real>
      Set a manual threshold to use. This option overrides
      automatic selection and options pertaining to
      automatic selection will be ignored.
      (default -1, i.e. do not use a manual threshold).
     -S <num>
      Random number seed.
      (default 1)
     -D
      If set, classifier is run in debug mode and
      may output additional info to the console
     -W
      Full name of base classifier.
      (default: weka.classifiers.functions.Logistic)
     
     Options specific to classifier weka.classifiers.functions.Logistic:
     
     -D
      Turn on debugging output.
     -R <ridge>
      Set the ridge in the log-likelihood.
     -M <number>
      Set the maximum number of iterations (default -1, until convergence).
    Options after -- are passed to the designated sub-classifier.

    Version:
    $Revision: 1.43 $
    Author:
    Eibe Frank (eibe@cs.waikato.ac.nz)
    See Also:
    Serialized Form
    • Field Detail

      • RANGE_BOUNDS

        public static final int RANGE_BOUNDS
        Correct based on min/max observed
        See Also:
        Constant Field Values
      • TAGS_RANGE

        public static final Tag[] TAGS_RANGE
        Type of correction applied to threshold range
      • EVAL_TRAINING_SET

        public static final int EVAL_TRAINING_SET
        entire training set
        See Also:
        Constant Field Values
      • EVAL_TUNED_SPLIT

        public static final int EVAL_TUNED_SPLIT
        single tuned fold
        See Also:
        Constant Field Values
      • EVAL_CROSS_VALIDATION

        public static final int EVAL_CROSS_VALIDATION
        n-fold cross-validation
        See Also:
        Constant Field Values
      • TAGS_EVAL

        public static final Tag[] TAGS_EVAL
        The evaluation modes
      • OPTIMIZE_LFREQ

        public static final int OPTIMIZE_LFREQ
        least frequent class value
        See Also:
        Constant Field Values
      • OPTIMIZE_MFREQ

        public static final int OPTIMIZE_MFREQ
        most frequent class value
        See Also:
        Constant Field Values
      • OPTIMIZE_POS_NAME

        public static final int OPTIMIZE_POS_NAME
        class value name, either 'yes' or 'pos(itive)'
        See Also:
        Constant Field Values
      • TAGS_OPTIMIZE

        public static final Tag[] TAGS_OPTIMIZE
        How to determine which class value to optimize for
      • TAGS_MEASURE

        public static final Tag[] TAGS_MEASURE
        the measure to use
    • Constructor Detail

      • ThresholdSelector

        public ThresholdSelector()
        Constructor.
    • Method Detail

      • measureTipText

        public java.lang.String measureTipText()
        Tooltip for this property.
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • setMeasure

        public void setMeasure(SelectedTag newMeasure)
        Sets the measure used for determining the threshold.
        Parameters:
        newMeasure - Tag representing measure to be used
      • getMeasure

        public SelectedTag getMeasure()
        Gets the measure used for determining the threshold.
        Returns:
        Tag representing measure used
      • setOptions

        public void setOptions(java.lang.String[] options)
                        throws java.lang.Exception
        Parses a given list of options.

        Valid options are:

         -C <integer>
          The class for which the threshold is determined. Valid values are:
          1, 2 (for first and second classes, respectively), 3 (for whichever
          class is least frequent), 4 (for whichever class value is most
          frequent), and 5 (for the first class named any of "yes", "pos(itive)",
          "1", or method 3 if no matches). (default 5).
         -X <number of folds>
          Number of folds used for cross validation. If just a
          hold-out set is used, this determines the size of the hold-out set
          (default 3).
         -R <integer>
          Sets whether confidence range correction is applied. This
          can be used to ensure the confidences range from 0 to 1.
          Use 0 for no range correction, 1 for correction based on
          the min/max values seen during threshold selection
          (default 0).
         -E <integer>
          Sets the evaluation mode. Use 0 for
          evaluation using cross-validation,
          1 for evaluation using hold-out set,
          and 2 for evaluation on the
          training data (default 1).
         -M [FMEASURE|ACCURACY|TRUE_POS|TRUE_NEG|TP_RATE|PRECISION|RECALL]
          Measure used for evaluation (default is FMEASURE).
         
         -manual <real>
          Set a manual threshold to use. This option overrides
          automatic selection and options pertaining to
          automatic selection will be ignored.
          (default -1, i.e. do not use a manual threshold).
         -S <num>
          Random number seed.
          (default 1)
         -D
          If set, classifier is run in debug mode and
          may output additional info to the console
         -W
          Full name of base classifier.
          (default: weka.classifiers.functions.Logistic)
         
         Options specific to classifier weka.classifiers.functions.Logistic:
         
         -D
          Turn on debugging output.
         -R <ridge>
          Set the ridge in the log-likelihood.
         -M <number>
          Set the maximum number of iterations (default -1, until convergence).
        Options after -- are passed to the designated sub-classifier.

        Specified by:
        setOptions in interface OptionHandler
        Overrides:
        setOptions in class RandomizableSingleClassifierEnhancer
        Parameters:
        options - the list of options as an array of strings
        Throws:
        java.lang.Exception - if an option is not supported
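The rule that options after -- are passed to the designated sub-classifier can be pictured as a split of the option array at the first "--" element. The sketch below is a plain-Java illustration under that assumption; `OptionPartitionSketch` and `partition` are hypothetical names and not part of the Weka API.

```java
import java.util.Arrays;

/** Illustrative sketch only; not part of Weka. */
final class OptionPartitionSketch {

    /**
     * Splits an option array at the first "--".
     * Element 0 holds the meta-classifier's own options,
     * element 1 holds the options intended for the base classifier.
     */
    static String[][] partition(String[] options) {
        int separator = Arrays.asList(options).indexOf("--");
        if (separator < 0) {
            // No separator: everything belongs to the meta-classifier.
            return new String[][] {options.clone(), new String[0]};
        }
        return new String[][] {
            Arrays.copyOfRange(options, 0, separator),
            Arrays.copyOfRange(options, separator + 1, options.length)
        };
    }

    public static void main(String[] args) {
        String[] options = {
            "-E", "0", "-X", "5", "-M", "FMEASURE",
            "-W", "weka.classifiers.functions.Logistic",
            "--", "-R", "1.0E-8"
        };
        String[][] parts = partition(options);
        System.out.println("ThresholdSelector options: " + Arrays.toString(parts[0]));
        System.out.println("Base classifier options:   " + Arrays.toString(parts[1]));
    }
}
```

Here -E 0 and -X 5 request five-fold cross-validation for threshold selection, while -R 1.0E-8 after the separator would be read as the ridge option of the Logistic base classifier.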
      • buildClassifier

        public void buildClassifier(Instances instances)
                             throws java.lang.Exception
        Generates the classifier.
        Specified by:
        buildClassifier in class Classifier
        Parameters:
        instances - set of instances serving as training data
        Throws:
        java.lang.Exception - if the classifier has not been generated successfully
      • distributionForInstance

        public double[] distributionForInstance(Instance instance)
                                         throws java.lang.Exception
        Calculates the class membership probabilities for the given test instance.
        Overrides:
        distributionForInstance in class Classifier
        Parameters:
        instance - the instance to be classified
        Returns:
        predicted class probability distribution
        Throws:
        java.lang.Exception - if instance could not be classified successfully
      • globalInfo

        public java.lang.String globalInfo()
        Returns:
        a description of the classifier suitable for displaying in the explorer/experimenter gui
      • designatedClassTipText

        public java.lang.String designatedClassTipText()
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • getDesignatedClass

        public SelectedTag getDesignatedClass()
        Gets the method to determine which class value to optimize. Will be one of OPTIMIZE_0, OPTIMIZE_1, OPTIMIZE_LFREQ, OPTIMIZE_MFREQ, OPTIMIZE_POS_NAME.
        Returns:
        the class selection mode.
      • setDesignatedClass

        public void setDesignatedClass(SelectedTag newMethod)
        Sets the method to determine which class value to optimize. Will be one of OPTIMIZE_0, OPTIMIZE_1, OPTIMIZE_LFREQ, OPTIMIZE_MFREQ, OPTIMIZE_POS_NAME.
        Parameters:
        newMethod - the new class selection mode.
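The five class selection modes can be read as a small decision procedure over the class names and their frequencies. The sketch below follows the -C option values 1 to 5 as described in the option list. It is a plain-Java illustration: `DesignatedClassSketch`, its methods, and the case-insensitive exact name match are assumptions rather than Weka behavior.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative sketch only; mode numbers follow the -C option values 1..5. */
final class DesignatedClassSketch {

    /** Returns the index of the class value whose threshold would be optimized. */
    static int resolve(int mode, String[] classNames, int[] classCounts) {
        switch (mode) {
            case 1: return 0;                                  // first class value
            case 2: return 1;                                  // second class value
            case 3: return extremeIndex(classCounts, false);   // least frequent
            case 4: return extremeIndex(classCounts, true);    // most frequent
            case 5: {
                // First class named like a positive outcome (matching rule is assumed).
                List<String> positiveNames = Arrays.asList("yes", "pos", "positive", "1");
                for (int i = 0; i < classNames.length; i++) {
                    if (positiveNames.contains(classNames[i].toLowerCase())) return i;
                }
                return extremeIndex(classCounts, false);       // fall back to method 3
            }
            default:
                throw new IllegalArgumentException("Unknown mode: " + mode);
        }
    }

    /** Index of the largest (or smallest) count; ties go to the earliest index. */
    static int extremeIndex(int[] counts, boolean largest) {
        int best = 0;
        for (int i = 1; i < counts.length; i++) {
            boolean better = largest ? counts[i] > counts[best] : counts[i] < counts[best];
            if (better) best = i;
        }
        return best;
    }

    public static void main(String[] args) {
        String[] names = {"no", "yes"};
        int[] counts = {90, 10};
        for (int mode = 1; mode <= 5; mode++) {
            System.out.println("mode " + mode + " selects " + names[resolve(mode, names, counts)]);
        }
    }
}
```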
      • evaluationModeTipText

        public java.lang.String evaluationModeTipText()
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • setEvaluationMode

        public void setEvaluationMode(SelectedTag newMethod)
        Sets the evaluation mode used. Will be one of EVAL_TRAINING_SET, EVAL_TUNED_SPLIT, or EVAL_CROSS_VALIDATION.
        Parameters:
        newMethod - the new evaluation mode.
      • getEvaluationMode

        public SelectedTag getEvaluationMode()
        Gets the evaluation mode used. Will be one of EVAL_TRAINING_SET, EVAL_TUNED_SPLIT, or EVAL_CROSS_VALIDATION.
        Returns:
        the evaluation mode.
      • rangeCorrectionTipText

        public java.lang.String rangeCorrectionTipText()
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • setRangeCorrection

        public void setRangeCorrection(SelectedTag newMethod)
        Sets the confidence range correction mode used. Will be one of RANGE_NONE or RANGE_BOUNDS.
        Parameters:
        newMethod - the new correction mode.
      • getRangeCorrection

        public SelectedTag getRangeCorrection()
        Gets the confidence range correction mode used. Will be one of RANGE_NONE or RANGE_BOUNDS.
        Returns:
        the confidence correction mode.
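Range correction based on the observed min/max can be understood as a linear rescaling of the designated class probability onto the interval from 0 to 1. The following plain-Java sketch shows that idea only. `RangeCorrectionSketch` and `rescale` are hypothetical names, and the clamping of values outside the observed range is an assumption.

```java
/** Illustrative sketch only; not the Weka implementation. */
final class RangeCorrectionSketch {

    /**
     * Maps a probability from the observed range [min, max] onto [0, 1].
     * Values outside the observed range are clamped (an assumption of this sketch).
     */
    static double rescale(double probability, double min, double max) {
        if (max <= min) {
            return probability; // degenerate range: nothing sensible to stretch
        }
        double scaled = (probability - min) / (max - min);
        return Math.max(0.0, Math.min(1.0, scaled));
    }

    public static void main(String[] args) {
        // A base learner whose probabilities only ever fell between 0.4 and 0.6.
        double min = 0.4, max = 0.6;
        for (double p : new double[] {0.40, 0.45, 0.50, 0.60}) {
            System.out.println(p + " -> " + rescale(p, min, max));
        }
    }
}
```

For a two-class problem, the probability of the other class would then be one minus the rescaled value so that the distribution still sums to 1.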
      • numXValFoldsTipText

        public java.lang.String numXValFoldsTipText()
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • getNumXValFolds

        public int getNumXValFolds()
        Get the number of folds used for cross-validation.
        Returns:
        the number of folds used for cross-validation.
      • setNumXValFolds

        public void setNumXValFolds(int newNumFolds)
        Set the number of folds used for cross-validation.
        Parameters:
        newNumFolds - the number of folds used for cross-validation.
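The option description notes that the fold count also determines the size of the hold-out set when hold-out evaluation is used. One plausible reading, assumed here and not confirmed by this page, is that a single fold out of the requested number is held out for threshold selection and the rest is used for training. The sketch below works through that arithmetic with hypothetical names.

```java
/** Illustrative sketch only; the one-fold-held-out reading is an assumption. */
final class HoldOutSplitSketch {

    /** Size of one evaluation fold when n instances are divided into numFolds parts. */
    static int holdOutSize(int n, int numFolds) {
        if (numFolds < 2 || numFolds > n) {
            throw new IllegalArgumentException("numFolds must be between 2 and " + n);
        }
        // The first fold takes one extra instance when n is not evenly divisible.
        return n / numFolds + (n % numFolds > 0 ? 1 : 0);
    }

    /** Number of instances left for training the base classifier. */
    static int trainSize(int n, int numFolds) {
        return n - holdOutSize(n, numFolds);
    }

    public static void main(String[] args) {
        int n = 100;
        int folds = 3; // the documented default for -X
        System.out.println("hold-out: " + holdOutSize(n, folds)
            + ", training: " + trainSize(n, folds));
    }
}
```

Under this reading, a larger fold count leaves more data for training the base classifier but fewer instances for estimating the threshold.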
      • graphType

        public int graphType()
        Returns the type of graph this classifier represents.
        Specified by:
        graphType in interface Drawable
        Returns:
        the type of graph this classifier represents
      • graph

        public java.lang.String graph()
                               throws java.lang.Exception
        Returns graph describing the classifier (if possible).
        Specified by:
        graph in interface Drawable
        Returns:
        the graph of the classifier in dotty format
        Throws:
        java.lang.Exception - if the classifier cannot be graphed
      • manualThresholdValueTipText

        public java.lang.String manualThresholdValueTipText()
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • setManualThresholdValue

        public void setManualThresholdValue(double threshold)
                                     throws java.lang.Exception
        Sets the value for a manual threshold. If a manual threshold is set (a value between 0 and 1), options pertaining to automatic threshold selection are ignored.
        Parameters:
        threshold - the manual threshold to use
        Throws:
        java.lang.Exception
      • getManualThresholdValue

        public double getManualThresholdValue()
        Returns the value of the manual threshold (a negative value indicates that no manual threshold is being used).
        Returns:
        the value of the manual threshold.
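The interplay between the manual and the automatically selected threshold can be summarized as: a value between 0 and 1 overrides automatic selection, and a negative value (the default of -1) leaves automatic selection in force. The sketch below encodes that rule in plain Java. `ManualThresholdSketch` is a hypothetical helper, and rejecting values above 1 is an assumption of the sketch.

```java
/** Illustrative sketch only; not the Weka implementation. */
final class ManualThresholdSketch {

    /** Default from the option description: -1 means no manual threshold is in use. */
    static final double NO_MANUAL_THRESHOLD = -1.0;

    /** Picks the manual threshold when one is set, otherwise the automatic one. */
    static double effectiveThreshold(double manual, double automatic) {
        if (manual > 1.0) {
            // Assumed validation: a probability threshold cannot exceed 1.
            throw new IllegalArgumentException("Threshold must not exceed 1: " + manual);
        }
        return manual >= 0.0 ? manual : automatic;
    }

    public static void main(String[] args) {
        double automatic = 0.37; // stand-in for a threshold found by automatic selection
        System.out.println(effectiveThreshold(NO_MANUAL_THRESHOLD, automatic));
        System.out.println(effectiveThreshold(0.8, automatic));
    }
}
```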
      • toString

        public java.lang.String toString()
        Returns description of the cross-validated classifier.
        Overrides:
        toString in class java.lang.Object
        Returns:
        description of the cross-validated classifier as a string
      • main

        public static void main(java.lang.String[] argv)
        Main method for testing this class.
        Parameters:
        argv - the options