Class InterquartileRange

  • All Implemented Interfaces:
    java.io.Serializable, CapabilitiesHandler, OptionHandler, RevisionHandler

    public class InterquartileRange
    extends SimpleBatchFilter
    A filter for detecting outliers and extreme values based on interquartile ranges. The filter skips the class attribute.

    Outliers:
    Q3 + OF*IQR < x <= Q3 + EVF*IQR
    or
    Q1 - EVF*IQR <= x < Q1 - OF*IQR

    Extreme values:
    x > Q3 + EVF*IQR
    or
    x < Q1 - EVF*IQR

    Key:
    Q1 = 25% quartile
    Q3 = 75% quartile
    IQR = Interquartile Range, the difference Q3 - Q1
    OF = Outlier Factor
    EVF = Extreme Value Factor
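
    The band definitions above can be sketched as a small stand-alone classifier. This is an illustrative sketch only, not the filter's actual code: the class IqrBands, the enum Label and the method classify are hypothetical names, and Q1 and Q3 are assumed to be computed elsewhere and passed in.

```java
/** Hypothetical helper, not part of Weka: applies the documented band definitions. */
final class IqrBands {
    enum Label { NORMAL, OUTLIER, EXTREME }

    /** Classifies x given the quartiles, the outlier factor (of) and the extreme value factor (evf). */
    static Label classify(double x, double q1, double q3, double of, double evf) {
        double iqr = q3 - q1;
        double upperOutlier = q3 + of * iqr;   // values above this are at least outliers
        double upperExtreme = q3 + evf * iqr;  // values above this are extreme
        double lowerOutlier = q1 - of * iqr;   // values below this are at least outliers
        double lowerExtreme = q1 - evf * iqr;  // values below this are extreme

        // Extreme values are checked first, so the outlier test below only
        // sees values inside the closed extreme bounds.
        if (x > upperExtreme || x < lowerExtreme) return Label.EXTREME;
        if (x > upperOutlier || x < lowerOutlier) return Label.OUTLIER;
        return Label.NORMAL;
    }

    public static void main(String[] args) {
        // Q1 = 10 and Q3 = 20 give IQR = 10; OF = 3 and EVF = 6 match the documented defaults.
        for (double x : new double[] {15, 50, 51, 80, 81, -21, -51}) {
            System.out.println(x + " -> " + classify(x, 10, 20, 3, 6));
        }
    }
}
```

    With these inputs the outlier band is 50 < x <= 80 on the upper side and -50 <= x < -20 on the lower side, so the boundary values 50 and -20 are not tagged, while 80 and -50 are still outliers rather than extreme values.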

    Valid options are:

     -D
      Turns on output of debugging information.
     -R <col1,col2-col4,...>
      Specifies the list of columns to base outlier/extreme value
      detection on. If an instance is considered an outlier/extreme value
      in at least one of those attributes, it is tagged accordingly.
      'first' and 'last' are valid indexes.
      (default none)
     -O <num>
      The factor for outlier detection.
      (default: 3)
     -E <num>
      The factor for extreme values detection.
      (default: 2*Outlier Factor)
     -E-as-O
      Tags extreme values also as outliers.
      (default: off)
     -P
      Generates Outlier/ExtremeValue pair for each numeric attribute in
      the range, not just a single indicator pair for all the attributes.
      (default: off)
     -M
      Generates an additional attribute 'Offset' per Outlier/ExtremeValue
      pair that contains the multiplier by which the value is off the median.
         value = median + 'multiplier' * IQR
      Note: implicitly sets '-P'. (default: off)
    Thanks to Dale for a few brainstorming sessions.
    Version:
    $Revision: 9529 $
    Author:
    Dale Fletcher (dale at cs dot waikato dot ac dot nz), fracpete (fracpete at waikato dot ac dot nz)
    See Also:
    Serialized Form
    • Field Detail

      • NON_NUMERIC

        public static final int NON_NUMERIC
        indicator for non-numeric attributes
        See Also:
        Constant Field Values
    • Constructor Detail

      • InterquartileRange

        public InterquartileRange()
    • Method Detail

      • globalInfo

        public java.lang.String globalInfo()
        Returns a string describing this filter
        Specified by:
        globalInfo in class SimpleFilter
        Returns:
        a description of the filter suitable for displaying in the explorer/experimenter gui
      • listOptions

        public java.util.Enumeration listOptions()
        Returns an enumeration describing the available options.
        Specified by:
        listOptions in interface OptionHandler
        Overrides:
        listOptions in class SimpleFilter
        Returns:
        an enumeration of all the available options.
      • setOptions

        public void setOptions(java.lang.String[] options)
                        throws java.lang.Exception
        Parses a list of options for this object.

        Valid options are:

         -D
          Turns on output of debugging information.
         -R <col1,col2-col4,...>
          Specifies the list of columns to base outlier/extreme value
          detection on. If an instance is considered an outlier/extreme value
          in at least one of those attributes, it is tagged accordingly.
          'first' and 'last' are valid indexes.
          (default none)
         -O <num>
          The factor for outlier detection.
          (default: 3)
         -E <num>
          The factor for extreme values detection.
          (default: 2*Outlier Factor)
         -E-as-O
          Tags extreme values also as outliers.
          (default: off)
         -P
          Generates Outlier/ExtremeValue pair for each numeric attribute in
          the range, not just a single indicator pair for all the attributes.
          (default: off)
         -M
          Generates an additional attribute 'Offset' per Outlier/ExtremeValue
          pair that contains the multiplier by which the value is off the median.
             value = median + 'multiplier' * IQR
          Note: implicitly sets '-P'. (default: off)
        Specified by:
        setOptions in interface OptionHandler
        Overrides:
        setOptions in class SimpleFilter
        Parameters:
        options - the list of options as an array of strings
        Throws:
        java.lang.Exception - if an option is not supported
        See Also:
        SimpleFilter.reset()
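        How the documented flags interact (the -E default depending on -O, and -M switching on -P) can be sketched with a small hand-written parser. This is a hypothetical illustration, not Weka's option handling: the class IqrOptionsSketch and its fields are invented names, and the sketch assumes the -E default is derived from whatever outlier factor ends up being set.

```java
/** Hypothetical holder for the documented options; not part of Weka. */
final class IqrOptionsSketch {
    String range = "";          // -R (documented default: none)
    double outlierFactor = 3.0; // -O (documented default: 3)
    double extremeFactor;       // -E (documented default: 2 * outlier factor)
    boolean extremeAsOutlier;   // -E-as-O
    boolean perAttribute;       // -P
    boolean offsetMultiplier;   // -M (implicitly sets -P)

    /** Parses an option array; a flag that is missing its value fails with an index error. */
    static IqrOptionsSketch parse(String[] options) {
        IqrOptionsSketch o = new IqrOptionsSketch();
        Double explicitExtreme = null;
        for (int i = 0; i < options.length; i++) {
            switch (options[i]) {
                case "-D": break; // debugging output, ignored in this sketch
                case "-R": o.range = options[++i]; break;
                case "-O": o.outlierFactor = Double.parseDouble(options[++i]); break;
                case "-E": explicitExtreme = Double.parseDouble(options[++i]); break;
                case "-E-as-O": o.extremeAsOutlier = true; break;
                case "-P": o.perAttribute = true; break;
                case "-M": o.offsetMultiplier = true; o.perAttribute = true; break;
                default:
                    throw new IllegalArgumentException("Unsupported option: " + options[i]);
            }
        }
        // Assumption: when -E is absent, the default is twice the parsed outlier factor.
        o.extremeFactor = (explicitExtreme != null) ? explicitExtreme : 2 * o.outlierFactor;
        return o;
    }
}
```

        For example, parsing {"-O", "1.5", "-M"} in this sketch yields an extreme value factor of 3.0 and turns on per-attribute detection.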
      • getOptions

        public java.lang.String[] getOptions()
        Gets the current settings of the filter.
        Specified by:
        getOptions in interface OptionHandler
        Overrides:
        getOptions in class SimpleFilter
        Returns:
        an array of strings suitable for passing to setOptions
      • attributeIndicesTipText

        public java.lang.String attributeIndicesTipText()
        Returns the tip text for this property
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • getAttributeIndices

        public java.lang.String getAttributeIndices()
        Gets the current range selection
        Returns:
        a string containing a comma separated list of ranges
      • setAttributeIndices

        public void setAttributeIndices(java.lang.String value)
        Sets which attributes are to be used for interquartile calculations and outlier/extreme value detection (only numeric attributes among the selection will be used).
        Parameters:
        value - a string representing the list of attributes. Since the string will typically come from a user, attributes are indexed from 1, e.g. first-3,5,6-last
        Throws:
        java.lang.IllegalArgumentException - if an invalid range list is supplied
      • setAttributeIndicesArray

        public void setAttributeIndicesArray(int[] value)
        Sets which attributes are to be used for interquartile calculations and outlier/extreme value detection (only numeric attributes among the selection will be used).
        Parameters:
        value - an array containing indexes of attributes to work on. Since the array will typically come from a program, attributes are indexed from 0.
        Throws:
        java.lang.IllegalArgumentException - if an invalid set of ranges is supplied
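        The two setters above differ in indexing: the string form is 1-based, the array form is 0-based. The following sketch shows how a range list such as first-3,5,6-last maps onto 0-based indices. It is a simplified, hypothetical reimplementation (the class RangeSketch and its methods are not Weka API) and it assumes the total number of attributes is known.

```java
import java.util.TreeSet;

/** Hypothetical range parser, not part of Weka. */
final class RangeSketch {
    /** Converts a 1-based range list such as "first-3,5,6-last" into sorted 0-based indices. */
    static int[] toZeroBased(String ranges, int numAttributes) {
        TreeSet<Integer> result = new TreeSet<>();
        for (String part : ranges.split(",")) {
            String[] ends = part.trim().split("-");
            if (ends.length > 2) {
                throw new IllegalArgumentException("Invalid range: " + part);
            }
            int lo = index(ends[0], numAttributes);
            int hi = index(ends[ends.length - 1], numAttributes);
            if (lo > hi) {
                throw new IllegalArgumentException("Invalid range: " + part);
            }
            for (int i = lo; i <= hi; i++) {
                result.add(i - 1); // shift from 1-based to 0-based
            }
        }
        return result.stream().mapToInt(Integer::intValue).toArray();
    }

    /** Resolves one token ("first", "last" or a number) to a 1-based index. */
    private static int index(String token, int numAttributes) {
        String t = token.trim();
        int v;
        if (t.equals("first")) {
            v = 1;
        } else if (t.equals("last")) {
            v = numAttributes;
        } else {
            v = Integer.parseInt(t); // NumberFormatException is an IllegalArgumentException
        }
        if (v < 1 || v > numAttributes) {
            throw new IllegalArgumentException("Index out of range: " + token);
        }
        return v;
    }
}
```

        With 8 attributes, "first-3,5,6-last" resolves to the 0-based indices 0, 1, 2, 4, 5, 6, 7, which is the form setAttributeIndicesArray expects.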
      • outlierFactorTipText

        public java.lang.String outlierFactorTipText()
        Returns the tip text for this property
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • setOutlierFactor

        public void setOutlierFactor(double value)
        Sets the factor for determining the thresholds for outliers.
        Parameters:
        value - the factor.
      • getOutlierFactor

        public double getOutlierFactor()
        Gets the factor for determining the thresholds for outliers.
        Returns:
        the factor.
      • extremeValuesFactorTipText

        public java.lang.String extremeValuesFactorTipText()
        Returns the tip text for this property
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • setExtremeValuesFactor

        public void setExtremeValuesFactor(double value)
        Sets the factor for determining the thresholds for extreme values.
        Parameters:
        value - the factor.
      • getExtremeValuesFactor

        public double getExtremeValuesFactor()
        Gets the factor for determining the thresholds for extreme values.
        Returns:
        the factor.
      • extremeValuesAsOutliersTipText

        public java.lang.String extremeValuesAsOutliersTipText()
        Returns the tip text for this property
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • setExtremeValuesAsOutliers

        public void setExtremeValuesAsOutliers(boolean value)
        Set whether extreme values are also tagged as outliers.
        Parameters:
        value - whether or not to tag extreme values also as outliers.
      • getExtremeValuesAsOutliers

        public boolean getExtremeValuesAsOutliers()
        Get whether extreme values are also tagged as outliers.
        Returns:
        true if extreme values are also tagged as outliers.
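        The effect of this flag on the two indicator values can be sketched as follows. The class ExtremeAsOutlierSketch and the method tags are hypothetical names, and the sketch assumes the band membership of a value has already been determined from the outlier and extreme value definitions.

```java
/** Hypothetical illustration of the extreme-values-as-outliers flag; not part of Weka. */
final class ExtremeAsOutlierSketch {
    /** Returns {outlierTag, extremeTag} for one value. */
    static boolean[] tags(boolean inOutlierBand, boolean inExtremeBand,
                          boolean extremeValuesAsOutliers) {
        boolean extremeTag = inExtremeBand;
        // By definition the outlier band excludes extreme values, so an extreme
        // value only receives the outlier tag when the flag is switched on.
        boolean outlierTag = inOutlierBand || (extremeValuesAsOutliers && inExtremeBand);
        return new boolean[] {outlierTag, extremeTag};
    }
}
```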
      • detectionPerAttributeTipText

        public java.lang.String detectionPerAttributeTipText()
        Returns the tip text for this property
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • setDetectionPerAttribute

        public void setDetectionPerAttribute(boolean value)
        Set whether an Outlier/ExtremeValue attribute pair is generated for each numeric attribute ("true") or just one pair for all numeric attributes together ("false").
        Parameters:
        value - whether or not to generate indicator attribute pairs for each numeric attribute.
      • getDetectionPerAttribute

        public boolean getDetectionPerAttribute()
        Gets whether an Outlier/ExtremeValue attribute pair is generated for each numeric attribute ("true") or just one pair for all numeric attributes together ("false").
        Returns:
        true if indicator attribute pairs are generated for each numeric attribute.
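        As a rough guide to how many attributes the filter adds in each mode, the documented behaviour can be summarised in a small helper. This is a sketch derived from the option descriptions only; IndicatorCountSketch and addedAttributes are hypothetical names, and the sketch does not cover the case of a range without numeric attributes.

```java
/** Hypothetical helper, not part of Weka: counts the attributes the filter would add. */
final class IndicatorCountSketch {
    static int addedAttributes(int numericInRange, boolean perAttribute, boolean offsetMultiplier) {
        // The offset option implicitly switches on per-attribute detection.
        boolean effectivePerAttribute = perAttribute || offsetMultiplier;
        if (!effectivePerAttribute) {
            return 2; // one Outlier/ExtremeValue pair for all attributes together
        }
        int perNumericAttribute = offsetMultiplier ? 3 : 2; // pair, plus optional Offset
        return numericInRange * perNumericAttribute;
    }
}
```

        For four numeric attributes in the range this gives 2 added attributes by default, 8 with per-attribute detection, and 12 when the offset attribute is also generated.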
      • outputOffsetMultiplierTipText

        public java.lang.String outputOffsetMultiplierTipText()
        Returns the tip text for this property
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • setOutputOffsetMultiplier

        public void setOutputOffsetMultiplier(boolean value)
        Set whether an additional attribute "Offset" is generated per Outlier/ExtremeValue attribute pair that lists the multiplier by which the value is off the median: value = median + 'multiplier' * IQR.
        Parameters:
        value - whether or not to generate the additional attribute.
      • getOutputOffsetMultiplier

        public boolean getOutputOffsetMultiplier()
        Gets whether an additional attribute "Offset" is generated per Outlier/ExtremeValue attribute pair that lists the multiplier by which the value is off the median: value = median + 'multiplier' * IQR.
        Returns:
        true if the additional attribute is generated.
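        Solving the formula value = median + 'multiplier' * IQR for the multiplier gives multiplier = (value - median) / IQR. A minimal sketch of that rearrangement follows. OffsetSketch and offsetMultiplier are hypothetical names, and rejecting an IQR of zero is a choice made for this sketch, not documented filter behaviour.

```java
/** Hypothetical helper, not part of Weka: rearranges value = median + multiplier * IQR. */
final class OffsetSketch {
    static double offsetMultiplier(double value, double median, double iqr) {
        if (iqr == 0.0) {
            // Assumption for this sketch: a zero IQR leaves the multiplier undefined.
            throw new IllegalArgumentException("IQR must be non-zero");
        }
        return (value - median) / iqr;
    }
}
```

        With a median of 15 and an IQR of 10, a value of 35 has an offset of 2.0 and a value of 5 has an offset of -1.0.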
      • main

        public static void main(java.lang.String[] args)
        Main method for testing this class.
        Parameters:
        args - should contain arguments to the filter: use -h for help