Class FarthestFirst

  • All Implemented Interfaces:
    java.io.Serializable, java.lang.Cloneable, Clusterer, CapabilitiesHandler, OptionHandler, Randomizable, RevisionHandler, TechnicalInformationHandler

    public class FarthestFirst
    extends RandomizableClusterer
    implements TechnicalInformationHandler
    Cluster data using the FarthestFirst algorithm.

    For more information see:

    Hochbaum, Shmoys (1985). A best possible heuristic for the k-center problem. Mathematics of Operations Research. 10(2):180-184.

    Sanjoy Dasgupta: Performance Guarantees for Hierarchical Clustering. In: 15th Annual Conference on Computational Learning Theory, 351-363, 2002.

    Notes:
    - works as a fast simple approximate clusterer
    - modelled after SimpleKMeans, might be a useful initializer for it
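
    The farthest-first traversal behind this clusterer can be sketched in plain Java as follows. This is a minimal illustration, not Weka's implementation: the class name FarthestFirstSketch, the method pickCenters, the use of Euclidean distance on double[] points, and the choice of the first center via java.util.Random with a seed analogous to -S are all assumptions made for the example.

```java
import java.util.Arrays;
import java.util.Random;

/** Illustrative farthest-first traversal; hypothetical names, not Weka code. */
public class FarthestFirstSketch {

    /** Euclidean distance between two points of equal dimension. */
    public static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /**
     * Returns the indices of up to k points chosen as cluster centers.
     * The first center is drawn at random (assumption: seeded like -S);
     * each later center is the point farthest from all centers so far.
     */
    public static int[] pickCenters(double[][] points, int k, long seed) {
        int n = points.length;
        k = Math.min(k, n);
        if (k <= 0) {
            return new int[0];
        }
        int[] centers = new int[k];
        // minDist[i] = distance from point i to its nearest chosen center
        double[] minDist = new double[n];
        Arrays.fill(minDist, Double.POSITIVE_INFINITY);

        int next = new Random(seed).nextInt(n);
        for (int c = 0; c < k; c++) {
            centers[c] = next;
            int farthest = 0;
            double best = -1.0;
            for (int i = 0; i < n; i++) {
                // Account for the newly added center, then track the point
                // that is currently farthest from every chosen center.
                minDist[i] = Math.min(minDist[i], distance(points[i], points[next]));
                if (minDist[i] > best) {
                    best = minDist[i];
                    farthest = i;
                }
            }
            next = farthest;
        }
        return centers;
    }

    public static void main(String[] args) {
        double[][] points = { {0, 0}, {0, 1}, {10, 0}, {10, 1}, {5, 20} };
        System.out.println(Arrays.toString(pickCenters(points, 3, 1L)));
    }
}
```

    Running main prints the indices of the three chosen centers; which indices appear depends on the seed, since only the first center is random.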

    BibTeX:

     @article{Hochbaum1985,
        author = {Hochbaum and Shmoys},
        journal = {Mathematics of Operations Research},
        number = {2},
        pages = {180-184},
        title = {A best possible heuristic for the k-center problem},
        volume = {10},
        year = {1985}
     }
     
     @inproceedings{Dasgupta2002,
        author = {Sanjoy Dasgupta},
        booktitle = {15th Annual Conference on Computational Learning Theory},
        pages = {351-363},
        publisher = {Springer},
        title = {Performance Guarantees for Hierarchical Clustering},
        year = {2002}
     }
     

    Valid options are:

     -N <num>
      Number of clusters.
      (default 2)
     -S <num>
      Random number seed.
      (default 1)
    Version:
    $Revision: 5538 $
    Author:
    Bernhard Pfahringer (bernhard@cs.waikato.ac.nz)
    See Also:
    RandomizableClusterer, Serialized Form
    • Constructor Detail

      • FarthestFirst

        public FarthestFirst()
    • Method Detail

      • globalInfo

        public java.lang.String globalInfo()
        Returns a string describing this clusterer
        Returns:
        a description of the clusterer suitable for displaying in the explorer/experimenter gui
      • getTechnicalInformation

        public TechnicalInformation getTechnicalInformation()
        Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., paper reference or book this class is based on.
        Specified by:
        getTechnicalInformation in interface TechnicalInformationHandler
        Returns:
        the technical information about this class
      • buildClusterer

        public void buildClusterer(Instances data)
                            throws java.lang.Exception
        Generates a clusterer. Has to initialize all fields of the clusterer that are not being set via options.
        Specified by:
        buildClusterer in interface Clusterer
        Specified by:
        buildClusterer in class AbstractClusterer
        Parameters:
        data - set of instances serving as training data
        Throws:
        java.lang.Exception - if the clusterer has not been generated successfully
      • clusterInstance

        public int clusterInstance(Instance instance)
                            throws java.lang.Exception
        Assigns a given instance to a cluster.
        Specified by:
        clusterInstance in interface Clusterer
        Overrides:
        clusterInstance in class AbstractClusterer
        Parameters:
        instance - the instance to be assigned to a cluster
        Returns:
        the index of the assigned cluster as an integer
        Throws:
        java.lang.Exception - if the instance could not be assigned to a cluster successfully
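
        Cluster assignment of this kind can be sketched without Weka as a nearest-center lookup. The example below is a hedged illustration only: it assumes that assignment means picking the closest center under Euclidean distance, and the names NearestCenterSketch and nearestCenter are hypothetical and not part of the Weka API.

```java
/** Illustrative nearest-center assignment; hypothetical names, not Weka code. */
public class NearestCenterSketch {

    /**
     * Returns the index of the center closest to the given point,
     * or -1 if there are no centers. Squared Euclidean distance is
     * enough here because only the ordering of distances matters.
     */
    public static int nearestCenter(double[][] centers, double[] point) {
        int best = -1;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int c = 0; c < centers.length; c++) {
            double sum = 0.0;
            for (int j = 0; j < point.length; j++) {
                double d = point[j] - centers[c][j];
                sum += d * d;
            }
            if (sum < bestDist) {
                bestDist = sum;
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[][] centers = { {0, 0}, {10, 10} };
        System.out.println(nearestCenter(centers, new double[] {9, 8}));
    }
}
```

        The returned value plays the role of the cluster index; ties go to the center with the lower index because the comparison is strict.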
      • numberOfClusters

        public int numberOfClusters()
                             throws java.lang.Exception
        Returns the number of clusters.
        Specified by:
        numberOfClusters in interface Clusterer
        Specified by:
        numberOfClusters in class AbstractClusterer
        Returns:
        the number of clusters generated for a training dataset.
        Throws:
        java.lang.Exception - if number of clusters could not be returned successfully
      • listOptions

        public java.util.Enumeration listOptions()
        Returns an enumeration describing the available options.
        Specified by:
        listOptions in interface OptionHandler
        Overrides:
        listOptions in class RandomizableClusterer
        Returns:
        an enumeration of all the available options.
      • numClustersTipText

        public java.lang.String numClustersTipText()
        Returns the tip text for this property
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • setNumClusters

        public void setNumClusters(int n)
                            throws java.lang.Exception
        Sets the number of clusters to generate.
        Parameters:
        n - the number of clusters to generate
        Throws:
        java.lang.Exception - if number of clusters is negative
      • getNumClusters

        public int getNumClusters()
        Gets the number of clusters to generate.
        Returns:
        the number of clusters to generate
      • setOptions

        public void setOptions(java.lang.String[] options)
                        throws java.lang.Exception
        Parses a given list of options.

        Valid options are:

         -N <num>
          Number of clusters.
          (default 2)
         -S <num>
          Random number seed.
          (default 1)
        Specified by:
        setOptions in interface OptionHandler
        Overrides:
        setOptions in class RandomizableClusterer
        Parameters:
        options - the list of options as an array of strings
        Throws:
        java.lang.Exception - if an option is not supported
      • getOptions

        public java.lang.String[] getOptions()
        Gets the current settings of FarthestFirst
        Specified by:
        getOptions in interface OptionHandler
        Overrides:
        getOptions in class RandomizableClusterer
        Returns:
        an array of strings suitable for passing to setOptions()
      • toString

        public java.lang.String toString()
        Returns a string describing this clusterer.
        Overrides:
        toString in class java.lang.Object
        Returns:
        a description of the clusterer as a string
      • main

        public static void main(java.lang.String[] argv)
        Main method for testing this class.
        Parameters:
        argv - should contain the following arguments:

        -t <training file> [-N <number of clusters>]