Interface MatchKit


  • public interface MatchKit
    Performs the operations required for object matching.

    This interface consists of two methods. The first tests whether two tuples count as a match, and assigns a closeness score if they do; in practice it is likely to compare corresponding elements of the two submitted tuples, allowing for some error in each one. The second is more subtle: it must identify the set of bins into which possible matches for a tuple might fall. For coordinate matching with errors, for example, you would divide the whole possible space into a discrete set of zones, each with its own key, and return the key of every zone near enough to the submitted tuple (point) that it might contain a match for it.

    Formally, the requirements for correct implementations of this interface are as follows:

    1. matchScore(t1,t2) == matchScore(t2,t1)
    2. matchScore(t1,t2) >= 0 implies a non-empty intersection of getBins(t1) and getBins(t2)

    The best efficiency will be achieved when:

    1. the intersection of getBins(t1) and getBins(t2) is as small as possible for non-matching t1 and t2 (preferably empty)
    2. the number of bins returned by getBins is as small as possible (preferably 1)

    These two efficiency requirements usually conflict to some extent.

    It may help to think of all this as a sort of fuzzy hash.
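
To make the two correctness requirements concrete, here is a minimal sketch of a one-dimensional implementation. It is an illustration only: `SketchMatchKit` is a local stand-in that mirrors the two methods documented on this page so the example compiles on its own, and `Interval1dKit`, its `maxErr` parameter and the choice of cell width are all assumptions, not part of the documented API.

```java
// Hypothetical local stand-in mirroring the two documented methods,
// declared only so that this sketch is self-contained.
interface SketchMatchKit {
    Object[] NO_BINS = new Object[0];
    Object[] getBins(Object[] tuple);
    double matchScore(Object[] tuple1, Object[] tuple2);
}

// Hypothetical 1-d matcher: tuple[0] holds a Number, and two tuples
// match if their values differ by no more than maxErr.
class Interval1dKit implements SketchMatchKit {
    private final double maxErr;

    Interval1dKit(double maxErr) {
        if (!(maxErr > 0)) {
            throw new IllegalArgumentException("maxErr must be positive");
        }
        this.maxErr = maxErr;
    }

    // Extracts the coordinate, or NaN if the element is missing or non-numeric.
    private static double value(Object[] tuple) {
        Object v = tuple.length > 0 ? tuple[0] : null;
        return v instanceof Number ? ((Number) v).doubleValue() : Double.NaN;
    }

    @Override
    public Object[] getBins(Object[] tuple) {
        double x = value(tuple);
        if (Double.isNaN(x) || Double.isInfinite(x)) {
            return NO_BINS;  // such a tuple can never match anything
        }
        // Cells have width maxErr; return every cell overlapping
        // [x - maxErr, x + maxErr], so any matching tuple's own cell is included.
        long lo = (long) Math.floor((x - maxErr) / maxErr);
        long hi = (long) Math.floor((x + maxErr) / maxErr);
        Object[] bins = new Object[(int) (hi - lo + 1)];
        for (int i = 0; i < bins.length; i++) {
            bins[i] = Long.valueOf(lo + i);  // Long has proper equals/hashCode
        }
        return bins;
    }

    @Override
    public double matchScore(Object[] tuple1, Object[] tuple2) {
        double d = Math.abs(value(tuple1) - value(tuple2));
        // Symmetric by construction; NaN comparisons are false, giving -1.
        return d <= maxErr ? d / maxErr : -1.0;
    }
}
```

This sketch favours simplicity over efficiency: it returns up to three bins per tuple, which illustrates the trade-off between the number of bins returned and the overlap between non-matching tuples.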

    Instances of this interface are not thread-safe, and should not be used concurrently from multiple threads.

    Since:
    9 May 2022
    Author:
    Mark Taylor
    • Field Summary

      Fields 
      Modifier and Type Field Description
      static java.lang.Object[] NO_BINS
      Convenience constant: a zero-length array of objects, suitable for returning from getBins(java.lang.Object[]) if no match can result.
    • Method Summary

      All Methods Instance Methods Abstract Methods 
      Modifier and Type Method Description
      java.lang.Object[] getBins(java.lang.Object[] tuple)
      Returns a set of keys for bins into which possible matches for a given tuple might fall.
      double matchScore(java.lang.Object[] tuple1, java.lang.Object[] tuple2)
      Indicates whether two tuples count as matching each other, and if so how closely.
    • Field Detail

      • NO_BINS

        static final java.lang.Object[] NO_BINS
        Convenience constant: a zero-length array of objects, suitable for returning from getBins(java.lang.Object[]) if no match can result.
    • Method Detail

      • getBins

        java.lang.Object[] getBins(java.lang.Object[] tuple)
        Returns a set of keys for bins into which possible matches for a given tuple might fall. The returned objects can be anything, but should have their equals and hashCode methods implemented properly for comparison.
        Parameters:
        tuple - the tuple for which bin keys are required
        Returns:
        set of bin keys which might be returned by invoking this method on other tuples which count as matches for the submitted tuple
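
Because bin keys are compared using equals and hashCode, a custom key type needs value-based implementations of both. The following is a minimal sketch of such a key for a two-dimensional grid; the class name `CellKey` and its fields are hypothetical and chosen only for illustration.

```java
// Hypothetical bin key identifying one cell of a 2-d grid by its integer indices.
final class CellKey {
    private final long ix;
    private final long iy;

    CellKey(long ix, long iy) {
        this.ix = ix;
        this.iy = iy;
    }

    // Two keys are equal exactly when they name the same grid cell.
    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof CellKey)) {
            return false;
        }
        CellKey other = (CellKey) o;
        return ix == other.ix && iy == other.iy;
    }

    // Consistent with equals: equal keys always give equal hash codes.
    @Override
    public int hashCode() {
        return 31 * Long.hashCode(ix) + Long.hashCode(iy);
    }

    @Override
    public String toString() {
        return "CellKey(" + ix + "," + iy + ")";
    }
}
```

Standard value types such as `java.lang.Long` or `java.lang.String` already satisfy this contract, so they can be returned as bin keys directly when a single index is enough.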
      • matchScore

        double matchScore(java.lang.Object[] tuple1,
                          java.lang.Object[] tuple2)
        Indicates whether two tuples count as matching each other, and if so how closely. If tuple1 and tuple2 are considered a matching pair, then a non-negative value should be returned indicating how close the match is: the higher the number the worse the match, and a return value of zero indicates a 'perfect' match. If the two tuples do not constitute a matching pair, then a negative number (conventionally -1.0) should be returned. This return value can be thought of as (and will often correspond physically with) the distance in some real or notional space between the points represented by the two submitted tuples.

        If there's no reason to do otherwise, the range 0..1 is recommended for successful matches. However, if the result has some sort of physical meaning (such as a distance in real space), that may be used instead.

        Parameters:
        tuple1 - one tuple
        tuple2 - the other tuple
        Returns:
        'distance' between tuple1 and tuple2; 0 is a perfect match, larger values indicate worse matches, negative values indicate no match
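
As a rough illustration of how a caller might combine the two methods, the sketch below uses bin keys to restrict which pairs are passed to the scoring function. It is not a description of how any actual matching engine uses this interface. The class `BinnedPairFinder` is hypothetical, and the two function arguments stand in for an implementation's getBins and matchScore methods.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Function;
import java.util.function.ToDoubleBiFunction;

// Hypothetical helper, not part of the documented API.
class BinnedPairFinder {

    // Returns index pairs {i, j} with i < j whose tuples have a
    // non-negative match score.
    static List<int[]> findPairs(
            List<Object[]> tuples,
            Function<Object[], Object[]> getBins,
            ToDoubleBiFunction<Object[], Object[]> matchScore) {
        // Maps each bin key to the indices of tuples already placed in it.
        Map<Object, List<Integer>> binContents = new HashMap<>();
        List<int[]> pairs = new ArrayList<>();
        for (int j = 0; j < tuples.size(); j++) {
            Object[] bins = getBins.apply(tuples.get(j));
            // Only earlier tuples sharing at least one bin can match;
            // the set removes duplicates when several bins are shared.
            Set<Integer> candidates = new TreeSet<>();
            for (Object bin : bins) {
                List<Integer> earlier = binContents.get(bin);
                if (earlier != null) {
                    candidates.addAll(earlier);
                }
            }
            // Sharing a bin is necessary but not sufficient, so score each one.
            for (int i : candidates) {
                if (matchScore.applyAsDouble(tuples.get(i), tuples.get(j)) >= 0) {
                    pairs.add(new int[] { i, j });
                }
            }
            // Register this tuple so that later tuples can find it.
            for (Object bin : bins) {
                binContents.computeIfAbsent(bin, k -> new ArrayList<>()).add(j);
            }
        }
        return pairs;
    }
}
```

Under the requirement that matching tuples always share at least one bin, this finds the same pairs as scoring every combination, while skipping comparisons between tuples that have no bin in common.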