weka.filters.supervised.attribute
Class Discretize

java.lang.Object
  extended by weka.filters.Filter
      extended by weka.filters.supervised.attribute.Discretize
All Implemented Interfaces:
java.io.Serializable, CapabilitiesHandler, OptionHandler, RevisionHandler, TechnicalInformationHandler, WeightedInstancesHandler, SupervisedFilter

public class Discretize
extends Filter
implements SupervisedFilter, OptionHandler, WeightedInstancesHandler, TechnicalInformationHandler

An instance filter that discretizes a range of numeric attributes in the dataset into nominal attributes. Discretization is by Fayyad & Irani's MDL method (the default).
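As a rough illustration of the default method, the sketch below applies the acceptance test from the Fayyad and Irani paper to one candidate cut point: a split of set S into S1 and S2 is accepted only if Gain > (log2(N - 1) + delta) / N, where delta = log2(3^k - 2) - (k*Ent(S) - k1*Ent(S1) - k2*Ent(S2)). The class and method names (MdlSplitSketch, acceptSplit, and so on) are hypothetical and this is not Weka's source code; it also does not cover the -E and -K variants.

```java
import java.util.Arrays;

/** Illustrative sketch of the Fayyad and Irani MDL acceptance test; not Weka's source code. */
final class MdlSplitSketch {

    private static double log2(double x) {
        return Math.log(x) / Math.log(2.0);
    }

    /** Shannon entropy in bits of a vector of (possibly weighted) class counts. */
    static double entropy(double[] counts) {
        double total = Arrays.stream(counts).sum();
        double h = 0.0;
        for (double c : counts) {
            if (c > 0.0) {
                double p = c / total;
                h -= p * log2(p);
            }
        }
        return h;
    }

    /** Number of classes with a non-zero count. */
    static int classesPresent(double[] counts) {
        int k = 0;
        for (double c : counts) {
            if (c > 0.0) {
                k++;
            }
        }
        return k;
    }

    /**
     * Returns true if splitting into the two given class-count vectors passes the
     * MDL test. Both arrays are indexed by class; at least two instances are assumed.
     */
    static boolean acceptSplit(double[] left, double[] right) {
        double[] all = new double[left.length];
        for (int i = 0; i < all.length; i++) {
            all[i] = left[i] + right[i];
        }
        double nLeft = Arrays.stream(left).sum();
        double nRight = Arrays.stream(right).sum();
        double n = nLeft + nRight;

        double entAll = entropy(all);
        double entLeft = entropy(left);
        double entRight = entropy(right);

        // Information gain of the candidate split.
        double gain = entAll - (nLeft / n) * entLeft - (nRight / n) * entRight;

        // MDL penalty term from the paper.
        int k = classesPresent(all);
        int k1 = classesPresent(left);
        int k2 = classesPresent(right);
        double delta = log2(Math.pow(3, k) - 2)
                - (k * entAll - k1 * entLeft - k2 * entRight);

        return gain > (log2(n - 1) + delta) / n;
    }

    public static void main(String[] args) {
        // A split that separates two classes perfectly, then one that separates nothing.
        System.out.println(acceptSplit(new double[] {10, 0}, new double[] {0, 10}));
        System.out.println(acceptSplit(new double[] {5, 5}, new double[] {5, 5}));
    }
}
```

In the recursive procedure described in the paper, the best cut point of an interval is tested this way and, if accepted, each of the two sub-intervals is processed in turn.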

For more information, see:

Usama M. Fayyad, Keki B. Irani: Multi-interval discretization of continuous-valued attributes for classification learning. In: Thirteenth International Joint Conference on Artificial Intelligence, 1022-1027, 1993.

Igor Kononenko: On Biases in Estimating Multi-Valued Attributes. In: 14th International Joint Conference on Artificial Intelligence, 1034-1040, 1995.

BibTeX:

 @inproceedings{Fayyad1993,
    author = {Usama M. Fayyad and Keki B. Irani},
    booktitle = {Thirteenth International Joint Conference on Artificial Intelligence},
    pages = {1022-1027},
    publisher = {Morgan Kaufmann Publishers},
    title = {Multi-interval discretization of continuous-valued attributes for classification learning},
    volume = {2},
    year = {1993}
 }
 
 @inproceedings{Kononenko1995,
    author = {Igor Kononenko},
    booktitle = {14th International Joint Conference on Artificial Intelligence},
    pages = {1034-1040},
    title = {On Biases in Estimating Multi-Valued Attributes},
    year = {1995},
    PS = {http://ai.fri.uni-lj.si/papers/kononenko95-ijcai.ps.gz}
 }
 

Valid options are:

 -R <col1,col2-col4,...>
  Specifies the list of columns to discretize. 'first' and 'last' are valid indexes.
  (default none)
 -V
  Invert matching sense of column indexes.
 -D
  Output binary attributes for discretized attributes.
 -E
  Use better encoding of split point for MDL.
 -K
  Use Kononenko's MDL criterion.

Version:
$Revision: 6565 $
Author:
Len Trigg (trigg@cs.waikato.ac.nz), Eibe Frank (eibe@cs.waikato.ac.nz)
See Also:
Serialized Form

Constructor Summary
Discretize()
          Constructor - initialises the filter
 
Method Summary
 java.lang.String attributeIndicesTipText()
          Returns the tip text for this property
 boolean batchFinished()
          Signifies that this batch of input to the filter is finished.
 java.lang.String getAttributeIndices()
          Gets the current range selection
 Capabilities getCapabilities()
          Returns the Capabilities of this filter.
 double[] getCutPoints(int attributeIndex)
          Gets the cut points for an attribute
 boolean getInvertSelection()
          Gets whether the attribute selection is inverted
 boolean getMakeBinary()
          Gets whether binary attributes should be made for discretized ones.
 java.lang.String[] getOptions()
          Gets the current settings of the filter.
 java.lang.String getRevision()
          Returns the revision string.
 TechnicalInformation getTechnicalInformation()
          Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., paper reference or book this class is based on.
 boolean getUseBetterEncoding()
          Gets whether better encoding is to be used for MDL.
 boolean getUseKononenko()
          Gets whether Kononenko's MDL criterion is to be used.
 java.lang.String globalInfo()
          Returns a string describing this filter
 boolean input(Instance instance)
          Input an instance for filtering.
 java.lang.String invertSelectionTipText()
          Returns the tip text for this property
 java.util.Enumeration listOptions()
          Gets an enumeration describing the available options.
static void main(java.lang.String[] argv)
          Main method for testing this class.
 java.lang.String makeBinaryTipText()
          Returns the tip text for this property
 void setAttributeIndices(java.lang.String rangeList)
          Sets which attributes are to be Discretized (only numeric attributes among the selection will be Discretized).
 void setAttributeIndicesArray(int[] attributes)
          Sets which attributes are to be Discretized (only numeric attributes among the selection will be Discretized).
 boolean setInputFormat(Instances instanceInfo)
          Sets the format of the input instances.
 void setInvertSelection(boolean invert)
          Sets whether the attribute selection is inverted.
 void setMakeBinary(boolean makeBinary)
          Sets whether binary attributes should be made for discretized ones.
 void setOptions(java.lang.String[] options)
          Parses a given list of options.
 void setUseBetterEncoding(boolean useBetterEncoding)
          Sets whether better encoding is to be used for MDL.
 void setUseKononenko(boolean useKon)
          Sets whether Kononenko's MDL criterion is to be used.
 java.lang.String useBetterEncodingTipText()
          Returns the tip text for this property
 java.lang.String useKononenkoTipText()
          Returns the tip text for this property
 
Methods inherited from class weka.filters.Filter
batchFilterFile, filterFile, getCapabilities, getOutputFormat, isFirstBatchDone, isNewBatch, isOutputFormatDefined, makeCopies, makeCopy, numPendingOutput, output, outputPeek, toString, useFilter, wekaStaticWrapper
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

Discretize

public Discretize()
Constructor - initialises the filter

Method Detail

listOptions

public java.util.Enumeration listOptions()
Gets an enumeration describing the available options.

Specified by:
listOptions in interface OptionHandler
Returns:
an enumeration of all the available options.

setOptions

public void setOptions(java.lang.String[] options)
                throws java.lang.Exception
Parses a given list of options.

Valid options are:

 -R <col1,col2-col4,...>
  Specifies the list of columns to discretize. 'first' and 'last' are valid indexes.
  (default none)
 -V
  Invert matching sense of column indexes.
 -D
  Output binary attributes for discretized attributes.
 -E
  Use better encoding of split point for MDL.
 -K
  Use Kononenko's MDL criterion.

Specified by:
setOptions in interface OptionHandler
Parameters:
options - the list of options as an array of strings
Throws:
java.lang.Exception - if an option is not supported

getOptions

public java.lang.String[] getOptions()
Gets the current settings of the filter.

Specified by:
getOptions in interface OptionHandler
Returns:
an array of strings suitable for passing to setOptions

getCapabilities

public Capabilities getCapabilities()
Returns the Capabilities of this filter.

Specified by:
getCapabilities in interface CapabilitiesHandler
Overrides:
getCapabilities in class Filter
Returns:
the capabilities of this object
See Also:
Capabilities

setInputFormat

public boolean setInputFormat(Instances instanceInfo)
                       throws java.lang.Exception
Sets the format of the input instances.

Overrides:
setInputFormat in class Filter
Parameters:
instanceInfo - an Instances object containing the input instance structure (any instances contained in the object are ignored - only the structure is required).
Returns:
true if the outputFormat may be collected immediately
Throws:
java.lang.Exception - if the input format can't be set successfully

input

public boolean input(Instance instance)
Input an instance for filtering. Ordinarily the instance is processed and made available for output immediately. Some filters require all instances be read before producing output.

Overrides:
input in class Filter
Parameters:
instance - the input instance
Returns:
true if the filtered instance may now be collected with output().
Throws:
java.lang.IllegalStateException - if no input format has been defined.

batchFinished

public boolean batchFinished()
Signifies that this batch of input to the filter is finished. If the filter requires all instances prior to filtering, output() may now be called to retrieve the filtered instances.

Overrides:
batchFinished in class Filter
Returns:
true if there are instances pending output
Throws:
java.lang.IllegalStateException - if no input structure has been defined

globalInfo

public java.lang.String globalInfo()
Returns a string describing this filter

Returns:
a description of the filter suitable for displaying in the explorer/experimenter gui

getTechnicalInformation

public TechnicalInformation getTechnicalInformation()
Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., paper reference or book this class is based on.

Specified by:
getTechnicalInformation in interface TechnicalInformationHandler
Returns:
the technical information about this class

makeBinaryTipText

public java.lang.String makeBinaryTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

getMakeBinary

public boolean getMakeBinary()
Gets whether binary attributes should be made for discretized ones.

Returns:
true if attributes will be binarized

setMakeBinary

public void setMakeBinary(boolean makeBinary)
Sets whether binary attributes should be made for discretized ones.

Parameters:
makeBinary - if binary attributes are to be made
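One way to picture binary output is one two-valued attribute per cut point, each recording on which side of that cut point the value falls. The sketch below assumes this reading and assumes that a value equal to a cut point belongs to the lower side; the class BinaryEncodingSketch and its encode method are hypothetical and not part of Weka.

```java
/** Illustrative sketch of a one-indicator-per-cut-point encoding; not Weka's source code. */
final class BinaryEncodingSketch {

    /**
     * Returns one indicator per cut point: 0 if the value is at or below the
     * cut point, 1 if it is above. Cut points are assumed to be sorted ascending.
     */
    static int[] encode(double value, double[] cutPoints) {
        int[] bits = new int[cutPoints.length];
        for (int j = 0; j < cutPoints.length; j++) {
            bits[j] = value > cutPoints[j] ? 1 : 0;
        }
        return bits;
    }

    public static void main(String[] args) {
        double[] cuts = {2.5, 7.0};
        System.out.println(java.util.Arrays.toString(encode(3.0, cuts)));
    }
}
```

With k cut points this yields k binary attributes instead of a single nominal attribute with k + 1 values.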

useKononenkoTipText

public java.lang.String useKononenkoTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

getUseKononenko

public boolean getUseKononenko()
Gets whether Kononenko's MDL criterion is to be used.

Returns:
true if Kononenko's criterion will be used.

setUseKononenko

public void setUseKononenko(boolean useKon)
Sets whether Kononenko's MDL criterion is to be used.

Parameters:
useKon - true if Kononenko's criterion is to be used

useBetterEncodingTipText

public java.lang.String useBetterEncodingTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

getUseBetterEncoding

public boolean getUseBetterEncoding()
Gets whether better encoding is to be used for MDL.

Returns:
true if the better MDL encoding will be used

setUseBetterEncoding

public void setUseBetterEncoding(boolean useBetterEncoding)
Sets whether better encoding is to be used for MDL.

Parameters:
useBetterEncoding - true if better encoding is to be used.

invertSelectionTipText

public java.lang.String invertSelectionTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

getInvertSelection

public boolean getInvertSelection()
Gets whether the attribute selection is inverted

Returns:
true if the selection is inverted, i.e. the unselected columns will be discretized

setInvertSelection

public void setInvertSelection(boolean invert)
Sets whether the attribute selection is inverted. If false, only the selected (numeric) attributes are discretized. If true, only the unselected (numeric) attributes are discretized.

Parameters:
invert - the new invert setting

attributeIndicesTipText

public java.lang.String attributeIndicesTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

getAttributeIndices

public java.lang.String getAttributeIndices()
Gets the current range selection

Returns:
a string containing a comma separated list of ranges

setAttributeIndices

public void setAttributeIndices(java.lang.String rangeList)
Sets which attributes are to be Discretized (only numeric attributes among the selection will be Discretized).

Parameters:
rangeList - a string representing the list of attributes. Since the string will typically come from a user, attributes are indexed from 1.
e.g. first-3,5,6-last
Throws:
java.lang.IllegalArgumentException - if an invalid range list is supplied
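To make the two indexing conventions concrete, the sketch below expands a 1-based range list such as first-3,5,6-last into the 0-based indexes that setAttributeIndicesArray expects. It is a simplified stand-in, not Weka's own range parser: the class RangeListSketch, the method toZeroBased, and the choice to reject descending ranges are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch of expanding a 1-based range list; not Weka's source code. */
final class RangeListSketch {

    /**
     * Expands a list such as "first-3,5,6-last" into 0-based attribute indexes.
     * numAttributes is the number of attributes in the dataset.
     */
    static int[] toZeroBased(String rangeList, int numAttributes) {
        List<Integer> out = new ArrayList<>();
        for (String part : rangeList.split(",")) {
            String[] ends = part.trim().split("-");
            if (ends.length > 2) {
                throw new IllegalArgumentException("Invalid range: " + part);
            }
            int lo = parseIndex(ends[0], numAttributes);
            int hi = ends.length == 2 ? parseIndex(ends[1], numAttributes) : lo;
            if (lo > hi) {
                // Assumption of this sketch: descending ranges are rejected.
                throw new IllegalArgumentException("Invalid range: " + part);
            }
            for (int i = lo; i <= hi; i++) {
                out.add(i - 1); // convert from 1-based to 0-based
            }
        }
        return out.stream().mapToInt(Integer::intValue).toArray();
    }

    /** Parses "first", "last", or a 1-based number and checks that it is in bounds. */
    private static int parseIndex(String token, int numAttributes) {
        String t = token.trim().toLowerCase();
        int index;
        if (t.equals("first")) {
            index = 1;
        } else if (t.equals("last")) {
            index = numAttributes;
        } else {
            index = Integer.parseInt(t); // NumberFormatException is an IllegalArgumentException
        }
        if (index < 1 || index > numAttributes) {
            throw new IllegalArgumentException("Index out of range: " + token);
        }
        return index;
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(toZeroBased("first-3,5,6-last", 8)));
    }
}
```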

setAttributeIndicesArray

public void setAttributeIndicesArray(int[] attributes)
Sets which attributes are to be Discretized (only numeric attributes among the selection will be Discretized).

Parameters:
attributes - an array containing indexes of attributes to Discretize. Since the array will typically come from a program, attributes are indexed from 0.
Throws:
java.lang.IllegalArgumentException - if an invalid set of ranges is supplied

getCutPoints

public double[] getCutPoints(int attributeIndex)
Gets the cut points for an attribute

Parameters:
attributeIndex - the index (from 0) of the attribute to get the cut points of
Returns:
an array containing the cutpoints (or null if the attribute requested isn't being Discretized)
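Once the cut points are known, mapping a numeric value to its interval is a simple scan. The sketch below assumes the cut points are sorted in ascending order and that a value equal to a cut point falls into the lower interval; the class CutPointBinSketch and its binIndex method are hypothetical, and treating a null array as a single interval is a choice of this sketch only.

```java
/** Illustrative sketch of mapping a value to an interval index; not Weka's source code. */
final class CutPointBinSketch {

    /**
     * Returns the 0-based index of the interval containing the value.
     * With k cut points there are k + 1 intervals.
     */
    static int binIndex(double value, double[] cutPoints) {
        if (cutPoints == null) {
            return 0; // assumption of this sketch: no cut points means one interval
        }
        int bin = 0;
        while (bin < cutPoints.length && value > cutPoints[bin]) {
            bin++;
        }
        return bin;
    }

    public static void main(String[] args) {
        double[] cuts = {2.5, 7.0};
        System.out.println(binIndex(3.0, cuts));
    }
}
```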

getRevision

public java.lang.String getRevision()
Returns the revision string.

Specified by:
getRevision in interface RevisionHandler
Overrides:
getRevision in class Filter
Returns:
the revision

main

public static void main(java.lang.String[] argv)
Main method for testing this class.

Parameters:
argv - should contain arguments to the filter: use -h for help