
vhammll #

VHamMLL: a machine learning (ML) library for classification using a nearest neighbor algorithm based on Hamming distances.

You can incorporate the VHamMLL functions into your own code, or use the included Command Line Interface app (cli.v).

Link to the HTML documentation for the library's functions and structs

You can use VHamMLL with your own datasets, or with a selection of publicly available datasets (in the datasets directory) that are widely used for demonstrating and testing ML classifiers. These files are mostly in Orange file format; there are also datasets in ARFF (Attribute-Relation File Format) and in comma-separated values (CSV), as used in Kaggle competitions.

What, another AI package? Is that necessary? Have a look here for a more complete description and potential use cases.

Glossary of terms

For interactive descriptions of the two key algorithms used by VHamMLL, download the Numbers app spreadsheets: Description of Ranking Algorithm and Description of Classification Algorithm.

Usage:

To use the VHamMLL library in an existing Vlang project:

v install holder66.vhammll

You may also need to install its dependencies, if not automatically installed:

v install vsl
v install Mewzax.chalk

In your v code, add: import holder66.vhammll

To use the library with the Command Line Interface (CLI):

First, install V, if not already installed. On macOS, Linux, etc., you need git and a C compiler (for Windows or Android environments, see the V documentation).

In a terminal:

git clone https://github.com/vlang/v
cd v
make
sudo ./v symlink	# add v to your PATH
v install holder66.vhammll

See above regarding the needed dependencies.

In the folder or directory you want to use for your project, create a file containing module main and a function main(). You can do this in the terminal, or with a text editor. The file should contain:

module main
import holder66.vhammll

fn main() {
	vhammll.cli()!
}

Assuming you've named the directory or folder vhammll and the file within it main.v, in the terminal type v run . followed by the command line arguments, e.g., v run . --help or v run . analyze <path_to_dataset_file>. Command-specific help is available, like so: v run . explore --help or v run . explore -h

Note that the publicly available datasets included with the VHamMLL distribution can be found at ~/.vmodules/holder66/vhammll/datasets.

That's it!

Tutorial:

v run . examples go

Updating:

v up        # installs the latest release of V
v update    # get the latest version of the libraries, including holder66.vhammll
v .         # recompile

Getting help:

The V lang community meets on Discord

For bug reports, feature requests, etc., please raise an issue on github

Speed things up:

Use the -c (--concurrent) argument (in the CLI) to make use of available CPU cores for some vhammll functions; this may speed things up (timings below are from a 2019 MacBook Pro):

v main.v
./main explore ~/.vmodules/holder66/vhammll/datasets/iris.tab  # 10.157 sec
./main explore -c  ~/.vmodules/holder66/vhammll/datasets/iris.tab   # 4.910 sec

A huge speedup usually happens if you compile using the -prod (for production) option. The compilation itself takes longer, but the resulting code is highly optimized.

v -prod main.v
./main explore ~/.vmodules/holder66/vhammll/datasets/iris.tab  # 3.899 sec
./main explore -c  ~/.vmodules/holder66/vhammll/datasets/iris.tab   # 4.849 sec!!

Note that in this case, there is no speedup for -prod when the -c argument is used.

Examples showing use of the Command Line Interface

Please see examples_of_command_line_usage.md

Example: typical use case, a clinical risk calculator

Health care professionals frequently make use of calculators to inform clinical decision-making. Data regarding symptoms, findings on physical examination, laboratory and imaging results, and outcome information (such as diagnosis, risk of developing a condition, or response to specific treatments) are collected for a sample of patients. This data then forms the basis of a formula that can predict the outcome of interest for a new patient, based on how their symptoms, findings, etc. compare to those in the dataset.

Please see clinical_calculator_example.md.

Example: finding useful information embedded in noise

Please see a worked example here: noisy_data.md

MNIST dataset

The mnist_train.tab file is too large to keep in the repository. If you wish to experiment with it, it can be downloaded in a web browser, or via the command line:

wget https://henry.olders.ca/datasets/mnist_train.tab

The process of development in its early stages is described in this essay written in 1989.

Copyright (c) 2017, 2024: Henry Olders.

fn verify #

fn verify(opts Options, disp DisplaySettings) CrossVerifyResult

verify classifies all the instances in a verification datafile (specified by opts.testfile_path) using a trained Classifier; returns metrics comparing the inferred classes to the labeled (assigned) classes of the verification datafile.

Optional (also see `make_classifier.v` for options used in training a classifier):
weighting_flag: nearest neighbor counts are weighted by
class prevalences.
Output options:
show_flag: display results on the console;
expanded_flag: display additional information on the console, including
a confusion matrix;
outputfile_path: saves the result as a json file.
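
Example (a minimal sketch; the file paths are placeholders, and it is assumed here that the classifier is trained from opts.datafile_path):

opts := Options{
	datafile_path: 'path/to/train.tab' // training data
	testfile_path: 'path/to/test.tab' // instances to be classified
	weighting_flag: true
}
result := verify(opts, show_flag: true)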

fn explore #

fn explore(ds Dataset, opts Options, disp DisplaySettings) ExploreResult

explore runs a series of cross-validations or verifications, over a range of attributes and a range of binning values.

Options (also see the Options struct):
bins: range for binning or slicing of continuous attributes;
uniform_bins: same number of bins for all continuous attributes;
number_of_attributes: range for attributes to include;
exclude_flag: excludes missing values when ranking attributes;
weighting_flag: nearest neighbor counts are weighted by
class prevalences;
folds: number of folds n to use for n-fold cross-validation (default
is leave-one-out cross-validation);
repetitions: number of times to repeat n-fold cross-validations;
random-pick: choose instances randomly for n-fold cross-validations.
Output options:
show_flag: display results on the console;
expanded_flag: display additional information on the console, including
a confusion matrix for each explore step;
graph_flag: generate plots of Receiver Operating Characteristics (ROC)
by attributes used; ROC by bins used, and accuracy by attributes
used.
outputfile_path: saves the result to a file.
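
Example (a minimal sketch; the ranges are illustrative, and number_of_attributes is assumed to take a list of integers, matching the -a flag of the CLI):

ds := load_file('datasets/iris.tab')
opts := Options{
	bins: [2, 8] // explore binning ranges from 2 - 2 up to 2 - 8
	number_of_attributes: [1, 4] // try from 1 up to 4 attributes
}
er := explore(ds, opts, show_flag: true)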

fn file_type #

fn file_type(path string) string

file_type returns a string identifying how a dataset is structured or formatted, e.g. 'orange_newer', 'orange_older', 'arff', or 'csv'. On the assumption that an 'orange_older' file will always identify a class attribute by having 'c' or 'class' in the third header line, all other tab-delimited datafiles are typed as 'orange_newer'.

Example

assert file_type('datasets/iris.tab') == 'orange_older'

fn get_useful_continuous_attributes #

fn get_useful_continuous_attributes(ds Dataset) map[int][]f32

get_useful_continuous_attributes returns a map from attribute (column) index to attribute values, for a Dataset's continuous attributes that are useful for training a classifier.

fn get_useful_discrete_attributes #

fn get_useful_discrete_attributes(ds Dataset) map[int][]string

get_useful_discrete_attributes returns a map from attribute (column) index to attribute values, for a Dataset's discrete attributes that are useful for training a classifier.

fn is_nan #

fn is_nan[T](f T) bool

fn load_classifier_file #

fn load_classifier_file(path string) !Classifier

load_classifier_file loads a file generated by make_classifier(); returns a Classifier struct.

Example

cl := load_classifier_file('tempfolder/saved_classifier.txt')

fn load_file #

fn load_file(path string, opts LoadOptions) Dataset

load_file returns a struct containing the datafile's contents, suitable for generating a classifier

Example

ds := load_file('datasets/iris.tab')

fn load_instances_file #

fn load_instances_file(path string) !ValidateResult

load_instances_file loads a file generated by validate() or query(), and returns it as a struct, suitable for appending to a classifier.

Example

instances := load_instances_file('tempfolder/saved_validate_result.txt')

fn make_classifier #

fn make_classifier(dds Dataset, opts Options, disp DisplaySettings) Classifier

make_classifier returns a Classifier struct, given a Dataset (as created by load_file).

Options (also see the Options struct):
bins: range for binning or slicing of continuous attributes;
uniform_bins: same number of bins for continuous attributes;
number_of_attributes: the number of highest-ranked attributes to include;
exclude_flag: excludes missing values when ranking attributes;
purge_flag: remove those instances which are duplicates, after
binning and based on only the attributes to be used;
outputfile_path: if specified, saves the classifier to this file.
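
Example (a minimal sketch; values are illustrative):

ds := load_file('datasets/iris.tab')
opts := Options{
	bins: [2, 8]
	number_of_attributes: [4] // use the 4 highest-ranked attributes
	outputfile_path: 'tempfolder/saved_classifier.txt' // save the classifier
}
cl := make_classifier(ds, opts)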

fn nan #

fn nan[T]() T

fn one_vs_rest_verify #

fn one_vs_rest_verify(opts Options, disp DisplaySettings) CrossVerifyResult

one_vs_rest_verify classifies all the cases in a verification datafile (specified by opts.testfile_path) using an array of trained Classifiers, one per class; each classifier is trained using one class vs. all the other classes. It returns metrics comparing the inferred classes to the labeled (assigned) classes of the verification datafile.

Optional (also see `make_classifier.v` for options used in training a classifier):
weighting_flag: nearest neighbor counts are weighted by
class prevalences.
Output options:
show_flag: display results on the console;
expanded_flag: display additional information on the console, including
a confusion matrix;
outputfile_path: saves the result as a json file.
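
Example (a minimal sketch, analogous to verify above; the file paths are placeholders):

opts := Options{
	datafile_path: 'path/to/train.tab'
	testfile_path: 'path/to/test.tab'
}
result := one_vs_rest_verify(opts, show_flag: true)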

fn optimals #

fn optimals(path string, in_opts Options, disp DisplaySettings) OptimalsResult

optimals determines which classifiers provide the best balanced accuracy, highest total for correct inferences, and highest correct inferences per class, for multiple classifiers whose settings are stored in a settings file.

fn purge_instances_for_missing_class_values_not_inline #

fn purge_instances_for_missing_class_values_not_inline(mut ds Dataset) Dataset

fn query #

fn query(cl Classifier, opts Options, disp DisplaySettings) ClassifyResult

query takes a trained classifier and performs an interactive session with the user at the console, asking the user to input a value for each trained attribute. It then asks the user to confirm or redo the responses. Once confirmed, the instance is classified and the inferred class is shown. The classified instance can optionally be saved in a file. The saved instance can be appended to the classifier using append_instances().
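
Example (a minimal sketch; the classifier file follows the load_classifier_file example above, and the output path is illustrative):

cl := load_classifier_file('tempfolder/saved_classifier.txt')!
opts := Options{
	outputfile_path: 'tempfolder/query_instance.txt' // optionally save the classified instance
}
result := query(cl, opts)
println(result.inferred_class)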

fn rank_attributes #

fn rank_attributes(ds Dataset, opts Options, disp DisplaySettings) RankingResult

rank_attributes takes a Dataset and returns a list of all the dataset's usable attributes, ranked in order of each attribute's ability to separate the classes.

Algorithm:
for each attribute:
create a matrix with attribute values for row headers, and
class values for column headers;
for each unique value `val` for that attribute:
for each unique value `class` of the class attribute:
for each instance:
accumulate a count for those instances whose class value
equals `class`;
populate the matrix with these accumulated counts;
for each `val`:
get the absolute values of the differences between accumulated
counts for each pair of `class` values;
add those absolute differences;
total those added absolute differences to get the raw rank value
for that attribute.
To obtain rank values weighted by class prevalences, use the same algorithm
except before taking the difference of each pair of accumulated counts,
multiply each count of the pair by the class prevalence of the other class.
(Note: rank_attributes always uses class prevalences as weights)

Obtain a maximum rank value by calculating a rank value for the class
attribute itself.

To obtain normalized rank values:
for each attribute:
divide its raw rank value by the maximum rank value and multiply by 100.

Sort the attributes by descending rank values.
Options:
-b --bins: specifies the range for binning (slicing) continuous attributes;
-x --exclude: to exclude missing values when calculating rank values;
Output options:
`show_flag` to print the ranked list to the console;
`graph_flag` to generate plots of rank values for each attribute on the
y axis, with number of bins on the x axis.
`outputfile_path`, saves the result as json.
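
Example (a minimal sketch; the binning range is illustrative):

ds := load_file('datasets/iris.tab')
rr := rank_attributes(ds, Options{ bins: [2, 8] }, show_flag: true)
// rr.array_of_ranked_attributes holds the attributes, sorted by descending rank_value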

fn rank_one_vs_rest #

fn rank_one_vs_rest(ds Dataset, opts Options, disp DisplaySettings) RankingResult

rank_one_vs_rest takes a Dataset and returns a list of all the dataset's usable attributes, ranked in order of each attribute's ability to separate the classes. The algorithm, options, and output options are as described for rank_attributes above.

fn save_json_file #

fn save_json_file[T](u T, path string)

save_json_file saves a struct of type T to the file at `path`, as json.

fn set_class_struct #

fn set_class_struct(ds Dataset) Class

set_class_struct returns a Class struct, given a Dataset.

fn show_analyze #

fn show_analyze(result AnalyzeResult)

show_analyze prints to the console a series of tables detailing a dataset. It takes as input an AnalyzeResult struct generated by analyze_dataset().

fn show_classifier #

fn show_classifier(cl Classifier)

show_classifier outputs information about a classifier to the console.

fn show_crossvalidation #

fn show_crossvalidation(result CrossVerifyResult, opts Options, disp DisplaySettings)

show_crossvalidation prints cross-validation results to the console.

fn show_rank_attributes #

fn show_rank_attributes(result RankingResult)

show_rank_attributes prints attribute ranking results to the console.

fn show_validate #

fn show_validate(result ValidateResult)

show_validate prints validation results to the console.

fn show_verify #

fn show_verify(result CrossVerifyResult, opts Options, disp DisplaySettings)

show_verify prints verification results to the console.

fn transpose #

fn transpose[T](matrix [][]T) [][]T

transpose returns the transpose of a 2D array (rows become columns).
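
Example (values illustrative):

assert transpose([[1, 2, 3], [4, 5, 6]]) == [[1, 4], [2, 5], [3, 6]]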

fn validate #

fn validate(cl Classifier, opts Options, disp DisplaySettings) !ValidateResult

validate classifies each instance of a validation datafile against a trained Classifier; returns the predicted classes for each case of the validation_set. The file to be validated is specified by opts.testfile_path. Optionally, saves the cases and their predicted classes in a file. This file can be used to append these cases to the classifier.
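
Example (a minimal sketch; the test file path is a placeholder):

cl := load_classifier_file('tempfolder/saved_classifier.txt')!
opts := Options{
	testfile_path: 'path/to/unlabeled_cases.tab' // the cases to be classified
	outputfile_path: 'tempfolder/saved_validate_result.txt' // save, for appending later
}
result := validate(cl, opts)!
println(result.inferred_classes)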

fn analyze_dataset #

fn analyze_dataset(ds Dataset, opts Options, disp DisplaySettings) AnalyzeResult

analyze_dataset returns a struct with information about a datafile.

Optional:
if show_flag is true, displays on the console (using show_analyze):
1. a list of attributes, their types, the unique values, and a count of
missing values;
2. a table with counts for each type of attribute;
3. a list of discrete attributes useful for training a classifier;
4. a list of continuous attributes useful for training a classifier;
5. a breakdown of the class attribute, showing counts for each class.

outputfile_path: if specified, saves the analysis results.
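
Example (a minimal sketch):

ds := load_file('datasets/iris.tab')
result := analyze_dataset(ds, Options{}, show_flag: true)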

fn append_instances #

fn append_instances(cl Classifier, instances_to_append ValidateResult, opts Options, disp DisplaySettings) Classifier

append_instances extends a classifier by adding more instances. It returns the extended classifier struct.

Output options:
show_flag: display results on the console;
outputfile_path: saves the extended classifier to a file.
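
Example (a minimal sketch; the file paths follow the load_classifier_file and load_instances_file examples above):

cl := load_classifier_file('tempfolder/saved_classifier.txt')!
instances := load_instances_file('tempfolder/saved_validate_result.txt')!
extended_cl := append_instances(cl, instances, Options{}, show_flag: true)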

fn cli #

fn cli(cli_options CliOptions) !

cli is the command-line interface app for the holder66.vhammll ML library. In a terminal, type: v run . --help

Usage: v run . [command] [flags] <path_to_datafile>
Datafiles should be either tab-delimited, or have extension .csv or .arff
Commands: analyze | append | cross | display | examples | explore
| make | optimals | orange | query | rank | validate | verify
Flags and options:
-a --attributes, can be one, two, or three integers; a single integer will
be used by make_classifier to produce a classifier with that number
of attributes. More than one integer will be used by
explore to provide a range and an interval.
-b --bins, can be one, two, or three integers; a single integer for one bin
value to be used for all attributes; two integers for a range of bin
values; a third integer specifies an interval for the range (note that
the binning range is from the upper to the lower value);
note: when doing an explore, the first integer specifies the lower
limit for the number of bins, and the second gives the upper value
for the explore range. Example: explore -b 3,6 would first use 3 - 3,
then 3 - 4, then 3 - 5, and finally 3 - 6 for the binning ranges.
If the uniform flag is true, then a single integer specifies
the number of bins for all continuous attributes; two integers for a
range of uniform bin values for the explore command; a third integer
for the interval to be used over the explore range;
-bp, --balanced-prevalences, multiply the number of instances for classes
with low prevalence, to more closely balance prevalences;
-c --concurrent, permit parallel processing to use multiple cores;
-e --expanded, expanded results on the console;
-ea display information re trained attributes on the console, for
classification operations;
-f --folds, default is leave-one-out;
-g --graph, displays a plot;
-h --help,
-k --classifier, followed by the path to a file for a saved Classifier
-ka --kaggle, followed by the path to a file. Used with the "validate" command,
a csv file suitable for submission to a Kaggle competition is created;
-m --multiple, classify using more than one trained classifier, followed by
the path to a json file with parameters to generate each classifier;
-ma when multiple classifiers are used, stop classifying when matches
have been found for all classifiers;
-mc when multiple classifiers are used, combine the possible hamming
distances for each classifier into a single list;
-mr for multiclass datasets, perform classification using a classifier for
each class, based on cases for that class set against all the other cases;
-mt when multiple classifiers are used, add the nearest neighbors from
each classifier, weight by class prevalences, and then infer
from the totals;
-m# followed by a list of which classifiers to apply in a multiple
classification run (zero-indexed); also used to specify which classifiers
to append to a settings file;
-ms append the settings to a file (path follows flag) for use in multiple
classification (with -m#). When used with 'explore', the settings for
cases identified in the analytics are appended;
-o --output, followed by the path to a file in which a classifier, a
result, instances used for validation, or a query instance will be
stored;
-p --purge, removes instances which, after binning, are duplicates;
-pmc --purge-missing-classes, removes instances for which the class value
is missing;
-r --reps, number of repetitions; if > 1, a random selection of
instances to be included in each fold will be applied;
-s --show, output results to the console;
-t --test, followed by the path to the datafile to be verified or validated;
-u --uniform, specifies if uniform binning is to be used for the explore
command (note: to obtain uniform binning with verify, validate, query, or
cross-validate, specify the same value for binning, eg -b 4,4)
-v --verbose
-w --weight, when classifying, weight the nearest neighbour counts by class prevalences;
-wr when ranking attributes, weight contributions by class prevalences;
-x --exclude, do not take into account missing values when ranking attributes;

fn close #

fn close[T](a T, b T) bool

fn combine_raw_and_inferred_types #

fn combine_raw_and_inferred_types(ds Dataset) []string

fn cross_validate #

fn cross_validate(ds Dataset, opts Options, disp DisplaySettings) CrossVerifyResult

cross_validate performs n-fold cross-validation on a dataset: it partitions the dataset's instances into n folds; for each fold, it trains a classifier on all the instances not in that fold, and then uses this classifier to classify the fold's cases. The classification results for all n folds are then summarized.

Options (also see the Options struct):
bins: range for binning or slicing of continuous attributes;
number_of_attributes: the number of attributes to use, in descending
order of rank value;
exclude_flag: excludes missing values when ranking attributes;
weighting_flag: nearest neighbor counts are weighted by
class prevalences;
folds: number of folds n to use for n-fold cross-validation (default
is leave-one-out cross-validation);
repetitions: number of times to repeat n-fold cross-validations;
random-pick: choose instances randomly for n-fold cross-validations.
Output options:
show_flag: prints results to the console;
expanded_flag: prints additional information to the console, including
a confusion matrix.
outputfile_path: saves the result as a json file.
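
Example (a minimal sketch; folds is assumed to take an integer, per the options list above):

ds := load_file('datasets/iris.tab')
opts := Options{
	folds: 10 // 10-fold cross-validation, instead of the default leave-one-out
	weighting_flag: true
}
result := cross_validate(ds, opts, show_flag: true)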

fn display_file #

fn display_file(path string, in_opts Options, disp DisplaySettings)

display_file displays on the console a results file as produced by other vhammll functions, a multiple classifier settings file, or graphs for explore, ranking, or cross-validation results.

display_file('path_to_saved_results_file', expanded_flag: true)
Output options:
expanded_flag: display additional information on the console, including
a confusion matrix for cross-validation or verification operations;
graph_flag: generates plots for display in the default web browser.

struct Options #

@[params]
struct Options {
	Parameters
	LoadOptions // DisplaySettings
	MultipleOptions
	MultipleClassifierSettingsArray
pub mut:
	struct_type                         string = '.Options'
	non_options                         []string
	bins                                []int = [1, 16]
	concurrency_flag                    bool
	datafile_path                       string = 'datasets/developer.tab'
	testfile_path                       string
	outputfile_path                     string
	classifierfile_path                 string
	instancesfile_path                  string
	multiple_classify_options_file_path string
	settingsfile_path                   string
	help_flag                           bool
	multiple_flag                       bool
	append_settings_flag                bool
	command                             string
	args                                []string
	kagglefile_path                     string
}

Options struct: can be used as the last parameter in a function's parameter list, to enable default values to be passed to functions.
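
Since the struct carries defaults (e.g. datafile_path defaults to 'datasets/developer.tab'), a caller only sets the fields that differ; a minimal sketch:

opts := Options{
	bins: [3, 6]
	concurrency_flag: true // equivalent to the -c CLI flag
}
result := cross_validate(load_file(opts.datafile_path), opts)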

struct PlotResult #

struct PlotResult {
pub mut:
	bin             int
	attributes_used int
	correct_count   int
	total_count     int
}

struct Attribute #

struct Attribute {
pub mut:
	id            int
	name          string
	count         int
	counts_map    map[string]int
	uniques       int
	missing       int
	raw_type      string
	att_type      string
	inferred_type string
	for_training  bool
	min           f32
	max           f32
	mean          f32
	median        f32
}

struct Binning #

struct Binning {
mut:
	lower    int
	upper    int
	interval int
}

struct Class #

struct Class {
pub mut:
	class_name                 string // the attribute which holds the class
	class_index                int
	classes                    []string // to ensure that the ordering remains the same
	class_values               []string
	missing_class_values       []int // these are the indices of the original class values array
	class_counts               map[string]int
	lcm_class_counts           i64
	prepurge_class_values_len  int
	postpurge_class_counts     map[string]int
	postpurge_lcm_class_counts i64
}

struct Classifier #

struct Classifier {
	Parameters
	LoadOptions
	Class
pub mut:
	struct_type        string = '.Classifier'
	datafile_path      string
	attribute_ordering []string
	trained_attributes map[string]TrainedAttribute
	// maximum_hamming_distance int
	indices   []int
	instances [][]u8
	history   []HistoryEvent
}

struct ClassifierSettings #

struct ClassifierSettings {
	Parameters
	BinaryMetrics
	Metrics
}

struct ClassifyResult #

struct ClassifyResult {
	LoadOptions
	Class
pub mut:
	struct_type                string = '.ClassifyResult'
	index                      int
	inferred_class             string
	inferred_class_array       []string
	labeled_class              string
	nearest_neighbors_by_class []int
	nearest_neighbors_array    [][]int
	classes                    []string
	class_counts               map[string]int
	weighting_flag             bool
	weighting_flag_array       []bool
	multiple_flag              bool
	hamming_distance           int
	sphere_index               int
}

struct RankingResult #

struct RankingResult {
	LoadOptions
pub mut:
	struct_type                string = '.RankingResult'
	path                       string
	exclude_flag               bool
	weight_ranking_flag        bool
	binning                    Binning
	array_of_ranked_attributes []RankedAttribute
}

struct CliOptions #

@[params]
struct CliOptions {
	LoadOptions
pub mut:
	args []string
}

struct LoadOptions #

@[params]
struct LoadOptions {
	DefaultVals
pub mut:
	class_missing_purge_flag bool
}

struct ResultForClass #

struct ResultForClass {
pub mut:
	labeled_instances    int
	correct_inferences   int
	incorrect_inferences int
	wrong_inferences     int
	confusion_matrix_row map[string]int
}

struct MultipleClassifierSettingsArray #

struct MultipleClassifierSettingsArray {
pub mut:
	multiple_classifier_settings []ClassifierSettings
}

struct CrossVerifyResult #

struct CrossVerifyResult {
	Parameters
	LoadOptions
	DisplaySettings
	Metrics
	BinaryMetrics
	MultipleOptions
	MultipleClassifierSettingsArray
pub mut:
	struct_type                         string = '.CrossVerifyResult'
	command                             string
	datafile_path                       string
	testfile_path                       string
	multiple_classify_options_file_path string
	labeled_classes                     []string
	actual_classes                      []string
	inferred_classes                    []string
	nearest_neighbors_by_class          [][]int
	instance_indices                    []int
	classes                             []string
	class_counts                        map[string]int
	labeled_instances                   map[string]int
	correct_inferences                  map[string]int
	incorrect_inferences                map[string]int
	wrong_inferences                    map[string]int
	true_positives                      map[string]int
	false_positives                     map[string]int
	true_negatives                      map[string]int
	false_negatives                     map[string]int
	// outer key: actual class; inner key: predicted class
	confusion_matrix_map            map[string]map[string]f64
	pos_neg_classes                 []string
	correct_count                   int
	incorrects_count                int
	wrong_count                     int
	total_count                     int
	bin_values                      []int // used for displaying the binning range for explore
	attributes_used                 int
	prepurge_instances_counts_array []int
	classifier_instances_counts     []int
	repetitions                     int
	confusion_matrix                [][]string
	trained_attributes_array        []map[string]TrainedAttribute
}

Returned by cross_validate() and verify()

struct Dataset #

struct Dataset {
	Class // DataDict
	LoadOptions
pub mut:
	struct_type                  string = '.Dataset'
	path                         string
	attribute_names              []string
	attribute_flags              []string
	raw_attribute_types          []string
	attribute_types              []string
	inferred_attribute_types     []string
	data                         [][]string
	useful_continuous_attributes map[int][]f32
	useful_discrete_attributes   map[int][]string
	row_identifiers              []string
}

fn (Dataset) purge_instances_for_missing_class_values #

fn (mut ds Dataset) purge_instances_for_missing_class_values() Dataset

struct DefaultVals #

struct DefaultVals {
pub mut:
	missings                   []string = ['?', '', 'NA', ' ']
	integer_range_for_discrete []int    = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
}

pub const missings = ['?', '', 'NA', ' ']
pub const integer_range_for_discrete = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

struct MultipleOptions #

struct MultipleOptions {
	TotalNnParams
pub mut:
	break_on_all_flag    bool
	combined_radii_flag  bool
	total_nn_counts_flag bool
	classifier_indices   []int
}

struct DisplaySettings #

@[params]
struct DisplaySettings {
pub mut:
	show_flag            bool
	expanded_flag        bool
	show_attributes_flag bool
	graph_flag           bool
	verbose_flag         bool
}

struct Environment #

struct Environment {
pub mut:
	hamnn_version  string
	cached_cpuinfo map[string]string
	os_kind        string
	os_details     string
	arch_details   []string
	vexe_mtime     string
	v_full_version string
	vflags         string
}

struct TrainedAttribute #

struct TrainedAttribute {
pub mut:
	attribute_type    string
	translation_table map[string]int
	minimum           f32
	maximum           f32
	bins              int
	rank_value        f32
	index             int
	folds_count       int // for cross-validations, this tracks how many folds use this attribute
}

struct ExploreResult #

struct ExploreResult {
	Class
	Parameters
	LoadOptions
	AttributeRange
	DisplaySettings
pub mut:
	struct_type      string = '.ExploreResult'
	path             string
	testfile_path    string
	pos_neg_classes  []string
	array_of_results []CrossVerifyResult
	accuracy_types   []string = ['raw accuracy', 'balanced accuracy']
	analytics        []MaxSettings
	args             []string
}

struct ValidateResult #

struct ValidateResult {
	Class
	Parameters
	LoadOptions
pub mut:
	struct_type                     string = '.ValidateResult'
	datafile_path                   string
	validate_file_path              string
	row_identifiers                 []string
	inferred_classes                []string
	counts                          [][]int
	instances                       [][]u8
	attributes_used                 int
	prepurge_instances_counts_array []int
	classifier_instances_counts     []int
}

struct OneVsRestClassifier #

struct OneVsRestClassifier {
	Parameters
	LoadOptions
	Class
pub mut:
	struct_type   string = '.OneVsRestClassifier'
	datafile_path string
	history       []HistoryEvent
}

struct RankedAttribute #

struct RankedAttribute {
pub mut:
	attribute_index  int
	attribute_name   string
	attribute_type   string
	rank_value       f32
	rank_value_array []f32
	bins             int
}

struct HistoryEvent #

struct HistoryEvent {
pub mut:
	event_date               time.Time
	instances_count          int
	prepurge_instances_count int
	event_environment        Environment
	event                    string
	file_path                string
}

struct OptimalsResult #

struct OptimalsResult {
pub mut:
	class_counts                                []int
	balanced_accuracy_max                       f64
	balanced_accuracy_max_classifiers           []int
	mcc_max                                     f64
	mcc_max_classifiers                         []int
	correct_inferences_total_max                int
	correct_inferences_total_max_classifiers    []int
	classes                                     []string
	correct_inferences_by_class_max             []int
	correct_inferences_by_class_max_classifiers [][]int
}

struct AnalyzeResult #

struct AnalyzeResult {
	LoadOptions
pub mut:
	struct_type             string = '.AnalyzeResult'
	environment             Environment
	datafile_path           string
	datafile_type           string
	class_name              string
	class_index             int
	class_counts            map[string]int
	attributes              []Attribute
	overall_min             f32
	overall_max             f32
	use_inferred_types_flag bool
}