
vhammll #

VHamMLL: a machine learning (ML) library for classification using a nearest neighbor algorithm based on Hamming distances.

You can incorporate the VHamMLL functions into your own code, or use the included Command Line Interface app (cli.v).

Link to the HTML documentation for the library's functions and structs

You can use VHamMLL with your own datasets, or with a selection of publicly available datasets (in the datasets directory) that are widely used for demonstrating and testing ML classifiers. These files are mostly in Orange file format; there are also datasets in ARFF (Attribute-Relation File Format) and in comma-separated values (CSV), as used in Kaggle competitions.

What, another AI package? Is that necessary? Have a look here for a more complete description and potential use cases.

Glossary of terms

For interactive descriptions of the two key algorithms used by VHamMLL, download the Numbers app spreadsheets: Description of Ranking Algorithm and Description of Classification Algorithm.

Usage:

To use the VHamMLL library in an existing Vlang project:

v install holder66.vhammll

You may also need to install its dependencies, if not automatically installed:

v install vsl
v install Mewzax.chalk

In your v code, add: import holder66.vhammll

To use the library with the Command Line Interface (CLI):

First, install V, if not already installed. On macOS, Linux, etc., you need git and a C compiler (for Windows or Android environments, see the V documentation).

In a terminal:

git clone https://github.com/vlang/v
cd v
make
sudo ./v symlink	# add v to your PATH
v install holder66.vhammll

See above regarding the needed dependencies.

In the folder or directory you want to use for your project, create a file containing module main and a function main(). You can do this in the terminal, or with a text editor. The file should contain:

module main
import holder66.vhammll

fn main() {
	vhammll.cli()!
}

Assuming you've named the directory or folder vhammll and the file within it main.v, in the terminal type v run . followed by the command line arguments, e.g., v run . --help or v run . analyze <path_to_dataset_file>. Command-specific help is available, like so: v run . explore --help or v run . explore -h

Note that the publicly available datasets included with the VHamMLL distribution can be found at ~/.vmodules/holder66/vhammll/datasets.

That's it!

Tutorial:

v run . examples go

Updating:

v up        # installs the latest release of V
v update    # get the latest version of the libraries, including holder66.vhammll
v .         # recompile

Getting help:

The V lang community meets on Discord

For bug reports, feature requests, etc., please raise an issue on github

Speed things up:

Use the -c (--concurrent) argument (in the CLI) to make use of available CPU cores for some vhammll functions; this may speed things up (timings below are from a 2019 MacBook Pro):

v main.v
./main explore ~/.vmodules/holder66/vhammll/datasets/iris.tab  # 10.157 sec
./main explore -c  ~/.vmodules/holder66/vhammll/datasets/iris.tab   # 4.910 sec

A huge speedup usually happens if you compile using the -prod (for production) option. The compilation itself takes longer, but the resulting code is highly optimized.

v -prod main.v
./main explore ~/.vmodules/holder66/vhammll/datasets/iris.tab  # 3.899 sec
./main explore -c  ~/.vmodules/holder66/vhammll/datasets/iris.tab   # 4.849 sec!!

Note that in this case, there is no speedup for -prod when the -c argument is used.

Examples showing use of the Command Line Interface

Please see examples_of_command_line_usage.md

Example: typical use case, a clinical risk calculator

Health care professionals frequently make use of calculators to inform clinical decision-making. Data regarding symptoms, findings on physical examination, laboratory and imaging results, and outcome information (such as diagnosis, risk of developing a condition, or response to specific treatments) are collected for a sample of patients. This data then forms the basis of a formula that can predict the outcome of interest for a new patient, based on how their symptoms, findings, etc. compare to those in the dataset.

Please see clinical_calculator_example.md.

Example: finding useful information embedded in noise

Please see a worked example here: noisy_data.md

MNIST dataset

The mnist_train.tab file is too large to keep in the repository. If you wish to experiment with it, it can be downloaded in a web browser, or via the command line:

wget https://henry.olders.ca/datasets/mnist_train.tab

The process of development in its early stages is described in this essay written in 1989.

Copyright (c) 2017, 2024: Henry Olders.

fn verify #

fn verify(opts Options, disp DisplaySettings) CrossVerifyResult

verify classifies all the instances in a verification datafile (specified by opts.testfile_path) using a trained Classifier; returns metrics comparing the inferred classes to the labeled (assigned) classes of the verification datafile.

Optional (also see `make_classifier.v` for options used in training a classifier):
weighting_flag: nearest neighbor counts are weighted by
class prevalences.
Output options:
show_flag: display results on the console;
expanded_flag: display additional information on the console, including
a confusion matrix;
outputfile_path: saves the result as a json file.
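
Example (a minimal sketch; the file paths are placeholders, and it is assumed here that the classifier is trained from opts.datafile_path):

opts := Options{
	datafile_path: 'path/to/train.tab' // training data
	testfile_path: 'path/to/test.tab' // instances to be classified
	weighting_flag: true
}
result := verify(opts, show_flag: true)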

fn explore #

fn explore(ds Dataset, opts Options, disp DisplaySettings) ExploreResult

explore runs a series of cross-validations or verifications, over a range of attributes and a range of binning values.

Options (also see the Options struct):
bins: range for binning or slicing of continuous attributes;
uniform_bins: same number of bins for all continuous attributes;
number_of_attributes: range for attributes to include;
exclude_flag: excludes missing values when ranking attributes;
weighting_flag: nearest neighbor counts are weighted by
class prevalences;
folds: number of folds n to use for n-fold cross-validation (default
is leave-one-out cross-validation);
repetitions: number of times to repeat n-fold cross-validations;
random-pick: choose instances randomly for n-fold cross-validations.
Output options:
show_flag: display results on the console;
expanded_flag: display additional information on the console, including
a confusion matrix for each explore step;
graph_flag: generate plots of Receiver Operating Characteristics (ROC)
by attributes used; ROC by bins used, and accuracy by attributes
used.
outputfile_path: saves the result to a file.
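
Example (a minimal sketch; the ranges are illustrative, and number_of_attributes is assumed to take a list of integers, matching the -a flag of the CLI):

ds := load_file('datasets/iris.tab')
opts := Options{
	bins: [2, 8] // explore binning ranges from 2 - 2 up to 2 - 8
	number_of_attributes: [1, 4] // try from 1 up to 4 attributes
}
er := explore(ds, opts, show_flag: true)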

fn file_type #

fn file_type(path string) string

file_type returns a string identifying how a dataset is structured or formatted, e.g. 'orange_newer', 'orange_older', 'arff', or 'csv'. On the assumption that an 'orange_older' file will always identify a class attribute by having 'c' or 'class' in the third header line, all other tab-delimited datafiles are typed as 'orange_newer'.

Example

assert file_type('datasets/iris.tab') == 'orange_older'

fn get_useful_continuous_attributes #

fn get_useful_continuous_attributes(ds Dataset) map[int][]f32

get_useful_continuous_attributes returns a map from attribute (column) index to attribute values, for a Dataset's continuous attributes that are useful for training a classifier.

fn get_useful_discrete_attributes #

fn get_useful_discrete_attributes(ds Dataset) map[int][]string

get_useful_discrete_attributes returns a map from attribute (column) index to attribute values, for a Dataset's discrete attributes that are useful for training a classifier.

fn is_nan #

fn is_nan[T](f T) bool

fn load_classifier_file #

fn load_classifier_file(path string) !Classifier

load_classifier_file loads a file generated by make_classifier(); returns a Classifier struct.

Example

cl := load_classifier_file('tempfolder/saved_classifier.txt')

fn load_file #

fn load_file(path string, opts LoadOptions) Dataset

load_file returns a struct containing the datafile's contents, suitable for generating a classifier

Example

ds := load_file('datasets/iris.tab')

fn load_instances_file #

fn load_instances_file(path string) !ValidateResult

load_instances_file loads a file generated by validate() or query(), and returns it as a struct, suitable for appending to a classifier.

Example

instances := load_instances_file('tempfolder/saved_validate_result.txt')

fn make_classifier #

fn make_classifier(dds Dataset, opts Options, disp DisplaySettings) Classifier

make_classifier returns a Classifier struct, given a Dataset (as created by load_file).

Options (also see the Options struct):
bins: range for binning or slicing of continuous attributes;
uniform_bins: same number of bins for continuous attributes;
number_of_attributes: the number of highest-ranked attributes to include;
exclude_flag: excludes missing values when ranking attributes;
purge_flag: remove those instances which are duplicates, after
binning and based on only the attributes to be used;
outputfile_path: if specified, saves the classifier to this file.
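
Example (a minimal sketch; values are illustrative):

ds := load_file('datasets/iris.tab')
opts := Options{
	bins: [2, 8]
	number_of_attributes: [4] // use the 4 highest-ranked attributes
	outputfile_path: 'tempfolder/saved_classifier.txt' // save the classifier
}
cl := make_classifier(ds, opts)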

fn nan #

fn nan[T]() T

fn one_vs_rest_verify #

fn one_vs_rest_verify(opts Options, disp DisplaySettings) CrossVerifyResult

one_vs_rest_verify classifies all the cases in a verification datafile (specified by opts.testfile_path) using an array of trained Classifiers, one per class; each classifier is trained using one class vs. all the other classes. It returns metrics comparing the inferred classes to the labeled (assigned) classes of the verification datafile.

Optional (also see `make_classifier.v` for options used in training a classifier):
weighting_flag: nearest neighbor counts are weighted by
class prevalences.
Output options:
show_flag: display results on the console;
expanded_flag: display additional information on the console, including
a confusion matrix;
outputfile_path: saves the result as a json file.
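
Example (a minimal sketch, analogous to verify above; the file paths are placeholders):

opts := Options{
	datafile_path: 'path/to/train.tab'
	testfile_path: 'path/to/test.tab'
}
result := one_vs_rest_verify(opts, show_flag: true)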

fn optimals #

fn optimals(path string, in_opts Options, disp DisplaySettings) OptimalsResult

optimals determines which classifiers provide the best balanced accuracy, highest total for correct inferences, and highest correct inferences per class, for multiple classifiers whose settings are stored in a settings file.

fn purge_instances_for_missing_class_values_not_inline #

fn purge_instances_for_missing_class_values_not_inline(mut ds Dataset) Dataset

fn query #

fn query(cl Classifier, opts Options, disp DisplaySettings) ClassifyResult

query takes a trained classifier and performs an interactive session with the user at the console, asking the user to input a value for each trained attribute. It then asks the user to confirm or redo the responses. Once confirmed, the instance is classified and the inferred class is shown. The classified instance can optionally be saved in a file. The saved instance can be appended to the classifier using append_instances().
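
Example (a minimal sketch; the classifier file follows the load_classifier_file example above, and the output path is illustrative):

cl := load_classifier_file('tempfolder/saved_classifier.txt')!
opts := Options{
	outputfile_path: 'tempfolder/query_instance.txt' // optionally save the classified instance
}
result := query(cl, opts)
println(result.inferred_class)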

fn rank_attributes #

fn rank_attributes(ds Dataset, opts Options, disp DisplaySettings) RankingResult

rank_attributes takes a Dataset and returns a list of all the dataset's usable attributes, ranked in order of each attribute's ability to separate the classes.

Algorithm:
for each attribute:
create a matrix with attribute values for row headers, and
class values for column headers;
for each unique value `val` for that attribute:
for each unique value `class` of the class attribute:
for each instance:
accumulate a count for those instances whose class value
equals `class`;
populate the matrix with these accumulated counts;
for each `val`:
get the absolute values of the differences between accumulated
counts for each pair of `class` values;
add those absolute differences;
total those added absolute differences to get the raw rank value
for that attribute.
To obtain rank values weighted by class prevalences, use the same algorithm
except before taking the difference of each pair of accumulated counts,
multiply each count of the pair by the class prevalence of the other class.
(Note: rank_attributes always uses class prevalences as weights)

Obtain a maximum rank value by calculating a rank value for the class
attribute itself.

To obtain normalized rank values:
for each attribute:
divide its raw rank value by the maximum rank value and multiply by 100.

Sort the attributes by descending rank values.
Options:
-b --bins: specifies the range for binning (slicing) continuous attributes;
-x --exclude: to exclude missing values when calculating rank values;
Output options:
`show_flag` to print the ranked list to the console;
`graph_flag` to generate plots of rank values for each attribute on the
y axis, with number of bins on the x axis.
`outputfile_path`, saves the result as json.
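
Example (a minimal sketch; the binning range is illustrative):

ds := load_file('datasets/iris.tab')
rr := rank_attributes(ds, Options{ bins: [2, 8] }, show_flag: true)
// rr.array_of_ranked_attributes holds the attributes, sorted by descending rank_value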

fn rank_one_vs_rest #

fn rank_one_vs_rest(ds Dataset, opts Options, disp DisplaySettings) RankingResult

rank_one_vs_rest takes a Dataset and returns a list of all the dataset's usable attributes, ranked in order of each attribute's ability to separate the classes. The algorithm, options, and output options are as described for rank_attributes above.

fn save_json_file #

fn save_json_file[T](u T, path string)

save_json_file saves a struct of type T to the file at `path`, as json.

fn set_class_struct #

fn set_class_struct(ds Dataset) Class

set_class_struct returns a Class struct, given a Dataset.

fn show_analyze #

fn show_analyze(result AnalyzeResult)

show_analyze prints to the console a series of tables detailing a dataset. It takes as input an AnalyzeResult struct generated by analyze_dataset().

fn show_classifier #

fn show_classifier(cl Classifier)

show_classifier outputs information about a classifier to the console.

fn show_crossvalidation #

fn show_crossvalidation(result CrossVerifyResult, opts Options, disp DisplaySettings)

show_crossvalidation prints cross-validation results to the console.

fn show_rank_attributes #

fn show_rank_attributes(result RankingResult)

show_rank_attributes prints attribute ranking results to the console.

fn show_validate #

fn show_validate(result ValidateResult)

show_validate prints validation results to the console.

fn show_verify #

fn show_verify(result CrossVerifyResult, opts Options, disp DisplaySettings)

show_verify prints verification results to the console.

fn transpose #

fn transpose[T](matrix [][]T) [][]T

transpose returns the transpose of a 2D array (rows become columns).
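
Example (values illustrative):

assert transpose([[1, 2, 3], [4, 5, 6]]) == [[1, 4], [2, 5], [3, 6]]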

fn validate #

fn validate(cl Classifier, opts Options, disp DisplaySettings) !ValidateResult

validate classifies each instance of a validation datafile against a trained Classifier; returns the predicted classes for each case of the validation_set. The file to be validated is specified by opts.testfile_path. Optionally, saves the cases and their predicted classes in a file. This file can be used to append these cases to the classifier.
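
Example (a minimal sketch; the test file path is a placeholder):

cl := load_classifier_file('tempfolder/saved_classifier.txt')!
opts := Options{
	testfile_path: 'path/to/unlabeled_cases.tab' // the cases to be classified
	outputfile_path: 'tempfolder/saved_validate_result.txt' // save, for appending later
}
result := validate(cl, opts)!
println(result.inferred_classes)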

fn analyze_dataset #

fn analyze_dataset(ds Dataset, opts Options, disp DisplaySettings) AnalyzeResult

analyze_dataset returns a struct with information about a datafile.

Optional:
if show_flag is true, displays on the console (using show_analyze):
1. a list of attributes, their types, the unique values, and a count of
missing values;
2. a table with counts for each type of attribute;
3. a list of discrete attributes useful for training a classifier;
4. a list of continuous attributes useful for training a classifier;
5. a breakdown of the class attribute, showing counts for each class.

outputfile_path: if specified, saves the analysis results.
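
Example (a minimal sketch):

ds := load_file('datasets/iris.tab')
result := analyze_dataset(ds, Options{}, show_flag: true)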

fn append_instances #

fn append_instances(cl Classifier, instances_to_append ValidateResult, opts Options, disp DisplaySettings) Classifier

append_instances extends a classifier by adding more instances. It returns the extended classifier struct.

Output options:
show_flag: display results on the console;
outputfile_path: saves the extended classifier to a file.
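
Example (a minimal sketch; the file paths follow the load_classifier_file and load_instances_file examples above):

cl := load_classifier_file('tempfolder/saved_classifier.txt')!
instances := load_instances_file('tempfolder/saved_validate_result.txt')!
extended_cl := append_instances(cl, instances, Options{}, show_flag: true)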

fn cli #

fn cli(cli_options CliOptions) !

cli is the command-line interface app for the holder66.vhammll ML library. In a terminal, type: v run . --help

Usage: v run . [command] [flags] <path_to_datafile>
Datafiles should be either tab-delimited, or have extension .csv or .arff
Commands: analyze | append | cross | display | examples | explore
| make | optimals | orange | query | rank | validate | verify
Flags and options:
-a --attributes, can be one, two, or three integers; a single integer will
be used by make_classifier to produce a classifier with that number
of attributes. More than one integer will be used by
explore to provide a range and an interval.
-b --bins, can be one, two, or three integers; a single integer for one bin
value to be used for all attributes; two integers for a range of bin
values; a third integer specifies an interval for the range (note that
the binning range is from the upper to the lower value);
note: when doing an explore, the first integer specifies the lower
limit for the number of bins, and the second gives the upper value
for the explore range. Example: explore -b 3,6 would first use 3 - 3,
then 3 - 4, then 3 - 5, and finally 3 - 6 for the binning ranges.
If the uniform flag is true, then a single integer specifies
the number of bins for all continuous attributes; two integers for a
range of uniform bin values for the explore command; a third integer
for the interval to be used over the explore range;
-bp, --balanced-prevalences, multiply the number of instances for classes
with low prevalence, to more closely balance prevalences;
-c --concurrent, permit parallel processing to use multiple cores;
-e --expanded, expanded results on the console;
-ea display information re trained attributes on the console, for
classification operations;
-f --folds, default is leave-one-out;
-g --graph, displays a plot;
-h --help,
-k --classifier, followed by the path to a file for a saved Classifier
-ka --kaggle, followed by the path to a file. Used with the "validate" command,
a csv file suitable for submission to a Kaggle competition is created;
-m --multiple, classify using more than one trained classifier, followed by
the path to a json file with parameters to generate each classifier;
-ma when multiple classifiers are used, stop classifying when matches
have been found for all classifiers;
-mc when multiple classifiers are used, combine the possible hamming
distances for each classifier into a single list;
-mr for multiclass datasets, perform classification using a classifier for
each class, based on cases for that class set against all the other cases;
-mt when multiple classifiers are used, add the nearest neighbors from
each classifier, weight by class prevalences, and then infer
from the totals;
-m# followed by a list of which classifiers to apply in a multiple
classification run (zero-indexed); also used to specify which classifiers
to append to a settings file;
-ms append the settings to a file (path follows flag) for use in multiple
classification (with -m#). When used with 'explore', the settings for
cases identified in the analytics are appended;
-o --output, followed by the path to a file in which a classifier, a
result, instances used for validation, or a query instance will be
stored;
-p --purge, removes instances which, after binning, are duplicates;
-pmc --purge-missing-classes, removes instances for which the class value
is missing;
-r --reps, number of repetitions; if > 1, a random selection of
instances to be included in each fold will be applied;
-s --show, output results to the console;
-t --test, followed by the path to the datafile to be verified or validated;
-u --uniform, specifies if uniform binning is to be used for the explore
command (note: to obtain uniform binning with verify, validate, query, or
cross-validate, specify the same value for binning, eg -b 4,4)
-v --verbose
-w --weight, when classifying, weight the nearest neighbour counts by class prevalences;
-wr when ranking attributes, weight contributions by class prevalences;
-x --exclude, do not take into account missing values when ranking attributes;

fn close #

fn close[T](a T, b T) bool

fn combine_raw_and_inferred_types #

fn combine_raw_and_inferred_types(ds Dataset) []string

fn cross_validate #

fn cross_validate(ds Dataset, opts Options, disp DisplaySettings) CrossVerifyResult

cross_validate performs n-fold cross-validation on a dataset: it partitions the dataset's instances into n folds; for each fold, it trains a classifier on all the instances not in that fold, and then uses this classifier to classify the fold's cases. The classification results for all n folds are then summarized.

Options (also see the Options struct):
bins: range for binning or slicing of continuous attributes;
number_of_attributes: the number of attributes to use, in descending
order of rank value;
exclude_flag: excludes missing values when ranking attributes;
weighting_flag: nearest neighbor counts are weighted by
class prevalences;
folds: number of folds n to use for n-fold cross-validation (default
is leave-one-out cross-validation);
repetitions: number of times to repeat n-fold cross-validations;
random-pick: choose instances randomly for n-fold cross-validations.
Output options:
show_flag: prints results to the console;
expanded_flag: prints additional information to the console, including
a confusion matrix.
outputfile_path: saves the result as a json file.
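
Example (a minimal sketch; folds is assumed to take an integer, per the options list above):

ds := load_file('datasets/iris.tab')
opts := Options{
	folds: 10 // 10-fold cross-validation, instead of the default leave-one-out
	weighting_flag: true
}
result := cross_validate(ds, opts, show_flag: true)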

fn display_file #

fn display_file(path string, in_opts Options, disp DisplaySettings)

display_file displays on the console a results file as produced by other vhammll functions, a multiple classifier settings file, or graphs for explore, ranking, or cross-validation results.

display_file('path_to_saved_results_file', expanded_flag: true)
Output options:
expanded_flag: display additional information on the console, including
a confusion matrix for cross-validation or verification operations;
graph_flag: generates plots for display in the default web browser.

struct Options #

@[params]
struct Options {
	Parameters
	LoadOptions // DisplaySettings
	MultipleOptions
	MultipleClassifierSettingsArray
pub mut:
	struct_type                         string = '.Options'
	non_options                         []string
	bins                                []int = [1, 16]
	concurrency_flag                    bool
	datafile_path                       string = 'datasets/developer.tab'
	testfile_path                       string
	outputfile_path                     string
	classifierfile_path                 string
	instancesfile_path                  string
	multiple_classify_options_file_path string
	settingsfile_path                   string
	help_flag                           bool
	multiple_flag                       bool
	append_settings_flag                bool
	command                             string
	args                                []string
	kagglefile_path                     string
}

Options struct: can be used as the last parameter in a function's parameter list, to enable default values to be passed to functions.
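
Since the struct carries defaults (e.g. datafile_path defaults to 'datasets/developer.tab'), a caller only sets the fields that differ; a minimal sketch:

opts := Options{
	bins: [3, 6]
	concurrency_flag: true // equivalent to the -c CLI flag
}
result := cross_validate(load_file(opts.datafile_path), opts)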

struct PlotResult #

struct PlotResult {
pub mut:
	bin             int
	attributes_used int
	correct_count   int
	total_count     int
}

struct Attribute #

struct Attribute {
pub mut:
	id            int
	name          string
	count         int
	counts_map    map[string]int
	uniques       int
	missing       int
	raw_type      string
	att_type      string
	inferred_type string
	for_training  bool
	min           f32
	max           f32
	mean          f32
	median        f32
}

struct Binning #

struct Binning {
mut:
	lower    int
	upper    int
	interval int
}

struct Class #

struct Class {
pub mut:
	class_name                 string // the attribute which holds the class
	class_index                int
	classes                    []string // to ensure that the ordering remains the same
	class_values               []string
	missing_class_values       []int // these are the indices of the original class values array
	class_counts               map[string]int
	lcm_class_counts           i64
	prepurge_class_values_len  int
	postpurge_class_counts     map[string]int
	postpurge_lcm_class_counts i64
}

struct Classifier #

struct Classifier {
	Parameters
	LoadOptions
	Class
pub mut:
	struct_type        string = '.Classifier'
	datafile_path      string
	attribute_ordering []string
	trained_attributes map[string]TrainedAttribute
	// maximum_hamming_distance int
	indices   []int
	instances [][]u8
	history   []HistoryEvent
}

struct ClassifierSettings #

struct ClassifierSettings {
	Parameters
	BinaryMetrics
	Metrics
}

struct ClassifyResult #

struct ClassifyResult {
	LoadOptions
	Class
pub mut:
	struct_type                string = '.ClassifyResult'
	index                      int
	inferred_class             string
	inferred_class_array       []string
	labeled_class              string
	nearest_neighbors_by_class []int
	nearest_neighbors_array    [][]int
	classes                    []string
	class_counts               map[string]int
	weighting_flag             bool
	weighting_flag_array       []bool
	multiple_flag              bool
	hamming_distance           int
	sphere_index               int
}

struct RankingResult #

struct RankingResult {
	LoadOptions
pub mut:
	struct_type                string = '.RankingResult'
	path                       string
	exclude_flag               bool
	weight_ranking_flag        bool
	binning                    Binning
	array_of_ranked_attributes []RankedAttribute
}

struct CliOptions #

@[params]
struct CliOptions {
	LoadOptions
pub mut:
	args []string
}

struct LoadOptions #

@[params]
struct LoadOptions {
	DefaultVals
pub mut:
	class_missing_purge_flag bool
}

struct ResultForClass #

struct ResultForClass {
pub mut:
	labeled_instances    int
	correct_inferences   int
	incorrect_inferences int
	wrong_inferences     int
	confusion_matrix_row map[string]int
}

struct MultipleClassifierSettingsArray #

struct MultipleClassifierSettingsArray {
pub mut:
	multiple_classifier_settings []ClassifierSettings
}

struct CrossVerifyResult #

struct CrossVerifyResult {
	Parameters
	LoadOptions
	DisplaySettings
	Metrics
	BinaryMetrics
	MultipleOptions
	MultipleClassifierSettingsArray
pub mut:
	struct_type                         string = '.CrossVerifyResult'
	command                             string
	datafile_path                       string
	testfile_path                       string
	multiple_classify_options_file_path string
	labeled_classes                     []string
	actual_classes                      []string
	inferred_classes                    []string
	nearest_neighbors_by_class          [][]int
	instance_indices                    []int
	classes                             []string
	class_counts                        map[string]int
	labeled_instances                   map[string]int
	correct_inferences                  map[string]int
	incorrect_inferences                map[string]int
	wrong_inferences                    map[string]int
	true_positives                      map[string]int
	false_positives                     map[string]int
	true_negatives                      map[string]int
	false_negatives                     map[string]int
	// outer key: actual class; inner key: predicted class
	confusion_matrix_map            map[string]map[string]f64
	pos_neg_classes                 []string
	correct_count                   int
	incorrects_count                int
	wrong_count                     int
	total_count                     int
	bin_values                      []int // used for displaying the binning range for explore
	attributes_used                 int
	prepurge_instances_counts_array []int
	classifier_instances_counts     []int
	repetitions                     int
	confusion_matrix                [][]string
	trained_attributes_array        []map[string]TrainedAttribute
}

Returned by cross_validate() and verify()

struct Dataset #

struct Dataset {
	Class // DataDict
	LoadOptions
pub mut:
	struct_type                  string = '.Dataset'
	path                         string
	attribute_names              []string
	attribute_flags              []string
	raw_attribute_types          []string
	attribute_types              []string
	inferred_attribute_types     []string
	data                         [][]string
	useful_continuous_attributes map[int][]f32
	useful_discrete_attributes   map[int][]string
	row_identifiers              []string
}

fn (Dataset) purge_instances_for_missing_class_values #

fn (mut ds Dataset) purge_instances_for_missing_class_values() Dataset

struct DefaultVals #

struct DefaultVals {
pub mut:
	missings                   []string = ['?', '', 'NA', ' ']
	integer_range_for_discrete []int    = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
}

pub const missings = ['?', '', 'NA', ' ']
pub const integer_range_for_discrete = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

struct MultipleOptions #

struct MultipleOptions {
	TotalNnParams
pub mut:
	break_on_all_flag    bool
	combined_radii_flag  bool
	total_nn_counts_flag bool
	classifier_indices   []int
}

struct DisplaySettings #

@[params]
struct DisplaySettings {
pub mut:
	show_flag            bool
	expanded_flag        bool
	show_attributes_flag bool
	graph_flag           bool
	verbose_flag         bool
}

struct Environment #

struct Environment {
pub mut:
	hamnn_version  string
	cached_cpuinfo map[string]string
	os_kind        string
	os_details     string
	arch_details   []string
	vexe_mtime     string
	v_full_version string
	vflags         string
}

struct TrainedAttribute #

struct TrainedAttribute {
pub mut:
	attribute_type    string
	translation_table map[string]int
	minimum           f32
	maximum           f32
	bins              int
	rank_value        f32
	index             int
	folds_count       int // for cross-validations, this tracks how many folds use this attribute
}

struct ExploreResult #

struct ExploreResult {
	Class
	Parameters
	LoadOptions
	AttributeRange
	DisplaySettings
pub mut:
	struct_type      string = '.ExploreResult'
	path             string
	testfile_path    string
	pos_neg_classes  []string
	array_of_results []CrossVerifyResult
	accuracy_types   []string = ['raw accuracy', 'balanced accuracy']
	analytics        []MaxSettings
	args             []string
}

struct ValidateResult #

struct ValidateResult {
	Class
	Parameters
	LoadOptions
pub mut:
	struct_type                     string = '.ValidateResult'
	datafile_path                   string
	validate_file_path              string
	row_identifiers                 []string
	inferred_classes                []string
	counts                          [][]int
	instances                       [][]u8
	attributes_used                 int
	prepurge_instances_counts_array []int
	classifier_instances_counts     []int
}

struct OneVsRestClassifier #

struct OneVsRestClassifier {
	Parameters
	LoadOptions
	Class
pub mut:
	struct_type   string = '.OneVsRestClassifier'
	datafile_path string
	history       []HistoryEvent
}

struct RankedAttribute #

struct RankedAttribute {
pub mut:
	attribute_index  int
	attribute_name   string
	attribute_type   string
	rank_value       f32
	rank_value_array []f32
	bins             int
}

struct HistoryEvent #

struct HistoryEvent {
pub mut:
	event_date               time.Time
	instances_count          int
	prepurge_instances_count int
	event_environment        Environment
	event                    string
	file_path                string
}

struct OptimalsResult #

struct OptimalsResult {
pub mut:
	class_counts                                []int
	balanced_accuracy_max                       f64
	balanced_accuracy_max_classifiers           []int
	mcc_max                                     f64
	mcc_max_classifiers                         []int
	correct_inferences_total_max                int
	correct_inferences_total_max_classifiers    []int
	classes                                     []string
	correct_inferences_by_class_max             []int
	correct_inferences_by_class_max_classifiers [][]int
}

struct AnalyzeResult #

struct AnalyzeResult {
	LoadOptions
pub mut:
	struct_type             string = '.AnalyzeResult'
	environment             Environment
	datafile_path           string
	datafile_type           string
	class_name              string
	class_index             int
	class_counts            map[string]int
	attributes              []Attribute
	overall_min             f32
	overall_max             f32
	use_inferred_types_flag bool
}