vhammll #
# VHamMLL
A machine learning (ML) library for classification, using a nearest-neighbor algorithm based on Hamming distances.
You can incorporate the VHamMLL functions into your own code, or use the included Command Line Interface app (cli.v).
Link to the HTML documentation for the library functions and structs
You can use VHamMLL with your own datasets, or with a selection of publicly available datasets (in the datasets directory) that are widely used for demonstrating and testing ML classifiers. These files are mostly in Orange file format; there are also datasets in ARFF (Attribute-Relation File Format) and in the comma-separated-values (CSV) format used by Kaggle.
What, another AI package? Is that necessary? And have a look here for a more complete description and potential use cases.
For interactive descriptions of the two key algorithms used by VHamMLL, download the Numbers app spreadsheets: Description of Ranking Algorithm and Description of Classification Algorithm.
Usage:
To use the VHamMLL library in an existing Vlang project:
v install holder66.vhammll
You may also need to install its dependencies, if not automatically installed:
v install vsl
v install Mewzax.chalk
In your v code, add: import holder66.vhammll
To use the library with the Command Line Interface (CLI):
First, install V, if not already installed. On macOS, Linux, etc., you need git and a C compiler (for Windows or Android environments, see the V documentation).
In a terminal:
git clone https://github.com/vlang/v
cd v
make
sudo ./v symlink # add v to your PATH
v install holder66.vhammll
See above re needed dependencies.
In a folder or directory that you want to use for your project, you will need to create a file with module main and a function main(). You can do this in the terminal, or with a text editor. The file should contain:
module main
import holder66.vhammll
fn main() {
vhammll.cli()!
}
Assuming you've named the directory or folder vhamml and the file within it main.v, in the terminal: v run . followed by the command line arguments, e.g. v run . --help or v run . analyze <path_to_dataset_file>
Command-specific help is available, like so: v run . explore --help or v run . explore -h
Note that the publicly available datasets included with the VHamMLL distribution can be found at ~/.vmodules/holder66/vhammll/datasets.
That's it!
Tutorial:
v run . examples go
Updating:
v up # installs the latest release of V
v update # get the latest version of the libraries, including holder66.vhammll
v . # recompile
Getting help:
The V lang community meets on Discord
For bug reports, feature requests, etc., please raise an issue on github
Speed things up:
Use the -c (--concurrent) argument (in the CLI) to make use of available CPU cores for some vhammll functions; this may speed things up (timings below are on a 2019 MacBook Pro):
v main.v
./main explore ~/.vmodules/holder66/vhammll/datasets/iris.tab # 10.157 sec
./main explore -c ~/.vmodules/holder66/vhammll/datasets/iris.tab # 4.910 sec
A huge speedup usually happens if you compile using the -prod (for production) option. The compilation itself takes longer, but the resulting code is highly optimized.
v -prod main.v
./main explore ~/.vmodules/holder66/vhammll/datasets/iris.tab # 3.899 sec
./main explore -c ~/.vmodules/holder66/vhammll/datasets/iris.tab # 4.849 sec!!
Note that in this case, there is no speedup for -prod when the -c argument is used.
Examples showing use of the Command Line Interface
Please see examples_of_command_line_usage.md
Example: typical use case, a clinical risk calculator
Health care professionals frequently make use of calculators to inform clinical decision-making. Data regarding symptoms, findings on physical examination, laboratory and imaging results, and outcome information (such as diagnosis, risk of developing a condition, or response to specific treatments) are collected for a sample of patients. These data then form the basis of a formula that can predict the outcome of interest for a new patient, based on how their symptoms, findings, etc. compare to those in the dataset.
Please see clinical_calculator_example.md.
Example: finding useful information embedded in noise
Please see a worked example here: noisy_data.md
MNIST dataset
The mnist_train.tab file is too large to keep in the repository. If you wish to experiment with it, it can be downloaded by right-clicking on this link in a web browser, or downloaded via the command line:
wget https://henry.olders.ca/datasets/mnist_train.tab
The process of development in its early stages is described in this essay written in 1989.
Copyright (c) 2017, 2024: Henry Olders.
fn verify #
fn verify(opts Options, disp DisplaySettings) CrossVerifyResult
verify classifies all the instances in a verification datafile (specified by opts.testfile_path) using a trained Classifier; returns metrics comparing the inferred classes to the labeled (assigned) classes of the verification datafile.
Optional (also see `make_classifier.v` for options in training a classifier)
weighting_flag: nearest neighbor counts are weighted by
class prevalences.
Output options:
show_flag: display results on the console;
expanded_flag: display additional information on the console, including
a confusion matrix.
outputfile_path: saves the result as a json file.
fn explore #
fn explore(ds Dataset, opts Options, disp DisplaySettings) ExploreResult
explore runs a series of cross-validations or verifications, over a range of attributes and a range of binning values.
Options (also see the Options struct):
bins: range for binning or slicing of continuous attributes;
uniform_bins: same number of bins for all continuous attributes;
number_of_attributes: range for attributes to include;
exclude_flag: excludes missing values when ranking attributes;
weighting_flag: nearest neighbor counts are weighted by
class prevalences;
folds: number of folds n to use for n-fold cross-validation (default
is leave-one-out cross-validation);
repetitions: number of times to repeat n-fold cross-validations;
random-pick: choose instances randomly for n-fold cross-validations.
Output options:
show_flag: display results on the console;
expanded_flag: display additional information on the console, including
a confusion matrix for each explore step;
graph_flag: generate plots of Receiver Operating Characteristics (ROC)
by attributes used; ROC by bins used, and accuracy by attributes
used.
outputfile_path: saves the result to a file.
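As a sketch, an explore run over a range of binning values might look like this (the bins field is taken from the Options struct shown further below; iris.tab ships with the library):

```v
module main

import holder66.vhammll

fn main() {
	ds := vhammll.load_file('datasets/iris.tab')
	// explore repeats cross-validation as the binning range varies
	opts := vhammll.Options{
		bins: [2, 6] // binning ranges from 2 up to 6
	}
	er := vhammll.explore(ds, opts, vhammll.DisplaySettings{ show_flag: true })
	println('explore produced ${er.array_of_results.len} results')
}
```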
fn file_type #
fn file_type(path string) string
file_type returns a string identifying how a dataset is structured or formatted, eg 'orange_newer', 'orange_older', 'arff', or 'csv'. On the assumption that an 'orange_older' file will always identify a class attribute by having 'c' or 'class' in the third header line, all other tab-delimited datafiles will be typed as 'orange_newer'.
Example
assert file_type('datasets/iris.tab') == 'orange_older'
fn get_useful_continuous_attributes #
fn get_useful_continuous_attributes(ds Dataset) map[int][]f32
get_useful_continuous_attributes returns a map, keyed by attribute index, of the values of each continuous attribute that is useful for training a classifier.
fn get_useful_discrete_attributes #
fn get_useful_discrete_attributes(ds Dataset) map[int][]string
get_useful_discrete_attributes returns a map, keyed by attribute index, of the values of each discrete attribute that is useful for training a classifier.
fn is_nan #
fn is_nan[T](f T) bool
fn load_classifier_file #
fn load_classifier_file(path string) !Classifier
load_classifier_file loads a file generated by make_classifier(); returns a Classifier struct.
Example
cl := load_classifier_file('tempfolder/saved_classifier.txt')
fn load_file #
fn load_file(path string, opts LoadOptions) Dataset
load_file returns a struct containing the datafile's contents, suitable for generating a classifier
Example
ds := load_file('datasets/iris.tab')
fn load_instances_file #
fn load_instances_file(path string) !ValidateResult
load_instances_file loads a file generated by validate() or query(), and returns it as a struct, suitable for appending to a classifier.
Example
instances := load_instances_file('tempfolder/saved_validate_result.txt')
fn make_classifier #
fn make_classifier(dds Dataset, opts Options, disp DisplaySettings) Classifier
make_classifier returns a Classifier struct, given a Dataset (as created by load_file).
Options (also see the Options struct):
bins: range for binning or slicing of continuous attributes;
uniform_bins: same number of bins for continuous attributes;
number_of_attributes: the number of highest-ranked attributes to include;
exclude_flag: excludes missing values when ranking attributes;
purge_flag: remove those instances which are duplicates, after
binning and based on only the attributes to be used;
outputfile_path: if specified, saves the classifier to this file.
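For example (a sketch only; the output path is hypothetical):

```v
module main

import holder66.vhammll

fn main() {
	ds := vhammll.load_file('datasets/iris.tab')
	opts := vhammll.Options{
		bins: [3, 3] // bin all continuous attributes into 3 slices
		outputfile_path: 'tempfolder/saved_classifier.txt' // hypothetical path
	}
	cl := vhammll.make_classifier(ds, opts, vhammll.DisplaySettings{})
	// attributes are stored in descending order of rank value
	println(cl.attribute_ordering)
}
```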
fn nan #
fn nan[T]() T
fn one_vs_rest_verify #
fn one_vs_rest_verify(opts Options, disp DisplaySettings) CrossVerifyResult
one_vs_rest_verify classifies all the cases in a verification datafile (specified by opts.testfile_path) using an array of trained Classifiers, one per class; each classifier is trained on one class vs. all the other classes. It returns metrics comparing the inferred classes to the labeled (assigned) classes of the verification datafile.
Optional (also see `make_classifier.v` for options in training a classifier)
weighting_flag: nearest neighbor counts are weighted by
class prevalences.
Output options:
show_flag: display results on the console;
expanded_flag: display additional information on the console, including
a confusion matrix.
outputfile_path: saves the result as a json file.
fn optimals #
fn optimals(path string, in_opts Options, disp DisplaySettings) OptimalsResult
optimals determines which classifiers provide the best balanced accuracy, highest total for correct inferences, and highest correct inferences per class, for multiple classifiers whose settings are stored in a settings file.
fn purge_instances_for_missing_class_values_not_inline #
fn purge_instances_for_missing_class_values_not_inline(mut ds Dataset) Dataset
fn query #
fn query(cl Classifier, opts Options, disp DisplaySettings) ClassifyResult
query takes a trained classifier and performs an interactive session with the user at the console, asking the user to input a value for each trained attribute. It then asks to confirm or redo the responses. Once confirmed, the instance is classified and the inferred class is shown. The classified instance can optionally be saved in a file. The saved instance can be appended to the classifier using append_instances().
fn rank_attributes #
fn rank_attributes(ds Dataset, opts Options, disp DisplaySettings) RankingResult
rank_attributes takes a Dataset and returns a list of all the dataset's usable attributes, ranked in order of each attribute's ability to separate the classes.
Algorithm:
for each attribute:
create a matrix with attribute values for row headers, and
class values for column headers;
for each unique value `val` for that attribute:
for each unique value `class` of the class attribute:
for each instance:
accumulate a count for those instances whose class value
equals `class`;
populate the matrix with these accumulated counts;
for each `val`:
get the absolute values of the differences between accumulated
counts for each pair of `class` values;
add those absolute differences;
total those added absolute differences to get the raw rank value
for that attribute.
To obtain rank values weighted by class prevalences, use the same algorithm
except before taking the difference of each pair of accumulated counts,
multiply each count of the pair by the class prevalence of the other class.
(Note: rank_attributes always uses class prevalences as weights)
Obtain a maximum rank value by calculating a rank value for the class
attribute itself.
To obtain normalized rank values:
for each attribute:
divide its raw rank value by the maximum rank value and multiply by 100.
Sort the attributes by descending rank values.
Options:
-b --bins: specifies the range for binning (slicing) continuous attributes;
-x --exclude: to exclude missing values when calculating rank values;
Output options:
`show_flag` to print the ranked list to the console;
`graph_flag` to generate plots of rank values for each attribute on the
y axis, with number of bins on the x axis.
`outputfile_path`, saves the result as json.
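The ranking can also be obtained directly in code, like so (a sketch; field names are taken from the RankingResult and RankedAttribute structs below):

```v
module main

import holder66.vhammll

fn main() {
	ds := vhammll.load_file('datasets/iris.tab')
	rr := vhammll.rank_attributes(ds, vhammll.Options{}, vhammll.DisplaySettings{})
	// attributes arrive sorted by descending rank value
	for ra in rr.array_of_ranked_attributes {
		println('${ra.attribute_name}: ${ra.rank_value}')
	}
}
```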
fn rank_one_vs_rest #
fn rank_one_vs_rest(ds Dataset, opts Options, disp DisplaySettings) RankingResult
rank_one_vs_rest takes a Dataset and returns a list of all the dataset's usable attributes, ranked in order of each attribute's ability to separate the classes.
Algorithm:
for each attribute:
create a matrix with attribute values for row headers, and
class values for column headers;
for each unique value `val` for that attribute:
for each unique value `class` of the class attribute:
for each instance:
accumulate a count for those instances whose class value
equals `class`;
populate the matrix with these accumulated counts;
for each `val`:
get the absolute values of the differences between accumulated
counts for each pair of `class` values;
add those absolute differences;
total those added absolute differences to get the raw rank value
for that attribute.
To obtain rank values weighted by class prevalences, use the same algorithm
except before taking the difference of each pair of accumulated counts,
multiply each count of the pair by the class prevalence of the other class.
(Note: rank_attributes always uses class prevalences as weights)
Obtain a maximum rank value by calculating a rank value for the class
attribute itself.
To obtain normalized rank values:
for each attribute:
divide its raw rank value by the maximum rank value and multiply by 100.
Sort the attributes by descending rank values.
Options:
-b --bins: specifies the range for binning (slicing) continuous attributes;
-x --exclude: to exclude missing values when calculating rank values;
Output options:
`show_flag` to print the ranked list to the console;
`graph_flag` to generate plots of rank values for each attribute on the
y axis, with number of bins on the x axis.
`outputfile_path`, saves the result as json.
fn save_json_file #
fn save_json_file[T](u T, path string)
save_json_file
fn set_class_struct #
fn set_class_struct(ds Dataset) Class
set_class_struct
fn show_analyze #
fn show_analyze(result AnalyzeResult)
show_analyze prints out to the console, a series of tables detailing a dataset. It takes as input an AnalyzeResult struct generated by analyze_dataset().
fn show_classifier #
fn show_classifier(cl Classifier)
show_classifier outputs to the console information about a classifier
fn show_crossvalidation #
fn show_crossvalidation(result CrossVerifyResult, opts Options, disp DisplaySettings)
show_crossvalidation
fn show_rank_attributes #
fn show_rank_attributes(result RankingResult)
show_rank_attributes
fn show_validate #
fn show_validate(result ValidateResult)
show_validate
fn show_verify #
fn show_verify(result CrossVerifyResult, opts Options, disp DisplaySettings)
show_verify
fn transpose #
fn transpose[T](matrix [][]T) [][]T
transpose a 2d array
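For example:

```v
module main

import holder66.vhammll

fn main() {
	matrix := [[1, 2, 3], [4, 5, 6]]
	// rows become columns and vice versa
	println(vhammll.transpose(matrix)) // [[1, 4], [2, 5], [3, 6]]
}
```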
fn validate #
fn validate(cl Classifier, opts Options, disp DisplaySettings) !ValidateResult
validate classifies each instance of a validation datafile against a trained Classifier; returns the predicted classes for each case of the validation_set. The file to be validated is specified by opts.testfile_path. Optionally, saves the cases and their predicted classes in a file. This file can be used to append these cases to the classifier.
fn analyze_dataset #
fn analyze_dataset(ds Dataset, opts Options, disp DisplaySettings) AnalyzeResult
analyze_dataset returns a struct with information about a datafile.
Optional:
if show_flag is true, displays on the console (using show_analyze):
1. a list of attributes, their types, the unique values, and a count of
missing values;
2. a table with counts for each type of attribute;
3. a list of discrete attributes useful for training a classifier;
4. a list of continuous attributes useful for training a classifier;
5. a breakdown of the class attribute, showing counts for each class.
outputfile_path: if specified, saves the analysis results.
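As a sketch:

```v
module main

import holder66.vhammll

fn main() {
	ds := vhammll.load_file('datasets/iris.tab')
	// show_flag prints the tables described above to the console
	ar := vhammll.analyze_dataset(ds, vhammll.Options{}, vhammll.DisplaySettings{ show_flag: true })
	println(ar.class_counts)
}
```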
fn append_instances #
fn append_instances(cl Classifier, instances_to_append ValidateResult, opts Options, disp DisplaySettings) Classifier
append_instances extends a classifier by adding more instances. It returns the extended classifier struct.
Output options:
show_flag: display results on the console;
outputfile_path: saves the extended classifier to a file.
fn cli #
fn cli(cli_options CliOptions) !
the command line interface app for the holder66.vhammll ML library. In a terminal, type: v run . --help
Usage: v run . [command] [flags] <path_to_datafile>
Datafiles should be either tab-delimited, or have extension .csv or .arff
Commands: analyze | append | cross | display | examples | explore
| make | optimals | orange | query | rank | validate | verify
Flags and options:
-a --attributes, can be one, two, or three integers; a single integer will
be used by make_classifier to produce a classifier with that number
of attributes. More than one integer will be used by
explore to provide a range and an interval.
-b --bins, can be one, two, or three integers; a single integer for one bin
value to be used for all attributes; two integers for a range of bin
values; a third integer specifies an interval for the range (note that
the binning range is from the upper to the lower value);
note: when doing an explore, the first integer specifies the lower
limit for the number of bins, and the second gives the upper value
for the explore range. Example: explore -b 3,6 would first use 3 - 3,
then 3 - 4, then 3 - 5, and finally 3 - 6 for the binning ranges.
If the uniform flag is true, then a single integer specifies
the number of bins for all continuous attributes; two integers for a
range of uniform bin values for the explore command; a third integer
for the interval to be used over the explore range;
-bp, --balanced-prevalences, multiply the number of instances for classes
with low prevalence, to more closely balance prevalences;
-c --concurrent, permit parallel processing to use multiple cores;
-e --expanded, expanded results on the console;
-ea display information re trained attributes on the console, for
classification operations;
-f --folds, default is leave-one-out;
-g --graph, displays a plot;
-h --help,
-k --classifier, followed by the path to a file for a saved Classifier
-ka --kaggle, followed by the path to a file. Used with the "validate" command,
a csv file suitable for submission to a Kaggle competition is created;
-m --multiple, classify using more than one trained classifier, followed by
the path to a json file with parameters to generate each classifier;
-ma when multiple classifiers are used, stop classifying when matches
have been found for all classifiers;
-mc when multiple classifiers are used, combine the possible hamming
distances for each classifier into a single list;
-mr for multiclass datasets, perform classification using a classifier for
each class, based on cases for that class set against all the other cases;
-mt when multiple classifiers are used, add the nearest neighbors from
each classifier, weight by class prevalences, and then infer
from the totals;
-m# followed by a list of which classifiers to apply in a multiple
classification run (zero-indexed); also used to specify which classifiers to
append to a settings file;
-ms append the settings to a file (path follows flag) for use in multiple
classification (with -m#). When used with 'explore', the settings for
cases identified in the analytics are appended;
-o --output, followed by the path to a file in which a classifier, a
result, instances used for validation, or a query instance will be
stored;
-p --purge, removes instances which after binning are duplicates
-pmc --purge-missing-classes, removes instances for which the class value
is missing;
-r --reps, number of repetitions; if > 1, a random selection of
instances to be included in each fold will be applied
-s --show, output results to the console;
-t --test, followed by the path to the datafile to be verified or validated;
-u --uniform, specifies if uniform binning is to be used for the explore
command (note: to obtain uniform binning with verify, validate, query, or
cross-validate, specify the same value for binning, eg -b 4,4);
-v --verbose
-w --weight, when classifying, weight the nearest neighbor counts by class prevalences;
-wr when ranking attributes, weight contributions by class prevalences;
-x --exclude, do not take into account missing values when ranking attributes;
fn close #
fn close[T](a T, b T) bool
fn combine_raw_and_inferred_types #
fn combine_raw_and_inferred_types(ds Dataset) []string
fn cross_validate #
fn cross_validate(ds Dataset, opts Options, disp DisplaySettings) CrossVerifyResult
cross_validate performs n-fold cross-validation on a dataset: it partitions the instances in a dataset into a fold, trains a classifier on all the dataset instances not in the fold, and then uses this classifier to classify the fold cases. This process is repeated for each of n folds, and the classification results are summarized.
Options (also see the Options struct):
bins: range for binning or slicing of continuous attributes;
number_of_attributes: the number of attributes to use, in descending
order of rank value;
exclude_flag: excludes missing values when ranking attributes;
weighting_flag: nearest neighbor counts are weighted by
class prevalences;
folds: number of folds n to use for n-fold cross-validation (default
is leave-one-out cross-validation);
repetitions: number of times to repeat n-fold cross-validations;
random-pick: choose instances randomly for n-fold cross-validations.
Output options:
show_flag: prints results to the console;
expanded_flag: prints additional information to the console, including
a confusion matrix.
outputfile_path: saves the result as a json file.
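A leave-one-out cross-validation might look like this (a sketch; per the options above, leave-one-out is the default when no folds value is given):

```v
module main

import holder66.vhammll

fn main() {
	ds := vhammll.load_file('datasets/iris.tab')
	// with no folds option set, leave-one-out cross-validation is used
	result := vhammll.cross_validate(ds, vhammll.Options{}, vhammll.DisplaySettings{ expanded_flag: true })
	println('${result.correct_count} of ${result.total_count} correct')
}
```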
fn display_file #
fn display_file(path string, in_opts Options, disp DisplaySettings)
display_file displays on the console a results file as produced by other vhammll functions; a multiple classifier settings file; or graphs for explore, ranking, or cross-validation results.
display_file('path_to_saved_results_file', expanded_flag: true)
Output options:
expanded_flag: display additional information on the console, including
a confusion matrix for cross-validation or verification operations;
graph_flag: generates plots for display in the default web browser.
struct Options #
struct Options {
Parameters
LoadOptions // DisplaySettings
MultipleOptions
MultipleClassifierSettingsArray
pub mut:
struct_type string = '.Options'
non_options []string
bins []int = [1, 16]
concurrency_flag bool
datafile_path string = 'datasets/developer.tab'
testfile_path string
outputfile_path string
classifierfile_path string
instancesfile_path string
multiple_classify_options_file_path string
settingsfile_path string
help_flag bool
multiple_flag bool
append_settings_flag bool
command string
args []string
kagglefile_path string
}
Options struct: can be used as the last parameter in a function's parameter list, to enable default values to be passed to functions.
struct PlotResult #
struct PlotResult {
pub mut:
bin int
attributes_used int
correct_count int
total_count int
}
struct Attribute #
struct Attribute {
pub mut:
id int
name string
count int
counts_map map[string]int
uniques int
missing int
raw_type string
att_type string
inferred_type string
for_training bool
min f32
max f32
mean f32
median f32
}
struct Binning #
struct Binning {
mut:
lower int
upper int
interval int
}
struct Class #
struct Class {
pub mut:
class_name string // the attribute which holds the class
class_index int
classes []string // to ensure that the ordering remains the same
class_values []string
missing_class_values []int // these are the indices of the original class values array
class_counts map[string]int
lcm_class_counts i64
prepurge_class_values_len int
postpurge_class_counts map[string]int
postpurge_lcm_class_counts i64
}
struct Classifier #
struct Classifier {
Parameters
LoadOptions
Class
pub mut:
struct_type string = '.Classifier'
datafile_path string
attribute_ordering []string
trained_attributes map[string]TrainedAttribute
// maximum_hamming_distance int
indices []int
instances [][]u8
history []HistoryEvent
}
struct ClassifierSettings #
struct ClassifierSettings {
Parameters
BinaryMetrics
Metrics
}
struct ClassifyResult #
struct ClassifyResult {
LoadOptions
Class
pub mut:
struct_type string = '.ClassifyResult'
index int
inferred_class string
inferred_class_array []string
labeled_class string
nearest_neighbors_by_class []int
nearest_neighbors_array [][]int
classes []string
class_counts map[string]int
weighting_flag bool
weighting_flag_array []bool
multiple_flag bool
hamming_distance int
sphere_index int
}
struct RankingResult #
struct RankingResult {
LoadOptions
pub mut:
struct_type string = '.RankingResult'
path string
exclude_flag bool
weight_ranking_flag bool
binning Binning
array_of_ranked_attributes []RankedAttribute
}
struct CliOptions #
struct CliOptions {
LoadOptions
pub mut:
args []string
}
struct LoadOptions #
struct LoadOptions {
DefaultVals
pub mut:
class_missing_purge_flag bool
}
struct ResultForClass #
struct ResultForClass {
pub mut:
labeled_instances int
correct_inferences int
incorrect_inferences int
wrong_inferences int
confusion_matrix_row map[string]int
}
struct MultipleClassifierSettingsArray #
struct MultipleClassifierSettingsArray {
pub mut:
multiple_classifier_settings []ClassifierSettings
}
struct CrossVerifyResult #
struct CrossVerifyResult {
Parameters
LoadOptions
DisplaySettings
Metrics
BinaryMetrics
MultipleOptions
MultipleClassifierSettingsArray
pub mut:
struct_type string = '.CrossVerifyResult'
command string
datafile_path string
testfile_path string
multiple_classify_options_file_path string
labeled_classes []string
actual_classes []string
inferred_classes []string
nearest_neighbors_by_class [][]int
instance_indices []int
classes []string
class_counts map[string]int
labeled_instances map[string]int
correct_inferences map[string]int
incorrect_inferences map[string]int
wrong_inferences map[string]int
true_positives map[string]int
false_positives map[string]int
true_negatives map[string]int
false_negatives map[string]int
// outer key: actual class; inner key: predicted class
confusion_matrix_map map[string]map[string]f64
pos_neg_classes []string
correct_count int
incorrects_count int
wrong_count int
total_count int
bin_values []int // used for displaying the binning range for explore
attributes_used int
prepurge_instances_counts_array []int
classifier_instances_counts []int
repetitions int
confusion_matrix [][]string
trained_attributes_array []map[string]TrainedAttribute
}
Returned by cross_validate() and verify()
struct Dataset #
struct Dataset {
Class // DataDict
LoadOptions
pub mut:
struct_type string = '.Dataset'
path string
attribute_names []string
attribute_flags []string
raw_attribute_types []string
attribute_types []string
inferred_attribute_types []string
data [][]string
useful_continuous_attributes map[int][]f32
useful_discrete_attributes map[int][]string
row_identifiers []string
}
fn (Dataset) purge_instances_for_missing_class_values #
fn (mut ds Dataset) purge_instances_for_missing_class_values() Dataset
struct DefaultVals #
struct DefaultVals {
pub mut:
missings []string = ['?', '', 'NA', ' ']
integer_range_for_discrete []int = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
}
pub const missings = ['?', '', 'NA', ' ']
pub const integer_range_for_discrete = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
struct MultipleOptions #
struct MultipleOptions {
TotalNnParams
pub mut:
break_on_all_flag bool
combined_radii_flag bool
total_nn_counts_flag bool
classifier_indices []int
}
struct DisplaySettings #
struct DisplaySettings {
pub mut:
show_flag bool
expanded_flag bool
show_attributes_flag bool
graph_flag bool
verbose_flag bool
}
struct Environment #
struct Environment {
pub mut:
hamnn_version string
cached_cpuinfo map[string]string
os_kind string
os_details string
arch_details []string
vexe_mtime string
v_full_version string
vflags string
}
struct TrainedAttribute #
struct TrainedAttribute {
pub mut:
attribute_type string
translation_table map[string]int
minimum f32
maximum f32
bins int
rank_value f32
index int
folds_count int // for cross-validations, this tracks how many folds use this attribute
}
struct ExploreResult #
struct ExploreResult {
Class
Parameters
LoadOptions
AttributeRange
DisplaySettings
pub mut:
struct_type string = '.ExploreResult'
path string
testfile_path string
pos_neg_classes []string
array_of_results []CrossVerifyResult
accuracy_types []string = ['raw accuracy', 'balanced accuracy']
analytics []MaxSettings
args []string
}
struct ValidateResult #
struct ValidateResult {
Class
Parameters
LoadOptions
pub mut:
struct_type string = '.ValidateResult'
datafile_path string
validate_file_path string
row_identifiers []string
inferred_classes []string
counts [][]int
instances [][]u8
attributes_used int
prepurge_instances_counts_array []int
classifier_instances_counts []int
}
struct OneVsRestClassifier #
struct OneVsRestClassifier {
Parameters
LoadOptions
Class
pub mut:
struct_type string = '.OneVsRestClassifier'
datafile_path string
history []HistoryEvent
}
struct RankedAttribute #
struct RankedAttribute {
pub mut:
attribute_index int
attribute_name string
attribute_type string
rank_value f32
rank_value_array []f32
bins int
}
struct HistoryEvent #
struct HistoryEvent {
pub mut:
event_date time.Time
instances_count int
prepurge_instances_count int
event_environment Environment
event string
file_path string
}
struct OptimalsResult #
struct OptimalsResult {
pub mut:
class_counts []int
balanced_accuracy_max f64
balanced_accuracy_max_classifiers []int
mcc_max f64
mcc_max_classifiers []int
correct_inferences_total_max int
correct_inferences_total_max_classifiers []int
classes []string
correct_inferences_by_class_max []int
correct_inferences_by_class_max_classifiers [][]int
}
struct AnalyzeResult #
struct AnalyzeResult {
LoadOptions
pub mut:
struct_type string = '.AnalyzeResult'
environment Environment
datafile_path string
datafile_type string
class_name string
class_index int
class_counts map[string]int
attributes []Attribute
overall_min f32
overall_max f32
use_inferred_types_flag bool
}