Split Records for Training and Test

Splitting Records for Training and Test

Calculation Parameters

  • CPUs (cpu_count) type: integer: The number of CPUs to run this cube with
    Default: 1, Min: 1, Max: 128
  • Cube Metrics (cube_metrics) type: string: Set of metrics to be collected
    Choices: cpu, disk, memory, network
  • Temporary Disk Space (MiB) (disk_space) type: decimal: The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
    Default: 5120.0, Min: 128.0, Max: 8589934592
  • Do External Validation (do_ext_valid) type: boolean: Whether to perform external validation. If true, the floe looks for the specified tag field with the specified tag value to identify the external validation set.
    Default: False
  • GPUs (gpu_count) type: integer: The number of GPUs to run this cube with
    Default: 0, Max: 16
  • Instance Tags (instance_tags) type: string: Only run on machines with matching tags (comma separated)
    Default: “”
  • Instance Type (instance_type) type: string: The type of instance that this cube needs to be run on
  • Memory (MiB) (memory_mb) type: decimal: The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
    Default: 1800, Min: 256.0, Max: 8589934592
  • Metric Period (metric_period) type: decimal: How often to sample metrics, in seconds
    Default: 60
    Choices: 1, 5, 10, 30, 60, 120, 180, 240, 300, Min: 1, Max: 300
  • Minimum probability (min_prob) type: decimal: Minimum pose probability for a valid training set record
    Default: 0.5, Max: 1.0
  • Number of Split Sets (Random Split) (num_random_set) type: integer: Number of times to perform the random split
    Default: 50, Min: 1
  • Percentage (Random Split) (percentage) type: decimal: The percentage of records used for training in a random split
    Default: 90.0, Min: 1.0, Max: 99.0
  • Thread limit per CPU (pids_per_cpu_limit) type: integer: The number of threads per CPU
    Default: 32
  • Random number seed (random_seed) type: integer: Random number seed for random dispatch
    Default: 0
  • Shared Memory (MiB) (shared_memory_mb) type: decimal: The amount of shared memory to allow a container to address
    Default: 64
  • Split Method (split_method) type: string: Method used to split the dataset into training and validation sets
    Default: leave one out
    Choices: random, leave one out
  • Spot policy (spot_policy) type: string: Control cube placement on spot market instances
    Default: Prohibited
    Choices: Allowed, Preferred, NotPreferred, Prohibited, Required
  • External Validation Set Tag Value (test_tag_value) type: integer: Value of the tag field that identifies the external validation set
    Default: 1
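
The two split methods and the probability filter listed above can be illustrated with a minimal Python sketch. This is not the cube's actual implementation. The function names, the use of plain dictionaries as records, the choice of `>=` for the `min_prob` comparison, and applying the filter before splitting are all assumptions made for illustration. Only the parameter names and defaults come from the list above.

```python
import random


def filter_by_probability(records, min_prob=0.5, prob_key="Posit Probability"):
    """Keep records whose probability is at least min_prob (assumed comparison)."""
    return [r for r in records if r[prob_key] >= min_prob]


def random_splits(records, percentage=90.0, num_random_set=50, random_seed=0):
    """Yield num_random_set (training, validation) pairs.

    Each pair puts roughly `percentage` percent of the records in training.
    """
    rng = random.Random(random_seed)  # seeded so the splits are reproducible
    n_train = round(len(records) * percentage / 100.0)
    for _ in range(num_random_set):
        shuffled = list(records)
        rng.shuffle(shuffled)
        yield shuffled[:n_train], shuffled[n_train:]


def leave_one_out_splits(records):
    """Yield one (training, validation) pair per record, holding that record out."""
    for i in range(len(records)):
        yield records[:i] + records[i + 1:], [records[i]]
```

With 10 input records, `leave_one_out_splits` produces 10 splits, each with 9 training records and 1 validation record, while `random_splits` with `percentage=90.0` produces `num_random_set` splits of the same sizes but with randomly chosen members.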

Field parameters

  • Extended Log Field (ext_log_field) type: Field Type: StringVec: Message extended log field
    Default: Extended Log Field
  • Input probability field (in_prob_field) type: Field Type: Float: Field containing input Posit probability
    Default: Posit Probability
  • External Validation Tag Field (in_test_tag_field) type: Field Type: Int: Field containing tag for external validation set
    Default: External validation tag
  • Log Field (log_field) type: Field Type: String: The field in which to store messages for the floe report
    Default: Log Field
  • Split counter (out_counter_field) type: Field Type: Int: Counter index of the split
    Default: Split counter
  • Validation set (test_set_field) type: Field Type: RecordVec: Output validation set records vector
    Default: Validation
  • Training set (training_set_field) type: Field Type: RecordVec: Output training set records vector
    Default: Training
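
To show how the external validation tag and the output fields might fit together, here is a hedged sketch. It uses plain dictionaries in place of real records, and the helper names `partition_external` and `package_split` are hypothetical. How the cube actually handles the external validation set after identifying it is not described here, so the sketch only separates tagged records and labels one split with the default field names listed above.

```python
def partition_external(records, tag_field="External validation tag", test_tag_value=1):
    """Separate records tagged as external validation from the rest (assumed behavior)."""
    external = [r for r in records if r.get(tag_field) == test_tag_value]
    internal = [r for r in records if r.get(tag_field) != test_tag_value]
    return internal, external


def package_split(counter, training, validation):
    """Build one output record keyed by the default output field names."""
    return {
        "Split counter": counter,
        "Training": training,
        "Validation": validation,
    }
```

A record without the tag field is treated as internal in this sketch, since `r.get` returns `None` for a missing key.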

Hardware Parameters

Machine hardware requirements
  • Memory (MiB) (memory_mb) type: decimal: The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
    Default: 1800, Min: 256.0, Max: 8589934592
  • Shared Memory (MiB) (shared_memory_mb) type: decimal: The amount of shared memory to allow a container to address
    Default: 64
  • Thread limit per CPU (pids_per_cpu_limit) type: integer: The number of threads per CPU
    Default: 32
  • Temporary Disk Space (MiB) (disk_space) type: decimal: The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
    Default: 5120.0, Min: 128.0, Max: 8589934592
  • GPUs (gpu_count) type: integer: The number of GPUs to run this cube with
    Default: 0, Max: 16
  • CPUs (cpu_count) type: integer: The number of CPUs to run this cube with
    Default: 1, Min: 1, Max: 128
  • Instance Type (instance_type) type: string: The type of instance that this cube needs to be run on
  • Spot policy (spot_policy) type: string: Control cube placement on spot market instances
    Default: Prohibited
    Choices: Allowed, Preferred, NotPreferred, Prohibited, Required
  • Instance Tags (instance_tags) type: string: Only run on machines with matching tags (comma separated)
    Default: “”

Metrics Parameters

Cube Metric Parameters
  • Metric Period (metric_period) type: decimal: How often to sample metrics, in seconds
    Default: 60
    Choices: 1, 5, 10, 30, 60, 120, 180, 240, 300, Min: 1, Max: 300
  • Cube Metrics (cube_metrics) type: string: Set of metrics to be collected
    Choices: cpu, disk, memory, network