Layer reference

This chapter describes the layers and their properties that can be used with the Neural Network Libraries.

Color of each layer

  • IO

    Input/output data.

  • Basic

Perform computations on artificial neurons.
    (Binary layer required for binary neural networks)

  • Activation

    Apply non-linear conversion to input data for activation.
    (Binary layer required for binary neural networks)

  • Pooling

    Perform pooling.

  • Parameter

    Parameters to be optimized.

  • LoopControl

    Control repetitive processes.

  • Unit, Math, Loss, Others, Others (Pre Process)

    For various purposes, from arithmetic operations on tensor elements to preprocessing data.

  • Arithmetic (Scalar, 2 Inputs), Logical, Validation

For arithmetic/logical operations, precision computation, etc.

 

Common layer properties

Common layer properties include the layer name and the properties that are automatically calculated based on the inter-layer link status and layer-specific properties.

Name This indicates the layer name.

Each layer name must be unique in the graph.

Input This indicates the layer’s input size.
Output This indicates the layer’s output size.
CostParameter This indicates the number of parameters that the layer contains.
CostAdd This indicates the number of additions required in forward calculation (additions that cannot be performed at the same time as a multiplication).
CostMultiply This indicates the number of multiplications required in forward calculation (multiplications that cannot be performed at the same time as an addition).
CostMultiplyAdd This indicates the number of multiplications and additions required in forward calculation (the number of multiplications and the number of additions that can be performed during the same time).
CostDivision This indicates the number of divisions required in forward calculation.
CostExp This indicates the number of exponentiations required in forward calculation.
CostIf This indicates the number of conditional decisions required in forward calculation.

 

Input and output layers

 

Input

This is the neural network input layer.

Size Specifies the input size.

For image data, the size is specified in the “number of colors,height,width” format. For example, for an RGB color image whose width is 32 and height is 24, specify “3,24,32”. For a monochrome image whose width is 64 and height is 48, specify “1,48,64”.

For CSV data, the size is specified in the “number of rows,number of columns” format. For example, for CSV data consisting of 16 rows and 1 column, specify “16,1”. For CSV data consisting of 12 rows and 3 columns, specify “12,3”.

Dataset Specifies the name of the variable to input into this Input layer.
Generator Specifies the generator to use in this input layer. If the Generator property is not set to None, the data that the generator generates is used in place of the variable specified by Dataset during optimization.

None: Data generation is not performed.

Uniform: Uniform random numbers between -1.0 and 1.0 are generated.

Normal: Gaussian random numbers with 0.0 mean and 1.0 variance are generated.

Constant: Data whose elements are all constant (1.0) is generated.

GeneratorMultiplier Specifies the multiplier to apply to the values that the generator generates.
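
As a rough illustration, an Input layer corresponds to declaring a variable of the specified size in the NNabla Python API (an assumed mapping; the batch size is prepended as the first dimension):

```python
import nnabla as nn

batch_size = 64
# Size “3,24,32”: an RGB image 32 pixels wide and 24 pixels high
x_image = nn.Variable((batch_size, 3, 24, 32))
# Size “16,1”: CSV data consisting of 16 rows and 1 column
x_csv = nn.Variable((batch_size, 16, 1))
```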

 

Loss layer

 

SquaredError

This is the output layer of a neural network that minimizes the squared errors between the variables and dataset variables. It is used when solving regression problems with neural networks (when optimizing neural networks that output continuous values).

T.Dataset Specifies the name of the variable expected to be the output of this SquaredError layer.
T.Generator Specifies the generator to use in place of the dataset. If the Generator property is not set to None, the data that the generator generates is used in place of the variable specified by T.Dataset during optimization.

None: Data generation is not performed.

Uniform: Uniform random numbers between -1.0 and 1.0 are generated.

Normal: Gaussian random numbers with 0.0 mean and 1.0 variance are generated.

Constant: Data whose elements are all constant (1.0) is generated.

T.GeneratorMultiplier Specifies the multiplier to apply to the values that the generator generates.
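
A minimal sketch of the corresponding loss in the NNabla Python API, assuming a network output y and a dataset variable t of the same shape:

```python
import nnabla as nn
import nnabla.functions as F

y = nn.Variable((64, 1))  # network output
t = nn.Variable((64, 1))  # variable specified by T.Dataset
# Element-wise squared error, averaged over the mini-batch
loss = F.mean(F.squared_error(y, t))
```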

 

HuberLoss

This is the output layer of a neural network that minimizes the Huber loss between the variables and dataset variables. Like Squared Error, this is used when solving regression problems with neural networks. Using this in place of Squared Error has the effect of stabilizing the training process.

$$y_i = \begin{cases}
d^2 & (|d| < \delta) \\
\delta (2 |d| - \delta) & (\text{otherwise})
\end{cases}$$

where
$$d = x^{(0)}_i - x^{(1)}_i$$

Delta Specify δ, which is used as a threshold for increasing the loss linearly.

Other properties are the same as those of SquaredError.
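
The piecewise definition above can be checked with a small NumPy sketch (plain NumPy is used here only to illustrate the formula):

```python
import numpy as np

def huber(x0, x1, delta=1.0):
    d = x0 - x1
    # Quadratic for |d| < delta, linear with slope 2*delta otherwise
    return np.where(np.abs(d) < delta, d ** 2, delta * (2 * np.abs(d) - delta))

print(huber(np.array([0.1, 3.0]), np.array([0.0, 0.0])))  # [0.01, 5.0]
```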

 

AbsoluteError

This is the output layer of a neural network that minimizes absolute error between the variables and dataset variables. Like Squared Error, this is used when solving regression problems with neural networks. The properties are the same as those of SquaredError.

 

EpsilonInsensitiveLoss

This is the output layer of a neural network that minimizes absolute error exceeding the range specified by Epsilon between the variables and dataset variables. Like Squared Error, this is used when solving regression problems with neural networks.

$$y_i = \begin{cases}
0 & (|d| < \epsilon) \\
|d| - \epsilon & (\text{otherwise})
\end{cases}$$

where
$$d = x^{(0)}_i - x^{(1)}_i$$

Epsilon Specify ε, the threshold below which the absolute error is ignored.

Other properties are the same as those of SquaredError.

 

BinaryCrossEntropy

This is the output layer of a neural network that minimizes the binary cross entropy between the variable and dataset variables. It is used to solve binary classification problems (0 or 1). The input to BinaryCrossEntropy must be between 0.0 and 1.0 (a probability), and the dataset variable must be 0 or 1. All the properties are the same as those of SquaredError.

 

SigmoidCrossEntropy

This is the output layer of a neural network that minimizes the binary cross entropy between the variable and dataset variables. SigmoidCrossEntropy is equivalent to Sigmoid+BinaryCrossEntropy during training, but computing them at once reduces computational error. All the properties are the same as those of SquaredError.

Reference

When SigmoidCrossEntropy is used instead of Sigmoid+BinaryCrossEntropy, the evaluation results are output as continuous values that have not undergone Sigmoid processing.

 

CategoricalCrossEntropy

This is the output layer of a neural network that minimizes the cross entropy between the variables and the variables of a dataset given by a category index. All the properties are the same as those of SquaredError.

 

SoftmaxCrossEntropy

This is the output layer of a neural network that minimizes the cross entropy between the variables and the variables of a dataset given by a category index. SoftmaxCrossEntropy is equivalent to Softmax+CategoricalCrossEntropy during training, but computing them at once reduces computational error. All the properties are the same as those of SquaredError.

Reference

When SoftmaxCrossEntropy is used instead of Softmax+CategoricalCrossEntropy, the evaluation results are output as continuous values that have not undergone Softmax processing.
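
A minimal sketch in the NNabla Python API, assuming unnormalized scores y and a category-index target t (note that no Softmax layer precedes the loss):

```python
import nnabla as nn
import nnabla.functions as F

y = nn.Variable((64, 10))  # unnormalized scores for 10 categories
t = nn.Variable((64, 1))   # category indices in 0..9 (T.Dataset)
# Softmax and categorical cross entropy computed at once
loss = F.mean(F.softmax_cross_entropy(y, t))
```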

 

KLMultinomial

This is the output layer of a neural network that minimizes the Kullback-Leibler divergence between the input probability distribution (p), a multinomial distribution, and the dataset variable (q). All the properties are the same as those of SquaredError.

 

Basic layers

 

Affine

The affine layer is a fully-connected layer that has connections from all inputs to all output neurons specified with the OutShape property.

o = Wi + b

(where i is the input, o is the output, W is the weight, and b is the bias term.)

OutShape Specifies the number of output neurons of the Affine layer.
WithBias Specifies whether to include a bias term (b).
ParameterScope Specifies the name of the parameter used by this layer.

The parameter is shared between layers with the same ParameterScope.

W.File When using pre-trained weight W, specifies the file containing W with an absolute path.

If a file is specified, weight W is loaded from the file and initialization with the initializer is disabled.

W.Initializer Specifies the initialization method for weight W.

Uniform: Initialization is performed using uniform random numbers between -1.0 and 1.0.

UniformAffineGlorot: Initialization is performed by applying the multiplier recommended by Xavier Glorot to uniform random numbers.

Normal: Initialization is performed using Gaussian random numbers with 0.0 mean and 1.0 variance.

NormalAffineHeForward: Initialization is performed by applying the multiplier recommended by Kaiming He to Gaussian random numbers (Forward Case).

NormalAffineHeBackward: Initialization is performed by applying the multiplier recommended by Kaiming He to Gaussian random numbers (Backward Case).

NormalAffineGlorot: Initialization is performed by applying the multiplier recommended by Xavier Glorot to Gaussian random numbers (default).

Constant: All elements are initialized with a constant (1.0).

W.InitializerMultiplier Specifies the multiplier to apply to the values that the initializer generates.
W.LRateMultiplier Specifies the multiplier to apply to the Learning Rate specified on the CONFIG tab. This multiplier is used to update weight W.

For example, if the Learning Rate specified on the CONFIG tab is 0.01 and W.LRateMultiplier is set to 2, weight W will be updated using a Learning Rate of 0.02.

b.* This is used to set the bias b. The properties are the same as those of W.
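
A minimal sketch of an Affine layer in the NNabla Python API; the parameter scope name plays the role of ParameterScope, and the second argument corresponds to OutShape:

```python
import nnabla as nn
import nnabla.parametric_functions as PF

x = nn.Variable((64, 784))           # e.g. a flattened 28x28 input
with nn.parameter_scope("affine1"):  # ParameterScope
    h = PF.affine(x, 100)            # OutShape=100, WithBias=True (default)
# h.shape == (64, 100); W and b are created and registered automatically
```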

 

Convolution

The convolution layer is used to convolve the input.

$$O_{x,m} = \sum_{i,n} W_{i,n,m} I_{x+i,n} + b_m \quad \text{(one-dimensional convolution)}$$

$$O_{x,y,m} = \sum_{i,j,n} W_{i,j,n,m} I_{x+i,y+j,n} + b_m \quad \text{(two-dimensional convolution)}$$

(where O is the output; I is the input; i,j are the kernel indices; x,y are the spatial indices; n is the input map index; m is the output map index (OutMaps property); W is the kernel weight; and b is the bias term of each kernel)

KernelShape Specifies the convolution kernel size.

For example, to convolve an image with a 3 (height) by 5 (width) two-dimensional kernel, specify “3,5”.

For example, to convolve a one-dimensional time series signal with a 7-tap filter, specify “7”.

WithBias Specifies whether to include the bias term (b).
OutMaps Specifies the number of convolution kernels (which is equal to the number of output maps).

For example, to convolve an input with 16 types of filters, specify “16”.

BorderMode Specifies the type of convolution border.

valid: Convolution is performed within the border of the kernel shape for the input data size of each axis.

In this case, the size of each axis of the output data is equal to input size - kernel shape + 1.

full: Convolution is performed within the border even with a single sample for the input data of each axis.

Insufficient data (kernel shape - 1 samples at the top, bottom, left, and right) within the border is padded with zeros.

In this case, the size of each axis of the output data is equal to input size + kernel shape - 1.

same: Convolution is performed within a border that makes the output data size the same as the input data size.

Insufficient data ((kernel shape - 1)/2 samples at the top, bottom, left, and right) within the border is padded with zeros.

Padding Specifies the size of zero padding to add to the ends of the arrays before the convolution process. For example, to insert 3 pixels vertically and 2 pixels horizontally, specify “3,2”.

* ConvolutionPaddingSize: A value calculated from BorderMode is used for Padding.

Strides Specifies the stride of the convolution (the interval, in samples, at which the kernel is applied).

The output size of an axis with Stride set to a value other than 1 will be downsampled by the specified value.

For example, to convolve every two samples in the X-axis direction and every three samples in the Y-axis direction, specify “3,2”.

Dilation Specifies the factor by which the kernel is dilated (the spacing between kernel elements, in units of samples). For example, to dilate a 3 (height) by 5 (width) two-dimensional kernel to three times the height and two times the width and perform convolution over a 7 by 9 area, specify “3,2”.
Group Specifies the unit for grouping OutMaps.
ParameterScope Specifies the name of the parameter used by this layer.

The parameter is shared between layers with the same ParameterScope.

W.File When using pre-trained weight W, specify the file containing W with an absolute path.

If a file is specified, weight W is loaded from the file and initialization with the initializer is disabled.

W.Initializer Specifies the initialization method for weight W.

Uniform: Initialization is performed using uniform random numbers between -1.0 and 1.0.

UniformConvolutionGlorot: Initialization is performed by applying the multiplier recommended by Xavier Glorot to uniform random numbers.

Normal: Initialization is performed using Gaussian random numbers with 0.0 mean and 1.0 variance.

NormalConvolutionHeForward: Initialization is performed by applying the multiplier recommended by Kaiming He to Gaussian random numbers (Forward Case).

NormalConvolutionHeBackward: Initialization is performed by applying the multiplier recommended by Kaiming He to Gaussian random numbers (Backward Case).

NormalConvolutionGlorot: Initialization is performed by applying the multiplier recommended by Xavier Glorot to Gaussian random numbers (default).

Constant: All elements are initialized with a constant (1.0).

W.InitializerMultiplier Specifies the multiplier to apply to the values that the initializer generates.
W.LRateMultiplier Specify the multiplier to apply to the Learning Rate specified on the CONFIG tab. This multiplier is used to update weight W.

For example, if the Learning Rate specified on the CONFIG tab is 0.01 and W.LRateMultiplier is set to 2, weight W will be updated using a Learning Rate of 0.02.

b.* This is used to set the bias b. The properties are the same as those of W.

Notes

Currently, NNabla supports only two-dimensional convolution. If you need to perform one-dimensional convolution, use Reshape to convert the tensor into two dimensions.
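
A minimal sketch of a Convolution layer in the NNabla Python API; kernel, pad, and stride correspond to KernelShape, Padding, and Strides. With the padding shown, the output keeps the input's spatial size (the “same” case):

```python
import nnabla as nn
import nnabla.parametric_functions as PF

x = nn.Variable((64, 3, 24, 32))   # RGB image input
with nn.parameter_scope("conv1"):  # ParameterScope
    # KernelShape “3,5”, OutMaps 16, Padding “1,2”, Strides “1,1”
    h = PF.convolution(x, outmaps=16, kernel=(3, 5), pad=(1, 2), stride=(1, 1))
# Output size per axis = input - kernel + 1 + 2*padding:
# height 24 - 3 + 1 + 2 = 24, width 32 - 5 + 1 + 4 = 32
```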

 

DepthwiseConvolution

The depthwise convolution layer convolves each input map independently. The operation of DepthwiseConvolution is equivalent to a Convolution whose numbers of input and output maps are the same and whose Group is set to that number of maps.

$$O_{x,y,n} = \sum_{i,j} W_{i,j,n} I_{x+i,y+j,n} + b_n \quad \text{(two-dimensional convolution, when Multiplier = 1)}$$

(where O is the output; I is the input; i,j are the kernel indices; x,y are the spatial indices; n is the map index; W is the kernel weight; and b is the bias term of each kernel)

KernelShape Specifies the convolution kernel size.

For example, to convolve an image with a 3 (height) by 5 (width) two-dimensional kernel, specify “3,5”.

For example, to convolve a one-dimensional time series signal with a 7-tap filter, specify “7”.

WithBias Specifies whether to include the bias term (b).
BorderMode Specifies the type of convolution border.

valid: Convolution is performed within the border of the kernel shape for the input data size of each axis.

In this case, the size of each axis of the output data is equal to input size - kernel shape + 1.

full: Convolution is performed within the border even with a single sample for the input data of each axis.

Insufficient data (kernel shape - 1 samples at the top, bottom, left, and right) within the border is padded with zeros.

In this case, the size of each axis of the output data is equal to input size + kernel shape - 1.

same: Convolution is performed within a border that makes the output data size the same as the input data size.

Insufficient data ((kernel shape - 1)/2 samples at the top, bottom, left, and right) within the border is padded with zeros.

Padding Specifies the size of zero padding to add to the ends of the arrays before the convolution process. For example, to insert 3 pixels vertically and 2 pixels horizontally, specify “3,2”.

* ConvolutionPaddingSize: A value calculated from BorderMode is used for Padding.

Strides Specifies the stride of the convolution (the interval, in samples, at which the kernel is applied).

The output size of an axis with Stride set to a value other than 1 will be downsampled by the specified value.

For example, to convolve every two samples in the X-axis direction and every three samples in the Y-axis direction, specify “3,2”.

Dilation Specifies the factor by which the kernel is dilated (the spacing between kernel elements, in units of samples). For example, to dilate a 3 (height) by 5 (width) two-dimensional kernel to three times the height and two times the width and perform convolution over a 7 by 9 area, specify “3,2”.
Multiplier Specifies the ratio of the number of output maps to the number of input maps.
ParameterScope Specifies the name of the parameter used by this layer.

The parameter is shared between layers with the same ParameterScope.

W.File When using pre-trained weight W, specify the file containing W with an absolute path.

If a file is specified, weight W is loaded from the file and initialization with the initializer is disabled.

W.Initializer Specifies the initialization method for weight W.

Uniform: Initialization is performed using uniform random numbers between -1.0 and 1.0.

UniformConvolutionGlorot: Initialization is performed by applying the multiplier recommended by Xavier Glorot to uniform random numbers.

Normal: Initialization is performed using Gaussian random numbers with 0.0 mean and 1.0 variance.

NormalConvolutionHeForward: Initialization is performed by applying the multiplier recommended by Kaiming He to Gaussian random numbers (Forward Case).

NormalConvolutionHeBackward: Initialization is performed by applying the multiplier recommended by Kaiming He to Gaussian random numbers (Backward Case).

NormalConvolutionGlorot: Initialization is performed by applying the multiplier recommended by Xavier Glorot to Gaussian random numbers (default).

Constant: All elements are initialized with a constant (1.0).

W.InitializerMultiplier Specifies the multiplier to apply to the values that the initializer generates.
W.LRateMultiplier Specify the multiplier to apply to the Learning Rate specified on the CONFIG tab. This multiplier is used to update weight W.

For example, if the Learning Rate specified on the CONFIG tab is 0.01 and W.LRateMultiplier is set to 2, weight W will be updated using a Learning Rate of 0.02.

b.* This is used to set the bias b. The properties are the same as those of W.

Notes

Currently, NNabla supports only two-dimensional convolution. If you need to perform one-dimensional convolution, use Reshape to convert the tensor into two dimensions.

 

Deconvolution

The deconvolution layer is used to deconvolve the input. The properties of deconvolution are the same as those of convolution.

Notes

Currently, NNabla supports only two-dimensional deconvolution. If you need to perform one-dimensional deconvolution, use Reshape to convert the tensor into two dimensions.

 

Embed

Inputs are assumed to be discrete symbols represented by integers ranging from 0 to N-1 (where N is the number of classes), and arrays of a specified size are assigned to each symbol. For example, this is used when the inputs are word indexes, and each word is converted into vectors in the beginning of a network. The output size is equal to input size × array size.

NumClass Specifies the number of classes N.
Shape Specifies the size of the array assigned to a single symbol.
ParameterScope Specifies the name of the parameter used by this layer.

The parameter is shared between layers with the same ParameterScope.

W.File When using pre-trained weight W, specify the file containing W with an absolute path.

If a file is specified, weight W is loaded from the file and initialization with the initializer is disabled.

W.Initializer Specifies the initialization method for weight W.

Uniform: Initialization is performed using uniform random numbers between -1.0 and 1.0.

Normal: Initialization is performed using Gaussian random numbers with 0.0 mean and 1.0 variance.

Constant: All elements are initialized with a constant (1.0).

W.InitializerMultiplier Specifies the multiplier to apply to the values that the initializer generates.
W.LRateMultiplier Specifies the multiplier to apply to the Learning Rate specified on the CONFIG tab. This multiplier is used to update weight W.

For example, if the Learning Rate specified on the CONFIG tab is 0.01 and W.LRateMultiplier is set to 2, weight W will be updated using a Learning Rate of 0.02.
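
A minimal sketch in the NNabla Python API, assuming word indices as input; n_inputs corresponds to NumClass and n_features to Shape:

```python
import nnabla as nn
import nnabla.parametric_functions as PF

x = nn.Variable((64, 10))           # 10 symbol indices in 0..999
with nn.parameter_scope("embed1"):  # ParameterScope
    h = PF.embed(x, n_inputs=1000, n_features=128)  # NumClass=1000, Shape=128
# h.shape == (64, 10, 128): output size = input size x array size
```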

 

Pooling layer

 

MaxPooling

MaxPooling outputs the maximum value of local inputs.

KernelShape Specifies the size of the local region for sampling the maximum value.

For example, to output the maximum value over a 3 (height) by 5 (width) region, specify “3,5”.

Strides Specifies the stride of the pooling (the interval, in samples, at which maximum values are sampled).

The output size of each axis will be downsampled by the specified value.

For example, to sample maximum values every two samples in the X-axis direction and every three samples in the Y-axis direction, specify “3,2”.

* KernelShape: The same value as KernelShape is used for Strides.

IgnoreBorder Specifies the border processing method.

True: Processing is performed over regions that have enough samples to fill KernelShape. Samples near the border where there are not enough samples to fill KernelShape are ignored.

False: Samples near the border are not discarded. Processing is performed even over regions that only have one sample.

Padding Specifies the size of zero padding to add to the ends of the arrays before the pooling process.

For example, to add two pixels of zero padding to the top and bottom and one pixel of zero padding to the left and right of an image, specify “2,1”.
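
A minimal sketch in the NNabla Python API; when no stride is given it defaults to the kernel shape, matching the KernelShape note above:

```python
import nnabla as nn
import nnabla.functions as F

x = nn.Variable((64, 16, 24, 32))
# KernelShape “2,2”; Strides defaults to the kernel shape
h = F.max_pooling(x, kernel=(2, 2))
# h.shape == (64, 16, 12, 16)
```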

 

SumPooling

SumPooling outputs the sum of local inputs. The properties are the same as those of MaxPooling.

 

AveragePooling

AveragePooling outputs the average of local inputs. The properties are the same as those of MaxPooling.

 

Unpooling

Unpooling copies each input element multiple times in order to generate output data larger in size than the input data.

KernelShape Specify the size of data to copy.

For example, if you want to copy a data sample twice in the vertical direction and three times in the horizontal direction (output data whose size is twice as large in the vertical direction and three times as large in the horizontal direction), specify “2,3”.

 

Activation layer

 

Tanh

Tanh outputs the result of taking the hyperbolic tangent of the input.

o=tanh(i)

(where o is the output and i is the input)

 

Sigmoid

Sigmoid outputs the result of taking the sigmoid of the input. This is used when you want to obtain probabilities or output values ranging from 0.0 to 1.0.

o=sigmoid(i)

(where o is the output and i is the input)

 

ReLU

ReLU outputs the result of applying the rectified linear unit (ReLU) to the input.

o=max(0, i)

(where o is the output and i is the input)

 

CReLU

Concatenated ReLU (CReLU) applies ReLU to the input and to the negated input, concatenates the two results along the axis indicated by the Axis property, and outputs the final result.

 

LeakyReLU

Unlike ReLU, which always outputs 0 for inputs less than 0, LeakyReLU multiplies inputs less than 0 by a constant value a.

o=max(0, i) + a min(0, i)

(where o is the output and i is the input)

alpha Specify the gradient a for negative inputs.

 

PReLU

Unlike ReLU, which always outputs 0 for inputs less than 0, parametric ReLU (PReLU) multiplies inputs less than 0 by a coefficient a. The coefficient a, the gradient for negative inputs, is obtained through training.

o=max(0, i) + a min(0, i)

(where o is the output and i is the input)

BaseAxis Of the available inputs, specifies the index (that starts at 0) of the axis that individual a’s are to be trained on. For example, for inputs 4,3,5, to train individual a’s for the first dimension (four elements), set BaseAxis to 0.
ParameterScope Specifies the name of the parameter used by this layer.

The parameter is shared between layers with the same ParameterScope.

slope.File When using pre-trained gradient a, specifies the file containing a with an absolute path.

If a file is specified, slope is loaded from the file and initialization with the initializer is disabled.

slope.Initializer Specifies the initialization method for gradient a.

Uniform: Initialization is performed using uniform random numbers between -1.0 and 1.0.

Normal: Initialization is performed using Gaussian random numbers with 0.0 mean and 1.0 variance.

Constant: All elements are initialized with a constant (1.0).

slope.InitializerMultiplier Specifies the multiplier to apply to the values that the initializer generates.
slope.LRateMultiplier Specifies the multiplier to apply to the Learning Rate specified on the CONFIG tab. This multiplier is used to update gradient a.

For example, if the Learning Rate specified on the CONFIG tab is 0.01 and slope.LRateMultiplier is set to 2, gradient a will be updated using a Learning Rate of 0.02.

 

ELU

ELU outputs the result of applying the exponential linear unit (ELU) to the input.

o=max(0, i) + alpha(exp(min(0, i)) - 1)

(where o is the output and i is the input)

Alpha Specify coefficient alpha for negative outputs.

 

CELU

Concatenated ELU (CELU) applies ELU to the input and to the negated input, concatenates the two results along the axis indicated by the Axis property, and outputs the final result.

 

SELU

SELU outputs the result of applying the scaled exponential linear unit (SELU) to the input.

o=lambda {max(0, i) + alpha(exp(min(0, i)) - 1)}

(where o is the output and i is the input)

Scale Specify the whole scale lambda.
Alpha Specify coefficient alpha for negative outputs.

 

Swish

Swish outputs the result of taking the swish of the input.

o=i/(1+exp(-i))

(where o is the output and i is the input)

 

Abs

Abs outputs the absolute values of inputs.

o=abs(i)

(where o is the output and i is the input)

 

Softmax

Softmax outputs the Softmax of inputs. This is used when you want to obtain probabilities in a categorization problem or output values ranging from 0.0 to 1.0 that sum up to 1.0.

$$o_x = \frac{\exp(i_x)}{\sum_j \exp(i_j)}$$

(where o is the output, i is the input, and x is the data index)
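
The formula can be checked with a small NumPy sketch (subtracting the maximum before exponentiation is a standard, mathematically equivalent trick for numerical stability):

```python
import numpy as np

def softmax(i):
    e = np.exp(i - i.max())  # stabilized exponentials
    return e / e.sum()

o = softmax(np.array([1.0, 2.0, 3.0]))
print(o, o.sum())            # values in (0, 1) that sum to 1.0
```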

 

Loop Control layer

The Loop Control layer is useful for configuring networks with a loop structure, such as residual networks and recurrent neural networks.

 

RepeatStart

RepeatStart is a layer that indicates the start position of a loop. The layers placed between RepeatStart and RepeatEnd are created repeatedly for the number of times specified by the Times property of RepeatStart.

Notes

The array sizes of the layers between RepeatStart and RepeatEnd must be the same. Structures whose array sizes differ between repetitions cannot be written.

 

RepeatEnd

RepeatEnd is a layer that indicates the end position of a loop.

 

RecurrentInput

RecurrentInput is a layer that indicates the time loop start position of a recurrent neural network. The axis specified by the Axis property is handled as the time axis. The length of a time loop is defined by the number of elements specified by the Axis property of the input data.

 

RecurrentOutput

RecurrentOutput is a layer that indicates the time loop end position of a recurrent neural network.

 

Delay

Delay is a layer that indicates the time delay signal in a recurrent neural network.

Size Specifies the size of the time delay signal.
Initial.Dataset Specifies the name of the variable to be used as the initial value of the time delay signal.
Initial.Generator Specifies the generator to use in place of the dataset. If the Generator property is not set to None, the data that the generator generates is used in place of the variable specified by Initial.Dataset during optimization.

None: Data generation is not performed.

Uniform: Uniform random numbers between -1.0 and 1.0 are generated.

Normal: Gaussian random numbers with 0.0 mean and 1.0 variance are generated.

Constant: Data whose elements are all constant (1.0) is generated.

Initial.GeneratorMultiplier Specifies the multiplier to apply to the values that the generator generates.

 

Quantization layer

The quantization layer is used to quantize weights and data.

 

FixedPointQuantize

FixedPointQuantize performs linear quantization.

Sign Specify whether to include signs.

When set to false, all values after quantization will be positive.

N Specify the number of quantization bits.
Delta Specify the quantization step size.
STEFineGrained Specify the gradient calculation method for backward processing.

True: The gradient is always 1.

False: The gradient is 1 between the maximum and minimum values of the range that can be expressed through quantization; otherwise it is 0.
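
A NumPy sketch of the forward computation under the usual fixed-point convention (an assumed interpretation of Sign, N, and Delta; the STE backward behavior is not modeled here):

```python
import numpy as np

def fixed_point_quantize(x, sign=True, n=8, delta=2.0 ** -4):
    # Representable range of an n-bit fixed-point code with step size delta
    if sign:
        vmax = (2 ** (n - 1) - 1) * delta
        vmin = -vmax
    else:
        vmax = (2 ** n - 1) * delta
        vmin = 0.0
    # Round to the nearest step, then clamp to the representable range
    return np.clip(np.round(x / delta) * delta, vmin, vmax)

print(fixed_point_quantize(np.array([0.03, -1.0, 100.0])))  # [ 0. -1. 7.9375]
```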

 

Pow2Quantize

Pow2Quantize performs power-of-two quantization.

Sign Specify whether to include signs.

When set to false, all values after quantization will be positive.

WithZero Specify whether to include zeros.

When set to false, values after quantization will not include zeros.

N Specify the number of quantization bits.
M Specify the maximum value after quantization as 2^M.
STEFineGrained Specify the gradient calculation method for backward processing.

True: The gradient is always 1.

False: The gradient is 1 between the maximum and minimum values of the range that can be expressed through quantization; otherwise it is 0.

 

BinaryConnectAffine

BinaryConnectAffine is an Affine layer that uses W, which has been converted into the binary values of -1 and +1.

o = sign(W) i + b

(where i is the input, o is the output, W is the weight, and b is the bias term.)

Wb.* Specifies the weight Wb settings to use after the conversion into binary values. The properties are the same as those of W for the Affine layer.

For details on other properties, see the Affine layer.

 

BinaryConnectConvolution

BinaryConnectConvolution is a convolution layer that uses W, which has been converted into the binary values of -1 and +1.

$$O_{x,y,m} = \sum_{i,j,n} \mathrm{sign}(W_{i,j,n,m}) I_{x+i,y+j,n} + b_m \quad \text{(two-dimensional convolution)}$$

(where O is the output; I is the input; i,j are the kernel indices; x,y are the spatial indices; n is the input map index; m is the output map index (OutMaps property); W is the kernel weight; and b is the bias term of each kernel)

Wb.* Specifies the weight Wb settings to use after the conversion into binary values. The properties are the same as those of W for the Convolution layer.

For details on other properties, see the Convolution layer.

 

BinaryWeightAffine

BinaryWeightAffine is an affine layer that uses W, which has been converted into binary values of -1 and +1 and then scaled, in order to make the output closer to that of the normal Affine layer.

o = a sign(W) i + b

(where i is the input, o is the output, W is the weight, a is the scale, and b is the bias term.)

 

BinaryWeightConvolution

BinaryWeightConvolution is a convolution layer that uses W, which has been converted into binary values of -1 and +1 and then scaled, in order to make the output closer to that of the normal Convolution layer.

$$O_{x,y,m} = a \sum_{i,j,n} \mathrm{sign}(W_{i,j,n,m}) I_{x+i,y+j,n} + b_m \quad \text{(two-dimensional convolution)}$$

(where O is the output; I is the input; i,j are the kernel indices; x,y are the spatial indices; n is the input map index; m is the output map index (OutMaps property); W is the kernel weight; a is the scale; and b is the bias term of each kernel)

 

BinaryTanh

BinaryTanh outputs -1 for inputs less than or equal to 0 and +1 for inputs greater than 0.

 

BinarySigmoid

BinarySigmoid outputs 0 for inputs less than or equal to 0 and +1 for inputs greater than 0.

 

Unit layer

The unit layer provides a function for inserting other networks in the middle of a network. By using the unit layer, you can insert a collection of layers defined as a network in the middle of another network.

A network that another network is inserted into using the unit layer is called a caller network, and the network to be inserted into another network using the unit layer is called a unit network.

 

Unit

Inserts a unit network into the current caller network.

Network Specify the name of the unit network to be inserted.
ParameterScope Specify the name of the parameter used by this unit.

The parameter is shared between units with the same ParameterScope.

(Other properties) Specify the properties of the unit network to be inserted.

 

Argument

Argument exposes parameters of the unit network so that they can be edited as Unit properties from the caller network. Specify the name of an Argument layer in the properties of other layers in the unit network to make those layers use the Argument layer's value. The values of the Argument layers can then be set through the properties of the Unit layer placed in the caller network.

Value Specify the default parameter value.
Type Specify the parameter type.

Boolean: True or False

Int: Integer

IntArray: Array of integers

PInt: Integer greater than or equal to 1

PIntArray: Array of integers greater than or equal to 1

PIntArrays: Array of arrays of integers greater than or equal to 1

UInt: Unsigned integer

UIntArray: Array of unsigned integers

Float: Floating-point number

FloatArray: Array of floating-point numbers

FloatArrays: Array of arrays of floating-point numbers

Text: Character string

File: File name

Search Specifies whether to include this parameter in the automatic structure search performed on the caller network.

 

Arithmetic operation layer

 

Sum

Sum determines the sum of the values of the specified dimension.

Axis Specifies the index (starting at 0) of the axis to sum the values of.
KeepDims Specifies whether to hold the axis whose values have been summed.

 

Mean

Mean determines the average of the values of the specified dimension.

Axis Specifies the index (starting at 0) of the axis whose values will be averaged.
KeepDims Specifies whether to hold the axis whose values have been averaged.

 

Prod

Prod determines the product of the values of the specified dimension.

Axis Specifies the index (starting at 0) of the axis to multiply the values of.
KeepDims Specifies whether to hold the axis whose values have been multiplied.

 

Max

Max determines the maximum of the values of the specified dimension.

Axis Specifies the index (starting at 0) of the axis on which to determine the maximum value.
KeepDims Specifies whether to hold the axis whose maximum value has been determined.

 

Min

Min determines the minimum of the values of the specified dimension.

Axis Specifies the index (starting at 0) of the axis on which to determine the minimum value.
KeepDims Specifies whether to hold the axis whose minimum value has been determined.

 

Log

Log calculates the natural logarithm with base e.

 

Exp

Exp calculates the exponential function with base e.

 

Sign

Sign outputs -1 for negative inputs, +1 for positive inputs, and alpha for 0.

Alpha Specifies the output corresponding to input 0.

Reference

Compared to BinaryTanh, the forward computation is similar (except for the behavior when the input is 0), but the backward computation is completely different. Unlike BinaryTanh, which sets the derivative to 0 when the absolute value of the input is 1 or greater, Sign passes the back-propagated derivative through unchanged as its own derivative.

 

Scalar arithmetic operation layer (Arithmetic (Scalar))

Various arithmetic operations can be performed on each element using a real number specified by the Value property.

Layer name Expression
AddScalar o=i+value
MulScalar o=i*value
RSubScalar o=value-i
RDivScalar o=value/i
PowScalar o=i^value
RPowScalar o=value^i
MaximumScalar o=max (i,value)
MinimumScalar o=min(i,value)

(where i is the input, o is the output, and value is the real number)

 

Two-input arithmetic operation layer (Arithmetic (2 Inputs))

Various arithmetic operations can be performed on each element using two inputs.

Layer Expression
Add2 o=i1+i2
Sub2 o=i1-i2
Mul2 o=i1*i2
Div2 o=i1/i2

Connect the input for i2 in the right hand side to connector R.

Pow2 o=i1^i2

Connect the input for i2 in the right hand side to connector R.

Maximum2 o=max(i1,i2)
Minimum2 o=min(i1,i2)

(where i1 and i2 are inputs and o is the output)

 

Logical operation layer

Various logical operations can be performed on each element using two inputs or one input and a value specified by the Value property. The logical operation output is 0 or 1.

Layer name Process
LogicalAnd o= i1 and i2
LogicalOr o= i1 or i2
LogicalXor o= i1 xor i2
Equal o= i1 == i2
NotEqual o= i1 != i2
GreaterEqual o= i1 >= i2
Greater o= i1 > i2
LessEqual o= i1 <= i2
Less o= i1 < i2
LogicalAndScalar o= i and value
LogicalOrScalar o= i or value
LogicalXorScalar o= i xor value
EqualScalar o= i == value
NotEqualScalar o= i != value
GreaterEqualScalar o= i >= value
GreaterScalar o= i > value
LessEqualScalar o= i <= value
LessScalar o= i < value
LogicalNot o= !i

Notes

A logical operation layer does not support back propagation.

 

Evaluation layer

 

BinaryError

The input data and the variable T in the dataset indicating correct values are converted into binary values (0 or 1) depending on whether they are greater than or equal to 0.5, and then compared for each data sample. If the binarized input data matches the correct binary value, 0 is output. Otherwise, 1 is output.

T.Dataset Specifies the name of the variable expected to be the output of this layer.
T.Generator Specifies the generator to use in place of the dataset. If the Generator property is not set to None, the data that the generator generates is used in place of the variable specified by T.Dataset during optimization.

None: Data generation is not performed.

Uniform: Uniform random numbers between -1.0 and 1.0 are generated.

Normal: Gaussian random numbers with 0.0 mean and 1.0 variance are generated.

Constant: Data whose elements are all constant (1.0) is generated.

T.GeneratorMultiplier Specifies the multiplier to apply to the values that the generator generates.

 

TopNError

Based on the input data indicating the probability or score of each category and variable T in the dataset indicating the category index, each data sample is evaluated as to whether the probability or score of the correct category is within the top N of all categories. If the probability or score of the correct category is within the top N, 0 is output. Otherwise, 1 is output.

Axis Specifies the index of the axis indicating the category.
N Specifies the lowest ranking N that will be considered correct.

For example, to consider the correct category as correct only when it has the maximum probability or score, specify 1. To consider the correct category as correct when it is within the top five probabilities or scores, specify 5.

T.Dataset Specifies the name of the variable expected to be the output of this layer.
T.Generator Specifies the generator to use in place of the dataset. If the Generator property is not set to None, the data that the generator generates is used in place of the variable specified by T.Dataset during optimization.

None: Data generation is not performed.

Uniform: Uniform random numbers between -1.0 and 1.0 are generated.

Normal: Gaussian random numbers with 0.0 mean and 1.0 variance are generated.

Constant: Data whose elements are all constant (1.0) is generated.

T.GeneratorMultiplier Specifies the multiplier to apply to the values that the generator generates.

 

Other layers

 

BatchNormalization

The input is normalized to 0 mean and 1 variance. Inserting this layer after Convolution or Affine has the effect of improving accuracy and accelerating convergence.

$$o_x = \frac{i_x - \mu_x}{\sigma_x} \gamma_x + \beta_x$$

(where o is the output, i is the input, x is the data index, μ is the mean, σ is the standard deviation, γ is the scale gamma, and β is the bias beta)

Axes Of the available inputs, specifies the index (starting at 0) of the axis to be normalized individually. For example, for inputs 4,3,5, to individually normalize the first dimension (four elements), set Axes to 0. To individually normalize the second and third dimensions (elements 3,5), set Axes to 1,2.
DecayRate Specifies the decay rate (0.0 to 1.0) to apply when updating the mean and standard deviation of the input data during training. The closer the value is to 1.0, the more the mean and standard deviation determined from past data are retained.
Epsilon Specifies the value to add to the denominator (standard deviation of the input data) to prevent division by zero during normalization.
BatchStat Specifies whether to use the mean and variance calculated for each mini-batch in batch normalization.

True: The mean and variance calculated for each mini-batch are used.

False: The running mean and variance accumulated during training are used.

ParameterScope Specifies the name of the parameter used by this layer.

The parameter is shared between layers with the same ParameterScope.

beta.File When using pre-trained beta, specifies the file containing beta with an absolute path.

If a file is specified, beta is loaded from the file and initialization with the initializer is disabled.

beta.Initializer Specifies the initialization method for beta.

Uniform: Initialization is performed using uniform random numbers between -1.0 and 1.0.

Normal: Initialization is performed using Gaussian random numbers with 0.0 mean and 1.0 variance.

Constant: All elements are initialized with a constant (1.0).

beta.InitializerMultiplier Specifies the multiplier to apply to the values that the initializer generates.
beta.LRateMultiplier Specifies the multiplier to apply to the Learning Rate specified on the CONFIG tab. This multiplier is used to update beta.

For example, if the Learning Rate specified on the CONFIG tab is 0.01 and beta.LRateMultiplier is set to 2, beta will be updated using a Learning Rate of 0.02.

gamma.* Specifies the scale applied after normalization (the standard deviation after normalization).
mean.* Specifies the mean of the input data.
var.* Specifies the variance of the input data.
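
A minimal sketch in the NNabla Python API; beta, gamma, mean, and var are created in the given parameter scope, and batch_stat corresponds to BatchStat:

```python
import nnabla as nn
import nnabla.parametric_functions as PF

x = nn.Variable((64, 16, 24, 32))
with nn.parameter_scope("bn1"):  # ParameterScope
    # DecayRate=0.9, Epsilon=1e-5, BatchStat=True (use mini-batch statistics)
    h = PF.batch_normalization(x, decay_rate=0.9, eps=1e-5, batch_stat=True)
```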

 

Dropout

Dropout sets input elements to 0 with a given probability.

P Specifies the probability to set an element to 0, within the range from 0.0 to 1.0.

 

Concatenate

Concatenate joins two or more arrays on an existing axis.

Axis Specifies the axis on which to concatenate arrays.

Axis indexes take on values 0, 1, 2, and so on from the left.

For example, to concatenate two inputs “3,28,28” and “5,28,28” on the first (the leftmost) axis, specify “0”. In this case, the output size will be “8,28,28”.
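
A minimal sketch of the example above in the NNabla Python API; in the Python API the batch dimension is axis 0, so the Console's Axis 0 is assumed to correspond to axis=1 here:

```python
import nnabla as nn
import nnabla.functions as F

a = nn.Variable((64, 3, 28, 28))
b = nn.Variable((64, 5, 28, 28))
h = F.concatenate(a, b, axis=1)  # join on the map axis
# h.shape == (64, 8, 28, 28)
```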

 

Reshape

Reshape transforms the shape of an array into the specified shape.

OutShape Specifies the shape of the array after the transform.

For example, to output an array “2,5,5” as “10,5”, specify “10,5”.

The number of elements in the input and output arrays must be the same.

 

Broadcast

Broadcast transforms the shape of an array into the specified shape by copying the elements of axes whose size is 1.

OutShape Specify the shape of the array after the transform.

For example, to copy the elements of the second axis of an array “3,1,2” 10 times, specify “3,10,2”.

 

Flip

Flip reverses the order of elements of the specified dimension of an array.

Axes Specifies the indices of the dimensions whose element order you want to reverse.

Axis indexes take on values 0, 1, 2, and so on from the left.

For example, to flip a 32 (W) by 24 (H) RGB image “3,24,32” vertically and horizontally, specify “1,2”.

 

Shift

Shift shifts the array elements by the specified amount.

Shift Specifies the amount to shift elements.

For example, to shift image data to the right by 2 pixels and up 3 pixels, specify “-3,2”.

BorderMode Specifies how to process the ends of arrays whose values will be undetermined as a result of shifting.

nearest: The data at the borders (beginning and end) of the original array is copied and used.

reflect: Original data at the borders of the original array is reflected (reversed) and used.

 

Transpose

Transpose swaps data dimensions.

Axes Specifies the axis indexes on the output data for each dimension of the input data.

Axis indexes take on values 0, 1, 2, and so on from the left.

For example, to swap the second and third dimensions of input data “3,20,10” and output the result, specify “0,2,1” (the output data size in this case is 3,10,20).

 

Slice

Slice extracts part of the array.

Start Specify the start point of extraction.
Stop Specify the end point of extraction.
Step Specify the interval of extraction.

For example, to extract the center 24 x 32 pixels of image data of size “3,48,64” at two-pixel intervals, specify “0,12,16” for Start, “3,36,48” for Stop, and “1,2,2” for Step.
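
Start, Stop, and Step follow the usual slicing semantics, so the example above can be checked with NumPy:

```python
import numpy as np

x = np.zeros((3, 48, 64))     # image data of size “3,48,64”
# Start “0,12,16”, Stop “3,36,48”, Step “1,2,2”
y = x[0:3, 12:36:2, 16:48:2]
print(y.shape)                # (3, 12, 16)
```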

 

Stack

Stack joins two or more arrays on a new axis. The sizes of all the arrays to be stacked must be the same. Unlike Concatenate, which joins arrays on an existing axis, Stack joins arrays on a new axis.

Axis Specifies the new axis on which to stack arrays.

Axis indexes take on values 0, 1, 2, and so on from the left.

For example, to stack four “3,28,28” inputs on the second axis, specify “1”. In this case, the output size will be “3,4,28,28”.

 

MatrixDiag

MatrixDiag performs a matrix diagonalization of the last dimension of an array.

 

MatrixDiagPart

MatrixDiagPart extracts the diagonal component of the last two dimensions of an array.

 

VATNoise

This layer is provided to achieve a method called Virtual Adversarial Training.

This layer normalizes and outputs input noise signals during forward calculation. During backward calculation, the error propagated from the output is stored in Buf.

 

Unlink

The input is output directly during forward calculation, and zero is output during backward calculation. This is used when you do not want errors to propagate to layers before the unlink layer.

 

Identity

Identity outputs the input as-is. Normally there is no need to use this layer, but it can be inserted to assign a name for identification at certain locations in the network.

 

Other layers (preprocess)

 

OneHot

OneHot creates a one-hot array based on input indices.

Shape Specify the size of the array to be created.

The number of dimensions of Shape must be the same as the number of elements in the last dimension of the input data.

 

MeanSubtraction

MeanSubtraction normalizes input to mean 0. Using this as a preprocessing function has the effect of improving accuracy in image classification and similar tasks.

ox=ix-mean

(where o is the output, i is the input, and x is the data index)

BaseAxis Specifies the index of the first axis to take the mean of.
UpdateRunningMean Specifies whether to calculate a running mean.
ParameterScope Specifies the name of the parameter used by this layer.

The parameter is shared between layers with the same ParameterScope.

mean.* Specifies the mean of the input data.
t.* Specifies the number of mini-batches used to calculate the mean of the input data.

 

RandomFlip

RandomFlip reverses the order of elements of the specified dimension of an array at 50% probability.

Axes Specifies the indices of the axes whose element order you want to reverse.

Axis indexes take on values 0, 1, 2, and so on from the left.

For example, to flip a 32 (W) by 24 (H) RGB image “3,24,32” vertically and horizontally at random, specify “1,2”.

SkipAtInspection Specifies whether to skip processing at inspection.

To execute RandomFlip only during training, set SkipAtInspection to True (default).

 

RandomShift

RandomShift randomly shifts the array elements within the specified range.

Shift Specifies the amount to shift elements by.

For example, to shift image data horizontally by ±2 pixels and vertically by ±3 pixels, specify “3,2”.

BorderMode Specifies how to process the ends of arrays whose values will be undetermined as a result of shifting.

nearest: The data at the borders (beginning and end) of the original array is copied and used.

reflect: Original data at the borders of the original array is reflected (reversed) and used.

SkipAtInspection Specifies whether to skip processing at inspection.

To execute RandomShift only during training, set SkipAtInspection to True (default).

 

RandomCrop

RandomCrop randomly extracts a portion of an array.

Shape Specifies the data size to extract.

For example, to randomly extract a 3,48,48 portion of a 3,64,64 image, specify “3,48,48”.

 

ImageAugmentation

ImageAugmentation randomly alters the input image.

Shape Specifies the output image data size.
MinScale Specifies the minimum scale ratio when randomly scaling the image.

For example, to scale down to 0.8 times the size of the original image, specify “0.8”.

To not apply random scaling, set both MinScale and MaxScale to “1.0”.

MaxScale Specifies the maximum scale ratio when randomly scaling the image.

For example, to scale up to 2 times the size of the original image, specify “2.0”.

Angle Specifies the rotation angle range in radians when randomly rotating the image.

The image is randomly rotated in the -Angle to +Angle range.

For example, to rotate in a ±15 degree range, specify “0.26” (15 degrees/360 degrees × 2π).

To not apply random rotation, specify “0.0”.

AspectRatio Specifies the aspect ratio variation range when randomly varying the aspect ratio of the image.

For example, if the original image is 1:1, to vary the aspect ratio between 1:1.3 and 1.3:1, specify 1.3.

Distortion Specifies the strength range when randomly distorting the image.
FlipLR Specifies whether to randomly flip the image horizontally.
FlipUD Specifies whether to randomly flip the image vertically.
Brightness Specifies the range of values to randomly add to the brightness.

A random value in the -Brightness to +Brightness range is added to the brightness.

For example, to vary the brightness in the -0.05 to +0.05 range, specify “0.05”.

To not apply random addition to brightness, specify “0.0”.

BrightnessEach Specifies whether to apply the random addition to brightness (as specified by Brightness) to each color channel.

True: Brightness is added based on a different random number for each channel.

False: Brightness is added based on a random number common to all channels.

Contrast Specifies the range in which to randomly vary the image contrast.

The contrast is varied in the 1/Contrast times to Contrast times range.

The output brightness is equal to (input - 0.5) * contrast + 0.5.

For example, to vary the contrast in the 0.91 times to 1.1 times range, specify “1.1”.

To not apply random contrast variation, specify “1.0”.

ContrastEach Specifies whether to apply the random contrast variation (as specified by Contrast) to each color channel.

True: Contrast is varied based on a different random number for each channel.

False: Contrast is varied based on a random number common to all channels.

Reference

An easy and effective way to verify image augmentation settings is to train a network consisting only of image input → ImageAugmentation → squared error with Max Epoch set to 0, and to monitor the processed results with the Run Evaluation button.

 

Configuration layer

This is a layer for configuring the network.

 

StructureSearch

StructureSearch is used to configure the automatic structure search.

Search Specify whether to include this network in the automatic structure search.

When set to false, the network will not change during automatic structure search.

When no StructureSearch layer is present, the default value is true.