Layer reference

This chapter describes the layers and their properties that can be used with the Neural Network Libraries.

Color of each layer

  • IO

    Input/output data.

  • Basic

Perform computations on artificial neurons.
    (Binary layer required for binary neural networks)

  • Activation

    Apply non-linear conversion to input data for activation.
    (Binary layer required for binary neural networks)

  • Pooling

    Perform pooling.

  • Parameter

    Parameters to be optimized.

  • LoopControl

    Control repetitive processes.

  • Unit, Math, Loss, Others, Others (Pre Process)

    For various purposes, from arithmetic operations on tensor elements to preprocessing data.

  • Arithmetic (Scalar, 2 Inputs), Logical, Validation

For arithmetic/logical operations, precision computation, etc.

 

Common layer properties

Common layer properties include the layer name and the properties that are automatically calculated based on the inter-layer link status and layer-specific properties.

Name This indicates the layer name.

Each layer name must be unique in the graph.

Input This indicates the layer’s input size.
Output This indicates the layer’s output size.
CostParameter This indicates the number of parameters that the layer contains.
CostAdd This indicates the number of additions required in forward calculation (additions that cannot be performed at the same time as a multiplication).
CostMultiply This indicates the number of multiplications required in forward calculation (multiplications that cannot be performed at the same time as an addition).
CostMultiplyAdd This indicates the number of multiplications and additions required in forward calculation (the number of multiplications and the number of additions that can be performed during the same time).
CostDivision This indicates the number of divisions required in forward calculation.
CostExp This indicates the number of exponentiations required in forward calculation.
CostIf This indicates the number of conditional decisions required in forward calculation.

 

Input and output layers

 

Input

This is the neural network input layer.

Size Specifies the input size.

For image data, the size is specified in the “number of colors,height,width” format. For example, for an RGB color image whose width is 32 and height is 24, specify “3,24,32”. For a monochrome image whose width is 64 and height is 48, specify “1,48,64”.

For CSV data, the size is specified in the “number of rows,number of columns” format. For example, for CSV data consisting of 16 rows and 1 column, specify “16,1”. For CSV data consisting of 12 rows and 3 columns, specify “12,3”.

Dataset Specifies the name of the variable to input into this Input layer.
Generator Specifies the generator to use in this input layer. If the Generator property is not set to None, the data that the generator generates is used in place of the variable specified by Dataset during optimization.

None: Data generation is not performed.

Uniform: Uniform random numbers between -1.0 and 1.0 are generated.

Normal: Gaussian random numbers with 0.0 mean and 1.0 variance are generated.

Constant: Data whose elements are all constant (1.0) is generated.

GeneratorMultiplier Specifies the multiplier to apply to the values that the generator generates.
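
As a rough illustration, an Input layer corresponds to declaring a variable of the specified size in the NNabla Python API (an assumed mapping; the batch size is prepended as the first dimension):

```python
import nnabla as nn

batch_size = 64
# Size “3,24,32”: an RGB image 32 pixels wide and 24 pixels high
x_image = nn.Variable((batch_size, 3, 24, 32))
# Size “16,1”: CSV data consisting of 16 rows and 1 column
x_csv = nn.Variable((batch_size, 16, 1))
```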

 

Loss layer

 

SquaredError

This is the output layer of a neural network that minimizes the squared errors between the variables and dataset variables. It is used when solving regression problems with neural networks (when optimizing neural networks that output continuous values).

T.Dataset Specifies the name of the variable expected to be the output of this SquaredError layer.
T.Generator Specifies the generator to use in place of the dataset. If the Generator property is not set to None, the data that the generator generates is used in place of the variable specified by T.Dataset during optimization.

None: Data generation is not performed.

Uniform: Uniform random numbers between -1.0 and 1.0 are generated.

Normal: Gaussian random numbers with 0.0 mean and 1.0 variance are generated.

Constant: Data whose elements are all constant (1.0) is generated.

T.GeneratorMultiplier Specifies the multiplier to apply to the values that the generator generates.
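
A minimal sketch of the corresponding loss in the NNabla Python API, assuming a network output y and a dataset variable t of the same shape:

```python
import nnabla as nn
import nnabla.functions as F

y = nn.Variable((64, 1))  # network output
t = nn.Variable((64, 1))  # variable specified by T.Dataset
# Element-wise squared error, averaged over the mini-batch
loss = F.mean(F.squared_error(y, t))
```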

 

HuberLoss

This is the output layer of a neural network that minimizes the Huber loss between the variables and dataset variables. Like Squared Error, this is used when solving regression problems with neural networks. Using this in place of Squared Error has the effect of stabilizing the training process.

$$y_i = \begin{cases}
d^2 & (|d| < \delta) \\
\delta (2 |d| - \delta) & (\text{otherwise})
\end{cases}$$

where
$$d = x^{(0)}_i - x^{(1)}_i$$

Delta Specify δ, which is used as a threshold for increasing the loss linearly.

Other properties are the same as those of SquaredError.
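
The piecewise definition above can be checked with a small NumPy sketch (plain NumPy is used here only to illustrate the formula):

```python
import numpy as np

def huber(x0, x1, delta=1.0):
    d = x0 - x1
    # Quadratic for |d| < delta, linear with slope 2*delta otherwise
    return np.where(np.abs(d) < delta, d ** 2, delta * (2 * np.abs(d) - delta))

print(huber(np.array([0.1, 3.0]), np.array([0.0, 0.0])))  # [0.01, 5.0]
```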

 

AbsoluteError

This is the output layer of a neural network that minimizes absolute error between the variables and dataset variables. Like Squared Error, this is used when solving regression problems with neural networks. The properties are the same as those of SquaredError.

 

EpsilonInsensitiveLoss

This is the output layer of a neural network that minimizes absolute error exceeding the range specified by Epsilon between the variables and dataset variables. Like Squared Error, this is used when solving regression problems with neural networks.

$$y_i = \begin{cases}
0 & (|d| < \epsilon) \\
|d| - \epsilon & (\text{otherwise})
\end{cases}$$

where
$$d = x^{(0)}_i - x^{(1)}_i$$

Epsilon Specify ε, the threshold below which the absolute error is ignored.

Other properties are the same as those of SquaredError.

 

BinaryCrossEntropy

This is the output layer of a neural network that minimizes the binary cross entropy between the variable and dataset variables. It is used to solve binary classification problems (0 or 1). The input to BinaryCrossEntropy must be between 0.0 and 1.0 (a probability), and the dataset variable must be 0 or 1. All the properties are the same as those of SquaredError.

 

SigmoidCrossEntropy

This is the output layer of a neural network that minimizes the binary cross entropy between the variable and dataset variables. SigmoidCrossEntropy is equivalent to Sigmoid+BinaryCrossEntropy during training, but computing them at once reduces computational error. All the properties are the same as those of SquaredError.

Reference

When SigmoidCrossEntropy is used instead of Sigmoid+BinaryCrossEntropy, the evaluation results are output as continuous values that have not undergone Sigmoid processing.

 

CategoricalCrossEntropy

This is the output layer of a neural network that minimizes the cross entropy between the variables and the variables of a dataset given by a category index. All the properties are the same as those of SquaredError.

 

SoftmaxCrossEntropy

This is the output layer of a neural network that minimizes the cross entropy between the variables and the variables of a dataset given by a category index. SoftmaxCrossEntropy is equivalent to Softmax+CategoricalCrossEntropy during training, but computing them at once reduces computational error. All the properties are the same as those of SquaredError.

Reference

When SoftmaxCrossEntropy is used instead of Softmax+CategoricalCrossEntropy, the evaluation results are output as continuous values that have not undergone Softmax processing.
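
A minimal sketch in the NNabla Python API, assuming unnormalized scores y and a category-index target t (note that no Softmax layer precedes the loss):

```python
import nnabla as nn
import nnabla.functions as F

y = nn.Variable((64, 10))  # unnormalized scores for 10 categories
t = nn.Variable((64, 1))   # category indices in 0..9 (T.Dataset)
# Softmax and categorical cross entropy computed at once
loss = F.mean(F.softmax_cross_entropy(y, t))
```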

 

KLMultinomial

This is the output layer of a neural network that minimizes the Kullback-Leibler divergence between the input probability distribution (p), a multinomial distribution, and the dataset variable (q). All the properties are the same as those of SquaredError.

 

Basic layers

 

Affine

The affine layer is a fully-connected layer that has connections from all inputs to all output neurons specified with the OutShape property.

o = Wi + b

(where i is the input, o is the output, W is the weight, and b is the bias term.)

OutShape Specifies the number of output neurons of the Affine layer.
WithBias Specifies whether to include a bias term (b).
ParameterScope Specifies the name of the parameter used by this layer.

The parameter is shared between layers with the same ParameterScope.

W.File When using pre-trained weight W, specifies the file containing W with an absolute path.

If a file is specified, weight W is loaded from the file and initialization with the initializer is disabled.

W.Initializer Specifies the initialization method for weight W.

Uniform: Initialization is performed using uniform random numbers between -1.0 and 1.0.

UniformAffineGlorot: Initialization is performed by applying the multiplier recommended by Xavier Glorot to uniform random numbers.

Normal: Initialization is performed using Gaussian random numbers with 0.0 mean and 1.0 variance.

NormalAffineHeForward: Initialization is performed by applying the multiplier recommended by Kaiming He to Gaussian random numbers (Forward Case).

NormalAffineHeBackward: Initialization is performed by applying the multiplier recommended by Kaiming He to Gaussian random numbers (Backward Case).

NormalAffineGlorot: Initialization is performed by applying the multiplier recommended by Xavier Glorot to Gaussian random numbers (default).

Constant: All elements are initialized with a constant (1.0).

W.InitializerMultiplier Specifies the multiplier to apply to the values that the initializer generates.
W.LRateMultiplier Specifies the multiplier to apply to the Learning Rate specified on the CONFIG tab. This multiplier is used to update weight W.

For example, if the Learning Rate specified on the CONFIG tab is 0.01 and W.LRateMultiplier is set to 2, weight W will be updated using a Learning Rate of 0.02.

b.* This is used to set the bias b. The properties are the same as those of W.
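
A minimal sketch of an Affine layer in the NNabla Python API; the parameter scope name plays the role of ParameterScope, and the second argument corresponds to OutShape:

```python
import nnabla as nn
import nnabla.parametric_functions as PF

x = nn.Variable((64, 784))           # e.g. a flattened 28x28 input
with nn.parameter_scope("affine1"):  # ParameterScope
    h = PF.affine(x, 100)            # OutShape=100, WithBias=True (default)
# h.shape == (64, 100); W and b are created and registered automatically
```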

 

Convolution

The convolution layer is used to convolve the input.

$$O_{x,m} = \sum_{i,n} W_{i,n,m} I_{x+i,n} + b_m \quad \text{(one-dimensional convolution)}$$

$$O_{x,y,m} = \sum_{i,j,n} W_{i,j,n,m} I_{x+i,y+j,n} + b_m \quad \text{(two-dimensional convolution)}$$

(where O is the output; I is the input; i,j are the kernel indices; x,y are the spatial indices; n is the input map index; m is the output map index (OutMaps property); W is the kernel weight; and b is the bias term of each kernel)

KernelShape Specifies the convolution kernel size.

For example, to convolve an image with a 3 (height) by 5 (width) two-dimensional kernel, specify “3,5”.

For example, to convolve a one-dimensional time series signal with a 7-tap filter, specify “7”.

WithBias Specifies whether to include the bias term (b).
OutMaps Specifies the number of convolution kernels (which is equal to the number of output maps).

For example, to convolve an input with 16 types of filters, specify “16”.

BorderMode Specifies the type of convolution border.

valid: Convolution is performed within the border of the kernel shape for the input data size of each axis.

In this case, the size of each axis of the output data is equal to input size - kernel shape + 1.

full: Convolution is performed within the border even with a single sample for the input data of each axis.

Insufficient data (kernel shape - 1 samples at the top, bottom, left, and right) within the border is padded with zeros.

In this case, the size of each axis of the output data is equal to input size + kernel shape - 1.

same: Convolution is performed within a border that makes the output data size the same as the input data size.

Insufficient data ((kernel shape - 1)/2 samples at the top, bottom, left, and right) within the border is padded with zeros.

Padding Specifies the size of zero padding to add to the ends of the arrays before the convolution process. For example, to insert 3 pixels vertically and 2 pixels horizontally, specify “3,2”.

* ConvolutionPaddingSize: A value calculated from BorderMode is used for Padding.

Strides Specifies the stride of the convolution (the interval, in samples, at which the kernel is applied).

The output size of an axis with Stride set to a value other than 1 will be downsampled by the specified value.

For example, to convolve every two samples in the X-axis direction and every three samples in the Y-axis direction, specify “3,2”.

Dilation Specifies the factor by which the kernel is dilated (the spacing between kernel elements, in units of samples). For example, to dilate a 3 (height) by 5 (width) two-dimensional kernel to three times the height and two times the width and perform convolution over a 7 by 9 area, specify “3,2”.
Group Specifies the unit for grouping OutMaps.
ParameterScope Specifies the name of the parameter used by this layer.

The parameter is shared between layers with the same ParameterScope.

W.File When using pre-trained weight W, specify the file containing W with an absolute path.

If a file is specified, weight W is loaded from the file and initialization with the initializer is disabled.

W.Initializer Specifies the initialization method for weight W.

Uniform: Initialization is performed using uniform random numbers between -1.0 and 1.0.

UniformConvolutionGlorot: Initialization is performed by applying the multiplier recommended by Xavier Glorot to uniform random numbers.

Normal: Initialization is performed using Gaussian random numbers with 0.0 mean and 1.0 variance.

NormalConvolutionHeForward: Initialization is performed by applying the multiplier recommended by Kaiming He to Gaussian random numbers (Forward Case).

NormalConvolutionHeBackward: Initialization is performed by applying the multiplier recommended by Kaiming He to Gaussian random numbers (Backward Case).

NormalConvolutionGlorot: Initialization is performed by applying the multiplier recommended by Xavier Glorot to Gaussian random numbers (default).

Constant: All elements are initialized with a constant (1.0).

W.InitializerMultiplier Specifies the multiplier to apply to the values that the initializer generates.
W.LRateMultiplier Specify the multiplier to apply to the Learning Rate specified on the CONFIG tab. This multiplier is used to update weight W.

For example, if the Learning Rate specified on the CONFIG tab is 0.01 and W.LRateMultiplier is set to 2, weight W will be updated using a Learning Rate of 0.02.

b.* This is used to set the bias b. The properties are the same as those of W.

Notes

Currently, NNabla supports only two-dimensional convolution. If you need to perform one-dimensional convolution, use Reshape to convert the tensor into two dimensions.
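
A minimal sketch of a Convolution layer in the NNabla Python API; kernel, pad, and stride correspond to KernelShape, Padding, and Strides. With the padding shown, the output keeps the input's spatial size (the “same” case):

```python
import nnabla as nn
import nnabla.parametric_functions as PF

x = nn.Variable((64, 3, 24, 32))   # RGB image input
with nn.parameter_scope("conv1"):  # ParameterScope
    # KernelShape “3,5”, OutMaps 16, Padding “1,2”, Strides “1,1”
    h = PF.convolution(x, outmaps=16, kernel=(3, 5), pad=(1, 2), stride=(1, 1))
# Output size per axis = input - kernel + 1 + 2*padding:
# height 24 - 3 + 1 + 2 = 24, width 32 - 5 + 1 + 4 = 32
```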

 

DepthwiseConvolution

The depthwise convolution layer convolves each input map independently. The operation of DepthwiseConvolution is equivalent to a Convolution whose numbers of input and output maps are the same and whose Group is set to that number of maps.

$$O_{x,y,n} = \sum_{i,j} W_{i,j,n} I_{x+i,y+j,n} + b_n \quad \text{(two-dimensional convolution, when Multiplier = 1)}$$

(where O is the output; I is the input; i,j are the kernel indices; x,y are the spatial indices; n is the map index; W is the kernel weight; and b is the bias term of each kernel)

KernelShape Specifies the convolution kernel size.

For example, to convolve an image with a 3 (height) by 5 (width) two-dimensional kernel, specify “3,5”.

For example, to convolve a one-dimensional time series signal with a 7-tap filter, specify “7”.

WithBias Specifies whether to include the bias term (b).
BorderMode Specifies the type of convolution border.

valid: Convolution is performed within the border of the kernel shape for the input data size of each axis.

In this case, the size of each axis of the output data is equal to input size - kernel shape + 1.

full: Convolution is performed within the border even with a single sample for the input data of each axis.

Insufficient data (kernel shape - 1 samples at the top, bottom, left, and right) within the border is padded with zeros.

In this case, the size of each axis of the output data is equal to input size + kernel shape - 1.

same: Convolution is performed within a border that makes the output data size the same as the input data size.

Insufficient data ((kernel shape - 1)/2 samples at the top, bottom, left, and right) within the border is padded with zeros.

Padding Specifies the size of zero padding to add to the ends of the arrays before the convolution process. For example, to insert 3 pixels vertically and 2 pixels horizontally, specify “3,2”.

* ConvolutionPaddingSize: A value calculated from BorderMode is used for Padding.

Strides Specifies the stride of the convolution (the interval, in samples, at which the kernel is applied).

The output size of an axis with Stride set to a value other than 1 will be downsampled by the specified value.

For example, to convolve every two samples in the X-axis direction and every three samples in the Y-axis direction, specify “3,2”.

Dilation Specifies the factor by which the kernel is dilated (the spacing between kernel elements, in units of samples). For example, to dilate a 3 (height) by 5 (width) two-dimensional kernel to three times the height and two times the width and perform convolution over a 7 by 9 area, specify “3,2”.
Multiplier Specifies the ratio of the number of output maps to the number of input maps.
ParameterScope Specifies the name of the parameter used by this layer.

The parameter is shared between layers with the same ParameterScope.

W.File When using pre-trained weight W, specify the file containing W with an absolute path.

If a file is specified, weight W is loaded from the file and initialization with the initializer is disabled.

W.Initializer Specifies the initialization method for weight W.

Uniform: Initialization is performed using uniform random numbers between -1.0 and 1.0.

UniformConvolutionGlorot: Initialization is performed by applying the multiplier recommended by Xavier Glorot to uniform random numbers.

Normal: Initialization is performed using Gaussian random numbers with 0.0 mean and 1.0 variance.

NormalConvolutionHeForward: Initialization is performed by applying the multiplier recommended by Kaiming He to Gaussian random numbers (Forward Case).

NormalConvolutionHeBackward: Initialization is performed by applying the multiplier recommended by Kaiming He to Gaussian random numbers (Backward Case).

NormalConvolutionGlorot: Initialization is performed by applying the multiplier recommended by Xavier Glorot to Gaussian random numbers (default).

Constant: All elements are initialized with a constant (1.0).

W.InitializerMultiplier Specifies the multiplier to apply to the values that the initializer generates.
W.LRateMultiplier Specify the multiplier to apply to the Learning Rate specified on the CONFIG tab. This multiplier is used to update weight W.

For example, if the Learning Rate specified on the CONFIG tab is 0.01 and W.LRateMultiplier is set to 2, weight W will be updated using a Learning Rate of 0.02.

b.* This is used to set the bias b. The properties are the same as those of W.

Notes

Currently, NNabla supports only two-dimensional convolution. If you need to perform one-dimensional convolution, use Reshape to convert the tensor into two dimensions.

 

Deconvolution

The deconvolution layer is used to deconvolve the input. The properties of deconvolution are the same as those of convolution.

Notes

Currently, NNabla supports only two-dimensional deconvolution. If you need to perform one-dimensional deconvolution, use Reshape to convert the tensor into two dimensions.

 

Embed

Inputs are assumed to be discrete symbols represented by integers ranging from 0 to N-1 (where N is the number of classes), and arrays of a specified size are assigned to each symbol. For example, this is used when the inputs are word indexes, and each word is converted into vectors in the beginning of a network. The output size is equal to input size × array size.

NumClass Specifies the number of classes N.
Shape Specifies the size of the array assigned to a single symbol.
ParameterScope Specifies the name of the parameter used by this layer.

The parameter is shared between layers with the same ParameterScope.

W.File When using pre-trained weight W, specify the file containing W with an absolute path.

If a file is specified, weight W is loaded from the file and initialization with the initializer is disabled.

W.Initializer Specifies the initialization method for weight W.

Uniform: Initialization is performed using uniform random numbers between -1.0 and 1.0.

Normal: Initialization is performed using Gaussian random numbers with 0.0 mean and 1.0 variance.

Constant: All elements are initialized with a constant (1.0).

W.InitializerMultiplier Specifies the multiplier to apply to the values that the initializer generates.
W.LRateMultiplier Specifies the multiplier to apply to the Learning Rate specified on the CONFIG tab. This multiplier is used to update weight W.

For example, if the Learning Rate specified on the CONFIG tab is 0.01 and W.LRateMultiplier is set to 2, weight W will be updated using a Learning Rate of 0.02.
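
A minimal sketch in the NNabla Python API, assuming word indices as input; n_inputs corresponds to NumClass and n_features to Shape:

```python
import nnabla as nn
import nnabla.parametric_functions as PF

x = nn.Variable((64, 10))           # 10 symbol indices in 0..999
with nn.parameter_scope("embed1"):  # ParameterScope
    h = PF.embed(x, n_inputs=1000, n_features=128)  # NumClass=1000, Shape=128
# h.shape == (64, 10, 128): output size = input size x array size
```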

 

Pooling layer

 

MaxPooling

MaxPooling outputs the maximum value of local inputs.

KernelShape Specifies the size of the local region for sampling the maximum value.

For example, to output the maximum value over a 3 (height) by 5 (width) region, specify “3,5”.

Strides Specifies the stride of the pooling (the interval, in samples, at which maximum values are sampled).

The output size of each axis will be downsampled by the specified value.

For example, to sample maximum values every two samples in the X-axis direction and every three samples in the Y-axis direction, specify “3,2”.

* KernelShape: The same value as KernelShape is used for Strides.

IgnoreBorder Specifies the border processing method.

True: Processing is performed over regions that have enough samples to fill KernelShape. Samples near the border where there are not enough samples to fill KernelShape are ignored.

False: Samples near the border are not discarded. Processing is performed even over regions that only have one sample.

Padding Specifies the size of zero padding to add to the ends of the arrays before the pooling process.

For example, to add two pixels of zero padding to the top and bottom and one pixel of zero padding to the left and right of an image, specify “2,1”.
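
A minimal sketch in the NNabla Python API; when no stride is given it defaults to the kernel shape, matching the KernelShape note above:

```python
import nnabla as nn
import nnabla.functions as F

x = nn.Variable((64, 16, 24, 32))
# KernelShape “2,2”; Strides defaults to the kernel shape
h = F.max_pooling(x, kernel=(2, 2))
# h.shape == (64, 16, 12, 16)
```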

 

SumPooling

SumPooling outputs the sum of local inputs. The properties are the same as those of MaxPooling.

 

AveragePooling

AveragePooling outputs the average of local inputs. The properties are the same as those of MaxPooling.

 

Unpooling

Unpooling copies each input element multiple times in order to generate output data larger in size than the input data.

KernelShape Specify the size of data to copy.

For example, if you want to copy a data sample twice in the vertical direction and three times in the horizontal direction (output data whose size is twice as large in the vertical direction and three times as large in the horizontal direction), specify “2,3”.

 

Activation layer

 

Tanh

Tanh outputs the result of taking the hyperbolic tangent of the input.

o=tanh(i)

(where o is the output and i is the input)

 

Sigmoid

Sigmoid outputs the result of taking the sigmoid of the input. This is used when you want to obtain probabilities or output values ranging from 0.0 to 1.0.

o=sigmoid(i)

(where o is the output and i is the input)

 

ReLU

ReLU outputs the result of applying the rectified linear unit (ReLU) to the input.

o=max(0, i)

(where o is the output and i is the input)

 

CReLU

Concatenated ReLU (CReLU) applies ReLU to the input and to the negated input, concatenates the two results along the axis indicated by the Axis property, and outputs the final result.

 

LeakyReLU

Unlike ReLU, which always outputs 0 for inputs less than 0, LeakyReLU multiplies inputs less than 0 by a constant value a.

o=max(0, i) + a min(0, i)

(where o is the output and i is the input)

alpha Specify the gradient a for negative inputs.

 

PReLU

Unlike ReLU, which always outputs 0 for inputs less than 0, parametric ReLU (PReLU) multiplies inputs less than 0 by a coefficient a. The coefficient a, the gradient for negative inputs, is obtained through training.

o=max(0, i) + a min(0, i)

(where o is the output and i is the input)

BaseAxis Of the available inputs, specifies the index (that starts at 0) of the axis that individual a’s are to be trained on. For example, for inputs 4,3,5, to train individual a’s for the first dimension (four elements), set BaseAxis to 0.
ParameterScope Specifies the name of the parameter used by this layer.

The parameter is shared between layers with the same ParameterScope.

slope.File When using pre-trained gradient a, specifies the file containing a with an absolute path.

If a file is specified, slope is loaded from the file and initialization with the initializer is disabled.

slope.Initializer Specifies the initialization method for gradient a.

Uniform: Initialization is performed using uniform random numbers between -1.0 and 1.0.

Normal: Initialization is performed using Gaussian random numbers with 0.0 mean and 1.0 variance.

Constant: All elements are initialized with a constant (1.0).

slope.InitializerMultiplier Specifies the multiplier to apply to the values that the initializer generates.
slope.LRateMultiplier Specifies the multiplier to apply to the Learning Rate specified on the CONFIG tab. This multiplier is used to update gradient a.

For example, if the Learning Rate specified on the CONFIG tab is 0.01 and slope.LRateMultiplier is set to 2, gradient a will be updated using a Learning Rate of 0.02.

 

ELU

ELU outputs the result of applying the exponential linear unit (ELU) to the input.

o=max(0, i) + alpha(exp(min(0, i)) - 1)

(where o is the output and i is the input)

Alpha Specify coefficient alpha for negative outputs.

 

CELU

Concatenated ELU (CELU) applies ELU to the input and to the negated input, concatenates the two results along the axis indicated by the Axis property, and outputs the final result.

 

SELU

SELU outputs the result of applying the scaled exponential linear unit (SELU) to the input.

o=lambda {max(0, i) + alpha(exp(min(0, i)) - 1)}

(where o is the output and i is the input)

Scale Specify the whole scale lambda.
Alpha Specify coefficient alpha for negative outputs.

 

Swish

Swish outputs the result of taking the swish of the input.

o=i/(1+exp(-i))

(where o is the output and i is the input)

 

Abs

Abs outputs the absolute values of inputs.

o=abs(i)

(where o is the output and i is the input)

 

Softmax

Softmax outputs the Softmax of inputs. This is used when you want to obtain probabilities in a categorization problem or output values ranging from 0.0 to 1.0 that sum up to 1.0.

$$o_x = \frac{\exp(i_x)}{\sum_j \exp(i_j)}$$

(where o is the output, i is the input, and x is the data index)
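
The formula can be checked with a small NumPy sketch (subtracting the maximum before exponentiation is a standard, mathematically equivalent trick for numerical stability):

```python
import numpy as np

def softmax(i):
    e = np.exp(i - i.max())  # stabilized exponentials
    return e / e.sum()

o = softmax(np.array([1.0, 2.0, 3.0]))
print(o, o.sum())            # values in (0, 1) that sum to 1.0
```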

 

Loop Control layer

The Loop Control layer is useful for configuring networks with a loop structure, such as residual networks and recurrent neural networks.

 

RepeatStart

RepeatStart is a layer that indicates the start position of a loop. The layers placed between RepeatStart and RepeatEnd are created repeatedly for the number of times specified by the Times property of RepeatStart.

Notes

The array sizes of the layers between RepeatStart and RepeatEnd must be the same. Structures whose array sizes differ between repetitions cannot be written.

 

RepeatEnd

RepeatEnd is a layer that indicates the end position of a loop.

 

RecurrentInput

RecurrentInput is a layer that indicates the time loop start position of a recurrent neural network. The axis specified by the Axis property is handled as the time axis. The length of a time loop is defined by the number of elements specified by the Axis property of the input data.

 

RecurrentOutput

RecurrentOutput is a layer that indicates the time loop end position of a recurrent neural network.

 

Delay

Delay is a layer that indicates the time delay signal in a recurrent neural network.

Size Specifies the size of the time delay signal.
Initial.Dataset Specifies the name of the variable to be used as the initial value of the time delay signal.
Initial.Generator Specifies the generator to use in place of the dataset. If the Generator property is not set to None, the data that the generator generates is used in place of the variable specified by Initial.Dataset during optimization.

None: Data generation is not performed.

Uniform: Uniform random numbers between -1.0 and 1.0 are generated.

Normal: Gaussian random numbers with 0.0 mean and 1.0 variance are generated.

Constant: Data whose elements are all constant (1.0) is generated.

Initial.GeneratorMultiplier Specifies the multiplier to apply to the values that the generator generates.

 

Quantization layer

The quantization layer is used to quantize weights and data.

 

FixedPointQuantize

FixedPointQuantize performs linear quantization.

Sign Specify whether to include signs.

When set to false, all values after quantization will be positive.

N Specify the number of quantization bits.
Delta Specify the quantization step size.
STEFineGrained Specify the gradient calculation method for backward processing.

True: The gradient is always 1.

False: The gradient is 1 between the maximum and minimum values of the range that can be expressed through quantization; otherwise it is 0.
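
A NumPy sketch of the forward computation under the usual fixed-point convention (an assumed interpretation of Sign, N, and Delta; the STE backward behavior is not modeled here):

```python
import numpy as np

def fixed_point_quantize(x, sign=True, n=8, delta=2.0 ** -4):
    # Representable range of an n-bit fixed-point code with step size delta
    if sign:
        vmax = (2 ** (n - 1) - 1) * delta
        vmin = -vmax
    else:
        vmax = (2 ** n - 1) * delta
        vmin = 0.0
    # Round to the nearest step, then clamp to the representable range
    return np.clip(np.round(x / delta) * delta, vmin, vmax)

print(fixed_point_quantize(np.array([0.03, -1.0, 100.0])))  # [ 0. -1. 7.9375]
```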

 

Pow2Quantize

Pow2Quantize performs power-of-two quantization.

Sign Specify whether to include signs.

When set to false, all values after quantization will be positive.

WithZero Specify whether to include zeros.

When set to false, values after quantization will not include zeros.

N Specify the number of quantization bits.
M Specify the maximum value after quantization as 2^M.
STEFineGrained Specify the gradient calculation method for backward processing.

True: The gradient is always 1.

False: The gradient is 1 between the maximum and minimum values of the range that can be expressed through quantization; otherwise it is 0.

 

BinaryConnectAffine

BinaryConnectAffine is an Affine layer that uses W, which has been converted into the binary values of -1 and +1.

o = sign(W) i + b

(where i is the input, o is the output, W is the weight, and b is the bias term.)

Wb.* Specifies the weight Wb settings to use after the conversion into binary values. The properties are the same as those of W for the Affine layer.

For details on other properties, see the Affine layer.

 

BinaryConnectConvolution

BinaryConnectConvolution is a convolution layer that uses W, which has been converted into the binary values of -1 and +1.

$$O_{x,y,m} = \sum_{i,j,n} \mathrm{sign}(W_{i,j,n,m}) I_{x+i,y+j,n} + b_m \quad \text{(two-dimensional convolution)}$$

(where O is the output; I is the input; i,j are the kernel indices; x,y are the spatial indices; n is the input map index; m is the output map index (OutMaps property); W is the kernel weight; and b is the bias term of each kernel)

Wb.* Specifies the weight Wb settings to use after the conversion into binary values. The properties are the same as those of W for the Convolution layer.

For details on other properties, see the Convolution layer.

 

BinaryWeightAffine

BinaryWeightAffine is an affine layer that uses W, which has been converted into binary values of -1 and +1 and then scaled, in order to make the output closer to that of the normal Affine layer.

o = a sign(W) i + b

(where i is the input, o is the output, W is the weight, a is the scale, and b is the bias term.)

 

BinaryWeightConvolution

BinaryWeightConvolution is a convolution layer that uses W, which has been converted into binary values of -1 and +1 and then scaled, in order to make the output closer to that of the normal Convolution layer.

$$O_{x,y,m} = a \sum_{i,j,n} \mathrm{sign}(W_{i,j,n,m}) I_{x+i,y+j,n} + b_m \quad \text{(two-dimensional convolution)}$$

(where O is the output; I is the input; i,j are the kernel indices; x,y are the spatial indices; n is the input map index; m is the output map index (OutMaps property); W is the kernel weight; a is the scale; and b is the bias term of each kernel)

 

BinaryTanh

BinaryTanh outputs -1 for inputs less than or equal to 0 and +1 for inputs greater than 0.

 

BinarySigmoid

BinarySigmoid outputs 0 for inputs less than or equal to 0 and +1 for inputs greater than 0.

 

Unit layer

The unit layer provides a function for inserting other networks in the middle of a network. By using the unit layer, you can insert a collection of layers defined as a network in the middle of another network.

A network that another network is inserted into using the unit layer is called a caller network, and the network to be inserted into another network using the unit layer is called a unit network.

 

Unit

Inserts a unit network into the current caller network.

Network Specify the name of the unit network to be inserted.
ParameterScope Specify the name of the parameter used by this unit.

The parameter is shared between units with the same ParameterScope.

(Other properties) Specify the properties of the unit network to be inserted.

 

Argument

Argument exposes parameters of the unit network so that they can be edited as Unit properties from the caller network. Specify the name of an Argument layer in the properties of other layers in the unit network to make those layers use the Argument layer's value. The values of the Argument layers can then be set through the properties of the Unit layer placed in the caller network.

Value Specify the default parameter value.
Type Specify the parameter type.

Boolean: True or False

Int: Integer

IntArray: Array of integers

PInt: Integer greater than or equal to 1

PIntArray: Array of integers greater than or equal to 1

PIntArrays: Array of arrays of integers greater than or equal to 1

UInt: Unsigned integer

UIntArray: Array of unsigned integers

Float: Floating-point number

FloatArray: Array of floating-point numbers

FloatArrays: Array of arrays of floating-point numbers

Text: Character string

File: File name

Search Specifies whether to include this parameter in the automatic structure search performed on the caller network.

 

Arithmetic operation layer

 

Sum

Sum determines the sum of the values of the specified dimension.

Axis Specifies the index (starting at 0) of the axis to sum the values of.
KeepDims Specifies whether to hold the axis whose values have been summed.

 

Mean

Mean determines the average of the values of the specified dimension.

Axis Specifies the index (starting at 0) of the axis whose values will be averaged.
KeepDims Specifies whether to hold the axis whose values have been averaged.

 

Prod

Prod determines the product of the values of the specified dimension.

Axis Specifies the index (starting at 0) of the axis to multiply the values of.
KeepDims Specifies whether to hold the axis whose values have been multiplied.

 

Max

Max determines the maximum of the values of the specified dimension.

Axis Specifies the index (starting at 0) of the axis on which to determine the maximum value.
KeepDims Specifies whether to hold the axis whose maximum value has been determined.

 

Min

Min determines the minimum of the values of the specified dimension.

Axis Specifies the index (starting at 0) of the axis on which to determine the minimum value.
KeepDims Specifies whether to hold the axis whose minimum value has been determined.

 

Log

Log calculates the natural logarithm with base e.

 

Exp

Exp calculates the exponential function with base e.

 

Sign

Sign outputs -1 for negative inputs, +1 for positive inputs, and alpha for 0.

Alpha Specifies the output corresponding to input 0.

Reference

Compared to BinaryTanh, the forward computation is similar (except for the behavior when the input is 0), but the backward computation is completely different. Unlike BinaryTanh, which sets the derivative to 0 when the absolute value of the input is 1 or greater, Sign passes the back-propagated derivative through unchanged as its own derivative.

 

Scalar arithmetic operation layer (Arithmetic (Scalar))

Various arithmetic operations can be performed on each element using a real number specified by the Value property.

Layer name Expression
AddScalar o=i+value
MulScalar o=i*value
RSubScalar o=value-i
RDivScalar o=value/i
PowScalar o=i^value
RPowScalar o=value^i
MaximumScalar o=max (i,value)
MinimumScalar o=min(i,value)

(where i is the input, o is the output, and value is the real number)

 

Two-input arithmetic operation layer (Arithmetic (2 Inputs))

Various arithmetic operations can be performed on each element using two inputs.

Layer Expression
Add2 o=i1+i2
Sub2 o=i1-i2
Mul2 o=i1*i2
Div2 o=i1/i2

Connect the input for i2 in the right hand side to connector R.

Pow2 o=i1^i2

Connect the input for i2 in the right hand side to connector R.

Maximum2 o=max(i1,i2)
Minimum2 o=min(i1,i2)

(where i1 and i2 are inputs and o is the output)

 

Logical operation layer

Various logical operations can be performed on each element using two inputs or one input and a value specified by the Value property. The logical operation output is 0 or 1.

Layer name Process
LogicalAnd o= i1 and i2
LogicalOr o= i1 or i2
LogicalXor o= i1 xor i2
Equal o= i1 == i2
NotEqual o= i1 != i2
GreaterEqual o= i1 >= i2
Greater o= i1 > i2
LessEqual o= i1 <= i2
Less o= i1 < i2
LogicalAndScalar o= i and value
LogicalOrScalar o= i or value
LogicalXorScalar o= i xor value
EqualScalar o= i == value
NotEqualScalar o= i != value
GreaterEqualScalar o= i >= value
GreaterScalar o= i > value
LessEqualScalar o= i <= value
LessScalar o= i < value
LogicalNot o= !i

Notes

A logical operation layer does not support back propagation.

 

Evaluation layer

 

BinaryError

The input data and the variable T in the dataset indicating correct values are converted into binary values (0 or 1) depending on whether they are greater than or equal to 0.5, and then compared for each data sample. If the binarized input data matches the correct binary value, 0 is output. Otherwise, 1 is output.

T.Dataset Specifies the name of the variable expected to be the output of this layer.
T.Generator Specifies the generator to use in place of the dataset. If the Generator property is not set to None, the data that the generator generates is used in place of the variable specified by T.Dataset during optimization.

None: Data generation is not performed.

Uniform: Uniform random numbers between -1.0 and 1.0 are generated.

Normal: Gaussian random numbers with 0.0 mean and 1.0 variance are generated.

Constant: Data whose elements are all constant (1.0) is generated.

T.GeneratorMultiplier Specifies the multiplier to apply to the values that the generator generates.

 

TopNError

Based on the input data indicating the probability or score of each category and variable T in the dataset indicating the category index, each data sample is evaluated as to whether the probability or score of the correct category is within the top N of all categories. If the probability or score of the correct category is within the top N, 0 is output. Otherwise, 1 is output.

Axis Specifies the index of the axis indicating the category.
N Specifies the lowest ranking N that will be considered correct.

For example, to consider the correct category as correct only when it has the maximum probability or score, specify 1. To consider the correct category as correct when it is within the top five probabilities or scores, specify 5.

T.Dataset Specifies the name of the variable expected to be the output of this layer.
T.Generator Specifies the generator to use in place of the dataset. If the Generator property is not set to None, the data that the generator generates is used in place of the variable specified by T.Dataset during optimization.

None: Data generation is not performed.

Uniform: Uniform random numbers between -1.0 and 1.0 are generated.

Normal: Gaussian random numbers with 0.0 mean and 1.0 variance are generated.

Constant: Data whose elements are all constant (1.0) is generated.

T.GeneratorMultiplier Specifies the multiplier to apply to the values that the generator generates.

 

Other layers

 

BatchNormalization

The input is normalized to 0 mean and 1 variance. Inserting this layer after Convolution or Affine has the effect of improving accuracy and accelerating convergence.

$$o_x = \frac{i_x - \mu_x}{\sigma_x} \gamma_x + \beta_x$$

(where o is the output, i is the input, x is the data index, μ is the mean, σ is the standard deviation, γ is the scale gamma, and β is the bias beta)

Axes Of the available inputs, specifies the index (starting at 0) of the axis to be normalized individually. For example, for inputs 4,3,5, to individually normalize the first dimension (four elements), set Axes to 0. To individually normalize the second and third dimensions (elements 3,5), set Axes to 1,2.
DecayRate Specifies the decay rate (0.0 to 1.0) to apply when updating the mean and standard deviation of the input data during training. The closer the value is to 1.0, the more the mean and standard deviation determined from past data are retained.
Epsilon Specifies the value to add to the denominator (standard deviation of the input data) to prevent division by zero during normalization.
BatchStat Specifies whether to use the mean and variance calculated for each mini-batch in batch normalization.

True: The mean and variance calculated for each mini-batch are used.

False: The running mean and variance accumulated during training are used.

ParameterScope Specifies the name of the parameter used by this layer.

The parameter is shared between layers with the same ParameterScope.

beta.File When using pre-trained beta, specifies the file containing beta with an absolute path.

If a file is specified, beta is loaded from the file and initialization with the initializer is disabled.

beta.Initializer Specifies the initialization method for beta.

Uniform: Initialization is performed using uniform random numbers between -1.0 and 1.0.

Normal: Initialization is performed using Gaussian random numbers with 0.0 mean and 1.0 variance.

Constant: All elements are initialized with a constant (1.0).

beta.InitializerMultiplier Specifies the multiplier to apply to the values that the initializer generates.
beta.LRateMultiplier Specifies the multiplier to apply to the Learning Rate specified on the CONFIG tab. This multiplier is used to update beta.

For example, if the Learning Rate specified on the CONFIG tab is 0.01 and beta.LRateMultiplier is set to 2, beta will be updated using a Learning Rate of 0.02.

gamma.* Specifies the scale applied after normalization (the standard deviation after normalization).
mean.* Specifies the mean of the input data.
var.* Specifies the variance of the input data.
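
A minimal sketch in the NNabla Python API; beta, gamma, mean, and var are created in the given parameter scope, and batch_stat corresponds to BatchStat:

```python
import nnabla as nn
import nnabla.parametric_functions as PF

x = nn.Variable((64, 16, 24, 32))
with nn.parameter_scope("bn1"):  # ParameterScope
    # DecayRate=0.9, Epsilon=1e-5, BatchStat=True (use mini-batch statistics)
    h = PF.batch_normalization(x, decay_rate=0.9, eps=1e-5, batch_stat=True)
```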

 

Dropout

Dropout sets input elements to 0 with a given probability.

P Specifies the probability to set an element to 0, within the range from 0.0 to 1.0.

 

Concatenate

Concatenate joins two or more arrays on an existing axis.

Axis Specifies the axis on which to concatenate arrays.

Axis indexes take on values 0, 1, 2, and so on from the left.

For example, to concatenate two inputs “3,28,28” and “5,28,28” on the first (the leftmost) axis, specify “0”. In this case, the output size will be “8,28,28”.
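
A minimal sketch of the example above in the NNabla Python API; in the Python API the batch dimension is axis 0, so the Console's Axis 0 is assumed to correspond to axis=1 here:

```python
import nnabla as nn
import nnabla.functions as F

a = nn.Variable((64, 3, 28, 28))
b = nn.Variable((64, 5, 28, 28))
h = F.concatenate(a, b, axis=1)  # join on the map axis
# h.shape == (64, 8, 28, 28)
```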

 

Reshape

Reshape transforms the shape of an array into the specified shape.

OutShape Specifies the shape of the array after the transform.

For example, to output an array “2,5,5” as “10,5”, specify “10,5”.

The number of elements in the input and output arrays must be the same.

 

Broadcast

Broadcast transforms the shape of an array into the specified shape by copying the elements of axes whose size is 1.

OutShape Specify the shape of the array after the transform.

For example, to copy the elements of the second axis of an array “3,1,2” 10 times, specify “3,10,2”.

 

Flip

Flip reverses the order of elements of the specified dimension of an array.

Axes Specifies the indices of the dimensions whose element order you want to reverse.

Axis indexes take on values 0, 1, 2, and so on from the left.

For example, to flip a 32 (W) by 24 (H) RGB image “3,24,32” vertically and horizontally, specify “1,2”.

 

Shift

Shift shifts the array elements by the specified amount.

Shift Specifies the amount to shift elements.

For example, to shift image data to the right by 2 pixels and up 3 pixels, specify “-3,2”.

BorderMode Specifies how to process the ends of arrays whose values will be undetermined as a result of shifting.

nearest: The data at the borders (beginning and end) of the original array is copied and used.

reflect: Original data at the borders of the original array is reflected (reversed) and used.

 

Transpose

Transpose swaps data dimensions.

Axes Specifies the axis indexes on the output data for each dimension of the input data.

Axis indexes take on values 0, 1, 2, and so on from the left.

For example, to swap the second and third dimensions of input data “3,20,10” and output the result, specify “0,2,1” (the output data size in this case is 3,10,20).

 

Slice

Slice extracts part of the array.

Start Specify the start point of extraction.
Stop Specify the end point of extraction.
Step Specify the interval of extraction.

For example, to extract the center 24 x 32 pixels of image data of size “3,48,64” at two-pixel intervals, specify “0,12,16” for Start, “3,36,48” for Stop, and “1,2,2” for Step.
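
Start, Stop, and Step follow the usual slicing semantics, so the example above can be checked with NumPy:

```python
import numpy as np

x = np.zeros((3, 48, 64))     # image data of size “3,48,64”
# Start “0,12,16”, Stop “3,36,48”, Step “1,2,2”
y = x[0:3, 12:36:2, 16:48:2]
print(y.shape)                # (3, 12, 16)
```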

 

Stack

Stack joins two or more arrays on a new axis. The sizes of all the arrays to be stacked must be the same. Unlike Concatenate, which joins arrays on an existing axis, Stack joins arrays on a new axis.

Axis Specifies the new axis on which to stack arrays.

Axis indexes take on values 0, 1, 2, and so on from the left.

For example, to stack four “3,28,28” inputs on the second axis, specify “1”. In this case, the output size will be “3,4,28,28”.

 

MatrixDiag

MatrixDiag performs a matrix diagonalization of the last dimension of an array.

 

MatrixDiagPart

MatrixDiagPart extracts the diagonal component of the last two dimensions of an array.

 

VATNoise

This layer is provided to achieve a method called Virtual Adversarial Training.

This layer normalizes and outputs input noise signals during forward calculation. During backward calculation, the error propagated from the output is stored in Buf.

 

Unlink

The input is output directly during forward calculation, and zero is output during backward calculation. This is used when you do not want errors to propagate to layers before the unlink layer.

 

Identity

Identity outputs the input as-is. Normally there is no need to use this layer, but it can be inserted to assign a name for identification at certain locations in the network.

 

Other layers (preprocess)

 

OneHot

OneHot creates a one-hot array based on input indices.

Shape Specify the size of the array to be created.

The number of dimensions of Shape must be the same as the number of elements in the last dimension of the input data.

 

MeanSubtraction

MeanSubtraction normalizes input to mean 0. Using this as a preprocessing function has the effect of improving accuracy in image classification and similar tasks.

ox=ix-mean

(where o is the output, i is the input, and x is the data index)

BaseAxis Specifies the index of the first axis to take the mean of.
UpdateRunningMean Specifies whether to calculate a running mean.
ParameterScope Specifies the name of the parameter used by this layer.

The parameter is shared between layers with the same ParameterScope.

mean.* Specifies the mean of the input data.
t.* Specifies the number of mini-batches used to calculate the mean of the input data.

 

RandomFlip

RandomFlip reverses the order of elements of the specified dimension of an array at 50% probability.

Axes Specifies the indices of the axes whose element order you want to reverse.

Axis indexes take on values 0, 1, 2, and so on from the left.

For example, to flip a 32 (W) by 24 (H) RGB image “3,24,32” vertically and horizontally at random, specify “1,2”.

SkipAtInspection Specifies whether to skip processing at inspection.

To execute RandomFlip only during training, set SkipAtInspection to True (default).

 

RandomShift

RandomShift randomly shifts the array elements within the specified range.

Shift Specifies the amount to shift elements by.

For example, to shift image data horizontally by ±2 pixels and vertically by ±3 pixels, specify “3,2”.

BorderMode Specifies how to process the ends of arrays whose values will be undetermined as a result of shifting.

nearest: The data at the borders (beginning and end) of the original array is copied and used.

reflect: Original data at the borders of the original array is reflected (reversed) and used.

SkipAtInspection Specifies whether to skip processing at inspection.

To execute RandomShift only during training, set SkipAtInspection to True (default).

 

RandomCrop

RandomCrop randomly extracts a portion of an array.

Shape Specifies the data size to extract.

For example, to randomly extract a 3,48,48 portion of a 3,64,64 image, specify “3,48,48”.

 

ImageAugmentation

ImageAugmentation randomly alters the input image.

Shape Specifies the output image data size.
MinScale Specifies the minimum scale ratio when randomly scaling the image.

For example, to scale down to 0.8 times the size of the original image, specify “0.8”.

To not apply random scaling, set both MinScale and MaxScale to “1.0”.

MaxScale Specifies the maximum scale ratio when randomly scaling the image.

For example, to scale up to 2 times the size of the original image, specify “2.0”.

Angle Specifies the rotation angle range in radians when randomly rotating the image.

The image is randomly rotated in the -Angle to +Angle range.

For example, to rotate in a ±15 degree range, specify “0.26” (15 degrees/360 degrees × 2π).

To not apply random rotation, specify “0.0”.

AspectRatio Specifies the aspect ratio variation range when randomly varying the aspect ratio of the image.

For example, if the original image is 1:1, to vary the aspect ratio between 1:1.3 and 1.3:1, specify 1.3.

Distortion Specifies the strength range when randomly distorting the image.
FlipLR Specifies whether to randomly flip the image horizontally.
FlipUD Specifies whether to randomly flip the image vertically.
Brightness Specifies the range of values to randomly add to the brightness.

A random value in the -Brightness to +Brightness range is added to the brightness.

For example, to vary the brightness in the -0.05 to +0.05 range, specify “0.05”.

To not apply random addition to brightness, specify “0.0”.

BrightnessEach Specifies whether to apply the random addition to brightness (as specified by Brightness) to each color channel.

True: Brightness is added based on a different random number for each channel.

False: Brightness is added based on a random number common to all channels.

Contrast Specifies the range in which to randomly vary the image contrast.

The contrast is varied in the 1/Contrast times to Contrast times range.

The output brightness is equal to (input - 0.5) * contrast + 0.5.

For example, to vary the contrast in the 0.91 times to 1.1 times range, specify “1.1”.

To not apply random contrast variation, specify “1.0”.

ContrastEach Specifies whether to apply the random contrast variation (as specified by Contrast) to each color channel.

True: Contrast is varied based on a different random number for each channel.

False: Contrast is varied based on a random number common to all channels.

Reference

An easy and effective way to verify image augmentation settings is to train a network consisting only of image input → ImageAugmentation → squared error with Max Epoch set to 0, and to monitor the processed results with the Run Evaluation button.

 

Configuration layer

This is a layer for configuring the network.

 

StructureSearch

StructureSearch is used to configure the automatic structure search.

Search Specify whether to include this network in the automatic structure search.

When set to false, the network will not change during automatic structure search.

When no StructureSearch layer is present, the default value is true.