English 中文(简体)
Apache MXNet - Python API gluon
  • 时间:2024-09-17

Apache MXNet - Python API gluon


Previous Page Next Page  

As we have already discussed in previous chapters that, MXNet Gluon provides a clear, concise, and simple API for DL projects. It enables Apache MXNet to prototype, build, and train DL models without forfeiting the training speed.

Core Modules

Let us learn the core modules of Apache MXNet Python apppcation programming interface (API) gluon.

gluon.nn

Gluon provides a large number of build-in NN layers in gluon.nn module. That is the reason it is called the core module.

Methods and their parameters

Following are some of the important methods and their parameters covered by mxnet.gluon.nn core module −

Methods and its Parameters Definition
Activation(activation, **kwargs) As name imppes, this method apppes an activation function to input.
AvgPool1D([pool_size, strides, padding, …]) This is average poopng operation for temporal data.
AvgPool2D([pool_size, strides, padding, …]) This is average poopng operation for spatial data.
AvgPool3D([pool_size, strides, padding, …]) This is Average poopng operation for 3D data. The data can be spatial or spatio-temporal.
BatchNorm([axis, momentum, epsilon, center, …]) It represents batch normapsation layer.
BatchNormReLU([axis, momentum, epsilon, …]) It also represents batch normapsation layer but with Relu activation function.
Block([prefix, params]) It gives the base class for all neural network layers and models.
Conv1D(channels, kernel_size[, strides, …]) This method is used for 1-D convolution layer. For example, temporal convolution.
Conv1DTranspose(channels, kernel_size[, …]) This method is used for Transposed 1D convolution layer.
Conv2D(channels, kernel_size[, strides, …]) This method is used for 2D convolution layer. For example, spatial convolution over images).
Conv2DTranspose(channels, kernel_size[, …]) This method is used for Transposed 2D convolution layer.
Conv3D(channels, kernel_size[, strides, …]) This method is used for 3D convolution layer. For example, spatial convolution over volumes.
Conv3DTranspose(channels, kernel_size[, …]) This method is used for Transposed 3D convolution layer.
Dense(units[, activation, use_bias, …]) This method represents for your regular densely-connected NN layer.
Dropout(rate[, axes]) As name imppes, the method apppes Dropout to the input.
ELU([alpha]) This method is used for Exponential Linear Unit (ELU).
Embedding(input_dim, output_dim[, dtype, …]) It turns non-negative integers into dense vectors of fixed size.
Flatten(**kwargs) This method flattens the input to 2-D.
GELU(**kwargs) This method is used for Gaussian Exponential Linear Unit (GELU).
GlobalAvgPool1D([layout]) With the help of this method, we can do global average poopng operation for temporal data.
GlobalAvgPool2D([layout]) With the help of this method, we can do global average poopng operation for spatial data.
GlobalAvgPool3D([layout]) With the help of this method, we can do global average poopng operation for 3-D data.
GlobalMaxPool1D([layout]) With the help of this method, we can do global max poopng operation for 1-D data.
GlobalMaxPool2D([layout]) With the help of this method, we can do global max poopng operation for 2-D data.
GlobalMaxPool3D([layout]) With the help of this method, we can do global max poopng operation for 3-D data.
GroupNorm([num_groups, epsilon, center, …]) This method apppes group normapzation to the n-D input array.
HybridBlock([prefix, params]) This method supports forwarding with both Symbol and NDArray.
HybridLambda(function[, prefix]) With the help of this method we can wrap an operator or an expression as a HybridBlock object.
HybridSequential([prefix, params]) It stacks HybridBlocks sequentially.
InstanceNorm([axis, epsilon, center, scale, …]) This method apppes instance normapsation to the n-D input array.

Implementation Examples

In the example below, we are going to use Block() which gives the base class for all neural network layers and models.


from mxnet.gluon import Block, nn
class Model(Block):
   def __init__(self, **kwargs):
      super(Model, self).__init__(**kwargs)
      # use name_scope to give child Blocks appropriate names.
      with self.name_scope():
         self.dense0 = nn.Dense(20)
         self.dense1 = nn.Dense(20)
   def forward(self, x):

      x = mx.nd.relu(self.dense0(x))
      return mx.nd.relu(self.dense1(x))

model = Model()
model.initiapze(ctx=mx.cpu(0))
model(mx.nd.zeros((5, 5), ctx=mx.cpu(0)))

Output

You will see the following output −


[[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]]
<NDArray 5x20 @cpu(0)*gt;

In the example below, we are going to use HybridBlock() that supports forwarding with both Symbol and NDArray.


import mxnet as mx
from mxnet.gluon import HybridBlock, nn


class Model(HybridBlock):
   def __init__(self, **kwargs):
      super(Model, self).__init__(**kwargs)
      # use name_scope to give child Blocks appropriate names.
      with self.name_scope():
         self.dense0 = nn.Dense(20)
         self.dense1 = nn.Dense(20)

   def forward(self, x):
      x = nd.relu(self.dense0(x))
      return nd.relu(self.dense1(x))
model = Model()
model.initiapze(ctx=mx.cpu(0))

model.hybridize()
model(mx.nd.zeros((5, 5), ctx=mx.cpu(0)))

Output

The output is mentioned below −


[[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]]
<NDArray 5x20 @cpu(0)>

gluon.rnn

Gluon provides a large number of build-in recurrent neural network (RNN) layers in gluon.rnn module. That is the reason, it is called the core module.

Methods and their parameters

Following are some of the important methods and their parameters covered by mxnet.gluon.nn core module:

Methods and its Parameters Definition
BidirectionalCell(l_cell, r_cell[, …]) It is used for Bidirectional Recurrent Neural Network (RNN) cell.
DropoutCell(rate[, axes, prefix, params]) This method will apply dropout on the given input.
GRU(hidden_size[, num_layers, layout, …]) It apppes a multi-layer gated recurrent unit (GRU) RNN to a given input sequence.
GRUCell(hidden_size[, …]) It is used for Gated Rectified Unit (GRU) network cell.
HybridRecurrentCell([prefix, params]) This method supports hybridize.
HybridSequentialRNNCell([prefix, params]) With the help of this method we can sequentially stack multiple HybridRNN cells.
LSTM(hidden_size[, num_layers, layout, …])0 It apppes a multi-layer long short-term memory (LSTM) RNN to a given input sequence.
LSTMCell(hidden_size[, …]) It is used for Long-Short Term Memory (LSTM) network cell.
ModifierCell(base_cell) It is the Base class for modifier cells.
RNN(hidden_size[, num_layers, activation, …]) It apppes a multi-layer Elman RNN with tanh or ReLU non-pnearity to a given input sequence.
RNNCell(hidden_size[, activation, …]) It is used for Elman RNN recurrent neural network cell.
RecurrentCell([prefix, params]) It represents the abstract base class for RNN cells.
SequentialRNNCell([prefix, params]) With the help of this method we can sequentially stack multiple RNN cells.
ZoneoutCell(base_cell[, zoneout_outputs, …]) This method apppes Zoneout on the base cell.

Implementation Examples

In the example below, we are going to use GRU() which apppes a multi-layer gated recurrent unit (GRU) RNN to a given input sequence.


layer = mx.gluon.rnn.GRU(100, 3)
layer.initiapze()
input_seq = mx.nd.random.uniform(shape=(5, 3, 10))
out_seq = layer(input_seq)
h0 = mx.nd.random.uniform(shape=(3, 3, 100))
out_seq, hn = layer(input_seq, h0)
out_seq

Output

This produces the following output −


[[[ 1.50152072e-01 5.19012511e-01 1.02390535e-01 ... 4.35803324e-01
1.30406499e-01 3.30152437e-02]
[ 2.91542172e-01 1.02243155e-01 1.73325196e-01 ... 5.65296151e-02
1.76546033e-02 1.66693389e-01]
[ 2.22257316e-01 3.76294643e-01 2.11277917e-01 ... 2.28903517e-01
3.43954474e-01 1.52770668e-01]]


[[ 1.40634328e-01 2.93247789e-01 5.50393537e-02 ... 2.30207980e-01
6.61415309e-02 2.70989928e-02]
[ 1.11081995e-01 7.20834285e-02 1.08342394e-01 ... 2.28330195e-02
6.79589901e-03 1.25501186e-01]
[ 1.15944080e-01 2.41565228e-01 1.18612610e-01 ... 1.14908054e-01
1.61080107e-01 1.15969211e-01]]
………………………….

Example


hn

Output

This produces the following output −


[[[-6.08105101e-02 3.86217088e-02   6.64453954e-03 8.18805695e-02
3.85607071e-02 -1.36945639e-02 7.45836645e-03 -5.46515081e-03
9.49622393e-02 6.39371723e-02 -6.37890724e-03 3.82240303e-02
9.11015049e-02 -2.01375950e-02 -7.29381144e-02 6.93765879e-02
2.71829776e-02 -6.64435029e-02 -8.45306814e-02 -1.03075653e-01
6.72040805e-02 -7.06537142e-02 -3.93818803e-02 5.16211614e-03
-4.79770005e-02 1.10734522e-01 1.56721435e-02 -6.93409378e-03
1.16915874e-01 -7.95962065e-02 -3.06530762e-02 8.42394680e-02
7.60370195e-02 2.17055440e-01 9.85361822e-03 1.16660878e-01
4.08297703e-02 1.24978097e-02 8.25245082e-02 2.28673983e-02
-7.88266212e-02 -8.04114193e-02 9.28791538e-02 -5.70827350e-03
-4.46166918e-02 -6.41122833e-02 1.80885363e-02 -2.37745279e-03
4.37298454e-02 1.28888980e-01 -3.07202265e-02 2.50503756e-02
4.00907174e-02 3.37077095e-03 -1.78839862e-02 8.90695080e-02
6.30150884e-02 1.11416787e-01 2.12221760e-02 -1.13236710e-01
5.39616570e-02 7.80710578e-02 -2.28817668e-02 1.92073174e-02
………………………….

In the example below we are going to use LSTM() which apppes a long-short term memory (LSTM) RNN to a given input sequence.


layer = mx.gluon.rnn.LSTM(100, 3)
layer.initiapze()

input_seq = mx.nd.random.uniform(shape=(5, 3, 10))
out_seq = layer(input_seq)
h0 = mx.nd.random.uniform(shape=(3, 3, 100))
c0 = mx.nd.random.uniform(shape=(3, 3, 100))
out_seq, hn = layer(input_seq,[h0,c0])
out_seq

Output

The output is mentioned below −


[[[ 9.00025964e-02 3.96071747e-02 1.83841765e-01 ... 3.95872220e-02
1.25569820e-01 2.15555862e-01]
[ 1.55962542e-01 -3.10300849e-02 1.76772922e-01 ... 1.92474753e-01
2.30574399e-01 2.81707942e-02]
[ 7.83204585e-02 6.53361529e-03 1.27262697e-01 ... 9.97719541e-02
1.28254429e-01 7.55299702e-02]]
[[ 4.41036932e-02 1.35250352e-02 9.87644792e-02 ... 5.89378644e-03
5.23949116e-02 1.00922674e-01]
[ 8.59075040e-02 -1.67027581e-02 9.69351009e-02 ... 1.17763653e-01
9.71239135e-02 2.25218050e-02]
[ 4.34580036e-02 7.62207608e-04 6.37005866e-02 ... 6.14888743e-02
5.96345589e-02 4.72368896e-02]]
……………

Example


hn

Output

When you run the code, you will see the following output −


[
[[[ 2.21408084e-02 1.42750628e-02 9.53067932e-03 -1.22849066e-02
1.78788435e-02 5.99269159e-02 5.65306023e-02 6.42553642e-02
6.56616641e-03 9.80876666e-03 -1.15729487e-02 5.98640442e-02
-7.21173314e-03 -2.78371759e-02 -1.90690923e-02 2.21447181e-02
8.38765781e-03 -1.38521893e-02 -9.06938594e-03 1.21346042e-02
6.06449470e-02 -3.77471633e-02 5.65885007e-02 6.63008019e-02
-7.34188128e-03 6.46054149e-02 3.19911093e-02 4.11194898e-02
4.43960279e-02 4.92892228e-02 1.74766723e-02 3.40303481e-02
-5.23341820e-03 2.68163737e-02 -9.43402853e-03 -4.11836170e-02
1.55221792e-02 -5.05655073e-02 4.24557598e-03 -3.40388380e-02
……………………

Training Modules

The training modules in Gluon are as follows −

gluon.loss

In mxnet.gluon.loss module, Gluon provides pre-defined loss function. Basically, it has the losses for training neural network. That is the reason it is called the training module.

Methods and their parameters

Following are some of the important methods and their parameters covered by mxnet.gluon.loss training module:

Methods and its Parameters Definition
Loss(weight, batch_axis, **kwargs) This acts as the base class for loss.
L2Loss([weight, batch_axis]) It calculates the mean squared error (MSE) between label and prediction(pred).
L1Loss([weight, batch_axis]) It calculates the mean absolute error (MAE) between label and pred.
SigmoidBinaryCrossEntropyLoss([…]) This method is used for the cross-entropy loss for binary classification.
SigmoidBCELoss This method is used for the cross-entropy loss for binary classification.
SoftmaxCrossEntropyLoss([axis, …]) It computes the softmax cross-entropy loss (CEL).
SoftmaxCELoss It also computes the softmax cross entropy loss.
KLDivLoss([from_logits, axis, weight, …]) It is used for the Kullback-Leibler spanergence loss.
CTCLoss([layout, label_layout, weight]) It is used for connectionist Temporal Classification Loss (TCL).
HuberLoss([rho, weight, batch_axis]) It calculates smoothed L1 loss. The smoothed L1 loss will be equal to L1 loss if absolute error exceeds rho but is equal to L2 loss otherwise.
HingeLoss([margin, weight, batch_axis]) This method calculates the hinge loss function often used in SVMs:
SquaredHingeLoss([margin, weight, batch_axis]) This method calculates the soft-margin loss function used in SVMs:
LogisticLoss([weight, batch_axis, label_format]) This method calculates the logistic loss.
TripletLoss([margin, weight, batch_axis]) This method calculates triplet loss given three input tensors and a positive margin.
PoissonNLLLoss([weight, from_logits, …]) The function calculates the Negative Log pkephood loss.
CosineEmbeddingLoss([weight, batch_axis, margin]) The function computes the cosine distance between the vectors.
SDMLLoss([smoothing_parameter, weight, …]) This method calculates Batchwise Smoothed Deep Metric Learning (SDML) Loss given two input tensors and a smoothing weight SDM Loss. It learns similarity between paired samples by using unpaired samples in the minibatch as potential negative examples.

Example

As we know that mxnet.gluon.loss.loss will calculate the MSE(Mean Squared Error) between label and prediction (pred). It is done with the help of following formula:

Mean Squared Error

gluon.parameter

mxnet.gluon.parameter is a container that holds the parameters i.e. weights of the Blocks.

Methods and their parameters

Following are some of the important methods and their parameters covered by mxnet.gluon.parameter training module −

Methods and its Parameters Definition
cast(dtype) This method will cast data and gradient of this Parameter to a new data type.
data([ctx]) This method will return a copy of this parameter on one context.
grad([ctx]) This method will return a gradient buffer for this parameter on one context.
initiapze([init, ctx, default_init, …]) This method will initiapze parameter and gradient arrays.
pst_ctx() This method will return a pst of contexts this parameter is initiapzed on.
pst_data() This method will return copies of this parameter on all contexts. It will be done in the same order as creation.
pst_grad() This method will return gradient buffers on all contexts. This will be done in the same order as values().
pst_row_sparse_data(row_id) This method will return copies of the ‘row_sparse’ parameter on all contexts. This will be done in the same order as creation.
reset_ctx(ctx) This method will re-assign Parameter to other contexts.
row_sparse_data(row_id) This method will return a copy of the ‘row_sparse’ parameter on the same context as row_id’s.
set_data(data) This method will set this parameter’s value on all contexts.
var() This method will return a symbol representing this parameter.
zero_grad() This method will set the gradient buffer on all contexts to 0.

Implementation Example

In the example below, we will initiapze parameters and the gradients arrays by using initiapze() method as follows −


weight = mx.gluon.Parameter( weight , shape=(2, 2))
weight.initiapze(ctx=mx.cpu(0))
weight.data()

Output

The output is mentioned below −


[[-0.0256899 0.06511251]
[-0.00243821 -0.00123186]]
<NDArray 2x2 @cpu(0)>

Example


weight.grad()

Output

The output is given below −


[[0. 0.]
[0. 0.]]
<NDArray 2x2 @cpu(0)>

Example


weight.initiapze(ctx=[mx.gpu(0), mx.gpu(1)])
weight.data(mx.gpu(0))

Output

You will see the following output −


[[-0.00873779 -0.02834515]
 [ 0.05484822 -0.06206018]]
<NDArray 2x2 @gpu(0)>

Example


weight.data(mx.gpu(1))

Output

When you execute the above code, you should see the following output −


[[-0.00873779 -0.02834515]
 [ 0.05484822 -0.06206018]]
<NDArray 2x2 @gpu(1)>

gluon.trainer

mxnet.gluon.trainer apppes an Optimizer on a set of parameters. It should be used together with autograd.

Methods and their parameters

Following are some of the important methods and their parameters covered by mxnet.gluon.trainer training module −

Methods and its Parameters Definition
allreduce_grads() This method will reduce the gradients from different contexts for each parameter (weight).
load_states(fname) As name imppes, this method will load trainer states.
save_states(fname) As name imppes, this method will save trainer states.
set_learning_rate(lr) This method will set a new learning rate of the optimizer.
step(batch_size[, ignore_stale_grad]) This method will make one step of parameter update. It should be called after autograd.backward() and outside of record() scope.
update(batch_size[, ignore_stale_grad]) This method will also make one step of parameter update. It should be called after autograd.backward() and outside of record() scope and after trainer.update().

Data Modules

The data modules of Gluon are explained below −

gluon.data

Gluon provides a large number of build-in dataset utipties in gluon.data module. That is the reason it is called the data module.

Classes and their parameters

Following are some of the important methods and their parameters covered by mxnet.gluon.data core module. These methods are typically related to Datasets, Samppng, and DataLoader.

Dataset
Methods and its Parameters Definition
ArrayDataset(*args) This method represents a dataset which combines two or more than two dataset-pke objects. For example, Datasets, psts, arrays, etc.
BatchSampler(sampler, batch_size[, last_batch]) This method wraps over another Sampler. Once wrapped it returns the mini batches of samples.
DataLoader(dataset[, batch_size, shuffle, …]) Similar to BatchSampler but this method loads data from a dataset. Once loaded it returns the mini batches of data.
This represents the abstract dataset class.
FilterSampler(fn, dataset) This method represents the samples elements from a Dataset for which fn (function) returns True.
RandomSampler(length) This method represents samples elements from [0, length) randomly without replacement.
RecordFileDataset(filename) It represents a dataset wrapping over a RecordIO file. The extension of the file is .rec.
Sampler This is the base class for samplers.
SequentialSampler(length[, start]) It represents the sample elements from the set [start, start+length) sequentially.
It represents the sample elements from the set [start, start+length) sequentially. This represents the simple Dataset wrapper especially for psts and arrays.

Implementation Examples

In the example below, we are going to use gluon.data.BatchSampler() API, which wraps over another sampler. It returns the mini batches of samples.


import mxnet as mx
from mxnet.gluon import data
sampler = mx.gluon.data.SequentialSampler(15)
batch_sampler = mx.gluon.data.BatchSampler(sampler, 4,  keep )
pst(batch_sampler)

Output

The output is mentioned below −


[[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14]]

gluon.data.vision.datasets

Gluon provides a large number of pre-defined vision dataset functions in gluon.data.vision.datasets module.

Classes and their parameters

MXNet provides us useful and important datasets, whose classes and parameters are given below −

Classes and its Parameters Definition
MNIST([root, train, transform]) This is a useful dataset providing us the handwritten digits. The url for MNIST dataset is http://yann.lecun.com/exdb/mnist
FashionMNIST([root, train, transform]) This dataset consists of Zalando’s article images consisting of fashion products. It is a drop-in replacement of original MNIST dataset. You can get this dataset from https://github.com/zalandoresearch/fashion-mnist
CIFAR10([root, train, transform]) This is an image classification dataset from https://www.cs.toronto.edu/~kriz/cifar.html. In this dataset each sample is an image with shape (32, 32, 3).
CIFAR100([root, fine_label, train, transform]) This is CIFAR100 image classification dataset from https://www.cs.toronto.edu/~kriz/cifar.html. It also has each sample is an image with shape (32, 32, 3).
ImageRecordDataset (filename[, flag, transform]) This dataset is wrapping over a RecordIO file that contains images. In this each sample is an image with its corresponding label.
ImageFolderDataset (root[, flag, transform]) This is a dataset for loading image files that are stored in a folder structure.
ImageListDataset ([root, imgpst, flag]) This is a dataset for loading image files that are specified by a pst of entries.

Example

In the example below, we are going to show the use of ImageListDataset(), which is used for loading image files that are specified by a pst of entries −


# written to text file *.lst

0 0 root/cat/0001.jpg
1 0 root/cat/xxxa.jpg
2 0 root/cat/yyyb.jpg
3 1 root/dog/123.jpg
4 1 root/dog/023.jpg
5 1 root/dog/wwww.jpg

# A pure pst, each item is a pst [imagelabel: float or pst of float, imgpath]

[[0, root/cat/0001.jpg]
[0, root/cat/xxxa.jpg]
[0, root/cat/yyyb.jpg]
[1, root/dog/123.jpg]
[1, root/dog/023.jpg]
[1, root/dog/wwww.jpg]]

Utipty Modules

The utipty modules in Gluon are as follows −

gluon.utils

Gluon provides a large number of build-in parallepsation utipty optimiser in gluon.utils module. It provides variety of utipties for training. That is the reason it is called the utipty module.

Functions and their parameters

Following are the functions and their parameters consisting in this utipty module named gluon.utils −

Functions and its Parameters Definition
sppt_data(data, num_spce[, batch_axis, …]) This function is usually use for data parallepsm and each spce is sent to one device i.e. GPU. It sppts an NDArray into num_spce spces along batch_axis.
sppt_and_load(data, ctx_pst[, batch_axis, …]) This function sppts an NDArray into len(ctx_pst) spces along batch_axis. The only difference from above sppt_data () function is that, it also loads each spce to one context in ctx_pst.
cpp_global_norm(arrays, max_norm[, …]) The job of this function is to rescale NDArrays in such a way that the sum of their 2-norm is smaller than max_norm.
check_sha1(filename, sha1_hash) This function will check whether the sha1 hash of the file content matches the expected hash or not.
download(url[, path, overwrite, sha1_hash, …]) As name specifies, this function will download a given URL.
replace_file(src, dst) This function will implement atomic os.replace. it will be done with Linux and OSX.
Advertisements