What Is Weight Sharing In Deep Learning And Why Is It Important

Neural architecture search (NAS) deals with the selection of neural network models for specific learning problems. NAS, however, is computationally expensive, which limits its usefulness for automating and democratising machine learning. The initial success of NAS was attributed partly to the weight-sharing method, which dramatically accelerated the evaluation of candidate architectures. So why is weight sharing now being criticised?

Traditionally, NAS methods were expensive because the search space is combinatorially large, requiring thousands of neural networks to be trained to completion. In 2018, the ENAS (Efficient Neural Architecture Search) paper introduced the idea of weight sharing, in which only one shared set of model parameters is trained for all architectures.

These shared weights are used to compute the validation losses of different architectures, which then serve as estimates of their standalone validation losses, i.e. the losses they would achieve if trained from scratch. Since only one set of parameters has to be trained, weight sharing led to a massive speedup over earlier methods, reducing search time on CIFAR-10 from 2,000-20,000 GPU-hours to just 16.
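The sketch below illustrates the core idea with a toy example. It is not the actual ENAS implementation: the class name TinySupernet, the three candidate operations, and the data are all invented for illustration. The point is that one shared parameter set covers every architecture in the search space, and each architecture's quality is estimated with those shared weights instead of training it from scratch.

```python
# A minimal sketch of the weight-sharing idea (not the ENAS implementation):
# one shared set of parameters covers every architecture in a tiny search space,
# and the shared weights are used to estimate each architecture's validation loss.
import torch
import torch.nn as nn

class TinySupernet(nn.Module):
    """One shared parameter set; an 'architecture' picks one op per layer."""
    def __init__(self, dim=32, n_layers=2):
        super().__init__()
        # Each layer owns a small set of candidate operations with shared weights.
        self.layers = nn.ModuleList([
            nn.ModuleList([
                nn.Linear(dim, dim),                            # op 0: linear
                nn.Sequential(nn.Linear(dim, dim), nn.ReLU()),  # op 1: linear + ReLU
                nn.Identity(),                                  # op 2: skip connection
            ])
            for _ in range(n_layers)
        ])
        self.head = nn.Linear(dim, 10)

    def forward(self, x, arch):
        # `arch` is a tuple like (1, 2): one op index per layer.
        for layer_ops, op_idx in zip(self.layers, arch):
            x = layer_ops[op_idx](x)
        return self.head(x)

def shared_weight_val_loss(model, arch, val_x, val_y):
    """Estimate an architecture's quality using only the shared weights."""
    model.eval()
    with torch.no_grad():
        logits = model(val_x, arch)
        return nn.functional.cross_entropy(logits, val_y).item()

# Rank a few candidate architectures without training any of them from scratch.
model = TinySupernet()
val_x, val_y = torch.randn(64, 32), torch.randint(0, 10, (64,))
candidates = [(0, 0), (0, 2), (1, 0), (1, 1)]
ranking = sorted(candidates, key=lambda a: shared_weight_val_loss(model, a, val_x, val_y))
print(ranking)
```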

The validation accuracies computed using shared weights correlated well enough with standalone accuracies to find good models cheaply. However, a correlation that is sufficient in one setting does not mean that weight sharing performs well in general.

The method has come under scrutiny because shared-weight performance can be a poor substitute for full model training, and its results are alleged to be inconsistent on recent benchmarks.

The technique of sharing parameters among child models allowed Efficient NAS to deliver strong empirical performance while using far fewer GPU-hours than existing automatic model design approaches, notably about 1,000x less than standard Neural Architecture Search.

The most popular implementation of shared weights as substitutes for standalone weights is the Random Search with Weight-Sharing (RS-WS) method, in which the shared parameters are optimised by taking gradient steps using architectures sampled uniformly at random from the search space.
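A rough sketch of such a training loop is given below, reusing the toy TinySupernet from the earlier sketch. The function name rs_ws_train, the hyperparameters, and the random data are assumptions for illustration, not the reference RS-WS code: the only essential ingredient is that each gradient step on the shared weights goes through an architecture sampled uniformly at random.

```python
# Sketch of Random Search with Weight-Sharing (RS-WS): sample a uniformly random
# architecture per batch and take a gradient step on the shared weights through it.
import random
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def rs_ws_train(supernet, train_loader, n_layers=2, n_ops=3, epochs=5, lr=0.01):
    """Train the shared weights using uniformly random architectures."""
    opt = torch.optim.SGD(supernet.parameters(), lr=lr, momentum=0.9)
    supernet.train()
    for _ in range(epochs):
        for x, y in train_loader:
            # Uniform random architecture: one op index per layer.
            arch = tuple(random.randrange(n_ops) for _ in range(n_layers))
            opt.zero_grad()
            loss = nn.functional.cross_entropy(supernet(x, arch), y)
            loss.backward()
            opt.step()
    return supernet

# Usage with the TinySupernet sketch above and random toy data:
loader = DataLoader(
    TensorDataset(torch.randn(512, 32), torch.randint(0, 10, (512,))),
    batch_size=64, shuffle=True)
trained = rs_ws_train(TinySupernet(), loader)
# Candidates are then ranked by shared-weight validation accuracy, and only the
# top-ranked architectures are retrained from scratch.
```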

However, practitioners have started to question whether sharing weights between models actually accelerates NAS.

In an attempt to address this issue and make a case for weight sharing, researchers at CMU published a paper listing their findings. The paper states that most criticisms of weight sharing point to rank disorder as a common underlying issue. Rank disorder occurs when the shared-weight performance of architectures does not correlate well with their standalone performance.

Rank disorder is a problem for methods that rely on shared-weight performance to rank architectures for evaluation, as it causes them to ignore networks that achieve high accuracy when their parameters are trained without sharing.


The figure in the paper illustrates the rank-disorder issue: architecture rankings produced with shared weights (right) do not match those produced by weights trained individually from scratch (left).
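One simple way to quantify this mismatch is a rank correlation between the two sets of scores, for example Spearman's rho, as in the sketch below. The accuracy values are hypothetical numbers chosen only to illustrate the calculation.

```python
# Quantifying rank disorder: compare architecture rankings under shared weights
# with rankings from standalone training using Spearman's rank correlation
# (values near 1 mean the rankings agree; low or negative values mean disorder).
from scipy.stats import spearmanr

# Hypothetical accuracies for five architectures (illustrative numbers only).
shared_weight_acc = [0.62, 0.58, 0.71, 0.55, 0.66]   # estimated via the supernet
standalone_acc    = [0.91, 0.93, 0.88, 0.94, 0.90]   # each trained from scratch

rho, p_value = spearmanr(shared_weight_acc, standalone_acc)
print(f"Spearman rank correlation: {rho:.2f}")
# A low or negative rho means the shared-weight ranking would discard
# architectures that are actually strong when trained on their own.
```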

To tackle this, the researchers present a unifying framework for designing and analysing gradient-based NAS methods, one that exploits the underlying problem structure to find high-performance architectures quickly. This geometry-aware framework, the researchers wrote, leads to new algorithms.

The results show that the new framework outperforms the previous best methods on both CIFAR and ImageNet, on both the DARTS search space and NAS-Bench-201.

According to the authors, this work tried to establish a clearer case for weight-sharing methods in NAS.

