Speed-up hyperparameter tuning in deep learning with Keras hyperband tuner – Analytics India Magazine

The performance of machine learning algorithms depends heavily on selecting a good set of hyperparameters. The Keras Tuner is a package that assists you in finding the best set of hyperparameters for your application, and the process of searching for that optimal set is known as hyperparameter tuning. Hyperband is a framework that speeds up this tuning process. This article focuses on understanding the Hyperband framework.

Hyperparameters are not model parameters and cannot be learned directly from the data. Model parameters are what we learn during training when we optimize a loss function with something like gradient descent. Let's talk about Hyperband and try to understand why it was created.
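For example, in a Keras model the learning rate and the layer width are hyperparameters fixed before training, while the layer weights are the parameters learned from data by gradient descent. A minimal sketch (the values below are purely illustrative):

```python
import tensorflow as tf

learning_rate = 1e-3   # hyperparameter: chosen before training
hidden_units = 64      # hyperparameter: chosen before training

model = tf.keras.Sequential([
    tf.keras.layers.Dense(hidden_units, activation="relu"),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate), loss="mse")

# model.fit(x_train, y_train) would learn the model parameters (the weights)
# by gradient descent; the hyperparameters above stay fixed throughout training.
```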

The process of tuning the hyperparameters of machine learning algorithms is known as hyperparameter optimization (HPO). Powerful machine learning algorithms have many diverse and complicated hyperparameters that produce a massive search space, and the search space for deep learning methods, which underpin many modern applications, is considerably broader than for typical ML algorithms. Tuning over such a large search space is difficult: manual approaches do not scale, so data-driven strategies are needed to tackle HPO.


By framing hyperparameter optimization as a pure-exploration adaptive resource allocation problem, that of deciding how to distribute resources among randomly sampled hyperparameter configurations, a novel configuration evaluation technique was devised: Hyperband. It allocates resources with a principled early-stopping scheme, allowing it to evaluate orders of magnitude more configurations than black-box procedures such as Bayesian optimization methods. Unlike previous configuration evaluation approaches, Hyperband is a general-purpose tool that makes few assumptions.

In their theoretical analysis, the developers showed that Hyperband can adapt to unknown convergence rates and to the behaviour of validation losses as a function of the hyperparameters. Furthermore, on a range of deep-learning and kernel-based learning problems, Hyperband is 5 to 30 times faster than typical Bayesian optimization techniques. In the non-stochastic setting, Hyperband can be viewed as a solution to a pure-exploration, infinite-armed bandit problem.
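The Keras Tuner exposes this algorithm through its Hyperband tuner. Below is a minimal sketch of how it is typically used, assuming a simple MNIST classifier; the search space (units, learning_rate) and the settings max_epochs=27 and factor=3 are illustrative choices, not requirements:

```python
import keras_tuner
import tensorflow as tf

def build_model(hp):
    # Hyperparameters to tune: layer width and learning rate.
    units = hp.Int("units", min_value=32, max_value=256, step=32)
    lr = hp.Choice("learning_rate", values=[1e-2, 1e-3, 1e-4])
    model = tf.keras.Sequential([
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(units, activation="relu"),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(lr),
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

tuner = keras_tuner.Hyperband(
    build_model,
    objective="val_accuracy",
    max_epochs=27,   # R: the largest resource (epochs) any configuration receives
    factor=3,        # eta: configurations are reduced by this factor each round
    directory="hyperband_demo",
    project_name="mnist",
)

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0
tuner.search(x_train, y_train, validation_data=(x_test, y_test))
best_hp = tuner.get_best_hyperparameters(num_trials=1)[0]
```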

Hyperparameters are inputs to a machine learning algorithm that govern how well the algorithm generalizes to unseen data. Because the number of tuning parameters associated with these models keeps growing, they are difficult to set with standard optimization techniques.

In an effort to develop more efficient search methods, Bayesian optimization approaches that focus on optimizing hyperparameter configuration selection have lately dominated the subject of hyperparameter optimization. By picking configurations in an adaptive way, these approaches seek to discover good configurations faster than typical baselines such as random search. These approaches, however, address the fundamentally difficult problem of fitting and optimizing a high-dimensional, non-convex function with uncertain smoothness and perhaps noisy evaluations.

The goal of an orthogonal approach to hyperparameter optimization is to accelerate configuration evaluation. These methods are computationally adaptive, providing greater resources to promising hyperparameter combinations while swiftly removing bad ones. The size of the training set, the number of features, or the number of iterations for iterative algorithms are all examples of resources.

These techniques seek to evaluate orders of magnitude more hyperparameter configurations than approaches that uniformly train all configurations to completion, and hence to discover good hyperparameters quickly. Hyperband is designed to accelerate random search in this way, providing a simple and theoretically sound starting point.
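For contrast, plain random search, the baseline Hyperband accelerates, trains every sampled configuration on the full budget. A sketch under assumed, hypothetical helpers sample_config() and train_and_eval(config, resource), where the latter returns a validation loss:

```python
def random_search(sample_config, train_and_eval, n_configs=20, budget=81):
    # Every sampled configuration is trained to completion on the full budget;
    # no configuration is stopped early.
    results = []
    for _ in range(n_configs):
        config = sample_config()
        loss = train_and_eval(config, budget)
        results.append((loss, config))
    return min(results, key=lambda t: t[0])  # best (loss, config) pair
```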

Hyperband uses the SuccessiveHalving technique, originally introduced for hyperparameter optimization, as a subroutine and extends it. The original SuccessiveHalving method is named after the idea behind it: uniformly allocate a budget to a set of hyperparameter configurations, evaluate the performance of all configurations, discard the worst half, and repeat until only one configuration remains. In this way, more promising configurations receive exponentially more resources.
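A minimal sketch of SuccessiveHalving as just described, again assuming hypothetical helpers sample_config() and train_and_eval(config, resource) that return a validation loss (lower is better):

```python
def successive_halving(sample_config, train_and_eval, n=16, min_resource=1, eta=2):
    configs = [sample_config() for _ in range(n)]
    resource = min_resource
    while len(configs) > 1:
        # Evaluate every surviving configuration on the current resource.
        losses = [train_and_eval(c, resource) for c in configs]
        # Keep the best 1/eta fraction (the best half when eta=2), discard the rest.
        ranked = sorted(zip(losses, configs), key=lambda t: t[0])
        configs = [c for _, c in ranked[: max(1, len(configs) // eta)]]
        # Survivors receive eta times more resource in the next round.
        resource *= eta
    return configs[0]
```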

The Hyperband algorithm is made up of two parts: an inner loop that runs SuccessiveHalving with a fixed number of configurations and a fixed minimum resource per configuration, and an outer loop that varies those values across brackets.

Each run of SuccessiveHalving within Hyperband is referred to as a bracket. Each bracket is intended to consume a portion of the total resource budget B and corresponds to a distinct tradeoff between the number of configurations n and the average budget per configuration B/n. As a result, a single Hyperband execution has a finite budget. Hyperband requires two inputs: R, the maximum amount of resource that can be allocated to a single configuration, and a factor η that controls the proportion of configurations discarded in each round of SuccessiveHalving.

These two inputs determine how many distinct brackets are examined, each with a different number of configurations. Hyperband starts with the most aggressive bracket, which sets the number of configurations to maximize exploration while requiring that at least one configuration is allotted R resources. Each consecutive bracket decreases the number of configurations by a factor of η until the last bracket, which allocates the full R resources to every configuration. As a result, Hyperband performs a geometric search over the average budget per configuration, removing the need to pick the number of configurations for a fixed budget, at the cost of running several brackets instead of one.
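To make the bracket layout concrete, here is a small sketch of the schedule the two inputs induce, using the illustrative values R = 81 and η = 3 from the Hyperband paper:

```python
import math

def hyperband_brackets(R=81, eta=3):
    # s_max indexes the most aggressive bracket: the largest s with eta**s <= R.
    s_max = 0
    while eta ** (s_max + 1) <= R:
        s_max += 1
    B = (s_max + 1) * R   # budget assigned to each bracket
    brackets = []
    for s in range(s_max, -1, -1):
        n = math.ceil((B / R) * (eta ** s) / (s + 1))  # configurations in this bracket
        r = R / (eta ** s)                             # initial resource per configuration
        brackets.append((s, n, r))
    return brackets

# For R=81, eta=3 this gives:
#   s=4: n=81, r=1    (maximum exploration)
#   s=3: n=34, r=3
#   s=2: n=15, r=9
#   s=1: n=8,  r=27
#   s=0: n=5,  r=81   (every configuration gets the full R, i.e. plain random search)
print(hyperband_brackets())
```

The most aggressive bracket spreads its budget over many configurations with little resource each, while the last bracket trains only a handful of configurations but gives each one the full R.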

Since the arms are independent and sampled at random, Hyperband can be parallelized. The simplest parallelization approach is to distribute individual SuccessiveHalving brackets to separate machines. With this article, we have understood the bandit-based hyperparameter tuning algorithm Hyperband and how it differs from Bayesian optimization.
