How to Build a Chat Interface using Gradio & Vultr Cloud GPU SitePoint – SitePoint

Posted: February 26, 2024 at 12:17 am

This article was created in partnership with Vultr. Thank you for supporting the partners who make SitePoint possible.

Gradio is a Python library that simplifies the process of deploying and sharing machine learning models by providing a user-friendly interface that requires minimal code. You can use it to create customizable interfaces and share them conveniently using a public link for other users.

In this guide, youll be creating a web interface where you can interact with the Mistral 7B large language model through the input field and see model outputs displayed in real time on the interface.

On the deployed instance, you need to install some packages for creating a Gradio application. However, you dont need to install packages like the NVIDIA CUDA Toolkit, cuDNN, and PyTorch, as they come pre-installed on the Vultr GPU Stack instances.

Follow the next steps for populating this file.

The above code snippet imports all the required modules in the namespace for inferring the Mistral 7B large language model and launching a Gradio chat interface.

The above code snippet initializes model, tokenizer and enable CUDA processing.

The above code snippets inherits a new class named StopOnTokens from the StoppingCriteria class.

The above code snippet defines variables for StopOnToken() object and storing the conversation history. It formats the history by pairing each of the message with its response and providing tags to determine whether it is from a human or a bot.

The code snippet in the next step is to be pasted inside the predict() function as well.

The streamer requests for new tokens from the model and receives them one by one ensuring a continuous flow of text output.

You can adjust the model parameters such as max_new_tokens, top_p, top_k, and temperature to manipulate the model response. To know more about these parameters you can refer to How to Use TII Falcon Large Language Model on Vultr Cloud GPU.

Gradio uses the port 7860 by default.

Executing the application for the first time can take additional time for downloading the checkpoints for the Mistral 7B large language model and loading it on to the GPU. This procedure may take anywhere from 5 mins to 10 mins depending on your hardware, internet connectivity and so on.

Once it executes, you can access the Gradio chat interface via your web browser by navigating to:

The expected output is shown below.

In this guide, you used Gradio to build a chat interface and infer the Mistral 7B model by Mistral AI using Vultr GPU Stack.

This is a sponsored article by Vultr. Vultr is the worlds largest privately-held cloud computing platform. A favorite with developers, Vultr has served over 1.5 million customers across 185 countries with flexible, scalable, global Cloud Compute, Cloud GPU, Bare Metal, and Cloud Storage solutions. Learn more about Vultr.

See the article here:

How to Build a Chat Interface using Gradio & Vultr Cloud GPU SitePoint - SitePoint

Related Posts