Large Language Models In Generative Artificial Intelligence

Claims and assertions that LLMs will become sentient and pass the Turing test seem far-fetched. That said, they are being trained and fine-tuned to get better by the day


ChatGPT and Bard are the talk of the town these days. Both are conversational Artificial Intelligence (AI) bots that can respond to and communicate with humans to provide comprehensive information in response to queries. They can act as virtual assistants, accomplishing tasks like organising meetings and replying to messages and emails. They can also create content like blogs and scripts, generate and debug code, and perform the role of a sophisticated search engine with intelligent responses. The question is, what is under the hood of these fascinating tools? It is LLMs, or Large Language Models.

How do LLMs work

LLMs are designed to understand natural language. These models are trained on extremely large datasets and have millions of parameters, in some cases even billions. Simply put, an LLM learns from vast amounts of text data, which tools like ChatGPT and Bard draw on to generate a response to the query input by a user. This data comes from a variety of sources like books, magazine articles, Wikipedia and social media. The more the parameters, the deeper the knowledge of the model and the better the responses one can get from it.

LLMs use transformer neural networks, a type of deep learning architecture. In fact, GPT in ChatGPT stands for generative pre-trained transformer. The transformer processes text as a sequence of tokens, spots patterns in the phrases to understand context, and predicts the next word based on the context as it comprehends it. It then goes on predicting the next word in the sequence until the response is complete. Thus, LLMs can also be called next-word prediction engines.
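To make the idea concrete, here is a toy sketch of next-word prediction in Python. The context, candidate words and scores below are invented for illustration; a real LLM computes such scores with billions of learned parameters.

```python
# A toy illustration of next-word prediction, the core task behind LLMs.
# NOTE: the candidate words and logits are assumed values for illustration,
# not the output of a real model.
import math

def softmax(scores):
    """Convert raw scores (logits) into a probability distribution."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

context = "The capital of France is"
candidates = ["Paris", "London", "big", "the"]
logits = [6.0, 2.5, 1.0, 0.5]  # hypothetical scores after reading the context

for word, p in sorted(zip(candidates, softmax(logits)), key=lambda x: -x[1]):
    print(f"{word!r}: {p:.2%}")

# An LLM picks (or samples) a probable next word, appends it to the context,
# and repeats until the response is complete.
```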

How are LLMs trained

LLM training is spread over three phases – 

In the pre-training stage, the model absorbs the grammatical rules of the language from the large volume of text data obtained from various sources. It learns the meanings of individual words and how those words are arranged to build sentences. During this period, it also picks up common word patterns.

The next step is fine-tuning the model, which helps it identify concepts with a higher degree of accuracy, with performance checked against a test dataset. The process is repeated, with various adjustments, until a satisfactory level of performance is reached.

In the final stage, the trained model can be put to use and will start generating responses based on the training imparted to it. It can also continue to improve through feedback on its responses.
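For readers curious what this looks like in code, below is a deliberately tiny sketch of the next-token training loop in Python with PyTorch. The model, data and hyperparameters are toy assumptions; real pre-training runs over terabytes of text on thousands of GPUs, and fine-tuning reuses the same loop with a smaller, task-specific dataset.

```python
# A minimal sketch of the train-on-next-token pattern, not a real LLM.
import torch
import torch.nn as nn

vocab_size, embed_dim = 100, 32

# A deliberately tiny "language model": an embedding plus a linear head
# that scores every word in the vocabulary as the possible next token.
model = nn.Sequential(
    nn.Embedding(vocab_size, embed_dim),
    nn.Linear(embed_dim, vocab_size),
)
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

# A fake token stream standing in for the training corpus.
tokens = torch.randint(0, vocab_size, (1000,))
inputs, targets = tokens[:-1], tokens[1:]  # each token predicts the next

# The same loop serves both stages: pre-training runs it over a huge general
# corpus; fine-tuning repeats it over a smaller, task-specific dataset.
for step in range(200):
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()
    optimizer.step()

print(f"final training loss: {loss.item():.3f}")
```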

Some of the well-known LLMs

GPT-3.5: developed by OpenAI, it is the engine that fuels ChatGPT. It has around 175 billion parameters.

LaMDA: short for Language Model for Dialogue Applications, it runs Google’s Bard AI. Its advanced version, LaMDA 2, incorporates Google's Pathways Language Model (PaLM), which has 540 billion parameters.

LLaMA: developed by Meta AI, it is part of the company's commitment to open science and democratising access to LLMs. Its versions range from 7 billion to 65 billion parameters.

WuDao 2.0: known as Enlightenment or Road to Awareness, it comes from the Beijing Academy of Artificial Intelligence. It is the largest model in existence, with 1.75 trillion parameters, and has been trained on 4.9 terabytes of high-quality text and image data.

PaLM 2: this is Google’s latest release. While Google hasn’t disclosed the number of parameters in this model, it says the model has been trained on multilingual text spanning over 100 languages.

Uses of LLMs

LLMs can be used for a variety of applications, some of them being –

  • Translation: from one language to another
  • Conversational AI: chatbots and other conversational AI platforms that can engage with users in natural language
  • Search: improve quality of search results and give more meaningful responses
  • Code generation: generate and debug code
  • Content creation: generate news articles, summaries, headlines
  • Sentiment analysis: analyse text and categorise it into positive, negative or neutral, helping organisations gauge customer sentiment (see the sketch below)
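As a concrete taste of the last item, here is one way to try sentiment analysis with the open-source Hugging Face transformers library. The library choice and sample reviews are assumptions for illustration; the article does not prescribe a specific tool.

```python
# Sentiment analysis with a pre-trained model via Hugging Face transformers.
# Setup (assumed): pip install transformers torch
from transformers import pipeline

# Downloads a small pre-trained sentiment model on first use.
classifier = pipeline("sentiment-analysis")

reviews = [  # hypothetical customer feedback
    "The delivery was fast and the product works perfectly.",
    "Support never replied to my emails.",
]
for review, result in zip(reviews, classifier(reviews)):
    print(f"{result['label']} ({result['score']:.2f}) -> {review}")
```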

Benefits and limitations

Like any other technology advancement, LLMs bring with them benefits as well as challenges. The most obvious benefit of using an LLM is the speed with which it can process extremely large volumes of data and create a response, whether answering a search query or generating content or code. In addition to speed, there is a great deal of accuracy: while there are occasional blips, the results produced are by and large accurate and improving rapidly. LLMs, by their very nature, are highly flexible and versatile, and can be customised to a business use case as required by the organisation using them.

The key challenge is the training and operational cost of running them, because of the amount of data and computing power needed, which can run to millions of GPU hours. While LLMs are trained extensively for contextual understanding, they don’t always get it right, which leads to wrong or sometimes downright inappropriate responses. Bias is another risk: any bias in the data on which they are trained can percolate down to the responses they provide.

Future of LLMs

Claims and assertions that LLMs will become sentient and pass the Turing test seem far-fetched. That said, they are being trained and fine-tuned to get better by the day.

The phenomenal pace at which ChatGPT has been adopted clearly demonstrates that LLM technology is here to stay and grow exponentially. According to available data, ChatGPT had over 100 million visitors in the first three months after its launch and now has around 13 million daily users. The estimated reach of Google Bard is 1 billion users, which in effect means one out of every eight people worldwide would use it. The stats speak for themselves; need one say more?

Jayesh Shah