Large Language Models - A Primer

April 17, 2023 7 minute read

Two-second Summary

Large Language Models (LLMs) are artificial intelligence systems that can analyze, understand, and generate human language. These models are designed to learn the patterns and structures of natural language by processing vast amounts of text data.

Brief history of LLMs

In 2013, researchers at Google released Word2Vec, an influential neural model for learning word embeddings that capture the semantic relationships between words. This was a major breakthrough in the field, and it paved the way for the development of larger and more complex language models.
In 2018, Google developed BERT, a large pre-trained language model. BERT achieved state-of-the-art results on many NLP benchmarks and has been used for a variety of NLP tasks, including sentiment analysis, named entity recognition, and question answering. The main challenge with BERT is that it is a complex model with hundreds of millions of parameters, so training it requires considerable data and computational power, which makes it costly and time-consuming.
The same year, researchers at OpenAI developed the first GPT (Generative Pre-trained Transformer) model, which was able to generate human-like text and perform a wide range of NLP tasks with high accuracy.

GPT vs BERT

The primary difference between the GPT family and BERT lies in their architectures, training data, and objectives. BERT is pre-trained to fill in masked words using context from both directions, and is then typically fine-tuned for a specific task such as sentiment analysis, named entity recognition, or question answering. This means it can be adapted with a relatively small labelled dataset to perform that task with high accuracy. GPT, on the other hand, is trained on a large corpus of publicly available data to predict the next token, so it is better suited to tasks that require generating coherent and meaningful language, such as holding a conversation or creating content.
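The difference in training objectives can be illustrated with a toy sketch. It assumes the commonly described setup in which a GPT-style model predicts each token from the tokens to its left, while a BERT-style model predicts a hidden token from both sides. The function names and the example sentence are hypothetical, and real models work on subword tokens rather than whole words.

```python
def causal_examples(tokens):
    """GPT-style: predict each token from the tokens to its left only."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


def masked_example(tokens, mask_index, mask_token="[MASK]"):
    """BERT-style: hide one token and predict it from context on both sides."""
    masked = list(tokens)
    target = masked[mask_index]
    masked[mask_index] = mask_token
    return masked, target


tokens = ["the", "cat", "sat", "on", "the", "mat"]

# One (context, target) pair per position after the first.
causal = causal_examples(tokens)

# A single masked position; the model sees words before and after it.
masked, target = masked_example(tokens, 2)

for context, next_token in causal:
    print(context, "->", next_token)
print(masked, "->", target)
```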

ChatGPT

ChatGPT, developed by OpenAI, has gained immense popularity due to its exceptional conversational abilities. It has been trained on a wide range of conversational text, and fine-tuned to excel at tasks such as question answering and dialogue generation. Furthermore, its user-friendly chat interface makes it accessible to people who are not developers, which opens it up to a wide variety of use cases.

One of the most remarkable features of ChatGPT is its ability to generate human-like responses. This is primarily due to its use of reinforcement learning from human feedback (RLHF). With this technique, people rank several responses generated by the initial model, and the model is then trained on those rankings to favour the responses people prefer, resulting in more natural and coherent conversations.
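As a rough illustration of how rankings can become a training signal, the sketch below turns one human ranking into preferred/rejected pairs and scores them with a pairwise ranking loss. This loss is commonly described for reward models in RLHF; whether ChatGPT uses exactly this form is an assumption here, and the function names and reward values are hypothetical.

```python
import math
from itertools import combinations


def ranking_to_pairs(ranked_responses):
    """Turn a best-first ranking into (preferred, rejected) pairs."""
    return list(combinations(ranked_responses, 2))


def pairwise_ranking_loss(reward_preferred, reward_rejected):
    """-log(sigmoid(r_preferred - r_rejected)).

    The loss is small when the preferred response gets the higher reward
    and large when the order is wrong.
    """
    diff = reward_preferred - reward_rejected
    return math.log(1.0 + math.exp(-diff))


# A labeller ranked three responses from best to worst.
pairs = ranking_to_pairs(["response A", "response B", "response C"])

# Hypothetical scores from a reward model that is still being trained.
rewards = {"response A": 1.5, "response B": 0.2, "response C": 0.9}

for preferred, rejected in pairs:
    loss = pairwise_ranking_loss(rewards[preferred], rewards[rejected])
    print(f"{preferred} > {rejected}: loss = {loss:.3f}")
```

In this toy run the pair (response B, response C) gets the largest loss, because the hypothetical reward model scores C above B even though the labeller preferred B.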

Use Cases

For Corporations	For individuals
Chatbots that are more personalised	Text summarization and generation
Integration with existing work applications (e.g. Slack, G-Drive)	Grammar correction
Accelerate content creation and customer personalisation	Explain difficult concepts like I’m 5, or a PhD student
Email classification, summarisation and automated response	Translate text into different languages
Enhance team productivity and creativity, for instance by generating meeting agendas	Write and explain code, and even translate it to another programming language
Create new text-based products	Turn a product description into ad copy
	Integration with 3rd party apps – the possibilities are endless!

Model Architecture of ChatGPT

ChatGPT belongs to the GPT family of language models. Let’s zoom in on GPT-3, a decoder-only transformer whose main parts are an input encoding (embedding) step, attention layers, feed-forward networks, an output decoding step, and a softmax layer. In the steps below, “encoder” and “decoder” simply refer to converting tokens into vectors and back, not to a separate encoder network. To achieve its impressive language generation capabilities, GPT-3 uses causal language modeling. This means that the model predicts the next token in a sequence of tokens, with the constraint that it can only attend to tokens on the left. Here are the steps GPT-3 follows to generate text:

The input sequence for GPT-3 is fixed at 2048 tokens, but shorter sequences can still be used by filling the extra positions with “empty” values.
To encode the input sequence, the encoder first converts each token into a one-hot vector and then compresses it into a smaller dimensional space, called an embedding vector, to save space.
Meanwhile, GPT-3 also encodes the position of each token in the sequence as a vector of the same size as the token embedding.
The position encodings and input embeddings are combined into a single matrix, which is then fed into the attention layers.
In simple terms, the attention layer predicts which input tokens to focus on, and how much, for each output in the sequence. The input matrix is transformed into three separate matrices - queries, keys, and values - and matrix operations among them produce a weighting over the tokens the model is allowed to see.
In GPT-3 this is done in parallel by 96 attention heads, each with its own queries, keys, and values, which is why it is called multi-head attention.
The output of the attention heads is then passed into a feed-forward block, which is a multi-layer perceptron. Attention plus feed-forward forms one layer, and GPT-3 stacks 96 such layers.
The resulting matrix contains, for each of the 2048 output positions in the sequence, a 12288-vector of information about which word should appear. To generate text, this matrix is decoded using a “decoder”, which maps each vector back onto the vocabulary, and a softmax turns the result into probabilities.
When GPT-3 generates text, it doesn’t just provide a single guess for the next word. Instead, it generates a sequence of guesses - one for each of the 2048 “next” positions in the sequence - with each guess being a probability distribution over the possible words.
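The steps above can be sketched in a few lines of plain Python. This is a minimal toy under loud assumptions: the sizes are tiny rather than GPT-3's, all weights are random stand-ins instead of trained parameters, there is a single attention head and a single layer, and the feed-forward block is left out. Every name here is hypothetical and chosen for illustration only.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Turn a list of scores into probabilities that sum to 1."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    """Random stand-in for a trained weight matrix."""
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


def causal_attention(x, w_q, w_k, w_v):
    """Single-head self-attention where position i only sees positions <= i."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d = len(q[0])
    weights = []
    for i, q_i in enumerate(q):
        scores = []
        for j, k_j in enumerate(k):
            if j <= i:
                scores.append(sum(a * b for a, b in zip(q_i, k_j)) / math.sqrt(d))
            else:
                scores.append(float("-inf"))  # tokens to the right are hidden
        weights.append(softmax(scores))
    return matmul(weights, v), weights


rng = random.Random(0)
vocab_size, d_model, seq_len = 10, 4, 5  # toy sizes, far smaller than GPT-3
token_ids = [3, 1, 4, 1, 5]

# Embedding lookup (equivalent to one-hot vector times embedding matrix).
embedding = random_matrix(vocab_size, d_model, rng)
position = random_matrix(seq_len, d_model, rng)  # one vector per position
x = [[e + p for e, p in zip(embedding[t], position[i])]
     for i, t in enumerate(token_ids)]

# Queries, keys, and values, followed by causally masked attention.
w_q, w_k, w_v = (random_matrix(d_model, d_model, rng) for _ in range(3))
attended, weights = causal_attention(x, w_q, w_k, w_v)

# "Decoder": project back onto the vocabulary, one distribution per position.
w_out = random_matrix(d_model, vocab_size, rng)
next_token_probs = [softmax(row) for row in matmul(attended, w_out)]
```

Each row of `next_token_probs` is one guess, a probability distribution over the toy vocabulary for the token that follows that position. When generating text one token at a time, only the last row is needed.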

Limitations of ChatGPT

Hallucination - ChatGPT can generate highly creative but potentially inaccurate information, and therefore should not be used for decision-making without human involvement. Although the AI model is continuously improving, it can struggle to understand cause and effect, to reason like a human, or to produce sensible moves in games like chess. It is a useful tool for ideation and creativity, but its output is not a reliable source of factual information, so critical thinking and validation should remain the responsibility of humans.
Data security and privacy - Studies have demonstrated that large models like ChatGPT can be vulnerable to privacy intrusion issues, where personally identifiable information (PII) can be extracted from training data using specific prompts or code. As such, businesses must carefully consider data security and privacy concerns when incorporating this technology into their operations. Protecting sensitive information and customer privacy should be a top priority, and guardrails should be established to reduce potential risks.
Fairness and Inclusiveness - Internet-scale systems are prone to bias, which can have unintended negative consequences for minority groups, such as perpetuating bias in algorithms and increasing error rates in facial recognition. Additionally, the digital divide may prevent minority groups from accessing the benefits of technological advancements. As a result, it is important to develop and deploy new technologies responsibly and equitably. While ChatGPT uses a Moderation API to block unsafe content, it may not effectively address the propagation of unfairness and bias within the system.
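As one possible guardrail for the data security and privacy point, a business could strip obvious personal identifiers from text before sending it to an external model. The sketch below is only an illustration of that idea: the patterns are simplistic assumptions that will miss many real-world formats, and the names `redact_pii`, `EMAIL_PATTERN`, and `PHONE_PATTERN` are hypothetical.

```python
import re

# Deliberately simple patterns; production systems need much broader coverage.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_PATTERN = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact_pii(text):
    """Replace email addresses and phone-like numbers with placeholders."""
    text = EMAIL_PATTERN.sub("[EMAIL]", text)
    text = PHONE_PATTERN.sub("[PHONE]", text)
    return text


message = "Contact jane.doe@example.com or +1 415-555-0100 today"
redacted = redact_pii(message)
print(redacted)
```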

Recent Trends (as of April 2023)

Microsoft has reportedly invested $10 billion in OpenAI and recently released its latest conversational AI solution, the Bing chatbot. Unlike ChatGPT, whose knowledge is limited to its training data, which ends in 2021, “the new Bing” is able to retrieve information about recent news and events.
In mid-March, OpenAI announced its latest breakthrough - the GPT-4 model. GPT-4 is able to handle more complex conversational tasks than the GPT-3.5 model behind the original ChatGPT. The new model is versatile and can accept images as input as well as text.
Google has its own conversational AI system called Bard, and it has also released the PaLM API.
Meta released LLaMA, a family of smaller models that Meta reports to be competitive with much larger models such as GPT-3. Meta intends to grant access on a case-by-case basis.
Amazon has introduced a cloud service called Bedrock that developers can use to enhance their software with artificial intelligence systems that can generate text. Through its Bedrock generative AI service, AWS will offer access to its own first-party language models called Titan, and a model for turning text into images from startup Stability AI.

Segue - Prompt engineering

Prompt engineering is the process of designing and refining prompts to guide generative AI systems, particularly in language and image models. It is crucial for achieving high-quality results, but can be challenging and time-consuming. Prompt engineering is becoming more popular due to the increasing demand for generative AI applications, and some creators are already offering their prompts on marketplaces like PromptBase. However, there are concerns that people may overestimate the technical rigor and reliability of results obtained from a constantly evolving black box. Crafting appropriate prompts requires meticulous exploration of possibilities and figuring out why and when AI produces inaccurate results. The field of prompt engineering is evolving, and new strategies and techniques may become necessary to keep pace with emerging trends and challenges. Despite limitations, the potential benefits of these technologies are vast and far-reaching.
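To make the idea of designing and refining prompts concrete, here is a minimal sketch of a prompt template with two variants that could be compared side by side. The `build_prompt` helper, its parameters, and the sample text are all hypothetical, and nothing is sent to a model. The sketch only shows how variants might be assembled in a repeatable way.

```python
def build_prompt(instruction, text, audience=None):
    """Assemble a prompt from an instruction, an optional audience, and the text."""
    parts = [instruction.strip()]
    if audience:
        parts.append(f"Write for this audience: {audience}.")
    parts.append(f'Text: """{text.strip()}"""')
    return "\n".join(parts)


article = "Large Language Models learn patterns of language from vast amounts of text."

# Two variants of the same task; comparing their outputs is one way to refine a prompt.
variants = [
    build_prompt("Summarise the text in one sentence.", article),
    build_prompt("Summarise the text in one sentence.", article, audience="a five-year-old"),
]

for prompt in variants:
    print(prompt)
    print("---")
```

Keeping the template in code rather than retyping prompts by hand makes it easier to see which change caused a difference in the output.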

Stay tuned for more content on Large Language Models!


Ivan