23 July 2024, by Patrick Flege
Generative AI – A primer on the past, present, and future of intelligent content generation
We are currently living in an exciting period for business and science, with artificial intelligence poised to change our lives more and more. AI has been compared to electricity in its social and economic consequences. This blog post serves as a brief primer on generative AI, or GenAI – systems that generate content instead of just analyzing it: where the technology came from, and where its opportunities and risks lie. If you are interested in how adesso can help your company with GenAI, have a look here.
GenAI – The new electricity
In the last couple of years, we have witnessed a remarkable expansion in the capabilities of artificial intelligence (AI). Such a development is not without precedent: in the 1970s and 1980s, government and private funding for machine learning and AI likewise exploded, yet each boom was followed by what is often referred to as an 'AI winter' – a long period of stagnation in investment and progress. This time, however, things are poised to be different. Already in 2017, Andrew Ng – Stanford scientist, Google Brain founder, and former chief scientist at Baidu – predicted that advances in hardware would enable continuous progress for years to come. He was not mistaken. Thanks to a family of architectures called neural networks, whose use is often referred to as 'deep learning', and advances in processing power, the capabilities of AI improved continuously. In 2017, with the arrival of a new type of architecture, the transformer model, the content-generation capabilities of computer systems took another leap. Yet it was not until the release of ChatGPT by OpenAI that intelligent, content-generating AI systems, also known as generative AI or GenAI, became omnipresent in daily life.
While much hype, good and bad, has surrounded GenAI, the economic benefits and opportunities are tangible and can hardly be overstated. In 2017, McKinsey estimated that the application of AI could add up to $15.4 trillion to the annual value of the global economy in the coming decades. In its 2023 report, McKinsey updated this estimate to include up to $4.4 trillion generated annually from the adoption of GenAI in businesses. For comparison, the GDP of Britain in 2021 was $3.1 trillion (see here for the full report). Many of these productivity gains could be realized in knowledge-intensive sectors such as banking, sales, R&D, the life sciences, and software engineering. According to Andrew McAfee of MIT and Google, GenAI is a transformative technology, like the steam engine or electricity (see here). Like these, it will most likely generate strong growth in, and demand for, new professions that use this kind of technology. Here at adesso, we are at the forefront of this development. For our clients, we are currently developing a broad portfolio of technologies that harness the power of GenAI. More information about the solutions that we provide can be found above.
Yet for all its promise, GenAI remains somewhat of a mystery to most people, even those whose work might be transformed drastically by it. Let's get a grasp of how GenAI models work, why they are so powerful, and what some of their pitfalls are.
GenAI and Deep Learning
Most generative AI models are of a type called large language models (LLMs). As the name suggests, these models have their origin in the processing of language. Modern LLMs represent what are called 'foundation models'. Such models can solve a diverse array of problems, not just one task. Earlier architectures and models excelled at one thing only – such as recognizing cats in a picture. Foundation models' capabilities, in contrast, generalize to a large swath of tasks. With regard to language, think of a model that can only translate from English to French – such a model is not a foundation model. Modern systems, like OpenAI's family of GPTs (Generative Pre-trained Transformers), are in contrast capable of handling many tasks: they can translate, summarize texts, tell jokes, and so on. Most LLMs today are foundation models, but strictly speaking, not all LLMs are foundation models. GenAI, in turn, is technically independent of both terms, meaning any AI system that creates content instead of merely classifying or counting things. Yet the best-known GenAI systems are LLMs that are also foundation models – got it?
Neural Networks and Transformers
GenAI could not have advanced without a simultaneous increase in the quantity of data made available by digitalization. Why? LLMs, which underpin many GenAI systems, are built on a computing architecture called neural networks (NNs). As the name suggests, the basic principle behind them is to mimic human neurons, although the analogy only goes so far. They take many different input signals (say, mathematical representations of words or sentences) and 'fire' if the input exceeds a certain level, just like your neurons as you read this sentence. Stack many of those neurons together in layers and take the output of one layer as the input to the next – voilà, a simple neural network. Each neuron has several parameters (it basically represents a mathematical, statistical equation), which must be tuned to generate a good output signal (for example, a good translation into French). NN models can be big – billions of parameters. Although no official figures are public, OpenAI's GPT-4 family is estimated to have around a trillion learned parameters, and Meta's yet-to-be-released Llama 400B has 400 billion. Their performance is impressive. But such huge models only make sense if there is a lot of data to train them on. To see why, it helps to put on our history goggles: NNs have been around since the 1970s (around 50 years!), yet for most of that time, they were seen as inferior to other techniques. One reason was their complexity and computing cost. It was only with the advent of big data that their full potential could be harnessed, sending their performance through the roof. Add to this the need for stronger computing (nowadays often provided in the form of graphics processing units, or GPUs), and we can see why it took so long for neural networks to take off.
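To make the idea of stacked, 'firing' neurons concrete, here is a minimal sketch in Python (NumPy only; all dimensions and values are arbitrary toy choices, not those of any real model):

```python
import numpy as np

def relu(x):
    # The "firing" non-linearity: a neuron passes a signal on
    # only when its weighted input exceeds zero.
    return np.maximum(0.0, x)

rng = np.random.default_rng(0)

# Toy input: a batch of 4 "words", each an 8-dimensional vector.
x = rng.normal(size=(4, 8))

# Layer 1: 8 inputs -> 16 neurons. Each neuron has 8 weights and a bias,
# so this layer alone holds 8 * 16 + 16 = 144 tunable parameters.
W1, b1 = rng.normal(size=(8, 16)), np.zeros(16)

# Layer 2: the output of layer 1 becomes the input of layer 2.
W2, b2 = rng.normal(size=(16, 4)), np.zeros(4)

hidden = relu(x @ W1 + b1)  # the neurons "fire" on their weighted inputs
output = hidden @ W2 + b2   # stacked layers: one layer feeds the next
print(output.shape)         # (4, 4)
```

Real LLMs follow exactly this principle, just with many more layers, far wider matrices, and parameters tuned by training rather than drawn at random.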
The final piece of the puzzle was the inception of a new kind of NN architecture. Previously, language-based tasks posed one big problem: the next word in a sentence might depend on a word much further back. As NNs only took the input of the previous layer, a workaround was necessary. Until 2017, this workaround consisted of an architecture called long short-term memory recurrent neural networks (LSTM-RNNs). Such networks are called 'recurrent' because the parameters of the NN are the same at each step, as the sketch below illustrates. These networks suffered from shortcomings in the computing capacity they needed to accurately predict the next word in a text-completion task like translation: they had to store more and more of the previously seen text in memory to predict the next word. With a large text corpus, that method quickly ran into bottlenecks. That all changed in 2017, when scientists from Google Brain and the University of Toronto came up with a more sophisticated architecture.
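As a rough illustration of the recurrence and its sequential bottleneck, consider this toy sketch (NumPy only; a real LSTM adds gating mechanisms that are omitted here):

```python
import numpy as np

rng = np.random.default_rng(0)
hidden_size, embed_size = 16, 8

# The SAME two weight matrices are reused at every step --
# this reuse is what makes the network "recurrent".
W_h = rng.normal(size=(hidden_size, hidden_size)) * 0.1
W_x = rng.normal(size=(embed_size, hidden_size)) * 0.1

sentence = rng.normal(size=(10, embed_size))  # ten toy word vectors
h = np.zeros(hidden_size)  # the network's memory of everything seen so far

for word in sentence:      # strictly sequential: step t needs step t-1
    h = np.tanh(h @ W_h + word @ W_x)

print(h[:4])  # the final state must compress the whole sentence
```

Because step t cannot start before step t-1 has finished, long texts cannot be processed in parallel – exactly the limitation the transformer later removed.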
This architecture was called the transformer. A seminal paper ('Attention is all you need', available here) by this team laid out the new architecture. It enabled efficient scaling by parallelizing computation across powerful hardware components like GPUs. Remarkably, transformer models could learn a wide variety of relationships between words in documents and use them to generate texts and other content in an almost human-like manner. Such relationships are learned by a procedure called 'multi-head attention' – a head is a mathematical description of one kind of relationship between words in a text. By incorporating many heads inside the transformer, the intricacies of language could now be captured by the model. Transformers now form the foundation of almost every LLM, although the original architecture has since been adapted for different tasks.
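The core computation of a single attention head fits in a few lines. The following is a simplified sketch (NumPy only; real transformers add masking, many parallel heads, and learned output projections):

```python
import numpy as np

def attention(Q, K, V):
    d_k = K.shape[-1]
    # How strongly does each word relate to every other word?
    scores = Q @ K.T / np.sqrt(d_k)
    # Softmax turns the scores into attention weights that sum to 1.
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)
    # Each output is a weighted mix of all the word representations,
    # so every word can "look at" every other word in one parallel step.
    return weights @ V

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))  # five words, 8 dimensions each
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))  # learned projections
print(attention(x @ Wq, x @ Wk, x @ Wv).shape)  # (5, 8)
```

Unlike the recurrent loop above, all five words are processed at once, which is what makes transformers so amenable to GPU parallelization.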
Training and Big Data
Transformer-based LLMs are trained on massive corpora of data with self-supervised learning – a process whereby the model tries to, for example, predict the next item in a piece of text, and its parameters are adjusted when it is wrong. Later, to make the model more effective at specific niche tasks, AI engineers and data scientists present it with prompt-completion pairs and penalize it if the completion is inadequate. A prompt is what we enter, for example, into ChatGPT, and the completion is its answer. Without the explosion of digital data that has become available in the last few years, the crucial first step in training LLMs could not have happened. OpenAI's GPT-3, for example, is reported to have been trained on approximately 570 GB of filtered text data from all over the internet (the training data of newer models such as GPT-4o has not been disclosed). The energy costs of this training are far from negligible: training a single large model can emit as much CO2 as five cars over their entire lifetimes.
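In miniature, the self-supervised objective looks like this (a NumPy sketch; the random logits stand in for a real model's predictions):

```python
import numpy as np

tokens = np.array([12, 7, 99, 4, 31])      # a toy encoded sentence
inputs, targets = tokens[:-1], tokens[1:]  # the labels are just the input,
                                           # shifted by one -- no human
                                           # annotation needed

vocab_size = 128
rng = np.random.default_rng(0)
logits = rng.normal(size=(len(inputs), vocab_size))  # stand-in model output

# Cross-entropy loss: penalize the model for assigning a low
# probability to the token that actually came next.
stable = logits - logits.max(axis=1, keepdims=True)
log_probs = stable - np.log(np.exp(stable).sum(axis=1, keepdims=True))
loss = -log_probs[np.arange(len(targets)), targets].mean()
print(f"loss: {loss:.2f}")  # gradients of this loss tune the parameters
```

Because the labels come for free from the text itself, this objective scales to web-sized corpora – which is precisely why the data explosion mattered so much.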
Apart from possible environmental costs, other issues may arise with Large Language Models – let’s dive into some.
Models do not perform well on my task
Most LLMs are like Swiss army knives – relatively good at a lot of things, but perhaps excelling at none. Businesses can choose to fine-tune a model with specific labeled data (data that is marked as desirable or undesirable, for example) so that it gets better at their task. One problem that may arise is called catastrophic forgetting, where the model changes so much that it can no longer perform many of its initial tasks well, even though it improves on the business task at hand. Several tools are available to counter this, such as multitask learning, a technique where the model is trained simultaneously on multiple different skills, or parameter-efficient fine-tuning (PEFT). PEFT is a lightweight procedure that either trains only a few model parameters or creates an 'adapter' for a specific task, which is much less compute-intensive than retraining the whole model. Essentially, the original model parameters are frozen, and only the small adapter is trained on top of them (see this paper for an overview of methods: https://arxiv.org/abs/2312.12148). A miniature sketch of the adapter idea follows below.
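As an illustration of one popular PEFT method, here is a toy, NumPy-only sketch of a LoRA-style adapter (the dimensions and rank are assumed values; real implementations, such as the Hugging Face peft library, wrap this around selected layers of a trained model):

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 512, 8                       # model width and adapter rank (assumed)

W = rng.normal(size=(d, d))         # frozen pre-trained weights
A = rng.normal(size=(r, d)) * 0.01  # trainable: r x d
B = np.zeros((d, r))                # trainable: d x r, zero-initialized so
                                    # the adapter starts as a no-op

def adapted_layer(x):
    # Original path plus the low-rank "detour" learned for the new task.
    return x @ W.T + x @ A.T @ B.T

x = rng.normal(size=(1, d))
print(adapted_layer(x).shape)       # (1, 512)
print(f"frozen: {W.size:,} / trainable: {A.size + B.size:,}")
```

Only the roughly 8,000 adapter parameters are updated during fine-tuning, while the quarter-million frozen weights preserve the model's original capabilities – which is how PEFT sidesteps both the compute cost and much of the catastrophic-forgetting risk.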
Models are outdated
Models are trained only on documents that were available before a certain cutoff date. The question 'Who is the prime minister of the Netherlands?' will be answered incorrectly by, say, GPT-4o or Llama 3 in a few months. Take this into account when building solutions. An effective way to address this shortcoming is retrieval-augmented generation (RAG), where the static model knowledge is enriched with documents specific to your use case. adesso implements several GenAI solutions which make use of RAG to solve our customers' needs. Check out our website for more about our portfolio of GenAI solutions.
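In its simplest form, the retrieval step of RAG can be sketched like this (using scikit-learn's TF-IDF vectors as a stand-in; production systems would use neural embeddings and a vector database, but the principle is the same):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

documents = [
    "Dick Schoof became prime minister of the Netherlands in July 2024.",
    "The transformer architecture was introduced in 2017.",
    "RAG enriches a static model with up-to-date documents.",
]

question = "Who is the prime minister of the Netherlands?"

vectorizer = TfidfVectorizer()
doc_vectors = vectorizer.fit_transform(documents)
query_vector = vectorizer.transform([question])

# Retrieve the document most similar to the question ...
best = cosine_similarity(query_vector, doc_vectors).argmax()

# ... and ground the model's answer in it, not in stale training data.
prompt = f"Context: {documents[best]}\n\nQuestion: {question}"
print(prompt)
```

The retrieved context is simply prepended to the prompt, so the model answers from current documents rather than from whatever was true at its training cutoff.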
Bias in models
The old programming adage GIGO (garbage in, garbage out) also holds true for LLMs. As many models are trained partly on texts that are decades or even centuries old, unhealthy stereotypes about, for example, gender can sneak into the models' outputs. In 2016, researchers from Boston University and Microsoft Research found that the word representations used by language models tend to associate lucrative engineering professions with men, and household and lower-paid jobs with women (see this work). While there are ways to address this issue using mathematical procedures that adjust the representations of text inside LLMs, models can still generate responses biased towards certain groups (see here). Encouragingly, the extent of bias seems to decrease with newer, bigger models!
Toxic language use and inappropriate responses
LLMs themselves are just mathematical models and therefore often cannot tell which responses are considered 'bad' or unethical by human standards. A model may give 'helpful' responses to prompts that are unethical ('How could I best hack my neighbor's WiFi?'). A technique called reinforcement learning from human feedback (RLHF), where the model is rewarded for desired prompt completions and punished for undesired ones, can alleviate this issue (see here). In reinforcement learning, an agent (here, the LLM) learns new behavior (updated parameters) based on feedback, or reinforcement, from its environment. If available, humans are the best judges to punish or reward a model, but nowadays, specialized LLMs exist that supervise bigger LLMs and hand out rewards or punishments. The toy example below conveys the flavor of the feedback loop.
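The following toy loop illustrates learning from feedback (NumPy only; the four canned responses and hard-coded rewards are invented stand-ins for a reward model trained on human preference data, and real RLHF uses far more sophisticated policy-optimization algorithms such as PPO):

```python
import numpy as np

responses = ["refuse politely", "explain risks", "give hacking steps", "joke"]
reward = np.array([1.0, 0.8, -1.0, 0.1])  # stand-in for human judgements

rng = np.random.default_rng(0)
logits = np.zeros(4)                      # the policy's learnable parameters

for step in range(500):
    probs = np.exp(logits) / np.exp(logits).sum()
    action = rng.choice(4, p=probs)       # the "model" picks a response
    # REINFORCE-style update: raise the probability of rewarded
    # responses, lower it for punished ones.
    grad = -probs
    grad[action] += 1.0
    logits += 0.1 * reward[action] * grad

print(dict(zip(responses, np.round(np.exp(logits) / np.exp(logits).sum(), 2))))
```

After training, nearly all probability mass sits on the rewarded, safe responses – the same basic mechanism, scaled up enormously, steers an LLM away from unethical completions.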
Hallucinations
LLMs are trained on large corpora of data but may fail to distinguish between fact and fiction. It is important for GenAI practitioners and the public alike to realize that GenAI is based merely on statistical patterns in its input data and may sometimes generate plausible-sounding content that is simply incorrect. Such behavior is referred to as hallucination (link). Hallucination may happen because LLMs tend to overgeneralize from the data they have encountered; they do not have an actual understanding of the content. Solutions to this ongoing problem, which can have detrimental consequences if made-up content spreads quickly, include RAG with up-to-date backend systems, RLHF or other forms of human auditing, and training the model on more accurate data.
Future directions
GenAI is an exciting area of development and can greatly benefit companies and civil society. Some promising research directions include efforts to make GenAI more explainable to the public and practitioners alike – models are often perceived as black boxes that generate content. Furthermore, GenAI has recently expanded to other modalities, such as video, audio, and even whole movies. Ongoing efforts to reduce the size of models while improving their efficiency will deliver the same or better services at lower cost and with less energy use. Lastly, new and specialized tools using GenAI will free workers from many arduous tasks and broaden the time available for more creative and engaging activities, opening up a new era of productivity. At adesso, we are excited to be actively engaged in this new frontier.