If you’ve been in any of the more technical corners of the internet lately, you’ve probably heard of ChatGPT.
Recently it has risen to prominence as a potential Google Search “killer” and a significant step toward artificial general intelligence, with some commentators calling it the “iPhone moment” for AI. As of January 2023, ChatGPT had become the fastest-growing consumer application in history: within 5 days of launch it had amassed 1 million users. In comparison, it took Netflix 3.5 years to reach 1 million users, while Facebook took over 10 months.
The clamour led Google’s management to declare a “code red” in December 2022. When Google then demonstrated its alternative, “Bard”, in February 2023, the chatbot gave an incorrect answer to a specific question, and Google’s parent company, Alphabet, lost $100 billion in stock market value!
Clearly ChatGPT is a force to be reckoned with. But how exactly does AI’s newest rising star work? And what does it mean for businesses that are already investing in AI?
What is ChatGPT?
To understand ChatGPT, one needs to understand the technology that underpins it: Large Language Models (LLMs). LLMs are foundation models built using deep learning and used for natural language processing (NLP) and natural language generation (NLG) tasks.
ChatGPT is based on the Generative Pre-trained Transformer (GPT) architecture developed by OpenAI. Generative models learn patterns in their training data in order to generate new data instances, and they are finding applications in fields such as image generation (DALL-E 2), music generation (Soundraw) and text generation (ChatGPT).
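As a toy illustration of “learning patterns to generate new instances”, the sketch below trains a word-level bigram (Markov chain) model: it records which words follow which in a tiny corpus, then samples new text from those statistics. This is far simpler than GPT, which uses a deep transformer network, but GPT likewise generates text by repeatedly predicting the next token. The corpus and function names are hypothetical.

```python
import random
from collections import defaultdict


def train_bigram(corpus: list[str]) -> dict[str, list[str]]:
    """Record, for every word, the list of words observed to follow it."""
    transitions = defaultdict(list)
    for sentence in corpus:
        words = sentence.lower().split()
        for current, nxt in zip(words, words[1:]):
            transitions[current].append(nxt)
    return dict(transitions)


def generate(transitions: dict[str, list[str]], start: str,
             max_words: int = 10, seed: int = 0) -> str:
    """Sample a new word sequence by repeatedly picking an observed successor."""
    rng = random.Random(seed)
    words = [start]
    # Stop at the length limit or when the last word was never followed by anything.
    while len(words) < max_words and words[-1] in transitions:
        # Choosing from the raw successor list weights each option by its frequency.
        words.append(rng.choice(transitions[words[-1]]))
    return " ".join(words)


corpus = [
    "the model learns patterns in text",
    "the model generates new text",
    "new text follows learned patterns",
]
model = train_bigram(corpus)
print(generate(model, "the", max_words=8, seed=42))
```

Every generated word pair appeared somewhere in the training data, yet the overall sentence can be new; scaling this idea up (longer context, learned representations instead of counts) is, loosely, what large language models do.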
The Foundation: Transformer Architecture
Ironically, the transformer architecture was developed by Google Brain and introduced in the revolutionary 2017 paper “Attention Is All You Need” (Vaswani et al.).
At the time, the most performant models applied to NLP tasks – such as machine translation, summarization, and question answering – were the sequence-to-sequence (seq2seq) models, based on recurrent neural networks (RNNs).
However, RNNs struggled to model long-range dependencies and were hard to parallelise, because they process a sequence one token at a time. The transformer architecture and its attention mechanism, which relates every position in a sequence directly to every other, resolved these issues and allowed models to be trained far more efficiently. Transformer models soon achieved state-of-the-art results on nearly every NLP task.
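To make the attention mechanism less abstract, here is a minimal NumPy sketch of the scaled dot-product attention defined in the Vaswani et al. paper, softmax(QKᵀ / √d_k)·V. It is a single attention head with no masking, and the random projection matrices are an assumption standing in for weights that a real model would learn. Every position’s output comes from a few matrix multiplications rather than a step-by-step loop over the sequence, which is what makes transformers easy to parallelise and lets any token attend directly to any other.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Row-wise softmax; subtracting the max keeps exp() numerically stable."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def scaled_dot_product_attention(Q: np.ndarray, K: np.ndarray, V: np.ndarray):
    """Single-head attention: each query mixes all values, weighted by similarity."""
    d_k = Q.shape[-1]
    # (n_queries, n_keys): how strongly each position relates to every other position.
    scores = Q @ K.T / np.sqrt(d_k)
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ V, weights


rng = np.random.default_rng(0)
seq_len, d_model = 4, 8
X = rng.normal(size=(seq_len, d_model))  # hypothetical token embeddings
# Hypothetical stand-ins for the learned query/key/value projections.
W_q, W_k, W_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))

output, weights = scaled_dot_product_attention(X @ W_q, X @ W_k, X @ W_v)
print(output.shape)  # (4, 8)
```

A full transformer layer runs several such heads in parallel and follows them with a feed-forward network, but the all-pairs weighting above is the core idea.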
Many models have since been built on this architecture, with massive success in NLP. However, they largely differ only in the number of transformer layers, the volume of data they are trained on and the training objective they are given at the outset.
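A rough parameter count shows how layer count and width drive a model’s size. The sketch assumes the common approximation of about 12·d_model² weights per transformer layer (4·d² for the attention projections, 8·d² for a feed-forward block that expands to 4·d), plus the token embedding matrix, and ignores biases, layer norms and position embeddings. The configuration plugged in is the published GPT-3 175B setup (96 layers, d_model = 12288, a 50,257-token vocabulary); the class and function names are our own.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransformerConfig:
    n_layers: int    # number of stacked transformer layers
    d_model: int     # width of each token's hidden representation
    vocab_size: int  # number of distinct tokens


def approx_params(cfg: TransformerConfig) -> int:
    """Approximate weight count, ignoring biases, layer norms and position embeddings."""
    # 4*d^2 for Q, K, V and output projections + 8*d^2 for the d -> 4d -> d feed-forward block.
    per_layer = 12 * cfg.d_model ** 2
    embeddings = cfg.vocab_size * cfg.d_model
    return cfg.n_layers * per_layer + embeddings


gpt3_175b = TransformerConfig(n_layers=96, d_model=12288, vocab_size=50257)
print(f"{approx_params(gpt3_175b) / 1e9:.1f} billion")  # 174.6 billion
```

The estimate lands close to the quoted 175 billion, which is itself a rounded figure. Because the per-layer term grows with the square of the width, making a model wider inflates it much faster than adding layers does.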
The evolution of ChatGPT
There have been several versions of GPT, with the latest model (Davinci) containing some 175 billion parameters. Davinci was trained on a huge corpus of text whose exact contents have not been publicly released, but which likely includes billions of web pages and books.
ChatGPT itself is much smaller and is trained on a less diverse data set. However, this is expected to change following Microsoft’s investment in OpenAI earlier this year, which included plans to incorporate ChatGPT into the Bing search engine.
For many years, Google has dominated the search space (over 90% of all searches are “googled”) and the associated advertising revenue. The advent of ChatGPT has, for the first time since the early 2000s, put this dominance in doubt. All this has generated a buzz around AI, with many businesses asking what it means for their industry and how it can be leveraged to gain a competitive advantage.
What does this mean for businesses?
For businesses that are already investing in AI, the growth of ChatGPT (and other LLMs / generative AI) won’t make older models and methods obsolete. Instead, it opens up new use cases and offers the potential to accelerate development. In fact, at Inawisdom, we’re already using generative AI to support customer projects.
What it does mean is that innovation is increasingly important for maintaining a competitive advantage, as AI becomes more accessible and more businesses look to embed these tools across their organisations.
So whether you’re well into your AI journey or just starting out, there are several domains where LLMs and generative AI can be applied. Here’s a quick summary:
- Content Creation: Generative AI models can be used to create new and original content, such as images, videos, music, and text. This can be useful for content creators, marketers, and advertisers looking to generate unique and engaging content. In financial institutions, this is being applied to generate legal documents or insurance contracts. It’s even being used in software development to write code (and make software engineers nervous 🙂). With tools such as GitHub Copilot, the aim is to significantly reduce time to market for product development.
- Product Design: Generative AI can be used to create new designs for products, such as clothing, furniture, and even buildings. This can help designers and architects explore new design ideas and create more innovative products. In manufacturing, this is being applied to create digital twins for new product development or to simulate running systems (see below) and the impact of changes.
- Personalization: Generative AI can be used to create personalized content or experiences for individual users, based on their preferences and behaviour. This can be useful for companies looking to personalize their marketing efforts or to generate personalized recommendations for individuals. It also has uses in chat-based services such as call centres or support desks, where more realistic and tailored responses help to address the limitations of earlier automated chat systems.
- Data Augmentation: Generative AI can be used to generate new data samples that augment existing datasets. This can improve the performance of machine learning models by providing more diverse and representative data. At Inawisdom, we have actively applied this in our discovery process to augment customer-supplied data and improve the accuracy of our models.
- Simulation: Generative AI can be used to create simulations of real-world scenarios, such as weather patterns, traffic flow, or even human behaviour. This can be useful for researchers, manufacturers, city planners, and policymakers looking to better understand and predict complex systems.
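As a deliberately simple sketch of generative data augmentation (not a description of Inawisdom’s own method), the code below fits a multivariate Gaussian to a small, hypothetical minority class and samples synthetic rows from it. Large generative models play the same role for richer data such as text or images; the function name and the data are illustrative assumptions.

```python
import numpy as np


def augment_with_gaussian(samples: np.ndarray, n_new: int, seed: int = 0) -> np.ndarray:
    """Fit a multivariate Gaussian to one class's samples and draw synthetic ones."""
    rng = np.random.default_rng(seed)
    mean = samples.mean(axis=0)
    cov = np.cov(samples, rowvar=False)  # rows are samples, columns are features
    return rng.multivariate_normal(mean, cov, size=n_new)


# Hypothetical minority class: 5 samples with 2 numeric features each.
minority = np.array([[1.0, 2.0], [1.2, 1.9], [0.9, 2.2], [1.1, 2.1], [1.0, 1.8]])
synthetic = augment_with_gaussian(minority, n_new=20)
augmented = np.vstack([minority, synthetic])
print(augmented.shape)  # (25, 2)
```

Synthetic samples are best kept out of validation and test sets, so that any accuracy gain is measured on real data only.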
There are areas where generative AI models such as GPT-3 are not directly applicable, namely where understanding of the language is key, such as classification or sentiment analysis. For these tasks there are better models, such as the variants of Bidirectional Encoder Representations from Transformers (BERT). Because LLMs are so large, inference can also be slow, which is a problem where response times are important. On the other hand, because they support “few-shot learning”, LLMs perform well when only a small number of examples is available and data is limited.
Which flavour of transformer to use therefore depends on the specific task and its context. Choosing the right model is key, which is where consultants like Inawisdom can help. Transfer learning, which starts from a pre-trained model and adapts it to the task at hand, makes training quicker and needs only small amounts of data. Once the problem is understood and the business benefit is clear, the appropriate model can be selected and deliver impressive results, irrespective of the domain.
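The “few-shot learning” mentioned above usually means placing a handful of labelled examples directly in the prompt, so the model can infer the task without any retraining. The sketch below only assembles such a prompt; the function name, labels and messages are hypothetical, and sending the prompt to a particular LLM API is left out because providers’ interfaces differ.

```python
def build_few_shot_prompt(task: str, examples: list[tuple[str, str]], query: str) -> str:
    """Assemble an instruction, labelled examples, then the unlabelled query."""
    parts = [task]
    for text, label in examples:
        parts.append(f"Text: {text}\nLabel: {label}")
    # The model is expected to continue the pattern by completing this final label.
    parts.append(f"Text: {query}\nLabel:")
    return "\n\n".join(parts)


# Hypothetical support-desk routing task with three labelled examples.
examples = [
    ("My card was charged twice this month.", "billing"),
    ("The app crashes when I open settings.", "technical"),
    ("How do I change my delivery address?", "account"),
]
prompt = build_few_shot_prompt(
    "Classify each customer message as billing, technical or account.",
    examples,
    "I was refunded the wrong amount.",
)
print(prompt)
```

Swapping the examples changes the task entirely, with no training step; the trade-off is that every request carries the examples, which adds to the inference cost noted above.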
Final thoughts – and a few caveats
Because of the size of LLMs, it is not practical for smaller companies to run their own versions: training costs for an LLM can run to millions of dollars. Even just fine-tuning a model and hosting the inference endpoint could cost hundreds of thousands of dollars a year. Companies like AWS and Microsoft are therefore in the process of exposing large language models, such as GPT, via application programming interfaces (APIs) and homing in on the business case for end users and companies. It is not yet clear how this will affect data privacy, bias and security, which are important factors for business adoption.
Large GPT-based models also suffer from poor explainability, whereas for smaller open-source language models there are many libraries (such as SHAP) that can help explain the decisions made. In addition, LLMs often “hallucinate”, making up facts while confidently presenting them as correct!
This is particularly important to consider in regulated industries like finance, where basing decisions on fact is critical, as is the ability to explain why a model has made a particular decision. For example, a loan decision should come with a full explanation of why the applicant was accepted or rejected, and that decision should not rest on spurious assumptions. Going forward we expect this to be addressed, but for now it is an important factor when looking to use an LLM.
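Libraries such as SHAP are built on Shapley values, which share a prediction out among the input features. For a handful of features the values can be computed exactly by enumerating every coalition of features, as in the sketch below. The scoring model, its weights and the feature names are hypothetical stand-ins for a real loan model, and features outside a coalition are simply set to an “average applicant” baseline, which is one of several possible conventions.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Sequence


def shapley_values(predict: Callable[[list[float]], float],
                   x: Sequence[float], baseline: Sequence[float]) -> list[float]:
    """Exact Shapley values; features outside a coalition take their baseline value."""
    n = len(x)

    def value(coalition: set[int]) -> float:
        point = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(point)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        contribution = 0.0
        for size in range(n):
            for subset in combinations(others, size):
                s = set(subset)
                # Shapley weight: |S|! (n - |S| - 1)! / n!
                weight = factorial(len(s)) * factorial(n - len(s) - 1) / factorial(n)
                contribution += weight * (value(s | {i}) - value(s))
        phi.append(contribution)
    return phi


# Hypothetical linear credit-scoring model on standardised features.
FEATURES = ["income", "debt", "years_employed"]
WEIGHTS = [0.5, -0.8, 0.3]


def score(features: list[float]) -> float:
    return sum(w * f for w, f in zip(WEIGHTS, features))


applicant = [1.2, 0.5, 2.0]
average_applicant = [1.0, 1.0, 1.0]  # hypothetical baseline for comparison
phi = shapley_values(score, applicant, average_applicant)
for name, contribution in zip(FEATURES, phi):
    print(f"{name}: {contribution:+.2f}")
# income: +0.10
# debt: +0.40
# years_employed: +0.30
```

The contributions sum to the difference between this applicant’s score and the baseline score, giving a per-feature account of the decision (here, lower-than-average debt helped most). Exact enumeration grows exponentially with the number of features, which is why libraries such as SHAP rely on approximations or model-specific algorithms.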
Many LLMs currently support only a limited range of languages and tend to be primarily English-based. Where a key aim of the business problem is to provide localised responses, this can be problematic.
Over the coming months, all the major tech companies plan to launch LLM-based products (see resources below). Some will incorporate LLMs into their own products, while others will make services available for other businesses to use. Inawisdom has been making use of language models for some time now, so improvements to the services the cloud companies provide will only enhance the products we deliver for our customers.
And as a final note… this article was not written in any way by ChatGPT, honest 😄