Generative artificial intelligence (GenAI) is artificial intelligence that can generate text, images, or other media, using predictive modeling. Here’s how it works.
GenAI models are initially trained on large datasets.
- Text generators are trained on large datasets of existing text, such as books, articles, or websites.
- Image generators are trained on extensive datasets of images. Each image is a grid of pixels, with each pixel having a colour value and a position.
- Audio and video generators are trained on datasets containing audio clips or video frames, which are sequences of images displayed rapidly.
GenAI models learn to recognize patterns in the training data and build predictive models based on this learning.
- Text generators learn the context in which words and phrases commonly appear and use linguistic and grammatical rules to predict the next word or phrase and generate sentences or paragraphs.
- Image generators learn patterns in images, identifying shapes, objects, colours, and textures, and use spatial relationships between elements and colours to predict and generate pixels.
- Audio/video generators, in addition to recognizing static image features, learn how sounds or images evolve in a sequence, and use these temporal and spatial relationships to generate video frames and/or audio segments.
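The "learn patterns, then predict" loop described above can be sketched with a toy bigram model. This is a deliberately simplified stand-in: real text generators use neural networks trained on vastly larger datasets, and the corpus here is invented for illustration.

```python
from collections import Counter, defaultdict

# Tiny invented corpus standing in for the "large datasets of existing text".
corpus = (
    "the cat sat on the mat . the dog sat on the rug . "
    "the cat chased the dog ."
).split()

# "Training": count which word tends to follow which (a bigram model).
following = defaultdict(Counter)
for current_word, next_word in zip(corpus, corpus[1:]):
    following[current_word][next_word] += 1

def predict_next(word):
    """Return the word most often seen after `word` during training."""
    return following[word].most_common(1)[0][0]

def generate(start, length=5):
    """Chain predictions together to 'generate' a sentence."""
    words = [start]
    for _ in range(length):
        words.append(predict_next(words[-1]))
    return " ".join(words)

print(predict_next("sat"))  # prints "on" — "on" always followed "sat" in training
print(generate("the"))
```

The same idea scales up: a large language model is, at its core, a far more sophisticated version of this next-word predictor, with context windows and learned representations in place of raw word-pair counts.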
If you’re interested in learning more about how this process works, you can check out the Financial Times’ visual explainer (Murgia et al., 2023).
You can further refine the generated content – directly, by providing feedback to the AI tool, or by editing your original prompt – to meet your specific needs. You’ll learn more about this when we practice using an AI tool in the Practice! tab of this learning module.
What’s the difference between a GenAI model and a GenAI tool?
A GenAI model is the underlying technology or algorithm that enables the generation of content. A GenAI tool is the user interface or service that allows users to access and interact with the generative AI model. For example, GPT (Generative Pre-trained Transformer) is one of the most popular LLMs (there are currently two versions – GPT-3.5 and GPT-4), whereas ChatGPT is the natural language chatbot that uses GPT-3.5 or GPT-4 to generate content based on user inputs.
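The model/tool distinction can be made concrete with a short sketch. Everything here is hypothetical and illustrative: `gpt_model` is a stand-in function, not the real GPT, and `ChatTool` is a toy stand-in for a chatbot interface like ChatGPT.

```python
def gpt_model(prompt: str) -> str:
    """Stand-in for the underlying GenAI *model*: text in, text out."""
    return f"Generated continuation of: {prompt!r}"

class ChatTool:
    """Stand-in for a GenAI *tool*: a user-facing wrapper that manages
    the conversation and forwards user input to the model."""

    def __init__(self, model):
        self.model = model    # the tool can swap models (e.g. GPT-3.5 vs GPT-4)
        self.history = []     # the tool, not the model, keeps the chat history

    def send(self, user_message: str) -> str:
        self.history.append(("user", user_message))
        reply = self.model(user_message)
        self.history.append(("assistant", reply))
        return reply

chat = ChatTool(gpt_model)
print(chat.send("Hello"))
```

Note the separation of concerns: the model only maps input text to output text, while the tool handles everything a user actually touches – the interface, the conversation history, and the choice of which model to call.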
There are many GenAI tools available – resource directories like There’s an AI for That list thousands, with more added each day. But it’s most helpful to start with the core foundation models, because most AI tools are built on top of them. Learning to use these foundation models directly is the easiest and most effective way to gain experience with AI.
Click on the cards below to learn more about some of the features and functionality of the most common GenAI models.
Foundation Models and Large Language Models
Foundation models are a class of AI systems that learn from large amounts of data and can perform a wide range of tasks across different domains. They are not limited to language: they can also handle other modalities such as images, audio, and video. They are so called because they act as the “foundation” for many other uses, like answering questions, making summaries, translating, and more. Large language models (LLMs) are a specific type of foundation model, trained on massive amounts of text data, that can generate natural language responses or perform text-based tasks.
Foundation models are very general and broad, and they may not capture the nuances and details of every domain or task. You can “fine-tune” or adapt foundation models to improve the performance and quality of the model outputs by providing additional data and training that are relevant to a specific subject area or task. For example, if you want to use a foundation model like GPT-4 to generate summaries of news articles, you can fine-tune it on a dataset of news articles and their summaries. This helps the model learn the specific style, vocabulary, and structure of news summaries, and generate more accurate and coherent outputs.
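Fine-tuning can be illustrated as "continued training on domain data," reusing the toy bigram model idea. This is a conceptual sketch with invented corpora: real fine-tuning updates the weights of a neural network, not word-pair counts, but the principle – domain-specific patterns coming to outweigh general ones – is the same.

```python
from collections import Counter, defaultdict

def train(counts, corpus):
    """Add bigram (word-pair) counts from a corpus into an existing model."""
    words = corpus.split()
    for current_word, next_word in zip(words, words[1:]):
        counts[current_word][next_word] += 1
    return counts

def predict_next(counts, word):
    """Most frequent next word under the current counts."""
    return counts[word].most_common(1)[0][0]

# 1. "Pre-train" a base model on broad, general text.
model = train(defaultdict(Counter),
              "breaking a window is bad . breaking a promise is bad .")
print(predict_next(model, "breaking"))  # prints "a"

# 2. "Fine-tune": keep training the *same* model on domain-specific
#    (news-style) text, so domain patterns outweigh the general ones.
train(model, "breaking news : markets fall . breaking news : rates rise . "
             "breaking news : talks end .")
print(predict_next(model, "breaking"))  # prints "news"
```

This mirrors the news-summary example above: the fine-tuned model still retains its general knowledge, but its predictions now reflect the style and vocabulary of the domain data it saw last.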
Learn More
Want to learn more about large language models? Check out Wharton Interactive’s video, Practical AI for Instructors and Students Part 2: Large Language Models (Wharton School, 2023).
References
Andrei. (n.d.). There’s An AI For That (TAAFT)—The #1 AI Aggregator. There’s An AI For That. Retrieved October 26, 2023, from https://theresanaiforthat.com
Anthropic. (2023, May 11). Introducing 100K Context Windows. Anthropic. https://www.anthropic.com/index/100k-context-windows
Anthropic. (2024, March 4). Introducing the next generation of Claude. Announcements. https://www.anthropic.com/news/claude-3-family
Bai, Y., Kadavath, S., Kundu, S., Askell, A., Kernion, J., Jones, A., Chen, A., Goldie, A., Mirhoseini, A., McKinnon, C., Chen, C., Olsson, C., Olah, C., Hernandez, D., Drain, D., Ganguli, D., Li, D., Tran-Johnson, E., Perez, E., … Kaplan, J. (2022). Constitutional AI: Harmlessness from AI Feedback (arXiv:2212.08073). arXiv. https://doi.org/10.48550/arXiv.2212.08073
Mollick, E. (2023, September 16). Power and Weirdness: How to Use Bing AI. One Useful Thing. https://www.oneusefulthing.org/p/power-and-weirdness-how-to-use-bing
Mollick, E. (2024, February 8). Google’s Gemini Advanced: Tasting Notes and Implications. One Useful Thing. https://www.oneusefulthing.org/p/google-gemini-advanced-tasting-notes
Murgia, M., & the Visual Storytelling Team. (2023, September 12). Generative AI exists because of the transformer. Financial Times. https://ig.ft.com/generative-ai
OpenAI. (2023, September 25). ChatGPT can now see, hear, and speak. https://openai.com/blog/chatgpt-can-now-see-hear-and-speak
Reuters. (2023, September 27). ChatGPT users can now browse internet, OpenAI says. Reuters. https://www.reuters.com/technology/openai-says-chatgpt-can-now-browse-internet-2023-09-27/
Stewart, E. (2024, February 14). Google’s Bard Has Just Become Gemini. What’s Different? Enterprise Management 360. https://em360tech.com/tech-article/gemini-vs-bard
Wharton School. (2023, August 1). Practical AI for Instructors and Students Part 2: Large Language Models (LLMs) [Video]. YouTube. https://www.youtube.com/watch?v=ZRf2BfDLlIA