AI Terminology Glossary
- Activation Function
- In a neural network, this is a mathematical function that determines the output of a neuron. It introduces non-linearity to the model, allowing it to learn complex patterns. Examples include ReLU (Rectified Linear Unit), Sigmoid, and Tanh.
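- Illustration: A minimal sketch of the three example functions, assuming only numpy (for illustration; any framework provides equivalents):

```python
import numpy as np

def relu(x):
    # ReLU: passes positive values through unchanged, zeroes out negatives
    return np.maximum(0, x)

def sigmoid(x):
    # Sigmoid: squashes any input into the range (0, 1)
    return 1 / (1 + np.exp(-x))

def tanh(x):
    # Tanh: squashes any input into the range (-1, 1)
    return np.tanh(x)

x = np.array([-2.0, 0.0, 2.0])
print(relu(x), sigmoid(x), tanh(x))
```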
- Adversarial Attack
- A technique used to intentionally fool a trained machine learning model by feeding it carefully constructed, subtle inputs (e.g., adding imperceptible noise to an image).
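- Illustration: One well-known method is the fast gradient sign method (FGSM), which nudges an input $x$ with true label $y$ in the direction that most increases the loss $J$ of a model with parameters $\theta$: $x_{\text{adv}} = x + \epsilon \cdot \text{sign}(\nabla_x J(\theta, x, y))$, where $\epsilon$ is a small constant chosen to keep the perturbation imperceptible.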
- Agent (AI Agent / Agentic System)
- An AI system that is capable of autonomously performing complex tasks by setting goals, creating a step-by-step plan, reasoning, executing actions, and utilizing various tools (e.g., web search, code interpreters, external APIs) with minimal or no human intervention. Unlike a Workflow, an Agent dynamically directs its own process and tool usage based on context and feedback. In Reinforcement Learning (RL), the Agent is the learner or decision-maker. It takes actions within an Environment to maximize a cumulative Reward.
- AI (Artificial Intelligence)
- In modern usage, a general term referring to the simulation of human intelligence processes by machines, especially computer systems. It includes learning (the acquisition of information and rules for using it), reasoning (using rules to reach approximate or definite conclusions), and self-correction.
- AI Ethicist
- A specialist role focused on the responsible and fair development of AI systems.
- Primary Focus: Audits models and datasets for harmful Bias (Algorithmic), develops guidelines for AI Ethics, and advises organizations on the social, legal, and moral implications of their AI deployments, ensuring transparency and accountability.
- Goal: To mitigate risks and ensure that AI systems are developed and used in a way that respects human values and adheres to ethical principles.
- AI Ethics
- The field of study and practice concerned with the moral implications of designing, developing, and deploying AI systems, focusing on issues like bias, fairness, transparency, and accountability.
- AI Research Scientist (or Applied Scientist)
- A role focused on advancing the theoretical and practical knowledge of AI.
- Primary Focus: Conducts experimental research, develops novel algorithms, new Neural Network architectures (like a new Transformer variant), and publishes findings.
- Goal: To push the State-of-the-Art (SOTA) and develop breakthroughs that can be applied to future products. Applied Scientists focus on turning this cutting-edge research into proof-of-concept solutions.
- Algorithm
- A set of well-defined, step-by-step procedures or rules designed to perform a specific task or solve a particular problem, often used for processing data, making calculations, or automated reasoning.
- Artificial General Intelligence (AGI)
- A hypothetical type of AI that possesses the ability to understand, learn, and apply its intelligence to solve any problem a human being can, essentially exhibiting human-level cognitive capabilities across a wide range of tasks.
- Artificial Narrow Intelligence (ANI)
- Also known as Weak AI, this is AI designed and trained to perform a narrow, specific task (e.g., facial recognition, voice assistants, playing chess). This is the form of AI we use today.
- Attention Mechanism
- A technique used in advanced neural networks (especially Transformers for NLP) that allows the model to selectively focus on the most relevant parts of the input data when processing it or making a prediction.
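- Illustration: The scaled dot-product attention at the heart of the Transformer is commonly written as $\text{Attention}(Q, K, V) = \text{softmax}\left(\frac{QK^\top}{\sqrt{d_k}}\right)V$, where $Q$, $K$, and $V$ are the query, key, and value matrices and $d_k$ is the key dimension.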
- Backpropagation
- The core algorithm used to train neural networks. It calculates the gradient of the Loss Function with respect to the weights of the network, and then uses this gradient to adjust the weights, minimizing the error.
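- Illustration: At its core, Backpropagation is the chain rule applied layer by layer. For a weight $w$ that feeds a prediction $\hat{y}$, which in turn enters a loss $L$: $\frac{\partial L}{\partial w} = \frac{\partial L}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial w}$.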
- Benchmark (AI Benchmark)
- A standardized evaluation framework used to quantitatively measure and compare the performance of different AI models or systems on a specific task or set of tasks.
- Components: A benchmark typically includes a test dataset, a defined problem specification, and a rigorous metric (e.g., accuracy, F1 score) that results in a comparable score.
- Purpose: They act as "exams" for AI, providing a common reference point for researchers and companies to track progress, establish the state-of-the-art (SOTA), and identify model strengths and weaknesses.
- Bias (Algorithmic)
- A systematic, repeatable error in a computer system that creates unfair outcomes, such as favoring or disadvantaging particular groups of people. It often stems from biased data used to train the model or from design choices.
- Big Data
- Extremely large datasets that may be analyzed computationally to reveal patterns, trends, and associations, especially concerning human behavior and interactions. Often described by the three Vs: Volume, Velocity, and Variety.
- Chatbot
- An AI program designed to simulate conversation with human users, especially over the internet. For example, ChatGPT is a chatbot powered by OpenAI's GPT models.
- ChatGPT
- A conversational AI chatbot developed by OpenAI, powered by the GPT series of large language models, enabling natural language interactions for tasks like question-answering, content generation, and code assistance.
- Classification
- A type of supervised machine learning task where the model learns to assign an input data point to one of a set of predefined categories or classes (e.g., classifying an email as "spam" or "not spam").
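- Illustration: A minimal sketch, assuming scikit-learn purely for illustration:

```python
# Classification sketch; assumes the scikit-learn package is installed
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)                 # features and class labels
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(clf.predict(X_test[:5]))                    # predicted classes for unseen inputs
print(clf.score(X_test, y_test))                  # accuracy on the held-out set
```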
- Claude
- An AI assistant and large language model family developed by Anthropic, emphasizing safety, helpfulness, and alignment with human values through constitutional AI principles.
- Clustering
- A type of unsupervised learning task where the goal is to group a set of data points such that those in the same group (cluster) are more similar to each other than to those in other groups.
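- Illustration: A minimal k-means sketch, assuming scikit-learn for illustration:

```python
# Clustering sketch; assumes scikit-learn is installed
import numpy as np
from sklearn.cluster import KMeans

# Two visually obvious groups of 2-D points
X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
              [8.0, 8.0], [8.1, 7.9], [7.8, 8.2]])

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels)  # e.g., [0 0 0 1 1 1] -- points in the same group share a cluster id
```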
- Cognitive Computing
- A technological approach that aims to simulate human thought processes in a computerized model. It involves self-learning systems that use data mining, pattern recognition, and Natural Language Processing (NLP).
- Computer Vision (CV)
- A field of AI that enables computers and systems to derive meaningful information from digital images, videos, and other visual inputs, and take actions or make recommendations based on that information.
- Context Window
- The maximum number of input (Prompt) and output (Completion) Tokens that a Large Language Model (LLM) can process or "remember" in a single request. It defines the model's effective "short-term memory."
- Significance: The size of the Context Window determines how much text (e.g., historical conversation, documents for RAG, or detailed instructions) can be included in the Prompt and how long the model's response can be.
- Constraint: The total number of tokens (input + output) must not exceed the model's fixed Context Window limit.
- Convolutional Neural Network (CNN)
- A specialized type of neural network primarily used for analyzing visual imagery. It uses a mathematical operation called convolution to automatically and adaptively learn spatial hierarchies of features from input data.
- Data Engineer
- A role focused on building and maintaining the infrastructure for large-scale data processing.
- Primary Focus: Designs, builds, and manages reliable, scalable data pipelines (ETL/ELT) to ensure that clean, high-quality Big Data is available for Data Scientists and ML Engineers to use for training models and analysis.
- Goal: To create the robust data ecosystem that fuels all AI and Machine Learning initiatives.
- Data Labeling/Annotation
- The process of tagging raw data (such as images, text, or video) with informative labels to provide context for a machine learning model. This is essential for Supervised Learning.
- Data Scientist
- A role focused on extracting insights and knowledge from data to inform strategic business decisions.
- Primary Focus: Performs Exploratory Data Analysis (EDA), statistical modeling, testing hypotheses, and often builds the initial, non-production-ready prototype Machine Learning models (Classification, Regression, etc.) to uncover patterns and make predictions.
- Goal: To use data to answer specific business questions and provide actionable recommendations.
- Deepfake
- Synthetic media (video, audio, or images) that has been digitally manipulated or generated using Deep Learning techniques to replace one person’s likeness convincingly with another’s.
- Deep Learning (DL)
- A subset of Machine Learning that uses Artificial Neural Networks with multiple layers (hence "deep") to analyze data, learn complex patterns, and make intelligent decisions.
- DeepSeek
- An open-source large language model series from DeepSeek AI, optimized for coding, mathematics, and reasoning tasks, with efficient architectures that deliver high performance in resource-constrained environments.
- Deployment
- The process of integrating a trained and tested machine learning model into a production environment, making it available for real-world use (e.g., within an application, website, or enterprise system).
- Dimensionality Reduction
- The process of reducing the number of random variables under consideration by obtaining a set of principal variables, often used to compress data and speed up computation while retaining the most important information.
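- Illustration: A minimal PCA sketch, assuming scikit-learn for illustration:

```python
# Dimensionality-reduction sketch; assumes scikit-learn is installed
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, _ = load_digits(return_X_y=True)     # 64 features per sample
X_2d = PCA(n_components=2).fit_transform(X)
print(X.shape, "->", X_2d.shape)        # (1797, 64) -> (1797, 2)
```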
- Discriminator
- In a Generative Adversarial Network (GAN), the Discriminator is a neural network trained to distinguish between real data samples and fake (synthetic) data samples created by the Generator.
- Environment
- In Reinforcement Learning (RL), the Environment is the external world with which the Agent interacts. The Agent performs an action, and the Environment responds with a new state and a Reward.
- F1 Score
- A common metric used for evaluating the performance of Classification models, particularly on imbalanced datasets. It is the harmonic mean of the model's precision and recall.
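- Formula: $F_1 = 2 \cdot \frac{\text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}$.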
- Feature Engineering
- The process of using domain knowledge to select, transform, and create new variables (features) from raw data, which helps a machine learning algorithm perform better.
- Fine-tuning
- The process of taking a pre-trained Large Language Model (LLM) or other model and training it further on a smaller, specific dataset to adapt it to a new, more niche task.
- Gemini
- Google's multimodal large language model family, capable of processing and generating text, images, audio, and video, integrated into products like Google Search and Workspace for versatile AI applications.
- Generative Adversarial Network (GAN)
- A framework of two neural networks (a Generator and a Discriminator) competing against each other. The Generator creates synthetic data, and the Discriminator tries to distinguish the real data from the synthetic data. Used often in image and video generation.
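- Illustration: The original GAN objective is the two-player minimax game $\min_G \max_D \; \mathbb{E}_{x \sim p_{\text{data}}}[\log D(x)] + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))]$, where the Discriminator $D$ tries to maximize the objective while the Generator $G$ tries to minimize it.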
- Generative AI
- A type of AI that is capable of creating new content, such as text, images, code, or audio, rather than just classifying or analyzing existing data. ChatGPT and DALL-E are prominent examples.
- Generator
- In a Generative Adversarial Network (GAN), the Generator is a neural network that is trained to create new data instances (e.g., images, text) that are indistinguishable from the real data to fool the Discriminator.
- Gradient
- A mathematical concept representing the rate and direction of change of a function. In machine learning, the gradient of the Loss Function indicates how the Loss changes with respect to the model's weights. This is crucial for algorithms like Backpropagation and Gradient Descent.
- Gradient Descent
- An optimization algorithm used to minimize the cost (loss) function in machine learning. It iteratively adjusts the model's parameters in the direction of the steepest descent of the cost function.
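- Illustration: A minimal sketch fitting a one-parameter model $y = w \cdot x$ in plain Python (illustrative only):

```python
# Gradient descent on y = w * x with a mean-squared-error loss
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]   # true relationship: y = 2x

w, lr = 0.0, 0.05      # initial weight and learning rate
for _ in range(100):
    # dLoss/dw for MSE: mean of 2 * (w*x - y) * x over the dataset
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    w -= lr * grad     # step in the direction of steepest descent
print(round(w, 4))     # converges toward 2.0
```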
- Grok
- xAI's large language model, designed for maximum truth-seeking and helpfulness with a witty personality inspired by the Hitchhiker's Guide to the Galaxy, accessible via x.com and mobile apps.
- Hallucination (AI)
- A phenomenon where a Generative AI model, particularly an LLM, generates plausible-sounding but factually incorrect, nonsensical, or irrelevant output.
- Hyperparameter
- A configuration variable that is external to the model and whose value cannot be estimated from the data. These parameters must be set by the machine learning engineer *before* the training process begins (for training hyperparameters like learning rate) or *before* the model is used for inference (for inference hyperparameters like Temperature and top_p).
- Contrast with Parameters: Model parameters (Weights and biases) are learned internally from the data during training; hyperparameters are set manually to control the learning process or the output generation.
- Inference
- The process of using a trained machine learning model to make a prediction or decision on new, unseen data. It is the "run-time" stage of the model.
- Large Language Model (LLM)
- A type of Deep Learning model, often based on the Transformer architecture, trained on massive amounts of text data to understand, generate, and predict human-like language. Popular examples include GPT (the model family behind ChatGPT), Gemini, Claude, Grok, and DeepSeek.
- Llama
- Meta's open-source large language model series, released for research and commercial use, supporting multilingual tasks, fine-tuning, and deployment through ecosystems such as Hugging Face.
- Loss Function (Cost Function)
- A function that quantifies the "cost" or penalty associated with an event. In machine learning, it measures the difference between a model's prediction and the actual value. The goal of training is to minimize this function.
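- Example: The mean squared error (MSE) commonly used for Regression: $\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$.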
- Machine Learning (ML)
- A subfield of AI focused on building systems that can learn from data, identify patterns, and make decisions with minimal human intervention.
- Machine Learning (ML) Engineer
- A role focused on designing, building, and maintaining the infrastructure and software that supports Machine Learning models.
- Primary Focus: Takes a prototype model (often from a Data Scientist or AI Research Scientist) and prepares it for Deployment in a production environment. This includes building scalable training pipelines, optimizing model performance, and working with tools like Docker and Kubernetes (MLOps).
- Goal: To reliably and efficiently integrate AI capabilities into real-world applications and products.
- max_output_tokens
- An inference Hyperparameter (required by some APIs, optional in others) that sets a hard upper limit on the length of the response a model is allowed to generate in a single request.
- Purpose: This parameter is essential for controlling costs (since users are billed per token) and managing the latency (time taken) of the response.
- Context: The maximum number of tokens in the *total conversation* (Prompt input + Output completion) is constrained by the model's Context Window. Setting max_output_tokens ensures the model stops generating text before exceeding a desired length or the overall context limit.
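- Illustration: How the limit might be set with the openai Python SDK (an assumption; exact parameter names vary by provider, e.g., max_tokens vs. max_output_tokens):

```python
# Hypothetical usage sketch; assumes the openai Python SDK (v1) and a configured API key
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o-mini",   # model name is illustrative
    messages=[{"role": "user", "content": "Summarize MLOps in two sentences."}],
    max_tokens=100,        # hard cap on the length of the completion
)
print(response.choices[0].message.content)
```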
- Model
- The output of the machine learning training process. It is a mathematical representation learned from the data that can be used to make predictions or decisions.
- MLOps
- A set of practices that automates and manages the entire Machine Learning lifecycle. It is a fusion of two disciplines: ML development (building the model) and DevOps (software development and IT operations).
- Natural Language Processing (NLP)
- A field of AI that gives computers the ability to read, understand, and generate human languages (both written and spoken).
- Neural Network (Artificial Neural Network - ANN)
- A computational model inspired by the structure and function of biological neurons in the human brain. It consists of interconnected nodes (neurons) organized into layers (Input, Hidden, Output).
- Noise
- In the context of data, Noise refers to irrelevant, corrupted, or random data that obscures the underlying patterns. In Adversarial Attacks, subtle, imperceptible noise is added to inputs to intentionally fool a model. Overfitting occurs when a model learns the noise in the training data.
- Non-linearity
- The characteristic introduced by an Activation Function that allows a Neural Network to learn complex, non-straight-line relationships between inputs and outputs. Without non-linearity, a deep neural network would be equivalent to a simple linear model.
- OpenAI
- A research organization co-founded in 2015 by Sam Altman, Elon Musk, and others, focused on developing safe artificial general intelligence (AGI); creators of the GPT models, DALL-E, and tools like ChatGPT.
- Overfitting
- A modeling error that occurs when a machine learning model learns the training data too well, including the noise and random fluctuations, leading to excellent performance on the training data but poor performance on new, unseen data.
- Pre-training
- The initial phase of training for a large model (like an LLM), where it is exposed to a massive, general dataset to learn foundational knowledge and representations before being fine-tuned for specific tasks.
- Prompt Engineer
- A role focused on optimizing the input given to Generative AI models, especially Large Language Models (LLMs).
- Primary Focus: Develops, refines, and tests specialized prompts, instructions, and input structures to consistently elicit the desired, high-quality output from a model for specific tasks (e.g., maximizing accuracy, consistency, or creativity).
- Goal: To maximize the value and performance of pre-trained LLMs through expert communication and input design.
- Prompt Engineering
- The process of carefully crafting the input text (prompt) given to a Generative AI model, especially an LLM, to elicit a desired and optimal response.
- Recurrent Neural Network (RNN)
- A type of neural network where connections between nodes form a directed graph along a sequence, allowing it to exhibit time-dynamic behavior. They are used for tasks involving sequential data like speech recognition and time series prediction.
- Regression
- A type of supervised learning task where the model learns to predict a continuous numerical value (e.g., predicting house prices or temperature).
- Reinforcement Learning (RL)
- A machine learning paradigm where an Agent learns to make decisions by performing actions in an Environment to maximize a cumulative Reward. Used often for robotics and game-playing AI.
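- Illustration: The core Agent-Environment loop, sketched with the gymnasium package (an assumption; most RL environments expose a similar reset/step API):

```python
# Minimal agent-environment interaction loop; assumes gymnasium is installed
import gymnasium as gym

env = gym.make("CartPole-v1")
obs, info = env.reset(seed=0)
total_reward, done = 0.0, False
while not done:
    action = env.action_space.sample()   # a trained agent would choose deliberately
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += reward               # the cumulative signal the agent maximizes
    done = terminated or truncated
print(total_reward)
```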
- Retrieval-Augmented Generation (RAG)
- An architectural pattern that enhances the output of a Large Language Model (LLM) by retrieving relevant information from an external, authoritative knowledge base and integrating it into the prompt before generating a response.
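- Illustration: A minimal sketch of the pattern; retrieve() and generate() are hypothetical stubs standing in for a vector-database lookup and an LLM call:

```python
# Illustrative RAG flow; retrieve() and generate() are hypothetical stubs
def retrieve(query: str, top_k: int = 3) -> list[str]:
    # Stub: a real system would embed the query and search a vector store
    return [f"Snippet {i + 1} relevant to: {query}" for i in range(top_k)]

def generate(prompt: str) -> str:
    # Stub: a real system would call an LLM here
    return f"[answer grounded in a prompt of {len(prompt)} characters]"

def answer_with_rag(question: str) -> str:
    docs = retrieve(question)                  # 1. fetch relevant passages
    context = "\n".join(docs)                  # 2. inject them into the prompt
    prompt = ("Answer using only the context below.\n"
              f"Context:\n{context}\n\nQuestion: {question}")
    return generate(prompt)                    # 3. model produces a grounded answer

print(answer_with_rag("What is RAG?"))
```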
- Reward
- In Reinforcement Learning (RL), the Reward is a scalar feedback signal provided by the Environment to the Agent after an action, indicating how good or bad the action was. The Agent attempts to maximize its cumulative Reward over time.
- State-of-the-Art (SOTA)
- Refers to the best-performing model or technique currently available for a specific task, usually determined by performance on a standardized Benchmark.
- Supervised Learning
- A machine learning approach where the model is trained on a labeled dataset, meaning the input data is paired with the correct output or "answer" (e.g., classification, regression).
- Synthetic Data
- Data that is artificially generated rather than collected from real-world events. It mimics the statistical properties of real data but is often used to address privacy concerns or data scarcity.
- Temperature
- A Hyperparameter that controls the randomness or creativity of the generated output from a model, typically an LLM.
- Low Temperature (e.g., close to 0): The model is nearly deterministic, almost always choosing the most probable token. Output is conservative, predictable, and factual, making it ideal for tasks like summarization or code generation.
- High Temperature (e.g., closer to 1 or 2): The model increases the probability of selecting less-likely tokens. Output is more diverse, varied, and creative, making it suitable for brainstorming or creative writing, but with an increased risk of Hallucination.
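- Illustration: Mechanically, temperature divides the model's raw scores (logits) before the softmax; a minimal sketch, assuming numpy:

```python
import numpy as np

def softmax_with_temperature(logits, temperature):
    # Lower temperature sharpens the distribution; higher temperature flattens it
    scaled = np.array(logits) / temperature
    exps = np.exp(scaled - scaled.max())   # subtract the max for numerical stability
    return exps / exps.sum()

logits = [2.0, 1.0, 0.5]
print(softmax_with_temperature(logits, 0.2))  # near-deterministic: ~[0.99, 0.01, 0.00]
print(softmax_with_temperature(logits, 1.5))  # flatter: probability spread across tokens
```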
- Testing/Validation
- The process of evaluating a trained model's performance on a separate dataset (the test or validation set) to ensure it generalizes well to new, unseen data and to detect overfitting.
- Token
- A fundamental unit of text used by LLMs. A token can be a word, a punctuation mark, a sub-word, or even a single character. Models process text by breaking it down into a sequence of tokens.
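- Illustration: OpenAI's tiktoken library (an assumption about tooling) makes the split visible:

```python
# Tokenization sketch; assumes the tiktoken package is installed
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")   # an encoding used by several GPT models
ids = enc.encode("Tokenization splits text!")
print(ids)                                   # a short list of integer token IDs
print([enc.decode([i]) for i in ids])        # the sub-word pieces those IDs map to
```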
- top_p (Nucleus Sampling)
- A Hyperparameter used for controlling the diversity and quality of output by defining a threshold for the cumulative probability mass of tokens the model considers for its next word selection.
- How it works: The model first sorts all possible next Tokens by their probability. If top_p is set to $0.90$, the model will only sample from the smallest set of the most probable tokens whose combined probability is $\ge 90\%$. This dynamically restricts the vocabulary pool (see the sketch below).
- Low top_p: Restricts the model to a very small set of high-probability words (less diverse).
- High top_p (e.g., 1.0): Allows the model to consider a wider range of tokens (more diverse and less focused). It is often used as an alternative or in conjunction with Temperature for more nuanced control.
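- Illustration: A minimal sketch of the selection step described above, assuming numpy:

```python
import numpy as np

def nucleus_filter(probs, top_p=0.9):
    # Keep the smallest set of highest-probability tokens whose total mass >= top_p
    order = np.argsort(probs)[::-1]                 # token indices, most probable first
    cumulative = np.cumsum(np.array(probs)[order])
    cutoff = np.searchsorted(cumulative, top_p) + 1
    kept = order[:cutoff]
    kept_probs = np.array(probs)[kept]
    return kept, kept_probs / kept_probs.sum()      # renormalize; sample only from these

probs = [0.5, 0.3, 0.15, 0.05]
print(nucleus_filter(probs, 0.9))  # keeps the top 3 tokens (0.95 >= 0.90)
```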
- Transfer Learning
- A machine learning method where a model developed for a task is reused as the starting point for a model on a second, related task (e.g., using a pre-trained image classification model for medical image analysis).
- Transformer
- A neural network architecture introduced in the 2017 paper "Attention Is All You Need" that relies entirely on the Attention Mechanism. It is the foundational architecture for modern LLMs and state-of-the-art NLP and Generative AI.
- Turing Test
- A test, proposed by Alan Turing in 1950, of a machine's ability to exhibit intelligent behavior equivalent to, or indistinguishable from, that of a human.
- Unsupervised Learning
- A machine learning approach where the model is trained on an unlabeled dataset and must find patterns, structures, and relationships within the data on its own (e.g., clustering, dimensionality reduction).
- Variety
- One of the "Three Vs" of Big Data, referring to the many different types of data sources, formats, and structures involved (e.g., structured, unstructured, text, video, sensor data).
- Velocity
- One of the "Three Vs" of Big Data, referring to the high speed at which data is generated, collected, and needs to be processed or analyzed (e.g., real-time sensor readings).
- Vibe Coding
- An emerging software development practice where the human developer heavily relies on a Large Language Model (LLM) to generate, refine, and debug code from high-level, natural language prompts rather than writing code line-by-line manually. The developer shifts their focus from manual implementation to describing the desired outcome (the "vibe") and iteratively providing conversational feedback to the AI.
- Key Aspect: It encourages rapid prototyping and an outcome-driven development approach, often embracing a "code first, refine later" mindset.
- Volume
- One of the "Three Vs" of Big Data, referring to the immense scale of data generated, measured in terabytes, petabytes, or zettabytes.
- Weights
- The adjustable parameters within a Neural Network that determine the strength of the connection between two neurons. These weights are what the model learns and adjusts during the Backpropagation and training process to minimize the Loss Function.
- Workflow (AI Workflow / Agentic Workflow)
- An AI-driven process where Large Language Models (LLMs) and tools are orchestrated through a predefined code path or sequence of steps. Workflows offer high predictability and consistency for well-defined, repeatable tasks, such as classifying input and routing it to a specific LLM, or a fixed sequence of prompt calls.
