What are the 4 types of data formats?
Structured, unstructured, static, and streaming.
What is structured data?
Data in a fixed format.
Example: spreadsheets with rows and columns.
What is unstructured data?
Data that is not in a fixed format.
Example: images, videos, audio.
What is static data?
Historical data that does not change.
What is streaming data?
Data that updates frequently.
Example: live performance metrics and trend data.
What are the 5 V’s of data?
What is data cleansing?
Removing inaccurate, irrelevant, duplicate, toxic, or personally identifying data.
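A minimal sketch of two of these cleansing steps, assuming the records are short text strings. The `cleanse` function and the `EMAIL` pattern are hypothetical names used for illustration, and an email address stands in for a personal identifier; real cleansing pipelines cover many more cases.

```python
import re

# Hypothetical pattern: email addresses stand in for personal identifiers.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def cleanse(records):
    """Drop empty and duplicate records and mask email addresses."""
    seen = set()
    cleaned = []
    for text in records:
        # Trim whitespace and mask anything that looks like an email address.
        text = EMAIL.sub("[EMAIL]", text.strip())
        key = text.lower()  # compare case-insensitively when detecting duplicates
        if not text or key in seen:
            continue
        seen.add(key)
        cleaned.append(text)
    return cleaned

raw = ["Great product!", "great product!  ", "", "Contact me at jo@example.com"]
print(cleanse(raw))  # ['Great product!', 'Contact me at [EMAIL]']
```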
What is data labeling?
Tagging or annotating data.
Usually done manually.
What is system architecture in AI design?
The algorithm or model design.
Examples: Convolutional Neural Network, Recurrent Neural Network, Transformer.
What factors influence system architecture choice?
What is a feature in AI?
An input variable used to generate model predictions.
What is feature engineering?
Transforming raw data into relevant information to create predictive model features.
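A small sketch of feature engineering, assuming a raw customer record with a signup date, a reference date, total spend, and order count. The field names and the `engineer_features` function are hypothetical and chosen only for illustration.

```python
from datetime import date

def engineer_features(raw):
    """Turn one raw customer record into numeric model features."""
    signup = date.fromisoformat(raw["signup_date"])
    as_of = date.fromisoformat(raw["as_of_date"])
    return {
        # How long the customer has been signed up, in days.
        "tenure_days": (as_of - signup).days,
        # Average spend per order; max() guards against division by zero.
        "avg_order_value": raw["total_spend"] / max(raw["order_count"], 1),
        # 1 if the signup happened on a Saturday or Sunday, else 0.
        "is_weekend_signup": int(signup.weekday() >= 5),
    }

record = {
    "signup_date": "2024-01-06",
    "as_of_date": "2024-03-06",
    "total_spend": 120.0,
    "order_count": 4,
}
print(engineer_features(record))
# {'tenure_days': 60, 'avg_order_value': 30.0, 'is_weekend_signup': 1}
```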
What is the base data pile?
The final dataset produced in the design stage, comprising the training, testing, and validation data.
What is training data?
A subset of the base data pile used to train the model.
Analogy: a textbook with answers in the back.
What is testing data?
A subset used for the final evaluation of a trained model; also used to evaluate upgrades or variations of the model.
Analogy: the final exam.
What is validation data?
A subset used during training to tune the model's settings and to detect overfitting.
Analogy: a quiz after reading the textbook.
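One way to picture the three subsets is a shuffled split of a single dataset. This is a sketch only: the `split_dataset` function and the 70/15/15 proportions are assumptions for illustration, not ratios given in these notes.

```python
import random

def split_dataset(records, train=0.7, validation=0.15, seed=0):
    """Shuffle records and split them into training, validation, and testing subsets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_train = round(len(shuffled) * train)
    n_val = round(len(shuffled) * validation)
    training = shuffled[:n_train]
    validating = shuffled[n_train:n_train + n_val]
    testing = shuffled[n_train + n_val:]  # whatever remains is held out for testing
    return training, validating, testing

train_set, validation_set, test_set = split_dataset(range(100))
print(len(train_set), len(validation_set), len(test_set))  # 70 15 15
```

The subsets do not overlap, so the testing data stays unseen until the final evaluation.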
What is unseen data?
New data that the model has not encountered before.
What is synthetic data?
System- or model-generated data that mimics real data, used for training or testing when real data is limited.
What is data poisoning?
A malicious attack that compromises the training dataset to manipulate or ruin model operation.
What is the purpose of an AI impact assessment?
What resources can be leveraged to build AI impact assessments?
What is risk scoring?
Assigning a quantitative value to a risk by multiplying the severity of the harm by its probability of occurrence.
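The multiplication can be sketched as follows. The 1-to-5 rating scales and the `risk_score` function are assumptions for illustration; an organization may use different scales.

```python
def risk_score(severity, probability, scale_max=5):
    """Return severity multiplied by probability, each rated from 1 to scale_max."""
    for name, value in (("severity", severity), ("probability", probability)):
        if not 1 <= value <= scale_max:
            raise ValueError(f"{name} must be between 1 and {scale_max}")
    return severity * probability

# A severe harm (4 of 5) that is unlikely to occur (2 of 5).
print(risk_score(severity=4, probability=2))  # 8
```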
What are 4 common risk classifications?
What does a 3x3 probability and severity matrix evaluate?
Risk level based on likelihood and impact of harms.
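A 3x3 matrix can be sketched as a lookup table keyed by probability and severity. The cell labels below follow one common layout and are an assumption, not values taken from these notes; `RISK_MATRIX` and `risk_level` are hypothetical names.

```python
# Keys are (probability, severity); values are the resulting risk level.
# The specific cell assignments are an illustrative assumption.
RISK_MATRIX = {
    ("low", "low"): "low",
    ("low", "medium"): "low",
    ("low", "high"): "medium",
    ("medium", "low"): "low",
    ("medium", "medium"): "medium",
    ("medium", "high"): "high",
    ("high", "low"): "medium",
    ("high", "medium"): "high",
    ("high", "high"): "high",
}

def risk_level(probability, severity):
    """Look up the risk level for a given probability and severity rating."""
    return RISK_MATRIX[(probability, severity)]

print(risk_level("medium", "high"))  # high
print(risk_level("low", "high"))     # medium
```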