Domain 3: AI Life Cycle Stages 1-4 Flashcards

Examine responsible AI practices in system design, development, and testing. (131 cards)

1
Q

What are the 4 types of data formats?

A
  1. Structured
  2. Unstructured
  3. Static
  4. Streaming
2
Q

What is structured data?

A

Data in a fixed format.

Example: spreadsheets with rows and columns

3
Q

What is unstructured data?

A

Data that is not in a fixed format.

Example: images, videos, audio.

4
Q

What is static data?

A

Historical data that does not change.

5
Q

What is streaming data?

A

Data that is generated continuously and updates frequently.

Example: live performance metrics and trends.

6
Q

What are the 5 V’s of data?

A
  1. Variety
  2. Value
  3. Velocity
  4. Veracity
  5. Volume
7
Q

What is data cleansing?

A

Removing inaccurate, irrelevant, duplicate, toxic, or personally identifying data.
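
A minimal sketch of cleansing, assuming records are plain dictionaries. The field names (`email`, `age`) and the plausibility rule for ages are hypothetical and chosen only for illustration.

```python
def cleanse(records):
    """Drop inaccurate and duplicate records, then strip a personal identifier."""
    seen = set()
    cleaned = []
    for record in records:
        age = record.get("age")
        # Inaccurate: ages outside a plausible range are discarded.
        if age is None or not 0 <= age <= 120:
            continue
        # Duplicate: keep only the first record for each (email, age) pair.
        key = (record.get("email"), age)
        if key in seen:
            continue
        seen.add(key)
        # Personal identifier: remove the email before the data is used.
        cleaned.append({k: v for k, v in record.items() if k != "email"})
    return cleaned


raw = [
    {"email": "a@example.com", "age": 34},
    {"email": "a@example.com", "age": 34},  # duplicate
    {"email": "b@example.com", "age": -5},  # inaccurate
    {"email": "c@example.com", "age": 51},
]
print(cleanse(raw))
```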

8
Q

What is data labeling?

A

Tagging or annotating data.

Usually done manually.

9
Q

What is system architecture in AI design?

A

The algorithm or model design.

Examples: Convolutional Neural Network, Recurrent Neural Network, Transformer.

10
Q

What factors influence system architecture choice?

A
  • Desired accuracy
  • Interpretability
  • Data objective
  • Business problem
  • Compliance
  • Constraints
11
Q

What is a feature in AI?

A

An input variable used to generate model predictions.

12
Q

What is feature engineering?

A

Transforming raw data into relevant information to create predictive model features.
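
As an illustration only, the sketch below turns one raw record into three features. The record layout and feature names (`debt_to_income`, `account_age_days`, `is_homeowner`) are hypothetical, not taken from any particular system.

```python
from datetime import date


def engineer_features(raw):
    """Turn one raw loan record into numeric model features."""
    opened = date.fromisoformat(raw["account_opened"])
    as_of = date.fromisoformat(raw["as_of"])
    return {
        # Ratio feature: combines two raw columns into one signal.
        "debt_to_income": raw["debt"] / raw["income"],
        # Date feature: converts a raw date into an account age in days.
        "account_age_days": (as_of - opened).days,
        # Encoded feature: maps a category to a 0/1 indicator.
        "is_homeowner": 1 if raw["housing"] == "own" else 0,
    }


raw = {
    "debt": 12000,
    "income": 48000,
    "account_opened": "2020-01-01",
    "as_of": "2021-01-01",
    "housing": "own",
}
features = engineer_features(raw)
print(features)
```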

13
Q

What is the base data pile?

A

The final dataset from the design stage including training, testing, and validation data.

14
Q

What is training data?

A

A subset of the base data pile used to train the model.

Analogy: textbook with answers in the back

15
Q

What is testing data?

A

A subset used for the final evaluation of a trained model; it is also used to evaluate upgrades or variations of the model.

Analogy: final exam

16
Q

What is validation data?

A

A subset used during training to tune hyperparameters and detect overfitting.

Analogy: quiz after reading the textbook
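
One way to carve the base data pile into the three subsets can be sketched as follows. The 70/15/15 proportions and the function name `split_dataset` are assumptions for illustration; real projects choose ratios to suit their data.

```python
import random


def split_dataset(rows, train=0.7, validation=0.15, seed=0):
    """Shuffle rows, then split them into training, validation, and testing subsets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_train = round(len(shuffled) * train)
    n_val = round(len(shuffled) * validation)
    return (
        shuffled[:n_train],                 # training: the "textbook"
        shuffled[n_train:n_train + n_val],  # validation: the "quiz"
        shuffled[n_train + n_val:],         # testing: the "final exam"
    )


base_data_pile = list(range(100))
train_set, validation_set, test_set = split_dataset(base_data_pile)
print(len(train_set), len(validation_set), len(test_set))
```

The subsets do not overlap, so the testing data stays unseen until the final evaluation.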

17
Q

What is unseen data?

A

New data that the model has not encountered before.

18
Q

What is synthetic data?

A

System- or model-generated data that mimics real data, used for training or testing when real data is limited.

19
Q

What is data poisoning?

A

A malicious attack that compromises the training dataset to manipulate or ruin model operation.

20
Q

What is the purpose of an AI impact assessment?

A
  • To understand the severity of mapped risks
  • To identify the parts of the system that need governance and controls
21
Q

What resources can be leveraged to build AI impact assessments?

A
  • Model cards
  • Model evaluation tools
  • ISO 42001
  • ISO 42005
  • ISO 31000
  • NIST AI RMF
  • Microsoft Responsible AI Guide
22
Q

What is risk scoring?

A

Assigning a quantitative value to risk using severity of harms multiplied by probability of occurrence.

23
Q

What are 4 common risk classifications?

A
  1. Prohibitive
  2. Major
  3. Moderate
  4. Low-risk
24
Q

What does a 3x3 probability and severity matrix evaluate?

A

Risk level based on likelihood and impact of harms.

25
What are the **3 levels of probability** in the **3x3 matrix**?
1. Improbable 2. Occasional 3. Probable
26
What are the **3 levels of severity** in the **3x3 matrix**?
1. Marginal 2. Moderate 3. Critical
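
The 3x3 matrix can be sketched as a lookup that multiplies severity by probability. The level names come from the cards; ranking each level 1 to 3 is an assumption made here for illustration, and organizations may use different scales.

```python
# Ordinal ranks (assumed): higher means more likely or more harmful.
PROBABILITY = {"Improbable": 1, "Occasional": 2, "Probable": 3}
SEVERITY = {"Marginal": 1, "Moderate": 2, "Critical": 3}


def risk_score(probability, severity):
    """Severity of harm multiplied by probability of occurrence."""
    return SEVERITY[severity] * PROBABILITY[probability]


# Build the full matrix: one row per severity, one column per probability.
matrix = {
    severity: {p: risk_score(p, severity) for p in PROBABILITY}
    for severity in SEVERITY
}
for severity, row in matrix.items():
    print(f"{severity:<9}", row)
```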
27
What is a **confusion matrix**?
A tool to **evaluate predictive performance of classification models** and where they get confused.
28
What are the **4 outputs** of a **confusion matrix**?
1. True positive 2. True negative 3. False positive 4. False negative
29
What does a **false positive** in facial recognition mean?
**Unauthorized** person **granted access**.
30
What does a **false negative** in facial recognition mean?
**Authorized** person **denied access**.
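
A minimal sketch of counting the four confusion-matrix outputs for the facial recognition example. The sample labels are made up for illustration.

```python
def confusion_matrix(actual, predicted):
    """Count the four outcomes for a binary classifier (True = positive)."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for truth, guess in zip(actual, predicted):
        if guess and truth:
            counts["TP"] += 1  # authorized person granted access
        elif not guess and not truth:
            counts["TN"] += 1  # unauthorized person denied access
        elif guess and not truth:
            counts["FP"] += 1  # unauthorized person granted access
        else:
            counts["FN"] += 1  # authorized person denied access
    return counts


# actual: is the person authorized?  predicted: did the system grant access?
actual = [True, True, True, False, False, False, True, False]
predicted = [True, True, False, False, True, False, True, False]
counts = confusion_matrix(actual, predicted)
print(counts)
```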
31
What is the **Risk Mitigation Hierarchy**?
A **framework** to **categorize mitigation actions** early in AI system design and impact assessment.
32
What is an **operational control**?
Systematic measure focused on **day-to-day management and oversight** of AI systems. ## Footnote Examples: assign system responsibility, conduct audits and reviews, establish feedback mechanisms, respond to feedback and appeals, elevate issues, assign kill switch responsibility.
33
What is **benchmarking** in AI?
**Standardized evaluation method** to assess and **compare AI system performance** using specific criteria and metrics. ## Footnote Example: Stanford Holistic Evaluation of Language Models (HELM).
34
What are the **seven stages** of the **AI development lifecycle**?
1. Plan and design. 2. Data collection and preparation. 3. Build and/or select model.* 4. Test, Evaluate, Verify, Validate. 5. Deploy / implement 6. Ongoing monitoring and maintenance 7. Decommission / retire. ## Footnote * On the exam, IAPP may refer to this as "develop".
35
What are the **5 ordered steps** to the **plan and design** stage?
1. Define the business problem and objectives. 2. Identify use cases. 3. Determine scope. 4. Evaluate data and data availability. 5. Establish governance structure.
36
What **activities** are involved in defining the **business problem and objectives**?
* Establishing metrics to measure system success * Interviewing target users * Conducting market research.
37
What **three factors** help to determine an AI project's **scope**?
* Impact. * Effort * Fit.
38
When determining an AI project's **scope**, what questions does **"impact"** answer?
* How will the solution affect the organization? * Is the solution solving a big or small problem?
39
When determining an AI project's **scope**, what questions does **"effort"** answer?
* What resources will be required to achieve the objective? * What's the timeline?
40
When determining an AI project's **scope**, what question does **"fit"** answer?
How well does the proposed solution suit the problem?
41
In addition to the five ordered steps of the **plan and design** stage, what **four other activities** are also carried out during the plan and design stage?
* Stakeholder engagement. * Establishing operational controls. * Performing impact assessments * Performing risk assessments.
42
What are the **six risk assessments** carried out initially in the plan and design stage?
1. Use case evaluation. 2. Stakeholder mapping. 3. Probability and Severity of Harms Matrix 4. Risk mitigation hierarchy 5. Benchmarking 6. Pre-deployment pilot.
43
Which **two risk assessments** are also conducted during the **model development phase**?
* Probability and severity of harms matrix * Risk mitigation hierarchy.
44
Which **risk assessment** is carried out in the **TEVV stage** immediately pre-deployment?
Pre-deployment pilot.
45
Which **risk assessment** is carried out in the **deployment phase**?
Risk Mitigation Hierarchy
46
What is **stakeholder mapping**?
A project management process that maps stakeholder interests with their appropriate function areas
47
What is the objective of the **use case evaluation**?
Determining the appropriateness of the AI solution for the organization's specific business problem
48
What does **data quality** concern?
The **correctness**, **completeness**, and **currency** of data
49
What **3 activities** are involved in **data wrangling / preparation**?
1. Cleansing 2. Labeling 3. Privacy
50
What is **overfitting**?
* When a model learns the training data too precisely * The model "memorizes" quirks in the training data instead of general patterns
51
When a model **overfits**, what **"symptoms"** might the model present?
* Poor performance on new data sets. * Limited real-world applicability * Reduced prediction accuracy. ## Footnote These symptoms will present despite fantastic training performance.
52
What is **"underfitting"**?
When a model **fails to capture** data complexity
53
Under what **three circumstances** might a model suffer from underfitting?
1. Too few parameters 2. Excessive regularization 3. Insufficient features.
54
When a model **underfits**, what **"symptoms"** might the model present?
* Poor predictions * Low accuracy * Weak performance on all data
55
Concerning data, what is **ground truth**?
Known verified facts that serve as reference data.
56
Concerning a model's predictions, what is **accuracy**?
* Primary indicator of model performance. * Measures correctness of system outputs.
57
What are **three accuracy-related metrics** that can help to determine overall accuracy?
* Precision * Recall * F1 score.
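
The three metrics can be computed from confusion-matrix counts using their standard definitions. The counts in the example are made up for illustration.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from confusion-matrix counts."""
    # Precision: of everything predicted positive, how much was correct?
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Recall: of everything actually positive, how much was found?
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1: harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


precision, recall, f1 = precision_recall_f1(tp=8, fp=2, fn=4)
print(round(precision, 3), round(recall, 3), round(f1, 3))
```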
58
What is **data transformation**?
Altering data's format to make it compatible with model training.
59
What is **data pre-processing**?
* Preparation of data for a machine learning model. * Includes cleaning data.
59
What is **data post-processing**?
* Steps taken to adjust a model's output * Done to improve fairness or meet business requirements
60
What does **data integrity** refer to?
* Accuracy and consistency * That the data has not been altered in an unauthorized manner
61
What is **data observability**?
The **monitoring** of the overall health of the **data ecosystem/pipeline**.
62
What are **five major stages** in the **data life cycle**?
1. Collect 2. Process/use 3. Disclose/share. 4. Store/retain. 5. Destroy.
63
What are **four major activities** for managing **training data** throughout the data life cycle?
* Confirm lawful basis for processing. * Monitor data quality and representativeness. * Assess for bias. * Maintain reproducibility logs.
64
What is a **reproducibility log**?
Comprehensive record of everything required to recreate a specific model version.
65
What are **four major considerations** for governing the use of **validation and testing data**?
* Fairness metrics * Drift testing. * Edge case analysis. * Model explainability
66
What are **four major considerations** for managing the **data life cycle** during the deployment phase of the AI development life cycle?
* Monitoring real-time data inputs * Human in the loop. * Retraining trigger. * Enforcing access controls and logging
67
What is a **re-training trigger**?
An automated alarm that starts the process of training a new model version.
68
What are **three examples** of common retraining triggers?
* Performance-based * Drift-based * Time-based
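
The three trigger types can be sketched as simple threshold checks. The function name and every threshold (`min_accuracy`, `max_drift`, `max_age_days`) are hypothetical defaults chosen for illustration.

```python
from datetime import date, timedelta


def retraining_reasons(accuracy, drift_score, last_trained, today,
                       min_accuracy=0.90, max_drift=0.2, max_age_days=90):
    """Return which retraining triggers have fired (empty list = none)."""
    reasons = []
    if accuracy < min_accuracy:
        reasons.append("performance")  # model quality fell below the floor
    if drift_score > max_drift:
        reasons.append("drift")  # inputs moved too far from the training data
    if today - last_trained > timedelta(days=max_age_days):
        reasons.append("time")  # model is older than the allowed age
    return reasons


print(retraining_reasons(0.85, 0.05, date(2024, 1, 1), date(2024, 2, 1)))
```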
69
What is **data drift**?
Changes to the input data.
70
What is **concept drift**?
Changes to the **relationship** between the **input and output** data.
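
A rough sketch of the difference between the two kinds of drift. The mean-shift score and the toy approval rule are illustrative assumptions; production systems typically use more robust statistical tests.

```python
from statistics import mean, stdev


def data_drift_score(reference, live):
    """Shift of the live input mean, in reference standard deviations."""
    return abs(mean(live) - mean(reference)) / stdev(reference)


def accuracy(rule, examples):
    """Share of (input, output) examples that the rule still predicts correctly."""
    hits = [1 if rule(x) == y else 0 for x, y in examples]
    return sum(hits) / len(hits)


# Data drift: the inputs themselves have moved.
reference_inputs = [40, 42, 38, 41, 39]
live_inputs = [55, 57, 53, 56, 54]
print(data_drift_score(reference_inputs, live_inputs))


# Concept drift: same inputs, but the correct output for them has changed.
def rule(score):
    return score >= 50


old_examples = [(30, False), (60, True), (45, False), (70, True)]
new_examples = [(30, False), (60, False), (45, False), (70, True)]
print(accuracy(rule, old_examples), accuracy(rule, new_examples))
```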
71
What is a **major consideration for data** during the decommissioning stage of the AI development life cycle?
Secure **archival and deletion** of data, training artifacts, and logs in accordance with legal and regulatory requirements.
72
What are **four enhancement activities** carried out during feature engineering?
* Identify **information overlap**. * **Optimization** * **Removal** of unnecessary features. * Feature **regeneration** to accommodate concept drift
73
What is a **feature flag**?
Tool that allows turning specific code **on or off** while the system is running ## Footnote AKA: feature toggle
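
A minimal in-memory sketch of a feature flag. The `FeatureFlags` class and the `new_ranking_model` flag name are hypothetical; real systems usually read flags from a configuration service.

```python
class FeatureFlags:
    """Tiny in-memory flag store; unknown flags default to off."""

    def __init__(self, **flags):
        self._flags = dict(flags)

    def is_enabled(self, name):
        return self._flags.get(name, False)

    def set(self, name, enabled):
        self._flags[name] = enabled


flags = FeatureFlags(new_ranking_model=False)


def rank(items):
    # The flag is checked on every call, so it can change while the system runs.
    if flags.is_enabled("new_ranking_model"):
        return sorted(items, reverse=True)
    return sorted(items)


print(rank([2, 1, 3]))  # flag off: old code path
flags.set("new_ranking_model", True)
print(rank([2, 1, 3]))  # flag on: new code path, no restart needed
```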
74
What are **three benefits** of feature engineering?
1. Improve model performance. 2. Improve effectiveness and reduce cost. 3. Boost model explainability.
75
What is **data governance**?
Managing an organization's data assets throughout the life cycle
76
What does **data provenance** document?
* Data origin and creation details. * Who created and modified the data.
77
What does a **data lineage** document?
* How data flows through a system * Data transformations and movements * Data dependencies
78
What is **data localization**?
Legal requirement: the data must be stored and processed **within a jurisdiction's geographical borders**.
79
What is a **Know Your Customer** regulation?
Requirement for financial institutions to **verify customer identity**.
80
What is **semi-structured data**?
* Between structured and unstructured * Has organized properties without a rigid structure
81
What is data **pseudonymization**?
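Replacing personal identifiers with artificial ones (pseudonyms), so that data can be re-linked to a person only with separately held additional information.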
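
One common way to pseudonymize is a keyed hash, sketched below with the standard library. The key value, the 16-character truncation, and the function name are assumptions for illustration; whoever holds the key can recompute pseudonyms and re-link records, so the key must be stored separately from the data.

```python
import hashlib
import hmac


def pseudonymize(identifier, secret_key):
    """Replace an identifier with a stable pseudonym derived from a secret key."""
    digest = hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]  # shortened for readability


SECRET_KEY = b"replace-with-a-real-secret"  # hypothetical key
token = pseudonymize("alice@example.com", SECRET_KEY)
print(token)
```

The same input always maps to the same pseudonym, so records can still be joined without exposing the original identifier.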
82
What is data **de-identification**?
Removal of some personal identifiers.
83
What is data **anonymization**?
Removal of all personal identifiers so that data subjects cannot be re-identified.
84
What is data **obfuscation/masking**?
**Modification of sensitive data** so that it has little or no value to unauthorized users ## Footnote E.g., masking all but the last 4 digits of an SSN or account number
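
A small sketch of masking all but the last four digits. The function name `mask` and its parameters are illustrative.

```python
def mask(value, visible=4, mask_char="*"):
    """Mask every digit except the last `visible` ones; keep separators."""
    digits_seen = 0
    out = []
    for ch in reversed(value):
        if ch.isdigit():
            digits_seen += 1
            out.append(ch if digits_seen <= visible else mask_char)
        else:
            out.append(ch)  # dashes and spaces are left as they are
    return "".join(reversed(out))


print(mask("123-45-6789"))
```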
85
What are **four examples** of **privacy-enhancing technologies**?
* Homomorphic encryption. * Secure multi-party computation * Differential privacy. * Federated Learning
86
What is **encryption**?
Mathematical process to encode data.
87
What does **homomorphic encryption** allow?
Computation/training on encrypted data.
88
What does **secure multi-party computation** allow?
Joint computation over several parties' combined data without any party revealing its own input data to the others.
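
Additive secret sharing is one simple building block for this, sketched here for a joint sum. The salary figures, the modulus, and the function name are assumptions for illustration; real protocols add secure channels and protections against dishonest parties.

```python
import secrets

MODULUS = 2**61 - 1  # all arithmetic is done modulo this number


def make_shares(secret, n_parties):
    """Split a secret into random shares that sum to it modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % MODULUS)
    return shares


salaries = [52000, 61000, 58000]  # one private input per party

# Each party splits its salary into one share per party and hands them out.
all_shares = [make_shares(s, len(salaries)) for s in salaries]

# Party i only ever sees the i-th share from each participant.
partial_sums = [sum(column) % MODULUS for column in zip(*all_shares)]

# Combining the partial sums reveals the total, but no individual salary.
total = sum(partial_sums) % MODULUS
print(total)
```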
89
How does **federated learning** work?
* Model trains on edge devices. * Model updates from the edge devices are sent to and aggregated into a single **global model**.
90
How does federated learning **preserve privacy**?
Private data **remains on the edge device**
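
The aggregation step can be sketched as a weighted average of device updates. Weighting by each device's example count follows the commonly used federated averaging approach and is an assumption here; the numbers are made up.

```python
def federated_average(updates):
    """Combine device updates into global weights.

    `updates` is a list of (weights, n_examples) pairs, one per edge device.
    Only these weights leave the device; the raw training data never does.
    """
    total_examples = sum(n for _, n in updates)
    n_weights = len(updates[0][0])
    return [
        sum(weights[i] * n for weights, n in updates) / total_examples
        for i in range(n_weights)
    ]


updates = [
    ([1.0, 2.0], 10),  # device A trained on 10 local examples
    ([3.0, 4.0], 30),  # device B trained on 30 local examples
]
global_weights = federated_average(updates)
print(global_weights)
```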
91
How does **differential privacy** work?
* Algorithm **injects calibrated random noise** into the data set or into query results. * Reverse-engineered data is noisy, so it does not reveal the original personal data.
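
A sketch of one common variant, the Laplace mechanism applied to a counting query, where noise is added to the query result rather than to each record. The ages, the `epsilon` value, and the function names are illustrative assumptions.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace noise as the difference of two exponential samples."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def private_count(values, predicate, epsilon, rng):
    """Return a noisy count of the values that satisfy the predicate."""
    true_count = sum(1 for v in values if predicate(v))
    # Adding or removing one person changes a count by at most 1,
    # so the noise scale is 1 / epsilon.
    return true_count + laplace_noise(1.0 / epsilon, rng)


rng = random.Random(7)
ages = [23, 35, 41, 29, 52, 67, 38]
noisy = private_count(ages, lambda age: age >= 40, epsilon=1.0, rng=rng)
print(noisy)
```

A smaller `epsilon` means more noise and stronger privacy, at the cost of less accurate answers.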
92
What are the **3 primary activities** carried out during the **build and/or select model** stage?
* Choose system architecture. * Train, validate, test model * Determine appropriate metrics, thresholds
93
What is **oversight**?
Monitoring of AI systems.
94
What are the **objectives** of oversight?
* Minimize risk. * Regulatory compliance. * Implement Responsible AI.
95
What is **AI assurance**?
Frameworks, policies, processes, and controls to measure, evaluate, and promote trustworthy AI
96
What is an **AI audit**?
Assessment of an AI system to ensure operational compliance with laws, regulations, standards, and policies.
97
What are the **three types** of **human oversight**?
* Human in the loop. * Human out of the loop. * Human on the loop.
98
What are the **three lines of defense** (3LOD)?
* First line * Second line. * Third line.
99
According to the **3LOD model**, what is the **first line** of defense?
Business and functional area that **owns and manages the risk**
100
According to the **3LOD model**, what is the **second line** of defense?
Risk management and compliance functions that **oversee and challenge** how the first line identifies and mitigates risk
101
According to the **3LOD model**, what is the **third line** of defense?
Internal audit team
102
When **testing** a model, what **metrics** are typically assessed?
* Bias. * Accuracy * Reliability. * Robustness * Privacy. * Interpretability * Safety.
103
When **testing** a model, what **types of data** should be included?
* Edge cases * Unseen data * Malicious data * Data to assess system biases
104
What is a **counterfactual explanation** (CFE)?
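An explanation that describes the smallest change to an input that would have produced a different model output. ## Footnote A counterfactual is a hypothetical reality that contradicts observed facts.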
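
As a toy illustration, the sketch below searches for the smallest income increase that flips a loan denial into an approval. The decision rule in `approve_loan` is a hypothetical stand-in for a trained model, and the step size is arbitrary.

```python
def approve_loan(income, debt):
    """Hypothetical stand-in for a trained model's decision."""
    return income - 2 * debt >= 40000


def counterfactual_income(income, debt, step=1000, max_steps=200):
    """Find the smallest income (in `step` increments) that gets approved."""
    for k in range(max_steps + 1):
        candidate = income + k * step
        if approve_loan(candidate, debt):
            return candidate
    return None  # no counterfactual found within the search range


income, debt = 45000, 5000
needed = counterfactual_income(income, debt)
print(f"Denied at {income}; would have been approved at {needed}.")
```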
105
What documents or tools help to **evaluate models**?
* Model cards. * System cards. * Benchmarks. * Data provenance documentation / datasheets.
106
What is a **model/system card**?
A **transparency document** that provides a **high-level overview** of the model(s), its training, and data
107
What is a **conformity assessment**?
* A **framework** of technical and non-technical assessments and documentation * Demonstrates compliance with the **EU AI Act**.
108
When are conformity assessments **conducted**?
Pre-market deployment.
109
**Who** conducts the conformity assessment?
* AI system provider. * Notified body.
110
What is a **repeatability assessment**?
An evaluation that determines whether the **same team** can obtain the **same results** multiple times under **identical conditions**.
111
What is **adversarial testing**?
Assessment of a model using **malicious inputs**. ## Footnote AKA red teaming
112
What is **threat modeling**?
Process by which threats are identified, listed, and countermeasures prioritized.
113
What is **brittleness**?
Describes a model that fails when minor tweaks are made to input data
114
What is **catastrophic forgetting**?
When training on new data **overwrites or weakens** previously learned weights, causing a model such as an LLM to lose earlier knowledge
115
What **documentation** should be provided to the organization to **communicate** an AI project's **plan and products**?
* Business use case. * Timeline. * Transparency documentation for regulators and consumers. * User interface copy. * Acceptable use policy. * Frequently asked questions.
116
What does each of the letters in **TEVV** stand for?
* Test * Evaluate * Verify * Validate.
117
What is the **objective** of the **TEVV stage**?
* Assess the model's performance across different dimensions. * Ensure model meets business requirements.
118
Concerning the **TEVV stage**, what question does **test** answer?
Does the model work as intended?
119
Concerning the **TEVV stage**, what question does **evaluate** answer?
How well does the system perform overall?
120
Concerning the **TEVV stage**, what question does **verify** answer?
Was the system built correctly?
121
Concerning the **TEVV stage**, what question does **validate** answer?
Does the system meet stakeholder requirements?
122
Concerning ensemble methods, what is **boosting**?
Sequentially builds simple models where **each improves on the previous one**.
123
What is a **system architecture**?
The structure, components, and organization of an AI model.
124
What characteristics define a **feedforward neural network**?
* Straightforward data processing * Data travels in one direction, from input to output.
125
What characteristic defines a **convolutional neural network**?
Use of multiple layers to filter and extract distinctive features from input data ## Footnote Excels in classification and visual tasks
126
What characteristic defines a **recurrent neural network**?
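Processes sequential data using feedback loops: outputs from earlier steps are fed back in, so information does not flow in only one direction.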
127
What characteristic defines **graph neural networks**?
* Process data represented in graph structures. * Understand and analyze how data points are connected, e.g., in a **social network**.
128
How do **transformers** function?
Use attention to learn relationships between components of the input sequence. ## Footnote E.g. words in a sentence or sentences in a paragraph
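
A simplified sketch of scaled dot-product self-attention in plain Python. It assumes each token is already a small vector and omits the learned query, key, and value projections that real transformers use; the token values are made up.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Return (outputs, weights); every token attends to every token."""
    d = len(vectors[0])
    outputs, all_weights = [], []
    for query in vectors:
        # Similarity between this token and each token, scaled by sqrt(d).
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
            for key in vectors
        ]
        weights = softmax(scores)
        # Output is a weighted mix of all token vectors.
        mixed = [
            sum(w * value[j] for w, value in zip(weights, vectors))
            for j in range(d)
        ]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(tokens)
for row in weights:
    print([round(w, 3) for w in row])
```

Each row of weights shows how strongly one token relates to every token in the input.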
129
Concerning ensemble methods, what is **stacking**?
* Training multiple different models. * Combining their outputs, typically with a final model trained on those outputs.
130
Concerning ensemble methods, what is **bagging**?
* Training the **same model** on **different subsets** of data. * Aggregating outputs of each model.
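
Bagging can be sketched with a very simple base model trained on bootstrap samples and combined by majority vote. The threshold classifier, the toy data, and the function names are assumptions for illustration.

```python
import random
from collections import Counter


def train_threshold_model(sample):
    """Base model: split at the midpoint between the two class means."""
    zeros = [x for x, y in sample if y == 0]
    ones = [x for x, y in sample if y == 1]
    if not zeros or not ones:
        # A bootstrap sample may contain only one class; predict that class.
        majority = 1 if len(ones) >= len(zeros) else 0
        return lambda x: majority
    threshold = (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2
    return lambda x: 1 if x > threshold else 0


def bagging(data, n_models, rng):
    """Train the same model type on different bootstrap subsets of the data."""
    models = []
    for _ in range(n_models):
        bootstrap = rng.choices(data, k=len(data))  # sample with replacement
        models.append(train_threshold_model(bootstrap))

    def predict(x):
        # Aggregate the outputs of every model by majority vote.
        votes = Counter(model(x) for model in models)
        return votes.most_common(1)[0][0]

    return predict


data = [(1.0, 0), (2.0, 0), (3.0, 0), (7.0, 1), (8.0, 1), (9.0, 1)]
predict = bagging(data, n_models=25, rng=random.Random(0))
print(predict(2.5), predict(7.5))
```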