AI, Papers

Trustworthy AI and UX of AI Models

Delivering AI models is all fine, but how will you know whether the consumers of these models will trust them to deliver the solutions they are supposed to deliver? Is there a rule book for how and when to trust an AI? How can businesses develop trustworthy AI models?

The paper Formalizing Trust in Artificial Intelligence: Prerequisites, Causes, and Goals of Human Trust in AI addresses some of these interesting questions about human trust in AI and what trust actually means when it comes to AI. It also helps us understand how to design trustworthy AI models and the UX behind an AI model.

I will be summarizing this 13-page paper, which discusses trust and trustworthiness in an AI model and how a business can leverage this knowledge to design, develop, and deliver AI models its users can trust. The formalization is scientific, and it makes a convincing case for why a business should adhere to these principles while developing an AI model.


⁉ Questions

These are some of the questions I had and collected while I was reading the paper.

  1. How do we design trustworthy AI models?
  2. What are the prerequisites for trust between a user and AI?
  3. How do we evaluate whether trust has manifested?
  4. How do we check whether it is warranted or unwarranted trust?
  5. What are the mechanisms by which an AI model gains a person’s trust?
  6. What is the nature of trust in AI? How exactly do we interpret “trust” in AI?
  7. How is it different from trusting fellow good humans?
  8. After all, isn’t the AI model built by another human, presumably for the greater good? So does trust in an AI model depend solely on the AI developer, or can we generalize it?
  9. First of all, are you aware of how “we” place trust in other “humans”?
  10. What are the prerequisites and goals that fit the definition of trust?

This paper aims to answer the above questions.

The paper doesn’t require any prerequisite knowledge of the inner workings of AI models, but it helps if the reader has a general understanding of the difference between an AI model that functions properly and one that doesn’t.


TL;DR

✅ The authors claim that the “trust model” they propose rests on two key properties:

  1. Vulnerability of the user, and
  2. The ability to anticipate the impact of the AI model’s decisions.

✅ Risk is a prerequisite to the existence of Human-AI trust.

✅ Distrust manifests in an attempt to mitigate the risk.

✅ Distrust is not equivalent to the absence of trust.

✅ The ability to anticipate is a goal, but not necessarily a symptom, of Human-AI trust.

✅ ‘Trust in model correctness’ is in fact not trust in the general performance ability of the model, but trust that the patterns distinguishing the model’s correct and incorrect cases are available to the user.

✅ Contracts specify the behavior to be anticipated; to trust the AI is to believe that a set of contracts will be upheld.

✅ Trustworthy AI: an AI model is trustworthy to some contract if it is capable of maintaining that contract.

✅ Trustworthiness is not a prerequisite for trust.

✅ Trust can exist in a model which is not trustworthy, and a trustworthy model does not necessarily gain trust.

✅ Warranted Human-AI trust is defined via a causal relationship with trustworthiness: incurred Human-AI trust is warranted if the trustworthiness of the model can, in theory, be manipulated to affect the incurred trust.

✅ It is possible for a trustworthy model to incur unwarranted Human-AI trust – in this case, the trust will not be betrayed, even though it is unwarranted.

✅ When pursuing Human-AI trust, unwarranted trust should be explicitly evaluated against, and avoided or otherwise minimized.

✅ If an AI model is incapable of maintaining a relevant contract, then the user will develop warranted distrust, which is what the AI developer should expect. Simply put, the AI model has to live up to its contract.

✅ Trustworthiness is a prerequisite to warranted trust.

✅ Human-AI trust: if a human perceives that an AI model is trustworthy to a contract, and accepts vulnerability to this model’s actions, then the human trusts the model contractually.

✅ If there is no ‘benchmark’ to compare the AI model against, it becomes difficult for the user to have intrinsic trust in the model.

✅ If your AI cites respectable sources as the reasons for its predictions, the user finds it easier to trust your model.

✅ Intrinsic trust can be increased in a disciplined manner by formalizing the priors behind trustworthy or non-suspicious behavior and incorporating behavior that upholds those priors.

✅ Basically, a user trusts an AI model if there is sufficiently diverse evaluation data.

✅ Even data scientists are susceptible to developing unwarranted trust, despite some mathematical understanding of the models.

✅ The paper gives a clear explanation of the link between Explainable AI (XAI) and trust.

✅ Methods of evaluating whether trust is warranted are underdeveloped, and require future work.

These points are covered in detail in the sections below.




Abstract

  • ‘Incorrect’ levels of trust may cause misuse, abuse, or disuse of technology. What is meant by ‘disuse’ in this context? We will read more about it in the coming sections.
  • We will discuss a model of trust inspired by trust between people. Formally, trust between people is termed ‘interpersonal trust’ in sociology, and from here on that is how we will refer to it.
  • The authors claim that the “trust model” they propose rests on two key properties:
    1. Vulnerability of the user, and
    2. The ability to anticipate the impact of the AI model’s decisions.
  • They further discuss the difference between contractual ‘trust’ and ‘trustworthiness’.
  • By the end of this post, you’ll know what ‘warranted’ and ‘unwarranted’ trust are, and why these definitions play an important role in building a trustworthy AI that earns warranted trust.

Checkboxes for designing a trustworthy AI model:

✅ Evaluate whether trust has manifested.

✅ Check whether it is warranted or unwarranted trust.


Introduction

  • With the rise of ML models that are “black boxes”, not really explainable to lay people, trust is fast becoming a key component of the interaction between a user and AI.
  • An AI model can be safely integrated into society only if users can and will trust it.

With the advancement of facial recognition technology, we have seen governments use it to monitor public places. However, many people have turned against facial recognition after deepfake technology took the internet by storm in recent years.

It isn’t unreasonable for the public to be against technology that can attach someone’s head to someone else’s body. People cannot simply trust a model just because it delivers results. What could a facial recognition model do apart from recognizing faces? How is the government implementing it for the public’s benefit? Is anyone compromising people’s private data?

People need answers to these basic questions, since they tend to anthropomorphize the abilities of AI. They need such answers before they can feel safe around AI models that are applied at scale. They need a formal system so that they can check those boxes before blindly trusting or distrusting these models.

  • We are interested in formalizing the ‘trust’ between the user and AI, and using this formalization to understand more about the requirements behind AI which can be integrated safely into the society.

⭐ Definition of Artificial Intelligence

We consider ‘artificial intelligence’ to be any automation which is attributed with intent by the user, i.e., anthropomorphized with a human-like reasoning process.

⭐ What is the meaning of anthropomorphize?

It is the action of attributing human characteristics or behaviour to something. In this context, we’re attributing human-like reasoning to an AI model.

  • We will discuss how interpersonal trust is defined in sociology.
  • We will derive a basic, though not yet functional, definition of trust between a human and an AI, based on the prerequisites and goals behind the trustor developing trust in the AI.
  • The trustor must be vulnerable to the AI’s actions, and the trustor’s goal in developing trust is to anticipate the impact of the AI model’s decisions.
  • What can we say about when and whether this goal is achieved?

⭐ We answer two questions:

✅ What is the AI being trusted with? [Contractual trust]

✅ What differentiates trust which achieves this goal from trust which does not? [Warranted and unwarranted trust]

  • We formalize the notions of intrinsic trust, which is based on the AI’s observable reasoning process, and extrinsic trust, which is based on the AI’s external behaviour.
  • Finally, we come back to the question of evaluating trust. We answer this by discussing:
    1. The evaluation of the vulnerability in the interaction, and
    2. The evaluation of the ability to anticipate.

Basic Definition of Trust

We examine research in philosophy, psychology, and sociology on how people trust each other (interpersonal trust).

⭐ Definition of Interpersonal Trust

Trust is a directional transaction between two parties: if A believes that B will act in A’s best interest, and accepts vulnerability to B’s actions, then A trusts B.

❓ What does it mean when we say ‘vulnerability’ and ‘anticipation’?

‘Anticipating’, here, refers to a belief that the trustee will act in the trustor’s best interests. We maintain that Human-AI trust exists for the same purpose. Trust, therefore, is an attempt to anticipate the impact of behavior under risk.

Based on this, we conclude that:

✅ Risk is a prerequisite to the existence of Human-AI trust.

Admitting vulnerability means that the trustor perceives both of the following:

  1. That the event is undesirable
  2. That it is possible.

Ideally, the existence of trust can only be verified after verifying the existence of risk, i.e., by proving that both conditions hold.

Example

AI-produced credit scoring represents a risk to the loan officer: a wrong decision carries a risk (among others) that the applicant defaults in the future. The loss event must be undesirable to the user (the loan officer), who must understand that the decision (credit score) could theoretically be incorrect, and that it is not certainly incorrect, for trust to manifest.

Similarly, from the side of the applicants, the associated risk is to be denied (or to be charged a higher interest rate) on a loan that they deserve, and trust manifests if they believe that the AI will work in their interest (the risk will not occur).
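To make the risk prerequisite concrete, here is a minimal sketch in Python (the names `PerceivedEvent` and `risk_exists` are my own, invented for illustration; they are not from the paper):

```python
from dataclasses import dataclass

@dataclass
class PerceivedEvent:
    """A loss event as perceived by the user (e.g., the loan officer)."""
    description: str
    undesirable: bool  # condition 1: the user perceives the event as undesirable
    possible: bool     # condition 2: the user perceives the event as possible

def risk_exists(event: PerceivedEvent) -> bool:
    """Risk (a prerequisite to Human-AI trust) exists only if BOTH conditions hold."""
    return event.undesirable and event.possible

# The loan-officer example from above:
default_event = PerceivedEvent(
    description="applicant defaults on a loan approved via the AI's credit score",
    undesirable=True,  # a default is clearly unwanted
    possible=True,     # the credit score could theoretically be incorrect
)
assert risk_exists(default_event)
```

Only once `risk_exists` holds does it even make sense to ask whether the user trusts the model.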

✅ Distrust manifests in an attempt to mitigate the risk.

  • Tallant’s definition of distrust: A distrusts B if A does not accept vulnerability to B’s actions, because A believes that B may not act in A’s best interest.
  • Distrust is not equivalent to the absence of trust.

✅ The ability to anticipate is a goal, but not necessarily a symptom, of Human-AI trust.

  • Anticipating intended behavior is the user’s goal in developing trust, but not necessarily the AI developer’s goal.

Contractual Trust

❓ What does the human trustor anticipate in the AI model’s behaviour?

❓ What is the role of the ‘anticipated-behavior’ in the definition of Human-AI trust?

Trust in Model Correctness

Example

  • Consider some binary classification task, and suppose we have a baseline that is completely random by design, and a trained model that achieves the performance of the random baseline (i.e., 50% accuracy in this case).
  • Since the trained model performs poorly, a simple conclusion to draw is that we cannot trust this model to be correct.

❓ But, is this true?

  • Suppose now that the trained model with random-baseline performance does not behave randomly. We can make this bias understandable to others with an explanation of the model’s behavior.
  • The explanation reveals that the model is more likely to be correct in some situations than in others.

Example

  • Consider a credit-scoring AI model which is more likely to be correct for certain sub-populations.
  • The model’s performance did not change; it is simply not an entirely generalized model. Yet we can say that, with the added explanation, a trustor may now have more trust that the model is correct (on specific instances).

❓ What has changed?

The explanation about the model made it more ‘predictable’, such that the user can now better anticipate whether the model’s decision is correct or not for given inputs, compared to the model without any explanation.

With this, we conclude that:

✅ ‘Trust in model correctness’ is in fact not trust in the general performance ability of the model, but trust that the patterns distinguishing the model’s correct and incorrect cases are available to the user.
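A tiny simulation illustrates this (a sketch with made-up numbers, not an experiment from the paper): a model with random-baseline overall accuracy whose errors concentrate in one identifiable subpopulation is far easier to anticipate than a coin flip.

```python
import random

random.seed(0)

# Hypothetical setup: the model is 90% accurate on subpopulation A and only
# 10% accurate on subpopulation B. With a 50/50 split, overall accuracy is
# 0.5 * 0.9 + 0.5 * 0.1 = 50% -- exactly random-baseline level.
ACCURACY = {"A": 0.9, "B": 0.1}

population = [random.choice(["A", "B"]) for _ in range(100_000)]
correct = [random.random() < ACCURACY[group] for group in population]

print(f"overall accuracy: {sum(correct) / len(correct):.2f}")  # ~0.50

# Once an explanation reveals the subpopulation pattern, the user can
# anticipate correctness on a per-instance basis:
for group in ("A", "B"):
    hits = [c for c, g in zip(correct, population) if g == group]
    print(f"accuracy on {group}: {sum(hits) / len(hits):.2f}")
```

The overall number looks like a coin flip, but the per-group numbers are exactly the “patterns that distinguish the model’s correct and incorrect cases” that the user needs.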

The General Case: Trust in a Contract

⭐ Definition of Contractual Trust

Contractual trust is a model of trust in which a trustor has a belief that the trustee will uphold a specific contract.

  • In this paper, we contend that all Human-AI trust is contractual; to discuss Human-AI trust, the contract must be explicit.
  • The contract may refer to any functionality which is considered useful. Hence, model correctness is only one instance of contractual trust.

Example

A model trained to classify medical samples into classes can reveal strong correlations between attributes for one of those classes, giving leads to research on causation between them, even if the model was not useful for the original classification task.

  • People can trust something in one context but not in another. How do we deal with this when it comes to AI models?

Example

A model trained to classify medical samples into classes can perform strongly for samples that are similar to those in its training set, but poorly on those where some features were infrequent, even though the ‘contract’ appears the same.

We can now conclude that:

✅ Contractual trust can be stated as being conditioned on the context.

[Table 1: European requirements for trustworthy AI.]

  • For defining contracts, several works propose standardized documentation to communicate the performance characteristics of trained AI models:
  • Data statements, datasheets for datasets, model cards, reproducibility checklists, fairness checklists, and factsheets.

Example

If transparency is the stated contract, then all of the documentation mentioned above could be used to specify the information AI developers need to provide, so that they can evaluate and increase users’ trust in the transparency of an AI system.

From the above discussion, we conclude that:

✅ Contracts specify the behavior to be anticipated; to trust the AI is to believe that a set of contracts will be upheld.
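As a rough sketch of what ‘explicit contracts’ could look like in practice (the field names and thresholds below are invented, loosely inspired by model cards), a model’s contracts can be written down alongside it, and trust can then be tracked per contract:

```python
# A hypothetical, minimal "model card"-style declaration of contracts.
# All names and thresholds are invented for illustration.
credit_model_contracts = {
    "correctness": "AUC >= 0.85 on the held-out set described in the datasheet",
    "fairness": "approval-rate gap between protected groups <= 2%",
    "transparency": "per-decision feature attributions shown to loan officers",
    "context": "applicants from region X only; retrained quarterly",
}

def trusted_contracts(user_beliefs: dict[str, bool],
                      contracts: dict[str, str]) -> list[str]:
    """Return the contracts the user believes will be upheld, i.e. what the
    user contractually trusts the model with (trust is per-contract)."""
    return [name for name in contracts if user_beliefs.get(name, False)]

beliefs = {"correctness": True, "transparency": True, "fairness": False}
print(trusted_contracts(beliefs, credit_model_contracts))
# -> ['correctness', 'transparency']: trust is not all-or-nothing
```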


Trustworthy AI

So far, we have covered:

  • Trust enables the ability to anticipate intended behaviour through the belief that a contract will be upheld.

  • The ability to anticipate does not necessarily manifest with the existence of trust.

  • It is possible for a user to trust a model despite their inability to anticipate its behaviour.

❓ What differentiates trust that ‘succeeds’ at this goal [enabling the user’s ability to anticipate] from trust that does not?

❓ What is the difference between ‘trust’ (an attitude of the trustor) and ‘trustworthy’ (a property of the trustee)?

We come to the conclusion that:

✅ An AI model is trustworthy to some contract if it is capable of maintaining that contract.

✅ Trustworthiness is not a prerequisite for trust.

✅ Trust can exist in a model which is not trustworthy, and a trustworthy model does not necessarily gain trust.

⭐ Definition of warranted and unwarranted trust.

We say that trust is warranted if it is the result of trustworthiness, and otherwise it is unwarranted.

Example

  • Consider a user interacting with an AI model via some visual interface (GUI), and suppose the user trusts the AI to make a correct prediction on some task.
  • There is a correlation between a high-quality GUI and trustworthy AI models.
  • If the cause of the user’s trust is the GUI, then a later decision by the AI developer to increase or decrease the AI’s predictive accuracy will not affect that trust. This is unwarranted trust: in simple words, the user is trusting the AI model blindly.
  • If the cause of the trust is the model’s performance ability, then increasing or decreasing the AI’s predictive accuracy may well affect the user’s level of trust. This is warranted trust: the user is trusting that the AI model will uphold its contract.

[Fig 1: An example of causes of trust, in the context of warranted and unwarranted trust.]

  • Companies that build AI models on unwarranted trust are safe ONLY for a while.

Formally, we define and conclude that:

✅ Warranted Human-AI trust is defined via a causal relationship with trustworthiness: incurred Human-AI trust is warranted if the trustworthiness of the model can, in theory, be manipulated to affect the incurred trust.

✅ It is possible for a trustworthy model to incur unwarranted Human-AI trust – in this case, the trust will not be betrayed, even though it is unwarranted.

✅ When pursuing Human-AI trust, unwarranted trust should be explicitly evaluated against, and avoided or otherwise minimized.

✅ If an AI model is incapable of maintaining a relevant contract, then the user will develop warranted distrust, which is what the AI developer should expect. Simply put, the AI model has to live up to its contract.


Defining Human-AI Trust

✅ Trustworthiness is a prerequisite to warranted trust.

✅ Human-AI trust: if a human perceives that an AI model is trustworthy to a contract, and accepts vulnerability to this model’s actions, then the human trusts the model contractually.
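Written as a logical statement (my own notation, paraphrasing the paper’s wording rather than quoting a formula from it), the definition reads:

```latex
\mathrm{Trusts}_C(H, M) \iff
  \mathrm{Perceives}\big(H,\ \mathrm{Trustworthy}_C(M)\big)
  \;\land\;
  \mathrm{AcceptsVulnerability}(H, M)
```

where H is the human, M the model, and C the contract; the trust is warranted when the left-hand side causally depends on M actually being trustworthy to C.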


Causes of Trust

We divide the causes of warranted trust into two types: intrinsic and extrinsic.

Intrinsic Trust

It is not enough to simply explain to the user how an AI model works in order to build intrinsic trust. You need:

✅ The user to successfully comprehend the true reasoning process of the model, and

✅ The reasoning process of the model to match the user’s expected process.

Example

A decision tree is a model whose inner workings can be well-understood by the user (if it is small).

  • When a task requires complex expert knowledge to make sense of, a layman user will not gain intrinsic trust in the model, regardless of how ‘simple’ and interpretable the model is.

✅ If there is no ‘benchmark’ to compare the AI model against, it becomes difficult for the user to have intrinsic trust in the model.

✅ If your AI cites respectable sources as the reasons for its predictions, the user finds it easier to trust your model.

✅ Intrinsic trust can be increased in a disciplined manner by formalizing the priors behind trustworthy or non-suspicious behavior, and incorporating behavior which upholds those priors.

Extrinsic Trust

✅ It is possible for someone to develop warranted trust in a model not through explanation, but through behavior.

  • This trust is based on the evaluation methodology or the evaluation data.
  • This is equivalent to a doctor who is considered more trustworthy because they have a long history of making correct diagnoses.
  • This is basically a trust in the evaluation scheme.
  • To increase extrinsic trust is to justify that the model will generalize to unseen instances, based on its observed behavior on other unseen instances.

Two ways to get extrinsic trust:

✅ When the model is trustworthy.

✅ When the evaluation scheme is trustworthy.

Three main methods of evaluation towards extrinsic trust:

  1. By proxy: expert human opinion on the AI’s reasoning or behavior can enable non-experts to gain extrinsic trust in the AI.
  2. Post-deployment data: the examples that the model sees in production are the most trustworthy representatives for evaluating its general behavior.
  3. Test sets: sets of examples, distributed in some specific way, for which gold labels are available. Basically, a user trusts an AI model if there is sufficiently diverse evaluation data.

❓ How can we verify whether an evaluation scheme (in particular, test sets and deployment data) is trustworthy?

  • The data accurately represents the underlying mechanism it comes from,
  • The underlying mechanism is negligibly affected by distribution shift over time, and
  • The evaluation metrics represent the contract.
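As a sketch of how the second condition might be checked in practice (the feature, numbers, and threshold below are invented for illustration), one could compare the test set against post-deployment data with a standard two-sample test:

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(42)

# Hypothetical feature values (e.g., applicant income) in the test set
# versus what the model actually sees after deployment.
test_set_feature = rng.normal(loc=50_000, scale=10_000, size=2_000)
deployment_feature = rng.normal(loc=58_000, scale=12_000, size=2_000)  # drifted

res = ks_2samp(test_set_feature, deployment_feature)
print(f"KS statistic={res.statistic:.3f}, p-value={res.pvalue:.1e}")

# If the two distributions differ significantly, the evaluation scheme is
# less trustworthy: the test set no longer represents the mechanism the
# model faces in production.
ALPHA = 0.01  # arbitrary illustrative threshold
if res.pvalue < ALPHA:
    print("Warning: distribution shift detected; extrinsic trust in the "
          "test-set evaluation is weakened.")
```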

Explainability and Trust

XAI for Trust (common): A key motivation of XAI and interpretability is to increase the trust of users in the AI.

XAI for Trust (extended): A key motivation of XAI and interpretability is to:

  1. Increase the trustworthiness of the AI
  2. Increase the trust of the user in a trustworthy AI
  3. Increase the distrust of the user in a non-trustworthy AI

Evaluating Trust

What should evaluation of trust satisfy? Do current methods of evaluating trust fulfill these requirements?

Vulnerability in Trust and Trustworthiness

⭐ Definition of Vulnerability in Trust

The question of trust does not exist when the user does not assume a risk in trusting the AI. The user must depend on the AI on some level for risk to manifest.

Example

When the user is already confident in the answer and does not actually depend on the AI – as is the case for many machine learning datasets today – there is no unfavorable event that directly results from the AI’s actions.

✅ Experiments which simply ask the users whether they trust the model for some trivial task evaluate neither trust nor trustworthiness.

[Fig 2: Categorization of datasets that are commonly used to advance interpretable NLP.]

✅ The question of trust can be considered meaningful for tasks above the ‘effort boundary’ and to the right of the ‘vulnerability boundary’.

⭐ Definition of Vulnerability in Trustworthiness

Whether a model is intrinsically or extrinsically trustworthy is unrelated to the existence of vulnerability in the user.

Warranted and Unwarranted Trust

Kaur et al. present a synthetic experimental setup to evaluate unwarranted trust, and conclude that even data scientists are susceptible to developing unwarranted trust, despite some mathematical understanding of the models.

Evaluation protocol:

  1. Measure the level of trust in an interaction,
  2. Manipulate the real trustworthiness of the model (by handicapping it in some way, by improving its predictions, or even by replacing the model with an oracle), and
  3. Measure the level of trust after the manipulation.

The amount of change due to the manipulation indicates the level of warranted trust.
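Here is a minimal sketch of that protocol in code (all names are hypothetical; in a real study, `measure_trust` would be a behavioral experiment with human participants, not a function call):

```python
def evaluate_warranted_trust(measure_trust, degrade_model, model, users):
    """Sketch of the protocol: compare trust before and after manipulating
    the model's real trustworthiness. A large change indicates the users'
    trust was warranted, i.e., causally tied to trustworthiness."""
    trust_before = measure_trust(model, users)       # step 1
    worse_model = degrade_model(model)               # step 2: handicap the model
    trust_after = measure_trust(worse_model, users)  # step 3
    return trust_before - trust_after                # change due to manipulation

# Toy usage: pretend measured trust is proportional to the accuracy users
# observe, so degrading the model reduces it -- warranted trust.
signal = evaluate_warranted_trust(
    measure_trust=lambda m, us: m["accuracy"] * len(us),
    degrade_model=lambda m: {"accuracy": m["accuracy"] * 0.5},
    model={"accuracy": 0.9},
    users=["u1", "u2"],
)
print(f"warranted-trust signal: {signal:.2f}")  # > 0: trust tracked trustworthiness
```

A signal near zero would mean the manipulation did not move the users’ trust at all, i.e., the trust was unwarranted.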

Evaluating ‘Anticipation’

Simulatability is one method of evaluating a user’s ability to successfully anticipate the AI’s behavior: it is the ability of the user to simulate (predict) the outcome of the AI at the instance level.
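Measuring simulatability reduces to simple agreement between the user’s guesses of the model’s outputs and the model’s actual outputs. A sketch (the guesses below are stand-ins for real human predictions):

```python
def simulatability(user_guesses, model_outputs):
    """Fraction of instances on which the user correctly anticipated the
    model's output (not the ground truth -- the *model's* behavior)."""
    assert len(user_guesses) == len(model_outputs)
    matches = sum(g == o for g, o in zip(user_guesses, model_outputs))
    return matches / len(model_outputs)

# Hypothetical example: the user anticipates the model on 4 of 5 instances.
print(simulatability(["yes", "no", "no", "yes", "yes"],
                     ["yes", "no", "no", "yes", "no"]))  # 0.8
```

Note the comparison is against the model’s outputs, not the gold labels: a user can score high on simulatability even for a model that is often wrong, as long as its errors are anticipatable.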


Discussion

Trust in the AI Developer

✅ Trust in the AI model based on trust in the AI developer is an instance of interpersonal trust by proxy, and not Human-AI trust.


Conclusion

Finally, we’ve reached the end of the paper. Here, I’ll summarize what we’ve discussed so far:

  1. The assessment of risk is necessary prior to the assessment of trust.
  2. AI developers should be explicit about the contracts that their models maintain.
  3. Successful anticipation, while the goal of trust, is not indicative of warranted trust.
  4. Trust is only ethically desirable if it is warranted.
  5. Distrust is not strictly undesirable if it is warranted.
  6. Explanation, as a method for causing intrinsic trust, seems uniquely positioned to build Human-AI trust for general users.
  7. Methods of evaluating whether trust is warranted are underdeveloped, and require future work.

If you’ve read all the way to the end, I’d love to hear your opinion about my writing style in summarizing the paper. I try to put things in simple words, although most of the wording is taken as-is from the paper to preserve the thought process behind the ideas – I did not want to risk sending out the wrong ideas. Let me know how else I can improve at condensing such long papers, and at getting others to read more scientific publications.

Resources

  1. Jacovi, A., Marasović, A., Miller, T., & Goldberg, Y. (2020, October 15). Formalizing Trust in Artificial Intelligence: Prerequisites, Causes and Goals of Human Trust in AI. Retrieved October 27, 2020, from https://arxiv.org/abs/2010.07487
  2. Kaur, H., Nori, H., Jenkins, S., Caruana, R., Wallach, H., & Wortman Vaughan, J. (2020, April). Interpreting Interpretability: Understanding Data Scientists’ Use of Interpretability Tools for Machine Learning. Retrieved October 27, 2020, from https://dl.acm.org/doi/fullHtml/10.1145/3313831.3376219

Join 380+ Curious Learners! Every Monday, you'll receive a newsletter that contains resources and tools to help you learn, increase productivity, and be creative!