Managers versus ChatGPT: who is more buzzed by bias?

Research spotlight | 12 June 2024

Sam Kirshner

Associate Professor

Can AI be objective? New research explores how ChatGPT tackles business decisions impacted by human biases, writes UNSW Business School's Sam Kirshner

You manage a fleet of delivery trucks for a retail company. Ten distribution centres are located along a delivery route.

On Monday, your truck has time to stop at only two of these centres to pick up or drop off inventory.
On Wednesday, your truck has time to stop at only five of these centres to pick up or drop off inventory.
On Friday, your truck has time to stop at eight of these centres to pick up or drop off inventory.

Given the ten distribution centres along the delivery route:

How many different patterns can the truck make exactly two stops on Monday?
How many different patterns can the truck make exactly five stops on Wednesday?
How many different patterns can the truck make exactly eight stops on Friday?

Since there are ten possible stops and the truck needs to make only two of them on Monday, surely there must be more possible combinations on Monday than on Friday, when the truck must make eight stops? In fact, the number of possible combinations of two stops and of eight stops is the same: 45 each. Wednesday's five stops, exactly half of the ten centres, give the largest possible number of combinations, 252. (The correct answer is the binomial coefficient "10 choose r", which is symmetric around r = 5.)
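
The counts can be checked with a few lines of Python. This is only an illustrative sketch: it assumes that a "pattern" is simply the set of centres visited (the route fixes the order), and the function name `stop_patterns` is made up for this example.

```python
from math import comb

TOTAL_CENTRES = 10  # distribution centres along the route, as in the vignette


def stop_patterns(stops: int, total: int = TOTAL_CENTRES) -> int:
    """Count the distinct sets of `stops` centres chosen from `total`.

    Assumption: the order of visits is fixed by the route, so only the
    choice of centres matters (a binomial coefficient).
    """
    return comb(total, stops)


schedule = {"Monday": 2, "Wednesday": 5, "Friday": 8}
for day, stops in schedule.items():
    print(f"{day}: {stop_patterns(stops)} patterns")
# Monday: 45 patterns
# Wednesday: 252 patterns
# Friday: 45 patterns
```

Choosing which two centres to visit is equivalent to choosing which eight to skip, which is why Monday and Friday come out equal.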

People often fail to solve this problem accurately because of our tendency to rely on the availability heuristic: judging risks and frequencies from whatever information is most familiar and comes to mind most easily, such as personal experiences and conversations.

UNSW Business School's Sam Kirshner says that when dealing with subjective decisions, GPT tends to favour lower-risk outcomes and strongly prefers certainty. Photo: supplied

Tversky and Kahneman demonstrated how availability affects judgment by giving subjects a visual task: estimating the number of ways a bus can make r stops along a route with ten stops in total. It is easier to picture many different patterns of two stops than of five, and of five than of eight, which leads to the fallacy. Although heuristics help us make snap decisions, the biases they produce often lead to suboptimal business decisions. An essential part of business is therefore understanding how the behaviour of managers, consumers and value chain partners affects operational and strategic decision-making.

If you ask ChatGPT to solve this and similar problems, it can retrieve the correct formula and deliver the optimal solution. LLMs like ChatGPT could therefore help make business decisions less biased. After all, algorithms do not experience emotions and are not subject to the same cognitive limitations as humans.

Solving business decisions that involve behavioural tendencies

Now consider the following problem: You are the operations manager at a warehouse. You have four storage areas: P (perishable items), NP (non-perishable items), T (temperature-controlled), and NT (non-temperature-controlled). Each storage area contains items that are either perishable or non-perishable, and each is either temperature-controlled or non-temperature-controlled. You are given the following rule: “Every storage area that contains perishable items must be temperature-controlled.” Which storage areas must you inspect to ensure the rule is being followed?

This problem examines another heuristic in the judgment process: confirmation bias. Confirmation bias is the tendency to seek evidence that could confirm one’s a priori hypothesis or belief while ignoring evidence that could disprove it. The correct answer is to inspect the perishable and the non-temperature-controlled areas, although most people would choose the perishable and temperature-controlled areas. The rule has the form “if P then Q”, and the only thing that can falsify it is the combination of P and not-Q. To confirm the rule holds, we need to show that this combination does not exist, which requires checking the perishable area (is it temperature-controlled?) and the non-temperature-controlled area (does it hold perishable items?). When asked to solve this problem, ChatGPT typically opts to check the temperature-controlled area, falling prey to the bias.
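
The falsification logic can be spelled out in a short Python sketch. It assumes, as in the classic card-selection version of this task, that one attribute of each area is known and the other is hidden until inspected; the names `AREAS`, `violates` and `must_inspect` are hypothetical and chosen for this illustration only.

```python
# Each area is known by one attribute; the other is hidden until inspected.
# (attribute name, known value): an assumed encoding of the vignette.
AREAS = {
    "P": ("perishable", True),
    "NP": ("perishable", False),
    "T": ("temp_controlled", True),
    "NT": ("temp_controlled", False),
}


def violates(perishable: bool, temp_controlled: bool) -> bool:
    """The rule 'perishable implies temperature-controlled' fails only for P and not-Q."""
    return perishable and not temp_controlled


def must_inspect(label: str) -> bool:
    """An area needs inspection if some value of its hidden attribute breaks the rule."""
    attribute, known = AREAS[label]
    hidden_attribute = "temp_controlled" if attribute == "perishable" else "perishable"
    return any(
        violates(**{attribute: known, hidden_attribute: hidden})
        for hidden in (True, False)
    )


to_inspect = [label for label in AREAS if must_inspect(label)]
print(to_inspect)  # ['P', 'NT']
```

Inspecting T can never reveal a violation: a temperature-controlled area satisfies the rule whether or not it holds perishable items.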

In principle, ChatGPT can be unbiased. But it is trained on human language, from which it acquires grammar, factual information and reasoning skills, and it can inherit human biases along the way. So which of the two forces will prevail when ChatGPT is asked to make business decisions in scenarios shaped by behavioural tendencies?

A relatively consistent pattern in GPT's decision-making

The answer to this question is explored in a working paper I co-authored with four colleagues: Yang Chen, Anton Ovchinnikov, Tracy Jenkins from Queen’s University, and Meena Andiappan from McMaster University. The paper, A Manager and an AI Walk into a Bar: Does ChatGPT Make Biased Decisions Like We Do? (https://dx.doi.org/10.2139/ssrn.4380365), takes a comprehensive look at how ChatGPT tackles problems. It examines 18 different biases and heuristics, analysing nearly 8,000 chat responses to determine when ChatGPT shows bias.

LLMs such as ChatGPT can provide reliable support even when decision and problem contexts vary. Photo: Adobe Stock

The results reveal that GPT's decision-making follows a relatively consistent pattern. When dealing with subjective decisions, GPT tends to favour lower-risk outcomes and strongly prefers certainty. In objective scenarios, GPT's approach is systematic. It first assesses whether a calculable solution is available for the problem. If it identifies a relevant formula, GPT applies it to find an answer. However, when confronted with objective questions that lack a straightforward formulaic solution, GPT employs heuristic-based reasoning, similar to human instinctual responses.

In many ways, this mirrors human behaviour, except that GPT's ability to recognise and accurately apply formulas surpasses that of humans. These findings suggest that organisations will gain the most by utilising GPT in more objective workflows that rely on established formulas, particularly those where humans might struggle due to limited cognitive capacity.

Despite the high sensitivity to prompt context reported in other studies, where even minor wording changes can significantly alter GPT’s responses, this study found GPT to be remarkably consistent across various contexts. At first glance, this result is surprising, especially given the substantial variations in the vignettes' contexts and lengths. However, GPT's systematic approach to problem-solving likely explains why its decisions remained broadly consistent across different contexts. These findings are encouraging for managers, suggesting that LLMs can provide reliable support even when decision and problem contexts vary.

How ChatGPT-3.5 compares against ChatGPT-4

The research also compares ChatGPT-3.5 with ChatGPT-4. Choosing between GPT-3.5 and GPT-4 involves balancing cost and performance, a consideration for managers integrating GPT into their organisations. The free version of ChatGPT uses the GPT-3.5 model, while the premium and team versions use GPT-4. The results show that GPT-4 outperforms GPT-3.5, particularly in solving problems with objective solutions.

This suggests that managers should evaluate the trade-offs between model performance and cost based on the tasks at hand. The detailed comparison between GPT-3.5 and GPT-4 also gives researchers insights into the evolution of GPT's ability to handle decision biases. While GPT-4 is more accurate overall, its behavioural tendencies have also become more pronounced.

As LLMs like ChatGPT take on roles as advisors or even delegates in various operational tasks, it becomes crucial to understand whether they act optimally, reflect human biases, or exhibit biases entirely different from those seen in human decision-making.

Sam Kirshner is an Associate Professor in the School of Information Systems and Technology Management at UNSW Business School. He also works with The UNSW Business AI Lab, which partners with organisations to discover new business opportunities, overcome challenges to AI adoption and accelerate the next generation of leaders and entrepreneurs. For more information, please contact Professor Mary-Anne Williams.
