The Limitations of AI: Reasoning
An exploration of why AI hasn’t cracked enterprises, specifically financial institutions, and of what I see as AI’s core limitation: reasoning.
Introduction
I was recently interviewed by a friend for her class assignment on technology in the workplace. Given my summer internship in investment banking, she asked me how technology and artificial intelligence changed how I worked. My answer?
It didn’t really change.
Reflecting on this summer, I realize a lot of my processes could’ve been automated. Certainly, AI would’ve made my life significantly easier had I been able to access ChatGPT from my work computer. But I wasn’t able to.
This made me question why that is. Why hasn’t AI been more readily adopted in enterprises and in the financial services industry? My answer comes down to two things: 1) compliance and culture, and, most importantly, 2) AI simply isn’t good enough yet, due to its inability to reason.
AI x Financial Services
Before continuing, the data itself is quite interesting. NVIDIA’s fourth annual State of AI in Financial Services Report found that an overwhelming 91% of financial services companies are either assessing AI or already using it in production. They’re using it to drive innovation, improve operational efficiency, and enhance customer experiences. I dug a little deeper to find out exactly where the use cases for AI are in the industry.
As expected, the use cases for generative AI sit below 50%. This tells me that financial institutions are using AI in areas where errors are unlikely to occur: pattern matching, which is what transformers are great at, and which the report’s data corroborates. AI, however, isn’t great at reasoning, and that explains why generative use cases are severely underutilized. Digging a little deeper, I also found that the biggest challenges companies face in achieving their AI goals are data issues. Companies struggle with data privacy, sovereignty, etc. With that said, the data supports my hypothesis on the two reasons I think adoption is limited.
1) Compliance and Regulatory Concerns Limit AI Adoption
As evidenced in the NVIDIA report, privacy has been a huge concern for financial institutions when considering adoption. This is corroborated by the conversations I had with full-time analysts this summer. The bank’s main concern was feeding proprietary company information into these LLMs. They were particularly wary of how AI models process and learn from data, fearing that sensitive financial information could be inadvertently exposed or misused. This concern is not unfounded; the financial services industry is heavily regulated, and any breach of data privacy could lead to severe penalties and reputational damage.
Moreover, the culture within financial institutions often prioritizes risk aversion over innovation. Many firms have established protocols that are deeply ingrained in their operations, making it difficult to pivot toward more agile, tech-driven approaches. This cultural inertia can stifle creativity and slow down the integration of AI technologies, even when the potential benefits are clear. This is especially true at an investment bank. The MDs on these deal teams had to endure being analysts themselves, and would much rather see their analysts work hard in the office until 2 AM than leave at 5 PM because AI can turn all of their comments.
2) AI Is Simply Not Good Enough
I wanted to focus the majority of this article on explaining why AI is simply not there yet. We’ve focused heavily on the headline news about how AI is growing rapidly and only becoming better over time. While true, there are certainly limitations. For example, I always end up double-checking my work whenever I leverage AI in my own growth equity investment workflow. I haven’t established that level of trust with any AI model yet, and that’s because of hallucinations.
LLMs begin to hallucinate when performing tasks that require “reasoning.” Why is this the case? Several critics point to the underlying transformer architecture as the cause of these limitations. “A transformer model is a neural network that learns context and thus meaning by tracking relationships in sequential data like the words in this sentence.” To put it in plain English, it’s a type of technology that helps computers understand and generate human language. It does this through 1) positional encoding and 2) self-attention.
Let me explain this without going into the nitty gritty, starting with positional encoding. We all know word order matters; it’s what makes syntax so important (shoutout Linguistics class)! “The mouse chased the cat” and “the cat chased the mouse” are very different sentences. However, transformers don’t inherently know this, because they look at all the words at the same time rather than one by one. Positional encoding is like giving each word a name tag that tells the model its position in the sentence. Moving to self-attention: this helps the model decide which words are most important to each other. Take this sentence as an example: “the cat chased the mouse, and it ran away.” The word “it” looks at all the other words and figures out it’s connected to “mouse.” With that said, this architecture is why these AI models are pretty terrible at reasoning: they don’t understand the meaning of words the way humans do, they just track statistical relationships between them.
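To make those two ideas a bit more concrete, here is a minimal NumPy sketch of sinusoidal positional encoding and a single self-attention step. It is purely illustrative: the word embeddings are random, there are no learned weight matrices or multiple heads, so the attention weights it prints are arbitrary rather than meaningful. In a trained model, “it” would end up putting most of its weight on “mouse.”

```python
# Toy sketch of positional encoding + single-head self-attention.
# Illustrative only: real transformers use learned embeddings, learned
# query/key/value projections, many heads, and many stacked layers.
import numpy as np

def positional_encoding(seq_len, d_model):
    """Give each position a unique 'name tag' built from sines and cosines."""
    positions = np.arange(seq_len)[:, None]        # (seq_len, 1)
    dims = np.arange(d_model)[None, :]             # (1, d_model)
    angle_rates = 1.0 / np.power(10000, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])          # even dimensions: sine
    pe[:, 1::2] = np.cos(angles[:, 1::2])          # odd dimensions: cosine
    return pe

def self_attention(x):
    """Each word scores every other word, then takes a weighted average.
    For simplicity, queries, keys, and values are all just x itself."""
    d_k = x.shape[-1]
    scores = x @ x.T / np.sqrt(d_k)                # pairwise relevance scores
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)  # softmax rows
    return weights @ x, weights

tokens = "the cat chased the mouse and it ran away".split()
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(len(tokens), 16))    # random stand-in embeddings
x = embeddings + positional_encoding(len(tokens), 16)  # inject word order
output, attn = self_attention(x)

# How much "it" attends to each word (arbitrary here, since nothing is trained)
for tok, w in zip(tokens, attn[tokens.index("it")]):
    print(f"{tok:>7}: {w:.2f}")
```

Notice that everything the model “knows” lives in those numbers: there is no separate logic engine sitting behind the attention weights, which is exactly why critics argue the architecture pattern-matches rather than reasons.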
This lack of formal reasoning limits adoption, especially in the financial services industry, where there’s no tolerance for error (trust me… you don’t want to make an error in banking, especially as an intern).
What about the release of OpenAI’s o1?
I remember when o1 first came out, everyone was excited about its “human-like reasoning.” Unlike previous models, o1 emphasized inference-time scaling rather than pre-training. Pre-training is the initial phase in developing large language models (LLMs), where the model is trained on vast datasets of diverse text. During this phase, it uses unsupervised learning to predict the next word in a sentence, allowing it to learn language patterns and contextual relationships and form a strong foundational understanding of language. Inference-time scaling, in contrast, focuses on how the model generates responses in real time, placing the emphasis on enhancing reasoning during the output-generation phase rather than just during pre-training. However, these claims were quickly challenged by researchers from Apple. This led to a broader conversation about whether AI models have hit a plateau.
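Before moving on, here is a deliberately tiny Python sketch of the contrast between the two phases. Everything in it is a stand-in: a bigram word counter plays the role of pre-training’s next-word prediction, and simple majority voting over many samples plays the role of inference-time scaling. OpenAI has not published o1’s actual recipe, so treat this as the general shape of the idea (spend more compute per query at answer time), not as how o1 works.

```python
# Toy contrast: where the compute is spent in "pre-training" vs "inference-time scaling".
# The bigram model and majority vote are stand-ins, not anyone's real method.
import random
from collections import Counter, defaultdict

# --- "Pre-training": learn to predict the next word from a corpus ------------
corpus = "the cat chased the mouse and the mouse ran away".split()
next_word_counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    next_word_counts[prev][nxt] += 1              # count which word follows which

def sample_next(word):
    """Sample a plausible next word (our stand-in for an LLM's forward pass)."""
    counts = next_word_counts.get(word)
    if not counts:
        return None
    words, weights = zip(*counts.items())
    return random.choices(words, weights=weights)[0]

# --- "Inference-time scaling": spend extra compute on each query -------------
def answer_once(start_word):
    """Generate one short continuation (one attempt at an answer)."""
    word, out = start_word, [start_word]
    for _ in range(5):
        word = sample_next(word)
        if word is None:
            break
        out.append(word)
    return " ".join(out)

def answer_with_scaling(start_word, n_samples=32):
    """Sample many attempts and return the most common one (majority vote),
    trading extra inference compute for a more reliable answer."""
    return Counter(answer_once(start_word) for _ in range(n_samples)).most_common(1)[0][0]

print("single sample :", answer_once("the"))
print("majority vote :", answer_with_scaling("the"))
```

The point of the sketch is only this: pre-training spends its compute once, up front, on learning to predict the next token, while inference-time scaling spends additional compute every time a question is asked.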
What does this mean for AI adoption?
A problem means there’s room for a new innovation to be the solution. There’s an opportunity for a new architecture to challenge the transformer’s reign, especially as the need for reasoning capabilities grows.
References:
ChatGPT, Perplexity, and Unriddle AI were used in the research/writing of this article