Reining in the BS in AI

Large language models trained on questionable stuff online will produce more of the same. Retrieval augmented generation is one way to get closer to truth.

Credit: alberto clemares exposito / Shutterstock

Even people not in tech seem to have heard of Sam Altman’s ouster from OpenAI on Friday. I was with two friends the next day (one works in construction and the other in marketing) and both were talking about it. Generative AI (genAI) seems to have finally gone mainstream.

What it hasn’t done, however, is escape the gravitational pull of BS, as Alan Blackwell has stressed. No, I don’t mean that AI is vacuous, long on hype, and short on substance. AI is already delivering for many enterprises across a host of industries. Even genAI, a small subset of the overall AI market, is a game-changer for software development and beyond. And yet Blackwell is correct: “AI literally produces bullshit.” It makes up stuff that sounds good based on training data.

Even so, if we can “box it in,” as MIT professor of AI Rodney Brooks puts it, genAI has the potential to make a big difference in our lives.

‘ChatGPT is a bullshit generator’

Truth is not fundamental to how large language models function. LLMs are “deep learning algorithms that can recognise, summarise, translate, predict, and generate content using very large data sets.” Note that “truth” and “knowledge” have no place in that definition. LLMs aren’t designed to tell you the truth. As detailed in an OpenAI forum, “Large language models are probabilistic in nature and operate by generating likely outputs based on patterns they have observed in the training data. In the case of mathematical and physical problems, there may be only one correct answer, and the likelihood of generating that answer may be very low.”
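To make the “probabilistic in nature” point concrete, here is a minimal sketch in Python. The probabilities are invented for illustration and do not come from any real model; the function names are hypothetical too. It shows how a model that simply emits likely tokens can favor a plausible-looking wrong answer over the single correct one.

```python
import random

# Hypothetical next-token probabilities for the prompt "17 x 23 =".
# These numbers are made up for illustration; a real model assigns
# probabilities to tens of thousands of possible tokens.
next_token_probs = {
    "391": 0.20,  # the correct product
    "381": 0.30,  # plausible-looking but wrong
    "401": 0.25,  # wrong
    "371": 0.25,  # wrong
}

def most_likely(probs):
    """Greedy decoding: pick the single highest-probability token."""
    return max(probs, key=probs.get)

def sample_token(probs, rng):
    """Sampling: draw one token in proportion to its probability."""
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]

rng = random.Random(0)  # fixed seed so repeated runs behave the same
samples = [sample_token(next_token_probs, rng) for _ in range(1000)]
share_correct = samples.count("391") / len(samples)

print("Greedy answer:", most_likely(next_token_probs))
print("Share of sampled answers that are correct:", share_correct)
```

Under these assumed numbers, neither greedy decoding nor sampling reliably lands on 391. Nothing in the procedure checks the arithmetic; it only follows the probabilities.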

That’s a nice way of saying you might not want to rely on ChatGPT to do basic multiplication problems for you, but it could be great at crafting an answer on the history of algebra. In fact, channeling Geoff Hinton, Blackwell says, “One of the greatest risks is not that chatbots will become super intelligent, but that they will generate text that is super persuasive without being intelligent.”

It’s like “fake news” on steroids. As Blackwell says, “We’ve automated bullshit.”

This isn’t surprising, given that the primary sources for the LLMs underlying ChatGPT and other genAI systems are Twitter, Facebook, Reddit, and “other huge archives of bullshit.” Worse, “there is no algorithm in ChatGPT to check which parts are true,” such that the “output is literally bullshit,” says Blackwell.

What to do?

‘You have to box things in carefully’

The key to getting some semblance of useful knowledge out of LLMs, according to Brooks, is “boxing in.” He says, “You have to box [LLMs] in carefully so that the craziness doesn’t come out, and the making stuff up doesn’t come out.” But how does one “box an LLM in?”

One critical way is through retrieval augmented generation (RAG). I love how Zachary Proser characterises it: “RAG is like holding up a cue card containing the critical points for your LLM to see.” It’s a way to augment an LLM with proprietary data, giving the LLM more context and knowledge to improve its responses.

RAG depends on vectors, a foundational element in a variety of AI use cases. A vector embedding is just a long list of numbers that describes features of a data object, such as a song, an image, a video, or a poem. Embeddings are stored in a vector database, where they capture the semantic meaning of objects in relation to other objects. Similar objects are grouped together in the vector space: the closer two objects are, the more similar they are. (For example, “rugby” and “football” will be closer to each other than “football” and “basketball.”) You can then query for entities that are similar based on their characteristics, without relying on synonyms or keyword matching.
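The idea of “closeness” can be sketched in a few lines of Python. The three-number vectors below are hand-made assumptions, not the output of any real embedding model (real embeddings have hundreds or thousands of dimensions). Cosine similarity is one common closeness measure, used here as an illustrative choice.

```python
import math

# Hand-made toy "embeddings" (assumed values, for illustration only).
toy_embeddings = {
    "rugby":      [0.9, 0.8, 0.1],
    "football":   [0.8, 0.9, 0.2],
    "basketball": [0.2, 0.3, 0.9],
}

def cosine_similarity(a, b):
    """Return the cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest(word, embeddings, k=1):
    """Return the k other words whose vectors are most similar to `word`."""
    query = embeddings[word]
    others = [w for w in embeddings if w != word]
    others.sort(key=lambda w: cosine_similarity(query, embeddings[w]), reverse=True)
    return others[:k]

rugby_vs_football = cosine_similarity(toy_embeddings["rugby"], toy_embeddings["football"])
football_vs_basketball = cosine_similarity(toy_embeddings["football"], toy_embeddings["basketball"])

print("rugby vs. football:", round(rugby_vs_football, 3))
print("football vs. basketball:", round(football_vs_basketball, 3))
print("Nearest to football:", nearest("football", toy_embeddings))
```

With these toy values, “rugby” scores as closer to “football” than “basketball” does, with no keyword or synonym list involved. A vector database performs essentially this kind of nearest-neighbor lookup, only at much larger scale.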

As Proser concludes, “Since the LLM now has access to the most pertinent and grounding facts from your vector database, it can provide an accurate answer for your user. RAG reduces the likelihood of hallucination.” Suddenly, your LLM is much more likely to give you a true response, not merely a response that sounds true. This is the sort of “boxing in” that can make LLMs actually useful and not hype.
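Here is a rough sketch of the “cue card” step in Python. Everything in it is an assumption made for illustration: the `embed` function is a toy word-count stand-in for a real embedding model, the documents are invented, and no actual LLM is called. It only shows how retrieved text ends up in the prompt.

```python
import math
import re
from collections import Counter

# Invented "proprietary" documents standing in for a vector database.
documents = [
    "Refunds are available within 30 days of purchase.",
    "Support is open Monday through Friday.",
    "Shipping to Canada takes five business days.",
]

def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(question, docs, k=1):
    """Return the k documents most similar to the question."""
    q = embed(question)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def build_prompt(question, docs, k=1):
    """Put the retrieved documents in front of the question (the 'cue card')."""
    context = "\n".join(f"- {d}" for d in retrieve(question, docs, k))
    return (
        "Answer using only the context below. "
        "If the context is not enough, say you do not know.\n"
        f"Context:\n{context}\n"
        f"Question: {question}"
    )

prompt = build_prompt("Can I get refunds after purchase?", documents)
print(prompt)
```

The resulting prompt would then be sent to whichever LLM you use. The model still generates text probabilistically, but it now has the relevant facts in front of it rather than having to guess them.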

Otherwise, it’s just automated bullshit.
