Exploring AI Information Retrieval: A Search Engine Expert Discusses the Pros and Cons of Allowing ChatGPT and Related Technologies to Browse the Internet

Information retrieval

Before search engines became popular, people relied on librarians and subject-matter experts to find relevant information. That model of information access was personal, interactive, trustworthy and completely transparent. Today the vast majority of people use search engines instead, but typing a few keywords and relying on an algorithm to rank and display results is not always the best approach.

Cutting-edge information access systems powered by artificial intelligence (AI), such as Microsoft's Bing/ChatGPT, Google's Bard and Meta's LLaMA, are upending the traditional search engine mode of input and output. These systems can take full sentences and even paragraphs as input and generate personalized, natural-language responses.

At first glance, this may seem like an ideal combination: friendly, knowledgeable answers drawn from the vast expanse of information on the internet. But as someone who studies search and recommendation systems, I believe the picture is far from clear-cut.

AI systems such as ChatGPT and Bard are built on large language models. These models are created with machine-learning techniques that analyze enormous amounts of text from sources such as Wikipedia and PubMed articles to learn patterns. Simply put, a language model predicts the most probable word to follow a given string of words, and with that it can generate sentences, paragraphs and even whole pages in response to a user's query. OpenAI recently unveiled GPT-4, a newer model that can process both images and text, and Microsoft plans to incorporate that technology into its conversational Bing search engine.
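The next-word prediction described above can be sketched with a toy bigram model. This is a deliberate simplification, not how ChatGPT actually works (real large language models use neural networks trained on billions of words), and the tiny corpus here is invented purely for illustration:

```python
from collections import Counter, defaultdict

# Toy corpus, invented for this example.
corpus = (
    "the cat sat on the mat . "
    "the cat sat on the rug . "
    "the cat chased the dog ."
).split()

# Count which words follow each word in the corpus.
next_words = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    next_words[current][nxt] += 1

def predict(word):
    """Return the word most often seen after `word` in the corpus."""
    return next_words[word].most_common(1)[0][0]

print(predict("cat"))  # prints "sat" — "sat" follows "cat" twice, "chased" once
print(predict("the"))  # prints "cat" — "cat" follows "the" three times
```

The model "knows" nothing about cats or mats; it only tracks which word patterns co-occur, which is the limitation discussed later in this article.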

This method of information retrieval works remarkably well thanks to advances in machine learning, including training on large bodies of text followed by fine-tuning. Systems built on large language models give users personalized answers to their information requests. The results have been so impressive that ChatGPT reached 100 million users in a fraction of the time it took TikTok. People have used it not just to find answers but to generate diagnoses, create diet plans and make investment recommendations.

Nonetheless, large language models have numerous drawbacks. First, consider what lies at the heart of such models: a mechanism that associates words with one another, not necessarily with their meanings. As a result, the generated responses may seem intelligent, but these AI systems are stringing together statements without truly comprehending them. In other words, the articulate answers are merely patterns of words that the system has encountered in similar contexts.

Because of this limitation, large language model systems are prone to making up, or "hallucinating," answers. They also are not smart enough to recognize when a question rests on an incorrect premise, and will answer it anyway. For example, when asked which U.S. president's face is on the $100 bill, ChatGPT may answer "Benjamin Franklin," without realizing that Franklin was never president and that the premise itself is faulty: the $100 bill does not carry a president's portrait.

The problem is that even when these systems are wrong only 10% of the time, you don't know which 10%. Nor can users quickly verify a system's responses. That is because these systems lack transparency: they do not reveal what data they are trained on, which sources they draw on to produce an answer, or how their responses are generated.

You can ask ChatGPT to produce a technical report with citations, but it is liable to fabricate those citations, inventing the titles of academic papers and their authors. Nor do these systems check the accuracy of their responses. That leaves verification to the user, who may lack the motivation or skills to do it, or may not even recognize the need to check the AI's answers.

Content Theft and Its Implications for Website Traffic

This lack of transparency can harm not only users but also the authors, artists and other creators of the original content from which the systems learn, because the systems do not reveal their sources or provide credit. In most cases, the creators are neither compensated nor acknowledged, nor given the chance to grant permission.

There is an economic angle to this as well. In a typical search engine, results are displayed with links to their sources. This not only lets users verify the answers and gives credit to the sources, it also drives traffic to those sites, a vital source of their revenue. Because large language model systems produce direct answers without indicating where they came from, those sites are likely to see their revenue streams shrink.

Eliminating Opportunities for Learning and Serendipitous Discovery

Finally, this new way of accessing information may also disempower people by depriving them of the chance to learn. A typical search process lets users explore the range of possibilities for meeting their information needs, often prompting them to refine what they are looking for. It also gives them the opportunity to learn what information is out there and how various pieces of information connect to their goals. And it leaves room for accidental, serendipitous discoveries.

These are important aspects of search. But when a system produces results without revealing its sources or guiding the user to them, it denies users these opportunities.

Large language models are a significant leap forward for information access. They enable natural, conversational interaction, can generate personalized responses, and can uncover answers and patterns that would be difficult for an average user to find on their own. But their usefulness is limited by the way they learn and produce responses, and their answers can be incorrect, toxic or biased.

While other information access methods can suffer from some of the same problems, large language model AI systems also lack transparency. Worse, their natural-language responses can fuel an unwarranted sense of trust and authority that can be dangerous for uninformed users.
