Digital Assistants Cannot Truly Mimic Human Communication

Google shifted its focus shortly after releasing its financial statements for the second quarter of 2022 last July. The results fell short of revenue projections. According to Google CEO Sundar Pichai, the shortfall was caused by a “mismatch between the number of employees we have and our productivity.”

Google then aggressively cut operational costs across all lines of business to reflect the shift in the company's focus. It abandoned the Loon project, halted development of the Pixel laptop, and, most shockingly, sentenced Stadia to death while also shrinking the capacity of "Area 120."

According to Ron Amadeo's reporting for Ars Technica, Google has grown even more intent on cutting operational costs since releasing the seventh generation of Pixel phones in early October. It intends to limit the availability of Google Assistant to hardware manufactured by Google, excluding devices manufactured by Google’s competitors.

This shocking news appears to be true. The reason is that, despite experiments with ad placement, Google’s digital assistant carries no ads, so Google receives no revenue from the service.

In fact, despite its appearance as a simple service, Google Assistant falls under the category of “high technology,” which is extremely difficult to create or develop. Although Google Assistant appears to be superior to similar services such as Siri, Alexa, and Cortana, it is not perfect, and perfecting it requires high operational costs.

Stanley Kubrick, the Wayfinder

A computer called HAL was born over 50 years ago, in the icy hands of Stanley Kubrick, in his film 2001: A Space Odyssey (1968). This computer is programmed to communicate in a soothing and sympathetic tone. Within the film's story, it is said to have become operational in 1992 in a laboratory in Urbana, Illinois, USA.

HAL can communicate perfectly and understand the intent of the humans it talks with, whether it starts the conversation or they do. According to John Seabrook in Hello, HAL: Will We Ever Get a Computer We Can Really Talk To?, this piece of computer science fiction makes “all computer experts fantasize about being able to make HAL a reality.”

This film also “provoked Bill Gates, the figure behind the proliferation of computers in the world community, predicting that speech recognition (or the technology behind HAL) will become the next big thing in the world of computers in the future,” according to Seabrook.

Following the release of 2001: A Space Odyssey (1968), computer experts competed to create HAL. The first result was the interactive voice response system, or IVR. Instead of letting people speak directly to customer service representatives, this telephone-based system has them listen to a robotic voice that appears to interact naturally.

Callers can press 1, 2, or 3 to choose from a sorted list of frequently asked questions (FAQ). When none of the options fits, however, IVR gives up and directs callers to reach the customer service office directly by pressing 0.
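The keypress logic described above can be sketched in a few lines of Python. This is only an illustration: the menu topics, the recorded messages, and the function name handle_keypress are all invented here and do not come from any real IVR product.

```python
# Hypothetical FAQ menu; the topics and wording are invented for illustration.
FAQ_MENU = {
    "1": "Your balance is available in the mobile app.",
    "2": "Our offices are open on weekdays from 9 to 5.",
    "3": "To report a lost card, have your ID number ready.",
}


def handle_keypress(key: str) -> str:
    """Return the recorded message a simple IVR would play for one keypress."""
    if key in FAQ_MENU:
        # Keys 1 to 3 map straight to a pre-sorted FAQ answer.
        return FAQ_MENU[key]
    if key == "0":
        # The fallback: the system gives up and hands the caller to a human.
        return "Connecting you to a customer service representative."
    # Anything else is outside the menu, so the prompt is repeated.
    return "Invalid choice. Press 1, 2, or 3, or press 0 for a representative."
```

The sketch shows why such a system only appears to interact naturally: every possible exchange is fixed in advance, and anything outside the menu ends with a human taking over.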

IVR evolved slowly, however. Sync emerged as an embedded in-car system from a collaboration between Ford, Microsoft, and Nuance. It lets users control an iPod by voice or ask the navigation system to direct them to a destination.

IVR evolved into voice command, which was then perfected by Google with the creation of the 411 Service and by Apple with Alex. These two voice-command systems use natural-language technology, which allows users to phrase their commands in different ways.

Siri, born from IVR's evolution into voice command, arrived with the release of the iPhone 4S in 2011. The next digital assistants to appear were Google Assistant, Alexa, and Cortana. In Your iPhone Is Listening, Jacob Aron notes that Siri and all other digital assistants, besides using natural language, are built on a technology known as active ontologies.

This technological innovation “restricts user queries (keywords spoken) to specific areas such as food or weather [...] Siri then accesses the database of information it has based on these restricted queries or shoots the information through an increasingly sophisticated application programming interface. The web is exploding with queries to provide answers and responses.”

Thanks to big data and neural networks, active ontologies keep expanding and let users phrase their questions in different ways. Siri and the other digital assistants are spectacular technologies because, for the first time, active ontologies allow a wide variety of questions and answers between user and assistant to be handled almost perfectly.

Unfortunately, active ontologies are essentially just a compromise, a “win-win solution” presented by Apple, Google, Microsoft, and Amazon. When a user asks Siri or Google Assistant a question that falls outside its active ontologies, the digital assistant goes blank and cannot respond.
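The idea of restricting queries to specific areas, and going blank outside them, can be illustrated with a toy keyword matcher in Python. This is a deliberately simplified sketch, not how Siri or any real assistant implements active ontologies: the DOMAINS table, the trigger words, and the functions classify and respond are all hypothetical.

```python
import string
from typing import Optional

# Hypothetical domains and trigger words, invented for illustration only.
DOMAINS = {
    "food": {"restaurant", "pizza", "lunch", "dinner"},
    "weather": {"weather", "rain", "sunny", "forecast"},
}


def classify(query: str) -> Optional[str]:
    """Return the first domain whose trigger words appear in the query."""
    # Lowercase the query and strip punctuation from each word.
    words = {w.strip(string.punctuation) for w in query.lower().split()}
    for domain, keywords in DOMAINS.items():
        if words & keywords:
            return domain
    return None  # the query falls outside every known domain


def respond(query: str) -> str:
    """Answer only when the query fits a known domain."""
    domain = classify(query)
    if domain is None:
        # Outside the known domains, the assistant has nothing to say.
        return "Sorry, I can't help with that."
    return f"Looking up {domain} information..."
```

Under these assumptions, respond("Will it rain tomorrow?") is routed to the weather domain, while a question such as "What is the meaning of life?" matches no domain and gets only the fallback reply.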

In addition, unlike HAL in 2001: A Space Odyssey (1968), Siri, Google Assistant, Alexa, and Cortana cannot interact perfectly when they start a chat with users or are drawn into one. Why can’t these assistants interact perfectly with their (human) users more than half a century after Stanley Kubrick’s HAL debuted?

The answer is straightforward. Humans’ ability to communicate with one another is difficult for computers or robots to replicate. True, computers and robots today can mechanically mimic how humans produce and hear sound.

As early as the end of the 18th century, Wolfgang von Kempelen, a Hungarian scientist, built a machine that could speak by imitating the human vocal tract. The machine generates sound the way exhaling does, when the diaphragm pushes air up from the lungs to vibrate the tiny membranes known as vocal cords.

How humans listen to sound is harder to imitate because it requires signal processing to convert sound waves in the air into electrical impulses, yet computers and robots managed it two centuries later.

However, success in mimicking how humans make and hear sounds is not enough. For starters, humans speak different languages and therefore have different voice characteristics and accents. And no matter how well a computer mimics human hearing, the human ear remains extremely sensitive: it can tell hot coffee from cold simply by the sound the coffee makes when it is poured.

Second, computers and robots have no common sense, as Matthew Hutson explains in Can Computers Learn Common Sense? For example, if a person complains to a friend about a problem he ran into despite the friend’s earlier warning, the friend can respond with “I told you!” Computers and robots, of course, do not understand what “I told you” means.