The AI ecosystem
- 11 May, 2015 22:56
Interest in and press coverage of artificial intelligence (AI) come and go, but the reality is that we have had AI systems with us for quite some time. Because many of these systems are narrowly focused (and actually work), they are often not thought of as being AI.
For example, when Netflix or Amazon suggests movies or books for you, they are actually doing something quite human. They look at what you have liked in the past (evidenced by what you have viewed or purchased), find people who have similar profiles, and then suggest things that those people liked that you haven't seen yet. This, combined with knowing what you last viewed and the things that are similar to it, enables them to make recommendations for you. This is not unlike what you might do when you have two friends with a lot in common and use the likes and dislikes of one of them to figure out a gift for the other.
Whether these recommendations are good or bad is not the point. They are aimed at mirroring the very human ability to build profiles, figure out similarities, and then make predictions about one person's likes and dislikes based on those of someone who is similar to them. But because they are narrowly focused, we tend to forget that what they are doing is something that requires intelligence, and that occasionally they may be able to do it better than we do ourselves.
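The profile matching described above can be sketched in a few lines of Python. This is only an illustration of the general idea (find the most similar person, then suggest what they liked that you haven't seen), not how Netflix or Amazon actually implement it. The ratings, the user names and the choice of cosine similarity are all assumptions made for the example.

```python
from math import sqrt

# Hypothetical viewing history: user -> {title: rating}. All data is made up.
ratings = {
    "you":   {"Alien": 5, "Blade Runner": 4, "Amelie": 1},
    "alice": {"Alien": 5, "Blade Runner": 5, "Amelie": 1, "The Matrix": 5},
    "bob":   {"Alien": 1, "Blade Runner": 2, "Amelie": 5, "Chocolat": 5},
}

def similarity(a, b):
    """Cosine similarity over the titles both users have rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[t] * b[t] for t in shared)
    norm_a = sqrt(sum(a[t] ** 2 for t in shared))
    norm_b = sqrt(sum(b[t] ** 2 for t in shared))
    return dot / (norm_a * norm_b)

def recommend(user, ratings):
    """Suggest unseen titles rated by the single most similar other user."""
    mine = ratings[user]
    others = [name for name in ratings if name != user]
    nearest = max(others, key=lambda name: similarity(mine, ratings[name]))
    unseen = {t: r for t, r in ratings[nearest].items() if t not in mine}
    # Highest-rated unseen titles come first.
    return sorted(unseen, key=unseen.get, reverse=True)

print(recommend("you", ratings))
```

With this toy data, "alice" has the profile closest to "you", so the one title she has rated that you have not seen is what gets suggested.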
If we want to better understand where AI is today and the systems that are in use now, it is useful to look at the different components of AI and the human reasoning that it seeks to emulate.
So what do we do that makes us smart?
Sensing, reasoning & communicating
Generally, we can break intelligence or cognition into three main categories: sensing, reasoning and communicating. Within these macro areas, we can make more fine-grained distinctions related to speech and image recognition, different flavors of reasoning (e.g., logic versus evidence-based), and the generation of language to facilitate communication. In other words, cognition breaks down to taking stuff in, thinking about it and then telling someone what you have concluded.
The research in AI tends to parallel these different aspects of human reasoning separately. However, most of the deployed systems that we encounter, particularly the consumer-oriented products, make use of all three of these layers.
For example, the mobile assistants that we see today - Siri, Cortana and Google Now - all make use of each of these three layers. They first use speech recognition to identify the words that you have spoken to the system: they capture your voice and use the resulting waveform to recognize a set of words. Each of these systems uses its own version of voice recognition, with Apple making use of a product built by Nuance and both Microsoft and Google rolling out their own. It is important to understand that this does not mean that they comprehend what those words mean at this point. They simply have access to the words you have said in the same way they would if you had typed them into your phone.
In practice, they take an audio waveform as input and transform it into the words "I want pizza!"
The result of this process is just a string of words. In order to make use of them, they have to reason about the words, what they mean and what you might want, and how they can help you get what you need. In this instance, doing this starts with a tiny bit of natural language processing (NLP).
Again, each of these systems has its own take on the problem, but all of them do very similar things with NLP. In this example, they might note the use of the term "pizza," which is marked as being food, see that there is no term such as "recipe" that would indicate that the speaker wanted to know how to make the pizza, and decide that the speaker is looking for a restaurant that serves pizza.
This is fairly lightweight language processing driven by simple definitions and relationships, but the end result is that these systems now know that the speaker wants a pizza restaurant or, more precisely, can infer that the speaker wants to know where he or she can find one.
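The term-based inference described above can be sketched roughly as follows. The word lists, the function name `infer_intent` and the task labels are hypothetical and exist only for this example; real assistants use far larger lexicons and more sophisticated processing.

```python
# Hypothetical vocabularies, invented for this sketch.
FOOD_TERMS = {"pizza", "sushi", "tacos"}
RECIPE_TERMS = {"recipe", "make", "cook"}

def infer_intent(utterance):
    """Map a string of recognized words to a (task, food) pair by term matching."""
    words = [w.strip("!?.,").lower() for w in utterance.split()]
    foods = [w for w in words if w in FOOD_TERMS]
    if not foods:
        # No known food term: this simple model has nothing to go on.
        return ("unknown", None)
    if any(w in RECIPE_TERMS for w in words):
        # A food term plus a recipe-style term suggests the speaker wants to cook.
        return ("find_recipe", foods[0])
    # A food term with no recipe-style term: assume the speaker wants a restaurant.
    return ("find_restaurant", foods[0])

print(infer_intent("I want pizza!"))
print(infer_intent("How do I make pizza?"))
```

The first call yields the restaurant task and the second the recipe task, because only the second utterance contains a term from the recipe list.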
This transition from sound, to words, to ideas, to actual user needs, provides these systems with what they require to now plan to satisfy those needs. In this case, the system grabs GPS info, looks up restaurants that serve pizza and ranks them by proximity, rating or price. Or if you have a history, it may want to suggest a place that you already seem to like.
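As a rough sketch of the proximity ranking step, the snippet below sorts a list of candidate restaurants by great-circle distance from the user's GPS position, using the standard haversine formula. The restaurant names, coordinates and ratings are made up, and `rank_by_proximity` is a hypothetical helper, not part of any assistant's actual API.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0

def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, via the haversine formula."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    h = (sin(dlat / 2) ** 2
         + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))

# Made-up candidates: (name, latitude, longitude, rating).
places = [
    ("Far Slice", 41.95, -87.65, 4.8),
    ("Corner Pie", 41.881, -87.631, 4.1),
    ("Mid Crust", 41.90, -87.64, 4.5),
]

def rank_by_proximity(here, places):
    """Return the places sorted from nearest to farthest from `here`."""
    lat, lon = here
    return sorted(places, key=lambda p: distance_km(lat, lon, p[1], p[2]))

for name, *_ in rank_by_proximity((41.88, -87.63), places):
    print(name)
```

Ranking by rating or price instead would only require a different sort key, for example `key=lambda p: -p[3]` to put the highest-rated place first.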
Once all of this is done, it is a matter of organizing the results in a sentence or two, a process called natural language generation, or NLG. These words are then turned into sounds (speech generation).
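A very lightweight form of NLG is template filling, sketched below. The `describe` function, the result fields `name` and `km`, and the wording of the replies are all assumptions made for this example. Production systems are more elaborate, but the idea of turning structured results into a sentence or two is the same.

```python
def describe(results, dish):
    """Turn a ranked list of results into a one- or two-sentence reply."""
    if not results:
        return f"I couldn't find any {dish} places nearby."
    top = results[0]
    reply = f"The closest {dish} place is {top['name']}, {top['km']:.1f} km away."
    if len(results) > 1:
        # Mention the remaining results without listing each one.
        reply += f" I also found {len(results) - 1} more nearby."
    return reply

# Made-up results, already ranked by distance.
results = [
    {"name": "Corner Pie", "km": 0.5},
    {"name": "Mid Crust", "km": 2.5},
]
print(describe(results, "pizza"))
```

The resulting text is what would then be handed to speech generation.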
Broad AI, narrow AI
The interesting thing about these systems is their mix of broad and narrow approaches to AI. Their input and output -- speech recognition and generation -- are fairly general, so they are all pretty good at hearing what you say and giving voice to the results.
On the other hand, each of these systems has a fairly narrow set of tasks it can perform, and the actual reasoning it does is to decide which of those tasks (find a restaurant or find a recipe) to carry out. The tasks themselves tend to be search- or query-oriented, sending requests for information to different sources with different queries based on text elements grabbed from the speech. So the real smarts inside these systems lie in answering the question, "What do you want me to do?" by identifying the terms that indicate your wishes.
These systems tend to be brittle in that they know about a small number of tasks and how to decide between them, but, as we've all experienced, if you ask for something outside of their expertise, they really don't know what to do. Fortunately, when they are confused, they default to their respective search engines, which at least provide search results.
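The brittleness and the fallback behavior can both be seen in a minimal dispatch sketch. The trigger terms, the task functions and the string results are hypothetical stand-ins; the point is only that a request matching no known task drops through to a generic search.

```python
def find_restaurant(query):
    return f"restaurant lookup: {query}"

def find_recipe(query):
    return f"recipe lookup: {query}"

def web_search(query):
    return f"web search: {query}"

# The small, fixed set of tasks the assistant knows about, keyed by trigger term.
# Order matters: the first matching trigger wins.
TASKS = {
    "recipe": find_recipe,
    "restaurant": find_restaurant,
}

def handle(utterance):
    """Run the first task whose trigger term appears, else fall back to search."""
    words = {w.strip("!?.,").lower() for w in utterance.split()}
    for trigger, task in TASKS.items():
        if trigger in words:
            return task(utterance)
    # Nothing matched: default to the search engine.
    return web_search(utterance)

print(handle("Find me a pizza recipe"))
print(handle("Why is the sky blue?"))
```

The second request contains no trigger term, so it is passed unchanged to the search fallback.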
These systems are just one class of animal in the new AI ecosystem, but you can see how the mix of elements plays out to provide powerful services. High-end speech recognition and generation support interaction. Simple language processing extracts terms that drive a term-based decision model, which, in turn, figures out what you have requested and thus what task to perform. And, finally, a lightweight natural language generation model is used to craft a response. Each of these intelligent functionalities is one piece, and together they create integrated systems that can genuinely understand your needs and provide desired services.
AI's capabilities around sensing, reasoning and communicating will be a dominant recurring theme that we will continue to explore. Next, I will discuss systems in which intelligence rises out of multiple, and sometimes competing, components.