In this month’s column, we return to our discussion on natural language processing.
Over the last two columns, we discussed computer science interview questions. This month, let’s resume our ongoing discussion on natural language processing (NLP) by looking at one of its emerging applications.
Automation is one of the areas likely to benefit from NLP techniques and tools. Many of us have interacted with customer service agents from sectors such as e-commerce, banking and hospitality, over the phone, via email or chat, at all hours of the day. Some of these communication channels, such as the phone or chat, work in real time, while others, such as email, are offline.
Today, while these services are provided by human agents, there is an increasing push towards automation. A chat typically starts off with the agent greeting the customer and asking how they can help. The customer then states the problem or puts forth a query, after which the agent asks for more details to authenticate the customer’s identity and obtain the order information, and then provides steps to answer the query or solve the problem. This is an interactive conversation which has a typical structure and follows a well-defined pattern of dialogue between the customer and agent. The agents are typically provided with a ‘FAQ’ or ‘problem-solution’ manual in which information on resolving customer queries is systematically documented, with details on each step the agent needs to follow. The conversations are audited to ensure that the agents were courteous in their communication with customers, that they followed the appropriate problem-resolution script, and that they used the appropriate greetings at the start and end of the communication.
Let us consider a simple scenario. You have ordered an item on an e-commerce website and have not yet received it. So you are using the Web chat interface to speak to the customer service agent. After the customary greeting, you tell the agent that you ordered item XYZ a week ago but it has not yet been delivered. The agent then asks you for the order details. He then apologises for putting you on hold while he looks up the backend database. Once he has the details, he either mentions that, “The item has already been shipped and you can expect to receive it on date XX-YY-ZZ,” or he tells you that the delivery has been delayed and gives you the reason for the delay, etc. Now, if you analyse the many thousands of such conversations that take place every day, you will find that they follow a typical pattern. From that, the crucial next question that springs to mind is, “Can these queries be answered by an automated agent instead of a human agent?”
Well, the answer is yes, and NLP comes to our rescue in automating these communication services. While at first glance it seems quite difficult to replace the human-to-human communication with human-to-virtual agent communication, it is indeed possible. In many cases, virtual agents provide initial communication support for simple queries. However, human agents can take over in case the conversation becomes complex and deviates a lot from the normal script.
There are two key components to a solution that replaces the human agent with a virtual agent. First, the human customer’s part of the conversation needs to be analysed. Second, the appropriate response and follow-up question, if any, need to be generated in natural language and delivered by the virtual agent. So both natural language processing and natural language generation need to be done. The key challenge is to ensure that automating the process does not degrade the conversational experience or leave the human customer dissatisfied (of course, the ideal experience would be one in which the human customer cannot tell whether he is talking to a virtual agent or a human agent).
Let us analyse the problem in a little more detail, focusing first on the natural language processing part. Let us assume that we are given a database of thousands of chats that have taken place between human agents and human customers. We are now asked to build a system in which virtual agents can be trained to take the place of human customer service agents in answering customer queries. As we mentioned before, the communication can take place over the phone, chat or email. Voice communication is further complicated by the fact that the accuracy of speech-to-text conversion is still limited when it is done in real time. Voice communication also requires that the speaker’s conversational cues such as emotion, pitch and tone be detected, and that the virtual agent use the appropriate response cues. This is a difficulty we will set aside for now, to focus on text-based communication only, namely chat and email.
Virtual agent email replies to customer queries are simplified by the fact that they can be driven by an offline process. Since this is not done in real time, the replies generated by the virtual agent can be subjected to random human reviews and various other checks before they are shared with human customers. On the other hand, chat communication needs to be done in real time and hence requires a shorter response time, which doesn’t allow any offline review of the virtual agent’s communication and thereby requires greater precision and accuracy.
What are the key problems that need to be addressed in analysing the online chat? Let us make some simplifying assumptions. We will assume that the agent starts the dialogue with the standard greeting, which is immediately followed by the customer explaining what the problem is. The first step is to identify the category of the problem/query associated with a specific customer. There can be different problem/query categories for an e-commerce website, such as queries to track an item, cancellation of a purchase, delay in refunds, an address change for a customer, etc.
We will assume that there is a fixed set of problem categories. So this step reduces to the following problem: Given a small piece of text, classify it into one of the N known problem categories. This is the well-known document classification problem. Given a short document, can you classify it into a known category? If we assume that there are already annotated conversations in which the problem category has been identified, we can train a supervised classifier to assign a new incoming document to one of the known categories.
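To make the reduction concrete, here is a minimal sketch of one possible supervised classifier: a multinomial naive Bayes model over bag-of-words features, written in plain Python. The choice of classifier, the category names and the handful of training messages are all assumptions made for illustration; they are not taken from any real chat corpus, and a practical system would train on thousands of annotated conversations.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens (bag-of-words features)."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesClassifier:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)       # documents per category
        self.word_counts = defaultdict(Counter)   # word frequencies per category
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        tokens = tokenize(text)
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, float("-inf")
        for label, doc_count in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods of each token.
            score = math.log(doc_count / total_docs)
            total_words = sum(self.word_counts[label].values())
            for token in tokens:
                count = self.word_counts[label][token]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical annotated opening messages (illustrative only).
TRAINING_CHATS = [
    ("where is my order it has not been delivered yet", "track_item"),
    ("i want to track my package", "track_item"),
    ("please cancel my order", "cancel_purchase"),
    ("i would like to cancel the purchase i made yesterday", "cancel_purchase"),
    ("my refund has not arrived", "refund_delay"),
    ("when will i get my refund money", "refund_delay"),
    ("i need to change my delivery address", "address_change"),
    ("please update the shipping address on my account", "address_change"),
]

texts, labels = zip(*TRAINING_CHATS)
classifier = NaiveBayesClassifier().fit(texts, labels)
print(classifier.predict("My refund is still pending"))
```

With this toy training set, a message that mentions a refund is assigned to the refund category because that word occurs only in the refund examples. Richer features, such as word n-grams or TF-IDF weights, and other multi-class classifiers could be substituted without changing the overall structure.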
Here is a question for our readers: If you are asked to come up with a supervised classification scheme for this problem, given annotated data of conversations and their problem categories, (a) What features would you use? (b) Which supervised classifier would you use? Remember that this is a multi-class classification problem, since there are many problem categories. Now let us assume that we have access to a corpus of conversations, but the conversations are not annotated with the problem category. Can you describe an unsupervised technique with which you can still determine the problem category of an incoming chat, given the corpus?
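As one possible starting point for the unsupervised case, the sketch below groups chats with a greedy single-pass clustering based on the Jaccard similarity of their word sets, and then assigns an incoming chat to the closest group. The stop-word list, the similarity threshold of 0.5 and the sample messages are assumptions chosen for illustration. Note that the resulting clusters are unnamed; someone would still have to inspect each cluster and decide which problem category it represents.

```python
import re

# Assumed stop-word list; a real system would use a fuller one.
STOPWORDS = {
    "a", "an", "the", "i", "my", "me", "is", "it", "to", "in", "has", "have",
    "not", "been", "was", "yet", "still", "please", "want",
}


def content_words(text):
    """Return the set of lowercase non-stop-words in a chat message."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def jaccard(a, b):
    """Jaccard similarity of two word sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_cluster(words, clusters):
    """Return (index, score) of the cluster holding the most similar chat."""
    best_index, best_score = None, 0.0
    for index, members in enumerate(clusters):
        score = max(jaccard(words, member) for member in members)
        if score > best_score:
            best_index, best_score = index, score
    return best_index, best_score


def cluster_chats(chats, threshold=0.5):
    """Greedy single-pass clustering: join the closest cluster or start a new one."""
    clusters = []
    for chat in chats:
        words = content_words(chat)
        index, score = best_cluster(words, clusters)
        if index is not None and score >= threshold:
            clusters[index].append(words)
        else:
            clusters.append([words])
    return clusters


def assign(chat, clusters):
    """Return the index of the closest cluster, or None if nothing overlaps."""
    index, _ = best_cluster(content_words(chat), clusters)
    return index


# Hypothetical, unannotated opening messages (illustrative only).
CORPUS = [
    "Where is my order? It has not been delivered",
    "My order was not delivered yet",
    "Please cancel my order",
    "I want to cancel the order I placed",
    "My refund has not arrived",
    "The refund has still not arrived in my account",
]

clusters = cluster_chats(CORPUS)
print(len(clusters), assign("Cancel the order", clusters))
```

The word "order" appears in both the delivery and the cancellation messages, which is why the threshold matters: set it too low and the two groups merge. Techniques such as k-means over TF-IDF vectors or topic models are common alternatives to this simple scheme.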
Now let us assume that we have correctly identified the problem category. As we said before, human agents are typically given access to a solutions database or FAQs, which they consult to find the answers for each problem category. They use these answer templates to respond to the user. If we don’t have access to this solutions database, is it possible for us to analyse the conversations corpus and automatically identify the answer template for each problem category? First, we cluster the chats by problem category. Now, we have a subset of chats belonging to each problem category. Given this subset S for problem category P, how do we find the likely answer template script for this problem category? Note that the human agent verbalises the answer template in a form suited to the specific customer, personalising it each time. Also, given that it is an interactive dialogue, each conversation can contain considerable noise, depending on how the customer formulates the query. One possibility is to analyse the conversational snippets uttered by the agent as part of the chat, and find the common sequence of actions suggested by the agent across these conversations.
Here is a question for our readers: Given a corpus of conversational chats corresponding to a specific problem category, with each utterance in the dialogue marked as an agent utterance or a customer utterance, you are asked to find the underlying common action sequence, if any, suggested by the agent. Let us make some simplifying assumptions. Let us assume that such an action sequence definitely exists in all the conversations. We will also assume that the verb phrases used in the agent’s utterances denote the steps of the action sequence. Let us assume that each unique verb in our vocabulary can be represented by a unique character. Given that, we can restate this as a simpler problem: You are given N character strings, each of which represents the sequence of action verbs in one chat. You are then asked to find the longest common sub-sequence of all these strings. Can you come up with an algorithm? Please note that we are looking for the longest common sub-sequence across N sequences, where N can be greater than two. Can you come up with a solution which runs in polynomial time? Please send us your responses in the comment section below.
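To illustrate the formulation above, here is a minimal sketch that encodes each chat’s verb sequence as a string and then computes a longest common sub-sequence with memoised recursion over one position per string. The verbs and the three sample chats are hypothetical. The running time of this approach grows with the product of the string lengths, so it is polynomial only when N is fixed; for an arbitrary number of sequences the problem is known to be NP-hard, so this should be read as a baseline rather than as an answer to the polynomial-time question.

```python
from functools import lru_cache


def encode(action_sequences):
    """Map each distinct verb to one character; return the strings and the reverse map."""
    symbols = {}
    encoded = []
    for actions in action_sequences:
        encoded.append("".join(
            symbols.setdefault(verb, chr(ord("a") + len(symbols))) for verb in actions
        ))
    return encoded, {char: verb for verb, char in symbols.items()}


def longest_common_subsequence(sequences):
    """Return one longest common sub-sequence of all the given strings."""
    if not sequences:
        return ""
    seqs = tuple(sequences)

    @lru_cache(maxsize=None)
    def solve(positions):
        # positions[k] is the index of the next unread character in seqs[k].
        if any(p == len(s) for p, s in zip(positions, seqs)):
            return ""
        heads = {s[p] for p, s in zip(positions, seqs)}
        if len(heads) == 1:
            # Every string agrees on the next symbol: take it and advance all.
            return heads.pop() + solve(tuple(p + 1 for p in positions))
        # Otherwise skip one character in one string at a time and keep the best.
        best = ""
        for k in range(len(positions)):
            candidate = solve(positions[:k] + (positions[k] + 1,) + positions[k + 1:])
            if len(candidate) > len(best):
                best = candidate
        return best

    return solve((0,) * len(seqs))


# Hypothetical verb sequences extracted from agent utterances in three chats.
CHATS = [
    ["greet", "verify", "check", "confirm", "close"],
    ["greet", "apologise", "verify", "check", "confirm", "close"],
    ["greet", "verify", "check", "apologise", "confirm", "offer", "close"],
]

encoded, symbol_to_verb = encode(CHATS)
common = longest_common_subsequence(encoded)
common_actions = [symbol_to_verb[char] for char in common]
print(common_actions)
```

In this example the apology and the offer occur only in some chats, or at different points, so they drop out and the shared script of greeting, verifying, checking, confirming and closing remains. Because the recursion depth grows with the total length of the strings, an iterative table would be a better fit for long conversations.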