
Innovation Patent | AI Plugin

Making e-commerce replies warmer and closer to users

2021.12 Team innovation proposal

Han Xinyi (designer) | Xu Bai (designer) | Qiyue (algorithm)

Communication is an art and a norm.

What is the background of this proposal?

Pain points:

Different audiences call for different systems of service language style; shifting language style and phrasing both require targeted adjustment. Staff who serve customers through online chat conversations may face the following pain points when dealing with a wide variety of customers:

  1. Style switching demands highly professional experience; it is hard for novices to master, and an ill-chosen expression can easily cause misunderstanding;

  2. Switching chat styles costs service staff time and effort, forcing them to repeatedly revise their wording.

Opportunity insights:

  1. More and more customer-facing staff negotiate, coordinate, and deliver services and information through online instant messaging, so expectations for service language style keep rising. A skilled conversationalist can succeed through the "art" of language, while an inexperienced one may cause misunderstandings and financial loss.

  2. For language correction, there are already fairly mature NLP training approaches and similar products dedicated to helping users express themselves. This proposal applies comparable intelligent-algorithm techniques to correct emotionally colored expression such as tone, which is very helpful in conversation scenarios that demand precisely fitted expression or carry high risk.


Competitive landscape

Related products: 火龙果 (Huolongguo, a Chinese writing assistant) / Grammarly / Outwrite

Competitors' approaches similar to this proposal, and their shortcomings


Demo of the proposal

Step 1: Configure the service mode

Open the settings panel to configure a custom language mode. The mode setting affects the tone of the language and the layout of graphics and text, and it can also flag industry-sensitive words and suggest optimized corrections.

Step 2: Real-time conversation optimization suggestions

While typing in a chat session, click the plugin icon 「A」 to check the tone instantly; users can revise the text themselves based on the suggestions or replace it with one click.

[Appendix] Language mode parameters:

Audience: elderly, children, women, men

Service scenario: e-commerce, healthcare, government and legal, education, insurance, concierge

Emotional tone: warm, humorous, sassy, considerate, lively, quirky, calm, neutral, serious

Tone strength: sets how strongly the chosen tone is expressed

Presentation style: neutral, elderly-friendly, child-friendly
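
To make these settings concrete, below is a minimal sketch of how the mode parameters could be serialized into a control prefix for the model introduced later; all field names and the token format are illustrative assumptions, not part of the original spec.

```python
# A hypothetical encoding of the language-mode parameters as control tokens.
# Field names and token format are illustrative assumptions.
MODE = {
    "audience": "elderly",       # Audience: elderly / child / woman / man
    "scenario": "e-commerce",    # Service scenario
    "emotion": "considerate",    # Emotional tone
    "intensity": 0.7,            # Tone strength, 0.0 to 1.0
    "presentation": "elderly",   # Presentation style
}

def mode_to_control_prefix(mode: dict) -> str:
    """Flatten the mode settings into a control prefix for the model input."""
    return " ".join(f"<{k}:{v}>" for k, v in mode.items())

print(mode_to_control_prefix(MODE))
# -> <audience:elderly> <scenario:e-commerce> <emotion:considerate> <intensity:0.7> <presentation:elderly>
```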


Differentiation from competitors

1. A language model trained for Chinese: beyond grammar correction, it also corrects tone, emotion, and other aspects of language-style expression.

2. Guidance for real-time chat expression rather than written-document suggestions: the plugin can be used instantly while typing in a chat conversation; it stays active in the typing area, and clicking its button shows the correction tips whenever a check is needed.

Key technical innovations of this proposal

1. Problem definition:

Given a piece of unpolished text, the model processes it according to preset language-mode parameters and outputs a version revised to match that mode.

2. Related research:

In academia this belongs to the text style transfer task. By the degree of supervision, text style transfer methods divide into Supervised Style Transfer and Unsupervised Style Transfer.

2.1 Supervised Style Transfer

Supervised text style transfer methods are mostly built on the NMT encoder-decoder framework. Some work adds multi-task learning on top, e.g. jointly training style classification, machine translation, or grammatical error correction alongside the style transfer objective.
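
To illustrate the multi-task idea, here is a sketch of a joint objective that adds an auxiliary style-classification loss to the transfer loss; the loss weighting and tensor shapes are assumptions.

```python
import torch
import torch.nn.functional as F

def multitask_loss(transfer_logits: torch.Tensor, target_ids: torch.Tensor,
                   style_logits: torch.Tensor, style_labels: torch.Tensor,
                   lambda_style: float = 0.5) -> torch.Tensor:
    """Joint objective: token-level transfer loss plus auxiliary style classification.
    Shapes: transfer_logits (B, T, V), target_ids (B, T),
            style_logits (B, num_styles), style_labels (B,)."""
    transfer = F.cross_entropy(transfer_logits.flatten(0, 1), target_ids.flatten())
    style = F.cross_entropy(style_logits, style_labels)
    return transfer + lambda_style * style
```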

2.2 Unsupervised Style Transfer

Entangled methods

Entangled methods do not decouple the style vector from the semantic vector; instead they directly modify the latent vector of the whole sentence to achieve style transfer.

The figure below shows one entangled method: at inference time, a pre-trained style classifier drives gradient-based edits to the latent space until the classifier recognizes the modified vector as the other style; the vector is then decoded to generate text "in the other style".
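
A sketch of that inference-time editing loop, assuming a frozen classifier that scores latent vectors directly; the step size, step count, and stopping rule are assumptions.

```python
import torch

def edit_latent(z: torch.Tensor, classifier, target_style: int,
                steps: int = 50, lr: float = 0.1) -> torch.Tensor:
    """Entangled style transfer at inference: nudge the latent vector z by
    gradient steps until a frozen style classifier assigns the target style."""
    z = z.clone().detach().requires_grad_(True)
    for _ in range(steps):
        logits = classifier(z)                  # (1, num_styles) style scores
        if logits.argmax(dim=-1).item() == target_style:
            break                               # classifier now sees the new style
        loss = -logits[0, target_style]         # raise the target-style score
        loss.backward()
        with torch.no_grad():
            z -= lr * z.grad
            z.grad.zero_()
    return z.detach()  # decode this edited vector to obtain the restyled text
```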

Disentangled methods

Disentangled methods can be roughly summarized as three steps: encode the source-style text x into a latent representation z; decouple semantics from style within z, stripping out the source style; and decode z together with the target style to obtain a text x' that carries the target style. Disentangled methods must work out how to separate style information from content information in the latent space, i.e. the latent space consists of two mutually independent spaces, a style space and a content space. This requires:

  • The generated content is a grammatically natural sentence

  • Style-related information stays in the style space

  • Content-related information stays in the content space

The figure below shows the Style Transformer method. During training, if x and the style label s are consistent, the reconstruction loss between x and y is computed; if they are inconsistent, the y_hat generated from x is transferred back to produce y, the reconstruction loss between x and y is computed, and the style category of y_hat must additionally be discriminated.
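
A simplified sketch of this training logic for a single example; the model(ids, style_id) interface, the argmax decoding, and the discriminator signature are assumptions rather than the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def style_transformer_step(model, discriminator, x_ids: torch.Tensor,
                           s_x: int, s_tgt: int) -> torch.Tensor:
    """One training step of the scheme above for a single example.
    x_ids: (1, T) token ids; model(ids, style) returns (1, T, V) logits."""
    if s_tgt == s_x:
        # Styles agree: the model should simply reconstruct x.
        logits = model(x_ids, s_tgt)
        return F.cross_entropy(logits.flatten(0, 1), x_ids.flatten())
    # Styles differ: transfer to s_tgt, then cycle back and reconstruct x.
    y_hat = model(x_ids, s_tgt).argmax(dim=-1)        # pseudo-transferred ids
    cycle_logits = model(y_hat, s_x)
    cycle_loss = F.cross_entropy(cycle_logits.flatten(0, 1), x_ids.flatten())
    # The discriminator must judge y_hat as carrying the target style.
    style_loss = F.cross_entropy(discriminator(y_hat),
                                 torch.tensor([s_tgt], device=x_ids.device))
    return cycle_loss + style_loss
```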

Delete-Retrieve-Generate

DRG (Delete-Retrieve-Generate) is one of the more representative unsupervised methods, likewise split into three steps (see the sketch after this list):

  • Delete: remove the style words a from the source sentence

  • Retrieve: retrieve words a' with the target style from a database

  • Generate: fill a' into the sentence while keeping it fluent (via generation, slot filling, and similar methods)
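
A toy walk-through of the three steps on a short service reply; the style lexicons and string operations stand in for the learned components.

```python
# A toy illustration of the three DRG steps on a Chinese service reply.
# The style lexicons and retrieval table are illustrative assumptions.
STYLE_WORDS = {"plain": ["好的"], "warm": ["好嘞~", "没问题哦"]}

def delete(sentence: str, source_style: str) -> str:
    """Delete: strip source-style marker words, keeping the content."""
    for w in STYLE_WORDS[source_style]:
        sentence = sentence.replace(w, "")
    return sentence.strip(",，")

def retrieve(target_style: str) -> str:
    """Retrieve: pick a marker word of the target style from the lexicon."""
    return STYLE_WORDS[target_style][0]

def generate(content: str, style_word: str) -> str:
    """Generate: fill the retrieved word into the sentence (slot filling)."""
    return f"{style_word}，{content}"

print(generate(delete("好的，订单已发货", "plain"), retrieve("warm")))
# -> 好嘞~，订单已发货
```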

3. Technical approach:

A BART model based on the Transformers code repository can provide the capability we need; the data and the concrete model structure are described below.

Data

3.1 Data format

The model's required input/output format is as follows, so training data is constructed as <original sentence, language mode parameters, rewritten sentence> tuples:

Input: original sentence + language mode parameters

Output: rewritten sentence
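
A sketch of constructing one training pair in this format; the control-prefix encoding and the example sentences are invented for illustration.

```python
# Building one training pair in the <original sentence, mode parameters,
# rewritten sentence> format; the prefix encoding is an assumption.
def build_pair(original: str, rewrite: str, mode: dict) -> dict:
    prefix = " ".join(f"<{k}:{v}>" for k, v in mode.items())
    return {"input": f"{prefix} {original}", "output": rewrite}

pair = build_pair(
    "您的订单已发货",                            # plain original reply
    "亲，您的订单已经发货啦，请您耐心等待哦~",      # crowd-written warm rewrite
    {"audience": "elderly", "emotion": "considerate"},
)
print(pair["input"])  # -> <audience:elderly> <emotion:considerate> 您的订单已发货
```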

The data the algorithm needs can be collected in two ways:

  1. Given an original sentence and language-mode parameters, crowd workers write rewritten sentences under the different parameter settings.

  2. Daily customer service, when logged, tends to accumulate many high-quality cases; machine learning can mine this existing text to build the sentence pairs the model needs.

3.2 Model structure

There are two options for the model design.

Option 1: original sentence -> language-mode-parameter constraints -> rewritten sentence

This is an End2End structure that models the original sentence and the language-mode parameters jointly: given both, the model directly infers the rewritten sentence. Its advantage is simplicity and speed; its drawback is that a large amount of annotated data is needed to reach good quality.
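
A minimal End2End inference sketch with the Hugging Face Transformers BART implementation. The English-pretrained checkpoint is a placeholder (a Chinese BART checkpoint would be substituted in practice), and the model is assumed to have been fine-tuned on the pairs described above.

```python
from transformers import BartForConditionalGeneration, BartTokenizer

# Placeholder checkpoint; in practice, a Chinese BART fine-tuned on the
# <original + mode parameters, rewritten sentence> pairs would be loaded here.
tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

text = "<audience:elderly> <emotion:considerate> 您的订单已发货"
inputs = tokenizer(text, return_tensors="pt")
output_ids = model.generate(**inputs, max_length=64, num_beams=4)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```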

Option 2: original sentence -> core-word extraction -> core-word rewritten phrases -> injection of language-mode-parameter markers -> rewritten sentence

Unlike Option 1, this pipeline adds core-word extraction and rewriting to the overall inference chain. One of the hard parts of deep learning is obtaining training data; rewriting from core words lets us mine a large corpus from the existing case library, forming <core word, rewritten phrase> pairs to train a model that generates short phrases from core words. A subsequent model then selectively inserts the generated phrases into the original sentence to produce the rewritten sentence.
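
A sketch of the Option 2 inference chain as composable stages; every function below is a hypothetical placeholder for a trained component, shown only to fix the interfaces between stages.

```python
def extract_core_words(sentence: str) -> list:
    """Stage 1: core-word extraction (e.g. keyword/term extraction)."""
    raise NotImplementedError  # placeholder for a trained extractor

def rewrite_core_word(word: str, mode: dict) -> str:
    """Stage 2: generate a styled short phrase from a core word, using a model
    trained on <core word, rewritten phrase> pairs mined from the case library."""
    raise NotImplementedError  # placeholder for the phrase generator

def insert_phrases(sentence: str, phrases: list, mode: dict) -> str:
    """Stage 3: selectively insert the generated phrases into the original
    sentence to produce the final rewrite."""
    raise NotImplementedError  # placeholder for the insertion model

def rewrite(sentence: str, mode: dict) -> str:
    """Full Option 2 chain: extract -> rewrite phrases -> insert."""
    phrases = [rewrite_core_word(w, mode) for w in extract_core_words(sentence)]
    return insert_phrases(sentence, phrases, mode)
```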

 

 
