The company is a technology-enabled service provider, operating in the Digital Solutions, Engineering and Business Process Management space. It services a global clientele across the US, Europe and Australia, ranging from SMBs to Fortune 500 companies.
The company wanted to build a machine learning algorithm that analyzes customer sentiment from customer emails in near real time (every 12 hours). The insights generated would be used to preempt problems in the customer journey and enhance the customer experience.
As a first step, it was important to feed the model with training data and this task involved:
- Capturing tens of thousands of customer emails from email inboxes, integrating and structuring the data
- Manually tagging/annotating/labeling each email against a pre-defined sentiment
The follow-up step required validation of model performance against manually annotated data.
The client approached HitechDigital to label the emails against multiple categories and sub-categories, defining customer sentiment. The annotated data was to be used by the machine learning algorithm being developed to analyze customer sentiment.
- Capturing the emails from various email inboxes and putting them together in a structured format.
- Decoding terms, slang, acronyms and colloquialisms and assigning a contextual sentiment tag, which required a high level of human intelligence.
- Creating accurate guidelines for mapping key phrases in emails to pre-defined customer sentiment categories, while factoring in variations in sentiment expression across geographies and industries.
- Building consensus amongst stakeholders on mapping certain subjective phrases against customer sentiment.
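One way to picture such mapping guidelines is as a lookup table from key phrases to a category and polarity, with region-specific overrides for phrases whose tone differs by geography. The sketch below is illustrative only: the phrases, the region code, and the names `Tag`, `GUIDELINES`, `REGIONAL_OVERRIDES` and `lookup` are hypothetical and are not taken from the project's actual guideline library.

```python
from typing import NamedTuple, Optional


class Tag(NamedTuple):
    category: str  # e.g. "Timeliness" or "Quality"
    polarity: str  # "positive", "negative" or "neutral"


# Hypothetical baseline guideline entries: phrase -> tag.
GUIDELINES = {
    "ahead of schedule": Tag("Timeliness", "positive"),
    "missed the deadline": Tag("Timeliness", "negative"),
    "not bad": Tag("Quality", "neutral"),
}

# Hypothetical regional overrides: same phrase, different agreed reading.
REGIONAL_OVERRIDES = {
    "AU": {"not bad": Tag("Quality", "positive")},
}


def lookup(phrase: str, region: Optional[str] = None) -> Optional[Tag]:
    """Return the agreed tag for a phrase, preferring a regional override."""
    key = phrase.strip().lower()
    override = REGIONAL_OVERRIDES.get(region or "", {}).get(key)
    return override or GUIDELINES.get(key)
```

Keeping the overrides separate from the baseline table means a subjective phrase only needs stakeholder agreement once per region, and phrases missing from both tables return `None` so they can be escalated for a decision instead of being guessed.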
The HitechDigital data annotators delivered tens of thousands of accurately decoded, labelled and annotated customer email records in a standardized database. The emails were annotated against predefined customer sentiment categories. The annotated data served as the training baseline for the customer sentiment analysis ML algorithms.
- Custom Outlook plug-ins were developed to fetch selected email data from stakeholder systems and save it in a centralized database. Care was taken to exclude business-sensitive information such as client names, revenue details and terms of agreement.
- API-based connectors kept the data secure during transfer between systems.
- Rules-driven algorithms standardized and segmented the data based on multiple criteria such as source, geography etc.
- The records were then cleansed using a blend of automated and manual processes to remove duplicates, jargon, peripheral content etc.
- A library of annotation guidelines for phrase-based sentiment category mapping was prepared to ensure homogeneity across the various feedback documents and annotators. All data annotators were trained on these guidelines.
- Data annotators then reviewed each sentiment record and manually labelled keywords in each sentence as positive, negative or neutral. Each sentiment was also tagged against defined categories and sub-categories such as Quality, Timeliness, Productivity, Communication and Attitude.
- Specific color codes were used to highlight individual sentiments.
- Text blocks were annotated with metadata such as likely repeat customer, potential business influencer or lost customer.
- A multi-level quality check was applied to the records, including rules-driven validation and manual audits by senior annotators.
- Random audits of model performance were conducted against manually annotated records to verify accuracy.
- An accuracy of 80 to 90% was achieved.
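The random audit in the last steps can be sketched as drawing a sample of records and measuring how often the model's label agrees with the manual one. This is a minimal illustration, not the project's actual audit procedure: the `audit_accuracy` function, the pair-based record layout and the sample data are assumptions made for the example.

```python
import random


def audit_accuracy(records, sample_size, seed=0):
    """Estimate accuracy on a random sample of records.

    `records` is a list of (manual_label, model_label) pairs; this layout
    is an assumption for illustration only.
    """
    rng = random.Random(seed)  # fixed seed keeps the audit repeatable
    sample = rng.sample(records, min(sample_size, len(records)))
    if not sample:
        raise ValueError("no records to audit")
    matches = sum(1 for manual, predicted in sample if manual == predicted)
    return matches / len(sample)


# Hypothetical audit data: manual annotation first, model output second.
records = [
    ("positive", "positive"),
    ("negative", "negative"),
    ("neutral", "positive"),
    ("negative", "negative"),
    ("positive", "positive"),
]
print(f"audited accuracy: {audit_accuracy(records, sample_size=10):.0%}")
```

Because the sample is drawn at random, the result is an estimate for the whole data set; larger samples narrow the uncertainty at the cost of more manual review.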
Technology and Software Used:
- PHP-based front end
- MySQL Database
- XLNet (pretrained language model)
- Data Warehouse
Accurately annotated data provided deeper understanding of customer sentiment for real-time and informed actions
Drove tailored strategies based on customer responses
Accuracy of annotated data improved performance of NLP algorithms
Offshore delivery model offered cost advantage and increased operational efficiency