Thursday, 5 December 2013

The Art of Conversation and the Science of Service



The frequent launch of a variety of products across emerging customer segments has led to a proliferation of devices and equipment. With rising customer expectations and the ease with which complaints can be lodged and enquiries made online, there is a surge in both synchronous and asynchronous conversation. Consequently, we see product manufacturers and OEMs competing to provide a higher quality of customer service. The first step in addressing customer needs is to converse, engage and address; this is imperative to retain or gain market share.

This competition in “service lifecycle management” has resulted in a rapid rise in data mining activities on customer complaints, inquiries and the like, as recorded in contact center logs. These activities aim to identify the root causes of specific issues; proactive resolution through alerts is typically achieved by deploying smart algorithms.

Recently our team interacted with OEMs in the telecom, manufacturing and healthcare industry segments. We discovered that root cause analysis, and the subsequent proactive resolution of customer issues, is a significant element in their overall customer acquisition and retention strategy. Not surprisingly, the spurt of such activities has been catalyzed by the evolution of marketing strategies that leverage customers’ sentiments about different product or service offerings, expressed through social networks. The arcane world of Natural Language Processing has now come to the forefront as an aid in addressing these novel business challenges.

Above all, research has added a host of new algorithms in this area, such as dialog act tagging, topic modeling, sentence boundary detection and text tiling, enabling solutions to hitherto intractable problems in this field.

However, when I delved a bit deeper into the problem, I found that there are three distinct kinds of issues arising from the data sources in this field:

(i) mining event logs from devices

(ii) analyzing tweets or comments from social networks

(iii) identifying the root cause of a problem from call notes.

Initially we thought they were very similar, as they share a few common characteristics: they are unstructured text, are event-driven and have an underlying taxonomy of products, events or sentiments.

Interestingly, however, they differ significantly in some key aspects, necessitating different solution strategies. While device logs are easier to deal with due to the standardized responses pre-configured in systems such as SMART logs, sentiment analysis is more complex due to the wide diversity of ways in which people react to similar scenarios and express their sentiments.

The human element in social network feeds introduces a lot more noise, such as acronyms and polysemy. We still have scope to model discussion topics based on an a priori defined taxonomy, or one built on the fly from known language constructs such as synonyms, phrases and adjectives. So the problem of extracting actionable insights from device logs can be considered a subset of the problem of developing targeted marketing strategies based on the analysis of tweets or other forms of social network feeds.

Identifying the root cause of problems from contact center or technical support logs, sometimes known as conversation mining or interaction mining, is an even more complex problem, for several reasons:

(i) the difference in terminology between the expert call center agent and the novice customer,
(ii) the iterative and asynchronous nature of messages, and
(iii) the huge scope of the problem.

Several approaches have been developed in recent times, but most of them seem to fall short of a real solution to this problem.

Our attempt to solve the problem by integrating topic modeling, hyperclique pattern discovery and sequence mining, to detect anomalies and thereby identify the root causes underlying customer complaints, is well underway. Well-defined evaluation metrics, serving as reference points for conversation mining, will be key to defining success.
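
The sequence-mining side of this can be sketched in a few lines: count consecutive event pairs across sessions and flag rare transitions as anomaly candidates. The session contents and event names below are hypothetical, and this simple bigram count only hints at the fuller combination with topic modeling and hyperclique discovery described above:

```python
# Sketch: flagging unusual event sequences in support-session logs.
# Sessions and event names are hypothetical.
from collections import Counter

sessions = [
    ["login", "search", "view_faq", "resolve"],
    ["login", "search", "view_faq", "resolve"],
    ["login", "search", "escalate", "resolve"],
    ["login", "crash", "escalate", "refund"],  # unusual path
]

# Count consecutive event pairs (bigrams) across all sessions.
bigrams = Counter(pair for s in sessions for pair in zip(s, s[1:]))

# Bigrams seen only once are candidate anomalies worth a root-cause look.
anomalies = [pair for pair, n in bigrams.items() if n == 1]
print(anomalies)
```

Transitions such as ("login", "crash") surface immediately, whereas the common ("login", "search") path does not.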

Somjit Amrit is the Chief Business Officer of Technosoft Corporation
He can be reached at somjit.amrit@technosoftcorp.com

Sunday, 6 October 2013

The trickiness of forecasting rainfall



Believing in the weather guy to keep the golf course dry? Forecasting rainfall can be tricky.

Forecasting rainfall, and at a broader level forecasting the weather, is a problem fraught with challenges. Around the world, various initiatives, such as Deep Thunder by IBM and Named Peril by Climate Corporation, have been undertaken to address this challenge. In India, the India Meteorological Department and the Centre for Mathematical Modelling and Computer Simulation have developed several solutions to address this problem.
The Government of India recently announced a countrywide competition among software companies to forecast rainfall, among other similar topics, sharing historical rainfall and temperature time series data across the different sub-divisions of the Indian subcontinent.

The opportunity to bring in Big Data and the related techniques to help forecast rainfall is immense. Rainfall prediction helps in agriculture, irrigation, disaster management during floods/droughts, and so on. Several attempts have been made, primarily using statistical models, to predict rainfall.

Natural systems like the atmosphere exhibit complex dynamical behavior, and their data is considered to exhibit nonlinear dynamical, even chaotic, behavior. To add to this, rainfall data is more sensitive to temporal and spatial variations than other climatic variables. The rainfall time series data provided for the said competition was used to train neural networks capable of emulating such behaviors, including deterministic chaos.
So what are the challenges in rainfall forecasting, and why are they unique? Chaotic dynamics is associated with extreme sensitivity to initial conditions, exponential divergence of proximal trajectories and very low predictability horizons. Moreover, most of the variables in the atmospheric state-space are not even measurable, giving rise to computationally intractable behaviors. Hence it is difficult to model and predict atmospheric systems even with higher-order multi-parameter statistical models.

Therefore, machine learning algorithms, specifically recurrent neural networks, capable of emulating the nonlinear dynamical, including chaotic, behaviors have become increasingly popular in the domain of rainfall prediction.

Given the advantages of recurrent neural networks in capturing the nonlinear relationship between inputs and output, they were explored to forecast the rainfall of 36 meteorological sub-divisions of India. The model used only past years’ rainfall data to forecast the coming year’s monsoon rainfall. The application would provide forecasts for a rolling 18-month period.
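
To make the "past years only" setup concrete, here is a simple autoregressive baseline: a linear stand-in for the recurrent networks described, fitted on a synthetic rainfall series. The series, lag count and coefficients are illustrative assumptions, not the competition data or the actual model:

```python
# Sketch: one-step rainfall forecast from lagged past values only.
# The synthetic series below stands in for real sub-division data.
import numpy as np

rng = np.random.default_rng(0)
years = 60
rain = 800 + 100 * np.sin(np.arange(years) / 3.0) + rng.normal(0, 20, years)

LAGS = 5  # predict next year from the previous five years
X = np.array([rain[i:i + LAGS] for i in range(years - LAGS)])
y = rain[LAGS:]

# Fit AR coefficients (plus an intercept column) by least squares.
A = np.hstack([X, np.ones((len(X), 1))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

# One-step-ahead forecast for the coming year.
next_year = float(np.append(rain[-LAGS:], 1.0) @ coef)
print(round(next_year, 1))
```

A recurrent network replaces the fixed lag window with learned internal state, which is what makes it attractive for the chaotic dynamics discussed above.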

A number of algorithms (Elman neural network, Jordan neural network, Radial Basis Function neural network) were applied competitively to obtain the best possible prediction accuracy. The performance of the algorithms was tested across different regions, categorized by aridity, as the underlying dynamics were expected to change under the influence of geographical location, ocean currents and so on.
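
That competitive selection can be sketched as fitting several candidate models on lagged features and keeping whichever has the lowest held-out error. The candidates here (Ridge regression and k-nearest neighbours) are stand-ins for the networks named above, and the data is synthetic:

```python
# Sketch: competing candidate models on lagged rainfall features and
# keeping the best held-out RMSE. Models and data are illustrative.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(1)
series = 800 + 100 * np.sin(np.arange(80) / 3.0) + rng.normal(0, 15, 80)
X = np.array([series[i:i + 4] for i in range(76)])  # 4-year lag windows
y = series[4:]

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                          random_state=0)

candidates = {"ridge": Ridge(), "knn": KNeighborsRegressor(n_neighbors=3)}
scores = {}
for name, model in candidates.items():
    model.fit(X_tr, y_tr)
    scores[name] = mean_squared_error(y_te, model.predict(X_te)) ** 0.5

best = min(scores, key=scores.get)
print(best, round(scores[best], 2))
```

Running the same loop per aridity-categorized region lets the winning model differ by region, as the post suggests the underlying dynamics do.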
I firmly believe that, given the strong emergence of machine learning algorithms, forecasting rainfall with an increased degree of accuracy will become a reality in the coming years, and we could then believe in the weather guy to keep the golf course dry.

Somjit Amrit is the Chief Business Officer, Technosoft Corporation 
He can be reached at somjit.amrit@technosoftcorp.com

Monday, 23 September 2013

Do the maverick mavens need managers?

The McKinsey & Company report “Big data: The next frontier for innovation, competition and productivity” (May 2011) is well publicized and widely circulated on the internet.

The report projected that, based on trends seen in 2011, the demand for deep analytical positions in a big data world in the United States could exceed supply by 140,000 to 190,000 positions. While this figure was widely cited in various reports, what intrigued me was the additional information tucked in there, which has often been ignored.

The report further projected a need for 1.5 million additional managers and analysts in the United States who can ask the right questions and consume the results of big data analysis effectively. All this adds up to a number that will make anyone sit up and take notice.

This brings up the moot question: does an ensemble of Data Scientists need managers, and in the proportion mentioned? Who would those managers be, what skills and expertise would they have, and where would one find them?

Let us take a minute to discuss how a Data Science project is different from a typical application development project.

Data Science projects are different!
A Data Science project is more akin to a typical R&D project, being more a process of discovery. The discovery, as an output or outcome, is expected to be more abstract and less definite.

For example, I was engaged in a Data Science project in the area of weather forecasting. This is an area where predictive analytics is deployed, and deploying the appropriate algorithms is key to yielding results with the right accuracy. The skill lies in identifying, selecting and deploying the right approach to deliver the highest possible prediction accuracy.

The right approach is a combination of the appropriate method of enriching the input data, competing a set of the most applicable algorithm(s), tuning the parameters of the algorithm(s), ensuring that no overfitting occurs, and using the right metric to test the performance of the algorithm(s).
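
One way to make the "no overfitting" step concrete is to choose model complexity by held-out error rather than training fit. In the sketch below, polynomial degree stands in for algorithm complexity, and the data is synthetic:

```python
# Sketch: guarding against overfitting by comparing held-out RMSE across
# models of increasing complexity. Data and degrees are illustrative.
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(0, 1, 40)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, 40)

# Alternate points into train and held-out test sets.
train, test = np.arange(0, 40, 2), np.arange(1, 40, 2)

def rmse(deg):
    coef = np.polyfit(x[train], y[train], deg)
    pred = np.polyval(coef, x[test])
    return float(np.sqrt(np.mean((pred - y[test]) ** 2)))

# Pick the degree with the best held-out error, not the best train fit.
errors = {deg: rmse(deg) for deg in (1, 3, 9, 15)}
best_deg = min(errors, key=errors.get)
print(best_deg, round(errors[best_deg], 3))
```

A very high degree can drive training error toward zero while held-out error climbs; the held-out metric is what keeps the selection honest.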

The team structure in a Data Science team is necessarily flat, and there is a reason: the roles are distinct and the level of peer collaboration is intense. Designated Data Scientists take up defined roles as per the “triangle of intelligence”, namely aggregated content (the raw data), algorithms (the thinking) and reference structures (the domain knowledge base, or ontology).

An application development project, more often than not, has a defined output based on the requirements gathered and the business objectives to be met. Execution of such projects requires a hierarchical team structure, with different types of work packets defined and allocated across functions (requirements gathering, design, development, testing, release management, change management, warranty support). The output here is often defined, stable and possibly repetitive.

So while a manager is needed in a hierarchical setup, as in the latter case, does one need a manager in a flat structure, as in the former?

Going back to the moot question, the industry would require “1.5 million additional managers and analysts…”. Who would constitute these managers?

Let us analyze the work elements that constitute a Data Science project and the team members for a typical project. They would be:
• Data Access (immediate and unfettered access to data) – Project sponsor, mostly from the business side
• Data Investigation (defining the business problem and objectives) – Business Analyst
• Data Preparation (parsing, cleaning, de/re-normalizing, linking, indexing and interpreting data) – Data Quality Engineer
• Data Interpretation (appreciation of the business process and the need to work with a subject matter expert for contextual understanding) – Subject Matter Expert
• Data Engineering (connecting the dots between data sources, construction of quality algorithms, coding) – Programmer
• Data Analysis (statistical sense checking, statistical validity of results) – Data Scientist
• Data Presentation (communication of the nuanced message as discovered, helping bridge the gap between results derived and actions taken) – Senior Data Scientist / Manager

So a typical project would have a core team of Data Scientists with a surround team as described. This surround team could well account for McKinsey’s estimate.

True-blue Data Scientists are outliers, and they stick to their ilk:

Top-notch Data Scientists are hard to come by. These mavens are, most of the time, nonconformists, individualists and uncommonly precise. Can we say they are mavericks? However, having closely observed them, I can safely conclude that they like to be around others of their ilk, and this is more out of necessity.

The subject is vast, deep and, to many, arcane. It cuts across domains, statistics and computational science. The inherent complexity brings with it the need to use the collective mind of the “commune” of data scientists, which can best address business problems by devising a repertoire of solution approaches through intense and purposeful collaboration.

In such a situation, where does a manager fit in? How will the manager provide value? As we have seen, these projects are different from normal application development projects and are best run in a nonhierarchical, flat structure.

This field is evolving, and the ecosystem is becoming more enabling as Data Scientists are called upon to solve complex business problems across industries.

If a manager is to play a valuable role, some areas that need to be considered are:
1. Given the “Triangle of Intelligence”, the manager should truly understand the domain and the needs of the business, be the glue between the business users and the data scientists, and provide the team the wherewithal to access the input data, whose quality is critical to the success of the project.
2. Have the ability to toggle between business jargon and the language of data scientists, understand the complexity of implementing a solution and optimizing its performance, and hence estimate the time required to implement it.
3. Have the experience to disaggregate the complex activities in Data Science projects and help integrate the multiple models (e.g. combinations of algorithms).
4. Be capable of providing the right inputs for the innovation frequently required in this field.

Then the question is: would Data Science managers grow from within the “commune” of Data Scientists, or could they migrate from the business domain? We will get the answer as we see more mainstream activity in this area in the coming months.

Managing is about coping with complexity, but leadership is about coping with change. Given that Data Scientists are already groomed to manage complexity (and hence could manage themselves), do we then need Data Science leaders? Maybe the need of the hour is to lead rather than manage.

Somjit Amrit is the Chief Business Officer of Technosoft Corporation
Somjit.amrit@technosoftcorp.com

Monday, 16 September 2013

Do not stifle the questions visualized data raises!



Extracting meaningful insights from data to address business needs has benefited immensely from the availability of data visualization tools that have made data more approachable. Today, the proliferation of off-the-shelf tools that are easy to learn and web enabled has democratized the way data is presented and consumed. Tools like Spotfire, Tableau and QlikView have helped breathe life into data. They provide a professional look and feel, and lend an inherent sense of fidelity to the data being visualized, more so than when data is simply presented as text.

Well-designed and well-deployed data visualization can often lead the user to a question that sparks the need for deeper insights, one that possibly cannot be answered by the visualization software. But before we delve into this problem, let us understand the primary goals of data visualization.
Essentially, good data visualization is expected to:

• Provide the interpretation of the data
• Bring relevance and context to the data
• Reveal elusive insights to spark deeper analysis
• Drive management by exception
• Embed intelligence in the reports


Recently we had an opportunity to work on a project whose objective was to understand and analyze failures in a telecom network, and the related quality issues that could have a direct bearing on customer churn, apart from repair and service costs. The goal was to enhance the quality of service (QoS) of the telecom network provider using predictive analytics. The added expectation was to enable the service manager to take proactive decisions on repairs using machine learning models.

We developed dashboards for the equipment maintenance manager using a well-regarded commercial off-the-shelf visualization tool. We also developed an advanced failure prediction model, using standard telecom equipment log files as input data and techniques such as pattern mining and event sequence analysis to predict equipment failure. The open-source R programming language was used to create this model.

We had two options to present this data. In the first scenario, the maintenance manager used the visualization tool to drill into the failures and locate the regions or equipment models with high failure rates. This information was then shared with the engineering team for root cause analysis, which used our R-based model to predict failure.

The visualized data provided information to act on, but was it intelligent enough to bring in preventive maintenance?

The manager had a number of questions during such slice-and-dice analysis, namely: Why is one region doing better than others? Why are some failures more common in one model and one particular region and not others? Unfortunately, simple data visualization cannot answer these questions, and they get stifled. Consequently, the service manager hopes that his engineering team will come up with the right answers. Many a time, well-represented and slickly visualized data is counter-productive, making the user numb to the very questions it could trigger.

Our approach was to take the relevant prebuilt sequence mining models in R and integrate them with the off-the-shelf visualization tool. This approach immediately gave the manager the freedom to ask even deeper questions about the state of the equipment he managed.

In the new approach, the manager did not send the information to his engineering team for analysis but felt empowered to do it himself.

Once the region associated with the problem was identified, the advanced sequence mining algorithms were run to identify frequently occurring patterns in the repair history. The patterns in the data showed that two components were failing in tandem. While the short-term measure was simply to replace the defective parts, more importantly the findings were passed on to the product engineering team so the part could be redesigned to be more fault tolerant.
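
The pair-finding step can be sketched in a few lines, shown here in Python for illustration (the post's production model was in R): count component pairs that co-occur across repair tickets and keep those above a support threshold. The component names, tickets and threshold are hypothetical:

```python
# Sketch: finding components that fail in tandem by counting co-occurring
# pairs across repair tickets. Names and threshold are hypothetical.
from collections import Counter
from itertools import combinations

tickets = [
    {"power_supply", "fan"},
    {"power_supply", "fan", "line_card"},
    {"antenna"},
    {"power_supply", "fan"},
    {"line_card"},
]

pairs = Counter()
for parts in tickets:
    pairs.update(combinations(sorted(parts), 2))

# Pairs failing together in a large share of tickets hint at a design issue.
support = {p: n / len(tickets) for p, n in pairs.items()}
frequent = [p for p, s in support.items() if s >= 0.5]
print(frequent)
```

A pair like ("fan", "power_supply") appearing in most tickets is exactly the tandem-failure signal that justifies escalating to product engineering.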

Here is an example of using the power of R, aided by the inherent strengths of a good visualization tool, to build intelligent and actionable data visualization. The service manager did not have to leave his data visualization environment, nor wait for the engineering team to do background analysis.

I would like to conclude that there is more to data visualization than the choice of representation alone. Data visualization should not make the user numb with its slick representation; it should help reveal those elusive insights by goading the user to ask questions he would never have asked.

by Somjit Amrit
Somjit is the Chief Business Officer of Technosoft Corporation, an IT Outsourcing Services Provider
He can be reached at somjit.amrit@technosoftcorp.com

Tuesday, 10 September 2013

“The Purple People” - The conundrum of finding business expertise among Data Scientists


Over the last several months, as I looked at addressing business needs across various industries while leading a team of Data Scientists, the question of domain expertise invariably cropped up.

At one meeting with a pharmaceutical company, I was asked, "Have you done work in the area of rare signal detection?" In a similar vein, while preparing for a meeting with an auto finance major, the question was about using auto telemetry data to detect fraud in auto-insurance claims.

Multiply the business problems by the numerous industries, and the enormity of the challenge becomes apparent, more so since it may not be possible to be a domain expert in every industry. Which begs the question: is there a line between domain knowledge and domain expertise?

While domain knowledge refers to an appreciation of the industry, its business processes and its challenges, domain expertise is expected to run much deeper: meaningfully addressing, through the effective use of technology, a problem or two that a particular industry is grappling with.

Is the expectation then from the prospective customer 'You need to understand my industry language (i.e. domain knowledge)' or is it 'You need the expertise to solve an industry specific problem (i.e. domain expertise)'?

How then does one manage expectations here? A typical Data Scientist needs to be adept at understanding the business problem, have a good handle on the data at hand, and have a grasp of the algorithms that will aid him or her in the journey of discovery, design, deployment and, ultimately, delivering the results.

These skills come in various shades across Data Scientists. As the area becomes mainstream at a furious pace, driven primarily by falling storage and access costs, the ability to balance the three corners of the “Triangle of Intelligence”, namely business (knowledge or expertise), data (content) and algorithm (thinking), will possibly decide the difference between resounding success and abysmal failure in addressing a business problem.

I came across an interesting note by Caitlin Garrett in the blog http://rapidinsight.blogspot.in/2013/06/data-scientists-next-generation.html

Caitlin rightly mentions that, as a practitioner of Data Science, it is mandatory to have analytical thinking, mathematical and statistical ability, a knack for communicating results to non-data people, and creativity. Her blog, however, does not make a reference to business knowledge or expertise.

Can we surmise, then, that the ability to articulate and appreciate the business problem, married to the analytical skills expected to address it, could be a good starting point to give the business user confidence that the problem at hand can be addressed?

The combination of business acumen and technical skill isn’t easy to come by.

David Logan in his interesting blog:
http://www.itweb.co.za/index.php?option=com_content&view=article&id=63781 

mentions the "Purple People": folks who are blessed with both business acumen and analytical abilities. Purple is the blend of Red (business acumen) and Blue (analytical abilities), and by now we would know that this is a very hard-to-find profile.

However as this area matures and moves beyond the hype cycle, we may have the luxury of seeing experts who cover both areas well.

The question still remains: the profile described is rare, yet the expectation of prospective customers most of the time is, "Can you solve the point problem I have been grappling with for so long?", for which domain expertise, and NOT mere knowledge, is required.

The only way to address this is to borrow the domain expertise from the prospective customer, build on the base of domain knowledge one is expected to have, and move forward.

But when one is trying to demonstrate credibility to the prospect, this may be easier said than done.


by Somjit Amrit
Somjit is the Chief Business Officer of Technosoft Corporation, an IT Outsourcing Services Provider
He can be reached at somjit.amrit@technosoftcorp.com


Monday, 9 September 2013

Welcome to Technosoft Corporation's blog

On these pages, you will find the thoughts, musings and ideas of our employees, who come from varied backgrounds but are driven by a mission to solve our clients' challenging technology problems.

So welcome and we hope to see you often.