Monday, 23 September 2013

Do the maverick mavens need managers?

The McKinsey & Company report “Big data: The next frontier for innovation, competition and productivity” (May 2011) has been widely publicized and circulated on the internet.

The report projected that, based on the trends seen in 2011, the demand for deep analytical positions in a big data world in the United States could exceed the supply by 140,000 to 190,000 positions. While this was widely cited in various reports, what intrigued me was the additional information tucked in there, which has often been ignored.

The report further projected a need for 1.5 million additional managers and analysts in the United States who can ask the right questions and consume the results of big data analysis effectively. Together these add up to a number that will make anyone sit up and take notice.

This brings up the moot question: does an ensemble of Data Scientists need managers, and in the proportion being mentioned? Who could those managers be, what skills and expertise would they need, and where would one find them?

Let us take a minute to discuss how a Data Science project is different from a typical application development project.

Data Science projects are different!
A Data Science project is more akin to a typical R&D project: it is largely a process of discovery. The discovery, as an output or outcome, is expected to be more abstract and less definite.

For example, I was engaged in a Data Science project in the area of weather forecasting. This is an area where predictive analytics is deployed, and deploying the appropriate algorithms is the key to yielding results with the right accuracy. The skill lies in identifying, selecting and deploying the right approach to deliver the highest possible accuracy of prediction.

The right approach is a combination of choosing the appropriate method of enriching the input data, running the most applicable algorithms in competition against one another, tuning the parameters of those algorithms, ensuring that no overfitting occurs, and using the right metric to test the performance of the algorithms.
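To make the idea of competing algorithms concrete, here is a minimal sketch in Python. It is not the method used on the weather project: the data is synthetic, the two candidate models (a mean baseline and a straight-line fit) and the names humidity and rainfall are assumptions chosen only for illustration, and input enrichment and parameter tuning are left out. Each candidate is scored with RMSE on held-out folds, which is one common way to guard against overfitting.

```python
import random
import statistics


def fit_mean(xs, ys):
    """Baseline candidate: always predict the mean of the training targets."""
    mean_y = statistics.fmean(ys)
    return lambda x: mean_y


def fit_line(xs, ys):
    """Second candidate: ordinary least squares line on a single input."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def rmse(model, xs, ys):
    """Root mean squared error: the metric used to compare candidates."""
    return (sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)) ** 0.5


def cross_validated_rmse(fit, xs, ys, folds=5):
    """Average RMSE over held-out folds, so that a model is never
    scored on the data it was fitted on."""
    fold_scores = []
    for k in range(folds):
        train_x = [x for i, x in enumerate(xs) if i % folds != k]
        train_y = [y for i, y in enumerate(ys) if i % folds != k]
        test_x, test_y = xs[k::folds], ys[k::folds]
        model = fit(train_x, train_y)
        fold_scores.append(rmse(model, test_x, test_y))
    return statistics.fmean(fold_scores)


# Synthetic, made-up data: rainfall loosely driven by humidity plus noise.
rng = random.Random(42)
humidity = [rng.uniform(20, 90) for _ in range(200)]
rainfall = [0.3 * h - 4 + rng.gauss(0, 2) for h in humidity]

# Let the candidates compete on the same folds and the same metric.
candidates = {"mean baseline": fit_mean, "linear fit": fit_line}
scores = {
    name: cross_validated_rmse(fit, humidity, rainfall)
    for name, fit in candidates.items()
}
best = min(scores, key=scores.get)
print(best, scores)
```

In a real project the list of candidates would be longer and each would carry its own tunable parameters, but the shape stays the same: a common set of held-out data, one agreed metric, and the winner chosen on that metric alone.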

The team structure in a Data Science team is necessarily flat, and there is a reason for this: the roles are distinct and the level of peer collaboration is intense. Designated Data Scientists take up defined roles as per the “triangle of intelligence”, namely aggregated content (the raw data), algorithm (the thinking) and reference structures (the domain knowledge base, ontology).

An application development project, more often than not, has a defined output based on the requirements gathered and the business objectives which have to be derived. Execution of such projects requires a hierarchical team structure, with different types of work packets being defined and allocated across functions (requirement gathering, design, development, testing, release management, change management, warranty support). The output here is often defined, stable and possibly repetitive.

So while a manager is needed in a hierarchical setup, as in the latter case, does one need a manager in a flat structure, as in the former?

Going back to the moot question: the industry would require “1.5 million additional managers and analysts…”. Who would constitute these managers?

Let us analyze the work elements that constitute a Data Science project and the team members for a typical project. They would be:
• Data Access (immediate and unfettered access to data) – Project sponsor, mostly from the business side
• Data Investigation (defining the business problem and objectives) – Business Analyst
• Data Preparation (parsing, cleaning, de/re-normalizing, linking, indexing and interpreting of data) – Data Quality engineer
• Data Interpretation (appreciation of the business process and the need to work with a subject matter expert for contextual understanding) – Subject Matter expert
• Data Engineering (connecting the dots between data sources, construction of quality algorithms, coding) – Programmer
• Data Analysis (ability to do statistical sense checking, statistical validity of results) – Data Scientist
• Data Presentation (communicating the nuanced message as discovered and helping bridge the gap between results derived and actions taken) – Sr Data Scientist / Manager

So a typical project would have a core team of Data Scientists with a surrounding team as listed above. This could well go some way towards explaining McKinsey’s estimate.

True Blue Data Scientists are outliers and they stick to their ilk

Top-notch Data Scientists are hard to come by. These mavens are, most of the time, nonconformists and individualists, and are uncommonly precise. Can we say they are mavericks? However, having closely observed them, I can safely conclude that they like to be around others of their ilk, and that this is more out of necessity.

The subject is vast, deep and, to many, arcane. It cuts across domains, statistics and computational science. The inherent complexity brings with it the need to use the collective mind of the “commune” of data scientists. Through intense and necessary collaboration, this commune can best address business problems by devising a repertoire of solution approaches.

In such a situation, where does a manager fit in? How will the manager provide value? As we have seen, these projects are different from normal application development projects and are best run in a nonhierarchical, flat structure.

This field is evolving, and the ecosystem is becoming more enabling as Data Scientists are increasingly called upon to solve complex business problems across industries.

If a manager is to play a valuable role, some areas that need to be considered are:
1. Given the “Triangle of Intelligence”, the manager should truly understand the domain and the needs of the business, act as the glue between the business users and the data scientists, and give the team the wherewithal to access the input data, the quality of which is critical to the success of the project.
2. Have the ability to toggle between business jargon and the language of data scientists. Understand the complexity of implementing a solution and optimizing its performance, and hence be able to estimate the time required to implement it.
3. Have the experience to disaggregate the complex activities in Data Science projects and help integrate the multiple models (e.g. a combination of algorithms).
4. Be capable of providing the right inputs for the innovation that is frequently required in this field.

Then the question is: would Data Science managers grow from within the “commune” of Data Scientists, or could they migrate from the business domain? We will get the answer as we see more mainstream activity in this area in the coming months.

Managing is about coping with complexity, but leadership is about coping with change. Given that Data Scientists are already groomed to manage complexity (and hence could manage themselves), do we then need Data Science leaders? Maybe the need of the hour is to lead rather than manage.

Somjit Amrit Chief Business Officer, Technosoft Corporation
Somjit.amrit@technosoftcorp.com

Monday, 16 September 2013

Do not stifle the questions visualized data raises!



Extracting meaningful insights from data to address business needs has benefited immensely from the availability of data visualization tools that have made data more approachable. Today the proliferation of off-the-shelf tools, which are easy to learn and are web enabled, has democratized the way data is presented and consumed. Tools like Spotfire, Tableau and QlikView have helped breathe life into data. They provide a professional look and feel and lend the visualized data an inherent sense of fidelity, more so than when data is simply presented as text.

Well designed and deployed data visualization can often lead the user to a question which sparks the need for deeper insights, and which possibly cannot be answered by the visualization software. But before we delve into this problem, let us understand the primary goals of data visualization.
Essentially, good data visualization is expected to:

  • Provide the interpretation of the data  
  • Bring in relevance and context to the data  
  • Reveal elusive insights to spark deeper analysis
  • Drive management  by exception  
  • Embed intelligence in the reports 


Recently we had an opportunity to work on a project whose objective was to understand and analyze the failures in a telecom network and the related quality issues, which could have a direct bearing on customer churn apart from the repair and service costs. The goal was to enhance the quality of service (QoS) of the telecom network provider using predictive analytics. The added expectation was to enable the service manager to take proactive decisions on repairs using machine learning models.

We developed dashboards for the equipment maintenance manager using a well-regarded commercial off-the-shelf visualization tool. We also developed an advanced failure prediction model that took standard telecom equipment log files as input and used techniques such as pattern mining and event sequence analysis to predict equipment failure. The open source R programming language was used to create this model.

We had two options for presenting this data. In the first, the maintenance manager used the visualization tool to drill into the failures and locate the regions or equipment models with high failure rates. This information was then shared with the engineering team for root cause analysis, and the team used our R-based model to predict failure.

The visualized data provided information to act on, but was it intelligent enough to bring in preventive maintenance?

The manager had a number of questions during such slice-and-dice analysis, namely: Why is one region doing better than others? Why are some failures more common in one model and one particular region and not in the others? Unfortunately, simple data visualization cannot provide answers to these questions, and they get stifled. Consequently the service manager hopes that his engineering team will come up with the right answers. Often, well represented and slickly visualized data is “counter-productive” because it numbs the user to the questions it could have triggered.

Our approach was to integrate the relevant prebuilt sequence mining models in R with the off-the-shelf visualization tool. This immediately gave the manager the freedom to ask even deeper questions about the state of the equipment he managed.

In the new approach the manager did not send the information to his engineering team for analysis but felt empowered to do the analysis himself.

Once the region associated with the problem was identified, the advanced sequence mining algorithms were run to identify the frequently occurring patterns in the repair history. The patterns in the data showed that two components were failing in tandem. While the short-term measure would generally be to replace the defective parts, more importantly the findings were passed on to the product engineering team so that the part could be redesigned to be more fault tolerant.
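As a rough illustration of how a "failing in tandem" pattern can surface from repair history, here is a minimal sketch in Python. It is a deliberate simplification and not the R sequence mining models used in the project: it ignores the order of events and only counts how often pairs of components appear in the same repair ticket. The tickets, the component names and the 50% support threshold are all made up for the example.

```python
from collections import Counter
from itertools import combinations

# Hypothetical repair history: each ticket lists the components replaced.
repair_tickets = [
    {"power_supply", "cooling_fan"},
    {"power_supply", "cooling_fan", "antenna"},
    {"line_card"},
    {"power_supply", "cooling_fan"},
    {"antenna", "line_card"},
    {"power_supply"},
]


def frequent_pairs(tickets, min_support):
    """Return component pairs that co-occur in at least `min_support`
    (a fraction between 0 and 1) of all tickets, with their support."""
    pair_counts = Counter()
    for components in tickets:
        # Sort so that each pair is always counted under the same key.
        for pair in combinations(sorted(components), 2):
            pair_counts[pair] += 1
    total = len(tickets)
    return {
        pair: count / total
        for pair, count in pair_counts.items()
        if count / total >= min_support
    }


tandem_failures = frequent_pairs(repair_tickets, min_support=0.5)
print(tandem_failures)
```

A pair that clears the support threshold is only a candidate finding. Whether the two parts really fail together for a design reason is still a question for the product engineering team.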

Here is an example of using the power of R, aided by the inherent strengths of a good visualization tool, to build intelligent and actionable data visualization. The service manager did not have to leave his data visualization environment, nor wait for the engineering team to do background analysis.

I would like to conclude that there is more to data visualization than the choice of how to represent the data. Data visualization should not numb the user with its slick representation, but should help reveal those elusive insights by goading the user to ask questions which he would otherwise never have asked.

by Somjit Amrit
Somjit is the Chief Business Officer of Technosoft Corporation, an IT Outsourcing Services Provider
He can be reached at somjit.amrit@technosoftcorp.com

Tuesday, 10 September 2013

“The Purple People” - The conundrum of finding business expertise among Data Scientists


Over the last several months, as I looked at addressing business needs across various industries while leading a team of Data Scientists, the question of domain expertise invariably cropped up.

While attending a meeting with a Pharmaceutical company, I was asked, "Have you done work in the area of Rare Signal detection?" In a similar vein, while I was preparing for a meeting with an Auto finance major, the question concerned using Auto telemetry data for fraud detection in auto-insurance claims.

Multiply the business problems by the number of industries and the enormity of the challenge becomes apparent, more so since it may not be possible to be a domain expert in every industry. This raises the question: is there a line between domain knowledge and domain expertise?

While domain knowledge would refer to the appreciation of the industry, its business processes and challenges, domain expertise is expected to be much deeper. It would be addressing meaningfully, through the effective use of technology, a problem or two which a particular industry is grappling with.

Is the expectation then from the prospective customer 'You need to understand my industry language (i.e. domain knowledge)' or is it 'You need the expertise to solve an industry specific problem (i.e. domain expertise)'?

How then does one manage the expectations here? A typical Data Scientist needs to be adept at understanding the business problem, have a good handle on the data at hand, and have a grasp of the algorithms which would aid him/her in the journey of discovery, design, deployment and, ultimately, delivery of the results.

These skills come in various shades across Data Scientists. As the area becomes mainstream at a furious pace, primarily driven by storage and accessibility costs, the need to balance the three, otherwise known as the “Triangle of Intelligence”, namely business (knowledge or expertise), data (content) and algorithm (thinking), will possibly decide the difference between resounding success and abysmal failure in addressing a business problem.

I came across an interesting note by Caitlin Garrett in the blog http://rapidinsight.blogspot.in/2013/06/data-scientists-next-generation.html

Caitlin rightly mentions that, for a practitioner of Data Science, it is mandatory to have analytical thinking, mathematical/statistical ability, a knack for communicating results to non-data people, and creativity. Her blog, however, makes no reference to the business side (knowledge or expertise).

Can we surmise, then, that the ability to articulate and appreciate the business problem, married to the analytical skills needed to address it, could be a good starting point for giving the business user the confidence that the problem at hand can be solved?

The combination of business acumen and technical skill isn’t easy to come by.

David Logan, in his interesting blog:
http://www.itweb.co.za/index.php?option=com_content&view=article&id=63781

writes about the "Purple People", folks who are blessed with both business acumen and analytical abilities. Purple is the blend of Red (business acumen) and Blue (analytical abilities), and by now we know that this is a very hard profile to find.

However as this area matures and moves beyond the hype cycle, we may have the luxury of seeing experts who cover both areas well.

The question still remains. The profile as described is rare, but most of the time the prospective customer's expectation is "Can you solve the point problem which I have been grappling with for so long?", for which domain expertise and NOT knowledge would be required.

The only way to address this is to borrow the domain expertise from the prospective customer, build on the base of domain knowledge one is expected to have, and move forward.

But as one is trying to demonstrate credibility to the prospect, this may be easier said than done.


by Somjit Amrit
Somjit is the Chief Business Officer of Technosoft Corporation, an IT Outsourcing Services Provider
He can be reached at somjit.amrit@technosoftcorp.com


Monday, 9 September 2013

Welcome to Technosoft Corporation's blog

On these pages, you will find the thoughts, the musings and the ideas of our employees, who come from varied backgrounds but are driven by a mission to solve our clients' challenging technology problems.

So welcome and we hope to see you often.