Problem definition: Many companies compete for highly valued contracts in lengthy tender processes. Our problem is to predict, relatively early in the engagement cycle, whether prospective deals will result in signed contracts.
Academic/Practical relevance: We address four issues of practical and academic relevance related to this fundamental business-to-business problem: (a) How can one predict whether a company will win a prospective deal? (b) What is the relative contribution to predictive accuracy of static prospect features (about the project, client, and competitors), dynamic measures of milestones achieved, and subjective reports from sales teams (analyzed through natural language processing)? (c) How can the analysis account for endogeneity issues and form counterfactual predictions? And (d) how can a company encourage truthful reporting from sales teams?
Methodology: We propose a recursive system to predict whether prospective deals will result in signed contracts. This model combines measures of prospect features, milestones achieved, and a qualitative summary score of the sentiment from weekly sales team reports. We compare our model with machine learning techniques. Our analysis of the text data involves unsupervised and supervised learning with a novel semantic extension of keywords. At the end of the paper, we also discuss incentive-compatible designs, drawing from the mechanism design literature in game theory.
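As a rough illustration of how structured features, a milestone index, and a text-derived sentiment score could enter a single probit prediction, consider the minimal sketch below. The feature names and coefficient values are hypothetical and chosen only for illustration; they are not estimates from the paper, and the sketch does not reproduce the recursive system itself.

```python
from statistics import NormalDist

# Hypothetical coefficients for illustration only (not estimates from the paper).
COEFFS = {
    "existing_client": 0.5,   # 1 if there is a past relationship with the client
    "milestone_index": 1.2,   # share of milestones achieved, in [0, 1]
    "sentiment_score": 0.9,   # summary score from weekly sales reports, in [-1, 1]
}
INTERCEPT = -0.8


def win_probability(prospect: dict) -> float:
    """Probit link: P(win) = Phi(intercept + sum of coefficient * feature)."""
    z = INTERCEPT + sum(COEFFS[name] * prospect[name] for name in COEFFS)
    return NormalDist().cdf(z)


early_stage = {"existing_client": 0, "milestone_index": 0.1, "sentiment_score": -0.2}
late_stage = {"existing_client": 1, "milestone_index": 0.8, "sentiment_score": 0.6}

print(f"early-stage prospect: {win_probability(early_stage):.2f}")
print(f"late-stage prospect:  {win_probability(late_stage):.2f}")
```

In this sketch the sentiment score is treated as just another regressor, so its marginal contribution can be compared against a model that omits it.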
Results: Client geography, client industry, past relationship with client, and a milestone index are important features for model prediction. Adding a qualitative index of the text data of weekly sales reports (using natural language processing) yields mean accuracy of 77% for a parsimonious probit model and 81% for the best of ten machine learning techniques using more predictor variables. These figures represent improvements in accuracy of ten and twelve percentage points, respectively, over the corresponding models using only structured data. Counterfactual analysis indicates that the number of (and revenue from) won contracts would be 23% to 53% higher if the entire dataset came from the geographic region that is most receptive to the focal company’s services. By contrast, the number of signed contracts would be 7% to 18% lower if all the prospects came from new clients (rather than from existing clients). This provides some indication of the value of an existing client. Our models were deployed at a large IT service provider, resulting in significant impact.
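The counterfactual comparisons above can be read as the following kind of calculation: fix one feature at a chosen value for every prospect and compare expected wins with the baseline. The sketch below uses hypothetical coefficients and a toy list of prospects, and it ignores the endogeneity corrections that the paper's analysis accounts for.

```python
from statistics import NormalDist

# Hypothetical probit coefficients and prospects, for illustration only.
BETA = {"existing_client": 0.6, "milestone_index": 0.8}
INTERCEPT = -0.3

PROSPECTS = [
    {"existing_client": 1, "milestone_index": 0.5},
    {"existing_client": 0, "milestone_index": 0.2},
    {"existing_client": 1, "milestone_index": 0.9},
]


def win_probability(prospect: dict) -> float:
    """Predicted win probability under a probit link."""
    z = INTERCEPT + sum(BETA[name] * prospect[name] for name in BETA)
    return NormalDist().cdf(z)


def expected_wins(prospects: list) -> float:
    """Expected number of signed contracts: the sum of win probabilities."""
    return sum(win_probability(p) for p in prospects)


def counterfactual_change(prospects: list, feature: str, value: float) -> float:
    """Relative change in expected wins when `feature` equals `value` for all prospects."""
    baseline = expected_wins(prospects)
    altered = [{**p, feature: value} for p in prospects]
    return (expected_wins(altered) - baseline) / baseline


# What if every prospect came from a new client?
change = counterfactual_change(PROSPECTS, "existing_client", 0)
print(f"relative change in expected wins: {change:.1%}")
```

The same function applies to the geography scenario by replacing the feature with a region indicator and setting it to the most receptive region.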
Managerial implications: 1. Combining structured and unstructured data significantly enhances prediction accuracy and produces a prediction superior to subjective human predictions. 2. Collecting seller comments and updating achieved prospect milestones can significantly enhance the prediction accuracy. 3. Managers can use our approach to better manage sales resources across their contract pipeline.