Tag Archives: NLP

Do Incentivized Reviews Poison the Well? Evidence from a Natural Experiment at Amazon.com

Park, Jaecheol, Arslan Aziz, Gene Moo Lee. “Do Incentivized Reviews Poison the Well? Evidence from a Natural Experiment at Amazon.com,” Working Paper.

  • Presentations: UBC (2021), KrAIS (2021), WISE (2021), PACIS (2022), SCECR (2022), BU Platform (2022)

The rapid growth in e-commerce has led to a concomitant increase in consumers’ reliance on digital word-of-mouth to inform their choices. As such, there is an increasing incentive for sellers to solicit reviews for their products. Recent studies have examined the direct effect of receiving incentives or introducing an incentive policy on review-writing behavior. However, since incentivized reviews are often only a small proportion of the overall reviews on a platform, it is important to understand whether their presence has spillover effects on the unincentivized reviews, which are often in the majority. We use a state-of-the-art language model, Bidirectional Encoder Representations from Transformers (BERT), to identify incentivized reviews; a document embedding method, Doc2Vec, to create matched pairs of Amazon and non-Amazon branded products; and a natural experiment caused by a policy change on Amazon.com in October 2016 to conduct a difference-in-differences analysis that identifies the spillover effects of banning incentivized reviews on unincentivized reviews. Our results suggest that the ban has positive spillover effects on review sentiment, length, helpfulness, and frequency: the policy stimulates more reviews in the short run and more positive, lengthy, and helpful reviews in the long run. Thus, we find that the presence of incentivized reviews on the platform poisons the well for unincentivized reviews.
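
The difference-in-differences logic mentioned in the abstract can be illustrated with a minimal 2x2 sketch. The abstract does not give the regression specification, so this is only the textbook comparison of group means. The function name `did_estimate`, the "treated"/"control" labels, and all numbers are hypothetical and are not taken from the paper.

```python
from statistics import mean


def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Textbook 2x2 difference-in-differences on group means.

    Returns (change in treated group) minus (change in control group).
    """
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Hypothetical review-sentiment scores, invented for illustration only.
treated_pre = [0.50, 0.60, 0.40]
treated_post = [0.70, 0.80, 0.60]
control_pre = [0.50, 0.50, 0.50]
control_post = [0.55, 0.55, 0.55]

effect = did_estimate(treated_pre, treated_post, control_pre, control_post)
print(round(effect, 3))
```

A full analysis would typically estimate this contrast in a regression with fixed effects and matched product pairs; the sketch only shows the core subtraction.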

Corporate Social Network Analysis: A Deep Learning Approach

Cao, Rui, Gene Moo Lee, Hasan Cavusoglu. “Corporate Social Network Analysis: A Deep Learning Approach,” Working Paper.

Identifying inter-firm relationships is critical in understanding the industry landscape. However, due to the dynamic nature of such relationships, it is challenging to capture corporate social networks in a scalable and timely manner. To address this issue, this research develops a framework to build corporate social network representations by applying natural language processing (NLP) techniques on a corpus of 10-K filings, describing the reporting firms’ perceived relationships with other firms. Our framework uses named-entity recognition (NER) to locate the corporate names in the text, topic modeling to identify types of relationships included, and BERT to predict the type of relationship described in each sentence. To show the value of the network measures created by the proposed framework, we conduct two empirical analyses to see their impacts on firm performance. The first study shows that competition relationship and in-degree measurements on all relationship types have prediction power in estimating future earnings. The second study focuses on the difference between individual perspectives in an inter-firm social network. Such a difference is measured by the direction of mentions and is an indicator of a firm’s success in network governance. Receiving more mentions from other firms is a positive signal to network governance and it shows a significant positive correlation with firm performance next year.
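
The direction-of-mentions idea can be sketched as in-degree and out-degree counts on a directed mention graph. This is an assumed, simplified representation: the `(reporting_firm, mentioned_firm)` pair format, the function name `degree_counts`, and the firm labels are hypothetical, and the extraction steps (NER, topic modeling, BERT) are not shown.

```python
from collections import Counter


def degree_counts(mentions):
    """Count distinct incoming and outgoing mention edges per firm.

    mentions: iterable of (reporting_firm, mentioned_firm) pairs.
    Repeated pairs are collapsed so each directed edge counts once.
    """
    edges = set(mentions)
    in_degree = Counter(dst for _, dst in edges)
    out_degree = Counter(src for src, _ in edges)
    return in_degree, out_degree


# Hypothetical mention pairs, invented for illustration only.
mentions = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "A"), ("A", "B")]
in_degree, out_degree = degree_counts(mentions)
print(in_degree["C"], out_degree["A"])
```

Here a firm's in-degree is the number of distinct firms whose filings mention it, which corresponds to "receiving more mentions from other firms" in the abstract.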

Strategic Competitive Positioning: An Unsupervised Structural Hole-based Firm-specific Measure

Lee, Myunghwan, Gene Moo Lee, Hasan Cavusoglu, Marc-David L. Seidel. “Strategic Competitive Positioning: An Unsupervised Structural Hole-based Firm-specific Measure,” Under Review. [Submitted: May 13, 2022]

  • doc2vec model of 10-K reports: Link
  • Presented at UBC MIS Seminar 2018, CIST 2019 (Seattle, WA), KrAIS 2019 (Munich, Germany), DS 2021 (online), KrAIS 2021 (Austin, TX), UT Dallas 2022, KAIST 2022, Korea Univ 2022, INFORMS 2022 (Indianapolis, IN)
  • Funded by Sauder Exploratory Grant 2019
  • Research assistants: Raymond Situ, Sahil Jain

In this research methods paper, we propose a firm-specific strategic competitive positioning (SCP) measure to capture a firm’s unique competitive and strategic positioning based on annual corporate filings. Using an unsupervised machine learning approach, we use structural holes, a concept in network theory, to develop and operationalize an SCP measure derived from a strategic similarity matrix of all existing U.S. publicly traded firms. This enables us to construct a robust firm-level SCP measure with minimal human intervention. Our measure dynamically captures competitive positioning across different firms and years without using artificially bounded industry classification systems. We illustrate how the measure dynamically captures firm-level, industry-level, and cross-industry strategic changes. Then, we demonstrate the effectiveness of our measure with an empirical demonstration showing the imprinting effect of SCP at the time of initial public offering (IPO) on the subsequent performance of the firm. The results show that our unsupervised SCP measure predicts post-IPO performance. This paper makes a significant methodological contribution to the information systems and strategic management literature by proposing a network theory-based unsupervised approach to dynamically measure firm-level strategic competitive positioning. The measure can be easily applied to firm-specific, industry-level, and cross-industry research questions across a wide variety of fields and contexts.
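
To make the structural-holes concept concrete, here is a minimal sketch of Burt's network constraint computed from a weighted adjacency matrix. The abstract does not say which structural-hole metric the SCP measure uses or how the similarity matrix is turned into a network, so the choice of Burt's constraint, the function name `burt_constraint`, and the toy matrix are all assumptions.

```python
def burt_constraint(weights, i):
    """Burt's constraint for node i in a weighted network.

    weights: square list-of-lists; weights[a][b] is the tie strength a -> b.
    Lower constraint means node i spans more structural holes.
    """
    n = len(weights)

    def mutual(a, b):
        return weights[a][b] + weights[b][a]

    def proportion(a, b):
        # Share of a's total tie strength that is invested in b.
        total = sum(mutual(a, k) for k in range(n) if k != a)
        return mutual(a, b) / total if total else 0.0

    contacts = [j for j in range(n) if j != i and mutual(i, j) > 0]
    constraint = 0.0
    for j in contacts:
        # Indirect investment in j through i's other contacts q.
        indirect = sum(
            proportion(i, q) * proportion(q, j) for q in contacts if q != j
        )
        constraint += (proportion(i, j) + indirect) ** 2
    return constraint


# Toy star network: node 0 bridges three otherwise unconnected nodes.
star = [
    [0, 1, 1, 1],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
]
print(burt_constraint(star, 0), burt_constraint(star, 1))
```

In the toy star, the center has low constraint because its contacts are not connected to each other, while each leaf is fully constrained by its single contact.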

Matching Mobile Applications for Cross-Promotion (ISR 2020)

Lee, Gene Moo, Shu He, Joowon Lee, Andrew B. Whinston (2020) Matching Mobile Applications for Cross-Promotion. Information Systems Research 31(3), pp. 865-891.

  • Based on an industry collaboration with IGAWorks
  • Presented in Chicago Marketing Analytics (Chicago, IL 2013), WeB (Auckland, New Zealand 2014), Notre Dame (2015), Temple (2015), UC Irvine (2015), Indiana (2015), UT Dallas (2015), Minnesota (2015), UT Arlington (2015), Michigan State (2016), Korea Univ (2021)
  • Dissertation Paper #3
  • Research assistant: Raymond Situ

The mobile applications (apps) market is one of the most successful software markets. As the platform grows rapidly, with millions of apps and billions of users, search costs are increasing tremendously. The challenge is how app developers can target the right users with their apps and how consumers can find the apps that fit their needs. Cross-promotion, advertising a mobile app (target app) in another app (source app), is introduced as a new app-promotion framework to alleviate the issue of search costs. In this paper, we model source app user behaviors (downloads and postdownload usages) with respect to different target apps in cross-promotion campaigns. We construct a novel app similarity measure using latent Dirichlet allocation topic modeling on apps’ production descriptions and then analyze how the similarity between the source and target apps influences users’ app download and usage decisions. To estimate the model, we use a unique data set from a large-scale random matching experiment conducted by a major mobile advertising company in Korea. The empirical results show that consumers prefer more diversified apps when they are making download decisions compared with their usage decisions, which is supported by the psychology literature on people’s variety-seeking behavior. Lastly, we propose an app-matching system based on machine-learning models (on app download and usage prediction) and generalized deferred acceptance algorithms. The simulation results show that app analytics capability is essential in building accurate prediction models and in increasing ad effectiveness of cross-promotion campaigns and that, at the expense of privacy, individual user data can further improve the matching performance. This paper has implications on the trade-off between utility and privacy in the growing mobile economy.
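
The abstract mentions generalized deferred acceptance algorithms for app matching. As a rough illustration, the sketch below implements only the basic one-to-one deferred acceptance (Gale-Shapley) procedure, not the generalized variant used in the paper. The app names and preference lists are hypothetical; in practice the rankings would come from predicted download and usage outcomes.

```python
def deferred_acceptance(proposer_prefs, receiver_prefs):
    """One-to-one deferred acceptance with proposers making offers.

    Both arguments map an agent to its ranked list of acceptable partners.
    Returns a dict mapping each matched proposer to its receiver.
    """
    rank = {
        r: {p: pos for pos, p in enumerate(prefs)}
        for r, prefs in receiver_prefs.items()
    }
    next_choice = {p: 0 for p in proposer_prefs}
    held = {}  # receiver -> proposer whose offer is tentatively held
    free = list(proposer_prefs)
    while free:
        p = free.pop()
        prefs = proposer_prefs[p]
        if next_choice[p] >= len(prefs):
            continue  # p has exhausted its list and stays unmatched
        r = prefs[next_choice[p]]
        next_choice[p] += 1
        if p not in rank.get(r, {}):
            free.append(p)  # r finds p unacceptable; p tries its next choice
            continue
        current = held.get(r)
        if current is None:
            held[r] = p
        elif rank[r][p] < rank[r][current]:
            held[r] = p  # r trades up and releases its previous offer
            free.append(current)
        else:
            free.append(p)
    return {p: r for r, p in held.items()}


# Hypothetical target apps (proposers) and source apps (receivers).
target_prefs = {"t1": ["s1", "s2"], "t2": ["s1", "s2"]}
source_prefs = {"s1": ["t2", "t1"], "s2": ["t1", "t2"]}
matching = deferred_acceptance(target_prefs, source_prefs)
print(matching)
```

The resulting matching is stable: no target app and source app would both prefer each other over their assigned partners.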

Predicting Litigation Risk via Machine Learning

Lee, Gene Moo*, James Naughton*, Xin Zheng*, Dexin Zhou* (2020) “Predicting Litigation Risk via Machine Learning,” Working Paper. [SSRN] (* equal contribution)

This study examines whether and how machine learning techniques can improve the prediction of litigation risk relative to the traditional logistic regression model. Existing litigation literature has no consensus on a predictive model. Additionally, the evaluation of litigation model performance is ad hoc. We use five popular machine learning techniques to predict litigation risk and benchmark their performance against the logistic regression model in Kim and Skinner (2012). Our results show that machine learning techniques can significantly improve the predictability of litigation risk. We identify two best-performing methods (random forest and convolutional neural networks) and rank the importance of predictors. Additionally, we show that models using economically motivated ratio variables perform better than models using raw variables. Overall, our results suggest that the joint consideration of economically meaningful predictors and machine learning techniques maximizes the improvement of predictive litigation models.

On the Spillover Effects of Online Product Reviews on Purchases: Evidence from Clickstream Data (ISR 2021)

Kwark, Young*, Gene Moo Lee*, Paul A. Pavlou*, Liangfei Qiu* (2021) “On the Spillover Effects of Online Product Reviews on Purchases: Evidence from Clickstream Data.” Information Systems Research 32(3): 895-913. (* equal contribution)

  • Data awarded by Wharton Consumer Analytics Initiative
  • Presented in WCBI (Snowbird, UT 2015), KMIS (Busan, Korea 2016), Minnesota (2016), ICIS (Dublin, Ireland 2016), Boston Univ. (2017), HEC Paris (2017), and Korea Univ. (2018)
  • An earlier version was published in ICIS 2016
  • Research assistants: Bolat Khojayev, Raymond Situ

We study the spillover effects of the online reviews of other covisited products on the purchases of a focal product using clickstream data from a large retailer. The proposed spillover effects are moderated by (a) whether the related (covisited) products are complementary or substitutive, (b) the choice of media channel (mobile or personal computer (PC)) used, (c) whether the related products are from the same or a different brand, (d) consumer experience, and (e) the variance of the review ratings. To identify complementary and substitutive products, we develop supervised machine-learning models based on product characteristics, such as product category and brand, and novel text-based similarity measures. We train and validate the machine-learning models using product pair labels from Amazon Mechanical Turk. Our results show that the mean rating of substitutive (complementary) products has a negative (positive) effect on purchasing of the focal product. Interestingly, the magnitude of the spillover effects of the mean ratings of covisited (substitutive and complementary) products is significantly larger than the effects on the focal product, especially for complementary products. The spillover effect of ratings is stronger for consumers who use mobile devices versus PCs. We find the negative effect of the mean ratings of substitutive products across different brands on purchasing of a focal product to be significantly higher than within the same brand. Lastly, the effect of the mean ratings is stronger for less experienced consumers and for ratings with lower variance. We discuss implications on leveraging the spillover effect of the online product reviews of related products to encourage online purchases.

Does Deceptive Marketing Pay? The Evolution of Consumer Sentiment Surrounding a Pseudo-Product-Harm Crisis (J. Business Ethics 2019)

Song, Reo, Ho Kim, Gene Moo Lee, and Sungha Jang (2019) Does Deceptive Marketing Pay? The Evolution of Consumer Sentiment Surrounding a Pseudo-Product-Harm Crisis. Journal of Business Ethics, 158(3), pp. 743-761.

The slandering of a firm’s products by competing firms poses significant threats to the victim firm, with the resulting damage often being as harmful as that from product-harm crises. In contrast to a true product-harm crisis, however, this disparagement is based on a false claim or fake news; thus, we call it a pseudo-product-harm crisis. Using a pseudo-product-harm crisis event that involved two competing firms, this research examines how consumer sentiments about the two firms evolved in response to the crisis. Our analyses show that while both firms suffered, the damage to the offending firm (which spread fake news to cause the crisis) was more detrimental, in terms of advertising effectiveness and negative news publicity, than that to the victim firm (which suffered from the false claim). Our study indicates that, even apart from ethical concerns, the false claim about the victim firm was not an effective business strategy to increase the offending firm’s performance.

A Friend Like Me: Modeling Network Formation in a Location-Based Social Network (JMIS 2016)

Lee, Gene Moo*, Liangfei Qiu*, Andrew B. Whinston* (2016) A Friend Like Me: Modeling Network Formation in a Location-Based Social Network, Journal of Management Information Systems 33(4), pp. 1008-1033. (* equal contribution)

  • Best Paper Nomination at HICSS 2016
  • Presented in WITS (Auckland, New Zealand 2014), WISE (Auckland, New Zealand 2014), and HICSS (Kauai, HI 2016)
  • Dissertation Paper #2

This article studies strategic network formation in a location-based social network. We build an empirical model of social link creation that incorporates individual characteristics and pairwise user similarities. Specifically, we define four user proximity measures from biography, geography, mobility, and short messages. To construct proximity from unstructured text information, we build topic models using Latent Dirichlet Allocation. Using Gowalla data with 385,306 users, 3 million locations, and 35 million check-in records, we empirically estimate the model and find evidence of a homophily effect on network formation. To cope with possible endogeneity issues, we use exogenous weather shocks as our instrumental variables and find that the empirical results are robust: network formation decisions are significantly affected by our proximity measures.
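
One simple way to turn topic-model output into a pairwise proximity score is cosine similarity between two users' topic distributions. The abstract does not state which similarity function the authors use, so cosine similarity is an assumption here, and the topic vectors below are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length numeric vectors.

    Returns 0.0 when either vector has zero length in the Euclidean sense.
    """
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical per-user topic distributions over three topics.
user_a = [0.7, 0.2, 0.1]
user_b = [0.6, 0.3, 0.1]
user_c = [0.0, 0.1, 0.9]

print(cosine_similarity(user_a, user_b) > cosine_similarity(user_a, user_c))
```

Users whose text loads on the same topics score close to 1, while users with disjoint topics score close to 0.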