Tag Archives: topic modeling

Corporate Social Network Analysis: A Deep Learning Approach

Cao, Rui, Gene Moo Lee, Hasan Cavusoglu. “Corporate Social Network Analysis: A Deep Learning Approach,” Working Paper.

Identifying inter-firm relationships is critical to understanding the industry landscape. However, because such relationships are dynamic, it is challenging to capture corporate social networks in a scalable and timely manner. To address this issue, this research develops a framework that builds corporate social network representations by applying natural language processing (NLP) techniques to a corpus of 10-K filings, which describe the reporting firms’ perceived relationships with other firms. Our framework uses named-entity recognition (NER) to locate corporate names in the text, topic modeling to identify the types of relationships included, and BERT to predict the type of relationship described in each sentence. To show the value of the network measures created by the proposed framework, we conduct two empirical analyses of their effects on firm performance. The first study shows that competition relationships and in-degree measures across all relationship types have predictive power for future earnings. The second study focuses on the difference between individual perspectives in an inter-firm social network. This difference is measured by the direction of mentions and indicates a firm’s success in network governance. Receiving more mentions from other firms is a positive signal of network governance, and it is significantly and positively correlated with firm performance in the following year.
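The in-degree measures and the direction of mentions described above can be illustrated with a minimal sketch. It assumes the NER and relation-classification steps have already produced one hypothetical (mentioning firm, mentioned firm, relationship type) triple per sentence; the relationship label "competitor", the firm names, and the "balance" measure (mentions received minus mentions made) are illustrative assumptions, not the paper's exact definitions.

```python
from collections import Counter, defaultdict

# Hypothetical input: one (mentioning_firm, mentioned_firm, relationship_type)
# triple per sentence. In the framework described above these would come from
# NER plus a sentence-level relation classifier; here they are toy values.
Mention = tuple[str, str, str]


def degree_measures(mentions: list[Mention]) -> dict[str, dict[str, int]]:
    """Compute simple directed-network measures for each firm.

    Duplicate triples are collapsed, so a relationship repeated across several
    sentences is counted once.
    """
    edges = set(mentions)
    in_deg: Counter = Counter()
    out_deg: Counter = Counter()
    in_by_type: defaultdict = defaultdict(Counter)
    for src, dst, rel in edges:
        out_deg[src] += 1          # src mentions dst
        in_deg[dst] += 1           # dst is mentioned by src
        in_by_type[rel][dst] += 1  # in-degree split by relationship type

    firms = set(in_deg) | set(out_deg)
    return {
        firm: {
            "in": in_deg[firm],
            "out": out_deg[firm],
            # > 0 means the firm receives more mentions than it makes
            "balance": in_deg[firm] - out_deg[firm],
            "in_competitor": in_by_type["competitor"][firm],
        }
        for firm in firms
    }


mentions = [
    ("A", "B", "competitor"),
    ("A", "B", "competitor"),  # duplicate sentence, counted once
    ("C", "B", "supplier"),
    ("B", "A", "competitor"),
    ("C", "A", "customer"),
]
measures = degree_measures(mentions)
print(measures["C"])  # {'in': 0, 'out': 2, 'balance': -2, 'in_competitor': 0}
```

Collapsing duplicate triples is a design choice: it counts distinct relationships rather than how often a filing repeats them, and either convention could be defended.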

IT Risk and Stock Price Crash Risk (Working Paper)

Song, Victor, Hasan Cavusoglu, Mary L. Z. Ma, Gene Moo Lee (2023) “IT Risk and Stock Price Crash Risk,” Under review.

IT risk, especially cybersecurity risk, has rapidly increased and become a top concern for researchers, regulators, firm managers, and investors. This study creates a novel firm-level IT risk measure applicable to all US-listed firms by applying BERTopic topic modeling to the risk factors reported in Item 1A of 10-K annual reports. We validate the measure with multiple approaches, including cross-validation, illustrative excerpts of IT risk factors, cross-sectional and over-time distribution analyses, and an analysis of firm characteristics associated with IT risk. The measure is higher in IT-intensive industries and for firms that are larger, more profitable, and have better growth potential, and it can predict future data breaches. Using this ex-ante IT risk measure, we examine the relation between IT risk and stock price crash risk, which reflects a firm’s propensity for stock price crashes. Our findings suggest that IT risk is positively associated with crash risk, and we identify downward operating risk and the predictability of data breaches as two mechanisms through which IT risk affects crash risk. By decomposing IT risk into cybersecurity risk and non-cybersecurity IT risk, we find that both types of IT risk increase crash risk, but the effect of cybersecurity risk is stronger than that of non-cybersecurity IT risk, consistent with their different risk natures. We further observe that the novelty and readability of IT risk factors strengthen the crash risk effects of IT risk, consistent with the notion that novelty represents updated and increased IT risk and that readability improves the understanding of IT risk. Lastly, difference-in-differences analyses reveal that IT risk increases stock price crash risk, not the other way around. We conclude the paper by discussing academic contributions and practical implications in the context of the SEC’s directives on reporting and managing IT risk and cybersecurity risk.
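One simple way to turn topic assignments into a firm-level score is to take the share of a firm's Item 1A risk factors that fall into IT-related topics. The sketch below assumes each risk factor has already been assigned a topic id by a fitted model; the topic ids in `CYBER_TOPICS` and `NON_CYBER_IT_TOPICS` are hypothetical placeholders, and this share-based score is an illustration rather than the paper's exact construction. It also shows the cybersecurity / non-cybersecurity decomposition mentioned above.

```python
from dataclasses import dataclass

# Hypothetical topic ids an analyst might label as IT-related after inspecting
# a fitted topic model's keywords; real ids depend entirely on the fitted model.
CYBER_TOPICS = {3, 7}
NON_CYBER_IT_TOPICS = {12}


@dataclass
class ITRisk:
    total: float      # share of risk factors in any IT-related topic
    cyber: float      # share in cybersecurity topics
    non_cyber: float  # share in other IT topics


def it_risk(topic_per_factor: list[int]) -> ITRisk:
    """Share of a firm's Item 1A risk factors assigned to IT-related topics.

    `topic_per_factor` holds one topic id per risk factor. Outliers (BERTopic
    labels them -1) simply count as non-IT here.
    """
    n = len(topic_per_factor)
    if n == 0:
        return ITRisk(0.0, 0.0, 0.0)
    cyber = sum(t in CYBER_TOPICS for t in topic_per_factor) / n
    non_cyber = sum(t in NON_CYBER_IT_TOPICS for t in topic_per_factor) / n
    return ITRisk(cyber + non_cyber, cyber, non_cyber)


print(it_risk([3, 12, 5, 5, -1, 7, 0, 5]))
# ITRisk(total=0.375, cyber=0.25, non_cyber=0.125)
```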

Enhancing Social Media Analysis with Visual Data Analytics: A Deep Learning Approach (MISQ 2020)

Shin, Donghyuk, Shu He, Gene Moo Lee, Andrew B. Whinston, Suleyman Cetintas, Kuang-Chih Lee (2020) Enhancing Social Media Analysis with Visual Data Analytics: A Deep Learning Approach, MIS Quarterly, 44(4), pp. 1459-1492. [SSRN]

  • Based on an industry collaboration with Yahoo! Research
  • The first MISQ methods article based on machine learning
  • Presented in WeB (Fort Worth, TX 2015), WITS (Dallas, TX 2015), UT Arlington (2016), Texas FreshAIR (San Antonio, TX 2016), SKKU (2016), Korea Univ. (2016), Hanyang (2016), Kyung Hee (2016), Chung-Ang (2016), Yonsei (2016), Seoul National Univ. (2016), Kyungpook National Univ. (2016), UKC (Dallas, TX 2016), UBC (2016), INFORMS CIST (Nashville, TN 2016), DSI (Austin, TX 2016), Univ. of North Texas (2017), Arizona State (2018), Simon Fraser (2019), Saarland (2021), Kyung Hee (2021), Tennessee Chattanooga (2021), Rochester (2021), KAIST (2021), Yonsei (2021), UBC (2022), Temple (2023)

This research methods article proposes a visual data analytics framework to enhance social media research using deep learning models. Drawing on the information systems and marketing literature and complementing it with data-driven methods, we propose a number of visual and textual content features, including complexity, similarity, and consistency measures, that can play important roles in the persuasiveness of social media content. We then employ state-of-the-art machine learning approaches, such as deep learning and text mining, to operationalize these new content features in a scalable and systematic manner. We validate the newly developed features against human coders on Amazon Mechanical Turk. Furthermore, we conduct two case studies with a large social media dataset from Tumblr to show the effectiveness of the proposed content features. The first case study demonstrates that both theoretically motivated and data-driven features significantly improve the model’s power to predict the popularity of a post, and the second highlights the relationships between content features and consumer evaluations of the corresponding posts. The proposed research framework illustrates how deep learning methods can enhance the analysis of unstructured visual and textual data for social media research.

Matching Mobile Applications for Cross-Promotion (ISR 2020)

Lee, Gene Moo, Shu He, Joowon Lee, Andrew B. Whinston (2020) Matching Mobile Applications for Cross-Promotion. Information Systems Research 31(3), pp. 865-891.

  • Based on an industry collaboration with IGAWorks
  • Presented in Chicago Marketing Analytics (Chicago, IL 2013), WeB (Auckland, New Zealand 2014), Notre Dame (2015), Temple (2015), UC Irvine (2015), Indiana (2015), UT Dallas (2015), Minnesota (2015), UT Arlington (2015), Michigan State (2016), Korea Univ. (2021)
  • Dissertation Paper #3
  • Research assistant: Raymond Situ

The mobile application (app) market is one of the most successful software markets. As the platform grows rapidly, with millions of apps and billions of users, search costs are increasing tremendously. The challenge is how app developers can target the right users and how consumers can find the apps that fit their needs. Cross-promotion, advertising a mobile app (target app) in another app (source app), is introduced as a new app-promotion framework to alleviate search costs. In this paper, we model source app users’ behaviors (downloads and post-download usage) with respect to different target apps in cross-promotion campaigns. We construct a novel app similarity measure using latent Dirichlet allocation topic modeling on apps’ product descriptions and then analyze how the similarity between the source and target apps influences users’ app download and usage decisions. To estimate the model, we use a unique data set from a large-scale random matching experiment conducted by a major mobile advertising company in Korea. The empirical results show that consumers prefer more diversified apps when making download decisions than when making usage decisions, which is supported by the psychology literature on people’s variety-seeking behavior. Lastly, we propose an app-matching system based on machine-learning models (for app download and usage prediction) and generalized deferred acceptance algorithms. The simulation results show that app analytics capability is essential for building accurate prediction models and for increasing the ad effectiveness of cross-promotion campaigns, and that, at the expense of privacy, individual user data can further improve matching performance. This paper has implications for the trade-off between utility and privacy in the growing mobile economy.
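To make the matching step concrete, here is a sketch of the classic one-to-one deferred acceptance (Gale-Shapley) algorithm, with source apps proposing to target apps. The paper uses *generalized* deferred acceptance algorithms; this simpler standard version is swapped in for illustration only. The assumption that both sides rank partners by the same predicted score, the helper `rank_by_score`, and the app ids are all hypothetical.

```python
from collections import defaultdict


def rank_by_score(scores: dict[tuple[str, str], float]):
    """Turn predicted pair scores into best-first preference lists for both sides.

    Simplifying assumption: both sides rank partners by the same score
    (e.g., a predicted download probability for the pair).
    """
    source_prefs, target_prefs = defaultdict(list), defaultdict(list)
    for (s, t), _ in sorted(scores.items(), key=lambda kv: -kv[1]):
        source_prefs[s].append(t)
        target_prefs[t].append(s)
    return dict(source_prefs), dict(target_prefs)


def deferred_acceptance(source_prefs, target_prefs) -> dict[str, str]:
    """One-to-one deferred acceptance with source apps proposing.

    Returns {target_app: source_app}. Targets tentatively hold their best offer
    so far and may trade up, which makes the final matching stable.
    """
    rank = {t: {s: i for i, s in enumerate(p)} for t, p in target_prefs.items()}
    next_idx = {s: 0 for s in source_prefs}
    free = list(source_prefs)
    held: dict[str, str] = {}
    while free:
        s = free.pop()
        if next_idx[s] >= len(source_prefs[s]):
            continue  # s has proposed to everyone and stays unmatched
        t = source_prefs[s][next_idx[s]]
        next_idx[s] += 1
        if s not in rank.get(t, {}):
            free.append(s)  # t does not accept s at all
        elif t not in held:
            held[t] = s
        elif rank[t][s] < rank[t][held[t]]:
            free.append(held[t])  # t trades up; previous holder is free again
            held[t] = s
        else:
            free.append(s)  # rejected; s will try its next choice


    return held


scores = {("s1", "t1"): 0.9, ("s1", "t2"): 0.5, ("s2", "t1"): 0.8, ("s2", "t2"): 0.7}
print(deferred_acceptance(*rank_by_score(scores)))  # {'t1': 's1', 't2': 's2'}
```

Here both source apps first want `t1`; `t1` keeps the higher-ranked `s1`, and the rejected `s2` moves on to `t2`.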

Development of Topic Trend Analysis Model for Industrial Intelligence using Public Data (J. Technology Innovation 2018)

Park, S., Lee, G. M., Kim, Y.-E., Seo, J. (2018). Development of Topic Trend Analysis Model for Industrial Intelligence using Public Data (in Korean). Journal of Technology Innovation, 26(4), 199-232.

  • Funded by the Korea Institute of Science and Technology Information (KISTI)
  • Demo website: https://misr.sauder.ubc.ca/edgar_dashboard/
  • Presented at UKC (2017), KISTI (2017), WITS (2017), Rutgers Business School (2018)

There is a growing need to understand the business management environment through big data analysis at the industry and firm levels. Research using corporate disclosures, which comprehensively cover a company’s business performance and future plans, is attracting attention. However, because such disclosure data are unstructured, there is limited research on developing applicable analytical models that leverage them. This study proposes a text-mining-based analytical model for industry- and firm-level analyses using publicly available corporate disclosure data. Specifically, we apply the LDA topic model and the word2vec word embedding model to U.S. SEC filings of publicly listed firms and analyze the trends of business topics at the industry and firm levels.

Using LDA topic modeling on SEC EDGAR 10-K documents, we identify the management topics across all industries. To compare topic-trend patterns across industries, we compare the software and hardware industries over the past 20 years. We also observe firm-level changes in management subjects by comparing two companies in the software industry. The changes in topic trends provide a lens for identifying declining and growing management subjects at the industry and firm levels. Finally, by applying the word2vec word embedding model to firm-level 10-K documents in the software industry and reducing their dimensionality with principal component analysis, we map companies and products (or services), identify those with similar management subjects, and trace how they change over the decades.
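A topic trend of the kind described above can be sketched by averaging each filing's topic proportions within an industry-year and fitting a least-squares slope over time. This is a minimal illustration under stated assumptions: the topic proportions are taken as already produced by a fitted LDA model, the filings below are hypothetical toy values, and the OLS slope is one simple way to label a topic as growing or declining, not necessarily the study's procedure.

```python
from collections import defaultdict
from statistics import mean


def topic_trends(docs, n_topics: int) -> dict:
    """Average document-topic proportions per (industry, year).

    `docs` yields (industry, year, theta), where theta is one filing's topic
    proportion vector from an already fitted topic model.
    """
    grouped = defaultdict(list)
    for industry, year, theta in docs:
        grouped[(industry, year)].append(theta)
    return {
        key: [mean(theta[k] for theta in thetas) for k in range(n_topics)]
        for key, thetas in grouped.items()
    }


def topic_slope(trends: dict, industry: str, k: int) -> float:
    """OLS slope of topic k's mean share against year (> 0: growing, < 0: declining)."""
    points = sorted(
        (year, shares[k]) for (ind, year), shares in trends.items() if ind == industry
    )
    if len(points) < 2:
        raise ValueError("need at least two years to estimate a trend")
    years = [year for year, _ in points]
    values = [value for _, value in points]
    y_bar, v_bar = mean(years), mean(values)
    num = sum((y - y_bar) * (v - v_bar) for y, v in points)
    den = sum((y - y_bar) ** 2 for y in years)
    return num / den


# Hypothetical two-topic proportions for a handful of filings.
docs = [
    ("software", 2000, [0.8, 0.2]),
    ("software", 2000, [0.6, 0.4]),
    ("software", 2010, [0.4, 0.6]),
    ("software", 2020, [0.2, 0.8]),
    ("hardware", 2000, [0.5, 0.5]),
]
trends = topic_trends(docs, n_topics=2)
print(round(topic_slope(trends, "software", 1), 4))  # 0.025: topic 1 is growing
```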

By proposing a methodology for building analytical models from public management data at the industry and firm levels, this study lays the groundwork for a practical method of identifying changes in management subjects. However, further research is needed to develop a finer-grained analytical model of the relationship between technology management strategy and management performance, given the varied patterns of management topics, such as frequent shifts in management subjects or their momentum. More studies are also needed to develop models that analyze the competitive context among firms based on their product (or service) portfolios.

On the Spillover Effects of Online Product Reviews on Purchases: Evidence from Clickstream Data (ISR 2021)

Kwark, Young*, Gene Moo Lee*, Paul A. Pavlou*, Liangfei Qiu* (2021) On the Spillover Effects of Online Product Reviews on Purchases: Evidence from Clickstream Data. Information Systems Research 32(3): 895-913. (* equal contribution)

  • Data awarded by Wharton Consumer Analytics Initiative
  • Presented in WCBI (Snowbird, UT 2015), KMIS (Busan, Korea 2016), Minnesota (2016), ICIS (Dublin, Ireland 2016), Boston Univ. (2017), HEC Paris (2017), and Korea Univ. (2018)
  • An earlier version was published in ICIS 2016
  • Research assistants: Bolat Khojayev, Raymond Situ

We study the spillover effects of the online reviews of other covisited products on the purchases of a focal product using clickstream data from a large retailer. The proposed spillover effects are moderated by (a) whether the related (covisited) products are complementary or substitutive, (b) the choice of media channel (mobile or personal computer [PC]) used, (c) whether the related products are from the same or a different brand, (d) consumer experience, and (e) the variance of the review ratings. To identify complementary and substitutive products, we develop supervised machine-learning models based on product characteristics, such as product category and brand, and novel text-based similarity measures. We train and validate the machine-learning models using product pair labels from Amazon Mechanical Turk. Our results show that the mean rating of substitutive (complementary) products has a negative (positive) effect on purchases of the focal product. Interestingly, the magnitude of the spillover effects of the mean ratings of covisited (substitutive and complementary) products is significantly larger than the effects on the focal product, especially for complementary products. The spillover effect of ratings is stronger for consumers who use mobile devices than for those who use PCs. We find the negative effect of the mean ratings of substitutive products across different brands on purchases of a focal product to be significantly stronger than that within the same brand. Lastly, the effect of the mean ratings is stronger for less experienced consumers and for ratings with lower variance. We discuss implications for leveraging the spillover effect of the online product reviews of related products to encourage online purchases.
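The supervised classification step can be illustrated with a toy pipeline: pair-level features from product category, brand, and a text similarity score, fed to a logistic regression trained on labeled pairs. This is a sketch under loud assumptions: word-overlap (Jaccard) similarity stands in for the paper's novel text-based measures, plain logistic regression stands in for its machine-learning models, and the product records and crowd-sourced labels below are hypothetical.

```python
import math


def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity between two product descriptions (a stand-in measure)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


def pair_features(p: dict, q: dict) -> list[float]:
    return [
        1.0,                                    # intercept
        float(p["category"] == q["category"]),  # same product category
        float(p["brand"] == q["brand"]),        # same brand
        jaccard(p["description"], q["description"]),
    ]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Batch gradient descent on mean log loss; y = 1 for substitutes, 0 for complements."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w


def is_substitute(w, p, q) -> bool:
    return sigmoid(sum(wj * xj for wj, xj in zip(w, pair_features(p, q)))) > 0.5


products = {
    "phone_a": {"category": "phone", "brand": "X", "description": "smartphone with camera and large screen"},
    "phone_b": {"category": "phone", "brand": "Y", "description": "smartphone with camera and fast charging"},
    "case_a": {"category": "accessory", "brand": "X", "description": "protective case for smartphone"},
    "case_b": {"category": "accessory", "brand": "Y", "description": "slim protective case for smartphone"},
    "charger": {"category": "accessory", "brand": "Z", "description": "fast charging cable"},
}
# Hypothetical crowd-sourced labels: 1 = substitutes, 0 = complements.
labeled = [("phone_a", "phone_b", 1), ("case_a", "case_b", 1),
           ("phone_a", "case_a", 0), ("phone_b", "charger", 0)]
X = [pair_features(products[a], products[b]) for a, b, _ in labeled]
y = [label for _, _, label in labeled]
w = train_logistic(X, y)
for a, b, _ in labeled:
    print(a, b, is_substitute(w, products[a], products[b]))
```

With real data, one would hold out labeled pairs for validation rather than scoring the training pairs as done here.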

A Friend Like Me: Modeling Network Formation in a Location-Based Social Network (JMIS 2016)

Lee, Gene Moo*, Liangfei Qiu*, Andrew B. Whinston* (2016) A Friend Like Me: Modeling Network Formation in a Location-Based Social Network, Journal of Management Information Systems 33(4), pp. 1008-1033. (* equal contribution)

  • Best Paper Nomination at HICSS 2016
  • Presented in WITS (Auckland, New Zealand 2014), WISE (Auckland, New Zealand 2014), and HICSS (Kauai, HI 2016)
  • Dissertation Paper #2

This article studies strategic network formation in a location-based social network. We build an empirical model of social link creation that incorporates individual characteristics and pairwise user similarities. Specifically, we define four user proximity measures from biography, geography, mobility, and short messages. To construct proximity from unstructured text information, we build topic models using Latent Dirichlet Allocation. Using Gowalla data with 385,306 users, 3 million locations, and 35 million check-in records, we empirically estimate the model and find evidence of a homophily effect in network formation. To address possible endogeneity issues, we use exogenous weather shocks as instrumental variables and find that the empirical results are robust: network formation decisions are significantly affected by our proximity measures.
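A text-based proximity between two users can be computed by comparing their inferred topic distributions, producing one row per user pair for a dyadic link-formation model. The sketch below uses one minus the Hellinger distance as the similarity; that choice, the user names, and the topic mixtures are assumptions for illustration, since the abstract does not state the exact proximity formula.

```python
import math
from itertools import combinations


def hellinger_similarity(p: list[float], q: list[float]) -> float:
    """1 - Hellinger distance between two topic distributions; lies in [0, 1]."""
    if len(p) != len(q):
        raise ValueError("topic vectors must have the same length")
    bc = sum(math.sqrt(a * b) for a, b in zip(p, q))  # Bhattacharyya coefficient
    return 1.0 - math.sqrt(max(0.0, 1.0 - bc))       # max() guards rounding error


def dyads(user_topics: dict[str, list[float]]) -> list[tuple[str, str, float]]:
    """One row per unordered user pair: the input to a pairwise link-formation model."""
    return [
        (u, v, hellinger_similarity(user_topics[u], user_topics[v]))
        for u, v in combinations(sorted(user_topics), 2)
    ]


# Hypothetical per-user topic mixtures inferred from biographies or short messages.
user_topics = {"ann": [0.7, 0.2, 0.1], "bob": [0.6, 0.3, 0.1], "cat": [0.0, 0.1, 0.9]}
for row in dyads(user_topics):
    print(row)
```

Identical distributions score 1 and distributions with no topic in common score 0, which keeps the measure on a fixed scale across user pairs.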

Toward a Better Measure of Business Proximity: Topic Modeling for Industry Intelligence (MISQ 2016)

Shi, Zhan, Gene Moo Lee, Andrew B. Whinston (2016) Toward a Better Measure of Business Proximity: Topic Modeling for Industry Intelligence, MIS Quarterly 40(4), pp. 1035-1056.

In this article, we propose a new data-analytic approach to measure firms’ dyadic business proximity. Specifically, our method analyzes the unstructured texts that describe firms’ businesses using the statistical learning technique of topic modeling, and constructs a novel business proximity measure based on the output. Compared with existing methods, our approach scales to large datasets and provides finer granularity in quantifying firms’ positions in the product, market, and technology spaces. We then validate our business proximity measure in the context of industry intelligence and show the measure’s effectiveness in an empirical application analyzing mergers and acquisitions in the U.S. high technology industry. Based on this research, we also build a cloud-based information system to facilitate competitive intelligence on the high technology industry.
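Once each firm's business description is represented as a vector of topic weights, a dyadic proximity score can be computed for every firm pair and used to rank a firm's closest peers. The sketch below uses cosine similarity, a common choice for comparing such vectors; it is an illustrative assumption, not necessarily the article's exact formula, and the firm names and topic weights are hypothetical.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two topic-weight vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def nearest_firms(firm_topics: dict[str, list[float]], firm: str, k: int = 3):
    """Rank other firms by the proximity of their business-description topic vectors."""
    scores = [
        (other, cosine(firm_topics[firm], vec))
        for other, vec in firm_topics.items()
        if other != firm
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


# Hypothetical topic weights for three firms.
firm_topics = {
    "FirmA": [0.7, 0.2, 0.1],
    "FirmB": [0.6, 0.3, 0.1],
    "FirmC": [0.1, 0.1, 0.8],
}
print([name for name, _ in nearest_firms(firm_topics, "FirmA", k=2)])  # ['FirmB', 'FirmC']
```

Cosine similarity ignores vector length, so two firms with the same topic mix score as close even if their descriptions differ greatly in size.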

The Spillover Effects of User-Generated Online Product Reviews on Purchases: Evidence from Clickstream Data (ICIS 2016)

Kwark, Y., Lee, G. M., Pavlou, P. A., Qiu, L. (2016). The Spillover Effects of User-Generated Online Product Reviews on Purchases: Evidence from Clickstream Data. Proceedings of International Conference on Information Systems (ICIS 2016), Dublin, Ireland.

We analyze the spillover effect of online product reviews on purchases using clickstream data from a large retailer by investigating (a) whether the products are complementary or substitutive, (b) whether the products are from the same or a different brand, and (c) which media channel (mobile or PC) is used. To identify complementary and substitutive products, we use a text-mining approach of topic modeling on product descriptions to quantify the functional similarity of product pairs. Our empirical analysis shows that the mean rating of online reviews of substitutive products has a negative effect on purchasing, while the mean rating of complementary products has a positive effect. We also find that the negative spillover effect among substitutive products is significantly greater across different brands than within the same brand, and greater for consumers who used mobile devices than for those who used traditional PCs. Our study has implications for leveraging the spillover effect of online product reviews on substitutive and complementary products.

Strategic Network Formation in a Location-Based Social Network: A Topic Modeling Approach (HICSS 2016)

Lee, G. M., Qiu, L., Whinston, A. B. (2016). Strategic Network Formation in a Location-Based Social Network: A Topic Modeling Approach. Proceedings of Hawaii International Conference on System Sciences (HICSS 2016), Kauai, Hawaii. Nominated for Best Paper Award.

This paper studies strategic network formation in a location-based social network. We build a structural model of social link creation that incorporates individual characteristics and pairwise user similarities. Specifically, we define four user proximity measures from biography, geography, mobility, and short messages. To construct proximity from unstructured text information, we build topic models using latent Dirichlet allocation. Using Gowalla data with 385,306 users, three million locations, and 35 million check-in records, we empirically estimate the structural model and find evidence of a homophily effect in network formation.