Tag Archives: big data

IT Risk and Stock Price Crashes (Working Paper)

Song, Victor, Hasan Cavusoglu, Gene Moo Lee, Mary L. Z. Ma (2021) “IT Risk and Stock Price Crashes,” Under Review. [HICSS version]

As firms increasingly depend on Information Technology (IT) in their business strategies and value creation activities, risks associated with IT have become one of the top concerns for managers and investors. This study examines the relation between IT-related risk factor information in Item 1A of the 10-K annual reports and a firm’s stock price crash risk, a firm-specific propensity to stock price crashes. Using the text-mining approach of Latent Dirichlet Allocation topic modeling to identify IT-related risk factors, we find that IT risk emerges as one of the firms’ key risk factors and that IT risk is positively associated with a firm’s future stock price crash risk. We further separate IT risk factors into cybersecurity risk that potentially leads to a loss or leak of data, and IT value risk that relates to a firm’s reliance on IT for its competitive advantage and value creation activities. We find that cybersecurity risk continues to affect crash risk, but IT value risk does not, consistent with their different risk natures. We also find that the readability, novelty, and the order of appearance of the IT risk factor information, specifically cybersecurity risk, in Item 1A enhance the information content of risk factors and strengthen their relation with stock price crash risk.

Enhancing Social Media Analysis with Visual Data Analytics: A Deep Learning Approach (MISQ 2020)

Shin, Donghyuk, Shu He, Gene Moo Lee, Andrew B. Whinston, Suleyman Cetintas, Kuang-Chih Lee (2020) “Enhancing Social Media Analysis with Visual Data Analytics: A Deep Learning Approach,” MIS Quarterly, 44(4), pp. 1459-1492. [SSRN]

  • Based on an industry collaboration with Yahoo! Research
  • The first MISQ methods article based on machine learning
  • Presented in WeB (Fort Worth, TX 2015), WITS (Dallas, TX 2015), UT Arlington (2016), Texas FreshAIR (San Antonio, TX 2016), SKKU (2016), Korea Univ. (2016), Hanyang (2016), Kyung Hee (2016), Chung-Ang (2016), Yonsei (2016), Seoul National Univ. (2016), Kyungpook National Univ. (2016), UKC (Dallas, TX 2016), UBC (2016), INFORMS CIST (Nashville, TN 2016), DSI (Austin, TX 2016), Univ. of North Texas (2017), Arizona State (2018), Simon Fraser (2019), Saarland (2021), Kyung Hee (2021), Tennessee Chattanooga (2021), Rochester (2021), KAIST (2021)

This research methods article proposes a visual data analytics framework to enhance social media research using deep learning models. Drawing on the literature of information systems and marketing, complemented with data-driven methods, we propose a number of visual and textual content features including complexity, similarity, and consistency measures that can play important roles in the persuasiveness of social media content. We then employ state-of-the-art machine learning approaches such as deep learning and text mining to operationalize these new content features in a scalable and systematic manner. For the newly developed features, we validate them against human coders on Amazon Mechanical Turk. Furthermore, we conduct two case studies with a large social media dataset from Tumblr to show the effectiveness of the proposed content features. The first case study demonstrates that both theoretically motivated and data-driven features significantly improve the model’s power to predict the popularity of a post, and the second one highlights the relationships between content features and consumer evaluations of the corresponding posts. The proposed research framework illustrates how deep learning methods can enhance the analysis of unstructured visual and textual data for social media research.

Understanding Security Vulnerability Awareness, Firm Incentives, and ICT Development in Pan-Asia (JMIS 2020)

Zhuang, Yunhui, Yunsik Choi, Shu He, Alvin Chung Man Leung, Gene Moo Lee, Andrew B. Whinston (2020) “Understanding Security Vulnerability Awareness, Firm Incentives, and ICT Development in Pan-Asia“. Journal of Management Information Systems, 37(3): 668-693.

This paper investigates how the awareness of a security vulnerability index affects firms’ security protection strategy and how the information awareness effect interacts with firm incentives and country-wide IT development level. The security index is constructed based on outgoing spams and phishing website hosting, which may serve as an indicator of a firm’s security controls. To study whether security vulnerability awareness causes firms to improve their security, we conducted a randomized field experiment on 1,262 firms in six Pan-Asian countries and regions. Among 631 randomly selected treated firms, we alerted them of their security vulnerability index and their relative rankings compared to their peers via advisory emails and websites. Difference-in-differences analyses show that compared with the controls, the treated firms improve their security over time, with a statistically significant reduction of outgoing spam volume according to one of the data sources but not phishing website hosting. However, a statistically significant reduction in phishing website hosting was observed among non-web hosting firms, suggesting that firms’ underlying incentives play an important role in the treatment effect. Lastly, exploiting the multi-country nature of the data, we found that firms in countries with high information and communications technology (ICT) development are more responsive to our intervention because they have higher IT capabilities and more resources to resolve security issues. Our study provides cybersecurity policymakers with useful insights on how firm incentives and ICT environments play roles in firms’ security measure adoption.

Development of Topic Trend Analysis Model for Industrial Intelligence using Public Data (J. Technology Innovation 2018)

Park, S., Lee, G. M., Kim, Y.-E., Seo, J. (2018). Development of Topic Trend Analysis Model for Industrial Intelligence using Public Data (in Korean)Journal of Technology Innovation, 26(4), 199-232.

  • Funded by the Korea Institute of Science and Technology Information (KISTI)
  • Demo website: https://misr.sauder.ubc.ca/edgar_dashboard/
  • Presented at UKC (2017), KISTI (2017), WITS (2017), Rutgers Business School (2018)

There are increasing needs for understanding and fathoming of the business management environment through big data analysis at the industrial and corporative level. The research using the company disclosure information, which is comprehensively covering the business performance and the future plan of the company, is getting attention. However, there is limited research on developing applicable analytical models leveraging such corporate disclosure data due to its unstructured nature. This study proposes a text-mining-based analytical model for industrial and firm-level analyses using publicly available company disclosure data. Specifically, we apply LDA topic model and word2vec word embedding model on the U.S. SEC data from the publicly listed firms and analyze the trends of business topics at the industrial and corporate levels.

Using LDA topic modeling based on SEC EDGAR 10-K document, whole industrial management topics are figured out. For comparison of different pattern of industries’ topic trend, software and hardware industries are compared in recent 20 years. Also, the changes in management subject at the firm level are observed with a comparison of two companies in the software industry. The changes in topic trends provide a lens for identifying decreasing and growing management subjects at industrial and firm-level. Mapping companies and products(or services) based on dimension reduction after using word2vec word embedding model and principal component analysis of 10-K document at the firm level in the software industry, companies and products(services) that have similar management subjects are identified and also their changes in decades.

For suggesting a methodology to develop an analytical model based on public management data at the industrial and corporate level, there may be contributions in terms of making the ground of practical methodology to identifying changes of management subjects. However, there are required further researches to provide a microscopic analytical model with regard to the relation of technology management strategy between management performance in case of related to the various pattern of management topics as of frequent changes of management subject or their momentum. Also, more studies are needed for developing competitive context analysis model with product(service)-portfolios between firms.

Developing Cyber Risk Assessment Framework for Cyber Insurance: A Big Data Approach (KIRI Research Report 2018)

Lee, G. M. (2018). Developing Cyber Risk Assessment Framework for Cyber Insurance: A Big Data Approach (in Korean)KIRI Research Report 2018-15.

As our society is heavily dependent on information and communication technology, the associated risk has also significantly increased. Cyber insurance has been emerged as a possible means to better manage such cyber risk. However, the cyber insurance market is still in a premature stage due to the lack of data sharing and standards on cyber risk and cyber insurance. To address this issue, this research proposes a data-driven framework to assess cyber risk using externally observable cyber attack data sources such as outbound spam and phishing websites. We show that the feasibility of such an approach by building cyber risk assessment reports for Korean organizations. Then, by conducting a large-scale randomized field experiment, we measure the causal effect of cyber risk disclosure on organizational security levels. Finally, we develop machine-learning models to predict data breach incidents, as a case of cyber incidents, using the developed cyber risk assessment data. We believe that the proposed data-driven methods can be a stepping-stone to enable information transparency in the cyber insurance market.

Predicting Litigation Risk via Machine Learning

Lee, Gene Moo*, James Naughton*, Xin Zheng*, Dexin Zhou* (2020) “Predicting Litigation Risk via Machine Learning,” Working Paper. [SSRN] (* equal contribution)

This study examines whether and how machine learning techniques can improve the prediction of litigation risk relative to the traditional logistic regression model. Existing litigation literature has no consensus on a predictive model. Additionally, the evaluation of litigation model performance is ad hoc. We use five popular machine learning techniques to predict litigation risk and benchmark their performance against the logistic regression model in Kim and Skinner (2012). Our results show that machine learning techniques can significantly improve the predictability of litigation risk. We identify two best-performing methods (random forest and convolutional neural networks) and rank the importance of predictors. Additionally, we show that models using economically-motivated ratio variables perform better than models using raw variables. Overall, our results suggest that the joint consideration of economically-meaningful predictors and machine learning techniques maximize the improvement of predictive litigation models.

A Friend Like Me: Modeling Network Formation in a Location-Based Social Network (JMIS 2016)

Lee, Gene Moo*, Liangfei Qiu*, Andrew B. Whinston* (2016) A Friend Like Me: Modeling Network Formation in a Location-Based Social Network, Journal of Management Information Systems 33(4), pp. 1008-1033. (* equal contribution)

  • Best Paper Nomination at HICSS 2016
  • Presented in WITS (Auckland, New Zealand 2014), and WISE (Auckland, New Zeland 2014), HICSS (Kauai, HI 2016)
  • Dissertation Paper #2

This article studies the strategic network formation in a location-based social network. We build an empirical model of social link creation that incorporates individual characteristics and pairwise user similarities. Specifically, we define four user proximity measures from biography, geography, mobility, and short messages. To construct proximity from unstructured text information, we build topic models using Latent Dirichlet Allocation. Using Gowalla data with 385,306 users, 3 million locations, and 35 million check-in records, we empirically estimate the model to find evidence on the homophily effect on network formation. To cope with possible endogeneity issues, we use exogenous weather shocks as our instrumental variables and find the empirical results are robust: network formation decisions are significantly affected by our proximity measures.

How would information disclosure influence organizations’ outbound spam volume? Evidence from a field experiment (J. Cybersecurity 2016)

He, Shu*, Gene Moo Lee*, Sukjin Han, Andrew B. Whinston (2016) How Would Information Disclosure Influence Organizations’ Outbound Spam Volume? Evidence from a Field ExperimentJournal of Cybersecurity 2(1), pp. 99-118. (* equal contribution)

Cyber-insecurity is a serious threat in the digital world. In the present paper, we argue that a suboptimal cybersecurity environment is partly due to organizations’ underinvestment on security and a lack of suitable policies. The motivation for this paper stems from a related policy question: how to design policies for governments and other organizations that can ensure a sufficient level of cybersecurity. We address the question by exploring a policy devised to alleviate information asymmetry and to achieve transparency in cybersecurity information sharing practice. We propose a cybersecurity evaluation agency along with regulations on information disclosure. To empirically evaluate the effectiveness of such an institution, we conduct a large-scale randomized field experiment on 7919 US organizations. Specifically, we generate organizations’ security reports based on their outbound spam relative to the industry peers, then share the reports with the subjects in either private or public ways. Using models for heterogeneous treatment effects and machine learning techniques, we find evidence from this experiment that the security information sharing combined with publicity treatment has significant effects on spam reduction for original large spammers. Moreover, significant peer effects are observed among industry peers after the experiment.

Toward a Better Measure of Business Proximity: Topic Modeling for Industry Intelligence (MISQ 2016)

Shi, Zhan, Gene Moo Lee, Andrew B. Whinston (2016) Toward a Better Measure of Business Proximity: Topic Modeling for Industry IntelligenceMIS Quarterly 40(4), pp. 1035-1056.

In this article, we propose a new data-analytic approach to measure firms’ dyadic business proximity. Specifically, our method analyzes the unstructured texts that describe firms’ businesses using the statistical learning technique of topic modeling, and constructs a novel business proximity measure based on the output. When compared with existent methods, our approach is scalable for large datasets and provides finer granularity on quantifying firms’ positions in the spaces of product, market, and technology. We then validate our business proximity measure in the context of industry intelligence and show the measure’s effectiveness in an empirical application of analyzing mergers and acquisitions in the U.S. high technology industry. Based on the research, we also build a cloud-based information system to facilitate competitive intelligence on the high technology industry.

Strategic Network Formation in a Location-Based Social Network: A Topic Modeling Approach (HICSS 2016)

Lee, G. M., Qiu, L., Whinston, A. B. (2016). Strategic Network Formation in a Location-Based Social Network: A Topic Modeling ApproachProceedings of Hawaii International Conference on System Sciences (HICSS 2016), Kauai, Hawaii. Nominated for Best Paper Award

This paper studies strategic network formation in a location-based social network. We build a structural model of social link creation that incorporates individual characteristics and pairwise user similarities. Specifically, we define four user proximity measures from biography, geography, mobility, and short messages. To construct proximity from unstructured text information, we build topic models using latent Dirichlet allocation. Using Gowalla data with 385,306 users, three million locations, and 35 million check-in records, we empirically estimate the structural model to find evidence on the homophily effect in network formation.