
IS Papers on Big Data, Analytics, and AI

First published: Feb 25, 2020, Last update: Dec 4, 2025.

My research involves big data analytics and AI in the Information Systems literature. This post keeps track of editorial and seminal articles on Big Data, Data Science, Analytics, and AI in the Information Systems and Management literature. The papers are listed in chronological order:

  1. Bapna, Goes, Gopal, Marsden (2006) Moving from Data-Constrained to Data-Enabled Research: Experiences and Challenges in Collecting, Validating and Analyzing Large-Scale e-Commerce Data, Statistical Science 21(2): 116-130.
  2. Shmueli, Koppius (2011) Predictive Analytics in Information Systems Research, MIS Quarterly 35(3): 553-572.
  3. Chen, Chiang, Storey (2012) Business Intelligence and Analytics: From Big Data to Big Impact, MIS Quarterly 36(4): 1165-1188.
  4. Lin, Lucas Jr., Shmueli (2013) Research Commentary: Too Big to Fail: Large Samples and the p-Value Problem, Information Systems Research 24(4): 906-917.
  5. Agarwal, Dhar (2014) Editorial – Big Data, Data Science, and Analytics: The Opportunity and Challenge for IS Research, Information Systems Research 25(3): 443-448.
  6. Varian (2014) Big Data: New Tricks for Econometrics, Journal of Economic Perspectives 28(2): 3-28.
  7. Goes (2014) Editor’s Comments: Big Data and IS Research, MIS Quarterly 38(3): iii-viii.
  8. AMJ Editors (2016) From the Editors: Big Data and Data Science Methods for Management Research, Academy of Management Journal 59(5): 1493-1507.
  9. Abbasi, Sarker, Chiang (2016) Big Data Research in Information Systems: Toward an Inclusive Research Agenda, Journal of the Association for Information Systems 17(2): i-xxxii.
  10. Rai (2016) Editor’s Comments: Synergies Between Big Data and Theory, MIS Quarterly 40(2): iii-ix.
  11. Baesens, Bapna, Marsden, Vanthienen, Zhao (2016) Transformational Issues of Big Data and Analytics in Networked Business, MIS Quarterly 40(4): 807-818.
  12. Athey (2017) Beyond Prediction: Using Big Data for Policy Problems, Science 355(6324): 483-485.
  13. Chiang, Grover, Liang, Zhang (2018) Special Issue: Strategic Value of Big Data and Business Analytics, Journal of Management Information Systems 35(2): 383-387.
  14. Delen, Ram (2018) Research Challenges and Opportunities in Business Analytics, Journal of Business Analytics 1(1): 2-12.
  15. Maass, Parsons, Purao, Storey, Woo (2018) Data-Driven Meets Theory-Driven Research in the Era of Big Data: Opportunities and Challenges for Information Systems Research, Journal of the Association for Information Systems 19(12): 1253-1273.
  16. Yang, Adomavicius, Burtch, Ren (2018) Mind the Gap: Accounting for Measurement Error and Misclassification in Variables Generated via Data Mining, Information Systems Research 29(1): 4-24.
  17. Berente, Seidel, Safadi (2019) Research Commentary: Data-Driven Computationally Intensive Theory Development, Information Systems Research 30(1): 50-64.
  18. Johnson, Gray, Sarker (2019) Revisiting IS Research Practice in the Era of Big Data, Information and Organization 29(1): 41-56.
  19. Grover, Lindberg, Benbasat, Lyytinen (2020) The Perils and Promises of Big Data Research in Information Systems, Journal of the Association for Information Systems 21(2): 268-291.
  20. Shmueli (2021) INFORMS Journal on Data Science (IJDS) Editorial #1: What is an IJDS paper?, INFORMS Journal on Data Science.
  21. Ram, Goes (2021) Focusing on Programmatic High Impact Information Systems Research, not Theory, to Address Grand Challenges, MIS Quarterly 45(1): 479-483.
  22. Burton-Jones, Boh, Oborn, Padmanabhan (2021) Editor’s Comments: Advancing Research Transparency at MIS Quarterly: A Pluralistic Approach, MIS Quarterly 45(2): iii-xviii.
  23. Berente, Gu, Recker, Santhanam (2021) Special Issue Editor’s Comments: Managing Artificial Intelligence, MIS Quarterly 45(3): 1433-1450.
  24. Jain, Padmanabhan, Pavlou, Raghu (2021) Editorial for the Special Section on Humans, Algorithms, and Augmented Intelligence: The Future of Work, Organizations, and Society, Information Systems Research 32(3): 675-687.
  25. Padmanabhan, Fang, Sahoo, Burton-Jones (2022) Editor’s Comments: Machine Learning in Information Systems Research, MIS Quarterly 46(1): iii-xix.
  26. Abbasi, Parsons, Pant, Sheng, Sarker (2024) Pathways for Design Research on Artificial Intelligence, Information Systems Research 35(2): 441-459.
  27. Gopal, Li, Riemer, Sarker, Singh, Susarla, Bichler, Thatcher (2025) Inventing with Machines: Generative AI and the Evolving Landscape of IS Research, Information Systems Research.
  28. Caro, Colliard, Katok, Ockenfels, Stier-Moses, Tucker, Wu (2025) Introduction to the Special Issue on the Human-Algorithm Connection, Management Science. https://doi.org/10.1287/mnsc.2023.intro.v72.n1

 

When Does Congruence Matter for Pre-roll Video Ads? The Effect of Multimodal, Ad-Content Congruence on the Ad Completion

Park, Sungho, Gene Moo Lee, Donghyuk Shin, Sang-Pil Han. “When Does Congruence Matter for Pre-roll Video Ads? The Effect of Multimodal, Ad-Content Congruence on the Ad Completion,” Working Paper. [Last update: Jan 29, 2023]

  • Previous title: Targeting Pre-Roll Ads using Video Analytics
  • Funded by Sauder Exploratory Research Grant 2020
  • Presented at Southern Methodist University (2020), University of Washington (2020), INFORMS (2020), AIMLBA (2020), WITS (2020), HKUST (2021), Maryland (2021), American University (2021), National University of Singapore (2021), Arizona (2022), George Mason (2022), KAIST (2022), Hanyang (2022), Kyung Hee (2022), McGill (2022)
  • Research assistants: Raymond Situ, Miguel Valarao

Pre-roll video ads are gaining industry traction because audiences may be willing to watch an ad for a few seconds, if not the entire ad, before the desired content video is shown. However, the popular skippable format, which lets viewers skip an ad after a few seconds, creates opportunity costs for advertisers and online video platforms whenever the ad is skipped. Against this backdrop, we employ a video analytics framework to extract multimodal features from ad and content videos, including auditory signals and thematic visual information, and probe the effect of ad-content congruence in each modality using a random matching experiment conducted by a major video advertising platform. The present study challenges the widely held view that ads that match the content are more likely to be viewed than those that do not, and investigates the conditions under which congruence does or does not work. Our results indicate that non-thematic auditory signal congruence between the ad and content is essential in explaining viewers’ ad completion, while thematic visual congruence is effective only if the viewer has sufficient attentional and cognitive capacity to recognize it. The findings suggest that viewers need more cognitive processing power to perceive congruence in thematic visual content than in auditory signals, which can lead to decreased ad viewing. Overall, these findings have significant theoretical and practical implications for understanding whether and when viewers construct congruence in the context of pre-roll video ads and how advertisers might target their pre-roll video ads successfully.
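The abstract does not say how congruence is computed. As a minimal sketch, assuming each modality of a video (for example, an audio embedding and a visual-theme embedding) has already been summarized as a numeric feature vector, one common way to score ad-content congruence is the cosine similarity between the ad's vector and the content's vector, computed separately for each modality. The helper names and toy vectors below are hypothetical and are not the paper's pipeline.

```python
import math
from typing import Dict, List

Vector = List[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two feature vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero feature vector as carrying no signal
    return dot / (norm_a * norm_b)


def congruence_by_modality(
    ad: Dict[str, Vector], content: Dict[str, Vector]
) -> Dict[str, float]:
    """Hypothetical helper: one congruence score per modality present in both videos."""
    return {m: cosine_similarity(ad[m], content[m]) for m in ad if m in content}


# Toy feature vectors (hypothetical values for illustration only).
ad_features = {"audio": [0.9, 0.1, 0.0], "visual_theme": [0.0, 1.0, 0.0]}
content_features = {"audio": [0.8, 0.2, 0.0], "visual_theme": [1.0, 0.0, 0.0]}
scores = congruence_by_modality(ad_features, content_features)
print(scores)
```

Keeping one score per modality, rather than a single pooled similarity, mirrors the abstract's point that auditory and visual congruence can relate to ad completion differently.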

Price Competition and Active or Inactive Consumer Search

Koh, Yumi, Gea M. Lee, Gene Moo Lee (2023) “Price Competition and Active or Inactive Consumer Search”. Working Paper. [Latest version: May 31, 2023] [SSRN]

We propose a price-competition model in which prices are dispersed and a fraction of consumers decide whether to make an immediate purchase without actively searching for prices or to search sequentially. We use an incomplete-information setting with heterogeneous production costs and information frictions: firms’ production cost types are drawn from an interval and are privately observed. The model admits either active or inactive consumer search as an equilibrium outcome and allows a competition-induced switch between the two outcomes. We study how firms and consumers interact in setting prices and choosing active or inactive search as competition intensifies with more firms.
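The abstract does not give the model's equations. For readers new to sequential search, the textbook building block (not necessarily this paper's specification) is a reservation price r: a consumer with per-search cost c buys as soon as she sees a price at or below r, where r equates the search cost with the expected saving from one more price draw, c = E[max(r - p, 0)]. Assuming prices are uniform on [lo, hi], this gives r = lo + sqrt(2c(hi - lo)) whenever that value does not exceed hi. The sketch below implements this under those assumptions; the function names and numbers are hypothetical.

```python
import math


def expected_saving(r: float, lo: float, hi: float) -> float:
    """E[max(r - p, 0)] for a price p drawn uniformly from [lo, hi]."""
    if hi <= lo:
        raise ValueError("need hi > lo")
    if r <= lo:
        return 0.0
    if r >= hi:
        return r - (lo + hi) / 2  # every possible price is below r
    return (r - lo) ** 2 / (2 * (hi - lo))


def reservation_price(c: float, lo: float, hi: float) -> float:
    """Price r at which one more search is worth exactly its cost c (uniform prices)."""
    if hi <= lo:
        raise ValueError("need hi > lo")
    if c <= (hi - lo) / 2:
        # Interior case: solve (r - lo)^2 / (2 (hi - lo)) = c for r.
        return lo + math.sqrt(2 * c * (hi - lo))
    # Search is so costly that r exceeds every price: solve r - E[p] = c.
    return (lo + hi) / 2 + c


# Hypothetical numbers: prices uniform on [10, 20], each extra price quote costs 0.5.
r = reservation_price(0.5, 10.0, 20.0)
print(f"reservation price: {r:.2f}")  # reservation price: 13.16
```

In this textbook setup, a consumer whose first observed price is already below r stops immediately, while one who sees a higher price keeps searching; the paper's equilibrium analysis of when search is active or inactive builds on richer assumptions than this sketch.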

Computational Framework for Measuring Strategic Opportunities Based on Structural Hole Theory (JMIS 2026)

Lee, Myunghwan, Gene Moo Lee, Hasan Cavusoglu, Marc-David L. Seidel (2026) “Computational Framework for Measuring Strategic Opportunities Based on Structural Hole Theory,” Journal of Management Information Systems, Forthcoming.

Although opportunities play a central role in firm innovation and performance, prior research lacks a scalable, theory-grounded approach to measuring them. Existing measures are either context-specific or detached from explicit relational mechanisms, limiting their generalizability and interpretability. To address this gap, we propose a structural hole theory-guided computational design framework that enables fine-grained strategic opportunity measures: hole-opening, hole-entering, and non-hole positions. We demonstrate the effectiveness of this framework through a systematic analysis of IPO outcomes using panel data on U.S. public firms. We find that hole-opening positions are associated with higher post-IPO valuations but a lower likelihood of M&A exits, whereas hole-entering and non-hole positions are linked to lower IPO valuations but higher probabilities of M&A outcomes. These patterns highlight distinct opportunity roles embedded in firms’ structural positions. We conclude the paper by discussing the broad applicability of the theory-guided computational framework for opportunity measurement in various IS research contexts.
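The paper's hole-opening and hole-entering measures are its own contribution and are not specified in the abstract. For orientation only, the sketch below computes Burt's classic network constraint, a standard structural-hole metric in which lower values indicate that a node's contacts are less connected to one another (more brokerage opportunity). It assumes an undirected, unweighted network stored as an adjacency dictionary; the graphs and function names are hypothetical.

```python
from typing import Dict, Set

Graph = Dict[str, Set[str]]


def proportional_tie(g: Graph, i: str, j: str) -> float:
    """p_ij: share of i's (equally weighted) ties invested in j."""
    return 1.0 / len(g[i]) if j in g[i] else 0.0


def burt_constraint(g: Graph, i: str) -> float:
    """Burt's constraint C_i = sum over contacts j of (p_ij + sum_q p_iq * p_qj)^2."""
    total = 0.0
    for j in g[i]:
        # Indirect investment in j through i's other contacts q.
        indirect = sum(
            proportional_tie(g, i, q) * proportional_tie(g, q, j)
            for q in g[i]
            if q != j
        )
        total += (proportional_tie(g, i, j) + indirect) ** 2
    return total


# A broker spanning three disconnected contacts vs. a member of a closed triangle.
star = {"hub": {"a", "b", "c"}, "a": {"hub"}, "b": {"hub"}, "c": {"hub"}}
triangle = {"x": {"y", "z"}, "y": {"x", "z"}, "z": {"x", "y"}}
print(round(burt_constraint(star, "hub"), 3))   # 0.333
print(round(burt_constraint(triangle, "x"), 3))  # 1.125
```

The hub of the star is far less constrained than a member of the triangle because its contacts share no ties, which is the intuition behind treating structural holes as sources of opportunity.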

IT Risk and Stock Price Crash Risk

Song, Victor, Hasan Cavusoglu, Jaecheol Park, Mary L. Z. Ma, Gene Moo Lee (2026) “IT Risk and Stock Price Crash Risk,” Under review.

This study examines whether and how firm-level information technology (IT) risk contributes to stock price crash risk. We construct a novel measure of ex-ante IT risk from risk factor disclosures in Item 1A of firms’ 10-K filings using advanced machine learning approaches. We find that higher IT risk is associated with greater stock price crash risk. Mechanism analyses indicate that this effect operates primarily through increased downside operating risk, rather than through heightened exposure to data breach events. We further document heterogeneity in the relationship between IT risk and stock price crash risk: (1) cybersecurity risk has a stronger effect than noncybersecurity IT risk; (2) the effect is stronger for newly disclosed IT risk factors; and (3) higher readability amplifies the crash risk effect. Together, these findings highlight IT risk as a previously underexplored determinant of stock price crash risk and offer new insights into the capital market consequences of firms’ IT-related disclosures.
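The abstract does not define how crash risk is measured. A widely used measure in this literature is the negative coefficient of skewness (NCSKEW) of a firm's de-meaned firm-specific returns over a period (Chen, Hong, and Stein 2001): NCSKEW = -[n(n-1)^(3/2) ΣW^3] / [(n-1)(n-2)(ΣW^2)^(3/2)], where higher values indicate a more left-skewed return distribution and hence higher crash risk. The sketch below assumes the firm-specific returns have already been computed (typically from market-model residuals); it is illustrative and not necessarily the paper's exact specification.

```python
from typing import Sequence


def ncskew(returns: Sequence[float]) -> float:
    """Negative coefficient of skewness of firm-specific returns (higher = more crash-prone)."""
    n = len(returns)
    if n < 3:
        raise ValueError("need at least three return observations")
    mean = sum(returns) / n
    w = [r - mean for r in returns]  # de-mean the return series
    sum_sq = sum(x ** 2 for x in w)
    if sum_sq == 0:
        return 0.0  # constant returns: skewness undefined, treat as no asymmetry
    sum_cube = sum(x ** 3 for x in w)
    numerator = n * (n - 1) ** 1.5 * sum_cube
    denominator = (n - 1) * (n - 2) * sum_sq ** 1.5
    return -numerator / denominator


# Hypothetical weekly firm-specific returns with one large negative week.
weekly = [0.01, 0.012, -0.005, 0.008, -0.15, 0.011, 0.004, 0.009]
print(round(ncskew(weekly), 3))
```

A single sharp drop among otherwise small positive returns produces a positive NCSKEW, which is the left-tail asymmetry that crash-risk studies try to capture.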

Enhancing Social Media Analysis with Visual Data Analytics: A Deep Learning Approach (MISQ 2020)

Shin, Donghyuk, Shu He, Gene Moo Lee, Andrew B. Whinston, Suleyman Cetintas, Kuang-Chih Lee (2020) Enhancing Social Media Analysis with Visual Data Analytics: A Deep Learning Approach, MIS Quarterly, 44(4), pp. 1459-1492. [SSRN]

  • Based on an industry collaboration with Yahoo! Research
  • The first MISQ methods article based on machine learning
  • Presented at WeB (Fort Worth, TX 2015), WITS (Dallas, TX 2015), UT Arlington (2016), Texas FreshAIR (San Antonio, TX 2016), SKKU (2016), Korea Univ. (2016), Hanyang (2016), Kyung Hee (2016), Chung-Ang (2016), Yonsei (2016), Seoul National Univ. (2016), Kyungpook National Univ. (2016), UKC (Dallas, TX 2016), UBC (2016), INFORMS CIST (Nashville, TN 2016), DSI (Austin, TX 2016), Univ. of North Texas (2017), Arizona State (2018), Simon Fraser (2019), Saarland (2021), Kyung Hee (2021), Tennessee Chattanooga (2021), Rochester (2021), KAIST (2021), Yonsei (2021), UBC (2022), Temple (2023)

This research methods article proposes a visual data analytics framework to enhance social media research using deep learning models. Drawing on the information systems and marketing literature, complemented with data-driven methods, we propose a number of visual and textual content features, including complexity, similarity, and consistency measures, that can play important roles in the persuasiveness of social media content. We then employ state-of-the-art machine learning approaches such as deep learning and text mining to operationalize these new content features in a scalable and systematic manner. We validate the newly developed features against human coders on Amazon Mechanical Turk. Furthermore, we conduct two case studies with a large social media dataset from Tumblr to show the effectiveness of the proposed content features. The first case study demonstrates that both theoretically motivated and data-driven features significantly improve the model’s power to predict the popularity of a post, and the second highlights the relationships between content features and consumer evaluations of the corresponding posts. The proposed research framework illustrates how deep learning methods can enhance the analysis of unstructured visual and textual data for social media research.
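The abstract lists complexity among the visual features without giving its formula, and the paper operationalizes its features with deep learning. As a far simpler stand-in, offered only as an assumption, one common proxy for an image's visual complexity is the Shannon entropy of its pixel-intensity histogram: a single-color image scores 0 bits, and an image with many equally frequent intensities scores higher. The sketch takes a grayscale image as a flat list of 0-255 intensities; the function name and toy images are hypothetical.

```python
import math
from collections import Counter
from typing import Sequence


def intensity_entropy(pixels: Sequence[int]) -> float:
    """Shannon entropy (in bits) of a grayscale image's intensity histogram."""
    if not pixels:
        raise ValueError("image has no pixels")
    counts = Counter(pixels)
    total = len(pixels)
    # Sum of p * log2(1/p) over the observed intensity levels.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


flat = [128] * 16                # single-color image
two_tone = [0] * 8 + [255] * 8   # half black, half white
noisy = list(range(0, 256, 16))  # 16 distinct intensities, one pixel each
for name, img in [("flat", flat), ("two_tone", two_tone), ("noisy", noisy)]:
    print(name, intensity_entropy(img))
```

Histogram entropy ignores spatial layout, which is one reason learned representations such as those in the paper can capture complexity more faithfully.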

Understanding Security Vulnerability Awareness, Firm Incentives, and ICT Development in Pan-Asia (JMIS 2020)

Zhuang, Yunhui, Yunsik Choi, Shu He, Alvin Chung Man Leung, Gene Moo Lee, Andrew B. Whinston (2020) Understanding Security Vulnerability Awareness, Firm Incentives, and ICT Development in Pan-Asia. Journal of Management Information Systems, 37(3): 668-693.

This paper investigates how awareness of a security vulnerability index affects firms’ security protection strategies and how this information-awareness effect interacts with firm incentives and a country’s IT development level. The security index is constructed from outgoing spam and phishing website hosting, which may serve as indicators of a firm’s security controls. To study whether security vulnerability awareness causes firms to improve their security, we conducted a randomized field experiment on 1,262 firms in six Pan-Asian countries and regions. We alerted 631 randomly selected treated firms to their security vulnerability index and their rankings relative to their peers via advisory emails and websites. Difference-in-differences analyses show that, compared with the controls, the treated firms improve their security over time, with a statistically significant reduction in outgoing spam volume according to one of the data sources, but no significant reduction in phishing website hosting. However, a statistically significant reduction in phishing website hosting was observed among non-web-hosting firms, suggesting that firms’ underlying incentives play an important role in the treatment effect. Lastly, exploiting the multi-country nature of the data, we found that firms in countries with high information and communications technology (ICT) development are more responsive to our intervention because they have higher IT capabilities and more resources to resolve security issues. Our study provides cybersecurity policymakers with useful insights into how firm incentives and ICT environments shape firms’ adoption of security measures.
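In its simplest two-group, two-period form, the difference-in-differences comparison described above subtracts the control group's before-after change from the treated group's before-after change. The sketch below computes that 2x2 estimate from group means; the spam volumes are made-up illustrations, and the paper's actual panel analysis is richer than this.

```python
from statistics import mean
from typing import Sequence


def did_estimate(
    treated_pre: Sequence[float],
    treated_post: Sequence[float],
    control_pre: Sequence[float],
    control_post: Sequence[float],
) -> float:
    """2x2 difference-in-differences: (treated change) - (control change)."""
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Hypothetical outgoing-spam volumes per firm (illustrative numbers only).
effect = did_estimate(
    treated_pre=[120, 100, 80],
    treated_post=[90, 70, 50],
    control_pre=[110, 100, 90],
    control_post=[105, 95, 85],
)
print(effect)  # negative: treated firms reduced spam more than controls
```

Differencing out the control group's change removes time trends common to both groups, which is what lets the randomized alert be read as the cause of the remaining difference.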

Development of Topic Trend Analysis Model for Industrial Intelligence using Public Data (J. Technology Innovation 2018)

Park, S., Lee, G. M., Kim, Y.-E., Seo, J. (2018). Development of Topic Trend Analysis Model for Industrial Intelligence using Public Data (in Korean), Journal of Technology Innovation, 26(4), 199-232.

  • Funded by the Korea Institute of Science and Technology Information (KISTI)
  • Demo website: https://misr.sauder.ubc.ca/edgar_dashboard/
  • Presented at UKC (2017), KISTI (2017), WITS (2017), Rutgers Business School (2018)

There is a growing need to understand the business management environment through big data analysis at the industry and firm levels. Research using corporate disclosure information, which comprehensively covers a company’s business performance and future plans, is attracting attention. However, there is limited research on developing applicable analytical models that leverage such corporate disclosure data, owing to its unstructured nature. This study proposes a text-mining-based analytical model for industry- and firm-level analyses using publicly available company disclosure data. Specifically, we apply an LDA topic model and a word2vec word embedding model to U.S. SEC filings from publicly listed firms and analyze the trends of business topics at the industry and firm levels.

Using LDA topic modeling on SEC EDGAR 10-K documents, we identify management topics across all industries. To compare the topic-trend patterns of different industries, we compare the software and hardware industries over the past 20 years. We also observe changes in management topics at the firm level by comparing two companies in the software industry. These changes in topic trends provide a lens for identifying declining and growing management topics at the industry and firm levels. Finally, by mapping companies and products (or services) into a reduced space obtained by applying a word2vec word embedding model and principal component analysis to firm-level 10-K documents in the software industry, we identify companies and products (or services) with similar management topics and track how they have changed over the decades.
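Once an LDA model has assigned each 10-K a topic distribution, an industry-level topic trend can be read off by averaging those distributions by filing year. The sketch below assumes the per-document topic proportions are already available (for example, from an LDA library) and uses hypothetical inputs and toy numbers; it is not the study's code.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (filing_year, industry, topic proportions) for each 10-K document (hypothetical inputs).
Doc = Tuple[int, str, List[float]]


def topic_trend(docs: List[Doc], industry: str) -> Dict[int, List[float]]:
    """Average topic proportions per filing year for one industry."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = defaultdict(int)
    for year, ind, theta in docs:
        if ind != industry:
            continue
        if year not in sums:
            sums[year] = [0.0] * len(theta)
        sums[year] = [s + t for s, t in zip(sums[year], theta)]
        counts[year] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sorted(sums)}


docs = [
    (2000, "software", [0.6, 0.4]),
    (2000, "software", [0.4, 0.6]),
    (2010, "software", [0.2, 0.8]),
    (2010, "hardware", [0.9, 0.1]),
]
trend = topic_trend(docs, "software")
print(trend)
```

Running the same aggregation for two industries, or for individual firms instead of industries, yields the kind of side-by-side trend comparison the study describes.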

By proposing a methodology for developing an analytical model based on public management data at the industry and firm levels, this study lays the groundwork for a practical method of identifying changes in management topics. However, further research is required to provide a more microscopic analytical model of the relationship between technology management strategy and management performance, particularly with respect to patterns of management topics such as frequent changes in topics or their momentum. More studies are also needed to develop a competitive-context analysis model based on firms’ product (or service) portfolios.

Developing Cyber Risk Assessment Framework for Cyber Insurance: A Big Data Approach (KIRI Research Report 2018)

Lee, G. M. (2018). Developing Cyber Risk Assessment Framework for Cyber Insurance: A Big Data Approach (in Korean), KIRI Research Report 2018-15.

As our society has become heavily dependent on information and communication technology, the associated risk has also increased significantly. Cyber insurance has emerged as a possible means to better manage such cyber risk. However, the cyber insurance market is still at an early stage due to the lack of data sharing and standards on cyber risk and cyber insurance. To address this issue, this research proposes a data-driven framework to assess cyber risk using externally observable cyber attack data sources such as outbound spam and phishing websites. We show the feasibility of such an approach by building cyber risk assessment reports for Korean organizations. Then, by conducting a large-scale randomized field experiment, we measure the causal effect of cyber risk disclosure on organizational security levels. Finally, we develop machine learning models to predict data breach incidents, as one type of cyber incident, using the developed cyber risk assessment data. We believe that the proposed data-driven methods can be a stepping-stone toward information transparency in the cyber insurance market.
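The report's scoring method is not described here. As a minimal sketch, assuming risk is scored from two externally observed signals per organization (outbound spam volume and number of hosted phishing sites), each signal can be converted to a percentile rank among peer organizations and the two ranks averaged into a 0-1 score suitable for a relative-ranking report. The equal weighting, function names, and data are all hypothetical.

```python
from typing import Dict


def percentile_ranks(values: Dict[str, float]) -> Dict[str, float]:
    """Share of other organizations with a strictly lower value (0 = lowest, 1 = highest)."""
    n = len(values)
    if n < 2:
        return {org: 0.0 for org in values}
    return {
        org: sum(1 for other in values.values() if other < v) / (n - 1)
        for org, v in values.items()
    }


def risk_scores(spam: Dict[str, float], phishing: Dict[str, float]) -> Dict[str, float]:
    """Hypothetical equal-weight average of the spam and phishing percentile ranks."""
    spam_rank = percentile_ranks(spam)
    phish_rank = percentile_ranks(phishing)
    return {org: (spam_rank[org] + phish_rank[org]) / 2 for org in spam}


# Hypothetical observations for three organizations.
spam = {"A": 500, "B": 20, "C": 0}
phishing = {"A": 3, "B": 0, "C": 0}
scores = risk_scores(spam, phishing)
for org, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(org, score)
```

Percentile ranks keep signals on very different scales (thousands of spam messages vs. a handful of phishing sites) comparable, at the cost of discarding absolute magnitudes.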

Predicting Litigation Risk via Machine Learning

Lee, Gene Moo*, James Naughton*, Xin Zheng*, Dexin Zhou* (2020) “Predicting Litigation Risk via Machine Learning,” Working Paper. [SSRN] (* equal contribution)

This study examines whether and how machine learning techniques can improve the prediction of litigation risk relative to the traditional logistic regression model. The existing litigation literature has reached no consensus on a predictive model. Additionally, the evaluation of litigation model performance is ad hoc. We use five popular machine learning techniques to predict litigation risk and benchmark their performance against the logistic regression model in Kim and Skinner (2012). Our results show that machine learning techniques can significantly improve the predictability of litigation risk. We identify two best-performing methods (random forest and convolutional neural networks) and rank the importance of predictors. Additionally, we show that models using economically motivated ratio variables perform better than models using raw variables. Overall, our results suggest that jointly considering economically meaningful predictors and machine learning techniques maximizes the improvement of predictive litigation models.
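Benchmarking several classifiers against logistic regression requires a common performance metric. One standard, threshold-free choice, shown here as an illustration and not necessarily the metric used in the paper, is the area under the ROC curve (AUC): the probability that a randomly chosen litigated firm receives a higher predicted risk than a randomly chosen non-litigated firm. The sketch computes AUC directly from that pairwise definition, counting ties as one half; the scores are hypothetical.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score of a positive > score of a negative), counting ties as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical predicted litigation probabilities from two models on the same six firms.
y = [1, 0, 1, 0, 0, 1]
logit_scores = [0.30, 0.20, 0.25, 0.35, 0.10, 0.40]
forest_scores = [0.70, 0.20, 0.60, 0.30, 0.10, 0.80]
print(roc_auc(y, logit_scores), roc_auc(y, forest_scores))
```

The pairwise loop is quadratic in sample size, which is fine for illustration; for large samples, a rank-based formula gives the same value faster.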