Tag Archives: collaboration

Development of Topic Trend Analysis Model for Industrial Intelligence using Public Data (J. Technology Innovation 2018)

Park, S., Lee, G. M., Kim, Y.-E., Seo, J. (2018). Development of Topic Trend Analysis Model for Industrial Intelligence using Public Data (in Korean). Journal of Technology Innovation, 26(4), 199-232.

  • Funded by the Korea Institute of Science and Technology Information (KISTI)
  • Demo website: https://misr.sauder.ubc.ca/edgar_dashboard/
  • Presented at UKC (2017), KISTI (2017), WITS (2017), Rutgers Business School (2018)

There is a growing need to understand the business management environment through big data analysis at the industry and firm levels. Research using corporate disclosure information, which comprehensively covers a company's business performance and future plans, is drawing attention. However, research on applicable analytical models that leverage such disclosure data remains limited because the data are unstructured. This study proposes a text-mining-based analytical model for industry- and firm-level analyses using publicly available company disclosure data. Specifically, we apply the LDA topic model and the word2vec word embedding model to U.S. SEC filings from publicly listed firms and analyze the trends of business topics at the industry and firm levels.

Using LDA topic modeling on SEC EDGAR 10-K documents, we identify management topics across all industries. To compare how topic trends differ between industries, we contrast the software and hardware industries over the past 20 years. We also observe changes in management subjects at the firm level by comparing two companies in the software industry. The changes in topic trends provide a lens for identifying declining and growing management subjects at the industry and firm levels. Finally, we map companies and products (or services) in the software industry by applying the word2vec word embedding model to firm-level 10-K documents and reducing the dimensions with principal component analysis. The mapping identifies companies and products (services) with similar management subjects and shows how they have changed over the decades.
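The topic-trend step can be illustrated with a minimal Python sketch. It assumes a topic model has already assigned each filing a vector of topic weights, and it simply averages those weights by filing year. The function name `yearly_topic_trend` and the toy numbers are hypothetical and not taken from the paper.

```python
from collections import defaultdict

def yearly_topic_trend(doc_years, doc_topic_weights):
    """Average each topic's weight over all documents filed in the same year."""
    sums = {}
    counts = defaultdict(int)
    for year, weights in zip(doc_years, doc_topic_weights):
        if year not in sums:
            sums[year] = [0.0] * len(weights)
        for k, w in enumerate(weights):
            sums[year][k] += w
        counts[year] += 1
    # Return years in ascending order so the result reads as a time series.
    return {year: [s / counts[year] for s in sums[year]] for year in sorted(sums)}

# Toy, made-up topic mixtures for four hypothetical filings (two topics each).
years = [2000, 2000, 2010, 2010]
weights = [[0.8, 0.2], [0.6, 0.4], [0.3, 0.7], [0.1, 0.9]]
trend = yearly_topic_trend(years, weights)
for year, shares in trend.items():
    print(year, [round(s, 2) for s in shares])
```

In this toy input the first topic's average share falls between the two years while the second rises, which is the kind of declining or growing pattern the trend analysis looks for.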

By suggesting a methodology for developing an analytical model based on public management data at the industry and firm levels, this study contributes a practical basis for identifying changes in management subjects. However, further research is required to provide a finer-grained analytical model of the relationship between technology management strategy and management performance under various patterns of management topics, such as frequent changes in management subjects or their momentum. More studies are also needed to develop a competitive context analysis model that uses product (service) portfolios across firms.

Developing Cyber Risk Assessment Framework for Cyber Insurance: A Big Data Approach (KIRI Research Report 2018)

Lee, G. M. (2018). Developing Cyber Risk Assessment Framework for Cyber Insurance: A Big Data Approach (in Korean). KIRI Research Report 2018-15.

As our society has become heavily dependent on information and communication technology, the associated risk has also increased significantly. Cyber insurance has emerged as a possible means to better manage such cyber risk. However, the cyber insurance market is still immature due to the lack of data sharing and standards on cyber risk and cyber insurance. To address this issue, this research proposes a data-driven framework to assess cyber risk using externally observable cyber attack data sources such as outbound spam and phishing websites. We show the feasibility of such an approach by building cyber risk assessment reports for Korean organizations. Then, by conducting a large-scale randomized field experiment, we measure the causal effect of cyber risk disclosure on organizational security levels. Finally, we develop machine-learning models to predict data breach incidents, as a case of cyber incidents, using the developed cyber risk assessment data. We believe that the proposed data-driven methods can be a stepping-stone to enable information transparency in the cyber insurance market.
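As a rough illustration of how a randomized experiment yields a causal estimate, the Python sketch below computes a difference in means between treated and control groups. It is a generic estimator, not the report's actual analysis. The outcome values are made-up toy numbers and the function name is hypothetical.

```python
import math
from statistics import mean, variance

def difference_in_means(treated, control):
    """Return the estimated average treatment effect and its standard error."""
    effect = mean(treated) - mean(control)
    # Standard error for two independent samples with unequal variances.
    std_error = math.sqrt(
        variance(treated) / len(treated) + variance(control) / len(control)
    )
    return effect, std_error

# Toy, made-up outcomes (e.g. a count of observed abuse incidents per organization).
treated_outcomes = [3, 2, 4, 1, 2]   # hypothetical organizations that received a risk report
control_outcomes = [5, 4, 6, 3, 4]   # hypothetical organizations that did not
effect, std_error = difference_in_means(treated_outcomes, control_outcomes)
print(f"estimated effect: {effect:.2f} (standard error {std_error:.2f})")
```

Because assignment is random, the two groups are comparable on average, so the gap in mean outcomes can be read as the effect of the intervention.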

On the Spillover Effects of Online Product Reviews on Purchases: Evidence from Clickstream Data (ISR 2021)

Kwark, Young*, Gene Moo Lee*, Paul A. Pavlou*, Liangfei Qiu* (2021) On the Spillover Effects of Online Product Reviews on Purchases: Evidence from Clickstream Data. Information Systems Research 32(3): 895-913. (* equal contribution)

  • Data awarded by Wharton Consumer Analytics Initiative
  • Presented at WCBI (Snowbird, UT 2015), KMIS (Busan, Korea 2016), Minnesota (2016), ICIS (Dublin, Ireland 2016), Boston Univ. (2017), HEC Paris (2017), and Korea Univ. (2018)
  • An earlier version was published in ICIS 2016
  • Research assistants: Bolat Khojayev, Raymond Situ

We study the spillover effects of the online reviews of other covisited products on the purchases of a focal product using clickstream data from a large retailer. The proposed spillover effects are moderated by (a) whether the related (covisited) products are complementary or substitutive, (b) the choice of media channel (mobile or personal computer (PC)) used, (c) whether the related products are from the same or a different brand, (d) consumer experience, and (e) the variance of the review ratings. To identify complementary and substitutive products, we develop supervised machine-learning models based on product characteristics, such as product category and brand, and novel text-based similarity measures. We train and validate the machine-learning models using product pair labels from Amazon Mechanical Turk. Our results show that the mean rating of substitutive (complementary) products has a negative (positive) effect on purchasing of the focal product. Interestingly, the magnitude of the spillover effects of the mean ratings of covisited (substitutive and complementary) products is significantly larger than the effects on the focal product, especially for complementary products. The spillover effect of ratings is stronger for consumers who use mobile devices versus PCs. We find the negative effect of the mean ratings of substitutive products across different brands on purchasing of a focal product to be significantly higher than within the same brand. Lastly, the effect of the mean ratings is stronger for less experienced consumers and for ratings with lower variance. We discuss implications on leveraging the spillover effect of the online product reviews of related products to encourage online purchases.
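To make the idea of a text-based similarity feature concrete, here is a minimal Python sketch that scores two product descriptions by cosine similarity over word counts. This is a generic stand-in, not the paper's own similarity measures. The product strings and function names are hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def cosine_similarity(text_a, text_b):
    """Cosine similarity between the bag-of-words vectors of two texts."""
    a, b = Counter(tokenize(text_a)), Counter(tokenize(text_b))
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical product descriptions.
same = cosine_similarity("wireless optical mouse", "wireless optical mouse")
overlap = cosine_similarity("wireless optical mouse", "wireless keyboard")
disjoint = cosine_similarity("wireless optical mouse", "cotton bath towel")
```

A score like this could serve as one input feature, alongside category and brand, for a supervised classifier that labels product pairs as complementary or substitutive.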

AppPrint: Automatic Fingerprinting of Mobile Applications in Network Traffic (PAM 2015)

Miskovic, S., Lee, G. M., Liao, Y., and Baldi, M. (2015). AppPrint: Automatic Fingerprinting of Mobile Applications in Network Traffic, In Proceedings of Passive and Active Measurement Conference (PAM 2015), New York, New York.

  • Based on an industry collaboration with Narus (then Boeing subsidiary, now acquired by Symantec)
  • PAM is a premier conference in the network measurement area (h5-index: 24).

Increased adoption of mobile devices introduces a new spin to the Internet: mobile apps are becoming a key source of user traffic. Surprisingly, service providers and enterprises are largely unprepared for this change as they increasingly lose understanding of their traffic and fail to persistently identify individual apps. App traffic simply appears no different from any other HTTP data exchange. This raises a number of concerns for security and network management. In this paper, we propose AppPrint, a system that learns fingerprints of mobile apps via comprehensive traffic observations. We show that these fingerprints identify apps even in small traffic samples where app identity cannot be explicitly revealed in any individual traffic flow. This unique AppPrint feature is crucial because explicit app identifiers are extremely scarce, leading to a very limited characterization coverage of the existing approaches. In fact, our experiments on a nation-wide dataset from a major cellular provider show that AppPrint significantly outperforms any existing app identification approach. Moreover, the proposed system is robust to the lack of key app-identification sources, i.e., the traffic related to ads and analytic services commonly leveraged by the state-of-the-art identification methods.

Event Detection using Customer Care Calls (INFOCOM 2013)

Chen, Y., Lee, G. M., Duffield, N., Qiu, L., and Wang, J. (2013). Event Detection using Customer Care Calls. In Proceedings of IEEE International Conference on Computer Communications (INFOCOM 2013), Turin, Italy.

  • Based on an industry collaboration with AT&T Labs – Research.
  • INFOCOM is a top-tier conference in the networking area (h5-index: 72)

Customer care calls serve as a direct channel for a service provider to receive feedback from its customers. They reveal details about the nature and impact of major events and problems observed by customers. By analyzing customer care calls, a service provider can detect important events to speed up problem resolution. However, automating event detection based on customer care calls poses several significant challenges. First, the relationship between customers’ calls and network events is blurred because customers respond to an event in different ways. Second, customer care calls can be labeled inconsistently across agents and across call centers, and a given event naturally gives rise to calls spanning a number of categories. Third, many important events cannot be detected by looking at calls in one category. How to aggregate calls from different categories for event detection is important but challenging. Lastly, customer care call records have high dimensions (e.g., thousands of categories in our dataset). In this paper, we propose a systematic method for detecting events in a major cellular network using customer care call data. It consists of three main components: (i) using a regression approach that exploits temporal stability and low-rank properties to automatically learn the relationship between customer calls and major events, (ii) reducing the number of unknowns by clustering call categories and using L1-norm minimization to identify important categories, and (iii) employing multiple classifiers to enhance the robustness against noise and different response times. For the detected events, we leverage Twitter social media to summarize them and to locate the impacted regions. We show the effectiveness of our approach using data from a large cellular service provider in the US.
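The role of L1-norm minimization in picking out important call categories can be sketched in Python with plain L1-regularized least squares, solved by coordinate descent. This sketch omits the temporal-stability and low-rank terms the paper mentions, so it is an illustration of the general technique only. The data are made-up toy values and all names are hypothetical.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; the core step of L1 minimization."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def lasso_coordinate_descent(X, y, penalty, sweeps=200):
    """Minimize 0.5 * ||y - Xw||^2 + penalty * ||w||_1 by cyclic coordinate descent."""
    n_rows, n_cols = len(X), len(X[0])
    w = [0.0] * n_cols
    residual = list(y)  # equals y - Xw while w is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n_rows)) for j in range(n_cols)]
    for _ in range(sweeps):
        for j in range(n_cols):
            if col_sq[j] == 0.0:
                continue
            # Correlate column j with the residual that excludes its own contribution.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * w[j]) for i in range(n_rows))
            new_w = soft_threshold(rho, penalty) / col_sq[j]
            delta = new_w - w[j]
            if delta != 0.0:
                for i in range(n_rows):
                    residual[i] -= X[i][j] * delta
                w[j] = new_w
    return w

# Toy data: rows are time intervals, columns are call categories (made-up counts).
X = [[1, 0, 1], [2, 1, 0], [3, 0, 1], [4, 1, 0], [5, 0, 0]]
y = [2, 4, 6, 8, 10]  # toy event signal that depends only on category 0
weights = lasso_coordinate_descent(X, y, penalty=1.0)
selected = [j for j, v in enumerate(weights) if v != 0.0]
```

The L1 penalty drives the weights of uninformative categories to exactly zero, so the nonzero entries in `weights` mark the categories worth keeping.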