Author Archives: gene lee

Toward a Better Measure of Business Proximity: Topic Modeling for Industry Intelligence (MISQ 2016)

Shi, Zhan, Gene Moo Lee, Andrew B. Whinston (2016) Toward a Better Measure of Business Proximity: Topic Modeling for Industry Intelligence. MIS Quarterly 40(4), pp. 1035-1056.

Business proximity demo site: https://misr.sauder.ubc.ca/bizprox/
Media coverage: [Huffington Post] [ACM TechNews] [UTA Inquiry] [W.P. Carey KnowIT]
Presented at ACM EC (Stanford, CA 2014), MISQ Workshop (Leuven, Belgium 2015), KOCSEA (Vienna, VA 2016), UBC (2016), UT Arlington (2017), Rutgers (2018)
Dissertation Paper #1
Slides

In this article, we propose a new data-analytic approach to measure firms’ dyadic business proximity. Specifically, our method analyzes the unstructured texts that describe firms’ businesses using the statistical learning technique of topic modeling, and constructs a novel business proximity measure based on the output. When compared with existent methods, our approach is scalable for large datasets and provides finer granularity on quantifying firms’ positions in the spaces of product, market, and technology. We then validate our business proximity measure in the context of industry intelligence and show the measure’s effectiveness in an empirical application of analyzing mergers and acquisitions in the U.S. high technology industry. Based on the research, we also build a cloud-based information system to facilitate competitive intelligence on the high technology industry.
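
As a rough illustration of how a dyadic proximity score could be derived from topic-model output, the sketch below compares per-firm topic distributions with cosine similarity. The firm names, the three-topic vectors, and the choice of cosine similarity are all assumptions made for this example; the paper's actual measure may be defined differently.

```python
import math

def cosine_proximity(p, q):
    """Cosine similarity between two topic-weight vectors of equal length."""
    if len(p) != len(q):
        raise ValueError("topic vectors must have the same length")
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    if norm_p == 0 or norm_q == 0:
        return 0.0  # an all-zero vector carries no topic information
    return dot / (norm_p * norm_q)

def pairwise_proximity(topics):
    """Return {(firm_a, firm_b): proximity} for every unordered pair of firms."""
    names = sorted(topics)
    return {
        (a, b): cosine_proximity(topics[a], topics[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }

# Hypothetical topic distributions (e.g. produced by a fitted topic model).
firm_topics = {
    "firm_a": [0.7, 0.2, 0.1],
    "firm_b": [0.6, 0.3, 0.1],
    "firm_c": [0.1, 0.1, 0.8],
}

if __name__ == "__main__":
    for pair, score in pairwise_proximity(firm_topics).items():
        print(pair, round(score, 3))
```

In this toy input, firm_a and firm_b load on the same topic and so score closer to each other than either does to firm_c.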

The Spillover Effects of User-Generated Online Product Reviews on Purchases: Evidence from Clickstream Data (ICIS 2016)

Kwark, Y., Lee, G. M., Pavlou, P. A., Qiu, L. (2016). The Spillover Effects of User-Generated Online Product Reviews on Purchases: Evidence from Clickstream Data, Proceedings of International Conference on Information Systems (ICIS 2016), Dublin, Ireland.

We analyze the spillover effect of online product reviews on purchases using clickstream data from a large retailer by investigating (a) whether the products are complementary or substitutive, (b) whether the products are from the same brand or a different brand, and (c) which media channel (mobile or PC) is used. To identify complementary and substitutive products, we apply topic modeling, a text-mining approach, to product descriptions to quantify the functional similarity of product pairs. Our empirical analysis shows that the mean rating of online reviews of substitutive products has a negative effect on purchasing, while the rating of complementary products has a positive effect. We also find that the negative spillover effect among substitutive products is significantly greater for products of different brands than for products of the same brand, and greater for consumers who used mobile devices than for those who used traditional PCs. Our study has implications for leveraging the spillover effect of online product reviews on substitutive and complementary products.

Strategic Network Formation in a Location-Based Social Network: A Topic Modeling Approach (HICSS 2016)

Lee, G. M., Qiu, L., Whinston, A. B. (2016). Strategic Network Formation in a Location-Based Social Network: A Topic Modeling Approach, Proceedings of Hawaii International Conference on System Sciences (HICSS 2016), Kauai, Hawaii. Nominated for Best Paper Award

This paper studies strategic network formation in a location-based social network. We build a structural model of social link creation that incorporates individual characteristics and pairwise user similarities. Specifically, we define four user proximity measures from biography, geography, mobility, and short messages. To construct proximity from unstructured text information, we build topic models using latent Dirichlet allocation. Using Gowalla data with 385,306 users, three million locations, and 35 million check-in records, we empirically estimate the structural model and find evidence of the homophily effect in network formation.
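
To make the idea of a pairwise user proximity concrete, here is a minimal sketch of one possible mobility-based score: the Jaccard overlap between the sets of locations two users have checked in at. The function name, the location labels, and the choice of Jaccard overlap are hypothetical and are not taken from the paper, which may define mobility proximity differently.

```python
def jaccard_proximity(locations_a, locations_b):
    """Share of distinct locations visited by both users, between 0 and 1."""
    set_a, set_b = set(locations_a), set(locations_b)
    union = set_a | set_b
    if not union:
        return 0.0  # neither user has any check-ins
    return len(set_a & set_b) / len(union)

# Hypothetical check-in histories for two users.
user_1 = ["cafe", "gym", "park", "gym"]
user_2 = ["gym", "park", "library"]

if __name__ == "__main__":
    print(jaccard_proximity(user_1, user_2))
```

A score like this could enter a link formation model as one covariate alongside text-based proximities; repeated visits are ignored here because the inputs are reduced to sets.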

Link Formation in Mobile and Economic Networks: Model and Empirical Analysis (Ph.D. Dissertation 2015)

Gene Moo Lee (2015). Link Formation in Mobile and Economic Networks: Model and Empirical Analysis. UT Austin Ph.D. Dissertation, Austin, TX, August 2015.

In this dissertation, we study three link formation problems in mobile and economic networks: (i) company matching for the mergers and acquisitions (M&A) network in the high-technology (high-tech) industry, (ii) mobile application (app) matching for the cross-promotion network in mobile app markets, and (iii) online friendship formation in mobile social networks. Each problem can be modeled as a link formation problem in a graph, where nodes represent independent entities (e.g., companies, apps, users) and edges represent interactions (e.g., transactions, promotions, friendships) among the nodes.

First, we propose a new data-analytic approach to measure firms’ dyadic business proximity in order to analyze the M&A network in the high-tech industry. Specifically, our method analyzes the unstructured texts that describe firms’ businesses using latent Dirichlet allocation (LDA) topic modeling, and constructs a novel business proximity measure based on the output. Using CrunchBase data including 24,382 high-tech companies and 1,689 M&A transactions, we empirically validate our business proximity measure in the context of industry intelligence and show the measure’s effectiveness in an application of M&A network analysis. Based on the research, we build a cloud-based information system to facilitate competitive intelligence on the high-tech industry.

Second, we analyze mobile app matching for the cross-promotion network in mobile app markets. Cross promotion (CP) is a new app promotion framework, in which a mobile app is promoted to the users of another app. Using IGAWorks data covering 1,011 CP campaigns, 325 apps, and 301,183 users, we evaluate the effectiveness of CP campaigns in comparison with existing ad channels such as mobile display ads. While CP campaigns, on average, are still suboptimal compared with display ads, we find evidence that careful matching of mobile apps can significantly improve the effectiveness of CP campaigns. Our empirical results show that app similarity, measured by LDA from apps’ text descriptions, is a significant factor that increases user engagement in CP campaigns. With this observation, we propose an app matching mechanism for the CP network to improve ad effectiveness.

Third, we study friendship network formation in a location-based social network. We build a structural model of social link creation that incorporates individual characteristics and pairwise user similarities. Specifically, we define four user proximity measures from biography, geography, mobility, and short messages (i.e., tweets). To construct proximity from unstructured text information, we build LDA topic models of user biography texts and tweets. Using Gowalla data with 385,306 users, three million locations, and 35 million check-in records, we empirically estimate the structural model and find evidence of the homophily effect in network formation.

AppPrint: Automatic Fingerprinting of Mobile Applications in Network Traffic (PAM 2015)

Miskovic, S., Lee, G. M., Liao, Y., and Baldi, M. (2015). AppPrint: Automatic Fingerprinting of Mobile Applications in Network Traffic, In Proceedings of Passive and Active Measurement Conference (PAM 2015), New York, New York.

Based on an industry collaboration with Narus (then a Boeing subsidiary, later acquired by Symantec).
PAM is a premier conference in the network measurement area (h5-index: 24).

Increased adoption of mobile devices introduces a new spin to the Internet: mobile apps are becoming a key source of user traffic. Surprisingly, service providers and enterprises are largely unprepared for this change as they increasingly lose understanding of their traffic and fail to persistently identify individual apps. App traffic simply appears no different from any other HTTP data exchange. This raises a number of concerns for security and network management. In this paper, we propose AppPrint, a system that learns fingerprints of mobile apps via comprehensive traffic observations. We show that these fingerprints identify apps even in small traffic samples where app identity cannot be explicitly revealed in any individual traffic flow. This unique AppPrint feature is crucial because explicit app identifiers are extremely scarce, leading to very limited characterization coverage by existing approaches. In fact, our experiments on a nationwide dataset from a major cellular provider show that AppPrint significantly outperforms existing app identification approaches. Moreover, the proposed system is robust to the lack of key app-identification sources, i.e., the traffic related to ads and analytics services commonly leveraged by state-of-the-art identification methods.

Towards a Better Measure of Business Proximity: Topic Modeling for Analyzing M&As (EC 2014)

Shi, Z., Lee, G. M., Whinston, A. B. (2014). Towards a Better Measure of Business Proximity: Topic Modeling for Analyzing M&As, Proceedings of ACM Conference on Economics and Computation (EC 2014), Palo Alto, California

Event Detection using Customer Care Calls (INFOCOM 2013)

Chen, Y., Lee, G. M., Duffield, N., Qiu, L., and Wang, J. (2013). Event Detection using Customer Care Calls. In Proceedings of IEEE International Conference on Computer Communications (INFOCOM 2013), Turin, Italy.

Based on an industry collaboration with AT&T Labs – Research.
INFOCOM is a top-tier conference in the networking area (h5-index: 72)

Customer care calls serve as a direct channel for a service provider to receive feedback from its customers. They reveal details about the nature and impact of major events and problems observed by customers. By analyzing customer care calls, a service provider can detect important events to speed up problem resolution. However, automating event detection based on customer care calls poses several significant challenges. First, the relationship between customers’ calls and network events is blurred because customers respond to an event in different ways. Second, customer care calls can be labeled inconsistently across agents and across call centers, and a given event naturally gives rise to calls spanning a number of categories. Third, many important events cannot be detected by looking at calls in one category. Aggregating calls from different categories for event detection is important but challenging. Lastly, customer care call records have high dimensions (e.g., thousands of categories in our dataset). In this paper, we propose a systematic method for detecting events in a major cellular network using customer care call data. It consists of three main components: (i) using a regression approach that exploits temporal stability and low-rank properties to automatically learn the relationship between customer calls and major events, (ii) reducing the number of unknowns by clustering call categories and using L1-norm minimization to identify important categories, and (iii) employing multiple classifiers to enhance robustness against noise and different response times. For the detected events, we leverage Twitter social media to summarize them and to locate the impacted regions. We show the effectiveness of our approach using data from a large cellular service provider in the US.
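
The role of L1-norm minimization in picking out a few important call categories can be illustrated with a generic L1-regularized least-squares fit (lasso) solved by coordinate descent. This is a textbook sketch under assumed inputs, not the paper's formulation: the toy matrix `X` (rows are time bins, columns are call categories), the target `y`, and the penalty `lam` are all hypothetical, and the temporal-stability and low-rank terms mentioned above are omitted.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; values within the band become 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def lasso_coordinate_descent(X, y, lam, n_iters=200):
    """Minimize (1/2n) * ||y - Xw||^2 + lam * ||w||_1 by cyclic coordinate descent."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(n_iters):
        for j in range(d):
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            if z == 0:
                continue  # a column of zeros cannot explain anything
            rho = 0.0
            for i in range(n):
                # Prediction from every category except j.
                partial = sum(X[i][k] * w[k] for k in range(d) if k != j)
                rho += X[i][j] * (y[i] - partial)
            rho /= n
            w[j] = soft_threshold(rho, lam) / z
    return w

# Hypothetical data: only the first category actually drives the event signal.
X = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1], [1, 0, 0], [0, 1, 1]]
y = [2, 0, 2, 0, 2, 0]

if __name__ == "__main__":
    print(lasso_coordinate_descent(X, y, lam=0.1))
```

On this toy input the L1 penalty sets the weights of the two irrelevant categories to exactly zero while keeping a slightly shrunken weight on the first, which is the sparsity behavior that makes the penalty useful for selecting categories.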

Improving the Interaction between Overlay Routing and Traffic Engineering (Networking 2008)

Lee, G. M., and Choi, T. (2008). Improving the Interaction between Overlay Routing and Traffic Engineering, In Proceedings of IFIP Networking Conference (Networking 2008), Singapore.

Networking is a premier conference in the networking area (h5-index: 23)

Overlay routing has been successful as an incremental method to improve Internet routing by allowing its users to select their own logical routes. Meanwhile, traffic engineering (TE) is used to reduce overall network cost by adapting physical routing in response to varying traffic patterns. Previous studies [1,2] have shown that the interaction of the two network components can cause large network cost increases and oscillations. In this paper, we improve the interaction between overlay routing and TE by modifying the objectives of both parties. For the overlay part, we propose TE-awareness, which bounds the overlay’s selfishness so that its actions do not adversely affect TE’s optimization process. For the TE part, we suggest COPE [3] as a strong candidate that achieves close-to-optimal performance for predicted traffic matrices and handles unpredictable overlay traffic efficiently. With extensive simulation results, we show that the proposed methods significantly improve the interaction, with lower network cost and smaller oscillations.

Designing an Incentive-Based Framework for Overlay Routing (Technical Report 2007)

Lee, G. M., Choi, T., and Zhang, Y. (2007). Designing an Incentive-Based Framework for Overlay Routing. UTCS Technical Report, January 2007.

Overlay routing has become popular as an incremental mechanism to improve Internet routing. So far, overlay nodes have always been assumed to cooperate with each other. In this paper, we analyze overlay routing from a new viewpoint, in which the overlay nodes act independently to maximize their own payoff. We use a game-theoretic approach to analyze transit traffic forwarding and find that overlay nodes are unlikely to cooperate with each other in this scenario.

To stimulate the independent overlay nodes to cooperate with each other, we design and propose an incentive-based framework. We introduce three possible systems and evaluate them analytically. Among the candidates, we use simulation to verify the feasibility of our proposed framework, a generalized punish-and-reward system. The performance gets closer to the social optimum as we increase the number of punishments. In addition, the system shows tolerance against impatient players.
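
The intuition behind why independent overlay nodes may not cooperate can be sketched with a one-shot, two-node forwarding game. The payoff structure below (a node gains `benefit` when the other node forwards its traffic and pays `cost` when it forwards for the other) and the numeric values are hypothetical choices for illustration, not the model in the report.

```python
from itertools import product

ACTIONS = ("forward", "drop")

def payoff(own, other, benefit=3.0, cost=1.0):
    """Payoff to a node choosing `own` when the other node chooses `other`."""
    gain = benefit if other == "forward" else 0.0
    loss = cost if own == "forward" else 0.0
    return gain - loss

def pure_nash_equilibria(benefit=3.0, cost=1.0):
    """Enumerate action pairs where neither node gains by deviating alone."""
    equilibria = []
    for a, b in product(ACTIONS, repeat=2):
        a_best = all(
            payoff(a, b, benefit, cost) >= payoff(alt, b, benefit, cost)
            for alt in ACTIONS
        )
        b_best = all(
            payoff(b, a, benefit, cost) >= payoff(alt, a, benefit, cost)
            for alt in ACTIONS
        )
        if a_best and b_best:
            equilibria.append((a, b))
    return equilibria

if __name__ == "__main__":
    print(pure_nash_equilibria())
    print(payoff("forward", "forward"), payoff("drop", "drop"))
```

With these assumed payoffs, dropping is each node's best response regardless of what the other does, even though mutual forwarding would leave both better off. That gap is what an incentive scheme based on punishment and reward in repeated play tries to close.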

Gene Moo Lee, Ph.D.

Associate Professor of Information Systems, UBC Sauder School of Business



Lee, G. M., Rallapalli, S., Dong, W., Chen, Y., Qiu, L., and Zhang, Y. (2013). Mobile Video Delivery via Human Movement. In Proceedings of IEEE Conference on Sensor, Mesh, and Ad Hoc Communications and Networks (SECON 2013), New Orleans, Louisiana.
