Tag Archives: audio

AI Voice in Online Video Platforms: A Multimodal Perspective on Content Creation and Consumption

Zhang, Xiaoke, Mi Zhou, Gene Moo Lee. "AI Voice in Online Video Platforms: A Multimodal Perspective on Content Creation and Consumption," Working Paper.

  • Previous title: How Does AI-Generated Voice Affect Online Video Creation? Evidence from TikTok
  • Presentations: INFORMS DS (2022), UBC (2022), WITS (2022), Yonsei (2023), POSTECH (2023), ISMS MKSC (2023), CSWIM (2023), KrAIS Summer (2023), Dalhousie (2023), CIST (2023), Temple (2024), Santa Clara U (2024), Wisconsin Milwaukee (2024)
  • Best Student Paper Nomination at CIST 2023; Best Paper Runner-Up Award at KrAIS 2023
  • Media coverage: [UBC News] [Global News]
  • API sponsored by Ensemble Data
  • SSRN version: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4676705

Major user-generated content (UGC) platforms like TikTok have introduced AI-generated voice to assist creators in complex multimodal video creation. AI voice in videos represents a novel form of partial AI assistance, in which AI augments one specific modality (audio) while creators retain control over the other modalities (text and visuals). This study theorizes and empirically investigates the impacts of AI voice adoption on the creation, content characteristics, and consumption of videos on a video UGC platform. Using a unique dataset of 554,252 TikTok videos, we conduct multimodal analyses to detect AI voice adoption and quantify theoretically important video characteristics in different modalities. Using a stacked difference-in-differences model with propensity score matching, we find that AI voice adoption increases creators’ video production by 21.8%. While reducing audio novelty, it enhances textual and visual novelty by freeing creators’ cognitive resources. Moreover, heterogeneity analysis reveals that AI voice boosts engagement for less-experienced creators but reduces it for experienced creators and those with established identities. We conduct additional analyses and online randomized experiments to demonstrate two key mechanisms underlying these effects: partial AI process augmentation and partial AI content substitution. This study contributes to the UGC and human-AI collaboration literature and provides practical insights for video creators and UGC platforms.
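The headline estimate comes from a stacked difference-in-differences design on a propensity-score-matched sample. As a rough single-cohort illustration (not the paper's actual pipeline), the sketch below simulates creators' weekly video output, matches each adopter to the non-adopter with the closest pre-period output (a crude stand-in for propensity-score matching), and computes the 2x2 difference-in-differences estimate. All names and numbers are hypothetical.

```python
import random

random.seed(0)

def mean(xs):
    return sum(xs) / len(xs)

def simulate_creator(adopter, effect=0.2):
    """One creator: 8 pre-adoption and 8 post-adoption weekly video counts.
    Adopters get a proportional lift (here ~20%) after adoption."""
    base = random.gauss(5.0, 1.0)                      # creator fixed effect
    pre = [base + random.gauss(0, 0.3) for _ in range(8)]
    lift = base * effect if adopter else 0.0
    post = [base + lift + random.gauss(0, 0.3) for _ in range(8)]
    return {"adopter": adopter, "pre": pre, "post": post}

creators = [simulate_creator(i < 200) for i in range(600)]
treated = [c for c in creators if c["adopter"]]
controls = [c for c in creators if not c["adopter"]]

# Greedy 1:1 nearest-neighbour match on mean pre-period output
# (a crude stand-in for matching on an estimated propensity score).
pool = list(controls)
matched = []
for t in treated:
    best = min(pool, key=lambda c: abs(mean(c["pre"]) - mean(t["pre"])))
    pool.remove(best)
    matched.append(best)

# Canonical 2x2 difference-in-differences on the matched sample:
# (post - pre change for adopters) minus (post - pre change for controls).
did = ((mean([mean(t["post"]) for t in treated]) -
       mean([mean(t["pre"]) for t in treated])) -
      (mean([mean(c["post"]) for c in matched]) -
       mean([mean(c["pre"]) for c in matched])))
print(round(did, 3))   # close to 1.0, i.e. a ~20% lift on a base of ~5
```

The stacked design in the paper extends this idea to staggered adoption: each adoption cohort gets its own treated/control stack with cohort-specific pre and post windows, and the cohort-level estimates are then combined; the sketch collapses that to a single cohort.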

IS / Marketing Papers on Multimodal Data Analytics (Image, Video, Audio)

Last update: Sep 7, 2023

With the advent of social media and mobile platforms, visual and multimodal data have become first-class citizens in big data analytics research. Compared to textual data, which require significant cognitive effort to comprehend, visual data (such as images and videos) can easily convey the creator’s message to a general audience. Conducting large-scale studies on such data types requires machine learning and computer vision approaches. In this post, I organize studies in Information Systems, Marketing, and other management disciplines that leverage large-scale analysis of image and video datasets. The papers are listed in no particular order:

  1. Yang, Yi, Yu Qin, Yangyang Fan, Zhongju Zhang (2023). Unlocking the Power of Voice for Financial Risk Prediction: A Theory-Driven Deep Learning Design Approach. MIS Quarterly 47(1): 63-96.
  2. Ceylan, G., Diehl, K., & Proserpio, D. (2023). Words Meet Photos: When and Why Visual Content Increases Review Helpfulness. Journal of Marketing Research, forthcoming.
  3. Alex Burnap, John R. Hauser, Artem Timoshenko (2023) Product Aesthetic Design: A Machine Learning Augmentation. Marketing Science, forthcoming.
  4. Gao, Jia, Ying Rong, Xin Tian, Yuliang Yao (2023) Improving Convenience or Saving Face? An Empirical Analysis of the Use of Facial Recognition Payment Technology in Retail. Information Systems Research, forthcoming.
  5. Guan, Yue, Yong Tan, Qiang Wei, Guoqing Chen (2023) When Images Backfire: The Effect of Customer-Generated Images on Product Rating Dynamics. Information Systems Research, Forthcoming.
  6. Son, Y., Oh, W., Im, I. (2022) The Voice of Commerce: How Smart Speakers Reshape Digital Content Consumption and Preference. MIS Quarterly, forthcoming.
  7. Hou, J., Zhang, J., & Zhang, K. (2022). Pictures that are Worth a Thousand Donations: How Emotions in Project Images Drive the Success of Crowdfunding Campaigns? An Image Design Perspective. MIS Quarterly, Forthcoming.
  8. Lysyakov, Mikhail, Siva Viswanathan (2022) Threatened by AI: Analyzing Users’ Responses to the Introduction of AI in a Crowd-Sourcing Platform. Information Systems Research, Forthcoming.
  9. Hanwei Li, David Simchi-Levi, Michelle Xiao Wu, Weiming Zhu (2022) Estimating and Exploiting the Impact of Photo Layout: A Structural Approach. Management Science, Forthcoming.
  10. Bharadwaj, N., Ballings, M., Naik, P. A., Moore, M, Arat, M. M. (2022) “A New Livestream Retail Analytics Framework to Assess the Sales Impact of Emotional Displays,” Journal of Marketing, 86(1): 24-47.
  11. Chen, Z., Liu, Y.-J., Meng, J., Wang, Z. (2022) “What’s in a Face? An Experiment on Facial Information and Loan-Approval Decision“, Management Science, forthcoming.
  12. Lu, T., Wang, A., Yuan, X., Zhang, X. (2020) “Visual Distortion Bias in Consumer Choices,” Management Science, forthcoming.
  13. Zhou, M., Chen, G. H., Ferreira, P., Smith, M. D. (2021) “Consumer Behavior in the Online Classroom: Using Video Analytics and Machine Learning to Understand the Consumption of Video Courseware,” Journal of Marketing Research 58(6): 1079-1100.
  14. Zhang, Shunyuan, Dokyun Lee, Param Vir Singh, Kannan Srinivasan (2021) What Makes a Good Image? Airbnb Demand Analytics Leveraging Interpretable Image Features. Management Science 68(8):5644-5666.
  15. Gunarathne, P., Rui, H., Seidmann, A. (2021) “Racial Bias in Customer Service: Evidence from Twitter,” Information Systems Research 33(1): 43-54.
  16. Shin, D., He, S., Lee, G. M., Whinston, A. B., Cetintas, S., Lee, K.-C. (2020) “Enhancing Social Media Analysis with Visual Data Analytics: A Deep Learning Approach,” MIS Quarterly 44(4): 1459-1492. [Details]
  17. Li, Y., Xie, Y. (2020) “Is a Picture Worth a Thousand Words? An Empirical Study of Image Content and Social Media Engagement,” Journal of Marketing Research 57(2): 1-19.
  18. Zhang, Q., Wang, W., Chen, Y. (2020) “Frontiers: In-Consumption Social Listening with Moment-to-Moment Unstructured Data: The Case of Movie Appreciation and Live comments,” Marketing Science 39(2).
  19. Liu, L., Dzyabura, D., Mizik, N. (2020) “Visual Listening In: Extracting Brand Image Portrayed on Social Media,” Marketing Science 39(4): 669-686.
  20. Peng, L., Cui, G., Chung, Y., Zheng, W. (2020) “The Faces of Success: Beauty and Ugliness Premiums in E-Commerce Platforms,” Journal of Marketing 84(4): 67-85.
  21. Liu, X., Zhang, B., Susarla, A., Padman, R. (2020) “Go to YouTube and Call Me in the Morning: Use of Social Media for Chronic Conditions,” MIS Quarterly 44(1b): 257-283.
  22. Zhao, K., Hu, Y., Hong, Y., Westland, J. C. (2020) “Understanding Characteristics of Popular Streamers in Live Streaming Platforms: Evidence from Twitch.tv,” Journal of the Association for Information Systems, Forthcoming.
  23. Ordenes, F. V., Zhang, S. (2019) “From words to pixels: Text and image mining methods for service research,” Journal of Service Management 30(5): 593-620.
  24. Wang, Q., Li, B., Singh, P. V. (2018) “Copycats vs. Original Mobile Apps: A Machine Learning Copycat-Detection Method and Empirical Analysis,” Information Systems Research 29(2): 273-291.
  25. Lu, S., Xiao, L., Ding, M. (2016) “A Video-Based Automated Recommender (VAR) System for Garments,” Marketing Science 35(3): 484-510.
  26. Xiao, L., Ding, M. (2014) “Just the Faces: Exploring the Effects of Facial Features in Print Advertising,” Marketing Science 33(3), 315-461.
  27. Suh, K.-S., Kim, H., Suh, E. K. (2011) “What If Your Avatar Looks Like You? Dual-Congruity Perspectives for Avatar Use,” MIS Quarterly 35(3), 711-729.
  28. Todorov, A., Porter, J. M. (2014) “Misleading First Impressions: Different for Different Facial Images of the Same Person“, Psychological Science 25(7): 1404-1417.
  29. Todorov, A., Mandisodza, A. N., Goren, A., Hall, C. C. (2005) “Inferences of Competence from Faces Predict Election Outcomes“, Science 308(5728): 1623-1626.
  30. Mueller. U., Mazur, A. (1996) “Facial Dominance of West Point Cadets as a Predictor of Later Military Rank“, Social Forces 74(3): 823-850.
  31. Lee, H., Nam, K. “When Machine Vision Meets Human Fashion: Effects of Human Intervention on the Efficiency of CNN-Driven Recommender Systems in Online Fashion Retail”, Working Paper.
  32. Park, S., Lee, G. M., Shin, D., Han, S.-P. (2020) “Targeting Pre-Roll Ads using Video Analytics,” Working Paper.
  33. Choi, A., Ramaprasad, J., So, H. (2021) Does Authenticity of Influencers Matter? Examining the Impact on Purchase Decisions, Working Paper.
  34. Park, J., Kim, J., Cho, D., Lee, B. Pitching in Character: The Role of Video Pitch’s Personality Style in Online Crowdsourcing, Working Paper.
  35. Yang, J., Zhang, J., Zhang, Y. (2021) First Law of Motion: Influencer Video Advertising on TikTok, Working Paper.
  36. Davila, A., Guasch (2021) Manager’s Body Expansiveness, Investor Perceptions, and Firm Forecast Errors and Valuation, Working Paper.
  37. Peng, L., Teoh, S. H., Wang, U., Yan, J. (2021) Face Value: Trait Inference, Performance Characteristics, and Market Outcomes for Financial Analysts, Working Paper.
  38. Zhang, S., Friedman, E., Zhang, X., Srinivasan, K., Dhar, R. (2020) “Serving with a Smile on Airbnb: Analyzing the Economic Returns and Behavioral Underpinnings of the Host’s Smile,” Working Paper.
  39. Park, K., Lee, S., Tan, Y. (2020) “What Makes Online Review Videos Helpful? Evidence from Product Review Videos on YouTube,” UW Working Paper.
  40. Doosti, S., Lee, S., Tan, Y. (2020) “Social Media Sponsorship: Metrics for Finding the Right Content Creator-Sponsor Matches,” UW Working Paper.
  41. Koh, B., Cui, F. (2020) “Give a Gist: The Impact of Thumbnails on the View-Through of Videos,” KU Working Paper.
  42. Hou, J. R., Zhang, J., Zhang, K. (2018) Can Title Images Predict the Emotions and the Performance of Crowdfunding Projects? Workshop on e-Business.

When Does Congruence Matter for Pre-roll Video Ads? The Effect of Multimodal, Ad-Content Congruence on the Ad Completion

Park, Sungho, Gene Moo Lee, Donghyuk Shin, Sang-Pil Han. “When Does Congruence Matter for Pre-roll Video Ads? The Effect of Multimodal, Ad-Content Congruence on the Ad Completion,” Working Paper [Last update: Jan 29, 2023]

  • Previous title: Targeting Pre-Roll Ads using Video Analytics
  • Funded by Sauder Exploratory Research Grant 2020
  • Presented at Southern Methodist University (2020), University of Washington (2020), INFORMS (2020), AIMLBA (2020), WITS (2020), HKUST (2021), Maryland (2021), American University (2021), National University of Singapore (2021), Arizona (2022), George Mason (2022), KAIST (2022), Hanyang (2022), Kyung Hee (2022), McGill (2022)
  • Research assistants: Raymond Situ, Miguel Valarao

Pre-roll video ads are gaining industry traction because the audience may be willing to watch an ad for a few seconds, if not the entire ad, before the desired content video is shown. At the same time, the popular skippable format, which lets viewers skip an ad after a few seconds, creates opportunity costs for advertisers and online video platforms when the ad is skipped. Against this backdrop, we employ a video analytics framework to extract multimodal features from ad and content videos, including auditory signals and thematic visual information, and probe the effect of ad-content congruence at each modality using a random matching experiment conducted by a major video advertising platform. The present study challenges the widely held view that ads matching their surrounding content are more likely to be viewed than those that do not, and investigates the conditions under which congruence may or may not work. Our results indicate that non-thematic auditory signal congruence between the ad and content is essential in explaining viewers’ ad completion, while thematic visual congruence is only effective if the viewer has sufficient attentional and cognitive capacity to recognize such congruence. The findings suggest that thematic visuals demand more cognitive processing power than auditory signals for viewers to perceive ad-content congruence, leading to decreased ad viewing. Overall, these findings have significant theoretical and practical implications for understanding whether and when viewers construct congruence in the context of pre-roll video ads and how advertisers might target their pre-roll video ads successfully.
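Modality-level congruence of the kind studied here is commonly operationalized as a similarity score between per-modality feature vectors of the ad and the content video. The following is a minimal sketch with purely hypothetical embeddings (the paper's actual features come from dedicated audio and visual extraction models, not from these toy vectors):

```python
import math

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Hypothetical per-modality embeddings for one ad/content pair.
ad = {"audio": [0.9, 0.1, 0.3], "visual": [0.2, 0.8, 0.5]}
content = {"audio": [0.8, 0.2, 0.4], "visual": [0.9, 0.1, 0.2]}

# One congruence score per modality; each can then enter a model of
# ad completion as a separate explanatory variable.
congruence = {m: cosine(ad[m], content[m]) for m in ad}
print(congruence)
```

Scoring each modality separately is what allows the auditory and visual congruence effects to be estimated independently, which is the crux of the paper's finding that the two modalities work differently.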