AI Voice in Online Video Platforms: A Multimodal Perspective on Content Creation and Consumption

Zhang, Xiaoke, Mi Zhou, and Gene Moo Lee. "AI Voice in Online Video Platforms: A Multimodal Perspective on Content Creation and Consumption," Working Paper.

  • Previous title: How Does AI-Generated Voice Affect Online Video Creation? Evidence from TikTok
  • Presentations: INFORMS DS (2022), UBC (2022), WITS (2022), Yonsei (2023), POSTECH (2023), ISMS MKSC (2023), CSWIM (2023), KrAIS Summer (2023), Dalhousie (2023), CIST (2023), Temple (2024), Santa Clara U (2024), Wisconsin Milwaukee (2024)
  • Best Student Paper Nomination at CIST 2023; Best Paper Runner-Up Award at KrAIS 2023
  • Media coverage: [UBC News] [Global News]
  • API sponsored by Ensemble Data
  • SSRN version: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4676705

Major user-generated content (UGC) platforms like TikTok have introduced AI-generated voice to assist creators in complex multimodal video creation. AI voice in videos represents a novel form of partial AI assistance, in which AI augments one specific modality (audio) while creators retain control over the others (text and visuals). This study theorizes and empirically investigates the impact of AI voice adoption on the creation, content characteristics, and consumption of videos on a video UGC platform. Using a unique dataset of 554,252 TikTok videos, we conduct multimodal analyses to detect AI voice adoption and to quantify theoretically important video characteristics across modalities. Using a stacked difference-in-differences model with propensity score matching, we find that AI voice adoption increases creators’ video production by 21.8%. While it reduces audio novelty, it enhances textual and visual novelty by freeing creators’ cognitive resources. Moreover, a heterogeneity analysis reveals that AI voice boosts engagement for less-experienced creators but reduces it for experienced creators and those with established identities. Additional analyses and online randomized experiments demonstrate two key mechanisms underlying these effects: partial AI process augmentation and partial AI content substitution. This study contributes to the UGC and human-AI collaboration literatures and offers practical insights for video creators and UGC platforms.