The rapid advancement of Large Language Models (LLMs) has opened unprecedented opportunities for research in the social sciences. With their ability to process, generate, and understand complex data at scale, LLMs align naturally with the needs of the social sciences, which often require the collection, analysis, and interpretation of unstructured, human-centric data. LLMs are transforming how social scientists approach research by enabling the exploration of previously intractable datasets, facilitating the simulation of social interactions, and uncovering nuanced patterns in human behavior and communication. This survey offers a comprehensive interdisciplinary overview of the burgeoning intersection between LLMs and social science research. We explore how LLMs are reshaping methodologies and enabling novel research paradigms across social science disciplines. To bring clarity to this rapidly growing and often fragmented area, we introduce a dual-perspective taxonomy organized along methodological and disciplinary dimensions. We begin by reviewing representative applications within selected social science domains to highlight common research paradigms, discuss key challenges, and chart future research directions. We then summarize the contributions of the social sciences to the development of LLMs, especially in providing benchmark datasets and improved models that are more socially grounded, ethically responsible, domain-relevant, and methodologically robust. Ultimately, our aim is to provide a foundational guide for both newcomers and seasoned researchers eager to navigate and contribute to this dynamic interdisciplinary field of LLM-driven social science. Keywords: Artificial Intelligence, AI and Society, AI Agents, Sociology, Political Science, Law, Economics
We systematically document how various dimensions of writing quality matter for different borrower groups on a dominant online credit platform, and how these dimensions are reshaped by the use of large language models (LLMs). Using human assessments, we find that LLMs (e.g., ChatGPT) significantly enhance the writing and perceived quality of loan applications. We further build proprietary BERT-based multimodal models of lenders' decision-making that accommodate lenders' constraints, and introduce a "deep" Heckman correction for sample selection bias. We demonstrate that ChatGPT adoption decreases the soft information conveyed and increases credit misallocation, likely due to convergence in writing; when lenders respond to borrowers' LLM adoption, they rely more on hard information, mitigating misallocation. We leverage generative modeling to characterize the counterfactual equilibrium with endogenous borrower LLM adoption and lender responses. Our findings provide insights into the evolving role of soft information and the potential impacts of GenAI on lending. Keywords: Generative AI, Heckman Correction, Lending, LLMs, Misallocation, Multimodal Data
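For readers unfamiliar with selection correction, the sketch below illustrates the classic two-step Heckman procedure on simulated data: a probit model of which loans get funded, then an outcome regression augmented with the inverse Mills ratio. This is only the standard correction under assumed parameters; the paper's "deep" variant, which embeds the selection model in a neural architecture, is not reproduced here.

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n = 5000
z = rng.normal(size=(n, 2))          # selection covariates (z2 is the exclusion restriction)
x = z[:, 0]                          # outcome covariate
# Correlated errors (rho = 0.7) create the selection bias to be corrected.
u, e = rng.multivariate_normal([0, 0], [[1, 0.7], [0.7, 1]], size=n).T

select = (z @ np.array([1.0, 1.0]) + u > 0)   # lender funds the loan
y = 2.0 + 1.5 * x + e                         # latent outcome (true slope 1.5)
y_obs = np.where(select, y, np.nan)           # observed only if funded

# Step 1: probit MLE for the selection equation.
def neg_loglik(g):
    p = norm.cdf(z @ g)
    return -np.sum(select * np.log(p + 1e-12) + (~select) * np.log(1 - p + 1e-12))

g_hat = minimize(neg_loglik, np.zeros(2), method="BFGS").x
idx = z @ g_hat
imr = norm.pdf(idx) / norm.cdf(idx)           # inverse Mills ratio

# Step 2: OLS on the funded sample, augmented with the inverse Mills ratio.
mask = select
X = np.column_stack([np.ones(mask.sum()), x[mask], imr[mask]])
beta, *_ = np.linalg.lstsq(X, y_obs[mask], rcond=None)
print(beta)  # [intercept, slope, lambda]; slope should be close to the true 1.5
```

The exclusion restriction (a covariate that shifts funding but not outcomes) is what identifies the correction; the "deep" version replaces the probit index with a learned representation while keeping this two-stage logic.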
The abundance of unstructured consumer data makes Generative Artificial Intelligence (AI) an attractive methodological tool for consumer research, yet its validity for Natural Language Processing (NLP) applications remains insufficiently examined. This paper addresses this critical gap by examining contemporary NLP methods (AI, Deep Learning (DL), and Machine Learning (ML)) through three interconnected studies. In Part 1, we juxtapose findings from a multi-method study encompassing experimental and field data, each analyzed using NLP methods. Despite striking similarities in the results, we identify potential validity threats stemming from method bias and conceptual overlaps in NLP-extracted constructs. Part 2 applies construct validity assessment to compare these NLP methods. The results reveal significant challenges in achieving convergent and discriminant validity when measuring consumer psychological constructs. In Part 3, we introduce a novel directed-response data collection procedure that substantially reduces measurement error while enhancing construct validity. We demonstrate this approach by measuring the variables from Part 1 through five distinct methods (self-report, expert ratings, AI, DL, and ML), establishing a more robust methodological framework for consumer behavior research employing NLP techniques. Our findings highlight the need for methodological caution when implementing NLP tools in consumer research, while providing a practical pathway for their valid application. Keywords: Artificial Intelligence, Construct Validity, Natural Language Processing, Research Methodology, Consumer Behavior, Multi-method Research, Large Language Models
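The convergent/discriminant logic above follows the multitrait-multimethod (MTMM) tradition; the toy sketch below shows the basic check on simulated data. The construct names ("satisfaction", "trust"), method labels, and noise levels are fabricated placeholders, not the paper's measures.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
true_traits = rng.normal(size=(n, 2))  # two independent latent consumer constructs

def measure(trait, noise):
    """One method's noisy measurement of a latent trait."""
    return trait + noise * rng.normal(size=n)

# Three "methods" measure both constructs with method-specific noise.
scores = {}
for trait_idx, name in [(0, "satisfaction"), (1, "trust")]:
    for method, s in [("self_report", 0.5), ("ai", 0.8), ("ml", 1.0)]:
        scores[(name, method)] = measure(true_traits[:, trait_idx], s)

def r(a, b):
    return np.corrcoef(scores[a], scores[b])[0, 1]

# Convergent validity: same construct, different methods -> should be high.
conv = r(("satisfaction", "self_report"), ("satisfaction", "ai"))
# Discriminant validity: different constructs, same method -> should be low.
disc = r(("satisfaction", "ai"), ("trust", "ai"))
print(f"convergent r = {conv:.2f}, discriminant r = {disc:.2f}")
```

When NLP-extracted constructs conceptually overlap, the discriminant correlations rise toward the convergent ones, which is exactly the failure pattern the paper's Part 2 diagnoses.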
The rapid growth of interest in non-fungible tokens (NFTs) has generated increasing demand for effective asset discovery systems in NFT marketplaces. Enhancing asset discovery can improve user experience, stimulate market participation, and ultimately strengthen the NFT ecosystem. Unlike conventional goods, NFTs are unique digital assets whose value is driven primarily by social consensus rather than intrinsic attributes. As a result, user engagement with NFTs is governed by strategic intents, leading to intent-dependent relational patterns among NFTs that evolve within individual users over time yet remain consistent across users with shared intents. Traditional recommendation approaches, which rely on co-occurrence statistics or intrinsic attribute matching, fail to capture this intent-driven complexity. We propose an Intent-Aware Representation Learning (IARL) model for NFT recommendation. IARL models relationships among NFTs under distinct latent intents, capturing both their temporal evolution within individual users and their consistency across users. This intent-aware relational modeling aligns with real-world purchasing dynamics and enables more accurate representations of user preferences. Extensive experiments on a large-scale real-world NFT transaction dataset demonstrate that IARL significantly outperforms state-of-the-art baselines, validating its effectiveness for NFT recommendation. Keywords: Blockchain Technology, NFTs, Recommender Systems, Sequential Recommendation, Hypergraph Neural Networks, Purchase Intent, Disentangled Representation Learning
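To make the disentangled-intent idea concrete, the minimal sketch below scores items as an intent-weighted mixture of per-intent affinities. All dimensions, weights, and embeddings are random illustrative assumptions; the actual IARL model learns these via hypergraph neural networks, which this sketch does not attempt.

```python
import numpy as np

rng = np.random.default_rng(2)
K, d, n_items = 3, 8, 100              # latent intents, embedding dim, catalog size

item_emb = rng.normal(size=(n_items, d))
user_intent_emb = rng.normal(size=(K, d))    # one user representation per latent intent
intent_weights = np.array([0.7, 0.2, 0.1])   # inferred distribution over the user's intents

# Score each item as an intent-weighted sum of per-intent affinities:
# score(item) = sum_k w_k * <user_intent_k, item>
scores = intent_weights @ (user_intent_emb @ item_emb.T)  # shape (n_items,)
top5 = np.argsort(scores)[::-1][:5]
print("recommended item ids:", top5)
```

The point of disentanglement is that each intent component can track its own relational pattern among NFTs (and shift weight over time), rather than collapsing a user's heterogeneous motives into a single preference vector.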
Stockouts are typically studied as endpoint outcomes that affect demand, yet little is known about whether stockout risk itself propagates across interconnected retail units. This study investigates stockout contagion in grocery retail by combining (i) a hidden Markov model to infer weekly stockout probabilities from point-of-sale data and (ii) dynamic spatial panel models to quantify propagation across economically connected products and stores. The framework addresses two key empirical challenges: stockouts are unobserved in transactional data, and correlated stockout realizations may reflect shared shocks rather than structured dependence. Conditioning on promotional activity, supplier disruption indicators, demand controls, fixed effects, and explicitly defined networks, we isolate structured propagation patterns that persist beyond shared exposure. Using IRI scanner and consumer panel data, we document economically meaningful stockout contagion both within stores across substitute products and across stores for identical products. Within-store propagation is strongest when substitution networks are defined broadly at the category level rather than by a single closest substitute. Across stores, contagion is widespread under diffuse networks and remains strong when links are restricted to behaviorally proximate locations, while chain-based operational linkages generate more heterogeneous effects. Decomposition of spatial impacts indicates that stockout risk spreads primarily through network connections across stores and products, rather than remaining localized. On average, a one-percentage-point increase in the stockout probability of a focal SKU raises the stockout probability of connected products or stores by approximately 0.06 to 0.30 percentage points, with substantial heterogeneity across categories. Categories characterized by longer stockout duration exhibit stronger propagation, indicating that persistence amplifies contagion. 
These findings suggest that availability risk in retail operates as a dynamic networked process rather than as isolated SKU-store events, providing a scalable empirical framework for identifying and managing systemic stockout risk using standard transactional data. Keywords: Retail Operations, On-shelf Stockouts, Contagion Effect, Empirical Study
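Stage (i) of the framework above infers unobserved stockouts from sales data. A hedged sketch of that idea: a two-state hidden Markov model (in stock vs. stocked out) over weekly sales counts, smoothed with the forward-backward algorithm. The Poisson emission rates and transition probabilities below are illustrative assumptions, not the paper's estimated parameters.

```python
import numpy as np
from scipy.stats import poisson

def stockout_probs(sales, lam=(20.0, 1.0), A=None, pi=(0.9, 0.1)):
    """Posterior P(stocked out) per week via forward-backward smoothing.
    State 0 = in stock (high sales rate), state 1 = stocked out (low rate)."""
    if A is None:
        A = np.array([[0.9, 0.1],    # weekly state-transition matrix
                      [0.3, 0.7]])
    T = len(sales)
    # Emission likelihoods: Poisson sales under each hidden state.
    B = np.stack([poisson.pmf(sales, lam[0]), poisson.pmf(sales, lam[1])], axis=1)

    alpha = np.zeros((T, 2)); beta = np.ones((T, 2))
    alpha[0] = np.array(pi) * B[0]
    alpha[0] /= alpha[0].sum()
    for t in range(1, T):                 # forward pass (normalized for stability)
        alpha[t] = (alpha[t - 1] @ A) * B[t]
        alpha[t] /= alpha[t].sum()
    for t in range(T - 2, -1, -1):        # backward pass
        beta[t] = A @ (B[t + 1] * beta[t + 1])
        beta[t] /= beta[t].sum()
    post = alpha * beta                   # smoothed state posterior
    post /= post.sum(axis=1, keepdims=True)
    return post[:, 1]

sales = np.array([22, 19, 25, 2, 0, 1, 21, 24])  # mid-series sales collapse
p_out = stockout_probs(sales)
print(np.round(p_out, 2))
```

These inferred weekly probabilities would then serve as the dependent variable in stage (ii), where spatially lagged stockout probabilities of network-connected SKUs and stores enter a dynamic spatial panel regression.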
This study examines how firms' emphasis on inclusion within their corporate social responsibility (CSR) initiatives influences employee turnover, with particular attention to inclusionwashing—the discrepancy between a firm's publicly communicated commitment to inclusion and employees' perceived inclusiveness of their workplace. We contribute to the CSR literature by demonstrating the workforce consequences of misaligned inclusion messaging. Drawing on three independent datasets, we combine firm-level inclusion disclosures with employee-level sentiment data to construct an Inclusion Perception Gap (IPG), which serves as a measure of inclusionwashing. We then assess the relationship between the IPG and employee turnover. Our analyses show that firms with larger IPGs experience significantly higher turnover, with especially pronounced effects among minority employees. We further corroborate these findings through an online behavioral experiment conducted on Prolific. The findings suggest that, despite outward signaling of commitments to inclusion, firms suffer internal consequences when employee experiences diverge from external narratives. Firms that overstate their inclusiveness risk higher employee turnover. For researchers, we provide a framework to study inclusionwashing and its workforce consequences while validating Glassdoor as a reliable source of employee sentiment data. Keywords: Inclusionwashing, Corporate Social Responsibility (CSR), Employee Turnover, Behavioral Experiments, Econometrics, Multi-Method
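The Inclusion Perception Gap can be sketched as the standardized difference between a firm's stated inclusion emphasis and its employees' perceived inclusiveness. The scores, firm labels, and standardization choice below are fabricated for illustration; the paper constructs the measure from firm-level disclosures and Glassdoor sentiment data.

```python
import numpy as np

firms = ["A", "B", "C", "D"]
stated = np.array([0.90, 0.80, 0.40, 0.70])     # disclosure-based inclusion score
perceived = np.array([0.85, 0.40, 0.45, 0.60])  # mean employee sentiment score

def zscore(v):
    """Standardize so the two scales are comparable before differencing."""
    return (v - v.mean()) / v.std()

# Larger IPG = stated inclusion outruns employee experience (inclusionwashing).
ipg = zscore(stated) - zscore(perceived)
for f, g in sorted(zip(firms, ipg), key=lambda t: -t[1]):
    print(f"firm {f}: IPG = {g:+.2f}")
```

In this toy example, firm B combines high stated inclusion with low perceived inclusiveness, so it receives the largest gap; the paper then regresses turnover on this gap with appropriate controls.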