publications
A growing collection of my publications.
2024
- ImpScore: A Learnable Metric for Quantifying the Implicitness Level of Language. Yuxin Wang, Xiaomeng Zhu, Weimin Lyu, Saeed Hassanpour, and Soroush Vosoughi. arXiv, 2024.
Handling implicit language is essential for natural language processing systems to achieve precise text understanding and facilitate natural interactions with users. Despite its importance, the absence of a robust metric for accurately measuring the implicitness of language significantly constrains the depth of analysis possible in evaluating models’ comprehension capabilities. This paper addresses this gap by developing a scalar metric that quantifies the implicitness level of language without relying on external references. Drawing on principles from traditional linguistics, we define “implicitness” as the divergence between semantic meaning and pragmatic interpretation. To operationalize this definition, we introduce ImpScore, a novel, reference-free metric formulated through an interpretable regression model. This model is trained using pairwise contrastive learning on a specially curated dataset comprising 112,580 (implicit sentence, explicit sentence) pairs. We validate ImpScore through a user study that compares its assessments with human evaluations on out-of-distribution data, demonstrating its accuracy and strong correlation with human judgments. Additionally, we apply ImpScore to hate speech detection datasets, illustrating its utility and highlighting significant limitations in current large language models’ ability to understand highly implicit content.
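To make the training signal concrete: the abstract describes pairwise contrastive learning over (implicit sentence, explicit sentence) pairs. Below is a minimal sketch of such a loss; the `ImpScoreRegressor` head, embedding dimension, and margin value are illustrative assumptions, not details from the paper.

```python
import torch
import torch.nn as nn

class ImpScoreRegressor(nn.Module):
    """Hypothetical scalar scoring head over sentence embeddings (illustrative)."""
    def __init__(self, dim: int = 384):
        super().__init__()
        self.head = nn.Sequential(nn.Linear(dim, 128), nn.ReLU(), nn.Linear(128, 1))

    def forward(self, emb: torch.Tensor) -> torch.Tensor:
        return self.head(emb).squeeze(-1)  # one implicitness score per sentence

def pairwise_contrastive_loss(model, implicit_emb, explicit_emb, margin=1.0):
    # Push each implicit sentence's score above its paired explicit
    # sentence's score by at least `margin`.
    s_imp = model(implicit_emb)
    s_exp = model(explicit_emb)
    return torch.relu(margin - (s_imp - s_exp)).mean()

# Toy usage with random tensors standing in for encoder outputs.
model = ImpScoreRegressor()
imp_emb, exp_emb = torch.randn(8, 384), torch.randn(8, 384)
pairwise_contrastive_loss(model, imp_emb, exp_emb).backward()
```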
- MentalManip: A Dataset for Fine-grained Analysis of Mental Manipulation in Conversations. Yuxin Wang, Ivory Yang, Saeed Hassanpour, and Soroush Vosoughi. In ACL (oral presentation in main conference), 2024.
Mental manipulation, a significant form of abuse in interpersonal conversations, is challenging to identify due to its context-dependent and often subtle nature. The detection of manipulative language is essential for protecting potential victims, yet the field of Natural Language Processing (NLP) currently faces a scarcity of resources and research on this topic. Our study addresses this gap by introducing a new dataset, named MentalManip, which consists of 4,000 annotated movie dialogues. This dataset enables a comprehensive analysis of mental manipulation, pinpointing both the techniques utilized for manipulation and the vulnerabilities targeted in victims. Our research further explores the effectiveness of leading-edge models in recognizing manipulative dialogue and its components through a series of experiments with various configurations. The results demonstrate that these models inadequately identify and categorize manipulative content. Attempts to improve their performance by fine-tuning with existing datasets on mental health and toxicity have not overcome these limitations. We anticipate that MentalManip will stimulate further research, leading to progress in both understanding and mitigating the impact of mental manipulation in conversations.
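As a rough illustration of the detection task the paper evaluates, manipulation detection can be framed as binary sequence classification over a dialogue. The model choice and label mapping below are assumptions for the sketch, not the paper's setup.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Any encoder classifier works for this sketch; "roberta-base" is an assumption.
tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForSequenceClassification.from_pretrained("roberta-base", num_labels=2)

dialogue = "If you really cared about me, you would not ask questions."
inputs = tokenizer(dialogue, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits
pred = logits.argmax(-1).item()  # 0 = non-manipulative, 1 = manipulative (assumed labels)
```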
- A Survey on Game Playing Agents and Large Models: Methods, Applications, and Challenges. Xinrun Xu, Yuxin Wang, Chaoyi Xu, Ziluo Ding, Jiechuan Jiang, Zhiming Ding, and Borje F. Karlsson. arXiv, 2024.
The swift evolution of Large-scale Models (LMs), whether language-focused or multi-modal, has garnered extensive attention in both academia and industry. Despite the surge of interest in this rapidly evolving area, systematic reviews of their capabilities and potential in distinct, impactful scenarios remain scarce. This paper endeavours to help bridge this gap, offering a thorough examination of the current landscape of LM usage in complex game-playing scenarios and the challenges that remain open. Here, we systematically review the existing architectures of LM-based Agents (LMAs) for games and summarize their commonalities, challenges, and other insights. Furthermore, we present our perspective on promising future research avenues for the advancement of LMs in games. We hope to assist researchers in gaining a clear understanding of the field and to generate more interest in this highly impactful research direction. A corresponding resource, continuously updated, can be found in our GitHub repository.
2023
- Lifelong Embedding Learning and Transfer for Growing Knowledge Graphs. Yuanning Cui, Yuxin Wang, Zequn Sun, Wenqiang Liu, Yiqiao Jiang, Kexin Han, and Wei Hu. In Association for the Advancement of Artificial Intelligence (AAAI), 2023.
Existing knowledge graph (KG) embedding models have primarily focused on static KGs. However, real-world KGs do not remain static, but rather evolve and grow in tandem with the development of KG applications. Consequently, new facts and previously unseen entities and relations continually emerge, necessitating an embedding model that can quickly learn and transfer new knowledge as the KG grows. Motivated by this, we delve into an expanding field of KG embedding in this paper, i.e., lifelong KG embedding. We consider knowledge transfer and retention when learning on growing snapshots of a KG, without having to learn embeddings from scratch. The proposed model includes a masked KG autoencoder for embedding learning and updating, an embedding transfer strategy to inject the learned knowledge into the new entity and relation embeddings, and an embedding regularization method to avoid catastrophic forgetting. To investigate the impacts of different aspects of KG growth, we construct four datasets to evaluate the performance of lifelong KG embedding. Experimental results show that the proposed model outperforms the state-of-the-art inductive and lifelong embedding baselines.
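The regularization idea in the abstract can be pictured as anchoring previously learned embeddings while new ones train freely. A minimal sketch, assuming a simple L2 anchor on old entities (the paper's actual regularizer may differ):

```python
import torch

def retention_regularizer(new_emb: torch.Tensor,
                          old_emb: torch.Tensor,
                          old_ids: torch.Tensor,
                          weight: float = 0.1) -> torch.Tensor:
    """Penalize drift of embeddings for entities that existed before growth.

    new_emb: current embedding table, shape (num_entities, dim)
    old_emb: frozen snapshot from the previous KG snapshot, shape (len(old_ids), dim)
    old_ids: indices of pre-existing entities in the current table
    """
    drift = new_emb[old_ids] - old_emb
    return weight * drift.pow(2).sum(dim=-1).mean()
```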
- Open-world Story Generation with Structured Knowledge Enhancement: A Comprehensive Survey. Yuxin Wang, Jieru Lin, Zhiwei Yu, Wei Hu, and Borje F. Karlsson. Neurocomputing, 2023.
Storytelling and narrative are fundamental to human experience, intertwined with our social and cultural engagement. As such, researchers have long attempted to create systems that can generate stories automatically. In recent years, powered by deep learning and massive data resources, automatic story generation has shown significant advances. However, considerable challenges, like the need for global coherence in generated stories, still hamper generative models from reaching the same storytelling ability as human narrators. To tackle these challenges, many studies seek to inject structured knowledge into the generation process, which is referred to as structured knowledge-enhanced story generation. Incorporating external knowledge can enhance the logical coherence among story events, achieve better knowledge grounding, and alleviate over-generalization and repetition problems in stories. This survey provides an up-to-date and comprehensive review of this research field: (i) we present a systematic taxonomy regarding how existing methods integrate structured knowledge into story generation; (ii) we summarize the story corpora, structured knowledge datasets, and evaluation metrics involved; (iii) we give multidimensional insights into the challenges of knowledge-enhanced story generation and cast light on promising directions for future study.
2022
- Facing Changes: Continual Entity Alignment for Growing Knowledge Graphs. Yuxin Wang, Yuanning Cui, Wenqiang Liu, Zequn Sun, Yiqiao Jiang, Kexin Han, and Wei Hu. In The Semantic Web (ISWC) (acceptance rate: 17.6%), 2022.
Entity alignment is a basic and vital technique in knowledge graph (KG) integration. Over the years, research on entity alignment has rested on the assumption that KGs are static, which neglects the growing nature of real-world KGs. As KGs grow, previous alignment results need to be revisited while new alignments wait to be discovered. In this paper, we propose and dive into a realistic yet unexplored setting, referred to as continual entity alignment. To avoid retraining an entire model on the whole KGs whenever new entities and triples arrive, we present a continual alignment method for this task. It reconstructs an entity’s representation based on entity adjacency, enabling it to generate embeddings for new entities quickly and inductively using their existing neighbors. It selects and replays a subset of pre-aligned entity pairs to train only parts of the KGs, while extracting trustworthy alignments for knowledge augmentation. Because growing KGs inevitably contain non-matchable entities, the proposed method, unlike previous works, employs bidirectional nearest neighbor matching to find new alignments and update old ones. Furthermore, we construct new datasets by simulating the growth of multilingual DBpedia. Extensive experiments demonstrate that our continual alignment method is more effective than baselines based on retraining or inductive learning.
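The bidirectional nearest neighbor matching mentioned above amounts to keeping only mutual nearest neighbors across the two KGs' embedding spaces. A minimal sketch, assuming cosine similarity (function and variable names are illustrative):

```python
import torch
import torch.nn.functional as F

def bidirectional_nn_matches(src: torch.Tensor, tgt: torch.Tensor):
    """Return (i, j) pairs that are mutual nearest neighbors across two KGs.

    src, tgt: entity embeddings of shapes (n, d) and (m, d). Non-matchable
    entities drop out naturally: a pair is kept only if the nearest-neighbor
    relation holds in both directions.
    """
    sim = F.normalize(src, dim=-1) @ F.normalize(tgt, dim=-1).T  # (n, m) cosine
    s2t = sim.argmax(dim=1)  # best target for each source entity
    t2s = sim.argmax(dim=0)  # best source for each target entity
    return [(i, j.item()) for i, j in enumerate(s2t) if t2s[j].item() == i]
```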
- Inductive Knowledge Graph Reasoning for Multi-batch Emerging Entities. Yuanning Cui, Yuxin Wang, Zequn Sun, Wenqiang Liu, Yiqiao Jiang, Kexin Han, and Wei Hu. In ACM International Conference on Information & Knowledge Management (CIKM), 2022.
Over the years, reasoning over knowledge graphs (KGs), which aims to infer new conclusions from known facts, has mostly focused on static KGs. Due to the unceasing growth of knowledge in the real world, there is a need to enable inductive reasoning over expanding KGs. Existing inductive work assumes that new entities all emerge at once in a single batch, which oversimplifies the real scenario in which new entities continually appear. This study dives into a more realistic and challenging setting where new entities emerge in multiple batches. We propose a walk-based inductive reasoning model to tackle the new setting. Specifically, a graph convolutional network with adaptive relation aggregation is designed to encode and update entities using their neighboring relations. To capture the varying importance of neighbors, we employ a query-aware feedback attention mechanism during the aggregation. Furthermore, to alleviate the sparse link problem of new entities, we propose a link augmentation strategy to add trustworthy facts into KGs. We construct three new datasets to simulate this multi-batch emergence scenario. The experimental results show that our proposed model outperforms state-of-the-art embedding-based, walk-based, and rule-based models on inductive KG reasoning.
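The query-aware feedback attention described above can be sketched as attention over an entity's neighboring relations, conditioned on the query relation. This is an illustrative layer, not the paper's exact architecture:

```python
import torch
import torch.nn as nn

class QueryAwareAggregator(nn.Module):
    """Attention over neighboring relations, conditioned on the query relation."""
    def __init__(self, dim: int):
        super().__init__()
        self.score = nn.Linear(2 * dim, 1)

    def forward(self, query_rel: torch.Tensor, neighbor_rels: torch.Tensor) -> torch.Tensor:
        # query_rel: (d,), neighbor_rels: (k, d)
        q = query_rel.expand_as(neighbor_rels)  # broadcast query to each neighbor
        att = torch.softmax(self.score(torch.cat([q, neighbor_rels], dim=-1)).squeeze(-1), dim=0)
        return (att.unsqueeze(-1) * neighbor_rels).sum(dim=0)  # attention-weighted update
```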
- Revisiting Embedding-based Entity Alignment: A Robust and Adaptive Method. Zequn Sun, Wei Hu, Chengming Wang, Yuxin Wang, and Yuzhong Qu. IEEE Transactions on Knowledge and Data Engineering (TKDE), 2022.
Entity alignment – the discovery of identical entities across different knowledge graphs (KGs) – is a critical task in data fusion. In this paper, we revisit existing entity alignment methods in practical and challenging scenarios. Our empirical studies show that current methods are not robust to long-tail entities or to the absence of entity names or relation triples. We aim to develop a robust and adaptive entity alignment method that does not require the availability of relations, attributes, or names. Our method consists of an attribute encoder and a relation encoder, each representing an entity by aggregating its attributes or relational neighbors using attention mechanisms that highlight useful attributes and relations in end-to-end learning. To let the encoders complement each other and produce a coherent representation space, we propose adaptive embedding fusion via a gating mechanism. We consider four evaluation settings: the conventional setting with both relation and attribute triples, and three challenging settings without attributes, without relations, and without both relations and names. Results show that our method can achieve state-of-the-art performance. Even in the most challenging setting, without relations and names, our method still achieves promising results while existing methods fail.
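The adaptive embedding fusion via gating can be pictured as a learned per-dimension mix of the two encoder views. A minimal sketch; the sigmoid gate form and shared dimensionality are assumptions:

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Fuse attribute-view and relation-view entity embeddings with a learned gate."""
    def __init__(self, dim: int):
        super().__init__()
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, attr_emb: torch.Tensor, rel_emb: torch.Tensor) -> torch.Tensor:
        g = torch.sigmoid(self.gate(torch.cat([attr_emb, rel_emb], dim=-1)))
        return g * attr_emb + (1 - g) * rel_emb  # per-dimension convex combination
```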