MentalManip: A Benchmark for Fine-grained Analysis of Mental Manipulation

Dartmouth
ACL 2024

MentalManip is a dataset curated for the development and assessment of NLP models aimed at detecting and analyzing mental manipulation in dialogues:

  • 4,000 annotated fictional dialogues extracted from the Cornell Movie Dialogs Corpus
  • Designed to reflect diverse and practical use cases of mental manipulation
  • Provides a realistic dataset with scenarios drawn from real-world situations
  • Tests models' ability to detect manipulation across different types of manipulative conversations

An example dialogue that contains elements of mental manipulation which GPT-4 fails to identify. The manipulative parts are highlighted in red.

Definition of Mental Manipulation

Using language to influence, alter, or control an individual's psychological state or perception for the manipulator's benefit.

Abstract

Mental manipulation, a significant form of abuse in interpersonal conversations, is difficult to identify due to its context-dependent and often subtle nature. The detection of manipulative language is essential for protecting potential victims, yet the field of Natural Language Processing (NLP) currently faces a scarcity of resources and research on this topic. Our study addresses this gap by introducing a new dataset, named MentalManip, which consists of 4,000 annotated movie dialogues. This dataset enables a comprehensive analysis of mental manipulation, pinpointing both the techniques utilized for manipulation and the vulnerabilities targeted in victims. Our research further explores the effectiveness of leading-edge models in recognizing manipulative dialogue and its components through a series of experiments with various configurations. The results demonstrate that these models fail to adequately identify and categorize manipulative content. Attempts to improve their performance by fine-tuning with existing datasets on mental health and toxicity have not overcome these limitations. We anticipate that MentalManip will stimulate further research, leading to progress in both understanding and mitigating the impact of mental manipulation in conversations.

Dataset Taxonomy & Statistics

MentalManip contains 4,000 multi-turn fictional dialogues between two characters extracted from online movie scripts. To enable fine-grained analysis, our Labeling Taxonomy covers three dimensions:

  • Presence of Manipulation: whether a dialogue contains elements of mental manipulation (binary category)
  • Manipulation Technique: the manipulation techniques used by the manipulator (multi-label category)
  • Targeted Vulnerability: the vulnerabilities of victims exploited by the manipulator (multi-label category)
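Because one dimension is binary and the other two are multi-label, a natural way to prepare targets for a classifier is a 0/1 presence label plus one multi-hot vector per multi-label dimension. The sketch below shows that encoding. The class and function names and the label vocabularies (`technique_a`, `vulnerability_a`, ...) are hypothetical placeholders, not the dataset's actual field names or category names, which are defined in the paper's taxonomy.

```python
from dataclasses import dataclass, field

# Hypothetical placeholder vocabularies; the real technique and vulnerability
# names come from the MentalManip taxonomy and are not reproduced here.
TECHNIQUES = ["technique_a", "technique_b", "technique_c"]
VULNERABILITIES = ["vulnerability_a", "vulnerability_b"]


@dataclass
class DialogueLabels:
    """One annotated dialogue under the three-dimension labeling taxonomy."""
    manipulative: bool                                       # binary dimension
    techniques: set[str] = field(default_factory=set)        # multi-label dimension
    vulnerabilities: set[str] = field(default_factory=set)   # multi-label dimension


def multi_hot(labels: set[str], vocab: list[str]) -> list[int]:
    """Encode a set of labels as a 0/1 vector aligned with `vocab`."""
    unknown = labels - set(vocab)
    if unknown:
        raise ValueError(f"labels not in vocabulary: {sorted(unknown)}")
    return [1 if name in labels else 0 for name in vocab]


def encode(record: DialogueLabels) -> dict[str, object]:
    """Turn one record into targets for the three classification tasks."""
    return {
        "presence": int(record.manipulative),
        "techniques": multi_hot(record.techniques, TECHNIQUES),
        "vulnerabilities": multi_hot(record.vulnerabilities, VULNERABILITIES),
    }


example = DialogueLabels(True, {"technique_b"}, {"vulnerability_a"})
print(encode(example))
# {'presence': 1, 'techniques': [0, 1, 0], 'vulnerabilities': [1, 0]}
```

Keeping the vocabulary as an ordered list fixes the position of each label in the vector, so predictions and gold labels stay aligned across the whole dataset.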
Figures: Persuasion Taxonomy; Labeling Taxonomy

Statistics
MentalManip comes in two versions, which differ in how the gold labels are derived:

  • MentalManip_con contains only dialogues on which all annotators gave identical annotations. The gold labels are these agreed-upon annotations.
  • MentalManip_maj contains all dialogues. The gold labels are determined by majority vote over the annotations.
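The two gold-label criteria can be sketched as two aggregation functions over per-annotator labels. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes each annotation is a `(manipulative, techniques)` pair, treats "consensus" as exact agreement of all annotations, and uses a strict majority (more than half of annotators) both for presence and for each technique label, so ties count as "not manipulative". All names, and the labels `"x"` and `"y"`, are hypothetical.

```python
from collections import Counter
from typing import Optional

# Assumed representation of one annotator's labels for a dialogue:
# (is the dialogue manipulative?, set of technique labels chosen).
Annotation = tuple[bool, frozenset[str]]


def consensus_label(annotations: list[Annotation]) -> Optional[Annotation]:
    """Return the shared annotation if every annotator agrees, else None."""
    if not annotations:
        raise ValueError("each dialogue needs at least one annotation")
    first = annotations[0]
    return first if all(a == first for a in annotations) else None


def majority_label(annotations: list[Annotation]) -> Annotation:
    """Strict-majority vote on presence and on each technique label."""
    if not annotations:
        raise ValueError("each dialogue needs at least one annotation")
    n = len(annotations)
    manipulative = sum(1 for is_manip, _ in annotations if is_manip) > n / 2
    if not manipulative:
        # Assumption: a non-manipulative gold label carries no techniques.
        return False, frozenset()
    counts = Counter(t for _, techs in annotations for t in techs)
    return True, frozenset(t for t, c in counts.items() if c > n / 2)


def build_versions(dataset: dict[str, list[Annotation]]):
    """Build a consensus-only version and a majority-vote version."""
    con: dict[str, Annotation] = {}
    maj: dict[str, Annotation] = {}
    for dialogue_id, anns in dataset.items():
        maj[dialogue_id] = majority_label(anns)   # every dialogue is kept
        agreed = consensus_label(anns)
        if agreed is not None:                    # only unanimous dialogues
            con[dialogue_id] = agreed
    return con, maj


# Hypothetical example with three annotators and placeholder labels "x", "y".
data = {
    "d1": [(True, frozenset({"x"}))] * 3,
    "d2": [(True, frozenset({"x", "y"})), (True, frozenset({"x"})), (False, frozenset())],
}
con, maj = build_versions(data)
print(sorted(con))  # ['d1']
print(maj["d2"])    # (True, frozenset({'x'}))
```

By construction the consensus version is a subset of the majority version, and wherever both contain a dialogue they assign it the same label.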


Statistics of the Two Versions of the MentalManip Dataset

Experiments on Detection and Classification Tasks


Tabular Results

Visualized Results

Poster

BibTeX


        @article{wang2024mentalmanip,
        title={MentalManip: A Dataset For Fine-grained Analysis of Mental Manipulation in Conversations},
        author={Wang, Yuxin and Yang, Ivory and Hassanpour, Saeed and Vosoughi, Soroush},
        journal={arXiv preprint arXiv:2405.16584},
        year={2024}
        }