[Read&Explore] Social Media Reveals Urban-Rural Differences in Stress across China

This paper collected approximately 297 million Weibo posts and analyzed how people across urban and rural China express stress.

Data Collection and Preparation

  1. The authors collected Weibo posts through web crawling, following the same procedure as a prior work.
  2. They collected a total of 297 million posts from 888,000 users.
  3. The users were widely spread, covering 91% of the counties in China.
  4. The authors compiled a list of 30 Chinese stressor words, such as 鸭梨 (internet slang for 压力, "stress"), 负担 ("burden") and 就业 ("employment").
  5. Each user's county was taken from the self-reported location on their profile page.
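How the stressor list is applied to the posts is not described here. One plausible use, sketched below purely as an assumption, is to flag posts that contain any stressor word and compute the share of each user's posts that do. `STRESSORS` holds only the three example words above, and `mentions_stressor` and `stressor_rate` are hypothetical names. Because Chinese is written without spaces, a substring test stands in for word matching.

```python
# Hypothetical three-word subset of the paper's 30-word stressor list.
STRESSORS = ("鸭梨", "负担", "就业")


def mentions_stressor(post, stressors=STRESSORS):
    """True if the post contains any stressor word.

    Chinese is written without spaces, so this is a plain substring test;
    it can also fire on longer words that happen to contain a stressor.
    """
    return any(word in post for word in stressors)


def stressor_rate(posts, stressors=STRESSORS):
    """Share of a user's posts that mention at least one stressor (0.0 if no posts)."""
    if not posts:
        return 0.0
    return sum(mentions_stressor(p, stressors) for p in posts) / len(posts)


posts = ["最近工作鸭梨好大", "今天天气不错", "就业负担重"]
print(stressor_rate(posts))
```

A per-user rate like this could then be averaged within each county, since every user is mapped to a county through their profile location.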


For each user, the authors aggregated all of that user's posts and extracted three types of features:

  1. Words and phrases (1-3 grams)
  2. Topics generated by Latent Dirichlet Allocation (LDA), a widely used probabilistic topic model.
  3. Categories from the psycholinguistic lexicon Linguistic Inquiry and Word Count (LIWC).
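The first feature type can be sketched as relative n-gram frequencies over a user's aggregated posts. This sketch assumes the posts are already segmented into words (Chinese needs a word segmenter first, which is not shown) and normalizes each n-gram by the total number of n-grams of the same length; the paper's exact normalization is not stated here. LDA topics and LIWC categories are not covered, and `ngram_features` is a hypothetical name.

```python
from collections import Counter


def ngram_features(tokenized_posts, n_max=3):
    """Relative frequency of every 1- to n_max-gram across one user's posts.

    tokenized_posts: list of posts, each a list of word tokens.
    Each n-gram's count is divided by the total count of n-grams of the same
    length, so unigram, bigram and trigram frequencies are each normalized separately.
    """
    features = {}
    for n in range(1, n_max + 1):
        grams = Counter()
        # Count within each post so no n-gram spans two different posts.
        for tokens in tokenized_posts:
            for i in range(len(tokens) - n + 1):
                grams[" ".join(tokens[i:i + n])] += 1
        total = sum(grams.values())
        for gram, count in grams.items():
            features[gram] = count / total
    return features


user_posts = [["就业", "压力", "大"], ["就业", "压力"]]
print(ngram_features(user_posts))
```

Counting within each post keeps the boundary between two unrelated posts from producing spurious n-grams such as "大 就业" in the example.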

Then, the authors used the following linear model to estimate the effect of living in an urban versus a rural county:

\begin{equation} \text{Feature}_i = \beta_1 \cdot \text{County}_{\text{IsUrban}} + \beta_2 \cdot \text{County}_{\text{LogGDP}} + \beta_3 \cdot \text{User}_{\text{Gender}} + \phi + \epsilon \end{equation}

where the county's log GDP and the user's gender are control variables, \(\epsilon\) is the error term, and \(\beta_1\), the coefficient of the urban indicator, is the quantity of interest. The model is fitted for each feature, and the features are then ranked by \(\beta_1\): the top-k features with the largest positive \(\beta_1\) are the most prominent features of urban China, while the top-k features with the most negative \(\beta_1\) are the most prominent features of rural China.
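The per-feature regression and top-k ranking can be sketched with ordinary least squares in NumPy. Several choices here are assumptions rather than details from the paper: each feature column is z-scored so that \(\beta_1\) values are comparable across features, \(\phi\) is treated as a plain intercept, and `urban_coefficients` and `top_k_features` are hypothetical names.

```python
import numpy as np


def urban_coefficients(features, is_urban, log_gdp, gender):
    """Fit Feature = b0 + b1*IsUrban + b2*LogGDP + b3*Gender + e for every feature column.

    features: (n_users, n_features) array; the other arguments have length n_users.
    Returns b1 (the IsUrban coefficient) for each feature.
    """
    features = np.asarray(features, dtype=float)
    n = features.shape[0]
    # Assumption: z-score each feature so b1 is in standard-deviation units;
    # constant columns get std 1 to avoid dividing by zero.
    std = features.std(axis=0)
    std[std == 0] = 1.0
    z = (features - features.mean(axis=0)) / std
    # Design matrix: intercept (standing in for phi), urban indicator, two controls.
    X = np.column_stack([np.ones(n), is_urban, log_gdp, gender]).astype(float)
    # One least-squares solve covers all feature columns; row 1 of coef holds b1.
    coef, *_ = np.linalg.lstsq(X, z, rcond=None)
    return coef[1]


def top_k_features(beta1, names, k):
    """Names of the k most urban (largest positive b1) and k most rural (most negative b1) features."""
    order = np.argsort(beta1)
    urban = [names[i] for i in order[::-1][:k] if beta1[i] > 0]
    rural = [names[i] for i in order[:k] if beta1[i] < 0]
    return urban, rural
```

Solving all columns in one `lstsq` call is equivalent to fitting a separate regression per feature, because every feature shares the same design matrix.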


Comparing the most prominent words and phrases in urban and rural China shows that, in general:

  1. Rural Chinese write more about personal issues such as relationships and health, while urban Chinese focus predominantly on broader economic and financial issues.
  2. Rural Chinese are more likely to express negative emotions.