We have published a new main conference paper at ACL!
Intro
Social science and psycholinguistic research has shown that power and status affect how people use language in a range of domains. In our recent ACL paper, we investigate a similar question in the Internet Engineering Task Force (IETF). Our analysis, based on lexical categories (LIWC) and BERT, shows that participants’ levels of influence can be predicted from their email text, and identifies key linguistic differences (e.g., certain LIWC categories, such as WE, are positively correlated with high influence). We also identify differences in language use for the same person before and after becoming influential.
Specifically, we explore the following research questions:
RQ1: How do linguistic traits differ between more and less influential participants?
RQ2: How do linguistic traits vary for participants at different levels of the organisational hierarchy?
RQ3: How does the linguistic behaviour of participants change as they gain influence?
Methods
To tackle these questions, we create three classification/regression problems. All tasks take as input a set of LIWC features calculated on a subset of a participant’s emails. The prediction tasks are as follows. RQ1: predict the influence percentile of a person, defined as their centrality score in the email communication network. RQ2: predict whether a person is a working group chair or not. RQ3: predict whether the person’s communication occurred before or after they became influential (as defined in RQ1). The significant (p < 0.05) LIWC features are given in the table below.
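As a rough illustration of the RQ1 label, the sketch below turns a list of email exchanges into centrality scores and then into percentiles. The post does not say which centrality measure is used, so degree centrality is an assumption made here for simplicity, and the function names and the toy participants are hypothetical.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Degree centrality on an undirected graph built from (sender, recipient) pairs.

    Assumption: degree centrality stands in for whatever centrality
    measure the paper actually uses.
    """
    neighbours = defaultdict(set)
    for sender, recipient in edges:
        if sender == recipient:
            continue  # ignore self-addressed mail
        neighbours[sender].add(recipient)
        neighbours[recipient].add(sender)
    n = len(neighbours)
    if n < 2:
        return {person: 0.0 for person in neighbours}
    # Normalise by the maximum possible number of neighbours.
    return {person: len(nb) / (n - 1) for person, nb in neighbours.items()}


def influence_percentiles(centrality):
    """Percentage of participants with a strictly lower centrality score."""
    scores = list(centrality.values())
    n = len(scores)
    return {
        person: 100.0 * sum(other < score for other in scores) / n
        for person, score in centrality.items()
    }


# Toy email exchanges (hypothetical participants).
edges = [
    ("alice", "bob"),
    ("alice", "carol"),
    ("alice", "dave"),
    ("bob", "carol"),
]
centrality = degree_centrality(edges)
percentiles = influence_percentiles(centrality)
print(percentiles)
```

In this toy network the best-connected participant gets the highest percentile, and that percentile would then serve as the regression target for a model fed with LIWC features.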
Results:
RQ1: Usage of first- and third-person plural pronouns indicates that influential people tend to adopt a collaborative and community-oriented approach. They also use more organisational language, as indicated by a negative correlation with less formal LIWC categories such as netspeak and sexual.
RQ2: Working group (WG) chairs are more social and collaborative, as shown by the we and social LIWC categories. They also use more tentative statements in discussions primarily focused on technical feedback and revisions, or on suggesting alternatives.
RQ3: We observe that, as they gain influence, participants tend to become more descriptive and more engaged with the immediate state of issues and situations. They are also more involved in cognitive processes than they were when they were new to the IETF and had little influence.
Discussion
We further investigated the trends using separate word-level regression models for words in the LIWC categories that were prominent in the experiments. This revealed that not all words within the same category follow the same trend. For example, the negemo category is more prominent in influential people mostly because they use words like “problems”, even though they do so constructively, to provide feedback and point out limitations. We also observed that some unexpected trends are related to ambiguous words which have specific meanings in technical discussions (e.g., “live” is often used in “keep connection alive” and “kill” in “kill the process/thread”, which is not typical of standard language use).
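The word-level analysis can be pictured with a minimal sketch: for each word, fit a one-variable least-squares line of influence against that word's relative frequency, and read the sign of the slope. The post does not describe the exact regression setup, so the univariate model, the tokeniser, the helper names, and the toy texts and scores are all assumptions made for illustration.

```python
import re


def relative_frequency(text, word):
    """Share of tokens in `text` equal to `word` (simple lowercase tokeniser)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return tokens.count(word) / len(tokens)


def ols_slope(xs, ys):
    """Slope of the least-squares line of ys on xs (0.0 if xs is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        return 0.0
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def word_level_slopes(texts, influence, words):
    """Fit one separate regression per word and return each word's slope."""
    return {
        word: ols_slope([relative_frequency(t, word) for t in texts], influence)
        for word in words
    }


# Toy data: one text and one influence percentile per participant (made up).
texts = [
    "this draft has problems and the problems need fixing",
    "please kill the process before restart",
    "thanks for the update",
]
influence = [90.0, 20.0, 40.0]

slopes = word_level_slopes(texts, influence, ["problems", "kill"])
for word, slope in slopes.items():
    print(word, "positive" if slope > 0 else "non-positive")
```

Fitting each word separately is what lets two words from the same category show opposite trends, which a single category-level count would hide.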
Prediction experiments:
Finally, we experimented with a large pretrained language model (BERT) to see how its predictive power would compare to that of the LIWC features. The results, given below, show that the hand-crafted LIWC features are still somewhat better tailored to these prediction tasks.