sodestream

sodestream at IETF 115

by Stephen McQuistin • Tuesday 29 November 2022 • Permalink

IETF 115 was held in London from 5th November through 11th November 2022, with members of the "Streamlining Social Decision Making for Improved Internet Standards" (sodestream) project taking part in activities throughout the meeting.

Hackathon

To start the week, project members, colleagues, and students from the University of Glasgow (Elizabeth Boswell, Stephen McQuistin, Mairi Sillars Moya, and Ivan Nikitin) and Queen Mary University of London (Ignacio Castro, Mladen Karan, Prashant Khare, Hugo Ramirez, and Ravi Shekhar) took part in the IETF Hackathon.

sodestream group at the IETF 115 Hackathon

The focus of our work at the Hackathon was on analysing the sentiment of postings to the ietf@ietf.org mailing list. This mailing list provides a forum for the broad discussion of IETF-related topics. Sentiment analysis techniques could be useful in characterising the tone, and levels of toxicity, in the interactions that take place on this, and other, IETF mailing lists. With historical data, these trends can be tracked over time, providing insight into how the IETF community is evolving. The team managed to generate a dataset of sentiment scores, passing e-mail through the VADER library to understand if messages are broadly positive, negative, or neutral. In addition, they started to plot broad trends over time, and for individuals, and sketched out improvements to tooling documentation and packaging.
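As a rough illustration of the "positive, negative, or neutral" labelling, the sketch below maps a VADER-style compound score (a value between -1 and 1, as returned under the `compound` key by `SentimentIntensityAnalyzer().polarity_scores(text)` in the vaderSentiment package) to a coarse label. The 0.05 cut-offs follow the convention suggested in the VADER documentation; whether the Hackathon analysis used the same thresholds is an assumption, and the function names are hypothetical.

```python
# Cut-offs follow the convention suggested in the VADER documentation.
# Assumption: the Hackathon analysis may have used different thresholds.
POSITIVE_CUTOFF = 0.05
NEGATIVE_CUTOFF = -0.05


def label_sentiment(compound: float) -> str:
    """Map a VADER-style compound score in [-1, 1] to a coarse label."""
    if compound >= POSITIVE_CUTOFF:
        return "positive"
    if compound <= NEGATIVE_CUTOFF:
        return "negative"
    return "neutral"


def summarise(scores):
    """Count how many messages fall under each label."""
    counts = {"positive": 0, "neutral": 0, "negative": 0}
    for score in scores:
        counts[label_sentiment(score)] += 1
    return counts
```

Calling `summarise` on the compound scores of one year's postings would give the kind of per-year breakdown that can then be plotted over time.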

Distribution of VADER sentiment scores for all ietf@ietf.org postings
Distributions of sentiment scores for all ietf@ietf.org postings, within each year

There is initial evidence of some interesting trends: relatively low levels of negativity overall, with more negativity found in postings made on weekends and from personal e-mail addresses, and relatively more positivity on Mondays. Broadly, however, the team found that sentiment analysis over technical text is difficult. For example, phrases like "dropped packets", "killed process", and "abort transmission" are neutral technical phrases that are scored negatively by the sentiment analysis library. It is essential to build up a lexicon of technical phrases to avoid misclassification.
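One simple way to use such a lexicon, sketched here under stated assumptions, is to replace known technical phrases with a neutral placeholder before the text reaches the sentiment scorer. This is not necessarily the approach the team took; the phrase list contains only the three examples above, and `mask_technical_phrases` and the `techterm` placeholder are hypothetical names.

```python
import re

# Hypothetical starter lexicon: just the three example phrases from the text.
TECHNICAL_PHRASES = ["dropped packets", "killed process", "abort transmission"]

# Longest phrases first, so that longer entries win over shorter overlapping ones.
_PATTERN = re.compile(
    "|".join(
        re.escape(phrase)
        for phrase in sorted(TECHNICAL_PHRASES, key=len, reverse=True)
    ),
    flags=re.IGNORECASE,
)


def mask_technical_phrases(text: str, placeholder: str = "techterm") -> str:
    """Replace known technical phrases with a neutral token before scoring."""
    return _PATTERN.sub(placeholder, text)
```

The masked text can then be passed to the sentiment library as usual. An alternative design is to adjust the library's own word scores, but masking keeps the technical lexicon separate from the scorer.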

Presentations

Ignacio Castro presenting a summary of our findings so far at IETF 115

Ignacio Castro presented a summary of our recent work to meetings of the Internet Engineering Steering Group (slides), the Working Group Chair Forum (slides), and the Measurement and Analysis of Protocols Research Group (slides, recording). These presentations highlighted our findings, including that conversations seem to be getting more complex, publishing is harder, and that the relevance of a minority of influential participants is growing. Our findings suggest that these are interconnected.

Proposed Research and Analysis of Standardisation Processes Research Group (RASP RG)

Finally, the team contributed to a meeting to discuss a proposal to form a Research and Analysis of Standardisation Processes Research Group within the IRTF. The meeting, led by Ignacio Castro and Niels ten Oever, highlighted the broad themes that relate to building an understanding of standardisation processes. These include understanding IP disclosure rules, barriers to participation, leadership dynamics, and decision making processes. The aim of the proposed group would be to improve our understanding of the development of standards-setting organisations, bringing together a community of researchers, practitioners, and standards developers and users. These themes and objectives align well with the core motivation of the sodestream project, and we look forward to contributing to the development of the group.


Impact of early engagement on longevity of IETF participation

by Prashant Khare • Friday 02 April 2021 • Permalink

Introduction

What factors influence whether an Internet Engineering Task Force (IETF) participant remains engaged with the IETF community for a long time? Are there early signs that indicate whether a new participant will stay engaged with IETF activities for a long time, or whether they are likely to become disengaged?

In our recent work, we have attempted to find answers to these questions. Nearly 4000 participants per year were found to actively participate across various mailing lists between 2000 and 2004. During this period, nearly 2000 new participants per year joined various mailing lists. By the beginning of the next decade (2011 onwards), the number of active participants per year (across mailing lists) was still close to 4000. While this could reflect either an entirely new generation of IETF participants or a combination of old participants who are still active and new participants, the reality is that a significant proportion of the new people who joined over the years eventually became inactive or disengaged from the IETF community, at least on the mailing lists.

So we decided to look at the interaction behaviour of new joiners over the years. Without giving too much away, we find that engaging people early in their IETF lifespan may be the most effective way to keep participants engaged. We notice a clear difference between the early interaction activity of those who go on to stay for a long time and that of those who leave early.

Methodology

We take over 1100 IETF mailing list archives (all the mailing lists available at the time of download) and, for each user, determine their activity period (also referred to as ‘age’). This is the time period between their first and last email exchange. Note that we identify users with multiple accounts and create unique identifiers, resulting in more than 200,000 identifiers.
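The activity period can be sketched as follows. This is a minimal illustration, assuming the archive has already been reduced to (user identifier, date) pairs after account de-duplication; the function name and input shape are hypothetical, and expressing age in years via 365.25-day years is our choice for the example.

```python
from datetime import date


def activity_periods(messages):
    """Return {user_id: age_in_years} from an iterable of (user_id, date) pairs.

    A user's age is the time between their first and last email.
    """
    first, last = {}, {}
    for user, sent in messages:
        if user not in first or sent < first[user]:
            first[user] = sent
        if user not in last or sent > last[user]:
            last[user] = sent
    # Convert the span in days to (approximate) years.
    return {user: (last[user] - first[user]).days / 365.25 for user in first}
```

A user who only ever sends one email has an age of zero under this definition.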

The first thing we notice is that the data is dominated by users who only send one or two emails (around 83% of all users). These emails are often spam, introductions, greetings, and the like. To avoid skewing the results, we therefore apply a filter to consider only users who have sent at least three messages. It is important to note that the remaining 17% of accounts contributed over 90% of the total volume of emails (over 2.8 million emails).
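The filtering step, and the check on how much of the email volume survives it, might look like the sketch below. It assumes one user identifier per email as input; `filter_active_users` is a hypothetical name, not the project's actual tooling.

```python
from collections import Counter


def filter_active_users(senders, min_messages=3):
    """Keep users who sent at least `min_messages` emails.

    `senders` holds one user identifier per email. Returns the set of
    retained users and the share of total email volume they account for.
    """
    counts = Counter(senders)
    kept = {user for user, n in counts.items() if n >= min_messages}
    total = sum(counts.values())
    kept_volume = sum(counts[user] for user in kept)
    share = kept_volume / total if total else 0.0
    return kept, share
```

On the real archives, the returned share would correspond to the "over 90%" figure quoted above.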

Analysis

Question: How many years do IETF participants generally remain active, as they join over the years?

To begin with, we look at the age (years of remaining active) that IETF participants tend to reach. Figure 1 shows a kernel density estimate of the final age reached by people joining in each year. The colour bar on the side shows the probability scale, with higher colour intensity corresponding to higher probability. The darker density contours at the bottom indicate that a large number of people leave early, while some participants stay for a few more years and others remain active for as long as the data allows.

Kernel Density: Age acquired by people joining each year (Figure 1)

To understand more about the age categories, we fit Gaussian Mixture Models (GMM) to reveal possible clusters in the distribution of ages that participants go on to reach. The data records the year in which a person exchanged email for the first time (year born), and the number of years the person remains active (age).

We generate five Gaussian maximum age clusters, where each person belongs to a particular cluster. Each of these clusters is then manually analysed, and some clusters are merged where they were identified as covering similar age categories. Figure 2 shows, broadly, three categories of maximum acquired age for users joining each year between 2000 and 2013 (to allow participants a time window in which to fully exhibit their longevity of association). The three broad categories identified are:

  • Early leavers: participants who become inactive within 1 year of joining the IETF.
  • Mid-level stayers: participants who remain active for between 1 and 5 years before becoming inactive.
  • Long-term stayers: participants who go on to remain active for 5 or more years.
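Once the clusters are merged, the three categories amount to simple age bands. The sketch below expresses them as fixed thresholds; this is a simplification for illustration, since in the analysis the categories come from merged GMM clusters rather than hard cut-offs. The boundary handling (age of exactly 1 year counts as an early leaver, exactly 5 years as a long-term stayer) follows the figure captions, and the function name is hypothetical.

```python
def longevity_category(age_years: float) -> str:
    """Assign a participant to a longevity category from their final age.

    Simplification: fixed thresholds stand in for the merged GMM clusters.
    """
    if age_years <= 1:
        return "early leaver"
    if age_years < 5:
        return "mid-level stayer"
    return "long-term stayer"
```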

This is an interesting observation, since these categories are broadly consistent over an observed period of 13 years. A substantial proportion of people leave within a year or so throughout this time period, while some do go on to remain active for 5 or more years.

Participant category    Number of participants
Early leavers           17142
Mid-level stayers       5349
Long-term stayers       4833
Number of participants across categories (Table 1)

Gaussian Mixture Models: max age acquired over the years (Figure 2)

Now that we have identified how IETF participants cluster regarding the age they go on to acquire, we explore what factors influence the length of their association.

Question: Is early age interaction activity indicative of how long a new participant goes on to remain active?

What is an interaction? A participant either responds to someone’s email on a mailing list, or has their own email responded to by another participant; together, these make up the participant's interaction activity. We hypothesise that new joiners who engage more with the existing community early on are more likely to stay for longer. This is based on the observation that, after removing the nearly 83% of accounts that posted two or fewer emails, the remaining 17% of accounts make up over 90% of the total volume of emails in the archives. Thus, the extent to which a new joiner interacts with the active community can influence their ability or motivation to remain engaged. Since early leavers become inactive within 1 year of joining the IETF, we analyse the interaction behaviour of participants in all three categories (above) during the first year of their participation.

To understand whether a new joiner interacts more with young participants (other new joiners) or with participants who have been in the IETF for a long time, we assign each node in their network to one of three categories, mirroring those identified with the GMM:

  • Senior participants: when the age of this participant, at the time of interaction with a new joiner, is 5 or more years.
  • Mid-age participants: when the age of this participant, at the time of interaction with a new joiner, is between 1 and 5 years.
  • Young participants: when the age of this participant, at the time of interaction with a new joiner, is 1 year or less.

We now have three categories of new joiners, based on the number of years that they remain in the IETF: early leavers, mid-level stayers, and long-term stayers. We also have three categories for their network: senior participants, mid-age participants, and young participants. To understand the interaction dynamics, we look at two types of interactions:

  • Outgoing interaction (new joiner responds to an email from the network: email sent)
  • Incoming interaction (network responds to an email by new joiner: email received)

Next, for each new joiner, we evaluate how many people (from each network category) they have an outgoing or incoming interaction with, and how many messages these interactions cover (in the first year of their IETF lifespan). For example, a new joiner, Person A, might have outgoing interactions with 10 participants in the mid-age category, sending 15 messages. We do this to observe whether joiners from each category show specific behaviour in how they interact with their network; for instance, whether long-term stayers show interaction behaviour that early leavers do not. We record first-year interactions for new joiners in the years 2000-2013. We stop at 2013 to make sure that participants have enough time to reach whatever age they go on to reach before the data was collected (in 2020), and to avoid bias towards younger participants. We plot graphs, in Figures 3 and 4, showing incoming and outgoing interactions respectively, between new joiners across the three categories and their corresponding network. We make the following observations:

  • Early leavers engage significantly less with senior or mid-age participants. For instance, early leavers send fewer than 1 email as outgoing interactions, on average, to senior participants. Their incoming interactions from senior participants are much lower still.
  • In their first year, long-term stayers not only take the initiative to interact with the senior community but are also responded to by its senior participants. Long-term stayers send more than 4 emails as outgoing interactions, on average, to senior participants. Their incoming interactions from senior participants are close to 4 on average.
  • Mid-level stayers interact more with senior participants than early leavers do, sending over 2 emails as outgoing interactions, on average, to senior participants.
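The per-joiner tallies behind these observations can be sketched as below. This is an illustrative reconstruction rather than the project's code: the tuple layout, `seniority`, and `tally_first_year` are hypothetical, and ages are assumed to be precomputed in years at the time of each interaction.

```python
from collections import defaultdict


def seniority(age_years):
    """Categorise a network participant by age at the time of interaction."""
    if age_years >= 5:
        return "senior"
    if age_years > 1:
        return "mid-age"
    return "young"


def tally_first_year(interactions):
    """Summarise first-year interactions per new joiner.

    Each interaction is a tuple
    (joiner, direction, peer, peer_age_years, joiner_age_years),
    where direction is "outgoing" (joiner replies to peer) or
    "incoming" (peer replies to joiner).
    Returns {(joiner, direction, category): (distinct_peers, messages)}.
    """
    peers = defaultdict(set)
    messages = defaultdict(int)
    for joiner, direction, peer, peer_age, joiner_age in interactions:
        if joiner_age > 1:
            continue  # only the joiner's first year is considered
        key = (joiner, direction, seniority(peer_age))
        peers[key].add(peer)
        messages[key] += 1
    return {key: (len(peers[key]), messages[key]) for key in messages}
```

Averaging the message counts over all joiners in a longevity category, per direction and network category, would give the kind of values plotted in Figures 3 and 4.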

Conclusion

While the longevity of a person's association with the IETF might have more than one influencing factor, it is an important observation that participants who go on to remain associated with the IETF community for a long time engage more with the senior and mid-age participants of the community in their early years. Getting a response back from the community on the mailing lists can be a motivating factor for a new participant. We also aim to explore how these interactions are reflected in the conversations, and how strongly the context of conversations plays a role in these interactions, thereby influencing the longevity of a person's association with the IETF.

Additional figures

Incoming interactions, from IETF participants to new joiners (in their first year) of types: Early leavers (max age <= 1 year), Mid-level stayers (max age 1-5 years), and Long-term stayers (max age >= 5 years). IETF participants are classified as senior (age >= 5 at time of interaction with new joiner), mid-age (age 1-5 at time of interaction), and young (age <= 1 at time of interaction) (Figure 3)

Outgoing interactions, from new joiners in their first year of joining, depending on the longevity of the new joiner (Early leavers, max age <= 1 year; Mid-level stayers, max age 1-5 years; Long-term stayers, max age >= 5 years), and the seniority (at the time of the interaction) of the IETF participants they interact with (senior participants, age >= 5; mid-age, age 1-5; young participants, age <= 1) (Figure 4)