Analysis of IETF mailing lists in 2020

by Stephen McQuistin • Tuesday 20 April 2021

In 2020, there were 118,537 unique e-mails sent across IETF mailing lists by 3,616 addresses. In this blog post, we briefly outline highlights and trends in the dataset, analyse the ietf@ietf.org list, and describe the raw dataset.

Highlights and trends

In 2020, we saw the following highlights:

  • There was activity (i.e., at least one e-mail sent) across 335 mailing lists, with 16,612 e-mails to the quic-issues list, making up 12.5% of all e-mails in the dataset
  • 17,045 e-mails were sent from notifications@github.com and noreply@github.com: these addresses were the first and third biggest contributors of e-mail respectively. This highlights the growing role that GitHub plays in managing working group activity.
  • Addresses were active on an average of 3.3 mailing lists
  • Active addresses sent 33 e-mails on average

Figure 1: CDF of e-mails sent per address
Figure 2: CDF of number of lists each address is active in

Figures 1 and 2 show CDFs of the e-mail volume and list participation, respectively, for each address. These figures show that participation across IETF mailing lists is heavy-tailed: 93.4% of addresses contributed fewer than 100 e-mails each, with 71.2% sending fewer than 10 each. Similarly, approximately 80% of addresses were active in 3 or fewer mailing lists.
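The percentages read off a CDF like those above can be reproduced from a list of per-address e-mail counts. The following is a minimal sketch, not the code used for this analysis: the function name `fraction_below` and the sample counts are hypothetical and are not taken from the dataset.

```python
from bisect import bisect_left


def fraction_below(counts, threshold):
    """Return the fraction of values in `counts` strictly below `threshold`.

    This is the empirical CDF evaluated just below the threshold.
    """
    if not counts:
        raise ValueError("counts must not be empty")
    ordered = sorted(counts)
    # bisect_left gives the index of the first value >= threshold,
    # which equals the number of values strictly below it.
    return bisect_left(ordered, threshold) / len(ordered)


# Hypothetical per-address e-mail counts (illustrative only).
emails_per_address = [1, 2, 2, 5, 9, 14, 40, 99, 250, 6000]

share_under_10 = fraction_below(emails_per_address, 10)
share_under_100 = fraction_below(emails_per_address, 100)
```

Even in this small sample, a handful of heavy senders sit far out in the tail while most addresses send only a few e-mails, which is the shape the figures describe.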

Matching e-mail addresses to Datatracker profiles

The raw IETF mail archive dataset can be noisy, with automated e-mails from notification addresses, and participants splitting their activity across multiple e-mail accounts. To tidy up the dataset, we attempted to match e-mail senders with Datatracker profiles. Names and e-mail addresses were extracted from the From header of each e-mail, and these were matched against the Datatracker. Many exact matches were found using e-mail addresses, but where this was not possible, a number of heuristics were applied to the name to find a match. In some cases, no Datatracker account could be matched to a given sender.
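As an illustration of this kind of two-stage matching, the sketch below tries an exact address match first and then falls back to a single name heuristic (case, accent and punctuation normalisation). It is an assumption-laden example rather than the matching code used here: the `PROFILES` dictionary is a hypothetical stand-in for Datatracker data, and the actual heuristics applied are not described in this post.

```python
import unicodedata
from email.utils import parseaddr

# Hypothetical stand-in for Datatracker profiles:
# profile ID -> (display name, known e-mail addresses).
PROFILES = {
    101: ("Alice Example", {"alice@example.org"}),
    102: ("Bob Sample", {"bob@example.net"}),
}


def normalise_name(name):
    """Lower-case, strip accents and punctuation, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    cleaned = "".join(c if c.isalnum() else " " for c in no_accents)
    return " ".join(cleaned.lower().split())


def match_sender(from_header, profiles=PROFILES):
    """Return a profile ID for a From header value, or None if unmatched."""
    name, address = parseaddr(from_header)
    address = address.lower()
    # Stage 1: exact match on the e-mail address.
    for profile_id, (_, addresses) in profiles.items():
        if address in addresses:
            return profile_id
    # Stage 2 (assumed heuristic): match on the normalised display name.
    wanted = normalise_name(name)
    if wanted:
        for profile_id, (profile_name, _) in profiles.items():
            if normalise_name(profile_name) == wanted:
                return profile_id
    return None
```

With this sketch, `match_sender("Alice Example <notifications@github.com>")` resolves to profile 101 through the name fallback, while a bare `notifications@github.com` with no display name stays unmatched.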

In total, 88,662 e-mails were sent by participants matched to their Datatracker profile, representing 74.8% of the total e-mails in the dataset. We found that 2,088 people (i.e., Datatracker users) sent e-mails across 319 mailing lists.

Name                  E-mail count (percentage of people-matched e-mail)
Martin Thomson        6577 (7.42%)
Jana Iyengar          2024 (2.28%)
Michael Richardson    1629 (1.84%)
Mike Bishop           1492 (1.68%)
Carsten Bormann       1253 (1.41%)
Table 1: Top 5 senders in 2020 (percentages show proportion of people-matched e-mail volume)

List name      E-mail count (percentage of people-matched e-mail)
quic-issues    12884 (13.06%)
ietf           5356 (5.43%)
ipv6           4247 (4.31%)
last-call      2736 (2.77%)
dmarc          2008 (2.04%)
Table 2: Top 5 e-mail lists in 2020 (percentages show proportion of people-matched e-mail volume)

Tables 1 and 2 show the top 5 e-mail senders and mailing lists, respectively. Table 1 highlights the purpose of matching senders to Datatracker profiles: a significant portion of the e-mail attributed to the top participants was sent via GitHub issues, and was previously aggregated under the @github.com addresses.

Figure 3: CDF of e-mails sent per person
Figure 4: CDF of number of lists each person is active in

Figures 3 and 4 replot Figures 1 and 2, but across the Datatracker-mapped e-mail dataset. Figure 4 highlights that 80% of people are active (i.e., send at least one e-mail) on 5 or fewer mailing lists.

Analysing ietf@ietf.org

The dataset also provides insight into the ietf@ietf.org mailing list. As the general IETF discussion list, it should ideally be representative of the broader IETF community. We found that 322 people, 15.4% of the 2,088 people in the dataset, sent at least one e-mail to the ietf@ietf.org list. In total, 5,356 e-mails were sent to the ietf@ietf.org list in the Datatracker-mapped dataset. Of those, 29.6%, or 1,589 e-mails, were sent by the 10 most active participants; more than 50% of the e-mails sent to the ietf@ietf.org list were sent by the top 25 contributors.
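Concentration figures of this kind (the share sent by the top N participants, and how many participants account for a given share) can be derived from per-person counts for a single list. The sketch below is illustrative only: the function names and the sample counts are hypothetical, not values from the dataset.

```python
def top_n_share(counts_by_sender, n):
    """Fraction of all e-mails sent by the n most active senders."""
    counts = sorted(counts_by_sender.values(), reverse=True)
    return sum(counts[:n]) / sum(counts)


def senders_needed_for_share(counts_by_sender, share):
    """Smallest number of top senders whose e-mails reach `share` of the total."""
    counts = sorted(counts_by_sender.values(), reverse=True)
    total = sum(counts)
    running = 0
    for position, count in enumerate(counts, start=1):
        running += count
        if running / total >= share:
            return position
    return len(counts)


# Hypothetical per-person counts for one list (illustrative only).
sample_counts = {"a": 50, "b": 30, "c": 10, "d": 6, "e": 4}

share_of_top_two = top_n_share(sample_counts, 2)
needed_for_85_percent = senders_needed_for_share(sample_counts, 0.85)
```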

Figure 5: CDF of e-mails sent per person to all lists and ietf@ietf.org

Figure 5 highlights how behaviour on this list differs from wider contribution trends. As shown, people who participate in the ietf@ietf.org list tend to send greater volumes of e-mail to that list.

Raw data tables

In order to allow for further analysis, we provide the raw data tables used in the analysis above. Each data table is a tab-separated file. Data tables are provided for each of the address-based and Datatracker-mapped person-based datasets described.

All of the raw data tables can be downloaded here. Each data table is described below.
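Since each table is tab-separated, it can be read with Python's standard `csv` module. The sketch below assumes a two-column layout (address, then a count) with no header row, which is an assumption about the files rather than something stated here; the sample rows are hypothetical.

```python
import csv
import io


def read_table(handle):
    """Parse a tab-separated table into a list of rows, skipping blank lines."""
    return [row for row in csv.reader(handle, delimiter="\t") if row]


# Hypothetical file contents standing in for one of the .dat tables.
sample = io.StringIO("alice@example.org\t3\nbob@example.net\t1\n")

rows = read_table(sample)

# Convert the assumed count column to integers and rank by it.
ranked = sorted(
    ((address, int(count)) for address, count in rows),
    key=lambda pair: pair[1],
    reverse=True,
)
```

For a file on disk, the same function can be called as `read_table(open(path, newline=""))`, where `path` points at a downloaded table.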

Address-based datasets
All e-mails by unique address

Each row contains the e-mail address, the count of unique e-mails sent, and the per-list breakdown of each e-mail sent. Note that uniqueness is defined by the Message-ID field; per-list counts may sum to more than this number, where an e-mail is sent to multiple lists.
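The distinction between unique e-mails and per-list counts can be seen in a small example: a message cross-posted to two lists has a single Message-ID but is counted once on each list. The record format and names below are hypothetical and only sketch the counting rule described above.

```python
from collections import Counter, defaultdict

# Hypothetical archive records: (Message-ID, sender address, list name).
records = [
    ("<m1@example.org>", "alice@example.org", "quic"),
    ("<m1@example.org>", "alice@example.org", "tls"),  # same e-mail, cross-posted
    ("<m2@example.org>", "alice@example.org", "quic"),
    ("<m3@example.net>", "bob@example.net", "ietf"),
]


def summarise(records):
    """Map each address to (unique e-mail count, per-list counts)."""
    unique_ids = defaultdict(set)
    per_list = defaultdict(Counter)
    seen = set()
    for message_id, address, list_name in records:
        unique_ids[address].add(message_id)
        # Count each Message-ID at most once per list.
        if (message_id, list_name) not in seen:
            seen.add((message_id, list_name))
            per_list[address][list_name] += 1
    return {
        address: (len(ids), dict(per_list[address]))
        for address, ids in unique_ids.items()
    }


summary = summarise(records)
```

Here the first address has 2 unique e-mails, but its per-list counts sum to 3 because one of them was sent to two lists.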

all-addrs.dat, 3616 rows
Unique addresses by active list count

Each row contains the e-mail address and the count of e-mail lists that the address sent at least one e-mail to.

addr-lists-rank.dat, 3616 rows
Unique addresses active on ietf@ietf.org

Each row contains the e-mail address, count of e-mails sent to ietf@ietf.org, and the percentage of e-mails sent by each address to ietf@ietf.org.

addr-email-rank-ietf.dat, 392 rows
Lists ranked by e-mail volume

Each row contains the mailing list name, count of e-mails sent, and the percentage of e-mails sent to each list.

addr-list-email-rank.dat, 335 rows
Lists ranked by active address count

Each row contains the mailing list name and count of unique addresses that sent at least one e-mail.

addr-list-active-rank.dat, 335 rows
Lists by e-mail volume from addresses that only contribute to a single list

Each row contains the mailing list name and count of e-mails sent by addresses that only sent e-mail to that list.
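A table like this can be derived from the per-address, per-list breakdown by keeping only addresses that appear on exactly one list. The following is a minimal sketch under that reading; the input structure, function name and sample values are hypothetical.

```python
from collections import Counter


def single_list_volume(per_address_lists):
    """Sum e-mail counts per list over addresses active on exactly one list.

    `per_address_lists` maps address -> {list name: e-mail count}.
    """
    volume = Counter()
    for lists in per_address_lists.values():
        if len(lists) == 1:
            # Unpack the only (list name, count) pair for this address.
            ((list_name, count),) = lists.items()
            volume[list_name] += count
    return dict(volume)


# Hypothetical per-address breakdown (illustrative only).
breakdown = {
    "a@example.org": {"ietf": 4},
    "b@example.org": {"ietf": 2, "quic": 7},  # active on two lists, so excluded
    "c@example.org": {"quic": 1},
    "d@example.org": {"ietf": 3},
}

single_volume = single_list_volume(breakdown)
```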

addr-list-single-active-pct-volume.dat, 229 rows
List of addresses that only contributed to ietf@ietf.org

Each row contains the address and volume of e-mail sent by addresses that only contributed to ietf@ietf.org.

addr-ietf-single-volume-rank.dat, 69 rows
People-based datasets