The complete paper is included as an attachment and can be found here: Cultural Analysis Incubator and Chinese Initiated Projects



Data Source

All ASF projects have publicly archived mailing lists. The Apache mailing lists are a permanent, searchable, publicly available archive. They are a legacy communication medium dating from the creation of the ASF and now form an integral part of every ASF project. A mailing list is the heart of a project: a place where people interact, communicate, collaborate, argue, agree and disagree. This makes it an appropriate place to mine data for cultural indicators.

Research Tools

The research will be carried out using the following tools, formulas and indicators.

Apache Kibble

Apache Kibble is a suite of tools for collecting, aggregating and visualising data and activity in software projects.

The following Kibble indicators will be used:

Pony Factor

The Pony Factor (PF) measures the diversity of a project based on the contributions from individual contributors. It can be defined as:

  • “The lowest number of contributors whose total contribution makes up the majority” of whatever is being measured (e.g. lines of code written, number of messages sent, etc.).

A higher Pony Factor means that a project is better able to survive if one or more of its core contributors leave.

NOTE: Pony Factor includes all contributions from contributors whether they are still active or not.
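The definition above can be sketched in a few lines of Python. This is a minimal illustration rather than Kibble's own implementation: the function name `pony_factor` and the input shape (one contributor name per contribution) are assumptions, and "majority" is taken to mean strictly more than half of the total.

```python
from collections import Counter


def pony_factor(contributions):
    """Return the smallest number of contributors whose combined
    contributions exceed half of all contributions.

    `contributions` holds one contributor name per contribution
    (hypothetical input shape, e.g. one entry per message sent).
    """
    counts = Counter(contributions)
    total = sum(counts.values())
    running = 0
    # Walk contributors from most to least active.
    for rank, (_, n) in enumerate(counts.most_common(), start=1):
        running += n
        if running * 2 > total:  # assumption: strict majority
            return rank
    return 0  # no contributions at all


messages = ["alice"] * 5 + ["bob"] * 3 + ["carol"] * 2
print(pony_factor(messages))
```

In this example alice alone accounts for exactly half of the messages, which is not a majority, so two contributors are needed.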

Augmented Pony Factor

The Augmented Pony Factor (APF) is an adjustment to the standard Pony Factor calculation where contributions from contributors that are no longer active are not included.

NOTE: The Augmented Pony Factor will not be used as part of the assessment but a description of it has been included here for completeness.
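For completeness, one possible reading of the Augmented Pony Factor is sketched below. The source does not say how "no longer active" is decided, so the `active_days` window, the function name and the `(contributor, date)` input shape are all assumptions made for illustration.

```python
from collections import Counter
from datetime import date, timedelta


def augmented_pony_factor(events, today, active_days=365):
    """Pony Factor restricted to contributors who are still active.

    `events` is a list of (contributor, date) pairs. A contributor
    counts as active if their latest contribution falls within
    `active_days` of `today` (assumed definition of "active").
    """
    cutoff = today - timedelta(days=active_days)
    last_seen = {}
    for who, when in events:
        if who not in last_seen or when > last_seen[who]:
            last_seen[who] = when
    active = {who for who, when in last_seen.items() if when >= cutoff}

    # Count only contributions made by active contributors.
    counts = Counter(who for who, _ in events if who in active)
    total = sum(counts.values())
    running = 0
    for rank, (_, n) in enumerate(counts.most_common(), start=1):
        running += n
        if running * 2 > total:
            return rank
    return 0


events = (
    [("alice", date(2015, 1, 1))] * 6    # inactive, ignored
    + [("bob", date(2020, 1, 1))] * 2
    + [("carol", date(2020, 2, 1))] * 2
)
print(augmented_pony_factor(events, today=date(2020, 6, 1)))
```

With alice's historic contributions excluded, bob and carol each hold half of the remaining total, so both are needed to form a majority.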

Meta Pony Factor

The Meta Pony Factor calculation is a work in progress. It attempts to measure the affiliation of a contributor based on the email address linked to the contribution. If developed further, it could help identify the distinct organisations that are contributing.
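Since the calculation is still a work in progress, the following is only a guess at how it might look: the email domain is used as a stand-in for the contributor's organisation, and the Pony Factor is then computed over organisations instead of individuals. The names `affiliation` and `meta_pony_factor` are hypothetical.

```python
from collections import Counter


def affiliation(email):
    """Use the part after the last '@' as the organisation (assumption)."""
    return email.rsplit("@", 1)[-1].lower()


def meta_pony_factor(emails):
    """Smallest number of organisations whose combined contributions
    exceed half of all contributions. `emails` holds one address
    per contribution.
    """
    counts = Counter(affiliation(e) for e in emails)
    total = sum(counts.values())
    running = 0
    for rank, (_, n) in enumerate(counts.most_common(), start=1):
        running += n
        if running * 2 > total:
            return rank
    return 0


emails = (
    ["a@example.com"] * 3
    + ["b@example.com"] * 3
    + ["c@example.org"] * 2
    + ["d@example.net"] * 2
)
print(meta_pony_factor(emails))
```

A domain is a rough proxy: contributors using personal or project-issued addresses would be grouped incorrectly, which is one reason such a measure would need further development.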

Sentiment Analysis

Pang and Lee (2008) defined “sentiment analysis” as

“a kind of text mining, which is used to predict human mind, specifically the emotional state of a person by extracting specific emotional expressions from the text”

This means that it can be used as an indicator to gauge people’s opinions and reactions to certain ideas. Data is collected in the form of text and an algorithm is used to identify keywords associated with an emotion. Any communication can be linked to several emotions, so weightings are used to indicate the strength of each sentiment.
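The keyword-and-weighting approach described above could look roughly like the sketch below. The lexicon, its emotion labels and its weights are invented for illustration only; a real analysis would rely on an established sentiment lexicon or service.

```python
import re
from collections import defaultdict

# Hypothetical lexicon: keyword -> (emotion, weight).
LEXICON = {
    "thanks": ("joy", 1.0),
    "great": ("joy", 0.8),
    "sorry": ("sadness", 0.6),
    "broken": ("anger", 0.5),
    "annoying": ("anger", 0.9),
}


def emotion_scores(text):
    """Return the share of each emotion found in `text`.

    Each matched keyword adds its weight to its emotion; the totals
    are then normalised so that the shares sum to 1.
    """
    scores = defaultdict(float)
    for word in re.findall(r"[a-z']+", text.lower()):
        if word in LEXICON:
            emotion, weight = LEXICON[word]
            scores[emotion] += weight
    total = sum(scores.values())
    if not total:
        return {}  # no emotional keywords found
    return {emotion: score / total for emotion, score in scores.items()}


print(emotion_scores("Thanks, great patch! Sorry the build is broken."))
```

A single message here is linked to three emotions at once, with joy carrying the largest share.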

Key Phrase Extraction

Key Phrase Extraction (KPE) is a method for extracting the key phrases or words that summarise the main ideas or themes of a document. It has been used successfully for indexing journals and online content, but in this paper it will be used to extract any text that could indicate cultural ideas or language.
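As a rough illustration of the idea, the sketch below treats runs of words between stopwords and punctuation as candidate phrases and ranks them by frequency. This is a deliberately simple stand-in, not the extraction method used in the paper; the stopword list and the function name `key_phrases` are assumptions.

```python
import re
from collections import Counter

# Small illustrative stopword list (assumption, far from complete).
STOPWORDS = {
    "a", "an", "and", "the", "is", "are", "of", "to", "in",
    "for", "on", "it", "this", "that", "be", "was", "with",
}


def key_phrases(text, top_n=5):
    """Return the `top_n` most frequent candidate phrases in `text`."""
    phrases = []
    # Punctuation and digits end a candidate phrase.
    for fragment in re.split(r"[^a-z\s]+", text.lower()):
        current = []
        for word in fragment.split():
            if word in STOPWORDS or len(word) < 2:
                if current:
                    phrases.append(" ".join(current))
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(" ".join(current))
    return [phrase for phrase, _ in Counter(phrases).most_common(top_n)]


text = (
    "The release vote is open. Please join the release vote. "
    "The release vote is the community decision."
)
print(key_phrases(text, top_n=1))
```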

Contributor Retention

Contributor retention measures how successful a project is at attracting and retaining contributors. It can be broken down into the following areas:

  • Active Contributors:

    • How many contributors are active within the project

    • Whether contributors contribute regularly and remain active over a longer timespan

  • Retained Contributors:

    • How long a contributor has been contributing

    • The longer a contributor has been retained the more successful a project is at retaining them

  • Contributors that have Left:

    • How many contributors are leaving

  • Past Contributors that have Returned:

    • How many contributors have contributed in the past and have returned to rejoin the community
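The four areas above can be derived from a list of dated contributions, as in the sketch below. The thresholds are assumptions chosen for illustration: a contributor is treated as active if they contributed within `active_days` of `today`, as having left otherwise, and as returned if they are active now after a gap longer than `gap_days` between two consecutive contributions. The function name and input shape are also hypothetical.

```python
from datetime import date, timedelta


def retention_report(events, today, active_days=90, gap_days=180):
    """Classify contributors from a list of (contributor, date) pairs."""
    by_person = {}
    for who, when in events:
        by_person.setdefault(who, []).append(when)

    cutoff = today - timedelta(days=active_days)
    report = {"active": set(), "left": set(), "returned": set(),
              "tenure_days": {}}
    for who, dates in by_person.items():
        dates.sort()
        # Retention: days between first and last contribution.
        report["tenure_days"][who] = (dates[-1] - dates[0]).days
        if dates[-1] >= cutoff:
            report["active"].add(who)
            gaps = ((b - a).days for a, b in zip(dates, dates[1:]))
            if any(gap > gap_days for gap in gaps):
                report["returned"].add(who)
        else:
            report["left"].add(who)
    return report


events = [
    ("alice", date(2019, 1, 10)), ("alice", date(2020, 5, 1)),
    ("bob", date(2018, 3, 1)), ("bob", date(2018, 6, 1)),
    ("carol", date(2020, 4, 1)), ("carol", date(2020, 5, 20)),
]
report = retention_report(events, today=date(2020, 6, 1))
print(report["active"], report["left"], report["returned"])
```

In this sample alice is active and has returned after a long absence, bob has left, and carol is active without any long gap.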

Data Selection

