Initialization

Approach

The notebook is organized as follows. The first section retrieves and displays articles translated from English to French, broken down by year of publication: 2016, 2017, and 2018. Throughout this section, I explain why I chose certain articles and give some analysis of what I observed. The second section contains my main analysis.

For this research project, I investigated two main questions. First, what sorts of articles were chosen to be translated into French, and is there any cultural bias present? Second, what did the contributions to the translated articles look like? By this I mean: what were the common kinds of contributions (content, minor grammar fixes, link fixes), and who tended to be the contributors? Were they people who were simply fluent in French, or people who also had an interest in the subject of the article? This second question is more difficult to answer than the first. I chose to look into several articles manually, so my sample size is extremely small. I also did not look at the home page of every single contributor to these articles. Therefore, it is difficult to draw a general and accurate conclusion about the contributors. Nevertheless, I did observe some interesting patterns in the kinds of contributions and contributors.

Reference used: https://paws-public.wmflabs.org/paws-public/User:Isaac_(WMF)/Content%20Translation%20Example.ipynb

2016 Article Choices

I chose the following three articles for a more in-depth analysis. I chose two articles about people because I was curious about the types of people whose articles are picked for translation. The Dinoire article had a moderately high human translation portion of 0.786, while the Koljonen article had a low human translation portion of 0.237, and I was interested in why there would be this disparity. I picked the SIAI S.8 article because it had a high human translation portion of 0.933 and was not about a person. Notes about each of these three articles are in the second section of this notebook.
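One way to make a selection like this reproducible is to sort the sample by human translation portion and take the lowest, middle, and highest entries. The sketch below is only an illustration and is not how the articles above were actually picked: the `(title, portion)` tuple layout and the `pick_spread` helper are assumptions, and only the three portions quoted in the text come from the notebook.

```python
# Hypothetical records: (title, human translation portion).
# Only the three portions below are quoted in the text; the layout is assumed.
articles = [
    ("Isabelle Dinoire", 0.786),
    ("Koljonen", 0.237),
    ("SIAI S.8", 0.933),
]


def pick_spread(records):
    """Return the records with the lowest, middle, and highest portion."""
    ordered = sorted(records, key=lambda record: record[1])
    return ordered[0], ordered[len(ordered) // 2], ordered[-1]


low, mid, high = pick_spread(articles)
print(low[0], mid[0], high[0])
```

On a full yearly sample of about 50 articles, the same helper would return one article from each end of the range plus the median one.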

We can see from this sample of 52 articles published in 2016 that the subjects are very diverse, ranging from grilled cheese to corruption in France. After opening several (between 10 and 15) articles at random, I noticed that they were all short, falling within 3,000 to 12,000 bytes in total length.

2017 Article Choices

As in 2016, I picked articles with different human translation portions: one article on a person, one political article, and one scientific article.

The types of articles translated in my sample of 50 articles from 2017 do not seem to differ significantly from the 2016 articles. Approximately 40% of the articles are about people.

2018 Article Choices

From a sample of 50 articles, I chose two about people and one on camouflage. This sample seems consistent with the samples from 2016 and 2017. Approximately 1/3 of the articles are about people. The rest of the articles vary widely in subject matter.
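The proportions quoted for each year come from reading through the article titles by hand. If the titles were labeled by category, the share could be computed as in the minimal sketch below. The labels and counts here are invented purely for illustration (they are not the notebook's data), and `category_share` is a hypothetical helper.

```python
def category_share(labels, category):
    """Fraction of labels equal to the given category (0.0 for an empty list)."""
    if not labels:
        return 0.0
    return sum(1 for label in labels if label == category) / len(labels)


# Illustrative hand labels for a 50-article sample; the counts are made up.
sample_labels = ["person"] * 20 + ["event"] * 5 + ["other"] * 25

people_share = category_share(sample_labels, "person")
print(f"{people_share:.0%} of the sample is about people")
```

With samples of only about 50 articles per year, a difference of a few articles shifts the percentage noticeably, so these shares are best read as rough estimates.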

Qualitative Analysis

I started my analysis on a very general level. I was curious to see what kind of content was translated. Some questions I asked: were more articles about people or about events translated? What kinds of people were more commonly translated, celebrities or historical figures? The approach I decided on was to look at 3 articles from each yearly list of 50-55 articles published from 2016 to 2018.

Articles Published in 2016

Articles Published in 2017

Articles Published in 2018

At a glance, it seems that articles about people make up about 30% of the translated content. There are also a few articles about airports. Overall, however, the translated content seems quite diverse.

Discussion

To answer the first question, articles about individuals make up by far the largest category. Articles about events go as far back as the 1910s and are as recent as the 2016 Paralympics. I did not come across any article about an event earlier than the 1900s. Although at first glance these individuals come from different backgrounds, I note that the majority of the articles are about women. Additionally, the people presented are not extremely famous, for example Edith Finch Russell rather than Bertrand Russell. However, this is probably because the document corpus for this research project covers 2016 onward, so articles on famous people, such as Beyonce, would logically already have been published and translated. There does seem to be some cultural bias in what is translated. For example, articles about people who are French or European appear more frequently, such as Isabelle Dinoire or Cristian Popescu Piedone, whose article I did not look at in depth. Additionally, note the difference in length and quality between the original English article on Trump's first 100 days in office and the translated French article. The translated article is significantly shorter, which may suggest less interest in the American presidency among French speakers than among English speakers.

Now to answer my second question of what the contributions look like for the translated articles. For the most part, the content does not differ widely from the original English articles. There are sometimes minor organizational differences, as in "Inviscid Flow" and "Mary Tsingou". It is, however, interesting to note that tables and lists are sources of divergence between the original and translated articles. Of the 4 articles I looked up that had tables or lists, all of the translated versions showed some difference in the tables. In the articles about the two athletes, Koljonen and Holt, the French articles removed and added a table, respectively. Additionally, the French versions of biographical articles, such as the one for Finch Russell, request a photo of the person, whereas the English articles do not.

Regarding contributors: on average, the French articles had significantly fewer contributors than the English articles. The translated French articles had on average 7 contributors, whereas the English articles had upwards of 20. I also observed that it is common for a translated French article to have one author who translated most of the content in a single contribution, with later editors adding categories or changing wording. Even the translated "Inviscid Flow" article, which at first seemed like an outlier because content was added steadily over multiple contributions, fit this pattern: I realized that those contributions were all from one user. Lastly, I did not observe any correlation between user attributes and the kinds of edits users made or the subjects of the articles they contributed to. People with interests ranging from mathematics to geography contributed edits to articles about women poets. Article translation, to French at least, did not seem to draw newer users. Of all the contributors I happened to click on, the "newest" user I saw joined in 2017.
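The contributor observations here were made by reading revision histories by hand. As a rough sketch of how the same two quantities (number of distinct contributors, and how much of the added content comes from a single user) could be computed, assume each revision is a dictionary with a `user` name and a `bytes_added` value. That record layout, the helper names, and the sample data are all hypothetical and are not taken from the notebook or from any particular API.

```python
from collections import Counter


def contributor_count(revisions):
    """Number of distinct users who made at least one revision."""
    return len({rev["user"] for rev in revisions})


def dominant_contributor(revisions):
    """Return (user, share) for the user who added the most bytes.

    Only positive byte changes count as added content. Returns (None, 0.0)
    if no revision added any bytes.
    """
    added = Counter()
    for rev in revisions:
        if rev["bytes_added"] > 0:
            added[rev["user"]] += rev["bytes_added"]
    total = sum(added.values())
    if total == 0:
        return None, 0.0
    user, amount = added.most_common(1)[0]
    return user, amount / total


# Made-up revision history for one translated article.
revisions = [
    {"user": "UserA", "bytes_added": 4000},  # initial translation
    {"user": "UserA", "bytes_added": 1500},  # more translated content
    {"user": "UserB", "bytes_added": 60},    # categories added
    {"user": "UserC", "bytes_added": -20},   # wording tightened
]

n_contributors = contributor_count(revisions)
top_user, top_share = dominant_contributor(revisions)
print(n_contributors, top_user, round(top_share, 3))
```

Grouping by user before looking at the edit timeline is what exposes a case like "Inviscid Flow", where many steady contributions turn out to belong to one person.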

Conclusion

I learned that some cultural bias influences users when they decide which articles to translate into French, and this in turn affects the quality of the translation. Of the articles translated, short biographical articles about lesser-known people form the largest class. Compared to the original English articles, translated articles on average have far fewer contributors. For the most part, one contributor did the bulk of the translation at the beginning, and later contributors edited the wording, finished the rest of the translation, or added categories. Except for the article "First 100 Days of Donald Trump's Presidency", the content of the original article was not greatly changed in the translated articles. However, liberties were taken with tables and lists in the translated articles.

For future work, the kinds of articles selected for translation could be investigated further. In particular, what does the corpus of articles translated from English to Mandarin look like? Perhaps biographical articles would still form the largest class, but would there be differences in who exactly was chosen? I am also curious whether editors would make different organizational choices for a language like Mandarin. Would the structure of the article be different, or would it follow the outline of the original article? Even for French, some structurally different choices, albeit minor, were made in the translated articles. However, it is unclear whether this was due to the contributor's personal style or to the language itself.