## Time spent on a question (can be useful for estimating worker ability)

Let's see how many judgments we have per unit
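A minimal sketch of this count, on toy data. The column names `_unit_id` and `_worker_id` are assumptions about how the dataset is laid out (one row per judgment), not names confirmed by this notebook.

```python
import pandas as pd

# Toy data: one row per judgment. '_unit_id' and '_worker_id' are assumed names.
df = pd.DataFrame({
    "_unit_id": [1, 1, 1, 2, 3, 3],
    "_worker_id": [10, 11, 12, 10, 11, 12],
})

# Number of judgments (rows) collected for each unit.
judgments_per_unit = df.groupby("_unit_id").size()
print(judgments_per_unit)
```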

Let's remove the units that have only one judgment
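One possible way to do the filtering, assuming again a `_unit_id` column (an assumed name) and a hypothetical `answer` column: attach the per-unit count to every row and keep only the rows whose unit has more than one judgment.

```python
import pandas as pd

# Toy data: unit 2 has a single judgment. Column names are assumptions.
df = pd.DataFrame({
    "_unit_id": [1, 1, 1, 2, 3, 3],
    "answer": ["a", "b", "a", "c", "d", "d"],
})

# Broadcast the group size back to every row of the group.
n_judgments = df.groupby("_unit_id")["_unit_id"].transform("size")

# Keep only rows belonging to units with at least two judgments.
df_multi = df[n_judgments > 1]
print(df_multi)
```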

1. Create a column with the time spent on each judgment (use pd.to_datetime to parse the timestamps)
2. Compute the average time spent per worker
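The two steps above can be sketched as follows. The timestamp columns `_started_at` and `_created_at` and the worker column `_worker_id` are assumed names; substitute whatever start and submission columns the dataset actually has.

```python
import pandas as pd

# Toy data; all column names here are assumptions about the dataset.
df = pd.DataFrame({
    "_worker_id": [10, 10, 11],
    "_started_at": ["2018-01-01 10:00:00", "2018-01-01 10:05:00", "2018-01-01 11:00:00"],
    "_created_at": ["2018-01-01 10:00:30", "2018-01-01 10:06:30", "2018-01-01 11:01:00"],
})

# Step 1: parse the timestamps and store the elapsed time in seconds.
started = pd.to_datetime(df["_started_at"])
submitted = pd.to_datetime(df["_created_at"])
df["time_spent"] = (submitted - started).dt.total_seconds()

# Step 2: average time spent per worker.
avg_time_per_worker = df.groupby("_worker_id")["time_spent"].mean()
print(avg_time_per_worker)
```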

# Basic aggregation

## Quantitative variables

If we are also doing a per-worker analysis, we can compute the same aggregate values grouped by worker instead of by unit
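As an illustration, here is a small sketch that aggregates a numeric column first per unit and then per worker. The column `rating` is hypothetical, and `_unit_id` / `_worker_id` are assumed names.

```python
import pandas as pd

# Toy data; 'rating' is a hypothetical quantitative variable.
df = pd.DataFrame({
    "_unit_id": [1, 1, 2, 2],
    "_worker_id": [10, 11, 10, 11],
    "rating": [4, 2, 5, 3],
})

# Per-unit summary of the quantitative variable.
per_unit = df.groupby("_unit_id")["rating"].agg(["mean", "median", "std"])

# The same idea per worker, e.g. to see whether a worker rates high or low on average.
per_worker = df.groupby("_worker_id")["rating"].mean()

print(per_unit)
print(per_worker)
```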

## Categorical variables

Now we can't apply numeric aggregations such as the mean, because the variable is categorical

Let's explore what this column contains and decide what to do

The majority vote of an array is simply its mode, that is, its most frequent value
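A tiny illustration on a hand-made array of votes (the values are invented for the example). Note that `Series.mode()` returns a Series, because several values can be tied for most frequent.

```python
import pandas as pd

# Invented votes for a single unit.
votes = pd.Series(["yes", "no", "yes", "yes", "no"])

# mode() returns a Series with the most frequent value(s).
majority = votes.mode()
print(majority)
```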

How is the variable distributed?
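One way to look at the distribution is `value_counts`, shown here on a hypothetical `answer` column with invented values.

```python
import pandas as pd

# Toy data; 'answer' is a hypothetical categorical column.
df = pd.DataFrame({"answer": ["yes", "no", "yes", "yes", "unsure"]})

# Absolute frequency of each category.
counts = df["answer"].value_counts()

# Relative frequency (shares summing to 1).
shares = df["answer"].value_counts(normalize=True)

print(counts)
print(shares)
```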

Let's compute the majority vote for each unit

Sometimes the mode returns more than one value (a tie). In that case, let's take the first one (a better way would be to pick one at random)
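A sketch of the per-unit majority vote with the "take the first" tie-break. The column names `_unit_id` and `answer` are assumptions, and `majority_vote` is a helper introduced only for this example.

```python
import pandas as pd

# Toy data: unit 2 is a tie between 'no' and 'yes'. Column names are assumptions.
df = pd.DataFrame({
    "_unit_id": [1, 1, 1, 2, 2],
    "answer": ["yes", "no", "yes", "no", "yes"],
})

def majority_vote(values):
    """Return the most frequent value; on a tie, return the first mode."""
    # mode() returns the tied values in sorted order, so 'first' here means
    # the smallest value, not the first one seen in the data.
    return values.mode().iloc[0]

majority = df.groupby("_unit_id")["answer"].agg(majority_vote)
print(majority)
```

For a random tie-break, `values.mode().sample(1).iloc[0]` could replace the last line of the helper, at the cost of results that change between runs unless a `random_state` is fixed.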

# Weighted measures

## Weighted majority voting

Now, for each unit, we need to find the category with the highest trust score
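A minimal sketch, assuming that the "trust score" of a category is the sum of the `_trust` values of the workers who chose it (this interpretation, and the column names `_unit_id` and `answer`, are assumptions).

```python
import pandas as pd

# Toy data. In unit 1 most workers say 'no', but the single 'yes' vote
# comes from a more trusted worker.
df = pd.DataFrame({
    "_unit_id": [1, 1, 1, 2, 2],
    "answer": ["yes", "no", "no", "yes", "no"],
    "_trust": [0.9, 0.3, 0.4, 0.5, 0.8],
})

# Total trust behind each (unit, category) pair.
scores = df.groupby(["_unit_id", "answer"], as_index=False)["_trust"].sum()

# For each unit, keep the row with the highest total trust.
best_rows = scores.loc[scores.groupby("_unit_id")["_trust"].idxmax()]
weighted_majority = best_rows.set_index("_unit_id")["answer"]
print(weighted_majority)
```

On this toy data, unit 1 resolves to 'yes' even though a plain majority vote would give 'no', which is the point of weighting by trust.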

# Free text

Now we analyse the case in which we have free text

We can't use weighted majority voting here! We first need to assign a score to these values.

## Exercise

• Create a function that assigns a score to each value of the column 'explanation_0' (for example the text length, len(text), or whether it contains some words from a list, str in value); see https://pandas.pydata.org/pandas-docs/stable/text.html for reference
• Create a column with this score
• Generate a weighted mean for it (using '_trust')
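One possible approach to the exercise, offered as a sketch rather than the intended solution. It scores each explanation by its length and computes a trust-weighted mean per unit. The grouping column `_unit_id`, the helper `text_score`, and the new column names are assumptions; only 'explanation_0' and '_trust' come from the exercise.

```python
import pandas as pd

# Toy data; '_unit_id' is an assumed grouping column.
df = pd.DataFrame({
    "_unit_id": [1, 1, 2],
    "explanation_0": ["good", "very clear answer", None],
    "_trust": [0.5, 1.0, 0.8],
})

def text_score(text):
    """Hypothetical score: length of the text, 0 for missing values."""
    return len(text) if isinstance(text, str) else 0

# Column holding the score of each explanation.
df["explanation_score"] = df["explanation_0"].apply(text_score)

# Weighted mean per unit: sum(score * trust) / sum(trust).
df["weighted_score"] = df["explanation_score"] * df["_trust"]
sums = df.groupby("_unit_id")[["weighted_score", "_trust"]].sum()
weighted_mean = sums["weighted_score"] / sums["_trust"]
print(weighted_mean)
```

A keyword-based score would follow the same pattern, with `text_score` replaced by a check such as whether any word from a list appears in the text.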