Pandas Data Types

This accompanies the PB Python article on pandas data types.

Use df.info() and df.dtypes to look at the types that pandas automatically infers from the data.
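A minimal sketch of that inspection step. The sample values below are made up for illustration; only the column names (Customer Number, 2016, 2017, Jan Units, Active) come from the text.

```python
import pandas as pd

# Hypothetical sample rows; the real data set is not reproduced here.
df = pd.DataFrame({
    "Customer Number": [10002.0, 552278.0],
    "2016": ["$125,000.00", "$920,000.00"],
    "2017": ["$162,500.00", "$1,012,000.00"],
    "Jan Units": ["500", "Closed"],
    "Active": ["Y", "N"],
})

# One inferred type per column
print(df.dtypes)

# Types plus non-null counts and memory usage
df.info()
```

The currency, units, and active columns are not numeric at this point, which is what the rest of the walkthrough addresses.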


Since the 2016 and 2017 columns were read in as objects, adding their values results in string concatenation, not numerical addition.
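To illustrate the concatenation, here is a small sketch with a single made-up row:

```python
import pandas as pd

# Hypothetical values stored as text, as they would be after a plain read
df = pd.DataFrame({"2016": ["$125,000.00"], "2017": ["$162,500.00"]})

# "+" on two text columns joins the strings instead of summing amounts
total = df["2016"] + df["2017"]
print(total.iloc[0])
```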

The simplest way to convert to a type is using astype.

We can apply it to the customer number first.

Calling astype returns a converted copy; it does not alter the original dataframe.

Assign the new integer customer number back to the original frame and check the type
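A short sketch of both steps, assuming the customer number was read in as a float (the values are made up):

```python
import pandas as pd

# Hypothetical customer numbers stored as floats
df = pd.DataFrame({"Customer Number": [10002.0, 552278.0]})

# astype returns a new Series; the dataframe column is untouched
converted = df["Customer Number"].astype("int64")
dtype_before = df["Customer Number"].dtype

# Assign the result back to make the change stick, then check the type
df["Customer Number"] = df["Customer Number"].astype("int64")
dtype_after = df["Customer Number"].dtype

print(dtype_before, dtype_after)
```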

The data all looks good for the Customer Number.

If we try to convert the Jan Units column, we will get an error.

In a similar manner we get an error if we try to convert the sales column
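The two failures can be reproduced with a sketch like the one below. The values are made up; the assumption is that Jan Units contains a non-numeric entry and that the sales column holds currency-formatted text.

```python
import pandas as pd

# Hypothetical data held as object dtype, as the text describes
df = pd.DataFrame(
    {
        "Jan Units": ["500", "700", "Closed"],
        "2016": ["$125,000.00", "$920,000.00", "$50,000.00"],
    },
    dtype=object,
)

errors = {}
for column, target in [("Jan Units", "int64"), ("2016", "float64")]:
    try:
        df[column].astype(target)
    except ValueError as exc:
        # astype cannot parse "Closed" or "$125,000.00" as a number
        errors[column] = str(exc)
        print(f"{column}: {exc}")
```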

We can try to use astype with a bool type, but that does not give the expected results.
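A sketch of why the result is surprising, assuming the active column holds "Y" and "N" strings stored as objects: every non-empty string is truthy, so "N" also becomes True.

```python
import pandas as pd

# Hypothetical Y/N flags held as object dtype
df = pd.DataFrame({"Active": pd.Series(["Y", "N", "Y"], dtype=object)})

# Non-empty strings are all truthy, so this does not separate Y from N
as_bool = df["Active"].astype("bool")
print(as_bool.tolist())
```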

In order to convert the currency and percentages, we need to use custom functions

Use apply to convert the 2016 and 2017 columns to floating point numbers
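One possible custom function, shown as a sketch. The name convert_currency and the sample values are illustrative, and the function assumes every value is a string such as "$125,000.00".

```python
import pandas as pd


def convert_currency(value):
    """Strip the dollar sign and thousands separators, then return a float."""
    return float(value.replace("$", "").replace(",", ""))


# Hypothetical currency strings
df = pd.DataFrame({
    "2016": ["$125,000.00", "$920,000.00"],
    "2017": ["$162,500.00", "$1,012,000.00"],
})

# Apply the function to each value and assign the result back
for column in ["2016", "2017"]:
    df[column] = df[column].apply(convert_currency)

print(df.dtypes)
```

Once both columns are floats, adding them gives numerical addition rather than concatenation.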

We could use a lambda function as well but it may be more difficult for new users to understand
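For comparison, the same currency cleanup written as a lambda might look like this sketch (values are made up). It is compact, but the chained calls are harder to read than a named function.

```python
import pandas as pd

# Hypothetical currency strings
df = pd.DataFrame({"2016": ["$125,000.00", "$920,000.00"]})

# Strip the symbols inline, then cast the cleaned strings to float
df["2016"] = (
    df["2016"]
    .apply(lambda x: x.replace("$", "").replace(",", ""))
    .astype("float64")
)
print(df["2016"].tolist())
```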

Use a lambda function to convert the percentage strings to numbers
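A sketch of the percentage conversion. The column name Percent Growth and its values are assumptions for illustration; the lambda strips the percent sign and scales to a fraction.

```python
import pandas as pd

# Hypothetical column name and values
df = pd.DataFrame({"Percent Growth": ["30.00%", "10.00%", "25.00%"]})

# "30.00%" becomes 0.30
df["Percent Growth"] = df["Percent Growth"].apply(
    lambda x: float(x.replace("%", "")) / 100
)
print(df["Percent Growth"].tolist())
```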

pd.to_numeric is another option for handling column conversions when invalid values are included

Make sure to populate the original column of data
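A sketch of that approach, assuming Jan Units contains a stray text entry (values are made up). With errors="coerce", unparseable values become NaN; filling them with 0 is one choice among several and should match what a missing value means in the data.

```python
import pandas as pd

# Hypothetical units column with one invalid entry
df = pd.DataFrame({"Jan Units": ["500", "700", "Closed"]})

# Coerce invalid values to NaN, replace them with 0, and assign back
df["Jan Units"] = pd.to_numeric(df["Jan Units"], errors="coerce").fillna(0)
print(df["Jan Units"].tolist())
```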

pd.to_datetime is very useful for working with date conversions
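As a sketch, pd.to_datetime can assemble a date from separate month, day, and year columns. The column names Month, Day, Year, and Start_Date, along with the values, are assumptions for illustration.

```python
import pandas as pd

# Hypothetical date parts stored in separate columns
df = pd.DataFrame({
    "Month": [1, 6, 3],
    "Day": [10, 15, 29],
    "Year": [2015, 2014, 2016],
})

# to_datetime recognises year/month/day column names and builds a datetime
df["Start_Date"] = pd.to_datetime(df[["Month", "Day", "Year"]])
print(df["Start_Date"])
```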

Use np.where to convert the active column to a boolean
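A sketch of the np.where conversion, assuming the active column holds "Y" and "N" strings:

```python
import numpy as np
import pandas as pd

# Hypothetical Y/N flags
df = pd.DataFrame({"Active": ["Y", "N", "Y"]})

# True where the flag is "Y", False everywhere else
df["Active"] = np.where(df["Active"] == "Y", True, False)
print(df["Active"].tolist())
```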

Many of the conversions shown above can be applied while reading in the data, using the dtype or converters arguments of functions such as pd.read_csv.
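A sketch of converting at read time. The inline CSV text stands in for the real file, and the helper name convert_currency is illustrative. A column should appear in either dtype or converters, not both.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for the real data file
csv_text = """Customer Number,2016,Active
10002,"$125,000.00",Y
552278,"$920,000.00",N
"""


def convert_currency(value):
    """Strip the dollar sign and thousands separators, then return a float."""
    return float(value.replace("$", "").replace(",", ""))


df = pd.read_csv(
    io.StringIO(csv_text),
    dtype={"Customer Number": "int64"},  # simple casts go in dtype
    converters={
        "2016": convert_currency,        # custom parsing goes in converters
        "Active": lambda x: x == "Y",
    },
)
print(df.dtypes)
```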