Ecommerce Purchases Exercise
In this exercise you will be given some fake data about purchases made through Amazon. Follow the directions and try your best to answer the questions and complete the tasks. Feel free to reference the solutions. Most of the tasks can be solved in more than one way, and for the most part the questions get progressively harder.
Please excuse anything that doesn't make "real-world" sense in the DataFrame; all of the data is fake and made up.
Also note that all of these questions can be answered with one line of code.
** Import pandas, read in the Ecommerce Purchases csv file, and store it in a DataFrame called df. **
import pandas as pd
df = pd.read_csv('../Inputs/Ecommerce Purchases')
** Check the head of the DataFrame. **
df.head()
** How many rows and columns are there? **
df.shape
df.info()
** What is the average Purchase Price? **
df['Purchase Price'].mean()
** What were the highest and lowest purchase prices? **
df['Purchase Price'].max()
df['Purchase Price'].min()
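The mean, maximum, and minimum can also be requested in a single call. The sketch below uses a small made-up Series in place of the exercise data; the name `prices` and its values are hypothetical.

```python
import pandas as pd

# Hypothetical toy prices, not taken from the exercise dataset.
prices = pd.Series([12.5, 99.99, 0.0, 45.25], name="Purchase Price")

# agg() with a list of function names returns one value per name.
summary = prices.agg(["min", "max", "mean"])
print(summary)
```

On the real data, the same idea would presumably be `df['Purchase Price'].agg(['min', 'max', 'mean'])`.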
** How many people have English 'en' as their Language of choice on the website? **
# Summing the boolean mask gives a single row count
(df['Language'] == 'en').sum()
** How many people have the job title of "Lawyer"? **
# Summing the boolean mask gives a single row count
(df['Job'] == 'Lawyer').sum()
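There are two common ways to count the rows that match a condition, and they do not always agree. Filtering and then calling `count()` reports the non-missing values in each column, while summing the boolean mask reports the number of matching rows. A minimal sketch on made-up data (the frame `toy` and its values are hypothetical):

```python
import pandas as pd

# Hypothetical toy data with one missing Job value.
toy = pd.DataFrame({
    "Language": ["en", "fr", "en", "de"],
    "Job": ["Lawyer", None, None, "Chef"],
})

# Filter, then count: one number per column, missing values excluded.
per_column = toy[toy["Language"] == "en"].count()

# Sum the mask: True counts as 1, so this is the number of matching rows.
n_en = (toy["Language"] == "en").sum()

print(per_column)
print(n_en)
```

Here the `Job` column reports fewer entries than there are English-language rows, because one of those rows has no job recorded.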
** How many people made the purchase during the AM and how many during the PM? **
*(Hint: Check out value_counts())*
df['AM or PM'].value_counts()
** What are the 5 most common Job Titles? **
df['Job'].value_counts().head()
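`value_counts()` sorts by frequency in descending order, which is why `head()` (five rows by default) returns the most common values. A small sketch with made-up job titles; the Series `jobs` is hypothetical:

```python
import pandas as pd

# Hypothetical toy job titles, not taken from the exercise dataset.
jobs = pd.Series(["Lawyer", "Chef", "Lawyer", "Pilot", "Chef", "Lawyer"])

counts = jobs.value_counts()                # most frequent first
top2 = counts.head(2)                       # the two most common titles
shares = jobs.value_counts(normalize=True)  # fractions instead of counts

print(top2)
print(shares)
```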
** Someone made a purchase that came from Lot: "90 WT" , what was the Purchase Price for this transaction? **
df[df['Lot']=='90 WT']['Purchase Price']
** What is the email of the person with the following Credit Card Number: 4926535242672853 **
df[df['Credit Card']==4926535242672853]['Email']
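The two lookups above filter rows and then pick a column. The same selection can be written as one `.loc` call, and `.iloc[0]` pulls out the bare value when exactly one row matches. This is a sketch on made-up data; the card numbers and email addresses are hypothetical.

```python
import pandas as pd

# Hypothetical toy data; the card numbers and emails are invented.
toy = pd.DataFrame({
    "Credit Card": [1111222233334444, 5555666677778888],
    "Email": ["a@example.com", "b@example.org"],
})

# Row condition and column label in a single .loc call.
match = toy.loc[toy["Credit Card"] == 1111222233334444, "Email"]

# match is still a Series; take the first (and only) element as a scalar.
email = match.iloc[0]
print(email)
```

The comparison uses an integer because the toy column holds integers. If the column were read in as text, the value on the right-hand side would need to be a string as well.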
** How many people have American Express as their Credit Card Provider and made a purchase above $95? **
# Combine both conditions with &, then sum the mask to count matching rows
((df['CC Provider'] == 'American Express') & (df['Purchase Price'] > 95)).sum()
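When combining conditions, each comparison needs its own parentheses, because `&` binds more tightly than `==` and `>` in Python. A minimal sketch on made-up data (the frame `toy` is hypothetical):

```python
import pandas as pd

# Hypothetical toy data, not taken from the exercise dataset.
toy = pd.DataFrame({
    "CC Provider": ["American Express", "Visa", "American Express", "American Express"],
    "Purchase Price": [96.0, 99.0, 95.0, 10.0],
})

# Element-wise AND of two boolean Series; note the parentheses.
mask = (toy["CC Provider"] == "American Express") & (toy["Purchase Price"] > 95)

n_matches = mask.sum()
print(n_matches)
```

Only the first row qualifies here: the Visa purchase fails the provider test, and a price of exactly 95 is not above 95.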
** Hard: How many people have a credit card that expires in 2025? **
# Dates look like MM/YY, so characters from position 3 onward are the year
(df['CC Exp Date'].str[3:] == '25').sum()
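The slice `[3:]` works because the expiry dates are assumed to be strings of the form `MM/YY`, so the year starts at position 3. The sketch below shows the slice with the vectorized `.str` accessor, alongside an `endswith` check that gives the same count. The Series `exp` and its values are hypothetical.

```python
import pandas as pd

# Hypothetical toy expiry dates, assumed to follow the MM/YY format.
exp = pd.Series(["02/20", "11/25", "08/25", "12/19"], name="CC Exp Date")

# Slice off the two-digit year and compare it with "25".
n_by_slice = (exp.str[3:] == "25").sum()

# Alternative: test the suffix directly.
n_by_suffix = exp.str.endswith("/25").sum()

print(n_by_slice, n_by_suffix)
```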
** Hard: What are the top 5 most popular email providers/hosts (e.g. gmail.com, yahoo.com)? **
# Take the part after '@' as the host, then count hosts and keep the top 5
df['Email'].str.split('@').str[1].value_counts().head()
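Extracting the host means splitting each address at `@` and keeping the second piece. A sketch on made-up addresses, using the vectorized `.str` methods rather than `apply`; the Series `emails` is hypothetical.

```python
import pandas as pd

# Hypothetical toy email addresses, not taken from the exercise dataset.
emails = pd.Series([
    "a@gmail.com", "b@yahoo.com", "c@gmail.com",
    "d@hotmail.com", "e@gmail.com", "f@yahoo.com",
])

# Split each address at "@" and keep the part after it.
hosts = emails.str.split("@").str[1]

# Count hosts and keep the two most common.
top_hosts = hosts.value_counts().head(2)
print(top_hosts)
```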