33.pandas dataframe

A Data frame is a two-dimensional data structure, i.e., data is aligned in a tabular fashion in rows and columns.

pandas.DataFrame

pandas.DataFrame( data, index, columns, dtype, copy)

Create DataFrame

A pandas DataFrame can be created using various inputs like: * Lists * dict * Series * Numpy ndarrays * Another DataFrame

Create an Empty DataFrame

A basic DataFrame, which can be created is an Empty Dataframe.

Example

import pandas as pd
df = pd.DataFrame()
print(df)
>>> Empty DataFrame
    Columns: []
    Index: []

Create a DataFrame from Lists

The DataFrame can be created using a single list or a list of lists.

Example

import pandas as pd
data = [1,2,3,4,5]
df = pd.DataFrame(data)
print(df)
>>>      0
    0    1
    1    2
    2    3
    3    4
    4    5

Example

import pandas as pd
data = [['Bhavya',10],['Manasa',12],['Nandhini',13],['Keerthi',14]]
df = pd.DataFrame(data,columns=['Name','level'])
print(df)
>>>     Name  level
  0    Bhavya     10
  1    Manasa     12
  2  Nandhini     13
  3   Keerthi     14

Example

import pandas as pd
data =  [['Bhavya',10],['Manasa',12],['Nandhini',13],['Keerthi',14]]
df = pd.DataFrame(data,columns=['Name','level'],dtype=float)
print(df)
>>>     Name     level
  0    Bhavya     10.0
  1    Manasa     12.0
  2  Nandhini     13.0
  3   Keerthi     14.0

The dtype parameter changes the type of level column to floating point.

Create a DataFrame from Dict of ndarrays / Lists

All the ndarrays must be of same length. If index is passed, then the length of the index should equal to the length of the arrays.
If no index is passed, then by default, index will be range(n), where n is the array length.

Example

import pandas as pd
data = {'Name':['Tom', 'Jack', 'Steve', 'Ricky'],'Age':[28,34,29,42]}
df = pd.DataFrame(data)
print(df)
>>>       Age      Name
    0     28        Tom
    1     34       Jack
    2     29      Steve
    3     42      Ricky

Observe the values 0,1,2,3. They are the default index assigned to each using the function range(n).

Example

import pandas as pd
data = {'Name':['Tom', 'Jack', 'Steve', 'Ricky'],'Age':[28,34,29,42]}
df = pd.DataFrame(data, index=['rank1','rank2','rank3','rank4'])
print(df)
>>>         Age    Name
    rank1    28      Tom
    rank2    34     Jack
    rank3    29    Steve
    rank4    42    Ricky

The index parameter assigns an index to each row.

Create a DataFrame from List of Dicts

List of Dictionaries can be passed as input data to create a DataFrame. The dictionary keys are by default taken as column names.

Example

import pandas as pd
data = [{'a': 1, 'b': 2},{'a': 5, 'b': 10, 'c': 20}]
df = pd.DataFrame(data)
print(df)
>>>     a    b      c
    0   1   2     NaN
    1   5   10   20.0

Example

import pandas as pd
data = [{'a': 1, 'b': 2},{'a': 5, 'b': 10, 'c': 20}]

#With two column indices, values same as dictionary keys
df1 = pd.DataFrame(data, index=['first', 'second'], columns=['a', 'b'])

#With two column indices with one index with other name
df2 = pd.DataFrame(data, index=['first', 'second'], columns=['a', 'b1'])
print(df1)
>>>          a  b
    first    1  2
    second   5  10
print(df2)
>>>          a  b1
    first    1  NaN
    second   5  NaN

Observe, df2 DataFrame is created with a column index other than the dictionary key; thus, appended the NaN’s in place. Whereas, df1 is created with column indices same as dictionary keys, so NaN’s appended.

Create a DataFrame from Dict of Series

Dictionary of Series can be passed to form a DataFrame. The resultant index is the union of all the series indexes passed.

Example

import pandas as pd

d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
  'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}

df = pd.DataFrame(d)
print(df)
>>>       one    two
    a     1.0    1
    b     2.0    2
    c     3.0    3
    d     NaN    4

Accesssing the elements from the DataFrame

columns

Column selection

import pandas as pd

d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
  'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}

df = pd.DataFrame(d)
print(df['one'])
>>> a     1.0
    b     2.0
    c     3.0
    d     NaN
Name: one, dtype: float64

Column Addition

mport pandas as pd

d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
  'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}

df = pd.DataFrame(d)
print ("Adding a new column by passing as Series:")
df['three']=pd.Series([10,20,30],index=['a','b','c'])
print(df)
>>>      one   two   three
    a    1.0    1    10.0
    b    2.0    2    20.0
    c    3.0    3    30.0
    d    NaN    4    NaN

Column Deletion

# Using the previous DataFrame, we will delete a column
# using del function
import pandas as pd

d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']), 
'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd']), 
'three' : pd.Series([10,20,30], index=['a','b','c'])}

df = pd.DataFrame(d)
print ("Our dataframe is:")
print df
>>> Our dataframe is:
    one   three  two
a     1.0    10.0   1
b     2.0    20.0   2
c     3.0    30.0   3
d     NaN     NaN   4

# using del function
print ("Deleting the first column using DEL function:")
del df['one']
print df
>>> Deleting the first column using DEL function:
    three    two
a     10.0     1
b     20.0     2
c     30.0     3
d     NaN      4


# using pop function
print ("Deleting another column using POP function:")
df.pop('two')
print df
>>> Deleting another column using POP function:
three
a  10.0
b  20.0
c  30.0
d  NaN

rows

Row selection

import pandas as pd

d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']), 
'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}

df = pd.DataFrame(d)
print df.loc['b']
>>> one 2.0
    two 2.0
    Name: b, dtype: float64

# select by integer location
print df.iloc[2]
>>> one   3.0
    two   3.0
    Name: c, dtype: float64

# Slice Rows
print df[2:4]
>>>  one  two
c  3.0    3
d  NaN    4

Row Addition

import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns = ['a','b'])
df2 = pd.DataFrame([[5, 6], [7, 8]], columns = ['a','b'])

df = df.append(df2)
print df
>>>    a  b
    0  1  2
    1  3  4
    0  5  6
    1  7  8

Row Deletion

import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns = ['a','b'])
df2 = pd.DataFrame([[5, 6], [7, 8]], columns = ['a','b'])

df = df.append(df2)

# Drop rows with label 0
df = df.drop(0)

print df
>>>   a b
    1 3 4
    1 7 8