Pandas

Python Pandas - Introduction

Pandas is an open-source Python library that provides high-performance data manipulation and analysis tools built on its powerful data structures. The name Pandas is derived from "panel data", an econometrics term for multidimensional data sets.

Using Pandas, we can accomplish five typical steps in the processing and analysis of data, regardless of the origin of the data: load, prepare, manipulate, model, and analyze.

Key Features of Pandas

  • Fast and efficient DataFrame object with default and customized indexing.
  • Tools for loading data into in-memory data objects from different file formats.
  • Data alignment and integrated handling of missing data.
  • Reshaping and pivoting of data sets.
  • Label-based slicing, indexing and subsetting of large data sets.
  • Columns from a data structure can be deleted or inserted.
  • Group by data for aggregation and transformations.
  • High performance merging and joining of data.
  • Time Series functionality.
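
A few of the features listed above can be sketched in a short script. This is only an illustration under stated assumptions: the tables, column names (region, units, manager) and values are made up, not taken from any real data set.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; names and values are invented for illustration.
sales = pd.DataFrame(
    {
        "region": ["north", "south", "north", "south"],
        "units": [10, np.nan, 7, 3],
    }
)
managers = pd.DataFrame(
    {"region": ["north", "south"], "manager": ["Ann", "Raj"]}
)

# Integrated handling of missing data: replace the NaN with 0.
sales["units"] = sales["units"].fillna(0)

# Group by a column and aggregate.
totals = sales.groupby("region")["units"].sum()

# Merge (join) two tables on a shared key column.
merged = sales.merge(managers, on="region", how="left")

# Insert a column, then delete it again.
merged["double_units"] = merged["units"] * 2
merged = merged.drop(columns=["double_units"])

# Label-based selection with .loc.
north_total = totals.loc["north"]
print(north_total)
```

Each step returns an ordinary Series or DataFrame, so the operations can be chained or inspected one at a time.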

The standard Python distribution does not come bundled with the Pandas module.

Pandas can be installed with pip:

    pip install pandas

Data Structures

Pandas deals with the following three data structures:

  • Series
  • DataFrame
  • Panel

These data structures are built on top of NumPy arrays, which makes them fast.

Series

A Series is a one-dimensional array-like structure with homogeneous data. For example, the following series is a collection of integers: 10, 23, 56, 78, 98, 12.

  • Homogeneous data
  • Size Immutable
  • Values of Data mutable
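
As a minimal sketch, the integer series above can be built with pd.Series. The second series and its index labels ("a", "b", "c") are assumptions added only to show customized indexing.

```python
import pandas as pd

# The example values from the text, with the default integer index 0..5.
s = pd.Series([10, 23, 56, 78, 98, 12])

# Homogeneous data: every value shares a single dtype.
print(s.dtype)

# Values are mutable: overwrite the first element by position.
s.iloc[0] = 11

# A customized index (hypothetical labels) allows label-based access.
labeled = pd.Series([10, 23, 56], index=["a", "b", "c"])
print(labeled["b"])
```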

DataFrame

A DataFrame is a two-dimensional, table-like structure with heterogeneous data: each column can hold a different data type. For example:

Name    Age   Gender   Rating
Steve   32    Male     3.45
Lia     28    Female   4.6
Vin     45    Male     3.9
Katie   38    Female   2.78

The data types of the four columns are as follows:

Column   Type
Name     String
Age      Integer
Gender   String
Rating   Float

  • Heterogeneous data
  • Size Mutable
  • Data Mutable
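
The table above can be reproduced as a DataFrame. The following is a sketch rather than the only way to build it; the extra Senior column is a made-up example used to show that the size can change.

```python
import pandas as pd

# The four-column table from the text, one list per column.
df = pd.DataFrame(
    {
        "Name": ["Steve", "Lia", "Vin", "Katie"],
        "Age": [32, 28, 45, 38],
        "Gender": ["Male", "Female", "Male", "Female"],
        "Rating": [3.45, 4.6, 3.9, 2.78],
    }
)

# Heterogeneous data: each column carries its own dtype.
print(df.dtypes)

# Data mutable: change a single cell by row label and column name.
df.loc[0, "Rating"] = 3.5

# Size mutable: add a (hypothetical) column, then remove it again.
df["Senior"] = df["Age"] > 40
df = df.drop(columns=["Senior"])
```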

Panel

A Panel is a three-dimensional data structure with heterogeneous data. It is hard to depict a panel graphically, but it can be pictured as a container of DataFrames. Note that Panel was deprecated in pandas 0.20 and removed in pandas 0.25, so it is not available in current releases.

  • Heterogeneous data
  • Size Mutable
  • Data Mutable
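
The "container of DataFrames" idea can be sketched without the Panel class itself, which recent pandas releases no longer provide. The example below is an assumption-laden illustration: it uses a plain dict of DataFrames and pd.concat to get a two-level row index, and the item names and values are invented.

```python
import pandas as pd

# Hypothetical data: two small DataFrames with the same columns.
item1 = pd.DataFrame({"a": [1, 2], "b": [3.0, 4.0]})
item2 = pd.DataFrame({"a": [5, 6], "b": [7.0, 8.0]})

# A plain dict acts as a container of DataFrames.
container = {"Item1": item1, "Item2": item2}

# Concatenating the dict yields one DataFrame whose outer index level
# identifies the original DataFrame and whose inner level is the row.
stacked = pd.concat(container, names=["item", "row"])

# Select everything that came from the second DataFrame.
print(stacked.loc["Item2"])
```

The outer index level plays the role of the third dimension, so a single value is addressed by item, row, and column.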