Pandas
Python Pandas - Introduction
Pandas is an open-source Python library that provides high-performance data manipulation and analysis tools built on its powerful data structures. The name Pandas is derived from "panel data", an econometrics term for multidimensional data sets.
Using Pandas, we can accomplish five typical steps in the processing and analysis of data, regardless of its origin: load, prepare, manipulate, model, and analyze.
Key Features of Pandas
- Fast and efficient DataFrame object with default and customized indexing.
- Tools for loading data into in-memory data objects from different file formats.
- Data alignment and integrated handling of missing data.
- Reshaping and pivoting of data sets.
- Label-based slicing, indexing and subsetting of large data sets.
- Columns from a data structure can be deleted or inserted.
- Grouping of data for aggregation and transformation.
- High-performance merging and joining of data.
- Time Series functionality.
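A few of the features listed above can be sketched in a handful of lines. This is a minimal illustration, not a complete tour: the `sales` and `managers` tables, their column names, and their values are hypothetical and exist only for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; names and values are illustrative only.
sales = pd.DataFrame(
    {
        "region": ["North", "South", "North", "South"],
        "units": [10, np.nan, 7, 3],
    }
)

# Integrated handling of missing data: replace the missing value with 0.
sales["units"] = sales["units"].fillna(0)

# Columns can be inserted and deleted.
sales["double_units"] = sales["units"] * 2
sales = sales.drop(columns="double_units")

# Group by a column and aggregate.
totals = sales.groupby("region")["units"].sum()

# Merge with a second table on a shared key column.
managers = pd.DataFrame({"region": ["North", "South"], "manager": ["Ann", "Bob"]})
merged = sales.merge(managers, on="region", how="left")

# Data alignment: values are matched by index label, not by position.
a = pd.Series([1, 2], index=["x", "y"])
b = pd.Series([10, 20], index=["y", "z"])
aligned = a + b  # only "y" exists in both; "x" and "z" become NaN

print(totals)
print(merged)
print(aligned)
```

Note that the addition of `a` and `b` pairs values by label, so labels present in only one Series produce missing values rather than an error.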
The standard Python distribution does not come bundled with the Pandas module.
Pandas can be installed with pip: `pip install pandas`
Data Structures
Pandas deals with the following three data structures:
- Series
- DataFrame
- Panel
These data structures are built on top of NumPy arrays, so most operations run as vectorized array code rather than element-by-element Python loops.
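The relationship to NumPy can be checked directly. The short sketch below, using illustrative values, extracts the underlying array from a Series and applies an operation to every element at once.

```python
import numpy as np
import pandas as pd

s = pd.Series([10, 23, 56])

# The values of a Series can be obtained as a NumPy array.
arr = s.to_numpy()
print(type(arr))

# Arithmetic is applied to the whole Series at once, without an explicit loop.
doubled = s * 2
print(doubled.tolist())
```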
Series
A Series is a one-dimensional, array-like structure with homogeneous data. For example, the following Series is a collection of integers: 10, 23, 56, 78, 98, 12
- Homogeneous data
- Size Immutable
- Values of Data mutable
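As a minimal sketch, the integer Series from the example above can be created and modified as follows. The variable name `s` is arbitrary.

```python
import pandas as pd

# Build the Series of integers used in the example above.
s = pd.Series([10, 23, 56, 78, 98, 12])

# Homogeneous data: the whole Series has a single dtype.
print(s.dtype)
print(pd.api.types.is_integer_dtype(s.dtype))

# Values are mutable: overwrite the first element in place.
s.iloc[0] = 11
print(s.tolist())
```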
DataFrame
A DataFrame is a two-dimensional array with heterogeneous data. For example:
Name | Age | Gender | Rating |
---|---|---|---|
Steve | 32 | Male | 3.45 |
Lia | 28 | Female | 4.6 |
Vin | 45 | Male | 3.9 |
Katie | 38 | Female | 2.78 |
The data types of the four columns are as follows:
Column | Type |
---|---|
Name | String |
Age | Integer |
Gender | String |
Rating | Float |
- Heterogeneous data
- Size Mutable
- Data Mutable
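The table above can be built as a DataFrame with one column per field. This is a sketch only: the `Senior` column is a hypothetical addition used to show that the size can change, and the updated rating is an arbitrary value.

```python
import pandas as pd

# Recreate the example table, one dictionary key per column.
df = pd.DataFrame(
    {
        "Name": ["Steve", "Lia", "Vin", "Katie"],
        "Age": [32, 28, 45, 38],
        "Gender": ["Male", "Female", "Male", "Female"],
        "Rating": [3.45, 4.6, 3.9, 2.78],
    }
)

# Heterogeneous data: each column has its own dtype.
print(df.dtypes)

# Size mutable: add a new column (hypothetical, for illustration).
df["Senior"] = df["Age"] > 40

# Data mutable: change a single value selected by a condition.
df.loc[df["Name"] == "Lia", "Rating"] = 4.7

print(df)
```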
Panel
A Panel is a three-dimensional data structure with heterogeneous data. It is hard to represent graphically, but it can be pictured as a container of DataFrames. Note that Panel was deprecated in pandas 0.20 and removed in pandas 0.25, so it is not available in current releases.
- Heterogeneous data
- Size Mutable
- Data Mutable
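Because `Panel` is no longer available in recent pandas releases, the "container of DataFrames" idea can be approximated with a dictionary of DataFrames combined under a two-level row index. The sketch below is a substitute technique, not the Panel API, and the years, names, and values are hypothetical.

```python
import pandas as pd

# Hypothetical data: one DataFrame per year, sharing the same rows and columns.
frames = {
    "2023": pd.DataFrame(
        {"Age": [32, 28], "Rating": [3.45, 4.6]}, index=["Steve", "Lia"]
    ),
    "2024": pd.DataFrame(
        {"Age": [33, 29], "Rating": [3.5, 4.4]}, index=["Steve", "Lia"]
    ),
}

# Combine the DataFrames; the dictionary keys become the outer index level.
stacked = pd.concat(frames, names=["Year", "Name"])

# Select one "slice" of the third dimension, then a single value.
print(stacked.loc["2024"])
print(stacked.loc[("2024", "Lia"), "Rating"])
```

Each key of `frames` plays the role of one item along the third axis, and `stacked.loc[key]` returns the corresponding two-dimensional table.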