
Data Analysis with Python and Pandas

Published Nov 2, 2025 • 10 min read

Pandas is the most popular Python library for data analysis. It makes loading, cleaning, and summarizing structured data (such as CSV files and Excel spreadsheets) quick and concise.

What is Pandas?

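Pandas is an open-source library that provides fast, flexible data structures and analysis tools. Its two core structures are the Series, a labeled one-dimensional array, and the DataFrame, a labeled two-dimensional table whose columns are Series. It's built on top of NumPy and is a standard tool for anyone working with tabular data in Python.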
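To make the two structures concrete, here is a minimal sketch built from a small, made-up dataset (the product names and column names are illustrative only). Because pandas sits on top of NumPy, a column can be handed back as a plain NumPy array.

```python
import numpy as np
import pandas as pd

# A Series is a labeled 1-D array; the index supplies the labels
prices = pd.Series([9.99, 24.50, 3.75], index=["pen", "book", "mug"])

# A DataFrame is a labeled 2-D table; each column is itself a Series
df = pd.DataFrame({
    "product": ["pen", "book", "mug"],
    "price": [9.99, 24.50, 3.75],
    "quantity": [10, 2, 5],
})

print(prices["book"])      # label-based lookup
print(type(df["price"]))   # a single column is a Series
print(df.shape)            # (number of rows, number of columns)

# The underlying column data is backed by NumPy
arr = df["quantity"].to_numpy()
print(type(arr))
```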

Installation

pip install pandas

Reading Data

Pandas can read data from various formats:

import pandas as pd

# Read a CSV file
df = pd.read_csv('data.csv')

# Read an Excel file (.xlsx files need the openpyxl package installed)
df = pd.read_excel('data.xlsx')

# Read a CSV file directly from a URL
df = pd.read_csv('https://example.com/data.csv')
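If you want to experiment with read_csv before you have a file on disk, it also accepts file-like objects. The sketch below wraps some made-up CSV text in io.StringIO (the columns are hypothetical) and shows two commonly used parameters: usecols to load only some columns, and parse_dates to convert a column to datetimes.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for data.csv
csv_text = """order_id,date,product,price
1,2025-01-05,pen,9.99
2,2025-01-06,book,24.50
3,2025-01-06,mug,3.75
"""

# read_csv accepts paths, URLs, or file-like objects such as StringIO
df = pd.read_csv(
    io.StringIO(csv_text),
    usecols=["date", "product", "price"],  # skip the order_id column
    parse_dates=["date"],                   # store dates as datetime64
)

print(df.dtypes)
print(df.head())
```

Note that usecols keeps the columns in the order they appear in the file, not the order of the list you pass.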

Basic DataFrame Operations

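# View the first five rows
print(df.head())

# Column names, dtypes, and non-null counts (info() prints directly and returns None)
df.info()

# Statistical summary of the numeric columns
print(df.describe())

# Select a single column (returns a Series)
print(df['column_name'])

# Keep only rows where age is greater than 25
filtered = df[df['age'] > 25]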
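Filters can also be combined. A minimal sketch, assuming hypothetical name, age, and city columns: conditions are joined with & (and) or | (or), and each condition needs its own parentheses because those operators bind more tightly than comparisons. .loc selects rows and columns in one step.

```python
import pandas as pd

# Hypothetical dataset for illustration
df = pd.DataFrame({
    "name": ["Ana", "Ben", "Cara", "Dev"],
    "age": [22, 31, 27, 45],
    "city": ["Lisbon", "Oslo", "Lisbon", "Oslo"],
})

# Both conditions must hold; parentheses around each are required
older_in_lisbon = df[(df["age"] > 25) & (df["city"] == "Lisbon")]

# Rows chosen by a condition, columns chosen by label
over_30 = df.loc[df["age"] > 30, ["name", "age"]]

# Several columns at once, using a list of names
subset = df[["name", "city"]]

print(older_in_lisbon)
print(over_30)
print(subset)
```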

Data Cleaning

Raw data often contains duplicate rows, missing values, and inconsistent column names. Clean it before you analyze it:

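# Remove duplicate rows
df = df.drop_duplicates()

# Handle missing values: pick ONE strategy
# (after fillna there is nothing left for dropna to remove)
df = df.fillna(0)     # Option 1: replace NaN with 0
# df = df.dropna()    # Option 2: remove rows containing NaN instead

# Rename columns
df = df.rename(columns={'old_name': 'new_name'})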
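Filling every gap with 0 is not always appropriate, for example for text columns or prices. A minimal sketch with hypothetical columns: count the missing values first, then compare dropping incomplete rows against filling each column with its own value (here a placeholder string for text and the column median for prices, both illustrative choices).

```python
import numpy as np
import pandas as pd

# Hypothetical data with gaps
df = pd.DataFrame({
    "product": ["pen", "book", None, "mug"],
    "price": [9.99, np.nan, 3.75, 3.75],
    "quantity": [10, 2, np.nan, 5],
})

# How many values are missing in each column?
print(df.isna().sum())

# Strategy 1: drop any row that has a missing value
dropped = df.dropna()

# Strategy 2: fill each column with a value suited to it
filled = df.fillna({
    "product": "unknown",
    "price": df["price"].median(),
    "quantity": 0,
})

print(dropped)
print(filled)
```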

Data Analysis

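# Group by category and average the numeric columns
# (numeric_only=True skips text columns, which raise an error in pandas 2.x)
grouped = df.groupby('category').mean(numeric_only=True)

# Sort by price, highest first
sorted_df = df.sort_values('price', ascending=False)

# Calculate statistics on single columns
average = df['price'].mean()
total = df['quantity'].sum()
maximum = df['score'].max()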
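A single groupby can also produce several statistics at once. This sketch uses named aggregation in .agg(), where each output column is written as output_name=(input_column, function); the category, price, and quantity data are made up. Selecting input columns explicitly like this also sidesteps errors from non-numeric columns.

```python
import pandas as pd

# Hypothetical order data
df = pd.DataFrame({
    "category": ["office", "office", "kitchen", "kitchen", "kitchen"],
    "price": [9.99, 4.50, 3.75, 12.00, 6.25],
    "quantity": [10, 4, 5, 1, 2],
})

# Named aggregation: output_column=(input_column, function)
summary = df.groupby("category").agg(
    avg_price=("price", "mean"),
    total_quantity=("quantity", "sum"),
    orders=("price", "count"),
)

# Sort the groups by one of the new columns
summary = summary.sort_values("total_quantity", ascending=False)
print(summary)
```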

Exporting Data

# Save to CSV without the row index
df.to_csv('output.csv', index=False)

# Save to Excel (also requires the openpyxl package)
df.to_excel('output.xlsx', index=False)
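Why index=False? By default pandas writes the row index as an extra first column. A quick way to see the difference with a small made-up table: when to_csv is called without a path it returns the CSV text as a string, so nothing is written to disk.

```python
import io
import pandas as pd

df = pd.DataFrame({"product": ["pen", "mug"], "price": [9.99, 3.75]})

# With no path argument, to_csv returns the CSV text as a string
with_index = df.to_csv()
without_index = df.to_csv(index=False)

print(with_index)     # first column holds the row index (0, 1)
print(without_index)  # only the data columns

# Reading the indexed version back adds an extra "Unnamed: 0" column
round_trip = pd.read_csv(io.StringIO(with_index))
print(round_trip.columns.tolist())
```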
💡 Pro Tip: Always use head() to preview your data before performing operations!

Real-World Example

Let's analyze a sales dataset:

import pandas as pd

# Load data (assumes one row per order with 'product' and 'price' columns,
# where 'price' is that order's total)
sales = pd.read_csv('sales.csv')

# Calculate total revenue per product
revenue = sales.groupby('product')['price'].sum()

# Find the top 5 products by revenue
top_products = revenue.sort_values(ascending=False).head(5)

# Calculate the average order value
avg_order = sales['price'].mean()

print(f"Top 5 Products:\n{top_products}")
print(f"Average Order Value: ${avg_order:.2f}")
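The example above sums price directly, which gives revenue only if each row's price is already the order total. If your file instead stores a unit price and a quantity per line item, revenue is the product of the two, and an order's value is the sum of its line items. A sketch under that assumption, with hypothetical order_id, product, price, and quantity columns and inline data in place of sales.csv:

```python
import pandas as pd

# Hypothetical line-item data: price is the unit price
sales = pd.DataFrame({
    "order_id": [1, 1, 2, 3, 3, 4],
    "product": ["pen", "mug", "book", "pen", "book", "mug"],
    "price": [1.50, 3.75, 12.00, 1.50, 12.00, 3.75],
    "quantity": [10, 2, 1, 4, 2, 6],
})

# Revenue per line item = unit price x quantity
sales["revenue"] = sales["price"] * sales["quantity"]

# Total revenue per product, highest first
revenue = sales.groupby("product")["revenue"].sum().sort_values(ascending=False)
top_products = revenue.head(5)

# Average order value: total each order first, then average across orders
order_totals = sales.groupby("order_id")["revenue"].sum()
avg_order = order_totals.mean()

print(f"Top Products:\n{top_products}")
print(f"Average Order Value: ${avg_order:.2f}")
```

Averaging the per-order totals, rather than the individual rows, keeps orders with many line items from being counted several times.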

Conclusion

Pandas is an essential tool for data analysis in Python. Start with simple operations and gradually explore more advanced features like merging, pivoting, and time series analysis.
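As a first taste of two of those features, here is a small sketch of merge (joining two tables on a shared key) and pivot_table (reshaping data into a grid of aggregates). The orders and customers tables are made up for illustration.

```python
import pandas as pd

# Hypothetical tables sharing a customer_id key
orders = pd.DataFrame({
    "order_id": [1, 2, 3, 4],
    "customer_id": [10, 11, 10, 12],
    "month": ["Jan", "Jan", "Feb", "Feb"],
    "amount": [20.0, 35.0, 15.0, 50.0],
})
customers = pd.DataFrame({
    "customer_id": [10, 11, 12],
    "region": ["North", "South", "North"],
})

# Merge: attach each order's region; how="left" keeps every order
merged = orders.merge(customers, on="customer_id", how="left")

# Pivot: total amount per region (rows) and month (columns)
pivot = merged.pivot_table(
    index="region", columns="month", values="amount",
    aggfunc="sum", fill_value=0,
)

print(merged)
print(pivot)
```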
