Data Analysis with Python and Pandas
Published Nov 2, 2025 • 10 min read
Pandas is the most widely used Python library for data analysis. It lets you load, clean, and summarize structured data (like CSV files and Excel spreadsheets) in a few lines of code.
What is Pandas?
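Pandas is an open-source library that provides high-performance data structures and analysis tools. Its two core structures are the Series (a labeled one-dimensional array) and the DataFrame (a labeled two-dimensional table). It's built on top of NumPy and is essential for anyone working with data in Python.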
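As a rough illustration of those data structures and the NumPy connection, here is a minimal sketch. The product names and prices are made up for the example.

```python
import numpy as np
import pandas as pd

# A Series is a single labeled column of values (sample data, invented here)
prices = pd.Series([9.99, 24.50, 5.00], name="price")

# A DataFrame is a table whose columns are Series
table = pd.DataFrame({
    "product": ["Widget", "Gadget", "Gizmo"],
    "price": prices,
})

# The underlying values of a numeric column are available as a NumPy array
values = table["price"].to_numpy()

print(type(table["price"]).__name__)  # Series
print(type(values).__name__)          # ndarray
```

Most Pandas operations work on whole columns at once, which is where the NumPy foundation pays off.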
Installation
pip install pandas
Reading Data
Pandas can read data from various formats:
import pandas as pd
# Read CSV file
df = pd.read_csv('data.csv')
# Read Excel file (.xlsx files require the openpyxl package: pip install openpyxl)
df = pd.read_excel('data.xlsx')
# Read from URL
df = pd.read_csv('https://example.com/data.csv')
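read_csv also accepts options that control how columns are parsed. The sketch below uses an in-memory string in place of a real file so it runs anywhere. The column names and values are hypothetical, not taken from any dataset in this post.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a file on disk
csv_text = """order_id,order_date,product,price
1,2025-01-05,Widget,9.99
2,2025-01-06,Gadget,24.50
3,2025-01-06,Widget,9.99
"""

df = pd.read_csv(
    io.StringIO(csv_text),          # any file path or URL works here too
    parse_dates=["order_date"],     # parse this column as datetimes
    dtype={"order_id": "int64", "product": "string"},  # set explicit types
)

print(df.dtypes)
```

Setting types at load time tends to catch malformed values early, instead of partway through an analysis.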
Basic DataFrame Operations
# View first rows
print(df.head())
# Get info about dataset (info() prints its report itself and returns None)
df.info()
# Statistical summary
print(df.describe())
# Select columns
print(df['column_name'])
# Filter rows
filtered = df[df['age'] > 25]
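Filters can be combined, and rows and columns can be selected in one step. Here is a small sketch on an invented table. The name, age, and city columns are assumptions for illustration only.

```python
import pandas as pd

# Invented sample data
people = pd.DataFrame({
    "name": ["Ana", "Ben", "Cleo", "Dev"],
    "age": [31, 22, 45, 27],
    "city": ["Lisbon", "Oslo", "Lisbon", "Pune"],
})

# Combine conditions with & (and) or | (or); wrap each condition in parentheses
over_25_in_lisbon = people[(people["age"] > 25) & (people["city"] == "Lisbon")]

# .loc takes a row condition and a list of column names together
names_over_25 = people.loc[people["age"] > 25, ["name", "age"]]

print(over_25_in_lisbon)
print(names_over_25)
```

Python's plain `and` and `or` do not work on whole columns, which is why the element-wise `&` and `|` operators are used.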
Data Cleaning
Duplicates, missing values, and unclear column names can distort results, so clean your data before analyzing it:
# Remove duplicates
df = df.drop_duplicates()
# Handle missing values: pick ONE of these (after fillna, dropna has nothing left to drop)
df = df.fillna(0) # Option 1: fill with 0
df = df.dropna() # Option 2: remove rows with NaN
# Rename columns
df = df.rename(columns={'old_name': 'new_name'})
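Before choosing how to handle missing values, it usually helps to count them per column. The sketch below counts them, then fills each column with a value that suits it. The sample data and the choice of median and 0 as fill values are illustrative assumptions, not a recommendation for every dataset.

```python
import numpy as np
import pandas as pd

# Invented sample data with one duplicate row and a few gaps
raw = pd.DataFrame({
    "product": ["Widget", "Gadget", None, "Widget"],
    "price": [9.99, np.nan, 5.00, 9.99],
    "quantity": [2, 1, np.nan, 2],
})

# Count missing values in each column
missing_per_column = raw.isna().sum()
print(missing_per_column)

cleaned = (
    raw.drop_duplicates()                # drop the repeated Widget row
       .dropna(subset=["product"])       # a row without a product is unusable
       .fillna({"price": raw["price"].median(), "quantity": 0})  # per-column fills
)
print(cleaned)
```

Passing a dictionary to fillna avoids writing the same placeholder into columns where it would be misleading.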
Data Analysis
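# Group by and aggregate (numeric_only=True skips text columns, which would otherwise raise a TypeError in pandas 2.0+)
grouped = df.groupby('category').mean(numeric_only=True)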
# Sort values
sorted_df = df.sort_values('price', ascending=False)
# Calculate statistics
average = df['price'].mean()
total = df['quantity'].sum()
maximum = df['score'].max()
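When different columns need different aggregations, groupby can be paired with agg. Here is a minimal sketch on invented data. The category, price, and quantity columns and the output column names are assumptions for the example.

```python
import pandas as pd

# Invented sample data
items = pd.DataFrame({
    "category": ["toys", "toys", "books", "books", "books"],
    "price": [10.0, 20.0, 5.0, 15.0, 10.0],
    "quantity": [1, 3, 2, 1, 4],
})

# Named aggregation: output_column=(input_column, function)
summary = items.groupby("category").agg(
    avg_price=("price", "mean"),
    total_quantity=("quantity", "sum"),
    n_rows=("price", "count"),
)

print(summary)
```

Naming each output column makes the result easier to read than a table of repeated column names.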
Exporting Data
# Save to CSV
df.to_csv('output.csv', index=False)
# Save to Excel (writing .xlsx files requires the openpyxl package)
df.to_excel('output.xlsx', index=False)
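To see what index=False does, the sketch below writes a small invented table to an in-memory buffer instead of a file and reads it back.

```python
import io
import pandas as pd

# Invented sample data
df = pd.DataFrame({"product": ["Widget", "Gadget"], "price": [9.99, 24.50]})

# Write to an in-memory buffer; a file path works the same way
buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)

# Reading the text back gives the same columns, with no extra index column
round_trip = pd.read_csv(io.StringIO(csv_text))
print(round_trip.columns.tolist())
```

Without index=False, the row labels are written as an extra unnamed first column, which then shows up as "Unnamed: 0" when the file is read back.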
💡 Pro Tip: Always use head() to preview your data before performing operations!
Real-World Example
Let's analyze a sales dataset, assuming sales.csv has a product column and a price column:
import pandas as pd
# Load data
sales = pd.read_csv('sales.csv')
# Calculate total revenue per product (assumes each row is one unit sold at 'price')
revenue = sales.groupby('product')['price'].sum()
# Find top 5 products
top_products = revenue.sort_values(ascending=False).head(5)
# Calculate average order value (assumes each row is one complete order)
avg_order = sales['price'].mean()
print(f"Top 5 Products:\n{top_products}")
print(f"Average Order Value: ${avg_order:.2f}")
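The example above treats every row as a single unit and a single order. If the data also records quantities and order IDs, revenue and order value need one more step. The sketch below assumes hypothetical quantity and order_id columns and uses invented rows in place of sales.csv.

```python
import pandas as pd

# Invented rows; 'quantity' and 'order_id' are assumed columns
sales = pd.DataFrame({
    "order_id": [1, 1, 2, 3],
    "product": ["Widget", "Gadget", "Widget", "Gizmo"],
    "price": [10.0, 25.0, 10.0, 40.0],
    "quantity": [2, 1, 3, 1],
})

# Revenue for each row is unit price times units sold
sales["line_total"] = sales["price"] * sales["quantity"]

# Total revenue per product, highest first
revenue = sales.groupby("product")["line_total"].sum().sort_values(ascending=False)

# Sum the rows of each order first, then average across orders
avg_order = sales.groupby("order_id")["line_total"].sum().mean()

print(revenue)
print(f"Average Order Value: ${avg_order:.2f}")
```

Which version is right depends on what one row of the dataset represents, so check that before trusting either number.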
Conclusion
Pandas is an essential tool for data analysis in Python. Start with simple operations and gradually explore more advanced features like merging, pivoting, and time series analysis.