Malware Analysis Deep Dive: From Static to Dynamic Analysis
Introduction
Throughout my cybersecurity career, malware analysis has been one of the most challenging yet rewarding aspects of threat research. From my early days at Innobuzz conducting training sessions to analyzing nation-state malware at the Ministry of Defence, I’ve developed a comprehensive methodology that combines traditional static analysis with advanced dynamic techniques.
The Evolution of Malware Analysis
Traditional vs. Modern Approaches
The malware landscape has evolved dramatically:
- Early 2000s: Simple file-based analysis, signature detection
- 2010s: Packed malware, polymorphic techniques, sandbox evasion
- 2020s: Living-off-the-land, fileless malware, AI-powered evasion
This evolution requires analysts to adapt their methodologies continuously.
Static Analysis Fundamentals
Static analysis involves examining malware without executing it. This is always the first step in my analysis workflow.
File Metadata Analysis
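```bash
# Basic file information
file suspicious_sample.exe
# ASCII strings, filtered for network and file indicators
strings suspicious_sample.exe | grep -iE "http|ftp|\.exe|\.dll"
# UTF-16LE strings, common in Windows binaries (GNU strings: -e l)
strings -el suspicious_sample.exe | grep -iE "http|ftp|\.exe|\.dll"
hexdump -C suspicious_sample.exe | head -20
# Hash calculation for threat intelligence
md5sum suspicious_sample.exe
sha1sum suspicious_sample.exe
sha256sum suspicious_sample.exe
```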
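Running three separate hashing tools reads the file three times. The following is a minimal Python sketch that computes MD5, SHA-1 and SHA-256 in a single streaming pass, so memory use stays flat even for large samples. The helper name hash_file and the 64 KiB chunk size are illustrative choices.

```python
import hashlib


def hash_file(path, chunk_size=65536):
    """Compute MD5, SHA-1 and SHA-256 of a file in a single streaming pass."""
    hashers = {name: hashlib.new(name) for name in ('md5', 'sha1', 'sha256')}
    with open(path, 'rb') as f:
        # Read fixed-size chunks so large samples never need to fit in memory
        for chunk in iter(lambda: f.read(chunk_size), b''):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}
```

Calling hash_file('suspicious_sample.exe') returns a dict keyed by algorithm name. Those values are what you would submit to threat intelligence lookups.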
PE Structure Analysis
For Windows executables, PE structure analysis reveals crucial information:
```python
import hashlib

import pefile

# Import names carry an ANSI/Unicode suffix (e.g. CreateProcessW), so a plain
# 'CreateProcess' would never match; list the actual imported names.
SUSPICIOUS_APIS = {'CreateProcessA', 'CreateProcessW', 'WriteProcessMemory', 'VirtualAlloc'}


def analyze_pe_structure(file_path):
    """Comprehensive PE file analysis"""
    try:
        # Read the file once and reuse the bytes for hashing and parsing
        with open(file_path, 'rb') as f:
            data = f.read()
        pe = pefile.PE(data=data)
        analysis_results = {
            'file_info': {
                'md5': hashlib.md5(data).hexdigest(),
                'sha256': hashlib.sha256(data).hexdigest(),
                'size': len(data)
            },
            'pe_info': {
                'machine_type': hex(pe.FILE_HEADER.Machine),
                'timestamp': pe.FILE_HEADER.TimeDateStamp,
                'entry_point': hex(pe.OPTIONAL_HEADER.AddressOfEntryPoint),
                'image_base': hex(pe.OPTIONAL_HEADER.ImageBase)
            },
            'sections': [],
            'imports': [],
            'exports': [],
            'suspicious_indicators': []
        }
        # Analyze sections
        for section in pe.sections:
            entropy = section.get_entropy()
            section_info = {
                # Names are NUL-padded and not guaranteed to be valid UTF-8
                'name': section.Name.rstrip(b'\x00').decode('utf-8', errors='replace'),
                'virtual_address': hex(section.VirtualAddress),
                'raw_size': section.SizeOfRawData,
                'entropy': entropy
            }
            # High entropy (the maximum is 8.0 bits/byte) might indicate packing/encryption
            if entropy > 7.0:
                analysis_results['suspicious_indicators'].append(
                    f"High entropy section: {section_info['name']} ({entropy:.2f})"
                )
            analysis_results['sections'].append(section_info)
        # Analyze imports
        if hasattr(pe, 'DIRECTORY_ENTRY_IMPORT'):
            for entry in pe.DIRECTORY_ENTRY_IMPORT:
                dll_name = entry.dll.decode('utf-8', errors='replace')
                functions = []
                for function in entry.imports:
                    # Imports by ordinal have no name
                    if function.name:
                        func_name = function.name.decode('utf-8', errors='replace')
                        functions.append(func_name)
                        # Check for suspicious API calls
                        if func_name in SUSPICIOUS_APIS:
                            analysis_results['suspicious_indicators'].append(
                                f"Suspicious API: {func_name} from {dll_name}"
                            )
                analysis_results['imports'].append({
                    'dll': dll_name,
                    'functions': functions
                })
        # Analyze exports (mainly relevant for DLLs)
        if hasattr(pe, 'DIRECTORY_ENTRY_EXPORT'):
            for symbol in pe.DIRECTORY_ENTRY_EXPORT.symbols:
                if symbol.name:
                    analysis_results['exports'].append(symbol.name.decode('utf-8', errors='replace'))
        return analysis_results
    except Exception as e:
        # Malformed samples can make the parser fail in many ways; report rather than crash
        return {'error': f"PE analysis failed: {e}"}
```
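The analysis above records TimeDateStamp as a raw integer. According to the PE format, this field holds the low 32 bits of the number of seconds since the Unix epoch (UTC). It is also trivially forged. A small sketch that turns it into a readable date and adds a plausibility hint is shown below. The helper name describe_pe_timestamp and the "zero or in the future" heuristic are illustrative assumptions and not part of pefile.

```python
from datetime import datetime, timezone


def describe_pe_timestamp(time_date_stamp, now=None):
    """Convert a PE TimeDateStamp to ISO 8601 UTC and flag implausible values."""
    now = now or datetime.now(timezone.utc)
    compiled = datetime.fromtimestamp(time_date_stamp, tz=timezone.utc)
    notes = []
    if time_date_stamp == 0:
        notes.append('zeroed timestamp (stripped or deliberately cleared)')
    elif compiled > now:
        notes.append('timestamp in the future (possibly forged)')
    return {'compiled_utc': compiled.isoformat(), 'notes': notes}
```

Treat these notes as hints only. Reproducibly built binaries, including many Windows system files, may store a hash in this field rather than a real date.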
String Analysis
String analysis often reveals the most actionable intelligence:
```python
import re

# Bytes that terminate an embedded string: whitespace, NUL/control bytes and non-ASCII
_END = r'\s\x00-\x1f\x7f-\xff'


def extract_suspicious_strings(file_path):
    """Extract and categorize suspicious strings"""
    with open(file_path, 'rb') as f:
        data = f.read()
    # latin-1 maps each byte to exactly one character, so decoding never fails
    # (UTF-16LE strings, common in Windows binaries, need a separate pass)
    text = data.decode('latin-1')
    patterns = {
        'urls': re.findall(rf'https?://[^{_END}<>"]{{2,}}', text, re.IGNORECASE),
        'ip_addresses': re.findall(r'\b(?:[0-9]{1,3}\.){3}[0-9]{1,3}\b', text),
        'email_addresses': re.findall(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b', text),
        'file_paths': re.findall(rf'[A-Za-z]:\\[^<>:"|?*{_END}]+', text),
        'registry_keys': re.findall(rf'HKEY_[A-Z_]+\\[^<>:"|?*{_END}]+', text, re.IGNORECASE),
        'crypto_indicators': re.findall(r'\b[A-Fa-f0-9]{32,}\b', text)  # Potential hashes/keys
    }
    # Remove duplicates (sorted for stable output) and drop empty categories
    return {k: sorted(set(v)) for k, v in patterns.items() if v}
```
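A single-byte pass only sees ASCII-style text. Windows programs frequently store strings as UTF-16LE, where each ASCII character is followed by a NUL byte, so URLs and registry paths can stay invisible to it. The sketch below shows a minimal second pass. It assumes only ASCII-range characters matter, and the function name is hypothetical. Its output can be joined and fed through the same regular expressions.

```python
import re


def extract_utf16le_strings(data, min_len=4):
    """Return printable ASCII strings encoded as UTF-16LE in a byte buffer."""
    # A run of at least min_len printable ASCII characters, each followed by a NUL byte
    pattern = re.compile(rb'(?:[\x20-\x7e]\x00){%d,}' % min_len)
    return [m.group().decode('utf-16-le') for m in pattern.finditer(data)]
```

For example, '\n'.join(extract_utf16le_strings(data)) gives a text blob that the URL and registry patterns can scan.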
Dynamic Analysis Techniques
Dynamic analysis involves executing malware in a controlled environment to observe its behavior.
Sandbox Environment Setup
My standard analysis environment includes:
```yaml
sandbox_configuration:
  hypervisor: VMware Workstation Pro
  operating_systems:
    - Windows 10 x64 (latest patches)
    - Windows 7 x86 (legacy analysis)
    - Ubuntu 20.04 LTS (Linux malware)
  monitoring_tools:
    process_monitor: ProcMon
    network_capture: Wireshark
    api_monitor: API Monitor
    memory_analysis: Volatility Framework
  isolation:
    network: Isolated virtual network
    snapshots: Pre-execution snapshots for rollback
    time_limits: 30-minute maximum execution
```
Behavioral Analysis Framework
```python
import subprocess
import time
from datetime import datetime

import psutil


class BehaviorAnalyzer:
    def __init__(self):
        self.baseline_processes = set(p.pid for p in psutil.process_iter())
        self.baseline_connections = set(self.get_network_connections())
        self.analysis_start = datetime.now()
        self.behaviors = []

    def get_network_connections(self):
        """Get current established network connections"""
        # psutil.net_connections() may need elevated privileges on some platforms
        connections = []
        for conn in psutil.net_connections():
            if conn.status == psutil.CONN_ESTABLISHED and conn.raddr:
                connections.append(f"{conn.laddr.ip}:{conn.laddr.port} -> {conn.raddr.ip}:{conn.raddr.port}")
        return connections

    def monitor_process_creation(self):
        """Monitor for new process creation"""
        # Records every new PID on the system, not only the sample's descendants,
        # so a quiet, dedicated analysis VM keeps the noise down
        current_processes = set(p.pid for p in psutil.process_iter())
        new_processes = current_processes - self.baseline_processes
        for pid in new_processes:
            try:
                process = psutil.Process(pid)
                self.behaviors.append({
                    'timestamp': datetime.now().isoformat(),
                    'type': 'process_creation',
                    'details': {
                        'pid': pid,
                        'name': process.name(),
                        'cmdline': ' '.join(process.cmdline()),
                        'parent_pid': process.ppid()
                    }
                })
            except (psutil.NoSuchProcess, psutil.AccessDenied):
                pass
        self.baseline_processes = current_processes

    def monitor_network_activity(self):
        """Monitor for new network connections"""
        current_connections = set(self.get_network_connections())
        new_connections = current_connections - self.baseline_connections
        for conn in new_connections:
            self.behaviors.append({
                'timestamp': datetime.now().isoformat(),
                'type': 'network_connection',
                'details': {'connection': conn}
            })
        self.baseline_connections = current_connections

    def monitor_file_system(self, watch_paths):
        """Monitor file system changes in specified paths"""
        # Implementation would use file system monitoring
        # This is a simplified version
        pass

    def analyze_sample(self, sample_path, duration=300):
        """Run complete behavioral analysis (only ever inside an isolated analysis VM)"""
        print(f"Starting analysis of {sample_path}")
        # Launch the sample directly rather than through a shell, so the path
        # is never interpreted by cmd.exe or /bin/sh
        process = subprocess.Popen([sample_path])
        # Monitor behavior for specified duration
        end_time = time.time() + duration
        while time.time() < end_time:
            self.monitor_process_creation()
            self.monitor_network_activity()
            time.sleep(1)
        # Kill the sample and any children it spawned, if still running
        try:
            parent = psutil.Process(process.pid)
            for child in parent.children(recursive=True):
                child.kill()
            parent.kill()
        except (psutil.NoSuchProcess, psutil.AccessDenied):
            pass
        return {
            'sample': sample_path,
            'analysis_duration': duration,
            'behaviors': self.behaviors,
            'summary': self.generate_summary()
        }

    def generate_summary(self):
        """Generate analysis summary"""
        process_count = len([b for b in self.behaviors if b['type'] == 'process_creation'])
        network_count = len([b for b in self.behaviors if b['type'] == 'network_connection'])
        return {
            'total_behaviors': len(self.behaviors),
            'process_creations': process_count,
            'network_connections': network_count,
            'risk_assessment': self.assess_risk()
        }

    def assess_risk(self):
        """Simple risk assessment based on observed behaviors"""
        risk_score = 0
        # Risk factors
        process_count = len([b for b in self.behaviors if b['type'] == 'process_creation'])
        network_count = len([b for b in self.behaviors if b['type'] == 'network_connection'])
        if process_count > 5:
            risk_score += 30
        if network_count > 0:
            risk_score += 40
        # Check for suspicious process names
        suspicious_processes = ['cmd.exe', 'powershell.exe', 'reg.exe']
        for behavior in self.behaviors:
            if behavior['type'] == 'process_creation':
                if any(susp in behavior['details']['name'].lower() for susp in suspicious_processes):
                    risk_score += 20
        return min(risk_score, 100)
```
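The monitor_file_system method above is left as a stub. If polling is acceptable, one simple way to fill the gap is to snapshot file metadata in the watched paths before and after execution and then diff the two snapshots. The function names below are hypothetical. This approach misses files that are created and deleted between snapshots, which an event-driven source such as ProcMon would catch.

```python
import os


def snapshot_tree(root):
    """Map every file path under root to (size, mtime_ns)."""
    snapshot = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            try:
                st = os.stat(path)
            except OSError:
                continue  # file vanished or is locked between listing and stat
            snapshot[path] = (st.st_size, st.st_mtime_ns)
    return snapshot


def diff_snapshots(before, after):
    """Classify paths as created, deleted or modified between two snapshots."""
    return {
        'created': sorted(after.keys() - before.keys()),
        'deleted': sorted(before.keys() - after.keys()),
        'modified': sorted(p for p in before.keys() & after.keys() if before[p] != after[p]),
    }
```

Each entry in the diff could then be appended to self.behaviors with a 'file_*' type, following the same pattern as the process and network monitors.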
Advanced Analysis Techniques
Memory Forensics
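Memory analysis reveals runtime state that file analysis cannot, such as unpacked code, injected memory regions and live network connections. The commands below use Volatility 2 syntax. Volatility 3 no longer uses profiles and names its plugins per operating system (for example, windows.pslist).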
```bash
# Volatility 2 commands for memory analysis
# imageinfo suggests candidate profiles, so it does not need --profile itself
volatility -f memory_dump.raw imageinfo
volatility -f memory_dump.raw --profile=Win10x64_19041 pslist
volatility -f memory_dump.raw --profile=Win10x64_19041 netscan
volatility -f memory_dump.raw --profile=Win10x64_19041 malfind
volatility -f memory_dump.raw --profile=Win10x64_19041 yarascan -y malware_rules.yar
```
Network Traffic Analysis
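```python
import re

from scapy.all import IP, Raw, rdpcap


def analyze_network_traffic(pcap_file):
    """Analyze network traffic for malicious indicators"""
    packets = rdpcap(pcap_file)
    analysis = {
        'total_packets': len(packets),
        'protocols': {},
        'destinations': {},
        'suspicious_activities': []
    }
    for packet in packets:
        if packet.haslayer(IP):
            dst_ip = packet[IP].dst
            analysis['destinations'][dst_ip] = analysis['destinations'].get(dst_ip, 0) + 1
            # Protocol analysis by IP protocol number (6 = TCP, 17 = UDP, ...)
            proto = packet[IP].proto
            analysis['protocols'][proto] = analysis['protocols'].get(proto, 0) + 1
            # Check for suspicious destinations
            if is_suspicious_ip(dst_ip):
                analysis['suspicious_activities'].append({
                    'type': 'suspicious_destination',
                    'ip': dst_ip,
                    'timestamp': float(packet.time)
                })
        # HTTP analysis (cleartext only; TLS-encrypted payloads will not match)
        if packet.haslayer(Raw):
            payload = packet[Raw].load.decode('utf-8', errors='ignore')
            if 'User-Agent:' in payload:
                user_agent = extract_user_agent(payload)
                if is_suspicious_user_agent(user_agent):
                    analysis['suspicious_activities'].append({
                        'type': 'suspicious_user_agent',
                        'user_agent': user_agent
                    })
    return analysis


def extract_user_agent(payload):
    """Return the User-Agent header value from an HTTP request payload, or ''"""
    match = re.search(r'User-Agent:\s*([^\r\n]*)', payload, re.IGNORECASE)
    return match.group(1).strip() if match else ''


def is_suspicious_user_agent(user_agent):
    """Illustrative heuristic: empty or scripting-tool user agents"""
    # Placeholder markers for demonstration; tune against your own baseline traffic
    tool_markers = ['python-requests', 'curl/', 'wget/', 'powershell']
    ua = user_agent.lower()
    return not ua or any(marker in ua for marker in tool_markers)


def is_suspicious_ip(ip):
    """Check if IP is suspicious based on threat intelligence"""
    # This would integrate with threat intelligence feeds.
    # For demo purposes only: these prefixes are placeholders, not real intelligence,
    # and matching whole /8 ranges would produce many false positives.
    suspicious_ranges = ['91.', '185.', '194.']
    return any(ip.startswith(prefix) for prefix in suspicious_ranges)
```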
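Per-destination packet counts say nothing about timing, yet regular check-ins are a common C2 trait. The sketch below assumes you have grouped packet timestamps (float(packet.time)) per destination. It flags near-periodic traffic using the coefficient of variation of the gaps between connections. The thresholds min_events=5 and max_jitter_ratio=0.1 are illustrative, and malware that adds random jitter to its beacon interval will slip past this simple test.

```python
from statistics import mean, pstdev


def detect_beaconing(timestamps, min_events=5, max_jitter_ratio=0.1):
    """Flag near-periodic connections to one destination (possible C2 beacon)."""
    if len(timestamps) < min_events:
        return None
    times = sorted(timestamps)
    intervals = [b - a for a, b in zip(times, times[1:])]
    avg = mean(intervals)
    if avg <= 0:
        return None
    # Coefficient of variation: low values mean very regular check-ins
    jitter_ratio = pstdev(intervals) / avg
    if jitter_ratio <= max_jitter_ratio:
        return {'interval_seconds': avg, 'jitter_ratio': jitter_ratio}
    return None
```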
Automated Analysis Pipeline
Based on my experience analyzing thousands of samples, I developed this automated pipeline:
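```python
import math
from collections import Counter

# Assumes analyze_pe_structure, extract_suspicious_strings and BehaviorAnalyzer
# from the previous sections are defined in the same module.


class MalwareAnalysisPipeline:
    def __init__(self):
        self.stages = [
            'file_identification',
            'static_analysis',
            'dynamic_analysis',
            'memory_analysis',
            'network_analysis',
            'report_generation'
        ]

    def analyze_sample(self, sample_path):
        """Complete automated analysis pipeline"""
        results = {'sample': sample_path, 'stages': {}}
        try:
            # Stage 1: File Identification
            results['stages']['file_identification'] = self.identify_file_type(sample_path)
            # Stage 2: Static Analysis
            results['stages']['static_analysis'] = {
                'pe_analysis': analyze_pe_structure(sample_path),
                'strings': extract_suspicious_strings(sample_path),
                'entropy': self.calculate_file_entropy(sample_path)
            }
            # Stage 3: Dynamic Analysis. This detonates the sample, so the pipeline
            # must itself run inside the isolated sandbox, never on an analyst host
            if self.is_safe_to_execute(results['stages']['static_analysis']):
                analyzer = BehaviorAnalyzer()
                results['stages']['dynamic_analysis'] = analyzer.analyze_sample(sample_path)
            # Stage 4: Generate IOCs
            results['iocs'] = self.extract_iocs(results)
            # Stage 5: Threat Classification
            results['classification'] = self.classify_threat(results)
            return results
        except Exception as e:
            results['error'] = str(e)
            return results

    @staticmethod
    def identify_file_type(sample_path):
        """Identify the file format from its magic bytes (minimal placeholder)"""
        with open(sample_path, 'rb') as f:
            magic = f.read(4)
        if magic.startswith(b'MZ'):
            return 'PE'
        if magic == b'\x7fELF':
            return 'ELF'
        return 'unknown'

    @staticmethod
    def calculate_file_entropy(sample_path):
        """Shannon entropy of the whole file, in bits per byte (0.0 to 8.0)"""
        with open(sample_path, 'rb') as f:
            data = f.read()
        if not data:
            return 0.0
        total = len(data)
        return -sum((n / total) * math.log2(n / total) for n in Counter(data).values())

    @staticmethod
    def is_safe_to_execute(static_results):
        """Placeholder gate: only detonate samples that parsed as valid PE files"""
        return 'error' not in static_results['pe_analysis']

    @staticmethod
    def classify_threat(analysis_results):
        """Placeholder classification based on static suspicious indicators"""
        static = analysis_results['stages'].get('static_analysis', {})
        indicators = static.get('pe_analysis', {}).get('suspicious_indicators', [])
        return 'suspicious' if indicators else 'no static indicators'

    def extract_iocs(self, analysis_results):
        """Extract Indicators of Compromise"""
        iocs = {
            'file_hashes': [],
            'network_indicators': [],
            'registry_keys': [],
            'file_paths': []
        }
        # Extract from static analysis
        static = analysis_results['stages'].get('static_analysis', {})
        file_info = static.get('pe_analysis', {}).get('file_info', {})
        iocs['file_hashes'].extend(file_info[h] for h in ('md5', 'sha256') if h in file_info)
        if 'strings' in static:
            strings = static['strings']
            iocs['network_indicators'].extend(strings.get('urls', []))
            iocs['network_indicators'].extend(strings.get('ip_addresses', []))
            iocs['registry_keys'].extend(strings.get('registry_keys', []))
            iocs['file_paths'].extend(strings.get('file_paths', []))
        # Extract from dynamic analysis
        dynamic = analysis_results['stages'].get('dynamic_analysis', {})
        if 'behaviors' in dynamic:
            for behavior in dynamic['behaviors']:
                if behavior['type'] == 'network_connection':
                    iocs['network_indicators'].append(behavior['details']['connection'])
        return iocs
```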
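The report_generation stage is listed in self.stages but is never implemented. The following is a minimal sketch of one way to do it, assuming a JSON report is sufficient. It deduplicates each IOC list while keeping first-seen order, then serializes with default=str so that values such as datetimes do not break json.dumps. The function name generate_report is hypothetical.

```python
import json


def generate_report(results):
    """Serialize pipeline results to JSON with deduplicated IOC lists."""
    report = dict(results)
    iocs = results.get('iocs', {})
    # dict.fromkeys keeps first-seen order while dropping duplicates
    report['iocs'] = {kind: list(dict.fromkeys(values)) for kind, values in iocs.items()}
    # default=str turns non-JSON types (datetime, Decimal, ...) into strings
    return json.dumps(report, indent=2, sort_keys=True, default=str)
```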
Case Studies from the Field
Case Study 1: Nation-State Malware Analysis
During my time at the Ministry of Defence, I analyzed a sophisticated APT sample:
Initial Observations:
- File size: 2.3 MB (unusually large for an initial payload)
- High-entropy sections (7.8+ bits/byte) indicating packing
- No obvious strings in static analysis
Static Analysis Findings:
- Custom packer with anti-analysis techniques
- Valid digital signature made with a stolen code-signing certificate
- Import table obfuscation
Dynamic Analysis Results:
```json
{
  "execution_flow": [
    "Unpacking routine executed",
    "Process hollowing of legitimate Windows binary",
    "Registry persistence established",
    "C2 communication initiated"
  ],
  "network_indicators": [
    "https://legitimate-looking-domain.com/api/v1/status",
    "DNS queries to compromised domains"
  ],
  "persistence_methods": [
    "HKEY_CURRENT_USER\\Software\\Microsoft\\Windows\\CurrentVersion\\Run"
  ]
}
```
Case Study 2: Banking Trojan Analysis
Sample: Emotet variant targeting financial institutions
Key Findings:
- Modular architecture with plugin system
- Email harvesting capabilities
- Credential theft targeting specific banks
- P2P communication for resilience
Attribution Indicators:
- Code similarities to known Emotet samples
- Infrastructure overlaps with previous campaigns
- TTP alignment with TA542 group
Tools and Techniques Comparison
Static Analysis Tools
| Tool | Strength | Use Case | Cost |
|---|---|---|---|
| IDA Pro | Advanced disassembly | Complex reverse engineering | Commercial |
| Ghidra | Free, open-source disassembler and decompiler from the NSA | General reverse engineering | Free |
| PEiD | Packer detection | Quick packer identification | Free |
| Strings | String extraction | Basic IOC discovery | Free |
Dynamic Analysis Tools
| Tool | Strength | Use Case | Cost |
|---|---|---|---|
| Cuckoo Sandbox | Automated analysis | High-volume processing | Free |
| Any.run | Interactive analysis | Manual behavior observation | Freemium |
| Joe Sandbox | Comprehensive reports | Enterprise analysis | Commercial |
| Process Monitor | Real-time monitoring | Live system analysis | Free |
Advanced Evasion Techniques (What to Watch For)
Anti-Analysis Techniques
Modern malware employs sophisticated evasion:
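```python
def detect_analysis_evasion(sample_behavior):
    """Detect common analysis evasion techniques"""
    evasion_indicators = []
    # VM detection techniques
    vm_artifacts = [
        'VMware', 'VirtualBox', 'QEMU', 'Xen',
        'vmtoolsd.exe', 'VBoxService.exe'
    ]
    # Sandbox detection (generic words such as 'sample' can also match the
    # sample's own file name, so review hits manually)
    sandbox_artifacts = [
        'cuckoo', 'analyst', 'malware', 'sample',
        'virus', 'sandbox'
    ]
    # Time-based evasion: a sample that exits within a minute may have detected
    # the environment and bailed out early (only checked when the time was recorded)
    execution_time = sample_behavior.get('execution_time')
    if execution_time is not None and execution_time < 60:
        evasion_indicators.append('Short execution time (possible early exit after environment check)')
    # Check for environment enumeration in spawned process command lines
    processes = sample_behavior.get('process_creations', [])
    for process in processes:
        process_lower = process.lower()
        # Lowercase both sides; 'VMware' could never match process.lower() otherwise
        if any(artifact.lower() in process_lower for artifact in vm_artifacts):
            evasion_indicators.append(f'VM detection attempt: {process}')
        if any(artifact.lower() in process_lower for artifact in sandbox_artifacts):
            evasion_indicators.append(f'Sandbox detection attempt: {process}')
    return evasion_indicators
```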
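The total delay a sample requests is a more direct signal of sleep-based evasion. The sketch below assumes the API monitor exports calls as (api_name, milliseconds) pairs. This format is hypothetical, so adapt it to whatever your tracing tool actually produces. Sleep and SleepEx both take their delay in milliseconds. If the requested sleeps add up to more than the execution window (30 minutes, matching the sandbox configuration), the sample is probably trying to outlast the analysis.

```python
SLEEP_APIS = {'Sleep', 'SleepEx'}


def detect_sleep_evasion(api_calls, window_seconds=30 * 60):
    """Flag samples whose requested sleeps exceed the analysis time window."""
    total_ms = sum(ms for name, ms in api_calls if name in SLEEP_APIS)
    if total_ms > window_seconds * 1000:
        return f'Requested sleeps total {total_ms / 1000:.0f}s, exceeding the {window_seconds}s window'
    return None
```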
Best Practices and Lessons Learned
Analysis Workflow
- Always start with static analysis: never execute an unknown sample first
- Use isolated environments: dedicated analysis networks and systems
- Document everything: maintain detailed analysis notes and screenshots
- Validate findings: cross-reference IOCs with threat intelligence
- Share intelligence: contribute findings to community databases
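IOCs are conventionally "defanged" before sharing, so that URLs and IP addresses pasted into reports, tickets or chat cannot be clicked or resolved by accident. The sketch below implements the common hxxp / [.] convention. Exact styles vary between platforms, so treat this format as one assumed choice. The function names are hypothetical.

```python
import re


def defang(ioc):
    """Defang a URL, domain or IPv4 indicator for safe sharing."""
    # Break the scheme: http -> hxxp, https -> hxxps
    defanged = re.sub(r'^http', 'hxxp', ioc, flags=re.IGNORECASE)
    # Bracket every dot so the value no longer parses as a host name or address
    return defanged.replace('.', '[.]')


def refang(ioc):
    """Reverse defang() before feeding indicators back into tools."""
    restored = ioc.replace('[.]', '.')
    return re.sub(r'^hxxp', 'http', restored, flags=re.IGNORECASE)
```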
Common Pitfalls
- Rushing to execute: Static analysis often provides sufficient intelligence
- Insufficient isolation: Malware can escape poorly configured sandboxes
- Ignoring metadata: File creation times, paths, and attributes provide context
- Over-relying on automation: Manual analysis catches what automated tools miss
Future of Malware Analysis
Emerging Trends
- AI-Powered Malware: Malware using machine learning for evasion
- Fileless Attacks: Living-off-the-land techniques reducing file-based artifacts
- Cloud-Native Threats: Malware designed for cloud environments
- IoT Malware: Embedded system threats requiring specialized analysis
Analysis Evolution
- ML-Assisted Analysis: Using AI to identify patterns and classify samples
- Behavior-Based Detection: Focus on actions rather than signatures
- Threat Intelligence Integration: Real-time feeds enhancing analysis context
- Collaborative Analysis: Shared platforms for community-driven research
Conclusion
Malware analysis remains both an art and a science. The key to effective analysis is:
- Systematic Approach: Following consistent methodologies while adapting to new techniques
- Continuous Learning: The threat landscape evolves rapidly, requiring constant skill development
- Tool Mastery: Understanding both the capabilities and limitations of analysis tools
- Intelligence Integration: Connecting analysis findings to broader threat intelligence
Based on my experience analyzing thousands of samples across different organizations, the analysts who succeed are those who combine technical depth with strategic thinking, always asking not just “what does this malware do?” but “what does this tell us about the adversary’s objectives and capabilities?”
This deep dive represents techniques and methodologies developed over years of hands-on malware analysis. For specific questions about advanced techniques or tool recommendations, feel free to reach out.