How to Import and Stack Data Using Python Pandas and Jupyter Notebooks
An important tool in a threat hunter's tool belt.
First things first, let's discuss what Python pandas and Jupyter Notebooks are, and how they can help a threat hunter crunch through their data.
What is Python pandas?
Python pandas is a library that provides powerful and easy-to-use data analysis and manipulation tools for Python. It allows you to work with data frames, which are two-dimensional labeled data structures in which each column can hold a different type of data.
You can use pandas to read, write, clean, explore, and visualize data from various sources, such as:
CSV
JSON
SQL
Excel
and more
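As a minimal sketch of the reading side, the snippet below loads a tiny CSV sample and a tiny JSON sample into data frames and combines them into one. The sample data is hypothetical and is held in memory with io.StringIO so the example runs without any files; with real exports you would pass a file path instead.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for exported logs.
csv_text = "Image,User\ncmd.exe,alice\npowershell.exe,bob\n"
json_text = '[{"Image": "whoami.exe", "User": "alice"}]'

# read_csv and read_json accept file paths, URLs, or file-like objects.
csv_df = pd.read_csv(io.StringIO(csv_text))
json_df = pd.read_json(io.StringIO(json_text), orient="records")

# Both readers return a DataFrame, so the results can be combined directly.
combined = pd.concat([csv_df, json_df], ignore_index=True)
print(combined)
```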
Pandas also has many built-in functions and methods for performing common tasks, such as:
filtering
grouping
aggregating
merging
reshaping data
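Here is a rough sketch of several of those tasks on a small, hypothetical table of process events. The column names (Computer, Image, User) and the asset table used for the merge are assumptions chosen for illustration, not a fixed schema.

```python
import pandas as pd

# Hypothetical process events; column names loosely modeled on Sysmon fields.
events = pd.DataFrame({
    "Computer": ["ws01", "ws01", "ws02", "ws02", "ws02"],
    "Image": ["cmd.exe", "powershell.exe", "cmd.exe", "cmd.exe", "rundll32.exe"],
    "User": ["alice", "alice", "bob", "bob", "bob"],
})

# Filtering: keep only rows whose Image matches a condition.
cmd_only = events[events["Image"] == "cmd.exe"]

# Grouping + aggregating: number of events per host.
per_host = events.groupby("Computer").size()

# Merging: enrich events with a second (hypothetical) asset table.
assets = pd.DataFrame({"Computer": ["ws01", "ws02"], "Owner": ["HR", "Finance"]})
enriched = events.merge(assets, on="Computer", how="left")

# Reshaping: turn the (host, image) counts into a host-by-image matrix.
matrix = events.groupby(["Computer", "Image"]).size().unstack(fill_value=0)
print(matrix)
```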
Pandas is one of the most popular and widely used libraries for data science and machine learning in Python.
What is a Jupyter Notebook?
A Jupyter Notebook is a web application that allows you to create and share documents that contain live code, equations, visualizations, and text. You can use Jupyter Notebook for data science, machine learning, scientific computing, and other interactive computing workflows.
Jupyter Notebook supports over 40 programming languages, including:
Python
R
Julia
Scala
You can also use Jupyter widgets to create interactive dashboards and Voilà to convert notebooks into web applications. Jupyter Notebook is part of Project Jupyter, which promotes open standards and services for interactive computing.
How can a threat hunter use these two tools?
A cyber threat hunter can use Python pandas and Jupyter Notebook to perform various tasks related to data analysis and visualization of security events. For example, a threat hunter can use these tools to:
Read and parse security log files in different formats, such as JSON, CSV, or XML, and load them into pandas data frames for easy manipulation and exploration.
Apply various filters, transformations, aggregations, and calculations on the data frames to extract relevant information and insights from the security events.
Query and join data from different sources, such as Elasticsearch, SQL databases, or Spark clusters, using pandas or PySpark libraries.
Visualize the data using various charts, graphs, maps, and widgets, using libraries such as matplotlib, seaborn, plotly, or ipywidgets.
Document and share the analysis steps and results using Jupyter Notebook’s rich text and code cells, which can also be converted into web applications or interactive dashboards.
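As an illustration of the first two items, the sketch below parses a few JSON log lines, flattens the nested fields with pd.json_normalize, filters to Sysmon ProcessCreate events (Event ID 1), and counts parent processes. The log layout and field names (event.code, process.parent, host) are hypothetical; real exports from your own log pipeline will use their own structure.

```python
import json

import pandas as pd

# Hypothetical JSON-lines log: one nested event per line (field layout is an assumption).
raw_log = """{"event": {"code": 1}, "process": {"name": "cmd.exe", "parent": "explorer.exe"}, "host": "ws01"}
{"event": {"code": 1}, "process": {"name": "whoami.exe", "parent": "cmd.exe"}, "host": "ws01"}
{"event": {"code": 3}, "process": {"name": "chrome.exe", "parent": "explorer.exe"}, "host": "ws02"}"""

# Parse each line, then flatten nested keys into dotted column names.
records = [json.loads(line) for line in raw_log.splitlines()]
df = pd.json_normalize(records)

# Filter to process-creation events (Sysmon Event ID 1) and count by parent process.
proc_create = df[df["event.code"] == 1]
parents = proc_create["process.parent"].value_counts()
print(parents)
```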
Now, to import your libraries and set pandas' display options so that no columns, rows, or cell contents get truncated, run these commands:
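import pandas as pd
import re  # standard-library regular expressions module
pd.set_option('display.max_columns', None)  # show every column
pd.set_option('display.max_rows', None)  # show every row (can be slow on very large outputs)
pd.set_option('display.max_colwidth', None)  # never truncate long cell values such as command lines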
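A short sketch of what these options do, using a hypothetical 1,000-row frame. pd.get_option reads a setting back, and pd.option_context applies settings only inside a with-block, which can be a gentler alternative to setting display.max_rows to None globally when a single output might contain many thousands of rows.

```python
import pandas as pd

# Global settings: None removes the limit for the rest of the session.
pd.set_option("display.max_columns", None)
pd.set_option("display.max_colwidth", None)
print(pd.get_option("display.max_columns"))  # prints None

# Hypothetical 1,000-row frame to show the effect of display.max_rows.
big = pd.DataFrame({"n": range(1000)})

# option_context changes settings only inside the with-block, then restores them.
with pd.option_context("display.max_rows", 60):
    truncated_view = repr(big)  # more than 60 rows, so pandas truncates the output
with pd.option_context("display.max_rows", None):
    full_view = repr(big)  # no limit, so all 1,000 rows are rendered

print(len(truncated_view.splitlines()), len(full_view.splitlines()))
```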
Next, I import a CSV file I created from my threat hunting scenarios, which consists of Sysmon ProcessCreate (Event ID 1) events. I also drop a column of leftover data I do not need ("Unnamed: 0").
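sysmon_noabar = pd.read_csv("sysmon_noabar.csv")  # hypothetical filename: use the path to your own export
noabar_proc = sysmon_noabar.drop(["Unnamed: 0"], axis=1)  # remove the unneeded column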
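For context, an "Unnamed: 0" column commonly appears when a DataFrame was saved with to_csv() using the default index=True and then read back. The sketch below, with hypothetical data, reproduces that and shows two ways to handle it: dropping the column after loading, as in the code above, or passing index_col=0 to read_csv. Note that drop(columns=[...]) is equivalent to drop([...], axis=1).

```python
import io

import pandas as pd

# Hypothetical events saved the default way: to_csv() also writes the index.
events = pd.DataFrame({"Image": ["cmd.exe", "whoami.exe"], "User": ["alice", "bob"]})
csv_text = events.to_csv()

# Reading it back turns the unnamed index column into "Unnamed: 0".
reloaded = pd.read_csv(io.StringIO(csv_text))
print(reloaded.columns.tolist())  # ['Unnamed: 0', 'Image', 'User']

# Option 1: drop the column after loading.
dropped = reloaded.drop(columns=["Unnamed: 0"])

# Option 2: tell read_csv that the first column is the index.
indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)
```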
I then use the head() method, which returns the first 5 rows by default, to confirm the data imported correctly.
noabar_proc.head()
Here is a screenshot of the output.
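Beyond head(), a few other one-liners are useful for checking an import. This sketch uses a hypothetical eight-row stand-in for noabar_proc; head() and tail() accept a row count n, while shape, dtypes, and isna().sum() summarize size, column types, and missing values.

```python
import pandas as pd

# Hypothetical stand-in for noabar_proc with eight ProcessCreate-style rows.
noabar_proc = pd.DataFrame({
    "Image": [f"proc{i}.exe" for i in range(8)],
    "ParentCommandLine": ["explorer.exe"] * 6 + ["cmd.exe /c whoami"] * 2,
})

first_rows = noabar_proc.head()   # first 5 rows by default
first_two = noabar_proc.head(2)   # pass n to change how many rows come back
last_rows = noabar_proc.tail(3)   # tail() returns rows from the end instead

# Other quick sanity checks after an import:
print(noabar_proc.shape)          # (rows, columns)
print(noabar_proc.dtypes)         # inferred type of each column
print(noabar_proc.isna().sum())   # missing values per column
```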
The last step is to choose the column you want to stack on and call value_counts(). I also pass ascending=True so the counts are sorted from least to most frequent, which puts the rarest values at the top.
noabar_proc.value_counts("ParentCommandLine", ascending=True)
See screenshot.
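A sketch of stacking on a small, hypothetical stand-in for noabar_proc. Selecting the column first and calling Series.value_counts() gives the same counts as the DataFrame form used above. The threshold for "rare" is an arbitrary choice for illustration, the Computer column is an assumption, and value_counts() drops missing values unless you pass dropna=False.

```python
import pandas as pd

# Hypothetical ProcessCreate events; only the stacked columns matter here.
noabar_proc = pd.DataFrame({
    "ParentCommandLine": (
        ["explorer.exe"] * 5
        + ["cmd.exe /c whoami"] * 2
        + ["powershell.exe -nop -w hidden"]
    ),
    "Computer": ["ws01"] * 4 + ["ws02"] * 4,
})

# Stack on one column: count each distinct value, rarest first.
stack = noabar_proc["ParentCommandLine"].value_counts(ascending=True)

# normalize=True returns proportions instead of raw counts.
share = noabar_proc["ParentCommandLine"].value_counts(normalize=True)

# Keep only values seen fewer than 3 times (an arbitrary, hypothetical threshold).
rare = stack[stack < 3]

# Stack on several columns at once to see which host each value came from.
by_host = noabar_proc.value_counts(["Computer", "ParentCommandLine"], ascending=True)
print(by_host)
```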
And that's it! Thank you for reading, and if this was interesting or informative, I ask that you like, comment, and share!
Happy hunting!
Resources: