Python Digital Mobile Device Forensics
This chapter explains digital forensics on mobile devices with Python and the concepts involved.
Introduction
Mobile device forensics is the branch of digital forensics that deals with the acquisition and analysis of mobile devices to recover digital evidence of investigative interest. This branch differs from computer forensics because mobile devices have an inbuilt communication system, which can provide useful information related to location.
Though the use of smartphones in digital forensics is increasing day by day, they are still considered non-standard owing to their heterogeneity. By contrast, computer hardware, such as hard disks, is considered standard and has developed into a stable discipline. In the digital forensic industry, there is a lot of debate about the techniques used for non-standard devices with transient evidence, such as smartphones.
Artifacts Extractable from Mobile Devices
Modern mobile devices hold a lot of digital information in comparison with older phones, which stored only a call log and SMS messages. Thus, mobile devices can supply investigators with many insights about their users. Some artifacts that can be extracted from mobile devices are mentioned below −
Messages − These useful artifacts can reveal the state of mind of the owner and can even give the investigator some previously unknown information.
Location History − The location history data is a useful artifact which investigators can use to validate the particular location of a person.
Applications Installed − By examining the kind of applications installed, the investigator gets some insight into the habits and thinking of the mobile user.
Evidence Sources and Processing in Python
Smartphones have SQLite databases and PLIST files as their major sources of evidence. In this section we are going to process these sources of evidence in Python.
Analyzing PLIST files
A PLIST (Property List) is a flexible and convenient format for storing application data, especially on iPhone devices. It uses the extension .plist. Such files are used to store information about bundles and applications. A PLIST can be in one of two formats: XML or binary. The following Python code will open and read a PLIST file. Note that before proceeding, we must create our own Info.plist file.
First, install a third-party library named biplist with the following command −
pip install biplist
Now, import some useful libraries to process plist files −
import biplist
import os
import sys
Now, the following code in the main method can be used to read the plist file into a variable −
def main(plist):
    try:
        data = biplist.readPlist(plist)
    except (biplist.InvalidPlistException, biplist.NotBinaryPlistException) as e:
        print("[-] Invalid PLIST file - unable to be opened by biplist")
        sys.exit(1)
Now, from this variable we can either read the data on the console or print it directly.
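As a self-contained illustration of the same read path, the standard library's plistlib module can parse a property list built in memory, standing in for a real Info.plist; the keys below are hypothetical examples −

```python
import plistlib

# Hypothetical bundle metadata standing in for a real Info.plist
payload = {"CFBundleName": "DemoApp", "CFBundleVersion": "1.0"}
raw = plistlib.dumps(payload)   # serialize to XML plist bytes

data = plistlib.loads(raw)      # parse back into a Python dict
print(data["CFBundleName"])     # -> DemoApp
```

Unlike biplist, plistlib ships with Python and handles both XML and binary plists in recent versions.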
SQLite Databases
SQLite serves as the primary data repository on mobile devices. SQLite is an in-process library that implements a self-contained, serverless, zero-configuration, transactional SQL database engine. Because it is zero-configuration, you need not configure it on your system, unlike other databases.
If you are a novice or unfamiliar with SQLite databases, you can follow the link; additionally, you can follow the link in case you want to get into the details of SQLite with Python. During mobile forensics, we can interact with the sms.db file of a mobile device and extract valuable information from the message table. Python has a built-in library named sqlite3 for connecting to an SQLite database. You can import it with the following command −
import sqlite3
Now, with the help of the following commands, we can connect to the database, say sms.db in the case of mobile devices −
conn = sqlite3.connect('sms.db')
c = conn.cursor()
Here, c is the cursor object with the help of which we can interact with the database.
Now, suppose we want to execute a particular command, say to get the details from the abc table; it can be done with the help of the following command −
c.execute("SELECT * FROM abc")
c.close()
The result of the above command is stored in the cursor object. Similarly, we can use the fetchall() method to dump the result into a variable we can manipulate.
We can use the following commands to get the column names of the message table in sms.db −
c.execute("pragma table_info(message)")
table_data = c.fetchall()
columns = [x[1] for x in table_data]
Observe that here we are using the SQLite PRAGMA command, a special command used to control various environmental variables and state flags within the SQLite environment. In the above command, the fetchall() method returns a list of tuples, and each column's name is stored at index 1 of its tuple.
Now, with the help of the following commands, we can query the table for all of its data and store it in the variable named data_msg −
c.execute("SELECT * FROM message")
data_msg = c.fetchall()
The above command stores the data in the variable; further, we can also write this data to a CSV file by using the csv.writer() method.
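The steps above can be exercised end-to-end against an in-memory database; note that the message-table schema below is a simplified assumption for illustration, not the real sms.db layout −

```python
import sqlite3

# In-memory stand-in for sms.db; this schema is a simplified assumption
conn = sqlite3.connect(":memory:")
c = conn.cursor()
c.execute("CREATE TABLE message (ROWID INTEGER, address TEXT, text TEXT)")
c.execute("INSERT INTO message VALUES (1, '+15550100', 'hello')")

# PRAGMA table_info: each returned tuple holds the column name at index 1
c.execute("pragma table_info(message)")
columns = [x[1] for x in c.fetchall()]

c.execute("SELECT * FROM message")
data_msg = c.fetchall()
print(columns)    # ['ROWID', 'address', 'text']
print(data_msg)   # [(1, '+15550100', 'hello')]
conn.close()
```

The same two queries run unchanged against a real sms.db once its path is supplied to sqlite3.connect().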
iTunes Backups
iPhone mobile forensics can be performed on the backups made by iTunes. Forensic examiners rely on analyzing the iPhone logical backups acquired through iTunes. The AFC (Apple File Conduit) protocol is used by iTunes to take the backup. Besides, the backup process does not modify anything on the iPhone except the escrow key records.
Now, the question arises: why is it important for a digital forensic expert to understand iTunes backup techniques? It is important in case we get access to the suspect’s computer instead of the iPhone directly, because when a computer is used to sync with an iPhone, most of the information on the iPhone is likely to be backed up on that computer.
Process of Backup and its Location
Whenever an Apple product is backed up to the computer, it is in sync with iTunes, and there will be a specific folder with the device’s unique ID. In the latest backup format, the files are stored in subfolders named after the first two hexadecimal characters of the file name. Among these backup files, some files like Info.plist are useful, along with the database named Manifest.db. The following table shows the backup locations, which vary with the operating system −
| OS | Backup Location |
| --- | --- |
| Win7 | C:\Users\[username]\AppData\Roaming\Apple Computer\MobileSync\Backup\ |
| Mac OS X | ~/Library/Application Support/MobileSync/Backup/ |
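The two-character subfolder rule mentioned above can be sketched as follows; the SHA-1 value here is a hypothetical file hash, not taken from a real backup −

```python
import os

# Hypothetical SHA-1 hash of a backed-up file (40 hex characters)
file_hash = "3d0d7e5fb2ce288813306e4d4636395e047a3d28"

# In the newer backup format, a file lives under a subfolder named
# after the first two hexadecimal characters of its hash
subpath = os.path.join(file_hash[:2], file_hash)
print(subpath)  # 3d/3d0d7e5fb2ce288813306e4d4636395e047a3d28 (on POSIX)
```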
For processing an iTunes backup with Python, we first need to identify all the backups in the backup location for our operating system. Then we will iterate through each backup and read the database Manifest.db.
Now, with the help of following Python code we can do the same −
First, import the necessary libraries as follows −
from __future__ import print_function
import argparse
import logging
import os
from shutil import copyfile
import sqlite3
import sys

logger = logging.getLogger(__name__)
Now, provide two positional arguments, namely INPUT_DIR and OUTPUT_DIR, representing the iTunes backup folder and the desired output folder −
if __name__ == "__main__":
    # Set up the command-line argument parser
    parser = argparse.ArgumentParser()
    parser.add_argument("INPUT_DIR",
        help = "Location of folder containing iOS backups, "
        "e.g. ~/Library/Application Support/MobileSync/Backup folder")
    parser.add_argument("OUTPUT_DIR", help = "Output Directory")
    parser.add_argument("-l", help = "Log file path", default = __file__[:-2] + "log")
    parser.add_argument("-v", help = "Increase verbosity", action = "store_true")
    args = parser.parse_args()
Now, set up the log as follows −
if args.v:
    logger.setLevel(logging.DEBUG)
else:
    logger.setLevel(logging.INFO)
Now, set up the message format for this log as follows −
msg_fmt = logging.Formatter("%(asctime)-15s %(funcName)-13s"
    "%(levelname)-8s %(message)s")
strhndl = logging.StreamHandler(sys.stderr)
strhndl.setFormatter(fmt = msg_fmt)
fhndl = logging.FileHandler(args.l, mode = 'a')
fhndl.setFormatter(fmt = msg_fmt)
logger.addHandler(strhndl)
logger.addHandler(fhndl)
logger.info("Starting iBackup Visualizer")
logger.debug("Supplied arguments: {}".format(" ".join(sys.argv[1:])))
logger.debug("System: " + sys.platform)
logger.debug("Python Version: " + sys.version)
The following lines of code will create the necessary folders for the desired output directory by using the os.makedirs() function −
if not os.path.exists(args.OUTPUT_DIR):
    os.makedirs(args.OUTPUT_DIR)
Now, pass the supplied input and output directories to the main() function as follows −
if os.path.exists(args.INPUT_DIR) and os.path.isdir(args.INPUT_DIR):
    main(args.INPUT_DIR, args.OUTPUT_DIR)
else:
    logger.error("Supplied input directory does not exist or is not "
        "a directory")
    sys.exit(1)
Now, write the main() function, which will further call the backup_summary() function to identify all the backups present in the input folder −
def main(in_dir, out_dir):
    backups = backup_summary(in_dir)

def backup_summary(in_dir):
    logger.info("Identifying all iOS backups in {}".format(in_dir))
    root = os.listdir(in_dir)
    backups = {}

    for x in root:
        temp_dir = os.path.join(in_dir, x)
        # A backup folder is named with the device's 40-character identifier
        if os.path.isdir(temp_dir) and len(x) == 40:
            num_files = 0
            size = 0

            for root, subdir, files in os.walk(temp_dir):
                num_files += len(files)
                size += sum(os.path.getsize(os.path.join(root, name))
                    for name in files)
            backups[x] = [temp_dir, num_files, size]
    return backups
Now, print the summary of each backup to the console as follows −
print("Backup Summary")
print("=" * 20)

if len(backups) > 0:
    for i, b in enumerate(backups):
        print("Backup No.: {} \n"
            "Backup Dev. Name: {} \n"
            "# Files: {} \n"
            "Backup Size (Bytes): {}\n".format(
                i, b, backups[b][1], backups[b][2]))
Now, dump the contents of the Manifest.db file to the variable named db_items.
try:
    db_items = process_manifest(backups[b][0])
except IOError:
    logger.warn("Non-iOS 10 backup encountered or "
        "invalid backup. Continuing to next backup.")
    continue
Now, let us define a function that will take the directory path of the backup −
def process_manifest(backup):
    manifest = os.path.join(backup, "Manifest.db")

    if not os.path.exists(manifest):
        logger.error("Manifest DB not found in {}".format(manifest))
        raise IOError
Now, using sqlite3, we will connect to the database and interact with it through a cursor named c −
    conn = sqlite3.connect(manifest)
    c = conn.cursor()
    items = {}

    for row in c.execute("SELECT * from Files;"):
        items[row[0]] = [row[2], row[1], row[3]]
    return items

        create_files(in_dir, out_dir, b, db_items)
        print("=" * 20)
else:
    logger.warning("No valid backups found. The input directory should be "
        "the parent-directory immediately above the SHA-1 hash "
        "iOS device backups")
    sys.exit(2)
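The Files-table query can be tried against a miniature stand-in database; the column layout below (fileID, domain, relativePath, flags) is an assumption matching the indices the script uses, not the full Manifest.db schema −

```python
import sqlite3

# Miniature stand-in for Manifest.db; this column layout is an assumption
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Files "
             "(fileID TEXT, domain TEXT, relativePath TEXT, flags INT)")
conn.execute("INSERT INTO Files VALUES "
             "('3d0d7e5f', 'HomeDomain', 'Library/SMS/sms.db', 1)")

# Mirror the script: key by fileID, keep [relativePath, domain, flags]
items = {}
for row in conn.execute("SELECT * from Files;"):
    items[row[0]] = [row[2], row[1], row[3]]
print(items["3d0d7e5f"][0])  # Library/SMS/sms.db
conn.close()
```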
Now, define the create_files() method as follows −
def create_files(in_dir, out_dir, b, db_items):
    msg = "Copying Files for backup {} to {}".format(
        b, os.path.join(out_dir, b))
    logger.info(msg)
    files_not_found = 0
Now, iterate through each key in the db_items dictionary −
for x, key in enumerate(db_items):
    if db_items[key][0] is None or db_items[key][0] == "":
        continue
    else:
        dirpath = os.path.join(out_dir, b,
            os.path.dirname(db_items[key][0]))
        filepath = os.path.join(out_dir, b, db_items[key][0])

        if not os.path.exists(dirpath):
            os.makedirs(dirpath)
        original_dir = b + "/" + key[0:2] + "/" + key
        path = os.path.join(in_dir, original_dir)

        if os.path.exists(filepath):
            filepath = filepath + "_{}".format(x)
Now, use the shutil.copyfile() method to copy the backed-up file as follows −
try:
    copyfile(path, filepath)
except IOError:
    logger.debug("File not found in backup: {}".format(path))
    files_not_found += 1

if files_not_found > 0:
    logger.warning("{} files listed in the Manifest.db not "
        "found in backup".format(files_not_found))
copyfile(os.path.join(in_dir, b, "Info.plist"),
    os.path.join(out_dir, b, "Info.plist"))
copyfile(os.path.join(in_dir, b, "Manifest.db"),
    os.path.join(out_dir, b, "Manifest.db"))
copyfile(os.path.join(in_dir, b, "Manifest.plist"),
    os.path.join(out_dir, b, "Manifest.plist"))
copyfile(os.path.join(in_dir, b, "Status.plist"),
    os.path.join(out_dir, b, "Status.plist"))
With the above Python script, we can get the updated backup file structure in our output folder. We can use the pycrypto Python library to decrypt encrypted backups.
Wi-Fi
Mobile devices connect to the outside world through Wi-Fi networks, which are available everywhere. Sometimes the device connects to these open networks automatically.
In the case of an iPhone, the list of open Wi-Fi connections to which the device has connected is stored in a PLIST file named com.apple.wifi.plist. This file contains the Wi-Fi SSID, BSSID and connection time.
We need to extract Wi-Fi details from a standard Cellebrite XML report using Python. For this, we need to use the API from the Wireless Geographic Logging Engine (WIGLE), a popular platform which can be used for finding the location of a device using the names of Wi-Fi networks.
We can use the Python library named requests to access the API from WIGLE. It can be installed as follows −
pip install requests
API from WIGLE
We need to register on WIGLE’s website to get a free API key from WIGLE. The Python script for getting information about a user’s device and its connections through WIGLE’s API is discussed below. First, import the following libraries for handling different things −
from __future__ import print_function
import argparse
import csv
import os
import sys
import xml.etree.ElementTree as ET
import requests
Now, provide two positional arguments, namely INPUT_FILE and OUTPUT_CSV, which will represent the input file with the Wi-Fi MAC addresses and the desired output CSV file respectively −
if __name__ == "__main__":
    # Set up the command-line argument parser
    parser = argparse.ArgumentParser()
    parser.add_argument("INPUT_FILE", help = "INPUT FILE with MAC Addresses")
    parser.add_argument("OUTPUT_CSV", help = "Output CSV File")
    parser.add_argument("-t", help = "Input type: Cellebrite XML report or TXT file",
        choices = ('xml', 'txt'), default = "xml")
    parser.add_argument('--api', help = "Path to API key file",
        default = os.path.expanduser("~/.wigle_api"),
        type = argparse.FileType('r'))
    args = parser.parse_args()
Now, the following lines of code check whether the input file exists and is a file. If not, the script exits −
if not os.path.exists(args.INPUT_FILE) or \
        not os.path.isfile(args.INPUT_FILE):
    print("[-] {} does not exist or is not a file".format(args.INPUT_FILE))
    sys.exit(1)
directory = os.path.dirname(args.OUTPUT_CSV)

if directory != '' and not os.path.exists(directory):
    os.makedirs(directory)
api_key = args.api.readline().strip().split(":")
Now, pass the arguments to main() as follows −
main(args.INPUT_FILE, args.OUTPUT_CSV, args.t, api_key)

def main(in_file, out_csv, type, api_key):
    if type == 'xml':
        wifi = parse_xml(in_file)
    else:
        wifi = parse_txt(in_file)
    query_wigle(wifi, out_csv, api_key)
Now, we will parse the XML file as follows −
def parse_xml(xml_file):
    wifi = {}
    xmlns = "{http://pa.cellebrite.com/report/2.0}"
    print("[+] Opening {} report".format(xml_file))

    xml_tree = ET.parse(xml_file)
    print("[+] Parsing report for all connected WiFi addresses")
    root = xml_tree.getroot()
Now, iterate through the child element of the root as follows −
for child in root.iter():
    if child.tag == xmlns + "model":
        if child.get("type") == "Location":
            for field in child.findall(xmlns + "field"):
                if field.get("name") == "TimeStamp":
                    ts_value = field.find(xmlns + "value")
                    try:
                        ts = ts_value.text
                    except AttributeError:
                        continue
Now, we will check whether the ‘SSID’ string is present in the value’s text or not −
if "SSID" in value.text:
    bssid, ssid = value.text.split(" ")
    bssid = bssid[7:]
    ssid = ssid[6:]
Now, we need to add BSSID, SSID and timestamp to the wifi dictionary as follows −
if bssid in wifi.keys():
    wifi[bssid]["Timestamps"].append(ts)
    wifi[bssid]["SSID"].append(ssid)
else:
    wifi[bssid] = {"Timestamps": [ts], "SSID": [ssid], "Wigle": {}}
return wifi
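The dictionary-merging logic above can be isolated into a small helper for testing; the BSSID, SSID and timestamp values here are made up −

```python
def add_observation(wifi, bssid, ssid, ts):
    # Append to an existing BSSID entry, or create a fresh one
    if bssid in wifi:
        wifi[bssid]["Timestamps"].append(ts)
        wifi[bssid]["SSID"].append(ssid)
    else:
        wifi[bssid] = {"Timestamps": [ts], "SSID": [ssid], "Wigle": {}}

wifi = {}
add_observation(wifi, "aa:bb:cc:dd:ee:ff", "CoffeeShop", "2020-01-01 10:00")
add_observation(wifi, "aa:bb:cc:dd:ee:ff", "CoffeeShop", "2020-01-02 18:30")
print(len(wifi))                                     # 1 unique BSSID
print(len(wifi["aa:bb:cc:dd:ee:ff"]["Timestamps"]))  # 2 sightings
```

Keying on the BSSID means repeated sightings of the same access point accumulate under one entry instead of overwriting each other.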
The text parser, which is much simpler than the XML parser, is shown below −
def parse_txt(txt_file):
    wifi = {}
    print("[+] Extracting MAC addresses from {}".format(txt_file))

    with open(txt_file) as mac_file:
        for line in mac_file:
            wifi[line.strip()] = {"Timestamps": ["N/A"], "SSID": ["N/A"], "Wigle": {}}
    return wifi
Now, let us use the requests module to make the WIGLE API calls; we need to move on to the query_wigle() method −
def query_wigle(wifi_dictionary, out_csv, api_key):
    print("[+] Querying Wigle.net through Python API for {} "
        "APs".format(len(wifi_dictionary)))
    for mac in wifi_dictionary:
        wigle_results = query_mac_addr(mac, api_key)

def query_mac_addr(mac_addr, api_key):
    query_url = ("https://api.wigle.net/api/v2/network/search?"
        "onlymine=false&freenet=false&paynet=false"
        "&netid={}".format(mac_addr))
    req = requests.get(query_url, auth = (api_key[0], api_key[1]))
    return req.json()
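Since live WIGLE calls need an API key and count against the daily quota, the URL construction alone can be checked offline; this sketch mirrors the query string built in query_mac_addr() without making a request −

```python
def build_wigle_url(mac_addr):
    # Same query-string shape as query_mac_addr(); no request is made here
    return ("https://api.wigle.net/api/v2/network/search?"
            "onlymine=false&freenet=false&paynet=false"
            "&netid={}".format(mac_addr))

url = build_wigle_url("00:11:22:33:44:55")
print(url)
```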
Actually, there is a per-day limit for WIGLE API calls; if that limit is exceeded, an error is shown as follows −
try:
    if wigle_results["resultCount"] == 0:
        wifi_dictionary[mac]["Wigle"]["results"] = []
        continue
    else:
        wifi_dictionary[mac]["Wigle"] = wigle_results
except KeyError:
    if wigle_results["error"] == "too many queries today":
        print("[-] Wigle daily query limit exceeded")
        wifi_dictionary[mac]["Wigle"]["results"] = []
        continue
    else:
        print("[-] Other error encountered for "
            "address {}: {}".format(mac, wigle_results['error']))
        wifi_dictionary[mac]["Wigle"]["results"] = []
        continue
prep_output(out_csv, wifi_dictionary)
Now, we will use the prep_output() method to flatten the dictionary into easily writable chunks −
def prep_output(output, data):
    csv_data = {}
    google_map = "https://www.google.com/maps/search/"
Now, access all the data we have collected so far as follows −
for x, mac in enumerate(data):
    for y, ts in enumerate(data[mac]["Timestamps"]):
        for z, result in enumerate(data[mac]["Wigle"]["results"]):
            shortres = data[mac]["Wigle"]["results"][z]
            g_map_url = "{}{},{}".format(
                google_map, shortres["trilat"], shortres["trilong"])
Now, we can write the output to a CSV file, as we have done in earlier scripts in this chapter, by using the write_csv() function.
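A minimal sketch of that CSV step, writing to an in-memory buffer instead of a file; the row contents and header names below are hypothetical −

```python
import csv
import io

# Hypothetical flattened rows as prep_output() might produce them
rows = [
    {"BSSID": "aa:bb:cc:dd:ee:ff", "SSID": "CoffeeShop",
     "Map URL": "https://www.google.com/maps/search/47.60,-122.33"},
]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["BSSID", "SSID", "Map URL"])
writer.writeheader()
writer.writerows(rows)
print(buf.getvalue().splitlines()[0])  # BSSID,SSID,Map URL
```

Replacing io.StringIO() with open(out_csv, "w", newline="") writes the same rows to the output file on disk.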