Analyse your web browsing history using Python

August 21, 2017

We are living in the era of the Web. From buying everyday items to monitoring huge amounts of data, we are constantly connected to the Internet, so it is worth knowing how we actually spend our time online. In this article, we write a Python program that helps you analyse your web browsing history. The solution works with Mozilla Firefox.

Writing the program

Firefox stores browsing history locally in an SQLite database. We therefore use Python's sqlite3 library to connect to this open source database, query the relevant fields and extract the required data: the list of URLs visited and their visit counts.

On a Linux machine, the SQLite database is located inside the Firefox profile directory under your home directory, for example “.mozilla/firefox/7xov879d.default”. The profile directory name differs from one installation to another, so replace it with your own. The database itself is named “places.sqlite”, and the following code builds its path:

import os

# Profile directory; the name differs between installations
data_path = os.path.expanduser('~')+"/.mozilla/firefox/7xov879d.default"
history_db = os.path.join(data_path, 'places.sqlite')
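
Since the profile directory name is not the same on every machine, the path does not have to be hard-coded. The following is a minimal sketch, not part of the original program, that searches every profile directory for a places.sqlite file. The helper name find_history_dbs is hypothetical, and the sketch assumes the usual Linux layout of one sub-directory per profile under ~/.mozilla/firefox.

```python
import glob
import os


def find_history_dbs(firefox_dir):
    """Return the places.sqlite path of every profile under firefox_dir."""
    # One sub-directory per profile is assumed, each holding places.sqlite
    pattern = os.path.join(firefox_dir, "*", "places.sqlite")
    return sorted(glob.glob(pattern))


if __name__ == "__main__":
    firefox_dir = os.path.join(os.path.expanduser("~"), ".mozilla", "firefox")
    for path in find_history_dbs(firefox_dir):
        print(path)
```

If more than one path is printed, you have several profiles; pick the one you actually browse with and use it as history_db.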

Now that we have the path of the database, we can run the necessary SQL queries. We first import the sqlite3 library and then extract the required information from the database. Below is the code that extracts the necessary data:

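import sqlite3

# Open the history database and create a cursor
c = sqlite3.connect(history_db)
cursor = c.cursor()

# Fetch each stored URL together with its visit count
select_statement = "select moz_places.url, moz_places.visit_count from moz_places;"
cursor.execute(select_statement)

results = cursor.fetchall()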

The connect and cursor calls open the database and create a cursor, and the execute call runs the select statement. The call to “cursor.fetchall()” returns the result as a list of tuples, which is stored in the variable results.
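
To see the shape of the data that fetchall returns without touching your real history, the same query can be run against a small in-memory database. This is only an illustrative sketch: the table below is a stand-in that contains just the two columns used in this article, and the sample URLs are made up.

```python
import sqlite3

# In-memory stand-in for places.sqlite with only the two columns queried here
conn = sqlite3.connect(":memory:")
conn.execute("create table moz_places (url text, visit_count integer)")
conn.executemany(
    "insert into moz_places (url, visit_count) values (?, ?)",
    [("https://example.org/a", 3), ("https://example.com/", 1)],
)

cursor = conn.cursor()
cursor.execute("select moz_places.url, moz_places.visit_count from moz_places;")
results = cursor.fetchall()  # a list with one (url, visit_count) tuple per row
conn.close()

print(results)
```

When you run the query against the real database while Firefox is open, the file may be locked; closing the browser or working on a copy of places.sqlite is a common workaround.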

Next, we will tally the visits for each site and store each site and its count as a key-value pair in a dictionary. Before that, we write a function that extracts the domain from a URL. For this, we use the urlparse function, which lives in urllib.parse in Python 3 (and in the urlparse module in Python 2).

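def parse(url):
    # Return the domain (network location) part of the URL
    try:
        parse_url = urlparse(url)
        domain = parse_url.netloc
        return domain
    except ValueError:
        # urlparse raises ValueError for malformed URLs
        print("URL format error")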
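
A quick way to check what the parser returns is to try it on a few sample URLs. The sketch below uses the Python 3 import and made-up addresses. Note that the network location keeps the port number when one is present and is empty for URLs without a host, such as local files.

```python
from urllib.parse import urlparse

sample_urls = [
    "https://www.example.com/path?q=1",
    "http://example.org:8080/index.html",
    "file:///home/user/notes.txt",
]

# netloc is the part between the scheme and the path
domains = [urlparse(u).netloc for u in sample_urls]
print(domains)  # ['www.example.com', 'example.org:8080', '']
```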

We use the netloc attribute of the parsed result to get the domain of the visited site. In the next step, we write a loop that counts how many times each site was visited and stores the information in a dictionary.

sites_count = {}

for url, count in results:
    url = parse(url)

    # Add this URL's visit count to the running total for its domain
    if url in sites_count:
        sites_count[url] += count
    else:
        sites_count[url] = count
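
The same tally can be written more compactly with collections.Counter, which treats missing keys as zero. This is an alternative sketch rather than the code used in this article; it runs on made-up sample rows and sums the visit_count column per domain. Adding 1 instead of count would tally the number of distinct URLs stored for each domain.

```python
from collections import Counter
from urllib.parse import urlparse

# Sample (url, visit_count) rows shaped like the output of cursor.fetchall()
results = [
    ("https://example.com/a", 2),
    ("https://example.com/b", 5),
    ("https://example.org/", 1),
]

sites_count = Counter()
for url, count in results:
    # Missing domains start at zero, so no membership test is needed
    sites_count[urlparse(url).netloc] += count

print(dict(sites_count))  # {'example.com': 7, 'example.org': 1}
```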

Now, we sort the dictionary in descending order of count and extract the ten most visited sites. To preserve the sorted order, we store the result in an OrderedDict.

sites_count_sorted = OrderedDict(sorted(sites_count.items(), key=operator.itemgetter(1), reverse=True)[:10])
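
To make the sorting step concrete, here is a small sketch on made-up counts that keeps the top two entries instead of ten. It also shows Counter.most_common, a standard library shortcut that returns the same ranking as a list of pairs; it is offered only as an alternative.

```python
import operator
from collections import Counter, OrderedDict

sites_count = {"example.org": 4, "example.com": 12, "example.net": 7}

# Sort by count (item 1 of each pair), largest first, and keep two entries
top_two = OrderedDict(
    sorted(sites_count.items(), key=operator.itemgetter(1), reverse=True)[:2]
)
print(list(top_two.items()))  # [('example.com', 12), ('example.net', 7)]

# Equivalent ranking using Counter
same_top_two = Counter(sites_count).most_common(2)
print(same_top_two)  # [('example.com', 12), ('example.net', 7)]
```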

After sorting the dictionary, we write a function that plots the extracted information as a bar chart. To plot the chart, we use the matplotlib library.

def analyse(results):
    # Draw one bar per site and label each bar with its domain
    plt.bar(range(len(results)), list(results.values()), align='edge')
    plt.xticks(range(len(results)), list(results.keys()), rotation=20)

    plt.show()

The complete code for the Python browsing history analyser for Firefox is given below. (Feel free to add your own tweaks.)

import os
import sqlite3
import operator
from collections import OrderedDict
import matplotlib.pyplot as plt
from urllib.parse import urlparse  # Python 2: from urlparse import urlparse

def parse(url):
    # Return the domain (network location) part of the URL
    try:
        parse_url = urlparse(url)
        domain = parse_url.netloc
        return domain
    except ValueError:
        # urlparse raises ValueError for malformed URLs
        print("URL format error")

def analyse(results):
    # Draw one bar per site and label each bar with its domain
    plt.bar(range(len(results)), list(results.values()), align='edge')
    plt.xticks(range(len(results)), list(results.keys()), rotation=20)

    plt.show()

# Profile directory; the name differs between installations
data_path = os.path.expanduser('~')+"/.mozilla/firefox/7xov879d.default"
history_db = os.path.join(data_path, 'places.sqlite')

# Fetch each stored URL together with its visit count
c = sqlite3.connect(history_db)
cursor = c.cursor()
select_statement = "select moz_places.url, moz_places.visit_count from moz_places;"
cursor.execute(select_statement)

results = cursor.fetchall()
c.close()
sites_count = {}

for url, count in results:
    url = parse(url)

    # Add this URL's visit count to the running total for its domain
    if url in sites_count:
        sites_count[url] += count
    else:
        sites_count[url] = count

# Keep the ten most visited sites, largest count first
sites_count_sorted = OrderedDict(sorted(sites_count.items(), key=operator.itemgetter(1), reverse=True)[:10])

analyse(sites_count_sorted)

You can use the comment section below to ask your queries regarding the browsing history analysis program.
