Exporting your Amazon Kindle Highlights & Notes to MySQL

Date

Thu Apr 20

Author

Martijn

Not Automagically Enough

I started digging into that web page to see if I could somehow automate it. As it turned out, it uses a dynamic scroller that loads more content when you reach the bottom of the page (see Infinite Scrolling). Just retrieving the HTML contents of the page with a server-side script wasn’t going to work, because the initial response only contains the first batch of highlights. Also, Amazon deploys a lot of countermeasures on its website to stop bots from reading content.

Enter PhantomJS

There are a few JavaScript tools, some of them running on Node.js, that can simulate a browser. This basically means that you can run a browser session against a website from a server-side script (i.e. in the background, automatically). PhantomJS, a headless WebKit browser that you script in JavaScript, is one of those tools, and I found a proper example of how to use it, so there we go.

Browser Workflow

To get the highlights and notes from the Amazon website, this workflow needs to be followed:

1. Go to https://kindle.amazon.com/
2. Click the Sign In button
3. Log in with your Amazon username & password
4. Browse to https://kindle.amazon.com/your_highlights
5. Scroll all the way down until the infinite scroller stops loading new content
6. Save the entire HTML output and parse it to single out individual highlights and notes

After getting this into the PhantomJS script, I had the output of all the highlights & notes, ready to be parsed.
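
The scrolling step in the workflow above needs a stopping rule. Here is a minimal sketch of one in plain JavaScript: treat the scroller as finished once the page height has stayed the same for a few polling ticks in a row. The name createScrollTracker, the tick count, and the choice of page height as the signal are assumptions for illustration, not taken from the actual script.

```javascript
// Hypothetical helper: decides when an infinite scroller has run dry.
// Call observe() once per polling tick with the current page height.
function createScrollTracker(stableTicksNeeded) {
  let lastHeight = -1;
  let stableTicks = 0;
  return {
    // Returns true once the height has stayed unchanged for enough ticks.
    observe(height) {
      if (height === lastHeight) {
        stableTicks += 1;
      } else {
        lastHeight = height;
        stableTicks = 0;
      }
      return stableTicks >= stableTicksNeeded;
    },
  };
}

// Simulated heights standing in for values read from the page on each tick.
const tracker = createScrollTracker(2);
const heights = [1000, 2000, 2000, 3000, 3000, 3000];
const ticksUsed = heights.findIndex((h) => tracker.observe(h)) + 1;
console.log('stopped after tick', ticksUsed);
```

In a browser-automation script, observe would presumably be called from a timer callback after each scroll, with the height read inside the page. Requiring several unchanged ticks, rather than one, gives slow responses a chance to arrive before the script gives up.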

Parsing the HTML

Now that I had the entire HTML of the highlights page, I could parse it into individual records and insert those into a database. Since I’m lazy and all kinds of good people have put out libraries for such things, I used PHP Simple HTML DOM Parser. After that it was a cakewalk to get the individual records and synchronise them to a database.
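
To make the parse-and-synchronise step concrete, here is a rough sketch in JavaScript rather than the PHP Simple HTML DOM Parser mentioned above. It assumes each highlight or note sits in a span with the class highlight or note, and that the target is a hypothetical highlights table with content_hash, kind and body columns and a UNIQUE index on content_hash. None of these names come from the real page or the real script. A regular expression is enough for this flat sample but would break on nested markup, which is a good reason to prefer a real parser.

```javascript
const nodeCrypto = require('crypto');

// Decodes the handful of HTML entities this sketch cares about.
// &amp; goes last so that "&amp;lt;" ends up as the literal text "&lt;".
function decodeEntities(text) {
  return text
    .replace(/&lt;/g, '<')
    .replace(/&gt;/g, '>')
    .replace(/&quot;/g, '"')
    .replace(/&#39;/g, "'")
    .replace(/&amp;/g, '&');
}

// Assumed markup: <span class="highlight">…</span> or <span class="note">…</span>.
function extractRecords(html) {
  const pattern = /<span class="(highlight|note)">([\s\S]*?)<\/span>/g;
  const records = [];
  let match;
  while ((match = pattern.exec(html)) !== null) {
    records.push({ type: match[1], text: decodeEntities(match[2].trim()) });
  }
  return records;
}

// Builds a parameterised statement. INSERT IGNORE skips rows whose
// content_hash already exists, which keeps repeated runs from duplicating rows.
function toInsert(record) {
  const hash = nodeCrypto
    .createHash('sha1')
    .update(record.type + '\n' + record.text)
    .digest('hex');
  return {
    sql: 'INSERT IGNORE INTO highlights (content_hash, kind, body) VALUES (?, ?, ?)',
    params: [hash, record.type, record.text],
  };
}

const sampleHtml =
  '<div><span class="highlight"> Tom &amp; Jerry </span>' +
  '<span class="note">Check this later</span></div>';
const statements = extractRecords(sampleHtml).map(toInsert);
console.log(statements);
```

Hashing the record content gives each highlight a stable key, so running the export again after reading more books only adds the new rows. The statements would still need to be passed to a MySQL client of your choice.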

Sounds Useful?

As with most things that might be useful to someone else, I put this script on GitHub. You can find the direct link below. Have fun!