XPath – Utilities to get them

When parsing HTML, the recommended approach is to use XPath queries or DOM traversal to get the desired elements. Finding the XPath for a specific set of elements often involves a bit of trial and error. However, both Firebug and Chrome Developer Tools offer utilities that help with this process.

Using Firebug, simply select the node you want with Inspect, then right-click the selected node and click “Copy XPath”.

The process is exactly the same in Chrome Developer Tools.

Either way, an XPath expression is copied to your clipboard.
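To make “Copy XPath” a little less magical: the path it produces is roughly a list of steps from the root down to the node, with a [n] index added where the tag name alone is ambiguous among its siblings. Below is a minimal sketch of that idea using only Python's standard xml.etree.ElementTree. positional_xpath is a hypothetical helper for illustration, not what Firebug or Chrome actually run, and it assumes well-formed (XHTML-style) markup.

```python
import xml.etree.ElementTree as ET


def positional_xpath(root, target):
    """Build an absolute, index-based XPath for `target` (hypothetical helper)."""
    # ElementTree elements don't know their parent, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    steps = []
    node = target
    while node is not root:
        parent = parent_of[node]
        same_tag = [c for c in parent if c.tag == node.tag]
        # Only add a 1-based index when several siblings share this tag.
        if len(same_tag) > 1:
            steps.append(f"{node.tag}[{same_tag.index(node) + 1}]")
        else:
            steps.append(node.tag)
        node = parent
    steps.append(root.tag)
    return "/" + "/".join(reversed(steps))


# Hypothetical, simplified markup; not a real results page.
page = ET.fromstring(
    "<html><body><div><ol>"
    "<li><h3><a href='/a'>A</a></h3></li>"
    "<li><h3><a href='/b'>B</a></h3></li>"
    "</ol></div></body></html>"
)
second_link = page.findall(".//a")[1]
print(positional_xpath(page, second_link))  # /html/body/div/ol/li[2]/h3/a
```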

Scraping Google Search Results

Let’s take another example. Suppose we want to extract the links to all result sites from a regular Google search.

First, run a regular search. Then open Firebug or Chrome Developer Tools, right-click the a element of any SERP (search engine results page) item, and click “Copy XPath”. This is what I got:*

/html/body/div[4]/div[2]/div/div[4]/div[2]/div[2]/div[2]/div[2]/div/ol/li[5]/div/h3/a

Evaluate this XPath in your console with the $x() function (both Firebug and Chrome support it). It returns a single element.
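If you want to try the same step outside the browser, here is a minimal Python sketch using the standard library's xml.etree.ElementTree, which understands a small subset of XPath, including positional predicates like li[2]. The markup is a hypothetical, drastically simplified stand-in for a results page, not Google's real HTML, and it has to be well-formed for ElementTree to parse it.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for a results list (not real Google markup).
page = ET.fromstring(
    "<html><body><ol>"
    "<li><div><h3><a href='https://example.com/1'>One</a></h3></div></li>"
    "<li><div><h3><a href='https://example.com/2'>Two</a></h3></div></li>"
    "<li><div><h3><a href='https://example.com/3'>Three</a></h3></div></li>"
    "</ol></body></html>"
)

# `page` is the <html> element, so paths are written relative to it:
# "body/..." here corresponds to "/html/body/..." in the browser console.
# li[2] keeps only the second <li> among its <li> siblings.
matches = page.findall("body/ol/li[2]/div/h3/a")
print(len(matches), matches[0].text)  # 1 Two
```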

The next step is to generalize this query to match all results. We know that each SERP item is a list item, and our query contains li[5] (I chose the fifth SERP item). We can simply remove the index ([5]) to match every list item. Our query is now:


/html/body/div[4]/div[2]/div/div[4]/div[2]/div[2]/div[2]/div[2]/div/ol/li/div/h3/a

This gives us all the a elements. However, we want the actual URLs, so we append an attribute step (@href) to retrieve them:

/html/body/div[4]/div[2]/div/div[4]/div[2]/div[2]/div[2]/div[2]/div/ol/li/div/h3/a/@href


As we can see, this query returns the href attribute of every SERP item. Easy, right?
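The same generalization can be sketched offline with Python's xml.etree.ElementTree, again on hypothetical, simplified markup. One caveat: ElementTree supports only a subset of XPath and has no /@href step, so this sketch selects the a elements and reads the attribute with .get("href") instead. A full XPath 1.0 engine, such as PHP's DOMXPath, can evaluate the @href step directly.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for a results list (not real Google markup).
page = ET.fromstring(
    "<html><body><ol>"
    "<li><div><h3><a href='https://example.com/1'>One</a></h3></div></li>"
    "<li><div><h3><a href='https://example.com/2'>Two</a></h3></div></li>"
    "<li><div><h3><a href='https://example.com/3'>Three</a></h3></div></li>"
    "</ol></body></html>"
)

# No index on the li step, so every <li> matches and we get every link.
links = page.findall("body/ol/li/div/h3/a")

# ElementTree cannot select attributes with "/@href"; read them per element.
hrefs = [a.get("href") for a in links]
print(hrefs)
# ['https://example.com/1', 'https://example.com/2', 'https://example.com/3']
```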

What else?

I recommend the XPath Checker extension for Firefox to simplify this process even further. It makes it much easier to evaluate queries and inspect the matching results.

If you want to do this programmatically from C#, use HtmlAgilityPack. It’s a powerful library that makes this a breeze. See my previous article, Parsing HTML with C#, for an example of how to do it.

If you’re using PHP, a combination of DOMDocument and DOMXPath will do the job.

*Note that Google’s markup can change depending on which datacenter you end up on.
