The AOL Search dataset is a collection of real query log data that is based on real users.
The Data Source
The dataset consists of 20M Web queries from 650k users over a period of three months, 440MB in total and available for download. The format used in the dataset is:
- AnonID, an anonymous user ID number.
- Query, the query issued by the user, case shifted with most punctuation removed.
- QueryTime, the time at which the query was submitted for search.
- ItemRank, if the user clicked on a search result, the rank of the item on which they clicked is listed.
- ClickURL, if the user clicked on a search result, the domain portion of the URL in the clicked result is listed.
Each line in the data represents one of two types of events
- A query that was NOT followed by the user clicking on a result item.
- A click through on an item in the result list returned from a query.
In the first case (query only) there is data in only the first three columns, in the second case (click through), there is data in all five columns. For click through events, the query that preceded the click through is included. Note that if a user clicked on more than one result in the list returned from a single query, there will be TWO lines in the data to represent the two events.
Interesting queries, for example
- Users querying for topic X
- Users that click on the first (second, third) ranked item
- TOP 10 domains searched
- TOP 10 domains clicked at