by Gabriel Weinberg, May 2009
On the first day Duck Duck Go was publicly available I noticed a strange recurrence of the query 'boobs'. It turns out this is a common query for new visitors.[1] Here are the top thirty new visitor queries:
| 1. | | 11. | python | 21. | wikipedia |
| 2. | test | 12. | java | 22. | news |
| 3. | duck | 13. | linux | 23. | fuck |
| 4. | sex | 14. | | 24. | cars |
| 5. | duckduckgo | 15. | yahoo | 25. | hi |
| 6. | porn | 16. | ducks | 26. | youtube |
| 7. | duck duck go | 17. | cnn | 27. | boobs |
| 8. | hello | 18. | new search engine | 28. | cuil |
| 9. | | 19. | iphone | 29. | ruby |
| 10. | obama | 20. | | 30. | india |
You can see the early adopter tech crowd in there. This makes sense given our exposure on tech blogs and forums.
The table overstates the prominence of "dirty" keywords: they make up only 5% of total new visitor queries.
We tried to remove non-natural queries, but a few still made it in.[2] In particular, 'cnn' was most likely related to our logo on CNN, and 'iphone' to our iPhone app.
12% of new visitors searched for at least one personal (non-celebrity) name in their first visit.[3] Overall, 10% of new visitor queries were for personal names.
The average new visitor query length was 2.0 words. 1-word, 2-word, and 3-word queries made up 41%, 34%, and 14% of queries, respectively.
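For anyone who wants to run the same numbers on their own logs, here is a minimal sketch of the word-count stats. It assumes queries are plain strings and that words are split on whitespace; the function name `query_length_stats` and the sample data are made up for illustration and are not our actual code or logs.

```python
from collections import Counter


def query_length_stats(queries):
    """Return (average word count, {word_count: share of queries}).

    Assumption: a "word" is any whitespace-separated token.
    """
    # Ignore empty or whitespace-only queries.
    lengths = [len(q.split()) for q in queries if q.strip()]
    if not lengths:
        return 0.0, {}
    counts = Counter(lengths)
    total = len(lengths)
    avg = sum(lengths) / total
    shares = {n: counts[n] / total for n in sorted(counts)}
    return avg, shares


# Hypothetical sample, not the real query log.
sample = ["google", "duck duck go", "new search engine", "test", "hacker news"]
avg, shares = query_length_stats(sample)
print(avg, shares)
```

The shares are fractions of all non-empty queries, so multiplying by 100 gives percentages like the ones quoted above.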
Finally, here's a new data point for the reddit/Hacker News debate, i.e. how similar Hacker News is to reddit. The following tabulates the above stats for the subsets of new users who first came from reddit, Hacker News, or elsewhere.
| | reddit | Hacker News | Other |
| avg. words | 1.7 | 1.7 | 2.1 |
| dirty % | 5 | 3 | 5 |
| names % (of queries) | 8 | 9 | 10 |
| 1+ name % (of visitors) | 13 | 17 | 11 |
| top queries | google, test, duck, porn, python, palin [4], duckduckgo, haskell, hello, cuil, linux, duck duck go, obama, java | google, test, duck, duckduckgo, python, ruby, hello, lisp, sex, duck duck go, hacker news, django, obama, ycombinator | google, test, sex, duck, apple, porn, duckduckgo, duck duck go, hello, obama, java, linux, yahoo, cnn |
Notes
[1] New visitors were identified by the first occurrence of their IP+useragent pair. We used IP+useragent because we do not currently use cookies. A first visit comprises all requests up to the first gap of at least an hour between consecutive requests; any request after such a gap counts as part of the visitor's second visit.
[2] We omitted new visitors that didn't come in through our front, about or query pages, which leaves out visitors who originally came from a search bar, toolbar or other static page. We omitted duplicate queries by the same visitor. We also omitted queries that were unduly influenced, e.g. referred directly from a blog article; however, we did allow those same query terms if they were typed in directly.
[3] We used Wikipedia to detect celebrity names.
[4] Most queries in this reddit subset came from a 09/26/08 post, i.e. right before the last US presidential election.
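The first-visit rule in note [1] and the duplicate removal in note [2] can be sketched roughly as follows. This is an illustration under stated assumptions, not the code we ran: the log format (timestamp in seconds, IP, useragent, query), the function name `first_visit_queries`, and the lowercase normalization used to spot duplicates are all hypothetical.

```python
from collections import defaultdict

VISIT_GAP_SECONDS = 60 * 60  # "at least an hour", per note [1]


def first_visit_queries(requests):
    """Map each (ip, useragent) visitor to their first-visit queries.

    requests: iterable of (timestamp_seconds, ip, useragent, query).
    Duplicate queries by the same visitor are dropped (note [2]).
    Assumption: queries are compared after strip() and lower().
    """
    by_visitor = defaultdict(list)
    for ts, ip, ua, query in requests:
        by_visitor[(ip, ua)].append((ts, query))

    result = {}
    for visitor, events in by_visitor.items():
        events.sort(key=lambda e: e[0])  # oldest request first
        seen = set()
        queries = []
        last_ts = None
        for ts, query in events:
            # A gap of an hour or more ends the first visit.
            if last_ts is not None and ts - last_ts >= VISIT_GAP_SECONDS:
                break
            last_ts = ts
            normalized = query.strip().lower()
            if normalized and normalized not in seen:
                seen.add(normalized)
                queries.append(normalized)
        result[visitor] = queries
    return result


# Hypothetical log entries for illustration only.
log = [
    (0, "1.2.3.4", "UA-A", "google"),
    (600, "1.2.3.4", "UA-A", "Google"),   # duplicate after normalization
    (1200, "1.2.3.4", "UA-A", "python"),
    (4800, "1.2.3.4", "UA-A", "java"),    # an hour after the last request
    (100, "1.2.3.4", "UA-B", "test"),     # same IP, different useragent
]
visits = first_visit_queries(log)
print(visits)
```

Note that the same IP with a different useragent counts as a separate visitor here, which is the trade-off of not using cookies.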