What People Search First at Duck Duck Go

by Gabriel Weinberg, May 2009

On the first day Duck Duck Go was publicly available, I noticed a strange recurrence of the query 'boobs'. It turns out this is a common query for new visitors.[1] Here are the top thirty new-visitor queries:

 1. google             11. python                21. wikipedia
 2. test               12. java                  22. news
 3. duck               13. linux                 23. fuck
 4. sex                14. reddit                24. cars
 5. duckduckgo         15. yahoo                 25. hi
 6. porn               16. ducks                 26. youtube
 7. duck duck go       17. cnn                   27. boobs
 8. hello              18. new search engine     28. cuil
 9. facebook           19. iphone                29. ruby
10. obama              20. twitter               30. india

You can see the early-adopter tech crowd in there. This makes sense given our exposure on tech blogs and forums.

The table overstates the prominence of "dirty" keywords: they make up only 5% of all new-visitor queries.

We tried to remove non-natural queries, but a few still made it in.[2] In particular, 'cnn' was most likely related to our logo on CNN, and 'iphone' to our iPhone app.

12% of new visitors searched for at least one personal (non-celebrity) name during their first visit.[3] Overall, 10% of new-visitor queries were for personal names.
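The two name figures use different denominators (visitors versus queries), which is why they can differ. Here is a minimal sketch of both calculations, assuming queries are already grouped by visitor. The `is_personal_name` predicate is hypothetical: the post says Wikipedia was used to detect celebrity names but does not describe how personal names were identified.

```python
from typing import Callable, Dict, List, Tuple


def name_query_shares(
    queries_by_visitor: Dict[str, List[str]],
    is_personal_name: Callable[[str], bool],  # hypothetical name detector
) -> Tuple[float, float]:
    """Return (share of visitors with at least one name query,
    share of all queries that are name queries)."""
    if not queries_by_visitor:
        return 0.0, 0.0
    total_queries = 0
    name_queries = 0
    visitors_with_name = 0
    for queries in queries_by_visitor.values():
        hits = sum(1 for q in queries if is_personal_name(q))
        total_queries += len(queries)
        name_queries += hits
        if hits:  # this visitor searched at least one name
            visitors_with_name += 1
    visitor_share = visitors_with_name / len(queries_by_visitor)
    query_share = name_queries / total_queries if total_queries else 0.0
    return visitor_share, query_share


# Toy data with a stand-in predicate; not real Duck Duck Go logs.
visitors = {
    "a": ["jane doe", "python"],
    "b": ["google"],
    "c": ["linux", "java", "ruby"],
}
print(name_query_shares(visitors, lambda q: q == "jane doe"))
```

In the toy data, one of three visitors searched a name, but only one of six queries was a name.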

The average new-visitor query was 2.0 words long. One-, two-, and three-word queries made up 41%, 34%, and 14% of queries, respectively, leaving roughly 11% at four or more words.
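For illustration, the length statistics above could be computed along these lines. This is a sketch, not the method actually used: it assumes words are counted by splitting on whitespace and that blank queries are dropped, neither of which the post specifies.

```python
from collections import Counter
from typing import Dict, List, Tuple


def query_length_stats(queries: List[str]) -> Tuple[float, Dict[int, float]]:
    """Return mean words per query and the share of queries at each word count.

    Words are counted by splitting on whitespace (an assumption); blank
    queries are skipped.
    """
    lengths = [len(q.split()) for q in queries if q.strip()]
    if not lengths:
        return 0.0, {}
    mean = sum(lengths) / len(lengths)
    counts = Counter(lengths)
    shares = {n: c / len(lengths) for n, c in sorted(counts.items())}
    return mean, shares


mean, shares = query_length_stats(
    ["google", "duck duck go", "new search engine", "hacker news"]
)
print(mean)    # 2.25
print(shares)  # {1: 0.25, 2: 0.25, 3: 0.5}
```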

Finally, here's a new data point for the reddit vs. Hacker News debate, that is, how similar Hacker News is to reddit. The table below breaks down the stats above for new visitors who first came from reddit, from Hacker News, or from elsewhere.

                      reddit          Hacker News     Other
avg. words/query      1.7             1.7             2.1
dirty % (queries)     5               3               5
name % (queries)      8               9               10
1+ name % (visitors)  13              17              11
top queries           google          google          google
                      test            test            test
                      reddit          duck            sex
                      duck            duckduckgo      duck
                      porn            python          apple
                      python          reddit          porn
                      palin [4]       ruby            duckduckgo
                      duckduckgo      hello           duck duck go
                      haskell         lisp            facebook
                      hello           sex             hello
                      cuil            duck duck go    obama
                      linux           hacker news     java
                      duck duck go    django          linux
                      obama           obama           yahoo
                      java            ycombinator     cnn

Notes

[1] We identified a new visitor by the first occurrence of their IP+user-agent pair. We used IP+user-agent because we do not currently use cookies. A visitor's first visit included all of their requests up to the first request that came at least an hour after the previous one; from that request on, we classified them as being on their second visit.
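The visit rule in note [1] amounts to gap-based sessionization keyed on IP+user-agent. Here is a minimal sketch, assuming a hypothetical log record of (ip, user_agent, timestamp, query) and a log that starts at launch, so every pair in it is a new visitor. The addresses below are documentation-range placeholders.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

# Hypothetical log record shape: (ip, user_agent, timestamp, query)
Request = Tuple[str, str, datetime, str]
VisitorKey = Tuple[str, str]

SESSION_GAP = timedelta(hours=1)  # a gap this long starts the second visit


def first_visits(requests: List[Request]) -> Dict[VisitorKey, List[Request]]:
    """Group requests by IP+user-agent and keep only each visitor's first visit."""
    by_visitor: Dict[VisitorKey, List[Request]] = defaultdict(list)
    for req in requests:
        by_visitor[(req[0], req[1])].append(req)

    result: Dict[VisitorKey, List[Request]] = {}
    for visitor, reqs in by_visitor.items():
        reqs.sort(key=lambda r: r[2])  # chronological order
        visit = [reqs[0]]
        for prev, cur in zip(reqs, reqs[1:]):
            if cur[2] - prev[2] >= SESSION_GAP:
                break  # this request begins the second visit
            visit.append(cur)
        result[visitor] = visit
    return result


t = datetime(2009, 5, 1, 10, 0)
log = [
    ("192.0.2.1", "Firefox", t, "duck"),
    ("192.0.2.1", "Firefox", t + timedelta(minutes=20), "python"),
    ("192.0.2.1", "Firefox", t + timedelta(minutes=90), "linux"),  # 70-minute gap
    ("198.51.100.7", "Safari", t, "google"),
]
for key, visit in first_visits(log).items():
    print(key, [r[3] for r in visit])
```

The 'linux' request comes 70 minutes after the previous one, so it falls into the second visit and is excluded.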

[2] We omitted new visitors who didn't arrive through our front, about, or query pages; this leaves out visitors who first came from a search bar, toolbar, or other static page. We omitted duplicate queries by the same visitor. We also omitted queries that were unduly influenced, e.g., referred directly from a blog article; however, we did allow those same query terms if they were typed in directly.
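The entry-page filter and per-visitor deduplication in note [2], followed by a frequency count, might look like the sketch below. The entry-page labels and the lowercase/strip normalization are assumptions, and the "unduly influenced" rule is not modeled at all.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple

# Hypothetical labels for the front, about, and query pages named in note [2]
ALLOWED_ENTRY_PAGES = {"front", "about", "query"}


def top_queries(
    visits: Iterable[Tuple[str, List[str]]], n: int = 30
) -> List[Tuple[str, int]]:
    """Count first-visit queries, skipping other entry pages and per-visitor repeats.

    Each visit is (entry_page, queries_in_order) for one new visitor.
    """
    counts: Counter = Counter()
    for entry_page, queries in visits:
        if entry_page not in ALLOWED_ENTRY_PAGES:
            continue  # e.g. arrived via a search bar or toolbar
        seen: Set[str] = set()
        for q in queries:
            key = q.strip().lower()  # assumed normalization
            if key and key not in seen:
                seen.add(key)
                counts[key] += 1
    return counts.most_common(n)


visits = [
    ("front", ["google", "Google", "test"]),
    ("query", ["google"]),
    ("toolbar", ["google", "porn"]),
    ("about", ["test", "python"]),
]
print(top_queries(visits))
```

The repeated 'Google' counts once for its visitor, and the toolbar visitor is skipped entirely.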

[3] We used Wikipedia to detect celebrity names.

[4] Most queries in this reddit subset came from a 09/26/08 post, i.e. right before the last US presidential election.
