I've been working on some stats and analytics tools for Sampa. We want to understand very well what users do, how much they do it, and what works well. My most recent struggle is with the definition of "Unique Users".
There are many ways to define unique users, like:
- Number of unique IPs;
- Number of unique IPs + User-Agent;
- Cookie (computer);
- Cookie (user);
Option #1 underestimates the number of unique users in a period of time because people might be accessing the system from behind a proxy, which means that all the users at a large company would be counted as one, or just a few. Option #2 is the same technique with the User-Agent added. You would expect that to improve the accuracy of the data, but most people behind the same proxy already have the same User-Agent.
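To make the proxy problem concrete, here is a minimal sketch of options #1 and #2. The `Request` record, its field names and the sample addresses are made up for illustration; a real log would need parsing first.

```python
from collections import namedtuple

# Hypothetical request record; the field names are assumptions for this sketch.
Request = namedtuple("Request", ["ip", "user_agent"])


def unique_by_ip(requests):
    """Option #1: count distinct IP addresses."""
    return len({r.ip for r in requests})


def unique_by_ip_and_agent(requests):
    """Option #2: count distinct (IP, User-Agent) pairs."""
    return len({(r.ip, r.user_agent) for r in requests})


# Three different people behind one corporate proxy, all on the same
# company-standard browser, plus one person at home.
requests = [
    Request("203.0.113.10", "Mozilla/4.0 (compatible; MSIE 6.0)"),
    Request("203.0.113.10", "Mozilla/4.0 (compatible; MSIE 6.0)"),
    Request("203.0.113.10", "Mozilla/4.0 (compatible; MSIE 6.0)"),
    Request("198.51.100.7", "Mozilla/5.0 (Firefox)"),
]

by_ip = unique_by_ip(requests)
by_ip_and_agent = unique_by_ip_and_agent(requests)
```

Both counts come out as 2 even though four people visited, so adding the User-Agent bought nothing in this case.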
Option #3 is the one used by most services and tracking systems. Each browser is given a unique identifier cookie, like a GUID, and each request from that browser sends that value back to the server. Here are the key problems with that:
- The first time a user accesses the system, they have no cookie yet;
- The browser might have cookies disabled;
- A user might have multiple browsers on the same computer (like Firefox and IE), and they will be counted as 2 users;
- A user might access the site from different computers, so the cookies counted are not truly "unique users".
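The cookie approach and the ways it miscounts can be sketched roughly like this. The cookie name `visitor_id` and the plain dicts standing in for request cookies are assumptions for illustration, not how any particular tracking system does it.

```python
import uuid

COOKIE_NAME = "visitor_id"  # hypothetical cookie name


def identify_visitor(request_cookies):
    """Return (visitor_id, is_new) for one request.

    If the browser sent no identifier cookie, mint a fresh GUID that the
    server would then set on the response.
    """
    visitor_id = request_cookies.get(COOKIE_NAME)
    if visitor_id:
        return visitor_id, False
    return str(uuid.uuid4()), True


def count_unique(hits):
    """Count distinct visitor ids over a sequence of per-request cookie dicts."""
    seen = set()
    for cookies in hits:
        visitor_id, _ = identify_visitor(cookies)
        seen.add(visitor_id)
    return len(seen)


# One person using Firefox and IE, plus one person with cookies disabled
# who makes two requests.
firefox = {COOKIE_NAME: "aaa"}
ie = {COOKIE_NAME: "bbb"}
cookies_disabled = {}

hits = [firefox, firefox, ie, cookies_disabled, cookies_disabled]
unique_visitors = count_unique(hits)
```

Two real people end up counted as 4 here: one for each browser, and a new one for every request from the browser that never sends the cookie back.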
Option #4 only works if the system requires sign-in all the time, as in the case of Orkut, Facebook, Hotmail, etc.
Here is 37Signals announcing 1,000,000 users using Basecamp. Congratulations to them. The most interesting thing about this post, IMHO, is that they disclose what that number means!
It is very rare for web services to do that because they want to give the most inflated number possible, without explaining exactly what it means. This way, they can't be challenged.
Take MySpace for example. They recently announced they have 127,000,000 users on their system. Holy s**t! That is a lot of people. But what if some of the people there have 2 accounts, one for them and one for the alter ego? Or what if a lot of users signed up and never really used it (like me)? What if a good chunk of those accounts are stale and have not been accessed for more than 6 months?
All of that would be OK if MySpace explained what the 127M means, but they don't, because they want the press (which influences advertisers) to think it is the number of people who go to MySpace every month. In reality, their numbers are more like 50M unique users per month, and even that figure comes from third parties' sampling methodologies (ComScore, Alexa), because MySpace doesn't announce it.
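The gap between "registered" and "active" is easy to show with a small sketch. The account names, the dict of last-access times and the 180-day cutoff (my stand-in for "6 months") are all made up for illustration; they are not MySpace's data or anyone's official definition.

```python
from datetime import datetime, timedelta


def count_accounts(last_access_by_account, now, stale_after=timedelta(days=180)):
    """Return (registered, active).

    registered: every account that exists.
    active: accounts accessed within `stale_after` of `now`.
    A last-access value of None means the account was never used after sign-up.
    """
    registered = len(last_access_by_account)
    active = sum(
        1
        for last in last_access_by_account.values()
        if last is not None and now - last <= stale_after
    )
    return registered, active


now = datetime(2007, 1, 1)
accounts = {
    "alice": datetime(2006, 12, 20),           # regular user
    "alice_alter_ego": datetime(2006, 11, 5),  # same person, second account
    "bob": datetime(2006, 3, 1),               # stale for more than 6 months
    "me": None,                                # signed up, never came back
}

registered, active = count_accounts(accounts, now)
```

Here `registered` is 4 and `active` is 2, and those 2 active accounts still belong to a single person, which only sign-in data combined with some de-duplication could reveal.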