Archive for July, 2011

Anecdata

July 24, 2011

Quantitative people usually treat anecdotes with disdain, but more for sociological than statistical reasons.

There’s a saying that “the plural of anecdote is not data”. Funnily enough, it started out without the “not”, which makes more sense: what is an anecdote, if not a datum? But it was important for economists to distinguish their ideas from the just-so stories of other social scientists, and so the “not” stuck.

There is something to the hostility, though. One datum should generally not affect your estimates by much. So if you’re trying to work out the national inflation rate, it’s not worth paying attention to your cousin’s complaints about how they’re paying more for broccoli now. Just let the government statisticians do their job of aggregating the millions of price points.

The problem is that most quantities you’re interested in aren’t like that. Want to know if we’re going into a recession? You could wait for the NBER, which will tell you a year after it’s happened. Or you could ask a friend in a volatile industry like consulting whether she’s finding work. Mathematically, if what you want isn’t being collected at the right time or at the right level of detail, then one point can and should make a big difference to your estimates.
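To make the “one datum” point concrete, here is a minimal sketch assuming the textbook normal-normal Bayesian update, where both the prior and the anecdote are treated as normally distributed with known spreads. The function name and all the numbers are hypothetical, chosen only for illustration. The anecdote’s weight is its precision (one over its variance) as a share of the total precision, so it only matters much when the prior is vague.

```python
def update_normal(prior_mean, prior_sd, obs, obs_sd):
    """Conjugate normal-normal update: combine a normal prior with one noisy observation.

    Returns the posterior mean and posterior standard deviation.
    """
    prior_prec = 1.0 / prior_sd**2  # precision = 1 / variance
    obs_prec = 1.0 / obs_sd**2
    post_prec = prior_prec + obs_prec
    # Posterior mean is a precision-weighted average of prior mean and observation.
    post_mean = (prior_prec * prior_mean + obs_prec * obs) / post_prec
    return post_mean, post_prec ** -0.5

# Hypothetical numbers, for illustration only.
# Case 1: a well-measured quantity (a tight prior built from lots of aggregated data).
tight = update_normal(prior_mean=2.0, prior_sd=0.05, obs=6.0, obs_sd=3.0)
# Case 2: a quantity nobody is measuring in time (a prior as vague as the anecdote).
vague = update_normal(prior_mean=2.0, prior_sd=3.0, obs=6.0, obs_sd=3.0)
print(tight)
print(vague)
```

With the tight prior, the anecdote moves the estimate by only about 0.001; with the vague prior, the estimate moves halfway to the anecdote, to 4.0.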

Using these anecdata is tricky, particularly if you take Bayesianism seriously. For just one example, how do you deal with selection bias, where you’ll hear about successes more than failures? But there’s no reason not to try.
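One standard way to handle this kind of selection bias is inverse probability weighting: count each report as standing in for 1/p outcomes, where p is the chance that an outcome of that kind gets reported to you. The sketch below simulates it. It is offered as an illustration, not as the author’s method. The function names and reporting probabilities are hypothetical and, crucially, assumed known. In real life, estimating them is the hard part.

```python
import random

def simulate_reports(n, p_success, p_report_success, p_report_failure, seed=0):
    """Return only the outcomes you actually hear about (True = success)."""
    rng = random.Random(seed)
    heard = []
    for _ in range(n):
        success = rng.random() < p_success
        p_report = p_report_success if success else p_report_failure
        if rng.random() < p_report:
            heard.append(success)
    return heard

def naive_rate(heard):
    """Share of successes among the reports, ignoring who chose to report."""
    return sum(heard) / len(heard)

def ipw_rate(heard, p_report_success, p_report_failure):
    """Inverse probability weighting: each report stands in for 1/p_report outcomes."""
    w_success = sum(1 / p_report_success for s in heard if s)
    w_failure = sum(1 / p_report_failure for s in heard if not s)
    return w_success / (w_success + w_failure)

# Hypothetical setup: true success rate 0.3, successes reported 3x as often as failures.
heard = simulate_reports(100_000, p_success=0.3, p_report_success=0.9, p_report_failure=0.3)
print(naive_rate(heard))          # biased upward, near 0.27 / 0.48 = 0.5625
print(ipw_rate(heard, 0.9, 0.3))  # close to the true rate of 0.3
```

Because successes are three times as likely to reach you here, the naive share of successes comes out nearly double the true rate.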

Google+ and the four card problem

July 12, 2011

Google came late to social, so all the good words were taken. Facebook has friend, Twitter has follow, and LinkedIn has connect. Google+ has the less intuitive circle. A minor terminological difference? Maybe not, if you think about the four card problem.

The four card problem (also known as the Wason selection task) is a logic puzzle. You have four cards, each of which has a number on one side and a colour on the other. You have to check that if the number is even, then the colour is green. The cards show 3, 8, red, and green. Which cards do you need to turn over?

Very few people get the right answer (8 and red: an even number with a red back, or a red card with an even number on the back, would break the rule, while the 3 and the green card can’t), but you can restate the problem in equivalent terms. You’re a police officer, and need to check that the alcohol rules are being followed. Each card represents a person, with their age on one side and what they’re drinking on the other. The cards show 14, 35, a coke, and a beer. Everyone gets this: you check the 14-year-old and the beer drinker.
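The two puzzles have the same logical structure, which a brute-force check makes explicit: a card needs turning exactly when some possible hidden face would break the rule. In this minimal sketch, the function name, the ranges of numbers and ages, the lists of colours and drinks, and the drinking age of 18 are all assumptions made for illustration.

```python
def cards_to_turn(visible_faces, rule, side_a_values, side_b_values):
    """Return the visible faces whose hidden side could falsify rule(a, b).

    Side A holds numbers (or ages); side B holds colours (or drinks).
    """
    to_turn = []
    for face in visible_faces:
        if face in side_a_values:
            # Visible side A: the hidden side could be any side-B value.
            completions = [(face, b) for b in side_b_values]
        else:
            # Visible side B: the hidden side could be any side-A value.
            completions = [(a, face) for a in side_a_values]
        if any(not rule(a, b) for a, b in completions):
            to_turn.append(face)
    return to_turn

# Abstract version: if the number is even, the colour must be green.
numbers = list(range(1, 11))   # hypothetical range of printed numbers
colours = ["red", "green"]
abstract = cards_to_turn([3, 8, "red", "green"],
                         lambda n, c: n % 2 == 1 or c == "green",
                         numbers, colours)

# Social version: anyone drinking beer must be of age (18 is an assumption).
ages = list(range(0, 100))
drinks = ["coke", "beer"]
social = cards_to_turn([14, 35, "coke", "beer"],
                       lambda age, d: age >= 18 or d != "beer",
                       ages, drinks)
print(abstract)  # [8, 'red']
print(social)    # [14, 'beer']
```

The same function, given the same shape of rule, picks out the corresponding two cards in each version. Only our intuitions differ.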

The point is that context matters. We’re very good at checking whether someone’s cheating, and have a similarly good understanding of social relationships in general. The terminology of the big three social sites gives the functionality away. We know that friendship is symmetric (if I’m your friend, then you’re my friend) and not transitive (friends of friends are not necessarily your friends). We know that following (aka stalking) is neither symmetric nor transitive. And we know that connecting is symmetric, and (thanks to Six Degrees of Separation) it has a transitive aspect. No further explanation is necessary.
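Symmetry and transitivity are easy to check mechanically if a relation is written as a set of ordered pairs (a, b), read as “a friends, follows, or connects to b”. The sketch below uses made-up users. Two choices are assumptions: the transitivity check ignores the trivial case of a person being related to themselves, and the “transitive aspect” of connections is modelled as degrees of separation found by breadth-first search.

```python
from collections import deque

def is_symmetric(rel):
    """Every pair (a, b) has its mirror image (b, a)."""
    return all((b, a) in rel for a, b in rel)

def is_transitive(rel):
    """a~b and b~c imply a~c, for distinct a and c (the self case is ignored)."""
    return all((a, c) in rel
               for a, b in rel
               for b2, c in rel
               if b == b2 and a != c)

def degree(rel, start, target):
    """Breadth-first search: number of hops from start to target, or None if unreachable."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        person, hops = queue.popleft()
        if person == target:
            return hops
        for a, b in rel:
            if a == person and b not in seen:
                seen.add(b)
                queue.append((b, hops + 1))
    return None

def symmetric(pairs):
    """Build a symmetric relation from undirected pairs."""
    return {(a, b) for a, b in pairs} | {(b, a) for a, b in pairs}

# Hypothetical users.
friends = symmetric([("ann", "bob"), ("bob", "cat")])
follows = {("ann", "bob"), ("bob", "star"), ("cat", "ann")}
connections = symmetric([("ann", "bob"), ("bob", "cat"), ("cat", "dan")])

print(is_symmetric(friends), is_transitive(friends))  # True False
print(is_symmetric(follows), is_transitive(follows))  # False False
print(degree(connections, "ann", "dan"))              # 3
```

Ann and Cat are both Bob’s friends without being each other’s, Bob doesn’t follow Ann back, and Dan is Ann’s third-degree connection.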

What about Google+? What does it mean to add someone to one of your circles? Do you get to see their stuff? Do they see yours? The name doesn’t tell you, and may even mislead, since a circle of friends is usually a group whose members are all connected to each other. Does this doom the project? Probably not, but it doesn’t bode well when your most basic operation is confusing.