Archive for March, 2011

Financial planning for hackers

March 31, 2011

There are lots of guides on financial planning.  But how should they change for hackers?  Here are five rule tweaks to follow:

  1. Buy lattes
  2. Don’t buy a house
  3. Buy more bonds
  4. Keep a larger cash reserve
  5. Don’t try to beat the market, even if you can

Number 1: Buy lattes

Every financial planning book has some example where if you cut out buying a latte each day, then you’ll end up a millionaire in 30 years.  Take this into account, but as a hacker you should be buying more lattes than the average person, as long as you’re having them with other people.  Why?  Because in the startup world you need to network, network, and network.  How much would you pay now to have a coffee with Zuckerberg or Page or Brin?  Well, 15 years ago it would have been just a couple of dollars.  Hackers meet more future CEOs than other people, so they should be investing more in developing their social networks, and that means buying more lattes.

Number 2: Don’t buy a house

Rather, be less eager to buy a house than other people.  Renting is buying flexibility – the flexibility to move around Silicon Valley or across the country to join a promising new thing.  You can take advantage of that flexibility more than the average person, so tilt towards renting more.

Number 3: Buy more bonds

As a hacker, you’re in a risky business, and already highly exposed to the stock market through any stock options you may hold.  You should buy more bonds than the average person to reduce that risk.

Number 4: Keep a larger cash reserve

Everyone should have a cash reserve, but you should have a larger one than most.  First, because we’re in a bubble and who knows when it’s going to burst.  But, most importantly, it gives you the freedom to not have income for some time while you’re building the next big thing.  Other people aren’t going to do this – they don’t need as much as you.

And finally, Number 5: Don’t try to beat the market, even if you can

You probably can’t beat the market, so don’t bother.  But suppose you (think you) can.  Even then, don’t bother getting fancy.  Why?  Because the time you spend on perfecting your trading algorithm is time taken away from networking, increasing your skills, or perfecting your real money-making idea. (Of course, if you’re planning to work in finance, then your trading algorithm is your money-making idea, so work on that, and get coffees with hedge-fund managers.)  Scott Locklin recommends you invest in small businesses rather than the stock market, but that sounds like work.

The average return is fine – wouldn’t you rather have 5% return on a billion dollars than 7% on ten thousand?

(Disclaimer: I am not a financial planner.  You should not get your financial planning advice from some random guy on the Internet.)


Will the Libyan war be successful? Let’s use math to find out!

March 25, 2011

We use forecasting models to predict everything from the climate to the stock market, so why not our current war?  We’ll talk through some issues and touch on five principles of forecasting along the way.

First, why do we need a forecasting model at all?  That’s our first principle – any model is better than no model.  People are emotional and ignore data that doesn’t fit their preconceptions – they’re very bad at predicting what’s going to happen.  No matter how bad your model is, it’s going to be better than just going with your gut.

So how complicated should our model be?  That brings us to our second principle – more complicated models are not much better.  Deep in the bowels of the Pentagon, the Libyan situation is doubtless being war-gamed in all sorts of ways, with models of great complexity.  But much of this detail will be a waste of time in predicting the outcome.  It doesn’t really matter exactly what Gaddafi’s 53rd Armor Division does – if you want to know how this war will turn out, look at what’s happened with similar countries and similar armies.  In fact, the dirty little secret of statistical modeling is that even exponential smoothing does a pretty good job at forecasting, so if you just guessed this war would go like a weighted average of other recent wars, you’d beat most of the experts.  Exponential smoothing doesn’t get you too much respect, though, so we might try linear or logistic regression.
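
Here is roughly what that weighted average looks like as code.  This is a minimal Python sketch, not anything the Pentagon runs: the function name, the smoothing weight alpha = 0.5, and the choice to feed it the arbitrary “success” scores from the end of this post (oldest war first, assuming that list runs from newest to oldest) are all assumptions made for illustration.

```python
def exponential_smoothing(values, alpha):
    """Simple exponential smoothing.

    Walks through `values` from oldest to newest and returns the final
    smoothed level, which serves as the forecast for the next observation.
    Each step keeps (1 - alpha) of the old level, so older observations
    get exponentially smaller weights.
    """
    if not values:
        raise ValueError("need at least one observation")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = values[0]  # start from the oldest observation
    for value in values[1:]:
        level = alpha * value + (1 - alpha) * level
    return level


# Arbitrary "success" scores (percent), oldest war first (assumed ordering):
# 1st Iraq war, Somalia, Haiti, Serbia, Afghanistan, 2nd Iraq war.
scores_oldest_first = [95, 10, 90, 80, 40, 50]

forecast = exponential_smoothing(scores_oldest_first, alpha=0.5)
print(f"{forecast:.1f}")  # 53.9
```

A larger alpha leans harder on the most recent wars; at alpha = 1 the forecast is simply the last war’s score.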

OK, so if we don’t pay too much attention to modeling, what should we be focusing on?  That’s our third principle – it’s all about the data.  To predict how this war will turn out, we’ll need to look at other U.S. wars.  But how far back?  Vietnam, like the left says?  WWII, like the right wants?  The Spanish-American War?  The Barbary Pirates?  And should we include the wars of other countries?  Libya with Chad?  The U.S.S.R. with Afghanistan?  What variables are going to be important?  I’m guessing population size, military technology, and terrain, for starters.  Some of these will be available easily, some we’ll have to manually code (well, I suppose you could develop a numerical measure of Afghanistan’s mountain-ness, but would it be worth it?).  It’s all going to be a mess, and we’re going to have to make decisions between contradictory sources.  And what are we trying to predict?  Length of war?  Casualties?  Cost?  Whether it’s a “success”?  How do we measure any of those?  It’s all about the data.

With so many possible variables, we’re going to need to pay attention to our fourth principle – beware of overfitting.  For example, we might have a country-specific variable in our model.  This would look good for the U.S., as Libya seems to have lost every war it was involved in, from the Barbary Pirates on.  But is that a good enough basis for our model?  Maybe so, as Afghanistan seems to have defeated every invader from Alexander on.  Still, worth paying close attention to.

And our final principle – models are only useful if they’re used.  I would be happy to hear that the President has some Excel spreadsheet showing him the likely results of invading every country from Azerbaijan to Zimbabwe.  But I’m guessing there’s nothing in between the Pentagon’s insanely complicated wargames and the uninformed opinions of politicos.  Don’t be disheartened, though – you can still use your model to try and cut through the propaganda and plain old wishful thinking you’ll hear from all sides.

What about my prediction?  I’m going to go with an average of the “success” scores of the last six big wars – determined and selected in a completely arbitrary way – the 2nd Iraq war (50%), Afghanistan (40%), Serbia (80%), Haiti (90%), Somalia (10%), and the 1st Iraq war (95%).  That gives us 61% – not a debacle like Somalia, less of a mess than Afghanistan, but not one of the greatest success stories, either.  It’s a stupid model and stupid data, because nobody’s paying me for it, but, following our first principle, it’s better than nothing.
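
For anyone checking the arithmetic, here is the whole model as a quick Python sketch.  The scores are exactly the arbitrary ones listed above; the variable names are just illustrative.

```python
# Arbitrary "success" scores (percent) for the six wars named in the post.
success_scores = {
    "2nd Iraq war": 50,
    "Afghanistan": 40,
    "Serbia": 80,
    "Haiti": 90,
    "Somalia": 10,
    "1st Iraq war": 95,
}

# Plain unweighted average: every war counts equally.
prediction = sum(success_scores.values()) / len(success_scores)
print(f"{prediction:.0f}%")  # 61%
```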

How to tell if it’s a bubble

March 24, 2011

If you’re asking, “Is it a bubble?”, then it’s a bubble.

If your friends are asking “Is it a bubble?”, then it’s a bubble.

If there’s lots of discussion as to whether it’s a bubble or not, even if many people don’t think it’s a bubble, then it’s a bubble.

It’s a bubble.

Twitter’s “White People Stink” problem

March 18, 2011

More headaches for Twitter today, as “White People Stink” becomes a trending topic.  And it could have legs, as people see it trending, and tweet complaints, thus keeping it popular.  Not to mention the spammers who jump on the bandwagon.  It’s a Reply-To-All with millions of addresses.

Of course this is a big problem for the gentrifying Twitter.  It wants to make money from sponsored trends, but that will only come from big brand advertisers, and they’re notoriously controversy-averse (just ask Gilbert Gottfried).  Will Ford, Disney, or NBC want to have their trend next to White People Stink, linking them in the minds of Middle America?  No way.

What will happen?  Twitter is trying out some algorithmic fixes behind the scenes, but eventually will have to do some more manual curation.  When it comes to money, you can’t be too careful.

What are the picks and shovels of the digital Gold Rush?

March 17, 2011

Everybody knows that in the Gold Rush you didn’t get rich by staking a claim – the real moneymaker was selling picks and shovels to miners.  So what are the picks and shovels of our second digital Gold Rush?  Computing power?  Storage?  Both cheap commodities now.  Data analysis?  Maybe, but enough people are already talking about that.  Today’s shovel is the lowly projector.

I was visiting a client the other day for a presentation.  The projector didn’t work.  Projectors never work.  They didn’t work in the first dot-com bubble either, and everyone hates them.  If you make a projector that works, you will make a fortune.

Here are the numbers.  Just like everyone in L.A. is working on a screenplay, everyone in San Jose is working on a startup pitch (if not, why on earth do they live in San Jose?).  1 million projectors at $50 profit per unit means $50 million in the first year (yes, the babies have their pitches, as do some of the more intelligent cats).  And we haven’t started on Seattle, Portland, or the other wannabe tech hubs.

So how come I’m not doing this?  Three reasons.  First, ideas are cheap – it’s execution that matters, and that sounds like work.  Second, I’d have to learn about projectors, and that sounds kind of dull.  And third and most important, I already have my ticket to the top planned.  It’s a startup that data mines the social graph to send hyperlocal coupons direct to your smartphone … and we make a game out of it!  Funding, you ask?  Not to worry.  I’ve got a killer PowerPoint deck and 15 minutes on the calendar of a top VC.  What could go wrong?

Things I wish I’d known before I started using R

March 11, 2011

I’ve been using R for a couple of years now.  This post is aimed at me a couple of years ago, or you if you’re just starting to use R and are pressed for time.  Here are some things I wish I’d known in early 2009.

  1. Use a naming convention
  2. read.csv is a great function, but be careful
  3. doBy is not just a city in the Middle East
  4. attach is more trouble than it’s worth
  5. Many packages are poor
  6. There are a lot of useful blog posts out there

Use a naming convention.  You should probably have a naming convention whatever language you’re using, but you really need one with R, thanks to features like not needing to declare variables, and partial matching.  Google has one – I think that putting dots in variable names is asking for trouble, but it would be fine – the important thing is just to pick one.  (And you should probably figure that Google knows more about how to write good code than I do.)

read.csv is a really great function.  But it has some gotchas – e.g. with the default options, a column of numbers containing even one stray non-numeric entry can come back as a factor.  Dealing with data is a whole other post, but you can always convert back using as.numeric(as.character(f)) (as.numeric(f) on its own returns the factor’s internal level codes, not the original numbers), or go through the documentation carefully (see stringsAsFactors, colClasses), or perhaps best yet, use Python or some other scripting language to pre-process the data (it’s always a mess).
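
A minimal sketch of that pre-processing idea in Python, assuming a made-up file layout: the column names, the set of missing-value markers, and the helper names clean_number and clean_csv are all hypothetical.  It strips whitespace and thousands separators from a numeric column and writes NA for missing cells (which read.csv treats as missing by default), so that R sees nothing but numbers there.

```python
import csv
import io

# Strings treated as "missing" in the raw data (an assumed list; extend as needed).
MISSING_MARKERS = {"", "NA", "N/A", "null", "-"}


def clean_number(raw):
    """Return a plain numeric string, or "NA" if the cell is missing.

    Raises ValueError if the cell still is not a number after cleaning,
    so bad rows fail loudly here instead of silently turning into a factor in R.
    """
    text = raw.strip()
    if text in MISSING_MARKERS:
        return "NA"
    text = text.replace(",", "")  # drop thousands separators
    float(text)  # validation only; raises ValueError on junk
    return text


def clean_csv(src, dst, numeric_columns):
    """Copy CSV rows from src to dst, cleaning the named numeric columns."""
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        for column in numeric_columns:
            row[column] = clean_number(row[column])
        writer.writerow(row)


# Made-up example data standing in for a messy input file.
raw = io.StringIO('name,amount\na,"1,234"\nb, 56 \nc,N/A\n')
cleaned = io.StringIO()
clean_csv(raw, cleaned, numeric_columns=["amount"])
print(cleaned.getvalue())
```

With real files you would pass open file handles instead of the StringIO objects, then point read.csv at the cleaned output.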

doBy is a great package, covering 95% of what you need in processing data by groups.  People swear by plyr too, but try doBy first.

attach – I’ll let Google’s R style guide take this one:

The possibilities for creating errors when using attach are numerous. Avoid it.

Many packages are poor.  My friend John Mount has a whole tutorial (the cranky guide to trying R packages) on this:

The summary is: expect errors, search out errors and don’t start with the built in examples or real data.

Why do I bring this up?  Well, it’s not just to criticize package designers who fail to do even minimal QA.  If you’re using a language and something doesn’t work, your first instinct is (hopefully) to think that you’ve got something wrong.  If you’re using a contributed package, give a bit more weight to the idea that they’ve got something wrong (and don’t rule it out even for something in core R).

And finally, there are a lot of useful resources out there.  It’s my blog, so I’m going to point you to my “R in production systems” post, but John Mount and Nina Zumel’s posts on R are an excellent read, particularly Survive R.  I’ve also liked Quick-R for SAS / SPSS / Stata users, and you probably will have your favorites.

What am I missing?  Add it in the comments.

The Gentrification of Twitter

March 11, 2011

Everyone is upset that Twitter is leaning hard on third-party apps.  But contrary to the usual saying, if you are outraged, you’re not paying attention.

If you look at urban neighborhoods, you see the same pattern over and over again.  There’s some unattractive area – artists move in, then their hipster friends follow the cool, until finally it gets so desirable that the artists, hipsters, and whoever was there in the first place are all replaced by some painfully dull bourgeois types who are now the only people who can afford to live there, and proceed to blandify it to death.  Why does it happen?  Because the squares have the money, so whoever owns the horrible garrets can win big by converting them into lovely condos.  The hipsters whine that they made the neighborhood, but nobody cares, so they have to repeat the cycle somewhere else.

It’s the same with Twitter.  The geeks helped popularize and build it, but now that the Fail Whale has become an endangered species, the landlord has come a-calling.  The condos are really more like ticky-tacky Kim Kardashian and Justin Bieber little boxes, but the principle’s the same.  The only surprise is that anyone’s surprised.  It’s always been this way, and will always be this way.

What to do?  Move on.  Make your own Twitter.  How hard could it be?  If you want to stop the cycle from repeating, say that everyone has to solve a quadratic equation before they can post.  Or maybe that they have to figure out some Python code.  But quit whining, and leave the database keys under the mat.