Yeah, who would have guessed that programmers spend more time on computers than judo players?
I believe today is a first. It is the first time I have ever posted the same thing on both blogs. In fact, I was planning on writing about the Wilcoxon signed-rank test today and how I applied it to -- well, never mind, I know you're not interested, and besides, here is what I wrote about instead.
I'm doing a workshop at the San Diego SAS users group meeting on Wednesday and had suggested opening the session with a clip of my daughter's last amateur fight. Someone politely commented,
"Uh, I guess that would be okay, if it was, uh, relevant." Fair question: how can martial arts be related to statistics or to programming?
I was world judo champion, so I think I can claim a bit of knowledge of martial arts. In teaching over the years, I have seen thousands of up-and-coming young players, what I would consider the programming equivalent of those at the intermediate level: no longer a novice, but not quite at the expert level yet, either. What the most promising of those martial artists have in common with the most promising young programmers and statisticians is, unfortunately, too often the same thing. They are in a hurry. They believe their own press.
They are enamored of the latest technique someone is doing in the Olympics, or they want to do whatever the newest complex sampling, Rasch, IML, hierarchical, or neural network model is, without nailing down the basics first.
Here is what I have learned:
- Get off to a good start - make sure that you have the correct data set. Seems pretty obvious, doesn't it? About once a year, someone sends me the wrong data, data from the previous year or month, the data set that was not corrected for invalid data, etc.
- Nail down the basics - make sure you completely understand the data you will be using. Do a reality check. Does an average income of $120,000 a year make sense to you? It amazes me how many times people assume that because ERROR never shows up in the log, there are no errors in the program. Don't just count on automated rules, like requiring a non-negative minimum for age, weight, height, and so on. Some of the biggest screw-ups I have seen happened because the programmer did not reverse code the items before scoring. It wasn't that the person didn't know to do this; he or she just didn't think of doing it. Just like in martial arts, the fundamentals should be over-learned until they are a reflex.
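Those reality checks can be sketched in a few lines. This is a minimal illustration in Python rather than SAS, and everything in it is hypothetical: the columns (age, income), the $100,000 plausibility threshold, and the 1-to-5 reverse-coding scale are made-up examples, not from any real data set.

```python
# Minimal sketch of reality checks on incoming data.
# The columns (age, income), the thresholds, and the 1-5 item scale
# are all hypothetical examples.

rows = [
    {"age": 34, "income": 52000},
    {"age": -2, "income": 120000},  # an invalid age that no ERROR message will flag
    {"age": 41, "income": 48000},
]

def reality_check(rows):
    """Return human-readable warnings; a clean log is not the same as clean data."""
    warnings = []
    ages = [r["age"] for r in rows]
    if min(ages) < 0:
        warnings.append(f"age has a negative minimum: {min(ages)}")
    incomes = [r["income"] for r in rows]
    mean_income = sum(incomes) / len(incomes)
    if mean_income > 100000:  # does an average income this high make sense?
        warnings.append(f"mean income looks implausible: {mean_income:.0f}")
    return warnings

def reverse_code(score, low=1, high=5):
    """Flip a negatively worded item on a low-to-high scale before scoring."""
    return high + low - score

for w in reality_check(rows):
    print("CHECK:", w)
```

The point of returning readable warnings instead of a pass/fail flag is that a human still has to look at them; the check prompts the reality check, it doesn't replace it.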
- Automate what you can - I did the same "boring" matwork drills 100 times a night for year after year until I did them almost as a reflex. When my daughter hits certain positions, she will automatically spin out and land on her feet or rotate into an armbar. With programming, it's even easier. If you do the same thing over and over, turn it into a macro.
- Automation takes time - just like the boring drills, people resist writing macros because it takes time, and it seems, while you are doing it, to take time away from the really important things that are going to make you better. (I already KNOW that armbar, Sensei, why are we doing it again?) I'd be embarrassed to tell you how many projects' worth of essentially the same code I wrote before sucking it up and taking the time to turn it into a macro, after which I rarely thought about it again.
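In SAS the tool for this would be a %macro, but the habit translates to any language: wrap the code you keep rewriting in one named, reusable unit. A hypothetical sketch in Python, where freq_table stands in for whatever boilerplate shows up on every one of your projects:

```python
# Sketch of turning repeated code into a reusable "macro".
# freq_table is a hypothetical stand-in for whatever code
# you find yourself rewriting on every project.

from collections import Counter

def freq_table(values, title):
    """Print a simple frequency table and return the counts."""
    counts = Counter(values)
    print(title)
    for value, n in sorted(counts.items()):
        print(f"  {value}: {n}")
    return counts

# Once it exists, every project is one call instead of a dozen lines:
freq_table(["F", "M", "F", "F"], "Gender")
freq_table([1, 2, 2, 3, 3, 3], "Response")
```

Writing the function takes longer than pasting the code one more time; it pays for itself the third or fourth time you call it without thinking.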
- There are many ways to the same goal. Whether you are using SAS, Ruby, SPSS, or whatever your flavor of the month is, there are multiple ways to parse text, test relationships, and validate your data.
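As a small illustration of that point, here are two of the many ways to pull the same field out of a line of text in Python; the log line itself is a made-up example.

```python
# Two of the many routes to the same goal: extract the year from a
# line of text. The line itself is a hypothetical example.
import re

line = "match_date=2011-03-14 result=ippon"

# Way 1: plain string splitting
year_split = line.split("match_date=")[1].split("-")[0]

# Way 2: a regular expression
year_re = re.search(r"match_date=(\d{4})", line).group(1)

print(year_split, year_re)  # both routes give the same answer
```

Neither way is wrong; which one is better depends on how messy the input is and who has to maintain the code.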
- Size matters. What works on an opponent (or data) that is really big may be inefficient or inappropriate in a smaller situation.
- You can't learn it all from a book. This is a rather discouraging fact, since I am just now writing a book on training champions in martial arts. The fact is, though, most statisticians I have met came out of graduate school unprepared for the real world. I hate that term, by the way. I worked at universities for many years, and if they really are an alternate universe, I think they should have flying cars and a unicorn or two. Still, one way in which universities do resemble an alternate universe is that the data are all perfect and you're often told which statistical test to use. It's really very weird to me: you're often asked to prove theorems and derive equations, which you can look up, but the stuff you can't look up, like handling missing data or drawing conclusions from incomplete and imperfect data, doesn't come up nearly as often as it should.
- People can train you, but once you're an expert, you're on your own. When you're out there on the mat fighting, you need to figure out the right thing to do all on your own. Many years ago, I was visiting my former advisor. I showed him an article I was working on at the time and asked his opinion if the analysis and conclusions were correct. I was a little dismayed when he said, "Probably. Your guess is as good as mine. What are you asking me for? You know this stuff as well as I do. Look, there comes a point when you aren't a student any more. You can consult with other people, you can read books, but in the end, you find the answers for yourself and they're as right as you know how to make them. That's it. No one has the answer key for the whole field, you know."