LW – Humans are not Automatically Strategic

Lionhearted writes:

[A] large majority of otherwise smart people spend time doing semi-productive things, when there are massively productive opportunities untapped.

A somewhat silly example: Let’s say someone aspires to be a comedian, the best comedian ever, and to make a living doing comedy. He wants nothing else, it is his purpose. And he decides that in order to become a better comedian, he will watch re-runs of the old television cartoon ‘Garfield and Friends’ that was on TV from 1988 to 1995….

I’m curious as to why.

Why will a randomly chosen eight-year-old fail a calculus test?  Because most possible answers are wrong, and there is no force to guide him to the correct answers.  (There is no need to postulate a “fear of success”; most ways writing or not writing on a calculus test constitute failure, and so people, and rocks, fail calculus tests by default.)

Why do most of us, most of the time, choose to “pursue our goals” through routes that are far less effective than the routes we could find if we tried?[1]  My guess is that here, as with the calculus test, the main problem is that mostcourses of action are extremely ineffective, and that there has been no strong evolutionary or cultural force sufficient to focus us on the very narrow behavior patterns that would actually be effective. 

To be more specific: there are clearly at least some limited senses in which we have goals.  We: (1) tell ourselves and others stories of how we’re aiming for various “goals”; (2) search out modes of activity that are consistent with the role, and goal-seeking, that we see ourselves as doing (“learning math”; “becoming a comedian”; “being a good parent”); and sometimes even (3) feel glad or disappointed when we do/don’t achieve our “goals”.

But there are clearly also heuristics that would be useful to goal-achievement (or that would be part of what it means to “have goals” at all) that we do not automatically carry out.  We do not automatically:

  • (a) Ask ourselves what we’re trying to achieve; 
  • (b) Ask ourselves how we could tell if we achieved it (“what does it look like to be a good comedian?”) and how we can track progress; 
  • (c) Find ourselves strongly, intrinsically curious about information that would help us achieve our goal; 
  • (d) Gather that information (e.g., by asking as how folks commonly achieve our goal, or similar goals, or by tallying which strategies have and haven’t worked for us in the past); 
  • (e) Systematically test many different conjectures for how to achieve the goals, including methods that aren’t habitual for us, while tracking which ones do and don’t work; 
  • (f) Focus most of the energy that *isn’t* going into systematic exploration, on the methods that work best;
  • (g) Make sure that our “goal” is really our goal, that we coherently want it and are not constrained by fears or by uncertainty as to whether it is worth the effort, and that we have thought through any questions and decisions in advance so they won’t continually sap our energies;
  • (h) Use environmental cues and social contexts to bolster our motivation, so we can keep working effectively in the face of intermittent frustrations, or temptations based in hyperbolic discounting;

…. or carry out any number of other useful techniques.  Instead, we mostly just do things.  We act from habit; we act from impulse or convenience when primed by the activities in front of us; we remember our goal and choose an action that feels associated with our goal.  We do any number of things.  But we do not systematically choose the narrow sets of actions that would effectively optimize for our claimed goals, or for any other goals.

Why?  Most basically, because humans are only just on the cusp of general intelligence.  Perhaps 5% of the population has enough abstract reasoning skill to verbally understand that the above heuristics would be useful once these heuristics are pointed out.  That is not at all the same as the ability to automatically implement these heuristics.  Our verbal, conversational systems are much better at abstract reasoning than are the motivational systems that pull our behavior.  I have enough abstract reasoning ability to understand that I’m safe on the glass floor of a tall building, or that ice cream is not healthy, or that exercise furthers my goals… but this doesn’t lead to an automatic updating of the reward gradients that, absent rare and costly conscious overrides, pull my behavior.  I can train my automatic systems, for example by visualizing ice cream as disgusting and artery-clogging and yucky, or by walking across the glass floor often enough to persuade my brain that I can’t fall through the floor… but systematically training one’s motivational systems in this way is also not automatic for us.  And so it seems far from surprising that most of us have not trained ourselves in this way, and that most of our “goal-seeking” actions are far less effective than they could be.

Still, I’m keen to train.  I know people who are far more strategic than I am, and there seem to be clear avenues for becoming far more strategic than they are.  It also seems that having goals, in a much more pervasive sense than (1)-(3), is part of what “rational” should mean, will help us achieve what we care about, and hasn’t been taught in much detail on LW.

So, to second Lionhearted’s questions: does this analysis seem right?  Have some of you trained yourselves to be substantially more strategic, or goal-achieving, than you started out?  How did you do it?  Do you agree with (a)-(h) above?  Do you have some good heuristics to add?  Do you have some good ideas for how to train yourself in such heuristics?


[1] For example, why do many people go through long training programs “to make money” without spending a few hours doing salary comparisons ahead of time?  Why do many who type for hours a day remain two-finger typists, without bothering with a typing tutor program?  Why do people spend their Saturdays “enjoying themselves” without bothering to track which of their habitual leisure activities are *actually* enjoyable?  Why do even unusually numerate people fear illness, car accidents, and bogeymen, and take safety measures, but not bother to look up statistics on the relative risks? Why do most of us settle into a single, stereotyped mode of studying, writing, social interaction, or the like, without trying alternatives to see if they work better — even when such experiments as we have tried have sometimes given great boosts?


Peter Norvig – Teach Yourself Programming in 10 Years

Teach Yourself Programming in Ten Years

Peter Norvig

Why is everyone in such a rush?

Walk into any bookstore, and you’ll see how to Teach Yourself Java in 7 Days alongside endless variations offering to teach Visual Basic, Windows, the Internet, and so on in a few days or hours. I did the following power search at Amazon.com:

     pubdate: after 1992 and title: days and
      (title: learn or title: teach yourself)

and got back 248 hits. The first 78 were computer books (number 79 was Learn Bengali in 30 days). I replaced “days” with “hours” and got remarkably similar results: 253 more books, with 77 computer books followed by Teach Yourself Grammar and Style in 24 Hours at number 78. Out of the top 200 total, 96% were computer books.The conclusion is that either people are in a big rush to learn about computers, or that computers are somehow fabulously easier to learn than anything else. There are no books on how to learn Beethoven, or Quantum Physics, or even Dog Grooming in a few days. Felleisen et al. give a nod to this trend in their book How to Design Programs, when they say “Bad programming is easy. Idiots can learn it in 21 days, even if they are dummies.

Let’s analyze what a title like Learn C++ in Three Days could mean:

  • Learn: In 3 days you won’t have time to write several significant programs, and learn from your successes and failures with them. You won’t have time to work with an experienced programmer and understand what it is like to live in a C++ environment. In short, you won’t have time to learn much. So the book can only be talking about a superficial familiarity, not a deep understanding. As Alexander Pope said, a little learning is a dangerous thing. 
  • C++: In 3 days you might be able to learn some of the syntax of C++ (if you already know another language), but you couldn’t learn much about how to use the language. In short, if you were, say, a Basic programmer, you could learn to write programs in the style of Basic using C++ syntax, but you couldn’t learn what C++ is actually good (and bad) for. So what’s the point? Alan Perlis once said: “A language that doesn’t affect the way you think about programming, is not worth knowing”. One possible point is that you have to learn a tiny bit of C++ (or more likely, something like JavaScript or Flash’s Flex) because you need to interface with an existing tool to accomplish a specific task. But then you’re not learning how to program; you’re learning to accomplish that task. 
  • in Three Days: Unfortunately, this is not enough, as the next section shows.

Teach Yourself Programming in Ten Years

Researchers (Bloom (1985)Bryan & Harter (1899)Hayes (1989)Simmon & Chase (1973)) have shown it takes about ten years to develop expertise in any of a wide variety of areas, including chess playing, music composition, telegraph operation, painting, piano playing, swimming, tennis, and research in neuropsychology and topology. The key isdeliberative practice: not just doing it again and again, but challenging yourself with a task that is just beyond your current ability, trying it, analyzing your performance while and after doing it, and correcting any mistakes. Then repeat. And repeat again. There appear to be no real shortcuts: even Mozart, who was a musical prodigy at age 4, took 13 more years before he began to produce world-class music. In another genre, the Beatles seemed to burst onto the scene with a string of #1 hits and an appearance on the Ed Sullivan show in 1964. But they had been playing small clubs in Liverpool and Hamburg since 1957, and while they had mass appeal early on, their first great critical success, Sgt. Peppers, was released in 1967. Malcolm Gladwell reports that a study of students at the Berlin Academy of Music compared the top, middle, and bottom third of the class and asked them how much they had practiced:

Everyone, from all three groups, started playing at roughly the same time – around the age of five. In those first few years, everyone practised roughly the same amount – about two or three hours a week. But around the age of eight real differences started to emerge. The students who would end up as the best in their class began to practise more than everyone else: six hours a week by age nine, eight by age 12, 16 a week by age 14, and up and up, until by the age of 20 they were practising well over 30 hours a week. By the age of 20, the elite performers had all totalled 10,000 hours of practice over the course of their lives. The merely good students had totalled, by contrast, 8,000 hours, and the future music teachers just over 4,000 hours.

So it may be that 10,000 hours, not 10 years, is the magic number. (Henri Cartier-Bresson (1908-2004) said “Your first 10,000 photographs are your worst,” but he shot more than one an hour.) Samuel Johnson (1709-1784) thought it took even longer: “Excellence in any department can be attained only by the labor of a lifetime; it is not to be purchased at a lesser price.” And Chaucer (1340-1400) complained “the lyf so short, the craft so long to lerne.” Hippocrates (c. 400BC) is known for the excerpt “ars longa, vita brevis”, which is part of the longer quotation “Ars longa, vita brevis, occasio praeceps, experimentum periculosum, iudicium difficile”, which in English renders as “Life is short, [the] craft long, opportunity fleeting, experiment treacherous, judgment difficult.” Although in Latin, ars can mean either art or craft, in the original Greek the word “techne” can only mean “skill”, not “art”.

So You Want to be a Programmer

Here’s my recipe for programming success:

  • Get interested in programming, and do some because it is fun. Make sure that it keeps being enough fun so that you will be willing to put in your ten years/10,000 hours. 
  • Program. The best kind of learning is learning by doing. To put it more technically, “the maximal level of performance for individuals in a given domain is not attained automatically as a function of extended experience, but the level of performance can be increased even by highly experienced individuals as a result of deliberate efforts to improve.” (p. 366) and “the most effective learning requires a well-defined task with an appropriate difficulty level for the particular individual, informative feedback, and opportunities for repetition and corrections of errors.” (p. 20-21) The book Cognition in Practice: Mind, Mathematics, and Culture in Everyday Life is an interesting reference for this viewpoint. 
  • Talk with other programmers; read other programs. This is more important than any book or training course. 
  • If you want, put in four years at a college (or more at a graduate school). This will give you access to some jobs that require credentials, and it will give you a deeper understanding of the field, but if you don’t enjoy school, you can (with some dedication) get similar experience on your own or on the job. In any case, book learning alone won’t be enough. “Computer science education cannot make anybody an expert programmer any more than studying brushes and pigment can make somebody an expert painter” says Eric Raymond, author of The New Hacker’s Dictionary. One of the best programmers I ever hired had only a High School degree; he’s produced a lot of great software, has his own news group, and made enough in stock options to buy his own nightclub
  • Work on projects with other programmers. Be the best programmer on some projects; be the worst on some others. When you’re the best, you get to test your abilities to lead a project, and to inspire others with your vision. When you’re the worst, you learn what the masters do, and you learn what they don’t like to do (because they make you do it for them). 
  • Work on projects after other programmers. Understand a program written by someone else. See what it takes to understand and fix it when the original programmers are not around. Think about how to design your programs to make it easier for those who will maintain them after you. 
  • Learn at least a half dozen programming languages. Include one language that supports class abstractions (like Java or C++), one that supports functional abstraction (like Lisp or ML), one that supports syntactic abstraction (like Lisp), one that supports declarative specifications (like Prolog or C++ templates), one that supports coroutines (like Icon or Scheme), and one that supports parallelism (like Sisal). 
  • Remember that there is a “computer” in “computer science”. Know how long it takes your computer to execute an instruction, fetch a word from memory (with and without a cache miss), read consecutive words from disk, and seek to a new location on disk. (Answers here.
  • Get involved in a language standardization effort. It could be the ANSI C++ committee, or it could be deciding if your local coding style will have 2 or 4 space indentation levels. Either way, you learn about what other people like in a language, how deeply they feel so, and perhaps even a little about why they feel so. 
  • Have the good sense to get off the language standardization effort as quickly as possible.

With all that in mind, its questionable how far you can get just by book learning. Before my first child was born, I read all the How To books, and still felt like a clueless novice. 30 Months later, when my second child was due, did I go back to the books for a refresher? No. Instead, I relied on my personal experience, which turned out to be far more useful and reassuring to me than the thousands of pages written by experts.Fred Brooks, in his essay No Silver Bullet identified a three-part plan for finding great software designers:

  1. Systematically identify top designers as early as possible. 
  2. Assign a career mentor to be responsible for the development of the prospect and carefully keep a career file. 
  3. Provide opportunities for growing designers to interact and stimulate each other. 

This assumes that some people already have the qualities necessary for being a great designer; the job is to properly coax them along. Alan Perlis put it more succinctly: “Everyone can be taught to sculpt: Michelangelo would have had to be taught how not to. So it is with the great programmers”. Perlis is saying that the greats have some internal quality that transcends their training. But where does the quality come from? Is it innate? Or do they develop it through diligence? As Auguste Gusteau (the fictional chef in Ratatouille) puts it, “anyone can cook, but only the fearless can be great.” I think of it more as willingness to devote a large portion of one’s life to deliberative practice. But maybe fearless is a way to summarize that. Or, as Gusteau’s critic, Anton Ego, says: “Not everyone can become a great artist, but a great artist can come from anywhere.”So go ahead and buy that Java/Ruby/Javascript/PHP book; you’ll probably get some use out of it. But you won’t change your life, or your real overall expertise as a programmer in 24 hours, days, or even weeks. How about working hard to continually improve over 24 months? Well, now you’re starting to get somewhere…



Bloom, Benjamin (ed.) Developing Talent in Young People, Ballantine, 1985.

Brooks, Fred, No Silver Bullets, IEEE Computer, vol. 20, no. 4, 1987, p. 10-19.

Bryan, W.L. & Harter, N. “Studies on the telegraphic language: The acquisition of a hierarchy of habits. Psychology Review, 1899, 8, 345-375

Hayes, John R., Complete Problem Solver Lawrence Erlbaum, 1989.

Chase, William G. & Simon, Herbert A. “Perception in Chess” Cognitive Psychology, 1973, 4, 55-81.

Lave, Jean, Cognition in Practice: Mind, Mathematics, and Culture in Everyday Life, Cambridge University Press, 1988.


Approximate timing for various operations on a typical PC:

execute typical instruction 1/1,000,000,000 sec = 1 nanosec
fetch from L1 cache memory 0.5 nanosec
branch misprediction 5 nanosec
fetch from L2 cache memory 7 nanosec
Mutex lock/unlock 25 nanosec
fetch from main memory 100 nanosec
send 2K bytes over 1Gbps network 20,000 nanosec
read 1MB sequentially from memory 250,000 nanosec
fetch from new disk location (seek) 8,000,000 nanosec
read 1MB sequentially from disk 20,000,000 nanosec
send packet US to Europe and back 150 milliseconds = 150,000,000 nanosec

Appendix: Language Choice

Several people have asked what programming language they should learn first. There is no one answer, but consider these points: 

  • Use your friends. When asked “what operating system should I use, Windows, Unix, or Mac?”, my answer is usually: “use whatever your friends use.” The advantage you get from learning from your friends will offset any intrinsic difference between OS, or between programming languages. Also consider your future friends: the community of programmers that you will be a part of if you continue. Does your chosen language have a large growing community or a small dying one? Are there books, web sites, and online forums to get answers from? Do you like the people in those forums?
  • Keep it simple. Programming languages such as C++ and Java are designed for professional development by large teams of experienced programmers who are concerned about the run-time efficiency of their code. As a result, these languages have complicated parts designed for these circumstances. You’re concerned with learning to program. You don’t need that complication. You want a language that was designed to be easy to learn and remember by a single new programmer.
  • Play. Which way would you rather learn to play the piano: the normal, interactive way, in which you hear each note as soon as you hit a key, or “batch” mode, in which you only hear the notes after you finish a whole song? Clearly, interactive mode makes learning easier for the piano, and also for programming. Insist on a language with an interactive mode and use it.

Given these criteria, my recommendations for a first programming language would be Python or Scheme. But your circumstances may vary, and there are other good choices. If your age is a single-digit, you might prefer Alice or Squeak (older learners might also enjoy these). The important thing is that you choose and get started.

Appendix: Books and Other Resources

Several people have asked what books and web pages they should learn from. I repeat that “book learning alone won’t be enough” but I can recommend the following: 



T. Capey points out that the Complete Problem Solver page on Amazon now has the “Teach Yourself Bengali in 21 days” and “Teach Yourself Grammar and Style” books under the “Customers who shopped for this item also shopped for these items” section. I guess that a large portion of the people who look at that book are coming from this page. Thanks to Ross Cohen for help with Hippocrates.



Learn and relearn your field -Terence Tao

Even fairly good students, when they have obtained the solution of the problem and written down neatly the argument, shut their books and look for something else. Doing so, they miss an important and instructive phase of the work. … A good teacher should understand and impress on his students the view that no problem whatever is completely exhausted.
One of the first and foremost duties of the teacher is not to give his students the impression that mathematical problems have little connection with each other, and no connection at all with anything else. We have a natural opportunity to investigate the connections of a problem when looking back at its solution.
 (George Pólya, “How to Solve It“)

Learning never really stops in this business, even in your chosen specialty; for instance I am still learning surprising things about basic harmonic analysis, more than ten years after writing my thesis in the topic.

Just because you know a statement and proof of Fundamental Lemma X, you shouldn’t take that lemma for granted; instead, you should dig deeper until you really understand what the lemma is all about:

  • Can you find alternate proofs?
  • If you know two proofs of the lemma, do you know to what extent the proofs are equivalent?  Do they generalise in different ways?  What themes do the proofs have in common?  What are the other relative strengths and weaknesses of the two proofs?
  • Do you know why each of the hypotheses are necessary?
  • What kind of generalizations are known/conjectured/heuristic?
  • Are there weaker and simpler versions which can suffice for some applications?
  • What are some model examples demonstrating that lemma in action?
  • When is it a good idea to use the lemma, and when isn’t it?
  • What kind of problems can it solve, and what kind of problems are beyond its ability to assist with?
  • Are there analogues of that lemma in other areas of mathematics?
  • Does the lemma fit into a wider paradigm or program?

It is particularly useful to lecture on your field, or write lecture notes or other expository material, even if it is just for your own personal use. You will eventually be able to internalise even very difficult results using efficient mental shorthand; this not only allows you to use these results effortlessly, and improve your own ability in the field, but also frees up mental space to learn even more material.

Another useful way to learn more about one’s field is to take a key paper in that field, and perform a citation search on that paper (i.e. search for other papers that cite the key paper).  There are many tools for citation searches nowadays; for instance, MathSciNet offers this functionality, and even a general-purpose web search engine can often give useful “hits” that one might not have previously been aware of.

See also “ask yourself dumb questions“.