You are currently browsing the daily archive for February 11th, 2008.

Dan Reed has posted an interesting article both on his blog and on the Computing Research Policy Blog about the many problems in computing education. Ever since the dotcom bubble burst, computer science enrollment at universities has declined and even more so for women. So many ideas have been tossed around out there, trying to figure out just where we’re going wrong. Recently I wrote about Robert Dewar’s views on where CS education has failed. He made the case that graduates of most CS programs are incompetent and that employers have to go through a period of re-education. Whereas Dewar sees the problem more in the fact that core principles are not being taught to students, Dan Reed makes the case that core principles are really not necessary for everyone.

Both viewpoints are nuanced and so lumping them into polar categories like that results in major inaccuracies. Reed is not making the point that students shouldn’t be taught about operating systems and Dewar is not making the point that students must be taught assembly language. While many CS graduates are incompetent, learning about operating systems and compiler design is totally worthless to most programmers. Sure, there are certain skills that could be applied to other areas and learning stuff like that will give you an appreciation for the various aspects of the field, but most programmers are never going to build a compiler or an operating system. As computer science is increasingly being applied to other fields (biology, chemistry, physics, astronomy, etc.), it is crucial for new software engineers to have specific skillsets that aren’t being taught (and I mean CS skills). Reed makes the point very clearly:

First, as researchers and technologists we seek to reproduce students in our technical image, failing to acknowledge that most of our students will not develop compilers, write operating systems or design computer chips. Rather, they benefit from training in logical problem solving, knowledge of computing tools and their applicability to new domains.

Like any entrenched system (bureaucracy), it is easy for computer science educators to fall prey to the lament that “CS grads these days are not like they used to be.” I’m going to go out on an anthropological limb and say that’s a human universal. The day will come (and I think it already has) when there is just too much core CS information to feed into our brains, and continuing to try to cram that into young learners is going to result in spillover and disillusionment. There will always be people capable of soaking all of it up (though they will become rarer as the volume increases), but we must be aware of the futility of over-educating. Let me be clear: in a four-year program, I believe it is more of a disservice to students to give them a shallow but broad understanding of the computing field (thereby making them incompetent) than it is to give them a deeper understanding of a subfield where they will be competent but lacking in other so-called core areas.

So I have a couple off-the-cuff ideas that need to be refined but which I want to put out there. All of these core principles can be boiled down into the true essentials, the things programmers actually need to know to do their jobs. Instead of having classes on computer architecture, operating systems, compilers, etc., combine those concepts into one or two classes with a name like “Core Computing Principles.” As Reed points out, the focus should be on teaching algorithmic problem solving skills and logic. From there, students can pursue different directions like theoretical CS, natural language processing, or large-scale systems. An undergraduate education that puts a stronger focus on statistical methods would have been hugely helpful for me. Having a broad range of options that are mapped out for students who really don’t have a clue how to get there, but know basically where they want to go, would be great.

In any case, there are many views and some will side with Dewar, some with Reed. Ultimately I think the field will settle closer to Reed’s side. I’m looking forward to hearing some of the ideas the CRA-E committee that Reed mentioned (pdf) will come up with.

When you go to a search engine, you have an information need. There is something you are searching for that you can only articulate imprecisely and you do so in a few words. People are good at determining if something satisfies their information need, but not so great at stating it clearly. Librarians are trained to elicit this information need from you, by force if necessary. (Just kidding, librarian mafia, don’t hurt me!) Their method is a dialogue where they probe the various aspects of what you are searching for, what you are not searching for, what you already know about it, etc.

A search engine can’t engage in this dialogue, yet, but think about how you interact with a search engine. You start off with this information need (at whatever degree of vagueness) in mind and probably compose a short 2-3 word query. How often do you do one word queries? We’ve been trained by search engines that this rarely succeeds unless it’s a low-frequency word (or a brand name or jargon). Our first query brings up some useful stuff perhaps, but usually we see that we weren’t thinking clearly about our information need and we begin honing it over the next couple queries until we find what we need. Some people are better at forming this mental picture and stating clear queries from the beginning [citation needed], but most people need to narrow it down.

These queries we use for Google are often purely keyword queries, though sometimes we use slightly more sophisticated queries with link: or site: (etc) operators. You can make sure terms are included with the + operator and excluded with the - operator. You can even use wildcard operators (*) which can be nice (but also touchy). What you can’t do are structured queries. You can’t search for things like (nice or sweet) and (man or guy). You can’t search for words that co-occur in certain spans of documents (like 50-word windows). These things can be very helpful to an experienced researcher and having this ability over a web corpus the size of Google’s would be enormously helpful. Unfortunately, the computational and storage costs of such a thing are much higher.
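To make the idea concrete, here is a minimal sketch of the two query types described above, run over a toy in-memory corpus. It is an illustration under my own assumptions, not how Google or any real engine does it: the function names, the tokenizer, and the four sample documents are all made up for the example, and the "window" is measured as a token distance between two terms rather than a fixed 50-word span. Note that the index has to store every position of every term, which is exactly why supporting this kind of query makes the index larger.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens (a naive assumption)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_positional_index(docs):
    """Map each term to {doc_id: [token positions]}.

    Storing positions (not just doc ids) is what makes proximity queries possible.
    """
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, term in enumerate(tokenize(text)):
            index[term][doc_id].append(pos)
    return index


def docs_with_any(index, terms):
    """OR: documents containing at least one of the terms."""
    result = set()
    for term in terms:
        result |= set(index.get(term, {}))
    return result


def match_all_groups(index, groups):
    """AND of ORs: each group is a list of alternatives, e.g. (nice or sweet)."""
    sets = [docs_with_any(index, group) for group in groups]
    return set.intersection(*sets) if sets else set()


def within_window(index, term_a, term_b, window):
    """Documents where term_a and term_b occur within `window` tokens of each other."""
    postings_a = index.get(term_a, {})
    postings_b = index.get(term_b, {})
    hits = set()
    for doc_id in postings_a.keys() & postings_b.keys():
        if any(abs(pa - pb) <= window
               for pa in postings_a[doc_id]
               for pb in postings_b[doc_id]):
            hits.add(doc_id)
    return hits


# Hypothetical sample documents, invented for this example.
docs = {
    1: "He is a nice guy who fixes compilers.",
    2: "What a sweet man, everyone said.",
    3: "A nice day for a walk.",
    4: "The guy at the desk was rude, though the weather was nice.",
}
index = build_positional_index(docs)

# (nice or sweet) and (man or guy)
boolean_hits = match_all_groups(index, [["nice", "sweet"], ["man", "guy"]])

# "nice" within 3 tokens of "guy"
proximity_hits = within_window(index, "nice", "guy", 3)
```

On this toy corpus the boolean query matches documents 1, 2, and 4, while the proximity query keeps only document 1, since "guy" and "nice" are ten tokens apart in document 4. The pairwise position comparison is the simplest thing that works; a real system would merge sorted position lists instead.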

So my question for you, reader, is would you even use this? Would this be used by very many people or just the odd few researchers, paralegals, etc.? Computationally, I think Google could handle this. The problem would come from the larger index needed to support such queries. Even this would probably not be unreasonable for Google at this point. So… why not? My guess is that it comes down to the cost of doing such a thing (moderate to high) versus the customer demand (low to nil).

Am I wrong?

It is about 11 degrees Fahrenheit (-11.7 Celsius) with a wind chill of -1 (-18.3) degrees here in Pittsburgh at the moment. As such, the dogs should get sweaters, right? Well, Willow doesn’t really need it. She loves the cold, but she’s cute in it anyway.

My Australian shepherd Willow in her winter sweater.

About Me

Jason M. Adams

My name is Jason Adams and I work on opinion mining for a growing startup in Atlanta, GA.

Creative Commons License

This work by Jason M. Adams is licensed under a Creative Commons Attribution 3.0 License.
