"the syntax of endless sentences (the Nile of language, which here overflows and fructifies the regions of truth)" Walter Benjamin on the Style of Marcel Proust
If you really want to have control over your computer, learn to program. If you only have limited time to devote to it, learn a scripting language. But which scripting language should you learn? Perl, Python, Tcl/Tk, Ruby, Rexx, Rebol, PHP, Awk, or JavaScript? If you just want to create web pages, learn JavaScript and PHP. But if you actually want to mine texts for material you can use in class, you're going to have to choose a language that is a little more sophisticated: a language that can open files, find words and phrases, and format them for printing and classroom use, and that lets you throw together a simple GUI to provide exercises to your students.
Perl, Python, and Tcl/Tk all provide regular expressions for finding phrases in files, along with GUI toolkits that are easy to use, but there's also a language you've probably never heard of before: Scheme.
Nowadays, new scripting languages are created every day. Each seems to duplicate the functionality of the last, and none seems to communicate with any of the others. In the middle of this sea of change lie Scheme and C, languages that have been around since the 1970s, growing slowly and cautiously ever since. Scheme is a dialect of Lisp, whose origins date back to the late 1950s and the early days of computing. Scheme is a very small but sophisticated language. It's a medium for research and innovation in computer science. It is also used to introduce new computer science students to programming at MIT (Massachusetts Institute of Technology), the world's foremost technical university. One of the most respected classic textbooks on programming uses Scheme.
Because of the volume of material available online, Scheme is also the ideal language for those who want to educate themselves. There is even a tradition of implementing Prolog, the most important language for computational linguistics, in Scheme and Lisp.
Nowadays, who has the time to put their life on hold and go back to school to learn? There are also millions of people around the world who want to learn but who have no access to books. (This problem is particularly acute in Myanmar, the country that's my home away from home.) MIT is trying to remedy this problem by making the material for its courses available free of charge over the internet. For those who want to learn and are motivated to learn, Scheme is the answer.
The idea behind this page is to point you to the numerous online textbooks for learning Scheme and to supplement these textbooks with small snippets of code for extracting classroom material from texts. What you extract might be the ways in which a word is used, or it might be a particular grammatical construction. Some people are even starting to extract instances of language functions (e.g. polite requests or refusals, ways to ask for information) out of texts.
People often complain that scripting languages are slow. Well, that's where C comes in. If there's something you have to get done quickly, like looking for occurrences of a set of words in a text, you can rewrite that part in C and hook it up to the slower scripting part with something known as a "Foreign Function Interface." (For the word-set example, the Aho-Corasick algorithm would be the one to use.)
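To make the idea concrete, here is a minimal sketch of the kind of C routine you might hand the word-set search off to. It is only an illustration: the function name `count_word_hits` and its helpers are made up for this page, and it uses a plain token-by-token scan rather than Aho-Corasick, so its cost grows with the number of words you look for. It also assumes plain ASCII text and treats letters, digits, and apostrophes as word characters.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* A "word character" here is a letter, digit, or apostrophe (an assumption
   that suits simple English text, not a general tokenizer). */
static int is_word_char(unsigned char c) {
    return isalnum(c) || c == '\'';
}

/* Compare len characters of a and b, ignoring ASCII case. */
static int equal_ignore_case(const char *a, const char *b, size_t len) {
    for (size_t i = 0; i < len; i++) {
        if (tolower((unsigned char)a[i]) != tolower((unsigned char)b[i])) {
            return 0;
        }
    }
    return 1;
}

/*
 * count_word_hits (hypothetical name): for each of the n_words entries in
 * words[], count its whole-word, case-insensitive occurrences in text and
 * store the result in counts[]. Returns the total number of hits.
 *
 * This is a naive scan, NOT Aho-Corasick: every token in the text is
 * compared against every word in the set.
 */
size_t count_word_hits(const char *text, const char *const *words,
                       size_t n_words, size_t *counts) {
    assert(text != NULL && words != NULL && counts != NULL);

    size_t total = 0;
    for (size_t i = 0; i < n_words; i++) {
        counts[i] = 0;
    }

    const char *p = text;
    while (*p != '\0') {
        /* Skip spaces and punctuation between tokens. */
        while (*p != '\0' && !is_word_char((unsigned char)*p)) {
            p++;
        }
        /* Mark the extent of the next token. */
        const char *start = p;
        while (*p != '\0' && is_word_char((unsigned char)*p)) {
            p++;
        }
        size_t len = (size_t)(p - start);
        if (len == 0) {
            break; /* reached the end of the text */
        }
        /* Compare the token against every word in the set. */
        for (size_t i = 0; i < n_words; i++) {
            if (strlen(words[i]) == len &&
                equal_ignore_case(start, words[i], len)) {
                counts[i]++;
                total++;
            }
        }
    }
    return total;
}
```

Compiled into a shared library, a function shaped like this (plain C strings in, an array of counts out) is the sort of thing a foreign function interface can call from the scripting side. For a handful of words the naive scan is usually fast enough; with a long word list, that inner loop is the part an Aho-Corasick automaton would replace.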
C is the language that first pulled programmers out of their assembly code without sacrificing efficiency. Scheme has been a primary vehicle for research, innovation, and education in computer science for decades, before the internet was even a twinkle in computer scientists' eyes. Could there be a better language combination for applying innovative techniques to the solution of practical scripting problems?
One final note about computational linguistics: There are a lot of exciting things happening in computational linguistics right now that are relevant to English teaching:
Unfortunately, most of these innovations are trapped in university research departments and are only available for large amounts of money. Let's try to free them up so we can use them!
The following simple examples use PLT Scheme: