Chomsky hierarchy
What are some real applications of the Chomsky hierarchy in computer science?
Chomsky's Hierarchy
Chomsky's hierarchy is not a thing that is "used", in the same way that the laws of gravity are not "used". We don't "apply the laws of gravity" to something; the laws of gravity simply are, and we can use that knowledge to our benefit at certain times. (Note that the laws of gravity are separate from the equations which describe those laws. We can "use" the equations, but that's something else entirely.)
The Chomsky hierarchy is descriptive. It simply allows us to broadly categorize grammars based on their capabilities.
Key Terms
Let's get some terms clear:
- language — a set of strings over a given alphabet
- alphabet — a set of symbols
- string — a finite sequence of zero or more symbols from the alphabet
The symbols of an alphabet (sometimes instead called “letters” or “tokens”) can be whatever you like, though it's often easiest to pick things which are easy to write, like letters and numbers. We can define an alphabet A = {0, 1}, the alphabet consisting of just the symbols 0 and 1. Now we can make strings from that alphabet, such as “0”, “1”, “01”, “10”, “0001100”, or even the empty string “”. (The strings are wrapped in quotes to make them easier to discern, but the quotes are not part of the strings.)
Now, as I said, a language is a set of strings. Which set? Whatever you like! I can invent an arbitrary language L_0 = {“001”, “010”}. Is “000” in L_0? Nope! Because I defined the language to only contain those two strings.
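If it helps, here's a minimal sketch of these definitions in Python (the names ALPHABET, L0, and is_member are just mine for illustration):

```python
# The alphabet A = {0, 1} and the arbitrary language L_0 = {"001", "010"}.
ALPHABET = {"0", "1"}
L0 = {"001", "010"}

def is_member(s: str) -> bool:
    """A string is in the language iff it uses only symbols from the
    alphabet and appears in the (finite) set defining L_0."""
    return set(s) <= ALPHABET and s in L0

print(is_member("001"))  # True
print(is_member("000"))  # False: valid symbols, but I didn't put it in L_0
```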
A grammar is a device that shows how to build all of the valid strings within a language. The grammar describes the language. Essentially, we view the language as a theoretical thing, and the grammar is the concrete way to derive that thing.
More practically, a grammar is a set of production rules which indicate how to construct strings of a given language. I'll assume you're familiar with how to build a grammar, but if not just let me know and I can conjure up some examples.
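To give one quick example anyway: the two-rule grammar S → 0S1, S → ε generates exactly the strings 0ⁿ1ⁿ (n zeros followed by n ones). Here's a sketch in Python of what applying those productions looks like (the name derive is my own, purely for illustration):

```python
def derive(n: int) -> str:
    """Apply the production S -> 0S1 exactly n times, then S -> ε.
    Each application wraps the sentential form in one more 0...1 pair."""
    s = ""                 # finish with S -> ε
    for _ in range(n):
        s = "0" + s + "1"  # S -> 0S1
    return s

print([derive(n) for n in range(4)])  # ['', '01', '0011', '000111']
```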
The Hierarchy
So where does Chomsky's hierarchy fit into all this?
The hierarchy allows us to broadly classify grammars based on their generative power. This is useful because the levels of the hierarchy correspond exactly to specific machines (called "automata") that can be built to recognize strings. By "recognize strings", I mean you can feed a string into the machine and it will accurately inform you whether that string is a member of the language described by the machine.
The (relevant part of the) table provided by Wikipedia is:
| Grammar | Languages | Automaton |
|---|---|---|
| Type-0 | Recursively enumerable | Turing machine |
| Type-1 | Context-sensitive | Linear-bounded non-deterministic Turing machine |
| Type-2 | Context-free | Non-deterministic pushdown automaton |
| Type-3 | Regular | Finite state automaton |
There is a direct correspondence between these automata and their grammars. If I write down a context-free grammar, then I can build a non-deterministic pushdown automaton to check membership in the language that grammar generates. I could also build a Turing machine to check it, but that would be overkill. I cannot build a finite state automaton to check it, because, in general, an FSA is not powerful enough to model a context-free grammar.
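As a concrete illustration, here's a hedged sketch in Python of a pushdown-style recognizer for the classic context-free language {0ⁿ1ⁿ : n ≥ 0}. The explicit stack is exactly what an FSA lacks: with only finitely many states, it cannot count an unbounded number of 0s.

```python
def accepts(s: str) -> bool:
    """Recognize { 0^n 1^n : n >= 0 } with one stack, PDA-style."""
    stack = []
    i = 0
    while i < len(s) and s[i] == "0":   # push a marker per leading 0
        stack.append("0")
        i += 1
    while i < len(s) and s[i] == "1":   # pop one marker per 1
        if not stack:
            return False                # more 1s than 0s
        stack.pop()
        i += 1
    # Accept iff the whole input was consumed and the stack is empty.
    return i == len(s) and not stack

assert accepts("") and accepts("0011")
assert not accepts("010") and not accepts("001")
```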
Applications
The hierarchy has different uses in different fields. I have experience in both linguistics and programming languages research, so I'll elaborate a bit on those.
Linguistics
In linguistics, we study the natural languages — meaning languages actually spoken by people. We usually try to model things with context-free grammars. In truth, CFGs are not quite powerful enough to describe natural language in full; some constructions (the cross-serial dependencies of Swiss German are the famous example) are provably not context-free. However, writing context-sensitive grammars is very difficult to do without spending a great deal of time and effort. CFGs, though technically inaccurate, often give us enough to work with that they remain useful: we can (broadly) model syntax. X-bar theory was invented by Chomsky (no surprise) to model some aspects of syntax, and it is essentially the application of CFGs to language modeling. Although X-bar theory has largely fallen out of favor, it's still used in introductory syntax courses to teach certain concepts of language modeling.
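To make "modeling syntax with a CFG" concrete, here's a toy fragment in Python (a grammar I invented for illustration, not a serious linguistic analysis):

```python
import random

# A toy context-free grammar for English-like sentences.
# Nonterminals map to lists of possible right-hand sides.
GRAMMAR = {
    "S":   [["NP", "VP"]],
    "NP":  [["Det", "N"]],
    "VP":  [["V", "NP"], ["V"]],
    "Det": [["the"], ["a"]],
    "N":   [["cat"], ["dog"]],
    "V":   [["sees"], ["sleeps"]],
}

def generate(symbol: str = "S") -> str:
    """Expand a symbol by picking one production at random;
    anything not in GRAMMAR is a terminal word."""
    if symbol not in GRAMMAR:
        return symbol
    rhs = random.choice(GRAMMAR[symbol])
    return " ".join(generate(sym) for sym in rhs)

print(generate())  # e.g. "the cat sees a dog"
```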
We also use grammars in phonetics and phonology, though perhaps not in the way you'd expect. Certain morphological and phonological changes can be modeled as rewrite rules over strings, and these can be useful for describing the sound patterns of a language.
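Here's a tiny, deliberately simplified sketch of one such rewrite rule in Python (both the rule and the name pluralize are my own toy example, not a real phonological analysis):

```python
import re

def pluralize(noun: str) -> str:
    """Toy allomorphy rule: spell the English plural suffix "es"
    after a sibilant-final stem, and "s" otherwise."""
    if re.search(r"(s|sh|ch|x|z)$", noun):
        return noun + "es"
    return noun + "s"

print(pluralize("fox"), pluralize("cat"))  # foxes cats
```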
Computer Science
In computer science, grammars are often used to model programming languages. Programming languages are considerably easier to model than natural languages precisely because they are designed to be: we design them to be efficiently parseable, meaning we can take input, recognize whether it is valid, and determine its meaning. At present, we generally use context-free grammars to build programming languages, because we have developed reasonably efficient parsing techniques (LL, LR, and their relatives) for this class. We do not have comparably efficient techniques for parsing context-sensitive languages.
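As a sketch of what "parsing with a CFG" looks like in practice, here's a minimal recursive-descent recognizer in Python for a toy expression grammar I made up (E → T ("+" T)*, T → "n" | "(" E ")", with "n" standing in for a number):

```python
def parse(tokens: list[str]) -> bool:
    """Return True iff the token list derives from E in the toy grammar."""
    pos = 0

    def expr() -> bool:             # E -> T ('+' T)*
        nonlocal pos
        if not term():
            return False
        while pos < len(tokens) and tokens[pos] == "+":
            pos += 1
            if not term():
                return False
        return True

    def term() -> bool:             # T -> 'n' | '(' E ')'
        nonlocal pos
        if pos < len(tokens) and tokens[pos] == "n":
            pos += 1
            return True
        if pos < len(tokens) and tokens[pos] == "(":
            pos += 1
            if expr() and pos < len(tokens) and tokens[pos] == ")":
                pos += 1
                return True
        return False

    return expr() and pos == len(tokens)

print(parse(["n", "+", "(", "n", "+", "n", ")"]))  # True
print(parse(["n", "+"]))                           # False
```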
Another use of grammars in computer science is the concept of regular expressions, or "regexes". A (classical) regex is an analogue of a finite state automaton, by which I mean that they are two different formalisms that recognize the same class of languages. (In fact, there are standard constructions for converting an FSA into a regex and vice versa, though many modern regex engines add features, like backreferences, that go beyond the regular languages.) Regexes are generally used by programmers to match text. Perhaps I want to validate that a user has input a phone number of the form NNN-NNNN, where each N is a digit. The regex for this would be [0-9]{3}-[0-9]{4}. I won't go into depth on how to write regexes (there are books and books about it), but take my word for it that they can be very useful in the right circumstances.
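In Python, that check might look like this (a sketch; the PHONE and is_valid_phone names are mine):

```python
import re

PHONE = re.compile(r"[0-9]{3}-[0-9]{4}")

def is_valid_phone(s: str) -> bool:
    """fullmatch ensures the ENTIRE string matches, not just a prefix."""
    return PHONE.fullmatch(s) is not None

print(is_valid_phone("555-0134"))   # True
print(is_valid_phone("555-01345"))  # False: trailing extra digit
```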
Of course, grammars find even more use in the fields of natural language processing and computational linguistics, which operate at the intersection of computer science with linguistics. (The former is more on the CS side, the latter more on the linguistics side.)
Summary
So, to summarize:
Chomsky's hierarchy is not "used" in the sense of being a tool that we directly apply. It is rather a way to describe grammars. This description informs us (a) of what class of languages can be produced by each type of grammar, and (b) of what machine we can build to model the grammar and thus recognize inputs for a language. But no discussion of any of the topics I mentioned can go on without a fundamental understanding of the power and limits that each category of grammars has relative to the others.
That's a lot of text, so please let me know if you have any questions about anything!