Sunday, April 15, 2012

What is a palindrome II? A specification in Haskell

In this blog post I will describe the concept of a palindrome more precisely than in my last blog post, by giving the property that determines whether or not a sequence of symbols is a palindrome in the functional programming language Haskell.

Why is a specification of a palindrome as a program more precise than the specification given in text in the previous blog post, which was taken from the Oxford English Dictionary (OED)? And why should someone interested in the concept of palindromes be interested in a program that describes the concept? These are philosophical questions, to which computer science helps to give an answer. The purpose of a definition is to distinguish certain things from other things. The definition of a palindrome distinguishes palindromic words or sequences of words from non-palindromic ones. How do I determine if a sequence of words is a palindrome? The textual definition in the OED says that it should read the same backwards as forwards, letter for letter. The first example in the OED is "Lewd did I live, and evil I did dwel", which doesn't read the same backwards and forwards, unless capital and underscore letters are considered equal, and comma's are ignored. Hmmm. Are there any more exceptions I should know of?

A much better way to define the concept of palindromes is to provide a program, which takes a string as input, and which returns either yes or no, determining whether or not the input string is a palindrome. The OED definition leads to three different programs for determining whether or not an input string is a palindrome: a definition which compares strings letter for letter, and only accepts string which are literally the same forwards and backwards, a definition which ignores punctuation and capitalization, and a definition which compares DNA strings. Such a program is a much more precise way to define what it means to be a palindrome. To determine if a string is a palindrome it suffices to run the program on the string. To study the concept of palindromes, it suffices to study the program. No surprises.

Not just palindromes profit from being defined by means of a program, many other concepts, ideas, and regulations would be better off being specified as a program. Besides the obvious anagrams, pangrams, ambigrams, and so on, tax laws, testaments, exam regulations, and anything that follows some rules is best described by means of a program. A program is precise about the rules, the exceptions, and when they apply. If a concept cannot be described precisely, using a program becomes harder. I wouldn't know how to construct a Turing test: a program to determine whether or not a species is intelligent. And even a simple concept like a chair is not easily captured in a program.

Haskell is a programming language celebrating its 25th anniversary in 2012. It was conceived at an academic conference on functional programming in 1987, in an attempt to combine efforts in developing a purely functional programming language. In the nineties of the last century it was used for research on advanced programming language concepts, and in teaching at many universities. Since a number of years it is becoming more popular in industry, in particular in the investment banking industry, where it is used to specify and implement involved mathematical financial models. I use Haskell because it allows me to describe ideas precisely and concisely. I will introduce the necessary Haskell concepts as I go.

This blog post itself is a literate Haskell program. If you save it as a file and make sure its name ends in .lhs, you can load it in ghci, an interpreter for Haskell programs, and use the definitions given in this post. To make this work I need the following two lines.


> import Prelude hiding (reverse)
> import Data.Char hiding (isLetter)

The first line says that I can use all the standard Haskell functions except the function reverse, which I am going to define in this blog post. The second line says that I can use all basic functions on characters, such as the function isSpace, which tells whether or nor a character is a space. The function isLetter is excluded, since I am going to define that function in this blog post too.

I use the following notation for a list of characters. The empty list of characters is denoted by [] and a symbol x followed by a list of characters xs is denoted by x:xs. An individual character is surrounded by quotes to distinguish it from arbitrary variables like x. For example, in this notation, I can write "refer" as 'r':'e':'f':'e':'r':[]. Furthermore, I can concatenate two lists of characters by means of the operator ++, so that "Madam, " ++ "I'm Adam" is "Madam, I'm Adam".

How can I determine whether or not a string (a list of characters) is a palindrome? The simplest method is to reverse the string and to compare it with itself. So the string xs is a palindrome (palindrome xs), if xs is equal to its reverse: xs == reverse xs, where xs == ys is True only when the strings xs and ys are exactly equal.


> palindrome xs  =  xs == reverse xs

This equation is actually a program in Haskell. We can load this program in ghci and run it. For example, palindrome "abba" evaluates, unsurprisingly, to True, and palindrome "yabadabadoo" evaluates to False.

How do I compute the reverse of a string? For this purpose, I define a recursive function: if I know how to calculate the reverse of the empty string, and if I know how to calculate the reverse of x:xs, given that I know how to calculate the reverse of xs, then I can use a computer to calculate the reverse of any string. The two cases for reverse are not hard: the reverse of the empty string is the empty string, and the reverse of x:xs is the reverse of xs followed by the string consisting of the single character x:


> reverse []      =  []
> reverse (x:xs)  =  reverse xs ++ [x]

Here [x] is the list containing the single element x, which can also be written "x". This completely defines what it means to be a palindrome.

The above definition only qualifies strings as palindromes when they are exactly equal when reversed. So "Madam, I'm Adam" does not pass this test. For this string to also pass the palindrome test, I slightly adapt the definition. I now say that a string is a palindrome if it is equal to its reverse after throwing away all punctuation symbols such as spaces, comma's, periods, etc, and after turning all characters into lower case characters. I call such a palindrome a textPalindrome.


> textPalindrome xs  =  let  ys  =  filter isLetter xs
>                            zs  =  map toLower ys
>                       in   zs == reverse zs
>
> isLetter l = not (isPunctuation l) && not (isSpace l)

Haskell's let ... in ... construct allows me to introduce new definitions after the let, which I can use in the definitions after the in. Here I use two intermediate results ys and zs. By filtering the original string xs with the predicate isLetter, I obtain a new string ys in which no punctuation or space characters appear anymore. So "Madam, I'm Adam" is turned into "MadamImAdam". Function filter is a standard function in Haskell which takes a predicate p and a list xs, and keeps all the elements of xs that satisfy the predicate p. The function isLetter uses two basic Haskell functions on characters, isPunctuation and isSpace. isPunctuation returns True for all punctuation characters such as the dot, comma, space, and exclamation mark, and False for all non-punctuation characters, such as letters. Function isSpace works similarly on space characters. Function isLetter takes a character, and checks that it is not a punctuation character, and (denoted by &&) not a space character. After filtering out all non-letter characters from xs giving ys, I map each letter in ys to its lower case by means of map toLower ys. So "MadamImAdam" is turned into "madamimadam". map is also a standard Haskell function, which takes a function as argument, toLower in this example, and applies this function to all characters in a string. The function toLower turns a capital letter into lowercase, and does nothing to lowercase letters. Function textPalindrome accepts all palindromes, irrespective of their punctuation.

Both palindrome and textPalindrome cannot be used to check if a DNA sequence is a palindrome, because the symbol 'A' should be considered equal to 'T', and 'C' to 'G', and the equality == used in the definition of palindrome considers them different. We need to change the equality function to compare characters in DNA strings. We define the DNA character equality function =:= by


> 'A' =:= 'T'  =  True
> 'T' =:= 'A'  =  True
> 'C' =:= 'G'  =  True
> 'G' =:= 'C'  =  True
> _   =:= _    =  False

We use this new equality function in a definition of dnaPalindrome for sequences of DNA symbols. We pairwise combine the elements of xs and reverse xs with the equality function =:= using the function zipWith. zipWith is another standard function, which takes an operator and two lists, and `zips' the two lists with the operator. Thus we obtain a list of comparisons, which we fold to a single result by requiring that each element in the list is True. Function and takes a list of boolean values, and returns True only if all elements in the list are True.

> dnaPalindrome xs  =  and (zipWith (=:=) xs (reverse xs))

Thus we have three functions for checking whether or not a string is a palindrome: palindrome for strings that are exactly equal when reversed, textPalindrome for strings that are exactly equal when reversed modulo punctuation and space characters, and dnaPalindrome for DNA strings.

No comments:

Post a Comment