An old joke concerns Mr. Jensen, who runs a company with his two sons. Mr. Jensen orders a new sign to put up in front of his shop displaying “Jensen and Sons”. The sign painter sends Mr. Jensen a draft. However, Mr. Jensen is not satisfied with the position of the letters, and he writes back to the painter: “I would like to have more space between Jensen and and and and and Sons”.

Our course in Python programming (Technical University of Denmark course 02820) is presently running. One of the topics of the course is text and web mining, with the associated slides available. The Natural Language Toolkit (NLTK) Python module provides convenient support for some text mining approaches. Part-of-speech tagging (POS tagging) is part of NLTK, and it is fairly straightforward to use. Applied to the joke, the Python code goes like this:
>>> import nltk
>>> s = "I would like to have more space between Jensen and and and and and Sons"
>>> words = nltk.word_tokenize(s)
>>> nltk.pos_tag(words)
[('I', 'PRP'), ('would', 'MD'), ('like', 'VB'), ('to', 'TO'), ('have', 'VB'), ('more', 'JJR'), ('space', 'NN'), ('between', 'IN'), ('Jensen', 'NNP'), ('and', 'CC'), ('and', 'CC'), ('and', 'CC'), ('and', 'CC'), ('and', 'CC'), ('Sons', 'NNPS')]
Every ‘and’ is tagged CC, a ‘coordinating conjunction’. I am not sure NLTK sees the joke there.
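To check the tagging mechanically rather than by eye, here is a minimal sketch that looks for runs of identical adjacent (word, tag) pairs. The helper name `repeated_runs` is my own and not part of NLTK, and the `tagged` list is simply copied from the output above rather than recomputed, so the snippet runs without NLTK or its data files.

```python
from itertools import groupby

# Tagger output copied from the session above (not recomputed here).
tagged = [('I', 'PRP'), ('would', 'MD'), ('like', 'VB'), ('to', 'TO'),
          ('have', 'VB'), ('more', 'JJR'), ('space', 'NN'), ('between', 'IN'),
          ('Jensen', 'NNP'), ('and', 'CC'), ('and', 'CC'), ('and', 'CC'),
          ('and', 'CC'), ('and', 'CC'), ('Sons', 'NNPS')]


def repeated_runs(pairs):
    """Return (word, tag, length) for each run of identical adjacent pairs.

    Hypothetical helper for illustration; it only groups neighbours that
    have both the same word and the same tag.
    """
    runs = []
    for (word, tag), group in groupby(pairs):
        length = len(list(group))
        if length > 1:
            runs.append((word, tag, length))
    return runs


print(repeated_runs(tagged))
# prints [('and', 'CC', 5)]
```

A single run of five identical pairs confirms that the tagger gave all five words the same tag, with no sign that it treated any of them as a mention of the word rather than a conjunction.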