Through The Tweeted Times I was alerted to Apple’s recent patent attack on HTC and Google Android, described in Adrian Kingsley-Hughes’ ZDNet article as a “massive patent blow”. In July 2011 HTC was found to have infringed two of Apple’s patents, according to Bloomberg.
It occurred to me that wikis identify structured elements, namely web links and CamelCase words, but they usually do not identify phone numbers, names and street addresses. Identified web links are turned into clickable links (well, of course). CamelCase words are linked either to an existing page or to the construction of a new page. Ward Cunningham’s wiki goes back to March 25, 1995. I was not familiar with that early system, but I suppose it had link extraction with regular expressions. So hadn’t Cunningham already come up with a linkify functionality for web links in 1995?
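To make the idea concrete, here is a minimal sketch in Python of what such wiki-style link extraction could look like. It is not Cunningham's code, which I have not seen: the two patterns, the `linkify` function and the `/wiki/` and `/edit/` URL layout are all my own assumptions, and a real wiki would also escape HTML and handle trailing punctuation.

```python
import re

# Assumed patterns: a bare http(s) URL, or a CamelCase word made of at
# least two capitalized parts. The URL alternative comes first so that
# CamelCase text inside a URL is not linked separately.
TOKEN_RE = re.compile(
    r'(?P<url>https?://[^\s<>"]+)'
    r'|(?P<wiki>\b(?:[A-Z][a-z]+){2,}\b)'
)


def linkify(text, existing_pages=frozenset()):
    """Turn URLs and CamelCase words in plain text into HTML links."""

    def replace(match):
        url = match.group("url")
        if url:
            return f'<a href="{url}">{url}</a>'
        word = match.group("wiki")
        if word in existing_pages:
            # Known page: link straight to it (hypothetical URL layout).
            return f'<a href="/wiki/{word}">{word}</a>'
        # Unknown page: offer a link for creating it.
        return f'{word}<a href="/edit/{word}">?</a>'

    return TOKEN_RE.sub(replace, text)


if __name__ == "__main__":
    print(linkify("See FrontPage or NewIdea at http://c2.com/ now",
                  existing_pages={"FrontPage"}))
```

The point is only that a couple of regular expressions and a substitution callback are enough to get the behaviour described above.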
When I was visiting NEC Research Center in Princeton in 1999-2000, several researchers were working on extracting structured elements from texts such as emails, scientific papers and web pages. Andries Kruger constructed a system that would identify names and dates from scientific calls for papers. The system is described in DEADLINER: Building a new niche search engine. Kurt Bollacker, Steve Lawrence and C. Lee Giles were working on the CiteSeer system, which extracted authors, titles and years, among other things, from scientific papers. This is described in CiteSeer: an autonomous Web agent for automatic retrieval and identification of interesting publications. I also had a go at regular expressions for extraction of structured elements. Most recently I have been working on extracting brain coordinates from neuroscience papers. If you press the “Extract: Talairach coordinates from linked PDF” link on a Brede Wiki page, a regular expression goes over the text of the PDF so that individual brain coordinates can be linked up to specialized neuroscientific searches. Similar extraction is happening on a large scale in Tal Yarkoni’s NeuroSynth database; see this entry for an example extraction rendered nicely on an MRI brain scan.
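As a rough illustration of the coordinate extraction, the sketch below pulls x, y, z triples out of plain text with a single regular expression. It is not the expression used in the Brede Wiki or in NeuroSynth: the pattern, the function name `extract_coordinates` and the 120 mm plausibility cut-off are assumptions made for this example, and real PDF text needs considerably more care (tables, line breaks, page numbers).

```python
import re

# Assumed number format: optional minus sign, up to three digits,
# optional decimals. Separator: a comma or plain whitespace.
NUMBER = r"-?\d{1,3}(?:\.\d+)?"
SEP = r"(?:\s*,\s*|\s+)"
TRIPLE_RE = re.compile(
    rf"(?<![\d.\-])({NUMBER}){SEP}({NUMBER}){SEP}({NUMBER})(?![\d.])"
)


def extract_coordinates(text, max_abs=120.0):
    """Return (x, y, z) triples that look like brain coordinates in mm."""
    # PDF text often uses the Unicode minus sign instead of a hyphen.
    text = text.replace("\u2212", "-")
    triples = []
    for match in TRIPLE_RE.finditer(text):
        xyz = tuple(float(value) for value in match.groups())
        # Assumed sanity check: discard triples far outside the head.
        if all(abs(value) <= max_abs for value in xyz):
            triples.append(xyz)
    return triples


if __name__ == "__main__":
    sample = "Peaks were found at (-42, 18, 24) and (38, \u221260, 50)."
    print(extract_coordinates(sample))
```

The lookbehind and lookahead keep the pattern from starting or stopping in the middle of a longer number, so years such as 1999 are not picked up as coordinates.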