Month: June 2019

On the road to joint embedding with Wikidata lexemes?

Posted on June 13, 2019 Updated on June 13, 2019


Is it possible to use Wikidata lexemes for joint embedding, i.e., combining word embedding and knowledge graph entity embedding?

You can create on-the-fly text examples for joint embedding with the Wikidata Query Service. This SPARQL query attempts to interpolate a knowledge graph entity identifier into a text, using the usage example text (P5831) of a lexeme and the form that the example demonstrates (P5830):

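 SELECT * {
  # Lexemes with a usage example (P5831) that is qualified with the form it demonstrates (P5830)
  ?lexeme dct:language ?language ;
          wikibase:lemma ?lemma ;
          ontolex:lexicalForm ?form ;
          p:P5831 [
            ps:P5831 ?text ;
            pq:P5830 ?form 
          ] .
  # Strip the 31-character prefix "http://www.wikidata.org/entity/" to get the form identifier
  BIND(SUBSTR(STR(?form), 32) AS ?entity)

  # Replace the written form of the word in the example text with the identifier
  ?form ontolex:representation ?word .
  BIND(REPLACE(?text, STR(?word), ?entity) AS ?interpolated_text)
}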

The result is here.

The interpolations are not perfect: a word that is capitalized at the beginning of a sentence does not match its lowercase form and is left uninterpolated, and short words may be interpolated into the middle of longer words (I have not been able to get a regular expression with the word separator “\b” working). Alternatively, the SPARQL query result may be downloaded and the interpolation performed in a language that supports advanced regular expression patterns.
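As a minimal sketch of that alternative, here is how the interpolation could look in Python once the query result has been downloaded. The function name interpolate, the example sentence and the identifier "L0-F0" are made up for illustration and do not come from Wikidata. Ignoring case is an assumption that handles sentence-initial capitalization, but it may match too much in languages where capitalization carries meaning.

```python
import re


def interpolate(text, word, entity):
    """Replace whole-word occurrences of `word` in `text` with `entity`."""
    # The lookarounds act as word separators: the match must not be
    # preceded or followed by a word character. re.escape guards against
    # special regular expression characters in the word itself.
    pattern = r"(?<!\w)" + re.escape(word) + r"(?!\w)"
    # A function is used as replacement so that the identifier is inserted
    # literally. IGNORECASE is an assumption to catch capitalized words.
    return re.sub(pattern, lambda match: entity, text, flags=re.IGNORECASE)


# Made-up example: "L0-F0" is a placeholder, not a real form identifier.
print(interpolate("Is the cat in the house? This is it.", "is", "L0-F0"))
```

The short word "is" is replaced both as the capitalized first word and as a separate word later on, while the "is" inside "This" is left untouched.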

The number of annotated usage examples in Wikidata across languages is ridiculously small compared to the corpora typically applied in successful word embedding.

Update:

You can also interpolate the sense identifier: Here is the Wikidata Query Service result.

This entry was posted in technical and tagged embedding, lexemes, Wikidata, Wikidata lexemes, Wikidata Query Service.


Finn Årup Nielsen's blog

– research, science, technology, music, personal opinions, etc.
