15/06/20 – Seminar: Olena Orobinska (Kharkiv Polytechnic Institute)

Automatic Keyword Extraction from specialized text

Monday 15/06/2020, 11:00, Webex conference

There are a number of methods for extracting terms from text, but most of them need large corpora to be effective. The method presented here is based on the algorithm proposed in [1]. It operates on individual documents and extracts both one-word and multi-word terms. The main idea is to use a stop-word list as delimiters that split the text into sequences of candidate terms, whereas stop words are usually just removed from the text. Word associations within these candidate terms are then measured in a manner that automatically adapts to the style and content of the text, enabling adaptive and fine-grained measurement of word co-occurrences, which are used to score candidate keywords.
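The idea described above can be illustrated with a minimal Python sketch. It is not the speaker's implementation: the tiny stop-word list, the function names, and the choice of degree divided by frequency as the word score (one of the metrics discussed in [1]) are assumptions made for illustration only.

```python
import re
from collections import defaultdict

# Tiny illustrative stop-word list (an assumption; real lists are much larger).
STOP_WORDS = {"a", "an", "and", "are", "as", "for", "in", "is",
              "of", "on", "the", "to", "with"}

def candidate_phrases(text, stop_words=STOP_WORDS):
    """Split text into candidate terms, using stop words and punctuation as delimiters."""
    phrases = []
    # Punctuation also ends a candidate, so first cut the text into fragments.
    for fragment in re.split(r"[^\w\s-]+", text.lower()):
        current = []
        for word in fragment.split():
            if word in stop_words:
                if current:
                    phrases.append(tuple(current))
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(tuple(current))
    return phrases

def word_scores(phrases):
    """Score each word as degree / frequency over the candidate phrases."""
    freq = defaultdict(int)
    degree = defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            # Co-occurrences inside the phrase, counting the word itself.
            degree[word] += len(phrase)
    return {word: degree[word] / freq[word] for word in freq}

def extract_keywords(text, stop_words=STOP_WORDS):
    """Return (phrase, score) pairs, highest score first."""
    phrases = candidate_phrases(text, stop_words)
    scores = word_scores(phrases)
    ranked = {p: sum(scores[w] for w in p) for p in set(phrases)}
    return sorted(((" ".join(p), s) for p, s in ranked.items()),
                  key=lambda item: (-item[1], item[0]))
```

Calling `extract_keywords` on a single document returns its candidate terms ranked by score. Because each word's score grows with the length of the phrases it appears in, multi-word terms tend to rank above isolated words, and no external corpus is required.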

[1] Rose, S., Engel, D., Cramer, N., & Cowley, W. (2010). Automatic Keyword Extraction from Individual Documents. doi:10.1002/9780470689646.ch1.
