Language Processing Technologies

 

Syllabus

Formal Languages
Grammars. Classification of grammars and languages. Regular languages. Lexical analysis and regular expressions (REGEX). Regular expressions and finite state automata. The POSIX 1003.2 standard for REGEX. Design of lexical analyzers. Use of Lex. Context-free languages. Syntactic parsing: parse trees, ambiguity, top-down and bottom-up parsers. Top-down recursive parsing. LL(1) parsers. Bottom-up shift-reduce parsers. Conflicts in parsing tables. LR(k) grammars and parsers. Construction of bottom-up parsing tables: the case of SLR grammars. Use of YACC.
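
As an informal illustration of two of the topics listed above (lexical analysis with regular expressions and top-down recursive parsing), the following is a minimal sketch in Python. The toy arithmetic grammar, the `tokenize` function, and the `Parser` class are hypothetical names chosen for this example and are not taken from the course material. Each nonterminal of the grammar becomes one method, and the parser decides what to do by looking at a single token of lookahead, as an LL(1) parser does.

```python
import re

# Hypothetical toy grammar (LL(1)), chosen only for illustration:
#   expr   -> term (('+' | '-') term)*
#   term   -> factor (('*' | '/') factor)*
#   factor -> NUMBER | '(' expr ')'

# One token per match: either a run of digits or a single non-space character.
TOKEN_RE = re.compile(r"\s*(?:(\d+)|(\S))")


def tokenize(text):
    """Lexical analysis: turn the input string into (kind, value) tokens."""
    tokens = []
    for number, symbol in TOKEN_RE.findall(text):
        if number:
            tokens.append(("NUMBER", int(number)))
        elif symbol in "+-*/()":
            tokens.append((symbol, symbol))
        else:
            raise SyntaxError(f"unexpected character {symbol!r}")
    tokens.append(("EOF", None))
    return tokens


class Parser:
    """Recursive-descent parser that evaluates the expression while parsing."""

    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        # Kind of the current lookahead token.
        return self.tokens[self.pos][0]

    def advance(self):
        token = self.tokens[self.pos]
        self.pos += 1
        return token

    def expect(self, kind):
        if self.peek() != kind:
            raise SyntaxError(f"expected {kind}, found {self.peek()}")
        return self.advance()

    def parse(self):
        value = self.expr()
        self.expect("EOF")
        return value

    def expr(self):
        value = self.term()
        while self.peek() in ("+", "-"):
            op, _ = self.advance()
            right = self.term()
            value = value + right if op == "+" else value - right
        return value

    def term(self):
        value = self.factor()
        while self.peek() in ("*", "/"):
            op, _ = self.advance()
            right = self.factor()
            value = value * right if op == "*" else value / right
        return value

    def factor(self):
        if self.peek() == "(":
            self.advance()
            value = self.expr()
            self.expect(")")
            return value
        _, number = self.expect("NUMBER")
        return number


print(Parser("2 + 3 * (4 - 1)").parse())  # prints 11
```

Writing the repetition as a loop rather than a left-recursive rule is a deliberate choice: a left-recursive rule such as expr -> expr '+' term cannot be handled directly by a top-down parser.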

Natural Language Processing
Eliza and chatbots. Lexical entities: words and morphology; finite state transducers; the SFST library; stemming; spell-checking; minimum edit distance; N-gram statistical language models (estimates, smoothing and back-off techniques); the Naive Bayes text classifier. Part-of-Speech Tagging. Grammatical classes, tagsets, rule-based PoS tagging, probabilistic PoS tagging and Hidden Markov Models. Syntax analysis. Context-free grammars in NLP. Phrase components and chunking: Noun Phrase, Verb Phrase, and Prepositional Phrase. Implementation of the rules for gender/number agreement and verbal constructions (subcategorization frame). Ambiguity and parse tree construction (attachment ambiguity, coordination ambiguity). The Earley Parser. Constraints and grammars with feature structures. Introduction to Probabilistic Context-Free Grammars. Lexical semantics. Meanings, relationships among meanings. WordNet. Word sense disambiguation. Word similarity. Applications of NLP technologies. Question answering, Text classification, Sentiment Analysis. NLP libraries. OpenNLP. MaxEnt classifiers.
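
As an informal illustration of the minimum edit distance topic listed above, the following is a minimal sketch of the standard dynamic-programming computation in Python. The function name `min_edit_distance` and the `sub_cost` parameter are hypothetical names chosen for this example and are not taken from the course material. Insertions and deletions cost 1, and the substitution cost is configurable because some textbooks use 1 and others use 2.

```python
def min_edit_distance(source, target, sub_cost=1):
    """Return the minimum edit distance between two strings.

    dist[i][j] holds the cost of turning the first i characters of
    `source` into the first j characters of `target`.
    """
    rows, cols = len(source) + 1, len(target) + 1
    dist = [[0] * cols for _ in range(rows)]

    # Turning a prefix into the empty string takes only deletions,
    # and building a prefix from the empty string takes only insertions.
    for i in range(rows):
        dist[i][0] = i
    for j in range(cols):
        dist[0][j] = j

    for i in range(1, rows):
        for j in range(1, cols):
            same = source[i - 1] == target[j - 1]
            dist[i][j] = min(
                dist[i - 1][j] + 1,                              # deletion
                dist[i][j - 1] + 1,                              # insertion
                dist[i - 1][j - 1] + (0 if same else sub_cost),  # match or substitution
            )
    return dist[-1][-1]


print(min_edit_distance("kitten", "sitting"))  # prints 3
```

A spell-checker could use such a function to rank candidate corrections by their distance from the misspelled word, although a practical implementation would also need a dictionary and a way to limit the candidates it compares.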

Slides

  1. Languages and Compilers
  2. Regular languages
  3. Context Free Languages
  4. YACC
  5. NLP: words (morphology, spell-checking, N-gram models)
  6. NLP: Part of Speech Tagging
  7. NLP: Syntax
  8. NLP: Semantics
  9. OpenNLP

Examples and references

  1. Flex
  2. Parser CF
  3. Eliza (NetBeans project - zip file)
  4. Stuttgart Finite State Transducer (zip with examples)
  5. Porter Stemmer (zip)
  6. Morph-it (morphological analyzer for Italian)
  7. WordNet, WordNet::Similarity, WordVis, Visuwords
  8. OpenNLP