Author ORCID Identifier

http://orcid.org/0000-0002-8398-1531

Degree Year

2020

Document Type

Thesis

Degree Name

Bachelor of Arts

Department

Computer Science

Advisor(s)

John L. Donaldson

Keywords

Machine learning, NLP, Deep learning

Abstract

Since the first bidirectional deep learning model for natural language understanding, BERT, emerged in 2018, researchers have studied and applied pretrained bidirectional autoencoding or autoregressive models to solve language problems. In this project, I conducted research to fully understand BERT and XLNet and applied their pretrained models to two language tasks: reading comprehension (RACE) and part-of-speech tagging (the Penn Treebank). After experimenting with those released models, I implemented my own version of ELECTRA, a text encoder pretrained as a discriminator rather than a generator to improve compute efficiency, with BERT as its underlying architecture. To reduce the number of parameters, I replaced BERT with ALBERT in ELECTRA and named the new model ALE (A Lite ELECTRA). I compared the performance of BERT, ELECTRA, and ALE on the GLUE benchmark dev set after pretraining them on the same datasets for the same number of training FLOPs.
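For context, the sketch below illustrates the ELECTRA-style replaced token detection objective mentioned in the abstract: a small generator fills in masked positions, and the discriminator labels every token as original or replaced. This is a minimal, hypothetical PyTorch sketch, not the thesis code; the names (TinyEncoder, electra_step), layer sizes, and loss weighting are assumptions for illustration, and the toy encoder merely stands in for the BERT/ALBERT backbone.

```python
# Minimal, hypothetical sketch of ELECTRA-style replaced token detection
# (not the thesis implementation): a small generator fills masked positions,
# and a discriminator labels every token as original vs. replaced.
import torch
import torch.nn.functional as F
from torch import nn

VOCAB, HIDDEN, MAX_LEN, MASK_ID = 30522, 256, 128, 103  # BERT-like vocab; sizes are illustrative

class TinyEncoder(nn.Module):
    """Toy Transformer encoder standing in for the BERT/ALBERT backbone."""
    def __init__(self, hidden=HIDDEN, layers=2):
        super().__init__()
        self.tok = nn.Embedding(VOCAB, hidden)
        self.pos = nn.Embedding(MAX_LEN, hidden)
        block = nn.TransformerEncoderLayer(hidden, nhead=4, batch_first=True)
        self.body = nn.TransformerEncoder(block, num_layers=layers)

    def forward(self, ids):
        pos = torch.arange(ids.size(1), device=ids.device)
        return self.body(self.tok(ids) + self.pos(pos))

generator = TinyEncoder()
gen_head = nn.Linear(HIDDEN, VOCAB)   # masked-language-model head
discriminator = TinyEncoder()
disc_head = nn.Linear(HIDDEN, 1)      # original-vs-replaced head

def electra_step(ids, mask_prob=0.15, rtd_weight=50.0):
    """One pretraining step: generator MLM loss plus discriminator
    replaced-token-detection (RTD) loss."""
    masked = torch.rand(ids.shape, device=ids.device) < mask_prob
    gen_in = ids.clone()
    gen_in[masked] = MASK_ID

    gen_logits = gen_head(generator(gen_in))               # (B, T, VOCAB)
    mlm_loss = F.cross_entropy(gen_logits[masked], ids[masked])

    # The discriminator sees the sequence with sampled generator outputs
    # plugged into the masked slots; some replacements match the original.
    sampled = gen_logits.argmax(-1).detach()
    disc_in = torch.where(masked, sampled, ids)
    is_replaced = (disc_in != ids).float()

    disc_logits = disc_head(discriminator(disc_in)).squeeze(-1)
    rtd_loss = F.binary_cross_entropy_with_logits(disc_logits, is_replaced)
    return mlm_loss + rtd_weight * rtd_loss
```

For example, `electra_step(torch.randint(0, VOCAB, (8, 64)))` returns a single scalar loss for a toy batch; in practice the generator is kept much smaller than the discriminator, and only the discriminator is fine-tuned on downstream tasks such as GLUE.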
