BERT base model (uncased)¶
Transformers model pretrained on a large corpus of English data in a self-supervised fashion
| Publisher | License | Version | Release |
|---|---|---|---|
| Hugging Face team | Unknown | Unknown | Unknown |
Model Summary¶
BERT is a transformers model pretrained on a large corpus of English data in a self-supervised fashion. It was introduced in the paper *BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding* and first released in the google-research/bert repository. It is uncased, meaning it does not differentiate between English and english. It can be used as-is for masked language modeling or next sentence prediction, but is primarily intended to be fine-tuned on downstream tasks such as sequence classification, token classification or question answering. It was pretrained on 4 cloud TPUs for one million steps with a batch size of 256, using a sequence length of 128 tokens for 90% of the steps and 512 for the remaining 10%. When fine-tuned on downstream tasks, it achieves good results on the GLUE benchmark.
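As a quick illustration of the masked language modeling objective, the checkpoint can be queried through the 🤗 Transformers fill-mask pipeline. This is a minimal sketch; the example sentence is arbitrary, and only the `bert-base-uncased` checkpoint name is taken from this card.

```python
from transformers import pipeline

# Load the uncased BERT base checkpoint behind the fill-mask pipeline.
unmasker = pipeline("fill-mask", model="bert-base-uncased")

# BERT scores the vocabulary tokens most likely to sit behind [MASK].
for prediction in unmasker("The capital of France is [MASK]."):
    print(f"{prediction['token_str']:>10}  score={prediction['score']:.3f}")
```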
Model Resources¶
🤗 Hugging Face | 🐱 GitHub | 📃 Research Paper
Info
This model card was generated using the PromptxAI API, which queries recent web content sources and combines them with large language model generations. As of Feb 2023, models like GPT-3 (via applications like ChatGPT) cannot be queried about the latest web content, because they are trained on a static dataset and are not updated with new web content. The PromptxAI API solves this by chaining recent web content sources with large language model outputs, which allows models like GPT-3 to be queried on the latest web content.
Model Details¶
Size: 110M parameters (12 layers, hidden size 768, 12 attention heads)
Use Cases: Sequence classification, token classification, or question answering (a fine-tuning sketch follows this section)
Training corpus: BookCorpus, English Wikipedia
Training method: Adam optimizer with a learning rate of 1e-4, β1 = 0.9, β2 = 0.999, weight decay of 0.01, learning rate warmup for 10,000 steps, and linear decay of the learning rate afterwards
Evaluation method: GLUE benchmark test results
Compute: 4 cloud TPUs in Pod configuration (16 TPU chips total)
Features: Inputs are lowercased and tokenized using WordPiece with a vocabulary size of 30,000 (see the tokenization example below)
Limitations: Primarily aimed at being fine-tuned on tasks that use the whole sentence (potentially masked) to make decisions; not well suited to text generation, for which models like GPT-2 are a better fit
Strengths: Self-supervised pretraining lets it make use of large amounts of publicly available unlabeled text data
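As referenced under Use Cases, the sketch below shows one way to fine-tune the checkpoint for sequence classification with 🤗 Transformers and PyTorch. The optimizer settings deliberately echo the pretraining recipe listed under Training method (Adam, learning rate 1e-4, β1 = 0.9, β2 = 0.999, weight decay 0.01, 10,000 warmup steps, linear decay); the toy texts, `num_labels=2`, and step counts are placeholders, and real fine-tuning runs typically use a much smaller learning rate and far fewer steps.

```python
import torch
from torch.optim import AdamW
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          get_linear_schedule_with_warmup)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)  # num_labels is a placeholder for your task

# Toy batch standing in for a real GLUE-style dataset.
texts = ["a great movie", "a terrible movie"]
labels = torch.tensor([1, 0])
batch = tokenizer(texts, padding=True, truncation=True,
                  max_length=128, return_tensors="pt")

# Optimizer and schedule mirroring the pretraining recipe above; in practice
# fine-tuning usually uses a smaller learning rate (e.g. 2e-5) and short schedules.
optimizer = AdamW(model.parameters(), lr=1e-4, betas=(0.9, 0.999), weight_decay=0.01)
scheduler = get_linear_schedule_with_warmup(
    optimizer, num_warmup_steps=10_000, num_training_steps=1_000_000)

# One training step: forward pass with labels returns the classification loss.
model.train()
loss = model(**batch, labels=labels).loss
loss.backward()
optimizer.step()
scheduler.step()
optimizer.zero_grad()
```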
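The uncased WordPiece preprocessing listed under Features can be inspected directly with the checkpoint's tokenizer. The snippet below is a small sketch with arbitrary example sentences.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Lowercasing happens before WordPiece, so cased variants collapse to the
# same subword ids: ['english', 'and', 'english'].
print(tokenizer.tokenize("English and english"))

# Words outside the vocabulary are split into '##'-prefixed subword pieces.
print(tokenizer.tokenize("I have a new GPU!"))

# Size of the released WordPiece vocabulary (roughly the 30,000 quoted above).
print(tokenizer.vocab_size)
```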