# BERT_Tokenizer_for_classification

**Repository Path**: liuerin/BERT_Tokenizer_for_classification

## Basic Information

- **Project Name**: BERT_Tokenizer_for_classification
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2021-01-28
- **Last Updated**: 2021-01-28

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# BERT_Tokenizer_for_classification
This repo gives a step-by-step guide to using a BERT-style tokenizer
and shows how it can be used for tasks like sentiment analysis with models such as CNNs and LSTMs.
BERT tokenizes text with WordPiece, a subword scheme that splits words missing from its vocabulary into smaller known pieces,
and we can reuse the same tokenization technique to feed tokenized data to our traditional models.
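BERT's WordPiece step can be sketched in plain Python. This is a minimal illustration, not the repository's code or any library's implementation. The toy `VOCAB` and the helper names `wordpiece_tokenize`, `tokenize`, and `encode` are hypothetical, and pre-tokenization is simplified to lowercasing plus whitespace splitting. The sketch shows greedy longest-match-first subword splitting, the `##` prefix on continuation pieces, and the `[CLS]`/`[SEP]`/`[PAD]` framing that turns a sentence into fixed-length ids plus an attention mask for a downstream model.

```python
from typing import List, Sequence, Set, Tuple

# Hypothetical toy vocabulary for illustration; a real BERT vocab has ~30k entries.
VOCAB = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "the", "movie", "was",
         "un", "##believ", "##able", "great"]


def wordpiece_tokenize(word: str, vocab: Set[str], unk: str = "[UNK]",
                       max_chars: int = 100) -> List[str]:
    """Split one word with greedy longest-match-first WordPiece."""
    if len(word) > max_chars:
        return [unk]
    pieces: List[str] = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Shrink the candidate from the right until it is in the vocabulary.
        while start < end:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece  # non-initial pieces carry the "##" prefix
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            return [unk]  # no split possible: the whole word maps to [UNK]
        pieces.append(match)
        start = end
    return pieces


def tokenize(text: str, vocab: Set[str]) -> List[str]:
    # Simplification (assumption): lowercase + whitespace split. BERT's own
    # basic tokenizer also separates punctuation and strips accents.
    tokens: List[str] = []
    for word in text.lower().split():
        tokens.extend(wordpiece_tokenize(word, vocab))
    return tokens


def encode(text: str, vocab: Sequence[str],
           max_len: int) -> Tuple[List[int], List[int]]:
    """Return (input_ids, attention_mask), truncated/padded to max_len."""
    index = {tok: i for i, tok in enumerate(vocab)}
    body = tokenize(text, set(vocab))[: max_len - 2]  # room for [CLS]/[SEP]
    tokens = ["[CLS]"] + body + ["[SEP]"]
    ids = [index[t] for t in tokens]
    mask = [1] * len(ids)  # 1 = real token, 0 = padding
    pad = max_len - len(ids)
    ids += [index["[PAD]"]] * pad
    mask += [0] * pad
    return ids, mask


print(tokenize("The movie was unbelievable", set(VOCAB)))
# → ['the', 'movie', 'was', 'un', '##believ', '##able']
print(encode("The movie was unbelievable", VOCAB, max_len=10))
# → ([2, 4, 5, 6, 7, 8, 9, 3, 0, 0], [1, 1, 1, 1, 1, 1, 1, 1, 0, 0])
```

"unbelievable" is not in the toy vocabulary as a whole word, so it is split into three known pieces instead of collapsing to `[UNK]`. This is why subword tokenization loses less information on rare words than a plain word-level vocabulary.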

# Experiment
We will experiment with BERT's tokenizer utility
and then build a 1-D CNN model to walk through the whole flow.
To minimize padding (and the wasted computation on pad tokens), we will use a batching trick
that groups sentences of similar length into the same batch during training.
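The length-bucketing trick can be sketched as follows, assuming each sentence has already been converted to a list of token ids. The function name `make_length_buckets` and its parameters are hypothetical, not taken from the repository. The idea is to sort by length, cut the sorted order into batches, and pad each batch only to its own longest sentence.

```python
import random
from typing import List, Sequence, Tuple


def make_length_buckets(sequences: Sequence[Sequence[int]], batch_size: int,
                        pad_id: int = 0, shuffle: bool = True,
                        seed: int = 0) -> List[Tuple[List[int], List[List[int]]]]:
    """Group similar-length sequences and pad each batch to its own max length.

    Returns (original_indices, padded_batch) pairs so that labels can be
    gathered with the same indices.
    """
    # Sort indices by sequence length so neighbours have similar lengths.
    order = sorted(range(len(sequences)), key=lambda i: len(sequences[i]))
    batches = [order[i:i + batch_size] for i in range(0, len(order), batch_size)]
    if shuffle:
        # Shuffle the order of batches (not their contents) so training does
        # not always go from shortest to longest.
        random.Random(seed).shuffle(batches)
    result = []
    for idx in batches:
        max_len = max(len(sequences[i]) for i in idx)
        padded = [list(sequences[i]) + [pad_id] * (max_len - len(sequences[i]))
                  for i in idx]
        result.append((idx, padded))
    return result


seqs = [[1] * 5, [1] * 2, [1] * 9, [1] * 3, [1] * 8, [1] * 4]
for idx, batch in make_length_buckets(seqs, batch_size=2, shuffle=False):
    print(idx, [len(row) for row in batch])
# [1, 3] [3, 3]
# [5, 0] [5, 5]
# [4, 2] [9, 9]
```

With these six example sequences, bucketing adds 3 pad tokens in total, whereas padding everything to the global maximum length of 9 would add 23. Passing a different `seed` each epoch reshuffles the batch order while keeping the length grouping.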

Please feel free to apply similar steps to glue the tokenizer to other kinds of models.
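The README does not show the model code, so the following is only a framework-free sketch of where token ids plug into a 1-D CNN classifier. It covers an embedding lookup, convolution filters sliding over windows of `kernel_size` tokens, ReLU, max-over-time pooling, and a sigmoid output. The class `TinyTextCNN` and all of its sizes are hypothetical. The weights are random and untrained, and a real model would be built and trained in a deep learning framework. Gluing the tokenizer to another model, such as an LSTM, keeps the same input: a list of token ids per sentence.

```python
import math
import random
from typing import List


class TinyTextCNN:
    """Forward pass of a 1-D CNN text classifier (illustrative, untrained)."""

    def __init__(self, vocab_size: int, embed_dim: int = 8,
                 num_filters: int = 4, kernel_size: int = 3, seed: int = 0):
        rng = random.Random(seed)

        def rand() -> float:
            return rng.uniform(-0.1, 0.1)

        self.k = kernel_size
        # One embedding vector per token id.
        self.embedding = [[rand() for _ in range(embed_dim)]
                          for _ in range(vocab_size)]
        # Each filter spans kernel_size consecutive token embeddings.
        self.filters = [[[rand() for _ in range(embed_dim)]
                         for _ in range(kernel_size)]
                        for _ in range(num_filters)]
        self.filter_bias = [0.0] * num_filters
        self.out_w = [rand() for _ in range(num_filters)]
        self.out_b = 0.0

    def forward(self, ids: List[int]) -> float:
        if len(ids) < self.k:
            raise ValueError("sequence shorter than kernel_size; pad it first")
        emb = [self.embedding[i] for i in ids]  # (seq_len, embed_dim)
        pooled = []
        for f, b in zip(self.filters, self.filter_bias):
            activations = []
            # Slide the filter over every window of k consecutive tokens.
            for start in range(len(emb) - self.k + 1):
                window = emb[start:start + self.k]
                s = b + sum(w * x
                            for row_w, row_x in zip(f, window)
                            for w, x in zip(row_w, row_x))
                activations.append(max(0.0, s))  # ReLU
            pooled.append(max(activations))  # max-over-time pooling
        logit = self.out_b + sum(w * p for w, p in zip(self.out_w, pooled))
        return 1.0 / (1.0 + math.exp(-logit))  # P(positive class)


model = TinyTextCNN(vocab_size=11)
ids = [2, 4, 5, 6, 7, 8, 9, 3, 0, 0]  # e.g. [CLS] ... [SEP] [PAD] [PAD]
print(model.forward(ids))  # an untrained probability between 0 and 1
```

In this sketch, `[PAD]` positions still pass through the convolution, which is one more reason to keep the padding per batch small.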