# BERT_Tokenizer_for_classification

**Repository Path**: liuerin/BERT_Tokenizer_for_classification

## Basic Information

- **Project Name**: BERT_Tokenizer_for_classification
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2021-01-28
- **Last Updated**: 2021-01-28

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# BERT_Tokenizer_for_classification

This repo gives a step-by-step guide to using a BERT-style tokenizer and shows how it can be used for tasks like sentiment analysis with models such as CNNs and LSTMs. BERT splits text into sub-word tokens rather than whole words, and we can use the same tokenization technique to feed tokenized data to our traditional models.

# Experiment

We will experiment with BERT's tokenizer utility, then build a 1-D CNN model to see the whole flow. To minimize the amount of padding, we will use a batching trick that groups sentences of similar length into the same batch during training. Please feel free to use similar steps to combine the tokenizer with other kinds of models.
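
The two ideas above, sub-word tokenization and batching by similar length, can be illustrated with a small framework-free sketch. It is not the code from this repo: the vocabulary `TOY_VOCAB` and the helper names `wordpiece`, `encode`, and `length_bucketed_batches` are all invented for illustration, and a real setup would load BERT's own vocabulary file and use a deep-learning framework's batching utilities instead.

```python
from typing import Dict, List

# Hypothetical toy vocabulary, invented for this example.
# A real BERT vocabulary has roughly 30k entries loaded from a vocab file.
TOY_VOCAB: Dict[str, int] = {
    "[PAD]": 0, "[UNK]": 1, "the": 2, "movie": 3, "was": 4,
    "great": 5, "un": 6, "##watch": 7, "##able": 8, "bad": 9,
}


def wordpiece(word: str, vocab: Dict[str, int]) -> List[str]:
    """Split one word into sub-word pieces, greedy longest match first.

    Pieces that continue a word carry the '##' prefix, as in BERT vocabularies.
    """
    pieces: List[str] = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink it.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            # No piece fits at this position: the whole word is unknown.
            return ["[UNK]"]
        pieces.append(match)
        start = end
    return pieces


def encode(sentence: str, vocab: Dict[str, int]) -> List[int]:
    """Lowercase, split on whitespace, and map every piece to its id."""
    ids: List[int] = []
    for word in sentence.lower().split():
        ids.extend(vocab[piece] for piece in wordpiece(word, vocab))
    return ids


def length_bucketed_batches(
    sequences: List[List[int]], batch_size: int, pad_id: int = 0
) -> List[List[List[int]]]:
    """Sort by length, cut into batches, and pad each batch to its own maximum."""
    ordered = sorted(sequences, key=len)
    batches: List[List[List[int]]] = []
    for i in range(0, len(ordered), batch_size):
        chunk = ordered[i:i + batch_size]
        width = len(chunk[-1])  # the last sequence is the longest after sorting
        batches.append([seq + [pad_id] * (width - len(seq)) for seq in chunk])
    return batches


if __name__ == "__main__":
    sentences = ["The movie was unwatchable", "great", "The movie was great", "bad"]
    encoded = [encode(s, TOY_VOCAB) for s in sentences]
    for batch in length_bucketed_batches(encoded, batch_size=2):
        print(batch)
```

Because each batch is padded only to the length of its own longest sentence, short sentences are no longer stretched to the length of the longest sentence in the whole dataset. In practice the resulting batches would usually be shuffled before each epoch so that the model does not always see short sentences first.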