Dependency Parsing

- Head: the semantic center
- Dependent: supplements (modifies) the meaning of the head
- Particularly important for languages like Korean, where word order is flexible and omission is common
Rules
- Heads are head-final (postpositional): the head always appears after the dependent
- Each dependent has exactly one head; a head, however, may have multiple dependents.
- No crossing dependency structures (arcs are projective).
- Nesting is allowed, though: a word that is the head of one word can simultaneously be the dependent of another.
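The rules above can be checked mechanically on a head-index representation of a parse. This is an illustrative sketch: the `heads` list format (position of each token's head, `None` for the root) is an assumption for the example, not a standard API.

```python
from itertools import combinations

def check_rules(heads):
    """Return (head_final, projective) for a sentence's head list.

    heads[i] is the 0-based position of token i's head, or None for the root.
    """
    arcs = [(d, h) for d, h in enumerate(heads) if h is not None]
    # Rule: the head always appears after the dependent.
    head_final = all(h > d for d, h in arcs)

    # Rule: no crossing arcs (projectivity). Two arcs cross when exactly
    # one endpoint of one arc lies strictly inside the other's span.
    def crosses(a, b):
        l1, r1 = min(a), max(a)
        l2, r2 = min(b), max(b)
        return l1 < l2 < r1 < r2 or l2 < l1 < r2 < r1

    projective = not any(crosses(a, b) for a, b in combinations(arcs, 2))
    return head_final, projective
```

For example, `heads = [2, 2, None]` satisfies both rules, while `heads = [2, 3, None, None]` is head-final but contains crossing arcs.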
Classification Method
Dependency parsing can be framed as a sequence labeling task: each token is assigned a label identifying its head (and, optionally, the dependency relation).
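The sequence-labeling framing can be sketched as follows: each token gets a label encoding its head as a relative offset. The label scheme here is an illustrative assumption; real taggers often also encode the relation type in the label.

```python
def heads_to_labels(heads):
    """heads[i]: 0-based head position of token i, or None for the root."""
    return ["ROOT" if h is None else f"{h - i:+d}" for i, h in enumerate(heads)]

def labels_to_heads(labels):
    """Inverse mapping: recover head positions from the label sequence."""
    return [None if lab == "ROOT" else i + int(lab) for i, lab in enumerate(labels)]
```

With this encoding, `heads_to_labels([2, 2, None])` yields `["+2", "+1", "ROOT"]`, and `labels_to_heads` recovers the original head list, so a per-token classifier over such labels predicts a full parse.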
Applications
- Complex natural language can be structured as a graph of head-dependent relations.
- Information about each entity (e.g., its modifiers) can then be extracted from the graph.
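A minimal sketch of this use: build a graph from a head list and read off the words that modify one entity. The sentence and its parse are toy assumptions for illustration.

```python
from collections import defaultdict

def dependents_of(tokens, heads, target):
    """Collect the tokens whose head is the target token."""
    children = defaultdict(list)
    for i, h in enumerate(heads):
        if h is not None:
            children[h].append(tokens[i])
    return children[tokens.index(target)]

tokens = ["빠른", "작은", "여우가", "뛰었다"]   # "the quick, small fox ran"
heads  = [2, 2, 3, None]                       # both adjectives modify "여우가"
```

Here `dependents_of(tokens, heads, "여우가")` returns the entity's modifiers, `["빠른", "작은"]`.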
Single Sentence Classification Task
Determines which class a given sentence belongs to.
- Sentiment Analysis
  - Classifying a sentence as positive/negative/neutral, etc.
  - Hate speech classification
  - Corporate monitoring
- Topic Labeling
  - Classifying sentences into categories
  - Large-scale document classification
  - VoC (Voice of Customer): classifying customer feedback
- Language Detection
  - Identifying which language a sentence is in
  - Translation
  - Data filtering
- Intent Classification
  - Classifying the intent of a sentence
  - Chatbots: understanding intent to generate appropriate responses
Korean Sentence Classification Datasets
- Kor_hate
  - Hate speech data
  - Includes biased expressions, not just profanity
- Kor_sarcasm
  - Sarcasm expression data
- Kor_sae
  - Question-type data, e.g.:
    - Yes/no questions
    - Questions asking for alternative choices
    - Prohibitions, requests, commands
- Kor_3i4k
  - Intent-related data
Sentence Classification Model Architecture

Based on BERT, with a classification head attached to the [CLS] token representation for sentence classification.
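A minimal sketch of that classification head: take the hidden vector at position 0 (the [CLS] token) and apply a linear layer plus softmax. Dimensions and weights below are toy assumptions; in practice this head sits on top of BERT's final hidden states.

```python
import math

def classify_from_cls(hidden_states, W, b):
    """hidden_states: one vector per token; W: num_classes x hidden_dim."""
    cls = hidden_states[0]                           # [CLS] representation
    logits = [sum(w * x for w, x in zip(row, cls)) + bk
              for row, bk in zip(W, b)]
    m = max(logits)                                  # numerically stable softmax
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

Only the [CLS] vector feeds the classifier; the remaining token vectors are ignored by this head.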
The model accepts the standard BERT inputs:
- input_ids: token IDs of the input sequence
- attention_mask: mask of [0, 1] values distinguishing real tokens from padding
- token_type_ids: [0, 1] values distinguishing the first and second sentences
- position_ids: indices for the position embedding of each input token
- inputs_embeds: embedding vectors passed directly instead of input_ids
- labels: labels for computing the loss
- next_sentence_label: labels for the next-sentence-prediction loss (used in pretraining)
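How the first three inputs are built can be sketched for a padded sentence pair. The special-token IDs below follow BERT's usual defaults, but the word IDs are invented for illustration.

```python
def encode_pair(sent_a_ids, sent_b_ids, max_len, cls_id=101, sep_id=102, pad_id=0):
    """Build input_ids / attention_mask / token_type_ids for a sentence pair."""
    ids = [cls_id] + sent_a_ids + [sep_id] + sent_b_ids + [sep_id]
    token_type_ids = [0] * (len(sent_a_ids) + 2) + [1] * (len(sent_b_ids) + 1)
    attention_mask = [1] * len(ids)      # 1 = real token, 0 = padding
    pad = max_len - len(ids)
    return (ids + [pad_id] * pad,
            attention_mask + [0] * pad,
            token_type_ids + [0] * pad)

input_ids, attention_mask, token_type_ids = encode_pair([7, 8], [9], max_len=8)
```

The attention mask zeros out padding positions, and the token type IDs mark which sentence each token belongs to, exactly as described in the list above.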
Training Process
- Prepare dataset
- Preprocess and tokenize dataset
- Design dataloader
- Prepare train and test datasets
- Configure TrainingArguments
- Load a pretrained model
- Set up Trainer
- Train model
- Implement prediction and evaluation
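The steps above can be illustrated end to end with a toy stand-in: a bag-of-words logistic-regression classifier trained by plain gradient descent. The stdlib code below replaces the pretrained BERT, the dataloader, and the Trainer API so each step stays visible; the data and hyperparameters are made-up examples, not a real configuration.

```python
import math

# 1) Prepare dataset (toy sentiment data: 1 = positive, 0 = negative)
data = [("good great film", 1), ("bad awful film", 0),
        ("great acting", 1), ("awful plot", 0)]

# 2) Preprocess and tokenize: bag-of-words features over a small vocabulary
vocab = sorted({w for text, _ in data for w in text.split()})
def featurize(text):
    words = set(text.split())
    return [float(w in words) for w in vocab]

# 3-4) Dataloader + train set (toy: a plain list of (features, label) pairs)
train = [(featurize(text), label) for text, label in data]

# 5-7) "TrainingArguments" (lr, epochs), model weights, and the training loop
w, b = [0.0] * len(vocab), 0.0
lr, epochs = 0.5, 200
for _ in range(epochs):
    for x, y in train:
        z = sum(wi * xi for wi, xi in zip(w, x)) + b
        p = 1.0 / (1.0 + math.exp(-z))       # sigmoid
        g = p - y                            # gradient of log loss w.r.t. z
        w = [wi - lr * g * xi for wi, xi in zip(w, x)]
        b -= lr * g

# 8-9) Prediction and evaluation
def predict(text):
    z = sum(wi * xi for wi, xi in zip(w, featurize(text))) + b
    return 1 if z > 0 else 0

accuracy = sum(predict(text) == label for text, label in data) / len(data)
```

In the real pipeline, steps 5-7 would use TrainingArguments, a pretrained BERT, and Trainer from the Hugging Face ecosystem, but the sequence of steps is the same.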