Passage Retrieval
Retrieving documents (passages) that match a query. 
Database
- In practice, a DBMS could be used.
- Here, we use Wikipedia data.
Passage Retrieval with MRC
Open-domain Question Answering: finding answers to questions from a large set of documents.

Combining passage retrieval with MRC gives you open-domain question answering.
- Passage Retrieval
  - Provides contexts likely to contain the answer to the MRC model.
- MRC
  - Derives the answer from the given context.
Overview
Passage retrieval searches in an embedding space.
- Embed the query and passages into the same embedding space.
  - Passages are pre-embedded in batch for efficiency.
- Compute similarity scores between the query and passage embeddings.
  - Vector distance: shorter distance = more similar.
  - Inner product: higher value = more similar.
- Output passages ranked by similarity to the query.
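The ranking step above can be sketched with plain NumPy. The random vectors below are hypothetical stand-ins for real encoder outputs; the point is that passages are embedded once in batch, and each query only needs an inner product against the pre-computed matrix.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for encoder outputs: 1000 passages, 128-dim embeddings,
# pre-embedded in batch before any query arrives.
passage_embs = rng.normal(size=(1000, 128))

# A query whose embedding happens to land near passage 42 (toy construction).
query_emb = passage_embs[42] + 0.01 * rng.normal(size=128)

# Similarity by inner product: higher value = more similar.
scores = passage_embs @ query_emb

# Output passages ranked by similarity, best first.
ranked = np.argsort(-scores)
print(ranked[:5])
```

Swapping the inner product for negative Euclidean distance gives the distance-based ranking mentioned above.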
Passage Embedding
Like word embedding, passages are embedded as vectors. Similarity between a query and a passage is then computed via inner product or distance between their vectors in the embedding space.
Sparse Embedding
Sparse: the opposite of dense; very few entries are non-zero.
A representative example is BoW (Bag of Words). Each word constitutes its own dimension, so embedding a document requires as many dimensions as the vocabulary size. Most values are inevitably 0, with very few non-zero entries.
In BoW, increasing n in n-gram makes the number of possible combinations grow exponentially, so typically only bigrams (2-grams) are used, occasionally trigrams.
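A minimal BoW sketch in plain Python (the two-document corpus is hypothetical), showing both the sparsity of the vectors and how the dimension grows when moving from unigrams to bigrams:

```python
from collections import Counter
from itertools import chain

docs = ["the cat sat", "the dog sat on the mat"]  # hypothetical toy corpus

def ngrams(tokens, n):
    # Contiguous n-grams joined into single vocabulary terms.
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bow(docs, n=1):
    tokenized = [doc.split() for doc in docs]
    vocab = sorted(set(chain.from_iterable(ngrams(t, n) for t in tokenized)))
    index = {term: i for i, term in enumerate(vocab)}
    vectors = []
    for t in tokenized:
        vec = [0] * len(vocab)  # one dimension per vocabulary term
        for term, count in Counter(ngrams(t, n)).items():
            vec[index[term]] = count
        vectors.append(vec)
    return vocab, vectors

uni_vocab, uni_vecs = bow(docs, n=1)
bi_vocab, bi_vecs = bow(docs, n=2)
print(len(uni_vocab), len(bi_vocab))  # bigram vocabulary is already larger
```

Even on this tiny corpus, most entries of each vector are 0; on a real vocabulary the non-zero fraction becomes vanishingly small.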
Term Value
- Binary: just whether the term appears in the document or not.
- Term frequency: how many times the term appears.
  - e.g., TF-IDF
Characteristics of Sparse Embedding
- Dimension of the embedding vector = number of terms.
  - Grows as n in n-gram increases.
- Useful when you need to detect exact term overlap.
  - e.g., determining whether a term is contained in a document during search.
- Cannot compare words that are semantically similar but lexically different.
  - Their similarity comes out as exactly zero.
TF-IDF
Term Frequency - Inverse Document Frequency
- TF (Term Frequency): how often a term appears in a document.
  - Normalized as raw count / total number of words in the document.
  - Log normalization is also used.
- IDF (Inverse Document Frequency): the amount of information a term provides.
  - Frequently appearing terms are judged to provide less information.
  - Rarely appearing terms are judged to provide more information.
- IDF(t) = log(N / DF(t)), where DF(t) is the number of documents in which term t appears and N is the total number of documents.
  - Unlike TF, it depends only on the term, not the document.
  - The fewer documents term t appears in, the higher IDF(t) becomes.
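A small worked example of these definitions on a hypothetical three-document corpus, showing both TF normalizations and the fact that a rarer term gets a higher IDF:

```python
import math

# Hypothetical toy corpus.
docs = [
    "the cat sat on the mat".split(),
    "the dog chased the cat".split(),
    "a bird flew away".split(),
]
N = len(docs)

def tf(term, doc):
    # Raw count normalized by document length.
    return doc.count(term) / len(doc)

def tf_log(term, doc):
    # Log-normalized variant of the raw count.
    return math.log(1 + doc.count(term))

def idf(term):
    # IDF(t) = log(N / DF(t)); depends only on the term, not the document.
    df = sum(1 for doc in docs if term in doc)
    return math.log(N / df)

print(tf("the", docs[0]))   # 2 occurrences / 6 words
print(idf("the"))           # appears in 2 of 3 docs -> low
print(idf("bird"))          # appears in 1 of 3 docs -> higher
```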
TF-IDF
- Articles ("a", "the") have low TF-IDF.
  - TF may be high, but IDF converges to 0.
- Rare proper nouns have high TF-IDF.
  - Even if TF is low, IDF is much higher.
Computing TF-IDF
You can compute it for each term individually, or multiply tables together. 
Rows represent individual documents; columns represent the vocabulary. In this example each document contains only one vocabulary term, so all values are 0 or 1, and the table values are directly the TF values.

IDF values computed for each term. Since IDF depends only on the term, values are the same regardless of document.
TF-IDF is just TF times IDF, so element-wise multiplication of the two tables gives you the full TF-IDF.
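The table multiplication can be sketched with NumPy broadcasting. The 3×3 TF table below mirrors the one-term-per-document example (hypothetical toy values); the IDF row multiplies element-wise into every document row.

```python
import numpy as np

# TF table: rows = documents, columns = vocabulary,
# one vocabulary term per document (hypothetical toy values).
tf = np.array([
    [1, 0, 0],   # doc 0 contains only term 0
    [0, 1, 0],   # doc 1 contains only term 1
    [0, 0, 1],   # doc 2 contains only term 2
], dtype=float)

N = tf.shape[0]
df = (tf > 0).sum(axis=0)    # number of documents containing each term
idf = np.log(N / df)         # one value per term, same for every document

# Element-wise multiplication: the IDF row broadcasts over every document row.
tfidf = tf * idf
print(tfidf)
```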

Using TF-IDF
TF-IDF can be used to score the similarity between the query and passages in passage retrieval.
- Tokenize the query.
- Exclude tokens not in the vocabulary.
- Treat the query as a document and compute TF-IDF.
- Compute similarity scores with pre-computed passage TF-IDFs.
- Select the passage with the highest score.
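The retrieval steps above can be sketched end-to-end in plain Python. The passages and query are hypothetical; the query is treated as a document, out-of-vocabulary tokens are dropped, and the inner product between sparse TF-IDF vectors is taken over their shared terms.

```python
import math
from collections import Counter

# Hypothetical passage collection.
passages = [
    "the cat sat on the mat",
    "dogs chase cats in the park",
    "stock markets rallied on friday",
]

tokenized = [p.split() for p in passages]
vocab = sorted(set(t for doc in tokenized for t in doc))
N = len(passages)
idf = {t: math.log(N / sum(1 for doc in tokenized if t in doc)) for t in vocab}

def tfidf_vec(tokens):
    # Drop tokens not in the vocabulary, then weight TF by IDF.
    counts = Counter(t for t in tokens if t in vocab)
    return {t: (c / len(tokens)) * idf[t] for t, c in counts.items()}

# Passage TF-IDF vectors are pre-computed once.
passage_vecs = [tfidf_vec(doc) for doc in tokenized]

def retrieve(query):
    q = tfidf_vec(query.split())  # treat the query as a document
    # Inner product between sparse vectors via their shared keys.
    scores = [sum(q.get(t, 0.0) * v.get(t, 0.0) for t in q) for v in passage_vecs]
    return max(range(N), key=lambda i: scores[i])  # highest-scoring passage

print(retrieve("cat on the mat"))  # passage 0 is the best match
```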

BM25
Builds on TF-IDF, but also takes document length into account.
- Caps the TF value.
- Gives more weight to shorter-than-average documents when a term matches.
- An algorithm still widely used in search engines, recommendation systems, etc. 
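A minimal BM25 sketch, using the common defaults k1 = 1.5 and b = 0.75 and the Lucene-style smoothed IDF (the corpus is hypothetical). It shows both effects listed above: TF saturates instead of growing linearly, and a shorter-than-average document scores higher on the same matching term.

```python
import math

# Hypothetical toy corpus; note the very short second passage.
passages = [
    "the cat sat on the mat".split(),
    "cat".split(),
    "stock markets rallied again on friday afternoon".split(),
]
N = len(passages)
avgdl = sum(len(p) for p in passages) / N
k1, b = 1.5, 0.75  # common default parameters

def idf(term):
    # BM25's smoothed IDF (the variant used by Lucene).
    df = sum(1 for p in passages if term in p)
    return math.log((N - df + 0.5) / (df + 0.5) + 1)

def bm25(query, doc):
    score = 0.0
    for term in query:
        tf = doc.count(term)
        # The k1 term saturates (caps) the effect of raw TF; the b term
        # boosts documents shorter than the average length avgdl.
        norm = tf + k1 * (1 - b + b * len(doc) / avgdl)
        score += idf(term) * tf * (k1 + 1) / norm
    return score

scores = [bm25("cat".split(), p) for p in passages]
print(scores)  # the short passage containing "cat" scores highest
```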