Word2Vec: Why Do We Need Word Representations?

Serokell
9 min read · May 17, 2023

Word2Vec, short for “word to vector,” is a technique for representing words as numerical vectors that capture the relationships between them. It is widely used in machine learning to produce word embeddings for text analysis.
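
To make this concrete, here is a minimal sketch (not from the original article) of training Word2Vec on a tiny, made-up corpus with the gensim library. The corpus, parameter values, and variable names are illustrative assumptions only.

```python
# A minimal sketch of Word2Vec in use via the gensim library:
# it learns a vector for every word in a toy corpus.
from gensim.models import Word2Vec

# Toy corpus: a list of tokenized sentences (hypothetical example data).
sentences = [
    ["the", "king", "ruled", "the", "country"],
    ["the", "ruler", "governed", "the", "country"],
    ["the", "worker", "built", "the", "house"],
]

# vector_size: dimensionality of the embeddings; sg=1 selects the skip-gram model.
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1)

vector = model.wv["king"]                       # the learned 50-dimensional vector
similar = model.wv.most_similar("king", topn=3)  # nearest words in the vector space
print(vector.shape, similar)
```

On a real corpus with millions of sentences, the same few lines produce embeddings in which related words end up close together.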

Word2Vec was developed at Google in 2013 by a research team led by Tomas Mikolov, and Google patented the algorithm along with several follow-up improvements. It is not a single algorithm but a collection of interconnected models.

In this article, we will explore what word embeddings are and how Word2Vec generates them.

What is a word embedding?

If you ask someone which word is more similar to “king” — “ruler” or “worker” — most people would say “ruler” makes more sense, right? But how do we teach this intuition to a computer? That’s where word embeddings come in handy.

A word embedding is a representation of a word used in text analysis. It usually takes the form of a vector, which encodes the word’s meaning in such a way that words closer in the vector space are expected to be similar in meaning. Language modeling and feature learning techniques are typically used to obtain word embeddings, where words or phrases from the vocabulary are mapped to vectors of real numbers.
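
“Closeness in the vector space” is usually measured with cosine similarity. The sketch below uses small, hand-made vectors (the numbers are invented for illustration, not taken from a real model) to show how the comparison works.

```python
import numpy as np

# Toy 4-dimensional embeddings (made-up numbers, purely for illustration);
# real Word2Vec vectors typically have 100-300 dimensions.
embeddings = {
    "king":   np.array([0.90, 0.80, 0.10, 0.30]),
    "ruler":  np.array([0.85, 0.75, 0.20, 0.25]),
    "worker": np.array([0.10, 0.20, 0.90, 0.70]),
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: close to 1.0 means similar direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity(embeddings["king"], embeddings["ruler"]))   # high: similar meaning
print(cosine_similarity(embeddings["king"], embeddings["worker"]))  # low: unrelated meaning
```

A trained Word2Vec model produces exactly this kind of vector, only learned automatically from large amounts of text rather than written by hand.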

