Tokenization is the process of breaking a sentence down into individual words or smaller parts, called tokens. Think of it as cutting a sentence into pieces, separating the words so each one can be understood on its own.
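For instance, here is a minimal Python sketch, assuming simple whitespace splitting (real tokenizers also handle punctuation and subword pieces):

sentence = "I love apples"
tokens = sentence.split()  # split on spaces: ['I', 'love', 'apples']
print(tokens)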
Vectorization is the process of turning those tokens into numbers so a computer can work with them. Each word gets its own number, or a list of numbers (called a vector), that helps the computer represent the meaning of the word.
Sentence: "I love apples"
Tokenization:
["I", "love", "apples"]
Vectorization:
"I"      → [0.1, 0.2, 0.3]
"love"   → [0.4, 0.5, 0.6]
"apples" → [0.7, 0.8, 0.9]
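Here is a minimal Python sketch of that lookup step, assuming a hypothetical hand-built embedding table with the toy numbers above (real systems learn these vectors from data):

embeddings = {
    "I":      [0.1, 0.2, 0.3],
    "love":   [0.4, 0.5, 0.6],
    "apples": [0.7, 0.8, 0.9],
}
# One vector per token, in sentence order
vectors = [embeddings[token] for token in "I love apples".split()]
print(vectors)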
Using in Search:
When you search, the query is tokenized and vectorized in the same way, and the computer compares the query's vectors with the vectors of stored text; the closer two vectors are, the more similar their meaning. This is how tokenization and vectorization help computers understand text and match similar content!
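As a rough sketch of that matching step, here is a small Python example that compares a query vector to stored vectors with cosine similarity (the words and numbers are made up for illustration):

import math

def cosine_similarity(a, b):
    # Higher score means the vectors point in a more similar direction
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

query_vector = [0.1, 0.2, 0.3]  # vector for the search query
stored = {
    "apples":  [0.7, 0.8, 0.9],
    "oranges": [0.65, 0.85, 0.95],
}
for word, vector in stored.items():
    print(word, round(cosine_similarity(query_vector, vector), 3))

The stored item whose vector scores highest is returned as the closest match.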