Commit 41a1150

vector search in nodejs tutorial - added KNN query
1 parent 5961ba1 commit 41a1150

2 files changed

+295
-9
lines changed


docs/howtos/solutions/vector/getting-started-vector/index-getting-started-vector.mdx

Lines changed: 295 additions & 9 deletions
@@ -46,6 +46,12 @@ In a more complex scenario, like natural language processing (NLP), words or ent
4646
Vector similarity is a measure that quantifies how alike two vectors are, typically by evaluating the `distance` or `angle` between them in a multi-dimensional space.
4747
When vectors represent data points, such as texts or images, the similarity score can indicate how similar the underlying data points are in terms of their features or content.
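
As a rough illustration of the two ideas mentioned above (distance and angle), the following sketch computes Euclidean distance and cosine similarity for plain number arrays. The function names are illustrative assumptions and are not taken from the tutorial's source code.

```ts
// Illustrative helpers (hypothetical names, not from the tutorial repository).

// Euclidean (L2) distance: smaller values mean the vectors are closer.
function euclideanDistance(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error('vectors must have the same dimension');
  }
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const diff = a[i] - b[i];
    sum += diff * diff;
  }
  return Math.sqrt(sum);
}

// Cosine similarity: 1 means same direction, 0 means orthogonal.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error('vectors must have the same dimension');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    throw new Error('cosine similarity is undefined for a zero vector');
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

console.log(euclideanDistance([0, 0], [3, 4])); // 5
console.log(cosineSimilarity([1, 0], [0, 1])); // 0
```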
4848

49+
### Use cases for vector similarity:
50+
51+
- **Recommendation Systems**: If you have vectors representing user preferences or item profiles, you can quickly find items that are most similar to a user's preference vector.
52+
- **Image Search**: Store vectors representing image features, and then retrieve images most similar to a given image's vector.
53+
- **Textual Content Retrieval**: Store vectors representing textual content (e.g., articles, product descriptions) and find the most relevant texts for a given query vector.
54+
4955
## How to calculate vector similarity?
5056

5157
There are several ways to calculate vector similarity, but some of the most common methods include:
@@ -266,7 +272,7 @@ const imageEmbeddings = await generateImageEmbeddings('images/11001.jpg');
266272
console.log(imageEmbeddings);
267273
/*
268274
1024 dim vector output
269-
embeddings = [
275+
imageEmbeddings = [
270276
0.013823275454342365, 0.33256298303604126, 0,
271277
2.2764432430267334, 0.14010703563690186, 0.972867488861084,
272278
1.2307443618774414, 2.254523992538452, 0.44696325063705444,
@@ -280,15 +286,13 @@ console.log(imageEmbeddings);
280286
*/
281287
```
282288
283-
## Querying vectors with Redis
289+
## Database setup
284290
285-
### Sample JSON
291+
### Sample data seeding
286292
287-
consider below sample `products` JSON for our vector generation demonstration.
293+
Let's assume a simplified e-commerce scenario. Consider the `products` JSON below, which is used to demonstrate vector search throughout this tutorial.
288294
289-
**give local links to images**
290-
291-
```js
295+
```js title="src/data.ts"
292296
const products = [
293297
{
294298
_id: '1',
@@ -333,6 +337,288 @@ const products = [
333337
];
334338
```
335339
336-
### Vector KNN Query
340+
:::tip GITHUB CODE
341+
342+
Below is the command to clone the source code used in this tutorial:
343+
344+
git clone https://github.com/redis-developer/redis-vector-nodejs-solutions.git
345+
:::
346+
347+
The sample code below adds the `products` data to Redis as JSON, along with vector embeddings of each product's description and image.
348+
349+
```js title="src/index.ts"
350+
async function addProductWithEmbeddings(_products) {
351+
const nodeRedisClient = await getNodeRedisClient();
352+
353+
if (_products && _products.length) {
354+
for (let product of _products) {
355+
console.log(
356+
`generating description embeddings for product ${product._id}`,
357+
);
358+
const sentenceEmbedding = await generateSentenceEmbeddings(
359+
product.productDescription,
360+
);
361+
product['productDescriptionEmbeddings'] = sentenceEmbedding;
362+
363+
console.log(`generating image embeddings for product ${product._id}`);
364+
const imageEmbedding = await generateImageEmbeddings(product.imageURL);
365+
product['productImageEmbeddings'] = imageEmbedding;
366+
367+
await nodeRedisClient.json.set(`products:${product._id}`, '$', {
368+
...product,
369+
});
370+
console.log(`product ${product._id} added to redis`);
371+
}
372+
}
373+
}
374+
```
375+
376+
Data view in RedisInsight
377+
378+
![products data in RedisInsight](./images/products-data-gui.png)
379+
380+
:::tip
381+
Download <u>[RedisInsight](https://redis.com/redis-enterprise/redis-insight/)</u> to view your Redis data or to play with raw Redis commands in the workbench. Learn more about <u>[RedisInsight in tutorials](/explore/redisinsight/)</u>.
382+
:::
383+
384+
### Create vector index
385+
386+
The implementation below indexes several field types in Redis, including the vector fields `productDescriptionEmbeddings` and `productImageEmbeddings`.
387+
388+
```ts title="src/redis-index.ts"
389+
import {
390+
createClient,
391+
SchemaFieldTypes,
392+
VectorAlgorithms,
393+
RediSearchSchema,
394+
} from 'redis';
395+
396+
const PRODUCTS_KEY_PREFIX = 'products';
397+
const PRODUCTS_INDEX_KEY = 'idx:products';
398+
const REDIS_URI = 'redis://localhost:6379';
399+
let nodeRedisClient = null;
400+
401+
const getNodeRedisClient = async () => {
402+
if (!nodeRedisClient) {
403+
nodeRedisClient = createClient({ url: REDIS_URI });
404+
await nodeRedisClient.connect();
405+
}
406+
return nodeRedisClient;
407+
};
408+
409+
const createRedisIndex = async () => {
410+
/* (RAW COMMAND)
411+
FT.CREATE idx:products
412+
ON JSON
413+
PREFIX 1 "products:"
414+
SCHEMA
415+
"$.productDisplayName" as productDisplayName TEXT NOSTEM SORTABLE
416+
"$.brandName" as brandName TEXT NOSTEM SORTABLE
417+
"$.price" as price NUMERIC SORTABLE
418+
"$.masterCategory" as "masterCategory" TAG
419+
"$.subCategory" as subCategory TAG
420+
"$.productDescriptionEmbeddings" as productDescriptionEmbeddings VECTOR "FLAT" 10
421+
"TYPE" FLOAT32
422+
"DIM" 768
423+
"DISTANCE_METRIC" "L2"
424+
"INITIAL_CAP" 111
425+
"BLOCK_SIZE" 111
426+
"$.productDescription" as productDescription TEXT NOSTEM SORTABLE
427+
"$.imageURL" as imageURL TEXT NOSTEM
428+
"$.productImageEmbeddings" as productImageEmbeddings VECTOR "HNSW" 8
429+
"TYPE" FLOAT32
430+
"DIM" 1024
431+
"DISTANCE_METRIC" "COSINE"
432+
"INITIAL_CAP" 111
433+
434+
*/
435+
const nodeRedisClient = await getNodeRedisClient();
436+
437+
const schema: RediSearchSchema = {
438+
'$.productDisplayName': {
439+
type: SchemaFieldTypes.TEXT,
440+
NOSTEM: true,
441+
SORTABLE: true,
442+
AS: 'productDisplayName',
443+
},
444+
'$.brandName': {
445+
type: SchemaFieldTypes.TEXT,
446+
NOSTEM: true,
447+
SORTABLE: true,
448+
AS: 'brandName',
449+
},
450+
'$.price': {
451+
type: SchemaFieldTypes.NUMERIC,
452+
SORTABLE: true,
453+
AS: 'price',
454+
},
455+
'$.masterCategory': {
456+
type: SchemaFieldTypes.TAG,
457+
AS: 'masterCategory',
458+
},
459+
'$.subCategory': {
460+
type: SchemaFieldTypes.TAG,
461+
AS: 'subCategory',
462+
},
463+
'$.productDescriptionEmbeddings': {
464+
type: SchemaFieldTypes.VECTOR,
465+
TYPE: 'FLOAT32',
466+
ALGORITHM: VectorAlgorithms.FLAT,
467+
DIM: 768,
468+
DISTANCE_METRIC: 'L2',
469+
INITIAL_CAP: 111,
470+
BLOCK_SIZE: 111,
471+
AS: 'productDescriptionEmbeddings',
472+
},
473+
'$.productDescription': {
474+
type: SchemaFieldTypes.TEXT,
475+
NOSTEM: true,
476+
SORTABLE: true,
477+
AS: 'productDescription',
478+
},
479+
'$.imageURL': {
480+
type: SchemaFieldTypes.TEXT,
481+
NOSTEM: true,
482+
AS: 'imageURL',
483+
},
484+
'$.productImageEmbeddings': {
485+
type: SchemaFieldTypes.VECTOR,
486+
TYPE: 'FLOAT32',
487+
ALGORITHM: VectorAlgorithms.HNSW, //Hierarchical Navigable Small World graphs
488+
DIM: 1024,
489+
DISTANCE_METRIC: 'COSINE',
490+
INITIAL_CAP: 111,
491+
AS: 'productImageEmbeddings',
492+
},
493+
};
494+
console.log(`creating index ${PRODUCTS_INDEX_KEY}`);
495+
496+
try {
497+
await nodeRedisClient.ft.dropIndex(PRODUCTS_INDEX_KEY);
498+
} catch (indexErr) {
499+
console.error(indexErr);
500+
}
501+
await nodeRedisClient.ft.create(PRODUCTS_INDEX_KEY, schema, {
502+
ON: 'JSON',
503+
PREFIX: PRODUCTS_KEY_PREFIX,
504+
});
505+
};
506+
```
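
The `DIM` values in the schema above (768 for description embeddings, 1024 for image embeddings) are expected to match the length of the vectors that get stored. As a hedged sketch, a small guard like the following could be called before writing a product to Redis. `EXPECTED_DIMS` and `assertEmbeddingDims` are hypothetical names introduced here and are not part of the tutorial's code.

```ts
// Hypothetical guard; the dimensions mirror the DIM values in the schema above.
const EXPECTED_DIMS: Record<string, number> = {
  productDescriptionEmbeddings: 768,
  productImageEmbeddings: 1024,
};

function assertEmbeddingDims(product: Record<string, unknown>): void {
  for (const field of Object.keys(EXPECTED_DIMS)) {
    const expected = EXPECTED_DIMS[field];
    const value = product[field];
    if (!Array.isArray(value) || value.length !== expected) {
      const actual = Array.isArray(value) ? String(value.length) : 'missing';
      throw new Error(`${field}: expected ${expected} dimensions, got ${actual}`);
    }
  }
}

// Passes silently when both embeddings have the expected length.
assertEmbeddingDims({
  productDescriptionEmbeddings: new Array(768).fill(0),
  productImageEmbeddings: new Array(1024).fill(0),
});
```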
507+
508+
:::note FLAT VS HNSW indexing
509+
FLAT: When vectors are indexed with the FLAT algorithm, they are stored as-is, without any additional structure or hierarchy. A query against a FLAT index performs a linear scan over all vectors to find the most similar ones. This gives exact results but is slower and more compute-intensive, so it is best suited to smaller datasets.
510+
511+
HNSW (Hierarchical Navigable Small World):
512+
HNSW is a graph-based method for indexing high-dimensional data. For bigger datasets, comparing the query with every single vector in the index becomes slow, so the approximate HNSW approach returns results much faster, at the cost of some accuracy.
513+
:::
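
To make the FLAT idea concrete, here is a minimal sketch of an exhaustive (linear) scan in plain TypeScript. It only illustrates the concept and is not how Redis implements its FLAT index; the `bruteForceKnn` name and the tiny 2-dimensional vectors are made up for the example.

```ts
// Conceptual sketch of an exhaustive nearest-neighbour scan (not Redis internals).
type StoredVector = { id: string; vector: number[] };

// Squared Euclidean distance; enough for ranking because it preserves order.
function squaredL2(a: number[], b: number[]): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const diff = a[i] - b[i];
    sum += diff * diff;
  }
  return sum;
}

// Scores every stored vector against the query, then keeps the k closest.
function bruteForceKnn(items: StoredVector[], query: number[], k: number) {
  return items
    .map((item) => ({ id: item.id, score: squaredL2(item.vector, query) }))
    .sort((x, y) => x.score - y.score)
    .slice(0, k);
}

const items: StoredVector[] = [
  { id: 'a', vector: [0, 0] },
  { id: 'b', vector: [1, 1] },
  { id: 'c', vector: [5, 5] },
];

// Nearest first: b, then a.
console.log(bruteForceKnn(items, [1, 2], 2).map((r) => r.id));
```

Every query touches every stored vector, so the cost grows linearly with the dataset size. That is the trade-off the HNSW option is meant to avoid.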
514+
515+
## What is vector search by KNN?
516+
517+
KNN, or k-Nearest Neighbors, is an algorithm used in both classification and regression tasks. "KNN search", however, usually refers to the task of finding the "k" points in a dataset that are closest (most similar) to a given query point. In the context of vector search, this means identifying the "k" vectors in the database that are most similar to a given query vector, based on a distance metric such as cosine distance or Euclidean distance.
518+
519+
Redis provides support for vector search, allowing you to index and then search for vectors [using the KNN approach](https://redis.io/docs/stack/search/reference/vectors/#pure-knn-queries).
520+
521+
### Vector KNN query with Redis
522+
523+
```ts title="src/knn-query.ts"
524+
const float32Buffer = (arr) => {
525+
const floatArray = new Float32Array(arr);
526+
const float32Buffer = Buffer.from(floatArray.buffer);
527+
return float32Buffer;
528+
};
529+
const queryProductDescriptionEmbeddingsByKNN = async (
530+
_searchTxt,
531+
_resultCount,
532+
) => {
533+
// A KNN query returns the top k documents whose vectors are closest to the query vector.
534+
535+
/* sample raw query
536+
537+
FT.SEARCH idx:products
538+
"*=>[KNN 5 @productDescriptionEmbeddings $searchBlob AS score]"
539+
RETURN 4 score brandName productDisplayName imageURL
540+
SORTBY score
541+
PARAMS 2 searchBlob "6\xf7\..."
542+
DIALECT 2
543+
544+
*/
545+
//https://redis.io/docs/interact/search-and-query/query/
546+
547+
console.log(`queryProductDescriptionEmbeddingsByKNN started`);
548+
let results = {};
549+
if (_searchTxt) {
550+
_resultCount = _resultCount ?? 5;
551+
552+
const nodeRedisClient = await getNodeRedisClient();
553+
const searchTxtVectorArr = await generateSentenceEmbeddings(_searchTxt);
554+
555+
const searchQuery = `*=>[KNN ${_resultCount} @productDescriptionEmbeddings $searchBlob AS score]`;
556+
557+
results = await nodeRedisClient.ft.search(PRODUCTS_INDEX_KEY, searchQuery, {
558+
PARAMS: {
559+
searchBlob: float32Buffer(searchTxtVectorArr),
560+
},
561+
RETURN: ['score', 'brandName', 'productDisplayName', 'imageURL'],
562+
SORTBY: {
563+
BY: 'score',
564+
// DIRECTION: "DESC"
565+
},
566+
DIALECT: 2,
567+
});
568+
} else {
569+
throw new Error('Search text cannot be empty');
570+
}
571+
572+
return results;
573+
};
574+
```
575+
576+
```js title="sample output"
577+
const result = await queryProductDescriptionEmbeddingsByKNN(
578+
'Puma watch with cat',
579+
3,
580+
);
581+
console.log(JSON.stringify(result, null, 4));
582+
583+
/*
584+
(Lower score/distance indicates higher similarity)
585+
{
586+
"total": 3,
587+
"documents": [
588+
{
589+
"id": "products:1",
590+
"value": {
591+
"score": "0.762174725533",
592+
"brandName": "Puma",
593+
"productDisplayName": "Puma Men Race Black Watch",
594+
"imageURL": "images/11002.jpg"
595+
}
596+
},
597+
{
598+
"id": "products:2",
599+
"value": {
600+
"score": "0.825711071491",
601+
"brandName": "Puma",
602+
"productDisplayName": "Puma Men Top Fluctuation Red Black Watches",
603+
"imageURL": "images/11001.jpg"
604+
}
605+
},
606+
{
607+
"id": "products:3",
608+
"value": {
609+
"score": "1.79949247837",
610+
"brandName": "Inkfruit",
611+
"productDisplayName": "Inkfruit Women Behind Cream Tshirts",
612+
"imageURL": "images/11008.jpg"
613+
}
614+
}
615+
]
616+
}
617+
*/
618+
```
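
Note that `score` comes back as a string in the output above. A small, hypothetical helper like the one below could convert the result into a typed, ranked list. The `KnnSearchResult` type simply mirrors the shape of the sample output and is an assumption, not a type exported by node-redis.

```ts
// Shape assumed from the sample output above (not an official node-redis type).
type KnnSearchResult = {
  total: number;
  documents: {
    id: string;
    value: { score: string; productDisplayName: string };
  }[];
};

type RankedProduct = { id: string; name: string; distance: number };

function toRankedProducts(result: KnnSearchResult): RankedProduct[] {
  return result.documents
    .map((doc) => ({
      id: doc.id,
      name: doc.value.productDisplayName,
      distance: Number(doc.value.score), // scores are returned as strings
    }))
    .sort((a, b) => a.distance - b.distance); // lower distance = more similar
}

const sample: KnnSearchResult = {
  total: 2,
  documents: [
    {
      id: 'products:2',
      value: {
        score: '0.825711071491',
        productDisplayName: 'Puma Men Top Fluctuation Red Black Watches',
      },
    },
    {
      id: 'products:1',
      value: {
        score: '0.762174725533',
        productDisplayName: 'Puma Men Race Black Watch',
      },
    },
  ],
};

// Most similar first: products:1, then products:2.
console.log(toRankedProducts(sample).map((p) => p.id));
```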
619+
620+
- [hybrid-knn-queries](https://redis.io/docs/interact/search-and-query/search/vectors/#hybrid-knn-queries)
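
The link above describes hybrid queries, which combine a regular filter with KNN. As a hedged sketch, the query string could be built as shown below. `buildKnnQuery` is a hypothetical helper, and the filter `@brandName:Puma` assumes the `brandName` field indexed earlier. The resulting string would be passed to `ft.search` with the same `PARAMS` and `DIALECT: 2` options used in the KNN example.

```ts
// Hypothetical helper: builds a pure ("*") or hybrid (filtered) KNN query string.
function buildKnnQuery(k: number, vectorField: string, filter = '*'): string {
  if (!Number.isInteger(k) || k <= 0) {
    throw new Error('k must be a positive integer');
  }
  // A filter expression is wrapped in parentheses; "*" means no pre-filter.
  const prefix = filter === '*' ? '*' : `(${filter})`;
  return `${prefix}=>[KNN ${k} @${vectorField} $searchBlob AS score]`;
}

console.log(buildKnnQuery(3, 'productDescriptionEmbeddings'));
// *=>[KNN 3 @productDescriptionEmbeddings $searchBlob AS score]

console.log(buildKnnQuery(3, 'productDescriptionEmbeddings', '@brandName:Puma'));
// (@brandName:Puma)=>[KNN 3 @productDescriptionEmbeddings $searchBlob AS score]
```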
621+
622+
## What is vector search by range?
337623
338-
### Vector Range Query
624+
### Vector range query with Redis
