tfidf

find_similarities(tfidf_matrix, index, top_n=5)

Find the dishes most similar to a given dish, ranked by the cosine similarity of their tf-idf vectors.

Parameters:

Name          Type           Description                                                  Default
tfidf_matrix  sparse matrix  Tf-idf-weighted document-term matrix.                        required
index         int            Dish id from dataset, used as the row index of the query.    required
top_n         int            Maximum number of recommendations (documented maximum: 30).  5

Returns:

Type                                Description
list of (index, similarity) tuples  Dish indices and their similarity values, most similar first.

Source code in koolsla/tfidf.py
from typing import List, Tuple

from scipy.sparse import spmatrix
from sklearn.metrics.pairwise import linear_kernel


def find_similarities(tfidf_matrix: spmatrix, index: int, top_n: int = 5) -> List[Tuple[int, float]]:
    """Find the dishes most similar to a given dish.

    Args:
      tfidf_matrix (sparse matrix, [n_samples, n_features]): Tf-idf-weighted document-term matrix.
      index (int): Dish id from dataset, used as the row index of the query dish.
      top_n (int): Maximum number of recommendations (documented maximum: 30).
    Returns:
      list of (index, similarity) tuples: Dish indices and similarity values, most similar first.
    """

    # Similarity between the query row and every row of the matrix
    cosine_similarities = linear_kernel(tfidf_matrix[index:index+1], tfidf_matrix).flatten()
    # Indices sorted by descending similarity, excluding the query dish itself
    related_docs_indices = [i for i in cosine_similarities.argsort()[::-1] if i != index]
    # Return the top_n (dish index, similarity value) pairs
    return [(i, cosine_similarities[i]) for i in related_docs_indices[:top_n]]
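
The source above calls `linear_kernel` (a plain dot product) yet names the result cosine similarities. That works because `TfidfVectorizer` L2-normalizes each row by default, so the dot product of two rows equals their cosine similarity. The following is a minimal pure-Python sketch of that idea, not the library's implementation; the helper names `l2_normalize`, `dot`, and `rank_similar` and the toy vectors are assumptions made for illustration.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def dot(a, b):
    """Plain dot product; on unit-length vectors this is the cosine similarity."""
    return sum(x * y for x, y in zip(a, b))


def rank_similar(rows, index, top_n=5):
    """Hypothetical stand-in for find_similarities on dense Python lists."""
    scores = [dot(rows[index], row) for row in rows]
    # Sort all other rows by descending score, skipping the query row itself
    order = sorted(
        (i for i in range(len(rows)) if i != index),
        key=lambda i: scores[i],
        reverse=True,
    )
    return [(i, scores[i]) for i in order[:top_n]]


# Toy vectors (made up): row 1 points the same way as row 0, row 2 is orthogonal
raw = [[1.0, 2.0, 0.0], [2.0, 4.0, 0.0], [0.0, 0.0, 3.0], [1.0, 0.0, 1.0]]
rows = [l2_normalize(r) for r in raw]
ranked = rank_similar(rows, index=0, top_n=2)
```

Here `ranked` lists row 1 first (similarity of about 1.0, since it is a scaled copy of row 0) and row 3 second. The order of rows with equal scores may differ from the `argsort`-based version in the source.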

train_engine(plots)

Fit a tf-idf vectorizer on the dish names and return the resulting document-term matrix.

Parameters:

Name   Type  Description          Default
plots  List  List of dish names.  required

Returns:

Type                                                   Description
tfidf_matrix (sparse matrix, [n_samples, n_features])  Tf-idf-weighted document-term matrix.

Source code in koolsla/tfidf.py
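from typing import List

from scipy.sparse import spmatrix
from sklearn.feature_extraction.text import TfidfVectorizer


def train_engine(plots: List[str]) -> spmatrix:
    """Build a tf-idf matrix from dish names.

    Args:
      plots (list of str): List of dish names.
    Returns:
      tfidf_matrix (sparse matrix, [n_samples, n_features]): Tf-idf-weighted document-term matrix.
    """

    # Initialize the tf-idf vectorizer: unigrams and bigrams, English stop words removed,
    # terms in fewer than 3 documents or in more than 90% of documents dropped
    vectorizer = TfidfVectorizer(
        analyzer='word',
        lowercase=True,
        min_df=3,
        max_df=0.9,
        ngram_range=(1, 2),
        stop_words='english')
    # Fit the vectorizer on the corpus and transform it into a tf-idf matrix
    tfidf_matrix = vectorizer.fit_transform(plots)
    # Return the tf-idf matrix
    return tfidf_matrix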
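
A short end-to-end sketch of how the two functions fit together, assuming scikit-learn is installed. It repeats the same calls inline rather than importing `koolsla`, and the `dish_names` corpus is invented for illustration. Because `min_df=3` drops every term that appears in fewer than three documents, a very small corpus can leave the vocabulary empty; this toy corpus is chosen so that three terms survive.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel

# Made-up corpus: "chicken", "tomato" and "soup" each occur in at least 3 names
dish_names = [
    "grilled chicken salad",
    "chicken caesar salad",
    "spicy chicken wings",
    "tomato basil soup",
    "creamy tomato soup",
    "chicken tomato soup",
]

# Same settings as train_engine
vectorizer = TfidfVectorizer(
    analyzer="word",
    lowercase=True,
    min_df=3,
    max_df=0.9,
    ngram_range=(1, 2),
    stop_words="english",
)
tfidf_matrix = vectorizer.fit_transform(dish_names)

# Same steps as find_similarities, querying "tomato basil soup" (row 3)
index = 3
top_n = 2
scores = linear_kernel(tfidf_matrix[index:index + 1], tfidf_matrix).flatten()
related = [int(i) for i in scores.argsort()[::-1] if i != index]
results = [(i, float(scores[i])) for i in related][:top_n]

for dish_index, similarity in results:
    print(dish_names[dish_index], round(similarity, 3))
```

With this corpus the closest match to "tomato basil soup" is "creamy tomato soup": after filtering, both names reduce to the same two terms, so their similarity is about 1.0. "chicken tomato soup" comes second with a lower score.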