How to build a simple recommendation engine?

By Prajwal Haniya

Techletter #94 | October 12, 2024

This week let’s build a simple recommendation engine. Recommendation engines are really complex pieces of software. So, in this letter, I try to build a simple one based on the book Guide 2 Data Mining.

As more and more data get’s generated, it would be very difficult for people to choose among thousands of products or would be difficult to watch a video or listen to a song that is perfect for you without a good recommendation engine.

I personally use YouTube Premium, and I have experienced better video suggestions than just using without a premium.

Collaborative filtering

It’s called collaborative because it makes recommendations based on other people in effect, people collaborate to come up with recommendations.

Suppose the task is to recommend a book to you. I search among other users of the site to find one that is similar to you in the books a person enjoys. Once I find that similar person I can see what that person likes and recommend those books to you. So now, how do we find the other person who has a similar interest just like you?

Here’s a simple 2D (dimensional) explanation. Suppose users rate books on a 5-star system zero stars mean the book is terrible, and 5 stars mean the book is great. Because I said we are looking at the simple 2D case, we restrict our ratings to two books: Neal Stephenson’s Snow Crash and Steig Larsson’s The Girl with the Dragon Tattoo.

Now, I would like to recommend a book to the mysterious Ms. X who rated Snow Crash 4 stars, and The Girl with the Dragon Tattoo 2 stars. The first task is to find the person who is most similar, or closest, to Ms. X. I do this by computing distance.

Manhattan distance:

The easiest distance measure to compute is what is called Manhattan Distance or cab driver distance. In the 2D case, each person is represented by an (x, y) point. I will add a subscript to the x and y to refer to different people. So (x1, y1) might be Amy, and (x2, y2) might be the elusive Ms. X. Manhattan Distance is then calculated by

|x1 - x2| + | y1 - y2 |

The absolute value of the difference between the x values plus the absolute value of the difference between the y values).

Now let’s calculate the distance for Amy:

|5 - 4| + |5 - 2| => 4

So the Manhattan Distance for Amy and Ms. X is 4.

Below is the Python code to calculate the Manhattan distance:

def manhattan_distance(rating_1, rating_2):
    distance = 0
    for key in rating_1:
        if key in rating_2:
            distance += abs(rating_1[key] - rating_2[key])
    return distance

result = manhattan_distance(users["Hailey"], users["Veronica"])
print(result)

Now, let’s compute the nearest neighbor

def compute_nearest_neighbor(username, users):
    distances = []
    for user in users:
        if user != username:
            distance = manhattan_distance(users[user], users[username])
            distances.append((distance, user))
    distances.sort()
    return distances

# user to whom you need to compute the nearest neighbor
nearest_neighbors = compute_nearest_neighbor("Hailey", users)

In case if you want to look at the complete code then you can look at this repo: gitHub.com/prajwalhaniya/_archive

Recommend after sorting

def recommend(username, users):
    nearest_neighbor = compute_nearest_neighbor(username, users)[0][1]
    recommendations = []

    neighbor_ratings = users[nearest_neighbor]
    user_ratings = users[username]

    for artist in neighbor_ratings:
        if not artist in user_ratings:
            recommendations.append((artist, neighbor_ratings[artist]))

    return sorted(recommendations, key = lambda artist_tuple: artist_tuple[1], reverse = True)

recommendations = recommend("Hailey", users)