Chapter 8 - Collaborative Filtering Deep Dive
Deep Learning for Coders with fastai & PyTorch - Collaborative Filtering Deep Dive. Recommender systems work differently than classic DL classifiers. They are mostly used on known data; no prediction is expected for previously unseen inputs the way the bear classifier handles new images. Yes, there is still a generalization process, but all users and items are already known to the model. What is not known at the beginning of training are the latent factors. The model learns these latent factors, and then the recommender is ready.
- Exploring the data
- How model learn about our preferences
- How to create Dataloaders
- More PyTorch Less Python
- Collaborative Filtering from Scratch
- Weight Decay (or L2 regularization)
- Interpreting Embeddings and Biases
- Bootstrapping a Collaborative Filtering Model
- Deep Learning for Collaborative Filtering
This is my daughter at the IKEA very close to our home.
import fastbook
fastbook.setup_book()
from fastbook import *
%config Completer.use_jedi = False
Collaborative filtering modules:
from fastai.collab import *
from fastai.tabular.all import *
Downloading and extracting data from the URL list
path = untar_data(URLs.ML_100k)
Giving the columns names and reading the first five rows.
ratings = pd.read_csv(path/'u.data', delimiter='\t', header=None, engine='python', names=['user','movie','rating','timestamp'])
ratings.head()
How do we recommend movies? Assume each movie has three properties: science fiction(ness), action(ness), and old(ness).
The Last Skywalker is sci-fi, action, and not old:
last_skywalker = np.array([0.98,0.9,-0.9])
And a user who likes sci-fi and action movies, but not so much old movies, could be represented like this:
user1 = np.array([.9,.8,-.6])
If we multiply these two vectors elementwise and sum the result, we get:
(user1*last_skywalker).sum()
This is our matching score; the high positive value shows there is a match between the movie and user1.
Casablanca, on the other hand, is old and is neither sci-fi nor action:
casablanka = np.array([-.99,-.33,.8])
(user1*casablanka).sum()
This time the score is low. There is no match.
We can pick an arbitrary number of parameters for these arrays. Above, we used three; there could be many more. We call them latent factors. We start training with random parameters and learn them from the ratings given by users.
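As a tiny illustration of that idea, here is a minimal sketch (my own toy example with made-up numbers, not from the book): start from random latent factors for one user and one movie, and nudge them by gradient descent so that their dot product approaches a known rating.
# Toy sketch (hypothetical values): learn latent factors from a single known rating.
n_factors = 3
user = torch.randn(n_factors, requires_grad=True)
movie = torch.randn(n_factors, requires_grad=True)
target = torch.tensor(4.0)            # the rating we want the dot product to match
for _ in range(100):
    pred = (user * movie).sum()       # the same matching score as above
    loss = (pred - target)**2
    loss.backward()
    with torch.no_grad():
        user -= 0.05 * user.grad; movie -= 0.05 * movie.grad
        user.grad.zero_(); movie.grad.zero_()
pred                                  # should now be close to 4.0
Real collaborative filtering does this for every user and every movie at once, using all the known ratings.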
movies = pd.read_csv(path/'u.item', delimiter='|', engine='python', header=None, encoding='latin1', usecols=(0,1), names=('movie','title'))
movies.head()
Let's bring ratings and movies together (the movie id will be the merge key).
ratings = ratings.merge(movies)
ratings.head()
For the DataLoaders, we use CollabDataLoaders. This DataLoader uses the first column for the user and the second one for the item; in our situation we override that default because our item will be the title.
dls = CollabDataLoaders.from_df(ratings, item_name='title', bs=64)
dls.show_batch()
dls.classes['user'][:15]
dls.classes['title'][:15]
n_users = len(dls.classes['user'])
n_movies = len(dls.classes['title'])
n_factors = 5
user_factors = torch.randn(n_users,n_factors)
movie_factors = torch.randn(n_movies, n_factors)
one_hot(0,5)
one_hot_3 = one_hot(3,n_users).float()
one_hot_3[:10]
Now multiply it by user_factors (a matrix multiplication):
user_factors.t() @ one_hot_3
This might look a bit daunting, but it is not. Basically, we want to use PyTorch more and Python less. PyTorch is very good at matrix multiplication; plain Python is not. With this matrix multiplication we can access the right row of the latent factor tensor in one move. Otherwise we would have to use a regular Python loop and indexing, which is very, very slow.
This is the plain Python indexing version:
user_factors[3]
It is the same. Great.
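A quick check (mine, not from the book) that the matrix multiplication and the direct indexing really return the same vector:
# The one-hot matrix product and plain indexing give identical results.
torch.allclose(user_factors.t() @ one_hot_3, user_factors[3])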
At this point there is a section on OOP; if you want to learn OOP, check the original book at page 260 (3rd release) or the course notebook.
class DotProduct(Module):
    def __init__(self, n_users, n_movies, n_factors):
        self.user_factors = Embedding(n_users, n_factors)
        self.movie_factors = Embedding(n_movies, n_factors)

    def forward(self, x):
        users = self.user_factors(x[:,0])
        movies = self.movie_factors(x[:,1])
        return (users * movies).sum(dim=1)
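A quick aside (my own check, not from the book): an Embedding layer is essentially that indexed lookup, so calling it with some indices just returns the corresponding rows of its weight matrix.
# Calling an embedding with indices is the same as indexing its weight matrix.
emb = Embedding(n_users, 5)
idx = tensor([3])
torch.allclose(emb(idx), emb.weight[idx])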
x is the merged DataFrame from above (it became part of the dls), so the first column is the user id and the second is the movie id. Check this part:
ratings = ratings.merge(movies)
ratings.head()
x,y = dls.one_batch()
x.shape
x[0]
The first value is the user id and the second is the movie id.
y[0]
This must be the rating.
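To double-check, here is a small sketch (not in the book) that maps the encoded ids of the first batch row back to the human-readable user and title:
# Decode the first row of the batch back into the raw user id and movie title.
user_id, title_id = x[0]
dls.classes['user'][int(user_id)], dls.classes['title'][int(title_id)]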
model = DotProduct(n_users, n_movies, 50)
learn = Learner(dls, model, loss_func=MSELossFlat())
learn.fit_one_cycle(5, 5e-3)
Not bad, but we can force our model to make predictions in the 0-5 rating range (actually 0-5.5, so the top rating is still reachable after the sigmoid).
class DotProduct(Module):
    def __init__(self, n_users, n_movies, n_factors, y_range=(0,5.5)):
        self.user_factors = Embedding(n_users, n_factors)
        self.movie_factors = Embedding(n_movies, n_factors)
        self.y_range = y_range

    def forward(self, x):
        users = self.user_factors(x[:,0])
        movies = self.movie_factors(x[:,1])
        return sigmoid_range((users * movies).sum(dim=1), *self.y_range)
The dls has values in this range as the dependent variable (the ratings), and fastai has a helper, sigmoid_range, for squashing predictions into a given range.
doc(sigmoid_range)
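If I understand it correctly, sigmoid_range squashes its input with a sigmoid and rescales it into (lo, hi). A minimal sketch of that idea (my own re-implementation, assuming this formula rather than quoting the library source):
# Assumed behaviour: sigmoid squashes to (0,1), then rescale to (lo, hi).
def my_sigmoid_range(x, lo, hi): return torch.sigmoid(x) * (hi - lo) + lo

my_sigmoid_range(torch.tensor([-10., 0., 10.]), 0, 5.5)   # roughly 0.00, 2.75, 5.50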
model = DotProduct(n_users, n_movies, 50)
learn = Learner(dls, model, loss_func=MSELossFlat())
learn.fit_one_cycle(5, 5e-3)
Slightly better results.
Sometimes a user gives low (or high) ratings based on their own subjective preference, even if everybody else thinks it is a very good movie. Let's add a new parameter for that: a bias. A bias shifts the whole score up or down, independently of the latent factors.
class DotProductBias(Module):
    def __init__(self, n_users, n_movies, n_factors, y_range=(0,5.5)):
        self.user_factors = Embedding(n_users, n_factors)
        self.user_bias = Embedding(n_users, 1)
        self.movie_factors = Embedding(n_movies, n_factors)
        self.movie_bias = Embedding(n_movies, 1)
        self.y_range = y_range

    def forward(self, x):
        users = self.user_factors(x[:,0])
        movies = self.movie_factors(x[:,1])
        res = (users * movies).sum(dim=1, keepdim=True)
        res += self.user_bias(x[:,0]) + self.movie_bias(x[:,1])
        return sigmoid_range(res, *self.y_range)
model = DotProductBias(n_users, n_movies, 50)
learn = Learner(dls, model, loss_func=MSELossFlat())
learn.fit_one_cycle(5, 5e-3)
And the training loss goes down faster and faster, but the validation loss does not follow: a sign of overfitting.
from the book:
Weight decay, or L2 regularization, consists in adding to your loss function the sum of all the weights squared. Why do that? Because when we compute the gradients, it will add a contribution to them that will encourage the weights to be as small as possible.

Why would it prevent overfitting? The idea is that the larger the coefficients are, the sharper canyons we will have in the loss function. If we take the basic example of a parabola, y = a * (x**2), the larger a is, the more narrow the parabola is:
x = np.linspace(-2,2,100)
a_s = [1,2,5,10,50]
ys = [a * x**2 for a in a_s]
_,ax = plt.subplots(figsize=(8,6))
for a,y in zip(a_s,ys): ax.plot(x,y, label=f'a={a}')
ax.set_ylim([0,5])
ax.legend();

So, letting our model learn high parameters might cause it to fit all the data points in the training set with an overcomplex function that has very sharp changes, which will lead to overfitting. Limiting our weights from growing too much is going to hinder the training of the model, but it will yield a state where it generalizes better. Going back to the theory briefly, weight decay (or just wd) is a parameter that controls that sum of squares we add to our loss (assuming parameters is a tensor of all parameters):

loss_with_wd = loss + wd * (parameters**2).sum()

In practice, though, it would be very inefficient (and maybe numerically unstable) to compute that big sum and add it to the loss. If you remember a little bit of high school math, you might recall that the derivative of p**2 with respect to p is 2*p, so adding that big sum to our loss is exactly the same as doing:

parameters.grad += wd * 2 * parameters

In practice, since wd is a parameter that we choose, we can just make it twice as big, so we don't even need the *2 in this equation.
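Here is a tiny sanity check (my own toy tensor and a hypothetical wd value, not from the book) that the gradient of wd * (p**2).sum() with respect to p really is wd * 2 * p:
# The penalty term's gradient equals wd * 2 * p.
p = torch.tensor([1.0, -2.0, 3.0], requires_grad=True)
wd = 0.1
penalty = wd * (p**2).sum()
penalty.backward()
p.grad, wd * 2 * p.detach()   # both are tensor([0.2000, -0.4000, 0.6000])
To use weight decay in fastai, just pass wd in your call to fit or fit_one_cycle: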
model = DotProductBias(n_users, n_movies, 50)
learn = Learner(dls, model, loss_func=MSELossFlat())
learn.fit_one_cycle(5, 5e-3, wd=0.1)
Not so good training loss, but this time the validation loss is far better.
class T(Module):
    def __init__(self): self.a = torch.ones(3)

L(T().parameters())
There are no parameters here; by definition, parameters must be trainable.
type(torch.ones(3)[0])
from the book: To tell Module that we want to treat a tensor as a parameter, we have to wrap it in the nn.Parameter class. This class doesn't actually add any functionality (other than automatically calling requires_grad_ for us). It's only used as a "marker" to show what to include in parameters:
class T(Module):
    def __init__(self): self.a = nn.Parameter(torch.ones(3))

L(T().parameters())
class T(Module):
    def __init__(self): self.a = nn.Linear(1, 3, bias=False)
t = T()
L(t.parameters())
type(t.a.weight)
type(t.a.weight.data)
def create_params(size):
    return nn.Parameter(torch.zeros(*size).normal_(0, 0.01))
doc(create_params)
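A quick check (mine, not from the book) that create_params gives back a trainable parameter of the requested shape:
# create_params wraps a small random tensor in nn.Parameter, so it is trainable.
p = create_params([3, 2])
p.shape, p.requires_grad   # (torch.Size([3, 2]), True)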
class DotProductBias(Module):
    def __init__(self, n_users, n_movies, n_factors, y_range=(0,5.5)):
        self.user_factors = create_params([n_users, n_factors])
        self.user_bias = create_params([n_users])
        self.movie_factors = create_params([n_movies, n_factors])
        self.movie_bias = create_params([n_movies])
        self.y_range = y_range

    def forward(self, x):
        users = self.user_factors[x[:,0]]
        movies = self.movie_factors[x[:,1]]
        res = (users*movies).sum(dim=1)
        res += self.user_bias[x[:,0]] + self.movie_bias[x[:,1]]
        return sigmoid_range(res, *self.y_range)
model = DotProductBias(n_users, n_movies, 50)
learn = Learner(dls, model, loss_func=MSELossFlat())
learn.fit_one_cycle(5, 5e-3, wd=0.1)
Very similar results. Now, the lowest biases in the model:
movie_bias = learn.model.movie_bias.squeeze()
idxs = movie_bias.argsort()[0:5]
[dls.classes['title'][i] for i in idxs]
from the book: Think about what this means. What it's saying is that for each of these movies, even when a user is very well matched to its latent factors (which, as we will see in a moment, tend to represent things like level of action, age of movie, and so forth), they still generally don't like it. We could have simply sorted the movies directly by their average rating, but looking at the learned bias tells us something much more interesting. It tells us not just whether a movie is of a kind that people tend not to enjoy watching, but that people tend not to like watching it even if it is of a kind that they would otherwise enjoy! By the same token, here are the movies with the highest bias:
idxs = movie_bias.argsort(descending=True)[:5]
[dls.classes['title'][i] for i in idxs]
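The quote above mentions that we could have simply sorted by average rating; here is a small sketch (my own, not from the book) to put the plain mean rating next to the highest-bias titles:
# Compare the learned-bias ranking with a simple mean-rating lookup.
mean_ratings = ratings.groupby('title')['rating'].mean()
for i in movie_bias.argsort(descending=True)[:5]:
    title = dls.classes['title'][i]
    print(f"{title}: mean rating {mean_ratings[title]:.2f}")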
from the book: So, for instance, even if you don't normally enjoy detective movies, you might enjoy LA Confidential! It is not quite so easy to directly interpret the embedding matrices. There are just too many factors for a human to look at. But there is a technique that can pull out the most important underlying directions in such a matrix, called principal component analysis (PCA). We will not be going into this in detail in this book, because it is not particularly important for you to understand to be a deep learning practitioner, but if you are interested then we suggest you check out the fast.ai course Computational Linear Algebra for Coders.
g = ratings.groupby('title')['rating'].count()
top_movies = g.sort_values(ascending=False).index.values[:1000]
top_idxs = tensor([learn.dls.classes['title'].o2i[m] for m in top_movies])
movie_w = learn.model.movie_factors[top_idxs].cpu().detach()
movie_pca = movie_w.pca(3)
fac0,fac1,fac2 = movie_pca.t()
idxs = list(range(50))
X = fac0[idxs]
Y = fac2[idxs]
plt.figure(figsize=(12,12))
plt.scatter(X, Y)
for i, x, y in zip(top_movies[idxs], X, Y):
    plt.text(x, y, i, color=np.random.rand(3)*0.7, fontsize=11)
plt.show()
Let's try changing the x-axis (plotting the second principal component instead of the first); very interesting to study the changes.
g = ratings.groupby('title')['rating'].count()
top_movies = g.sort_values(ascending=False).index.values[:1000]
top_idxs = tensor([learn.dls.classes['title'].o2i[m] for m in top_movies])
movie_w = learn.model.movie_factors[top_idxs].cpu().detach()
movie_pca = movie_w.pca(3)
fac0,fac1,fac2 = movie_pca.t()
idxs = list(range(50))
X = fac1[idxs]
Y = fac2[idxs]
plt.figure(figsize=(12,12))
plt.scatter(X, Y)
for i, x, y in zip(top_movies[idxs], X, Y):
    plt.text(x, y, i, color=np.random.rand(3)*0.7, fontsize=11)
plt.show()
Same thing with fastai's collab_learner gives similar results:
learn = collab_learner(dls, n_factors=50, y_range=(0, 5.5))
learn.fit_one_cycle(5, 5e-3, wd=0.1)
learn.model
movie_bias = learn.model.i_bias.weight.squeeze()
idxs = movie_bias.argsort(descending=True)[:5]
[dls.classes['title'][i] for i in idxs]
Basically, if two movies have similar latent factors (embedding vectors), they are similar kinds of movies, so we can look at the distance between embedding vectors to find the movie most similar to a given one:
movie_factors = learn.model.i_weight.weight
idx = dls.classes['title'].o2i['Silence of the Lambs, The (1991)']
distances = nn.CosineSimilarity(dim=1)(movie_factors, movie_factors[idx][None])
idx = distances.argsort(descending=True)[1]
dls.classes['title'][idx]
This is the movie whose latent factors are most similar to those of Silence of the Lambs.
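As a small extension (mine, not from the book), we can list the five nearest titles instead of only the single closest one:
# Five most similar titles by cosine similarity (index 0 is the movie itself).
top5 = distances.argsort(descending=True)[1:6]
[dls.classes['title'][i] for i in top5]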
Read the whole section (Bootstrapping a Collaborative Filtering Model) from the original book at page 270 (3rd release) or the course notebook.
Now, deep learning for collaborative filtering. First, fastai can recommend the right embedding sizes (numbers of latent factors) for us:
embs = get_emb_sz(dls)
embs
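As far as I know, get_emb_sz relies on fastai's rule of thumb, emb_sz_rule, for each categorical variable; here is a hedged sketch of that heuristic (assuming this is the formula, written as my own helper):
# Assumed heuristic: embedding size grows slowly with cardinality, capped at 600.
def my_emb_sz_rule(n_cat): return min(600, round(1.6 * n_cat**0.56))

my_emb_sz_rule(n_users), my_emb_sz_rule(n_movies)   # should match the sizes in embs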
class CollabNN(Module):
    def __init__(self, user_sz, item_sz, y_range=(0,5.5), n_act=100):
        self.user_factors = Embedding(*user_sz)
        self.item_factors = Embedding(*item_sz)
        self.layers = nn.Sequential(
            nn.Linear(user_sz[1]+item_sz[1], n_act),
            nn.ReLU(),
            nn.Linear(n_act, 1))
        self.y_range = y_range

    def forward(self, x):
        embs = self.user_factors(x[:,0]), self.item_factors(x[:,1])
        x = self.layers(torch.cat(embs, dim=1))
        return sigmoid_range(x, *self.y_range)
model = CollabNN(*embs)
learn = Learner(dls, model, loss_func=MSELossFlat())
learn.fit_one_cycle(5, 5e-3, wd=0.01)
The above is possible (again) with collab_learner in one step; just use use_nn=True:
learn = collab_learner(dls, use_nn=True, y_range=(0, 5.5), layers=[100,50])
learn.fit_one_cycle(5, 5e-3, wd=0.1)
from the book: Although the results of EmbeddingNN are a bit worse than the dot product approach (which shows the power of carefully constructing an architecture for a domain), it does allow us to do something very important: we can now directly incorporate other user and movie information, date and time information, or any other information that may be relevant to the recommendation. That's exactly what TabularModel does. In fact, we've now seen that EmbeddingNN is just a TabularModel, with n_cont=0 and out_sz=1. So, we'd better spend some time learning about TabularModel, and how to use it to get great results! We'll do that in the next chapter.