My copy of Deep Learning for Coders with fastai & PyTorch has arrived. It is great for taking notes directly in the book. (Some chapters differ slightly from the notebook version.)

#!pip install -Uqq fastbook
import fastbook
from fastbook import *

Multi-Label Classification

I think the main theme of this lesson is "Binary Cross Entropy". It matters when a photo has more than one category, or when it contains none of the categories we are looking for. Think about the bear classifier. It classifies between two classes. If there is another bear breed in the picture, it still tries to pick either black or grizzly, which is not good. If there is one grizzly and one rabbit, it would probably be confused between these labels, or at least its confidence level would be lower.

The Dataset

from import *
path = untar_data(URLs.PASCAL_2007)
df = pd.read_csv(path/'train.csv')
df.head()
        fname        labels  is_valid
0  000005.jpg         chair      True
1  000007.jpg           car      True
2  000009.jpg  horse person      True
3  000012.jpg           car     False
4  000016.jpg       bicycle      True

Note: pd at the beginning is pandas, a library for creating DataFrames (df here) from CSV files. A DataFrame is a table made of rows and columns. Some rows contain more than one label; check row 2.

Pandas and DataFrames

df.iloc[:,0]
0       000005.jpg
1       000007.jpg
2       000009.jpg
3       000012.jpg
4       000016.jpg
5006    009954.jpg
5007    009955.jpg
5008    009958.jpg
5009    009959.jpg
5010    009961.jpg
Name: fname, Length: 5011, dtype: object

Note: It is very easy to navigate in a DataFrame.
df.iloc[0,:]
# Trailing :s are always optional (in numpy, pytorch, pandas, etc.),
#   so this is equivalent:
df.iloc[0]
fname       000005.jpg
labels           chair
is_valid          True
Name: 0, dtype: object
df['fname']
0       000005.jpg
1       000007.jpg
2       000009.jpg
3       000012.jpg
4       000016.jpg
5006    009954.jpg
5007    009955.jpg
5008    009958.jpg
5009    009959.jpg
5010    009961.jpg
Name: fname, Length: 5011, dtype: object

Note: it is possible to use column names to select a column in the dataframe.
tmp_df = pd.DataFrame({'a':[1,2], 'b':[3,4]})
tmp_df
   a  b
0  1  3
1  2  4
tmp_df['c'] = tmp_df['a']+tmp_df['b']
tmp_df
   a  b  c
0  1  3  4
1  2  4  6

Note: It is also possible to create a new column.

Note: From the book:

Pandas is a fast and flexible library, and an important part of every data scientist’s Python toolbox. Unfortunately, its API can be rather confusing and surprising, so it takes a while to get familiar with it. If you haven’t used Pandas before, we’d suggest going through a tutorial; we are particularly fond of the book Python for Data Analysis by Wes McKinney, the creator of Pandas (O'Reilly). It also covers other important libraries like matplotlib and numpy. We will try to briefly describe Pandas functionality we use as we come across it, but will not go into the level of detail of McKinney’s book.

Constructing a DataBlock

Dataset and Dataloader

  • Dataset: Anything you can index into and take the length of.
  • DataLoader: An iterator that provides a stream of mini-batches, where each mini-batch is a tuple of a batch of independent variables and a batch of dependent variables
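To make the two definitions concrete, here is a minimal plain-Python sketch. It is not how fastai or PyTorch implement these classes. The helper name simple_batches is hypothetical and only illustrates the idea: a dataset needs indexing and a length, and a loader yields mini-batches as a tuple of (batch of independent variables, batch of dependent variables).

```python
import random
import string

# A "dataset": anything we can index into and take the length of.
dataset = list(enumerate(string.ascii_lowercase))  # [(0, 'a'), (1, 'b'), ...]

def simple_batches(ds, batch_size, shuffle=False, seed=0):
    """Hypothetical stand-in for a DataLoader: yields (xs, ys) mini-batches."""
    idxs = list(range(len(ds)))
    if shuffle:
        random.Random(seed).shuffle(idxs)
    for start in range(0, len(idxs), batch_size):
        items = [ds[i] for i in idxs[start:start + batch_size]]
        xs, ys = zip(*items)  # split the pairs into two parallel tuples
        yield xs, ys

batches = list(simple_batches(dataset, batch_size=8))
first_xs, first_ys = batches[0]
print(first_xs, first_ys)
```

With 26 items and a batch size of 8, the last mini-batch is smaller than the others, which is also what a real loader does unless it is told to drop the last batch.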
a = list(enumerate(string.ascii_lowercase))
a[0],len(a)
((0, 'a'), 26)

Note: The output above shows the first item (by index) and the length of the dataset.
dl_a = DataLoader(a, batch_size=8, shuffle=True)
b = first(dl_a)
(tensor([ 6, 11, 13,  0,  8, 22, 10,  3]),
 ('g', 'l', 'n', 'a', 'i', 'w', 'k', 'd'))

Note: batch_size=8 sets the mini-batch size. first just takes the first batch.
list(zip(b[0],b[1]))
[(tensor(6), 'g'),
 (tensor(11), 'l'),
 (tensor(13), 'n'),
 (tensor(0), 'a'),
 (tensor(8), 'i'),
 (tensor(22), 'w'),
 (tensor(10), 'k'),
 (tensor(3), 'd')]

Note: This is how you can see which independent and dependent variables correspond to each other.
list(zip(*b))
[(tensor(6), 'g'),
 (tensor(11), 'l'),
 (tensor(13), 'n'),
 (tensor(0), 'a'),
 (tensor(8), 'i'),
 (tensor(22), 'w'),
 (tensor(10), 'k'),
 (tensor(3), 'd')]

Note: Passing *b to zip is a shortcut for zipping b[0] with b[1]: the * unpacks b so that each of its elements becomes a separate argument. This is used for transposing (Jeremy explains it in the lesson video).
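A small plain-Python sketch of why the star form transposes. It uses ordinary integers instead of tensors (an assumption made only to keep the example free of dependencies); the batch values are taken from the output above.

```python
# A batch shaped like the one above: (independent values, dependent values).
b = ([6, 11, 13], ['g', 'l', 'n'])

pairs_explicit = list(zip(b[0], b[1]))  # name each element by hand
pairs_star = list(zip(*b))              # * unpacks b into zip(b[0], b[1])

# Zipping the pairs again transposes them back to the original layout.
back = list(zip(*pairs_star))
print(pairs_star)
print(back)
```

Applying zip(*...) twice returns to the original grouping, which is why it is often described as a transpose of rows and columns.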

Dataset(s) and Dataloader(s)

  • Datasets: An object that contains a training Dataset and a validation Dataset
  • DataLoaders: An object that contains a training DataLoader and a validation DataLoader
a = list(string.ascii_lowercase)
a[0],len(a)
('a', 26)

This is a similar dataset to the previous one, but without the enumeration.

dss = Datasets(a)

To create our independent and dependent variables we can use functions, e.g.:

def f1 (o): return o+'a'
def f2 (o): return o+'b'
dss = Datasets(a,[[f1]])
dss = Datasets(a,[[f1,f2]])

Note: This means that if we have 'a' in the initial dataset, our independent variable should be 'aa' and our dependent variable should be 'ab'. But that is not what we get at the moment: [[f1,f2]] is a list containing a single list, so both functions are applied in sequence to the same item. If we change the shape of the argument a bit:
dss = Datasets(a,[[f1],[f2]])
dss[0]
('aa', 'ab')
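The difference between the two argument shapes can be mimicked in plain Python. This is only a sketch of the idea, not fastai's Datasets code; apply_pipelines is a hypothetical helper that produces one output per inner list and chains the functions inside each list.

```python
def f1(o): return o + 'a'
def f2(o): return o + 'b'

def apply_pipelines(item, pipelines):
    """Hypothetical helper: one output per inner list; functions in a list are chained."""
    outputs = []
    for pipeline in pipelines:
        value = item
        for fn in pipeline:
            value = fn(value)
        outputs.append(value)
    return tuple(outputs)

one_pipeline = apply_pipelines('a', [[f1, f2]])     # f2(f1('a')), a single output
two_pipelines = apply_pipelines('a', [[f1], [f2]])  # (f1('a'), f2('a')), two outputs
print(one_pipeline, two_pipelines)
```

Under this reading, the outer list decides how many elements the resulting tuple has, and each inner list decides how one element is built.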

Note: Now we are good to go. Now we can create our Dataloaders from our Datasets.
dls = DataLoaders.from_dsets(dss, batch_size=4)
(('va', 'ra', 'ea', 'ua'), ('vb', 'rb', 'eb', 'ub'))

Note: Our DataLoaders are ready. This is how we create DataLoaders from scratch.

What is DataBlock ?

Note: There is a much easier way to create our datasets.
dblock = DataBlock()

Note: An empty DataBlock.
dsets = dblock.datasets(df)

Note: From the book:

We can create a Datasets object from this. The only thing needed is a source—in this case, our DataFrame df

len(dsets.train),len(dsets.valid)
(4009, 1002)

Our training and validation sets are ready. How? If we don't give a splitter, the split is random and 20% of the items go to the validation set.
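The two numbers can be reproduced with a few lines of plain Python. This is a sketch of a random 20% split under my own assumptions (the function name random_split and the seed are hypothetical), not a copy of fastai's splitter.

```python
import random

def random_split(n_items, valid_pct=0.2, seed=42):
    """Hypothetical sketch of a random splitter returning (train_idxs, valid_idxs)."""
    idxs = list(range(n_items))
    random.Random(seed).shuffle(idxs)
    cut = int(valid_pct * n_items)  # number of validation items, rounded down
    return idxs[cut:], idxs[:cut]

train_idxs, valid_idxs = random_split(5011)  # 5011 rows, as in the DataFrame above
print(len(train_idxs), len(valid_idxs))
```

For 5011 rows this gives 4009 training items and 1002 validation items, the same sizes as the output above.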

x,y = dsets.train[0]
(fname       001293.jpg
 labels             dog
 is_valid          True
 Name: 646, dtype: object,
 fname       001293.jpg
 labels             dog
 is_valid          True
 Name: 646, dtype: object)

This is one row of the DataFrame repeated twice, once as the independent variable and once as the dependent variable. (This is how the default works.)


Note: However, we need the file name (fname) as the independent variable and labels as the dependent variable.
dblock = DataBlock(get_x = lambda r: r['fname'], get_y = lambda r: r['labels'])
dsets = dblock.datasets(df)
('006610.jpg', 'diningtable bottle person')

Note: Like this.

The same thing with named functions instead of lambdas. Most of the time this is preferable, because lambdas cause problems if you try to serialize the model.

def get_x(r): return r['fname']
def get_y(r): return r['labels']
dblock = DataBlock(get_x = get_x, get_y = get_y)
dsets = dblock.datasets(df)
('006828.jpg', 'bottle')
def get_x(r): return path/'train'/r['fname']
def get_y(r): return r['labels'].split(' ')
dblock = DataBlock(get_x = get_x, get_y = get_y)
dsets = dblock.datasets(df)
 ['bicycle', 'person'])

Note: Everything is fine again. The Datasets object is ready and the shape is right.
dblock = DataBlock(blocks=(ImageBlock, MultiCategoryBlock),
                   get_x = get_x, get_y = get_y)
dsets = dblock.datasets(df)
(PILImage mode=RGB size=375x500,
 TensorMultiCategory([0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]))

Note: Block types are important; in previous lessons we used different types of blocks. Depending on the selected types, the DataBlock gains additional capabilities. Here, ImageBlock lets us work with the input as an image. MultiCategoryBlock encodes the labels as a tensor in which every index corresponds to one object label (one-hot encoding). The one thing I do not understand is how fastai knows the number of categories (20 in total here).
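Here is a plain-Python sketch of this kind of encoding. It rests on an assumption, not on fastai's internals: the vocabulary is the sorted set of every label seen in the data, and its length fixes the size of the encoded vector. The names encode and decode are hypothetical.

```python
labels_per_item = [
    ['chair'],
    ['car'],
    ['horse', 'person'],
]

# Assumption: the vocabulary is the sorted set of every label in the data.
vocab = sorted({label for labels in labels_per_item for label in labels})

def encode(labels, vocab):
    """Return a list with 1.0 at the index of every label present, else 0.0."""
    return [1.0 if name in labels else 0.0 for name in vocab]

def decode(encoded, vocab):
    """Inverse of encode: recover the label names from the 0/1 vector."""
    return [name for name, flag in zip(vocab, encoded) if flag == 1.0]

encoded = encode(['horse', 'person'], vocab)
print(vocab)
print(encoded, decode(encoded, vocab))
```

An item with no labels encodes to a vector of all zeros, which is what lets a multi-label model say "none of these".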
idxs = torch.where(dsets.train[0][1]==1.)[0]
dsets.train.vocab[idxs]
(#1) ['bottle']

Note: In the example above the item has one category. (It changes on every run.)
def splitter(df):
    train = df.index[~df['is_valid']].tolist()
    valid = df.index[df['is_valid']].tolist()
    return train,valid

dblock = DataBlock(blocks=(ImageBlock, MultiCategoryBlock),
                   splitter=splitter,
                   get_x = get_x, get_y = get_y)
dsets = dblock.datasets(df)
(PILImage mode=RGB size=500x333,
 TensorMultiCategory([0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]))

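Note: ~ is the bitwise NOT operator. On a boolean pandas column it flips every True to False and vice versa, so ~df['is_valid'] selects the training rows.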
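One caveat is worth a quick check in plain Python: ~ only acts as a logical NOT on boolean arrays or columns. On a single Python bool it works on the underlying integer. The sketch below also shows a list-based version of the same split; the rows are a hand-made stand-in for the DataFrame.

```python
rows = [
    {'fname': '000005.jpg', 'is_valid': True},
    {'fname': '000007.jpg', 'is_valid': True},
    {'fname': '000012.jpg', 'is_valid': False},
]

# On a plain Python bool, ~ works on the underlying integer: ~True is -2, not False.
bitwise_on_bool = ~True
logical_on_bool = not True

# List-based version of the splitter: indexes of training and validation rows.
train = [i for i, r in enumerate(rows) if not r['is_valid']]
valid = [i for i, r in enumerate(rows) if r['is_valid']]
print(bitwise_on_bool, logical_on_bool, train, valid)
```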
dblock = DataBlock(blocks=(ImageBlock, MultiCategoryBlock),
                   splitter=splitter,
                   get_x = get_x, get_y = get_y,
                   item_tfms = RandomResizedCrop(128, min_scale=0.35))
dls = dblock.dataloaders(df)
dls.show_batch(nrows=1, ncols=3)

What is Binary Cross-Entropy ?

learn = cnn_learner(dls, resnet18)
x,y = to_cpu(dls.train.one_batch())
activs = learn.model(x)
torch.Size([64, 20])

Note: learn.model here is the resnet18 network, and activs are the activations from its last layer for one batch. Jeremy says learn.model(x) is plain PyTorch. (I didn't know that.)
tensor([ 1.1090, -1.4315,  2.8930,  0.5827,  3.0797,  2.5147, -1.3310,  1.7237, -0.2547,  0.3985, -0.2740, -0.1811, -1.5258, -1.0918, -1.7862,  0.3597, -0.4354, -0.1203,  2.2807, -0.3097],

Note: Values are not between 0 and 1.
def binary_cross_entropy(inputs, targets):
    inputs = inputs.sigmoid()
    return -torch.where(targets==1, inputs, 1-inputs).log().mean()

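Note: A couple of things are going on here:
  • Sigmoid squeezes every activation into the range between zero and one.
  • Log scales the loss by confidence: a confident wrong prediction is penalized much more than an unsure one. (See the section on log in Chapter 5.)
  • Broadcasting: the operations are applied elementwise, so we get a result for every item and every category before taking the mean.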
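The same computation can be followed by hand in plain Python. This is a scalar sketch of the function above using math instead of torch (my substitution, to keep it free of dependencies); the activations are made-up values chosen to show the effect of the log.

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def binary_cross_entropy(activations, targets):
    """Mean of -log(p) where the target is 1 and -log(1 - p) where it is 0."""
    losses = []
    for act, targ in zip(activations, targets):
        p = sigmoid(act)
        losses.append(-math.log(p if targ == 1 else 1 - p))
    return sum(losses) / len(losses)

targets = [1, 0, 1]
confident_right = binary_cross_entropy([4.0, -4.0, 4.0], targets)
unsure = binary_cross_entropy([0.0, 0.0, 0.0], targets)
confident_wrong = binary_cross_entropy([-4.0, 4.0, -4.0], targets)
print(confident_right, unsure, confident_wrong)
```

An activation of 0 gives a probability of 0.5, so the unsure case costs log(2) per label. Being confidently wrong costs far more than that, and being confidently right costs almost nothing.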
loss_func = nn.BCEWithLogitsLoss()
loss = loss_func(activs, y)
TensorMultiCategory(1.0493, grad_fn=<AliasBackward>)

Important: Although PyTorch has equivalents of our binary_cross_entropy (F.binary_cross_entropy and nn.BCELoss), they don't include the initial sigmoid. So we use F.binary_cross_entropy_with_logits or nn.BCEWithLogitsLoss instead.

Note: Direct from the book:

We don't actually need to tell fastai to use this loss function (although we can if we want) since it will be automatically chosen for us. fastai knows that the DataLoaders has multiple category labels, so it will use nn.BCEWithLogitsLoss by default.

One change compared to the last chapter is the metric we use: because this is a multilabel problem, we can't use the accuracy function. Why is that? Well, accuracy was comparing our outputs to our targets like so:

def accuracy(inp, targ, axis=-1):
    "Compute accuracy with `targ` when `pred` is bs * n_classes"
    pred = inp.argmax(dim=axis)
    return (pred == targ).float().mean()

The class predicted was the one with the highest activation (this is what argmax does). Here it doesn't work because we could have more than one prediction on a single image. After applying the sigmoid to our activations (to make them between 0 and 1), we need to decide which ones are 0s and which ones are 1s by picking a threshold. Each value above the threshold will be considered as a 1, and each value lower than the threshold will be considered a 0:

def accuracy_multi(inp, targ, thresh=0.5, sigmoid=True):
    "Compute accuracy when `inp` and `targ` are the same size."
    if sigmoid: inp = inp.sigmoid()
    return ((inp>thresh)==targ.bool()).float().mean()

Note: We need to pass our accuracy function to the learner to get the accuracy. See the learn statement below. The only problem is that the default threshold is 0.5. When we need another value, we can use Python's partial. See the usage below, or check the original notebook and the video.
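To see what the thresholded accuracy counts, here is a plain-Python sketch of accuracy_multi on nested lists. It is my own rewrite for illustration: the flag is renamed use_sigmoid so it does not shadow the helper function, and the activations and targets are made up.

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def accuracy_multi(inp, targ, thresh=0.5, use_sigmoid=True):
    """Fraction of (item, label) slots where the thresholded prediction matches the target."""
    correct = total = 0
    for row_inp, row_targ in zip(inp, targ):
        for act, t in zip(row_inp, row_targ):
            p = sigmoid(act) if use_sigmoid else act
            correct += (p > thresh) == bool(t)
            total += 1
    return correct / total

activs = [[2.0, -1.0, 0.5], [-3.0, 1.5, -0.2]]  # 2 items, 3 labels each
targs = [[1, 0, 0], [0, 1, 1]]
acc = accuracy_multi(activs, targs)
print(acc)
```

Every label of every item is scored separately, so here 4 of the 6 slots are right. This is different from single-label accuracy, where each item contributes exactly one right or wrong answer.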

What is partial function ?

It creates a new version of a function with some arguments fixed, for example when there is a need to change default values.

def say_hello(name, say_what="Hello"): return f"{say_what} {name}."
say_hello('Jeremy'),say_hello('Jeremy', 'Ahoy!')
('Hello Jeremy.', 'Ahoy! Jeremy.')
f = partial(say_hello, say_what="Bonjour")
f("Jeremy"),f("Sylvain")
('Bonjour Jeremy.', 'Bonjour Sylvain.')
learn = cnn_learner(dls, resnet50, metrics=partial(accuracy_multi, thresh=0.2))
learn.fine_tune(3, base_lr=3e-3, freeze_epochs=4)
epoch train_loss valid_loss accuracy_multi time
0 0.936221 0.694192 0.238267 00:11
1 0.821117 0.555462 0.291215 00:11
2 0.600436 0.203543 0.820837 00:10
3 0.357310 0.127290 0.935956 00:10
epoch train_loss valid_loss accuracy_multi time
0 0.133780 0.117657 0.943227 00:12
1 0.117373 0.106338 0.949582 00:12
2 0.097523 0.102176 0.954183 00:12

How to find right threshold ?

learn.metrics = partial(accuracy_multi, thresh=0.1)
learn.validate()
(#2) [0.10217633843421936,0.9333268404006958]

Note: Low threshold (labels are selected even at low confidence).
learn.metrics = partial(accuracy_multi, thresh=0.99)
learn.validate()
(#2) [0.10217633843421936,0.9434263110160828]

Note: High threshold (labels are selected only at high confidence).
preds,targs = learn.get_preds()
accuracy_multi(preds, targs, thresh=0.9, sigmoid=False)

Note: This is a better way to pick the right threshold value: testing a range of values.
xs = torch.linspace(0.05,0.95,29)
accs = [accuracy_multi(preds, targs, thresh=i, sigmoid=False) for i in xs]

0.5 looks best.
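The sweep itself is easy to reproduce on toy data. The sketch below uses made-up probabilities and targets, and a hypothetical helper accuracy_at; it only illustrates the idea of trying several thresholds and keeping the best one, not the numbers from the model above.

```python
def accuracy_at(preds, targs, thresh):
    """Accuracy over all (item, label) slots for probabilities already in [0, 1]."""
    hits = [(p > thresh) == bool(t)
            for row_p, row_t in zip(preds, targs)
            for p, t in zip(row_p, row_t)]
    return sum(hits) / len(hits)

# Made-up probabilities and targets, only for illustration.
preds = [[0.9, 0.2, 0.6], [0.1, 0.7, 0.4], [0.8, 0.3, 0.05]]
targs = [[1, 0, 1], [0, 1, 0], [1, 0, 0]]

thresholds = [0.05 + i * 0.1 for i in range(10)]  # 0.05, 0.15, ..., 0.95
accs = [accuracy_at(preds, targs, t) for t in thresholds]
best_acc, best_thresh = max(zip(accs, thresholds))
print(best_thresh, best_acc)
```

In this toy data every true label has a probability of at least 0.6 and every false one at most 0.4, so any threshold between those values separates them perfectly.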

Image Regression

Note: Classification is used for predicting classes, and regression for predicting continuous values, e.g. house prices, the coordinates of something, a length, etc.

Assemble the Data

path = untar_data(URLs.BIWI_HEAD_POSE)

Note: In this example we will find the center point of the head in each image.
Path.BASE_PATH = path
(#50) [Path('01'),Path('01.obj'),Path('02'),Path('02.obj'),Path('03'),Path('03.obj'),Path('04'),Path('04.obj'),Path('05'),Path('05.obj')...]
(#1000) [Path('01/'),Path('01/frame_00003_pose.txt'),Path('01/frame_00003_rgb.jpg'),Path('01/frame_00004_pose.txt'),Path('01/frame_00004_rgb.jpg'),Path('01/frame_00005_pose.txt'),Path('01/frame_00005_rgb.jpg'),Path('01/frame_00006_pose.txt'),Path('01/frame_00006_rgb.jpg'),Path('01/frame_00007_pose.txt')...]

Note: From the book:
  • Inside the subdirectories, we have different frames, each of them come with an image (_rgb.jpg) and a pose file (_pose.txt). We can easily get all the image files recursively with get_image_files, then write a function that converts an image filename to its associated pose file:
img_files = get_image_files(path)
def img2pose(x): return Path(f'{str(x)[:-7]}pose.txt')

img2pose builds the path of the pose (coordinate) file from the image file name.
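A quick check of the slicing with the standard library's pathlib, using one of the file names from the listing above. The slice [:-7] removes the last seven characters, which is exactly the length of 'rgb.jpg'.

```python
from pathlib import Path

def img2pose(x):
    """Swap the trailing 'rgb.jpg' (7 characters) for 'pose.txt'."""
    return Path(f'{str(x)[:-7]}pose.txt')

pose_file = img2pose(Path('01/frame_00003_rgb.jpg'))
print(pose_file)
```

This relies on every image name ending in exactly 'rgb.jpg'; a file with a different suffix would be cut in the wrong place.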

im = PILImage.create(img_files[0])
(480, 640)
cal = np.genfromtxt(path/'01'/'', skip_footer=6)
def get_ctr(f):
    ctr = np.genfromtxt(img2pose(f), skip_header=3)
    c1 = ctr[0] * cal[0][0]/ctr[2] + cal[0][2]
    c2 = ctr[1] * cal[1][1]/ctr[2] + cal[1][2]
    return tensor([c1,c2])

Note: From the book:
- The Biwi dataset website used to explain the format of the pose text file associated with each image, which shows the location of the center of the head. The details of this aren't important for our purposes, so we'll just show the function we use to extract the head center point:
tensor([447.7369, 283.9802])
biwi = DataBlock(
    blocks=(ImageBlock, PointBlock),
    splitter=FuncSplitter(lambda o:'13'),

Note: The most important thing about this DataBlock is the splitter. We use only person 13 for the validation set (there are lots of pictures of that person). If we split randomly, there would be a very high chance of the same person appearing in both the training and validation sets, because the dataset contains many pictures of each person.
Also note that the dependent variable is a continuous value: PointBlock represents it as coordinates. There is also a normalization step in batch_tfms.
dls = biwi.dataloaders(path)
dls.show_batch(max_n=9, figsize=(8,6))
xb,yb = dls.one_batch()
(torch.Size([64, 3, 240, 320]), torch.Size([64, 1, 2]))

The shapes (torch.Size([64, 3, 240, 320]), torch.Size([64, 1, 2])) are important:

  • 64 : batch size
  • 3 : RGB
  • 240,320: image size
  • 1,2 : one row with two values (one point with two coordinates)
TensorPoint([[-0.1250,  0.0771]], device='cuda:0')

Dependent variable.

Training the Model

learn = cnn_learner(dls, resnet18, y_range=(-1,1))

y_range=(-1,1) is important: we tell fastai that we need results in this range (for coordinates). y_range is implemented in fastai using sigmoid_range, which is defined as:

def sigmoid_range(x, lo, hi): return torch.sigmoid(x) * (hi-lo) + lo
plot_function(partial(sigmoid_range,lo=-1,hi=1), min=-4, max=4)
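For a feel of the numbers behind the plot, here is a scalar version of the same formula using math instead of torch (my substitution, so it runs without any library). The inputs match the plotted range of -4 to 4.

```python
import math

def sigmoid_range(x, lo, hi):
    """Scalar version: squash x into the open interval (lo, hi)."""
    return 1 / (1 + math.exp(-x)) * (hi - lo) + lo

values = [sigmoid_range(x, -1, 1) for x in (-4.0, 0.0, 4.0)]
print(values)
```

An input of 0 lands exactly in the middle of the range, and large inputs approach the two ends without ever reaching them.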
dls.loss_func
FlattenedLoss of MSELoss()

This is the default loss function that fastai picked for this data.

lr = 1e-2
learn.fine_tune(3, lr)
epoch train_loss valid_loss time
0 0.049803 0.002713 00:46
epoch train_loss valid_loss time
0 0.008684 0.002087 00:56
1 0.003187 0.000621 00:56
2 0.001467 0.000064 00:57
learn.show_results(ds_idx=1, nrows=3, figsize=(6,8))


Questionnaire

  1. How could multi-label classification improve the usability of the bear classifier?
  2. How do we encode the dependent variable in a multi-label classification problem?
  3. How do you access the rows and columns of a DataFrame as if it was a matrix?
  4. How do you get a column by name from a DataFrame?
  5. What is the difference between a Dataset and DataLoader?
  6. What does a Datasets object normally contain?
  7. What does a DataLoaders object normally contain?
  8. What does lambda do in Python?
  9. What are the methods to customize how the independent and dependent variables are created with the data block API?
  10. Why is softmax not an appropriate output activation function when using a one hot encoded target?
  11. Why is nll_loss not an appropriate loss function when using a one-hot-encoded target?
  12. What is the difference between nn.BCELoss and nn.BCEWithLogitsLoss?
  13. Why can't we use regular accuracy in a multi-label problem?
  14. When is it okay to tune a hyperparameter on the validation set?
  15. How is y_range implemented in fastai? (See if you can implement it yourself and test it without peeking!)
  16. What is a regression problem? What loss function should you use for such a problem?
  17. What do you need to do to make sure the fastai library applies the same data augmentation to your inputs images and your target point coordinates?

Further Research

  1. Read a tutorial about Pandas DataFrames and experiment with a few methods that look interesting to you. See the book's website for recommended tutorials.
  2. Retrain the bear classifier using multi-label classification. See if you can make it work effectively with images that don't contain any bears, including showing that information in the web application. Try an image with two different kinds of bears. Check whether the accuracy on the single-label dataset is impacted using multi-label classification.