Chapter 6 - Other Computer Vision Problems
Deep Learning for Coders with fastai & PyTorch - Multi-Label Classification, Regression. In this notebook, I followed both Jeremy Howard's fast.ai lesson and the Weights & Biases reading group videos, and added lots of notes. Click the `open in colab` button on the right side to view this as a notebook.
My copy of Deep Learning for Coders with fastai & PyTorch has arrived. It is very good for taking notes in directly. (Some chapters are slightly different from the notebook version.)
#!pip install -Uqq fastbook
import fastbook
fastbook.setup_book()
from fastbook import *
Multi-Label Classification
I think the main theme of this lesson is binary cross entropy. It matters when a photo has more than one category, or no category at all. Think about the bear classifier: it classifies between two breeds. If there is another bear breed in the picture, it still tries to pick either black or grizzly anyway, which is not good; and if there is one grizzly and one rabbit, it would probably be confused between these labels, or at least its confidence would be lower.
from fastai.vision.all import *
path = untar_data(URLs.PASCAL_2007)
df = pd.read_csv(path/'train.csv')
df.head()
`pd` is the conventional alias for Pandas, a library for creating DataFrames (`df`) from CSV files. A DataFrame is a table that contains columns and rows. Some rows contain more than one label; check row 2.
df.iloc[:,0]
df.iloc[0,:]
# Trailing :s are always optional (in numpy, pytorch, pandas, etc.),
# so this is equivalent:
df.iloc[0]
df['fname']
tmp_df = pd.DataFrame({'a':[1,2], 'b':[3,4]})
tmp_df
tmp_df['c'] = tmp_df['a']+tmp_df['b']
tmp_df
Pandas is a fast and flexible library, and an important part of every data scientist’s Python toolbox. Unfortunately, its API can be rather confusing and surprising, so it takes a while to get familiar with it. If you haven’t used Pandas before, we’d suggest going through a tutorial; we are particularly fond of the book Python for Data Analysis by Wes McKinney, the creator of Pandas (O'Reilly). It also covers other important libraries like matplotlib and numpy. We will try to briefly describe Pandas functionality we use as we come across it, but will not go into the level of detail of McKinney’s book.
- `Dataset`: Anything that we can index into and take the length of.
- `DataLoader`: An iterator that provides a stream of mini-batches, where each mini-batch is a tuple of a batch of independent variables and a batch of dependent variables.
a = list(enumerate(string.ascii_lowercase))
a[0],len(a)
dl_a = DataLoader(a, batch_size=8, shuffle=True)
b = first(dl_a)
b
`batch_size=8` sets the mini-batch size. `first` just takes the first batch.
list(zip(b[0],b[1]))
# zip(*b) transposes the batch: it pairs up the items of b[0] and b[1], same as above
list(zip(*b))
- `Datasets`: An object that contains a training `Dataset` and a validation `Dataset`.
- `DataLoaders`: An object that contains a training `DataLoader` and a validation `DataLoader`.
a = list(string.ascii_lowercase)
a[0],len(a)
A similar dataset to the previous one, but without the enumeration.
dss = Datasets(a)
dss[0]
To create our independent and dependent variables, we can use functions, e.g.:
def f1(o): return o+'a'
def f2(o): return o+'b'
dss = Datasets(a,[[f1]])
dss[0]
dss = Datasets(a,[[f1,f2]])
dss[0]
`[[f1,f2]]` is a list of lists. If we change the shape of this argument a bit:
dss = Datasets(a,[[f1],[f2]])
dss[0]
Now we can create `DataLoaders` from our `Datasets`.
dls = DataLoaders.from_dsets(dss, batch_size=4)
first(dls.train)
Our `DataLoaders` is ready. This is how we create data loaders from scratch.
dblock = DataBlock()
This creates an empty `DataBlock`.
dsets = dblock.datasets(df)
We can create a `Datasets` object from this. The only thing needed is a source: in this case, our DataFrame `df`.
len(dsets.train),len(dsets.valid)
Our training and validation sets are ready. How? If we don't give any argument for splitting, the split is random, with 20% going to the validation set (the default is `RandomSplitter` with `valid_pct=0.2`).
x,y = dsets.train[0]
x,y
This is the first row repeated twice: by default, the data block assumes we have an input and a target, and with no getters defined, both are the whole row.
x['fname']
We want the filename (`fname`) as the independent variable and `labels` as the dependent variable.
dblock = DataBlock(get_x = lambda r: r['fname'], get_y = lambda r: r['labels'])
dsets = dblock.datasets(df)
dsets.train[0]
The same thing with named functions instead of lambdas. Most of the time this is preferable, because lambdas cause problems if you try to serialize the model.
def get_x(r): return r['fname']
def get_y(r): return r['labels']
dblock = DataBlock(get_x = get_x, get_y = get_y)
dsets = dblock.datasets(df)
dsets.train[0]
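As a quick check of the serialization point (my own sketch, not from the book): named, module-level functions pickle fine, while lambdas do not.
import pickle
pickle.dumps(get_x)  # works: pickle stores a reference to the named, module-level function
# pickle.dumps(lambda r: r['fname'])  # raises PicklingError: lambdas cannot be pickled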
def get_x(r): return path/'train'/r['fname']
def get_y(r): return r['labels'].split(' ')
dblock = DataBlock(get_x = get_x, get_y = get_y)
dsets = dblock.datasets(df)
dsets.train[0]
Our `Datasets` object is ready, and the shape is right.
dblock = DataBlock(blocks=(ImageBlock, MultiCategoryBlock),
get_x = get_x, get_y = get_y)
dsets = dblock.datasets(df)
dsets.train[0]
The `blocks` types are important; in previous lessons we used different types of blocks. Based on the selected types, the `DataBlock` gains additional capabilities. Here `ImageBlock` lets us see the input as an image, and `MultiCategoryBlock` encodes the labels as a tensor in which every index corresponds to an object label (one-hot encoding). The one thing I did not understand at first is how fastai knows the number of categories (20 in total here): it builds a vocab by collecting all the unique labels it finds in the data.
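To make the one-hot encoding concrete, here is a minimal sketch (my own toy example with a made-up three-word vocab; `torch` comes in with the fastai imports above):
vocab = ['cat', 'chair', 'dog']          # fastai builds its vocab by collecting unique labels
def one_hot_encode(labels, vocab):
    # put a 1. at each index whose vocab entry appears in labels, 0. elsewhere
    t = torch.zeros(len(vocab))
    for l in labels: t[vocab.index(l)] = 1.
    return t
one_hot_encode(['dog', 'chair'], vocab)  # tensor([0., 1., 1.])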
idxs = torch.where(dsets.train[0][1]==1.)[0]
print(idxs)
dsets.train.vocab[idxs]
def splitter(df):
    train = df.index[~df['is_valid']].tolist()
    valid = df.index[df['is_valid']].tolist()
    return train,valid
dblock = DataBlock(blocks=(ImageBlock, MultiCategoryBlock),
splitter=splitter,
get_x=get_x,
get_y=get_y)
dsets = dblock.datasets(df)
dsets.train[0]
`~` is the bitwise NOT operator; applied to a boolean Pandas Series it inverts each value, so `~df['is_valid']` selects the training rows.
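A tiny demonstration of that inversion (my own example, not from the notebook):
s = pd.Series([True, False, True])
~s  # elementwise NOT: False, True, False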
dblock = DataBlock(blocks=(ImageBlock, MultiCategoryBlock),
splitter=splitter,
get_x=get_x,
get_y=get_y,
item_tfms = RandomResizedCrop(128, min_scale=0.35))
dls = dblock.dataloaders(df)
dls.show_batch(nrows=1, ncols=3)
learn = cnn_learner(dls, resnet18)
x,y = to_cpu(dls.train.one_batch())
activs = learn.model(x)
activs.shape
`learn.model` refers to the ResNet here, and `activs` are the activations from the last layer of the resnet18 for one batch. Jeremy says `learn.model(x)` is plain PyTorch (I didn't know that).
activs[0]
def binary_cross_entropy(inputs, targets):
    inputs = inputs.sigmoid()
    return -torch.where(targets==1, inputs, 1-inputs).log().mean()
- `sigmoid` brings everything between zero and one.
- `log` rescales the result according to relative confidence, punishing confident wrong answers hard. (Check the section in Chapter 5.)
- Thanks to broadcasting, we get the result for every item in the batch.
loss_func = nn.BCEWithLogitsLoss()
loss = loss_func(activs, y)
loss
Note that PyTorch already provides `binary_cross_entropy` (as `F.binary_cross_entropy`, with the module equivalent `nn.BCELoss`), but these versions don't include the `sigmoid`. So instead we use `F.binary_cross_entropy_with_logits` or `nn.BCEWithLogitsLoss`, which apply the sigmoid for us.
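As a sanity check (my own sketch, with made-up numbers), the manual version above should give the same value as PyTorch's built-in loss:
acts  = tensor([[2., -1.], [0.5, 1.5]])  # fake final-layer activations (logits)
targs = tensor([[1.,  0.], [0.,  1.]])   # fake one-hot targets
binary_cross_entropy(acts, targs), nn.BCEWithLogitsLoss()(acts, targs)  # identical results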
We don't actually need to tell fastai to use this loss function (although we can if we want) since it will be automatically chosen for us. fastai knows that the `DataLoaders` has multiple category labels, so it will use `nn.BCEWithLogitsLoss` by default.
One change compared to the last chapter is the metric we use: because this is a multilabel problem, we can't use the accuracy function. Why is that? Well, accuracy was comparing our outputs to our targets like so:
def accuracy(inp, targ, axis=-1):
    "Compute accuracy with `targ` when `pred` is bs * n_classes"
    pred = inp.argmax(dim=axis)
    return (pred == targ).float().mean()
The class predicted was the one with the highest activation (this is what `argmax` does). Here it doesn't work because we could have more than one prediction on a single image. After applying the sigmoid to our activations (to make them between 0 and 1), we need to decide which ones are 0s and which ones are 1s by picking a threshold. Each value above the threshold will be considered a 1, and each value lower than the threshold will be considered a 0:
def accuracy_multi(inp, targ, thresh=0.5, sigmoid=True):
    "Compute accuracy when `inp` and `targ` are the same size."
    if sigmoid: inp = inp.sigmoid()
    return ((inp>thresh)==targ.bool()).float().mean()
We can pass `accuracy_multi` directly as a metric in the `learn` statement below. The only problem is that its default threshold is 0.5. When we need another value, we use Python's `partial` functionality; see the usage below, or check the original notebook and video. `partial` creates a new version of a function with some of its arguments or defaults fixed.
def say_hello(name, say_what="Hello"): return f"{say_what} {name}."
say_hello('Jeremy'),say_hello('Jeremy', 'Ahoy!')
f = partial(say_hello, say_what="Bonjour")
f("Jeremy"),f("Sylvain")
learn = cnn_learner(dls, resnet50, metrics=partial(accuracy_multi, thresh=0.2))
learn.fine_tune(3, base_lr=3e-3, freeze_epochs=4)
learn.metrics = partial(accuracy_multi, thresh=0.1)
learn.validate()
learn.metrics = partial(accuracy_multi, thresh=0.99)
learn.validate()
preds,targs = learn.get_preds()
accuracy_multi(preds, targs, thresh=0.9, sigmoid=False)
Note `sigmoid=False` here: `get_preds` applies the loss function's activation (the sigmoid, in this case) for us, so we must not apply it twice.
xs = torch.linspace(0.05,0.95,29)
accs = [accuracy_multi(preds, targs, thresh=i, sigmoid=False) for i in xs]
plt.plot(xs,accs);
0.5 looks best. (Picking this threshold with the validation set is fine here: the curve is smooth, so we are not selecting some atypical outlier.)
Regression
Classification is used for finding the right classes; regression is for continuous values, e.g. house prices, the coordinates of something, a length, etc.
path = untar_data(URLs.BIWI_HEAD_POSE)
Path.BASE_PATH = path
path.ls().sorted()
(path/'01').ls().sorted()
Inside the subdirectories we have different frames, each of which comes with an image (`_rgb.jpg`) and a pose file (`_pose.txt`). We can easily get all the image files recursively with `get_image_files`, then write a function that converts an image filename to its associated pose file:
img_files = get_image_files(path)
def img2pose(x): return Path(f'{str(x)[:-7]}pose.txt')
img2pose(img_files[0])
`img2pose` builds the path to the pose file from the image filename.
im = PILImage.create(img_files[0])
im.shape
im.to_thumb(160)
cal = np.genfromtxt(path/'01'/'rgb.cal', skip_footer=6)
def get_ctr(f):
    # read the head center from the pose file and project it to 2D image
    # coordinates using the camera calibration matrix `cal`
    ctr = np.genfromtxt(img2pose(f), skip_header=3)
    c1 = ctr[0] * cal[0][0]/ctr[2] + cal[0][2]
    c2 = ctr[1] * cal[1][1]/ctr[2] + cal[1][2]
    return tensor([c1,c2])
get_ctr(img_files[0])
biwi = DataBlock(
blocks=(ImageBlock, PointBlock),
get_items=get_image_files,
get_y=get_ctr,
splitter=FuncSplitter(lambda o: o.parent.name=='13'),
batch_tfms=[*aug_transforms(size=(240,320)),
Normalize.from_stats(*imagenet_stats)]
)
The `splitter` puts all the images of person number 13 into the validation set. If we split randomly, there would be a very high chance of the same person appearing in both the training and validation sets, since there are lots of pictures of each person in this dataset.
`PointBlock` tells fastai that the labels are point coordinates, so the same augmentations get applied to them as to the images. There is also a normalization step applied via `batch_tfms` (`Normalize.from_stats(*imagenet_stats)`, since we are using a model pretrained on ImageNet).
dls = biwi.dataloaders(path)
dls.show_batch(max_n=9, figsize=(8,6))
xb,yb = dls.one_batch()
xb.shape,yb.shape
The shapes `(torch.Size([64, 3, 240, 320]), torch.Size([64, 1, 2]))` are important:
- 64: batch size
- 3: RGB channels
- 240, 320: image size
- 1, 2: one row with two values (one point with two coordinates)
yb[0]
Dependent variable.
learn = cnn_learner(dls, resnet18, y_range=(-1,1))
`y_range=(-1,1)` is important: we tell fastai that we need results in this range (fastai rescales the coordinates to between -1 and 1). `y_range` is implemented in fastai using `sigmoid_range`, which is defined as:
def sigmoid_range(x, lo, hi): return torch.sigmoid(x) * (hi-lo) + lo
plot_function(partial(sigmoid_range,lo=-1,hi=1), min=-4, max=4)
dls.loss_func
Without us specifying anything, fastai has picked `MSELoss` (mean squared error) as the default, which makes sense for a regression problem.
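As a reminder, mean squared error is simple enough to write ourselves; a minimal sketch of what `nn.MSELoss` computes:
def mse(preds, targs): return ((preds - targs)**2).mean()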
learn.lr_find()
lr = 1e-2
learn.fine_tune(3, lr)
math.sqrt(0.0001)
Since the loss is the mean squared error, its square root gives the average prediction error per coordinate: a loss around 0.0001 means we are off by only about 0.01 on the (-1,1) coordinate scale.
learn.show_results(ds_idx=1, nrows=3, figsize=(6,8))
Questionnaire
- How could multi-label classification improve the usability of the bear classifier?
- How do we encode the dependent variable in a multi-label classification problem?
- How do you access the rows and columns of a DataFrame as if it was a matrix?
- How do you get a column by name from a DataFrame?
- What is the difference between a `Dataset` and a `DataLoader`?
- What does a `Datasets` object normally contain?
- What does a `DataLoaders` object normally contain?
- What does `lambda` do in Python?
- What are the methods to customize how the independent and dependent variables are created with the data block API?
- Why is softmax not an appropriate output activation function when using a one-hot-encoded target?
- Why is `nll_loss` not an appropriate loss function when using a one-hot-encoded target?
- What is the difference between `nn.BCELoss` and `nn.BCEWithLogitsLoss`?
- Why can't we use regular accuracy in a multi-label problem?
- When is it okay to tune a hyperparameter on the validation set?
- How is `y_range` implemented in fastai? (See if you can implement it yourself and test it without peeking!)
- What is a regression problem? What loss function should you use for such a problem?
- What do you need to do to make sure the fastai library applies the same data augmentation to your input images and your target point coordinates?
Further Research
- Read a tutorial about Pandas DataFrames and experiment with a few methods that look interesting to you. See the book's website for recommended tutorials.
- Retrain the bear classifier using multi-label classification. See if you can make it work effectively with images that don't contain any bears, including showing that information in the web application. Try an image with two different kinds of bears. Check whether the accuracy on the single-label dataset is impacted using multi-label classification.