Chapter 5 - Image Classification
Deep Learning for Coders with fastai & PyTorch - Image Classification. In this notebook I followed both Jeremy Howard's lesson on fast.ai and the Weights & Biases reading group videos. Lots of notes are added, the order of some cells is changed, and some cells are added to make the topic more understandable for me (check the manual calculation of `log_softmax` + `nll_loss`). Click the `open in colab` button at the right side to view this as a notebook.
- PLAYING WITH THE DATASET
- BASELINE MODEL
- FUNCTION FOR CLASSIFYING MORE THAN TWO CATEGORIES
- REVISITING THE BASELINE MODEL (Model Interpretation)
- IMPROVING THE MODEL
I'm a Doctor Who fan and this is my Cyberman coffee cup; as I remember, I got it from the Manchester Science Museum.
import fastbook
fastbook.setup_book()
%config Completer.use_jedi = False
from fastbook import *
from fastai.vision.all import *
path = untar_data(URLs.PETS)
With `untar_data` we download and extract the data. This data originally comes from the Oxford University Visual Geometry Group; it is the Oxford-IIIT Pet dataset.
path
Path.BASE_PATH = path
path
Now the path looks different: setting `Path.BASE_PATH` makes paths display relative to that base.
path.ls()
The `#2` is the number of items in the list. `annotations` contains the target variables of this dataset, but we do not use them this time; instead we create our own labels from the filenames.
(path/"images").ls()
fname = (path/"images").ls()[0]
fname
`ls()` returns the file paths as a fastai `L` object, an enhanced Python list.
re.findall(r'(.+)_\d+.jpg$', fname.name)
The `findall` method extracts the pet breed from the filename with a regular expression; check the regex tutorial on geeksforgeeks.org for details.
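As a small illustrative sketch (the filename below is made up, not necessarily the one stored in `fname`), the regex captures everything before the trailing `_<number>.jpg`:
import re
# Hypothetical filename, just to show what the pattern captures
example_name = 'great_pyrenees_173.jpg'
re.findall(r'(.+)_\d+.jpg$', example_name)  # -> ['great_pyrenees']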
pets = DataBlock(blocks = (ImageBlock, CategoryBlock),
get_items=get_image_files,
splitter=RandomSplitter(seed=42),
get_y=using_attr(RegexLabeller(r'(.+)_\d+.jpg$'), 'name'),
item_tfms=Resize(460),
batch_tfms=aug_transforms(size=224, min_scale=0.75))
dls = pets.dataloaders(path/"images")
`item_tfms` and `batch_tfms` may look a bit meaningless at first; check below to find out why. In summary, fastai gives us a chance to augment our images in a smarter way (presizing), which preserves much more detail and information for training. First we presize the images on the CPU with `item_tfms`, then push them to the GPU and apply the augmentations there in batches with `batch_tfms`.
#caption A comparison of fastai's data augmentation strategy (left) and the traditional approach (right).
dblock1 = DataBlock(blocks=(ImageBlock(), CategoryBlock()),
get_y=parent_label,
item_tfms=Resize(460))
# Place an image at 'images/chapter-05/grizzly.jpg' relative to this notebook before running this cell
dls1 = dblock1.dataloaders([(Path.cwd()/'images'/'chapter-05'/'grizzly.jpg')]*100, bs=8)
dls1.train.get_idxs = lambda: Inf.ones
x,y = dls1.valid.one_batch()
_,axs = subplots(1, 2)
x1 = TensorImage(x.clone())
x1 = x1.affine_coord(sz=224)
x1 = x1.rotate(draw=30, p=1.)
x1 = x1.zoom(draw=1.2, p=1.)
x1 = x1.warp(draw_x=-0.2, draw_y=0.2, p=1.)
tfms = setup_aug_tfms([Rotate(draw=30, p=1, size=224), Zoom(draw=1.2, p=1., size=224),
Warp(draw_x=-0.2, draw_y=0.2, p=1., size=224)])
x = Pipeline(tfms)(x)
#x.affine_coord(coord_tfm=coord_tfm, sz=size, mode=mode, pad_mode=pad_mode)
TensorImage(x[0]).show(ctx=axs[0])
TensorImage(x1[0]).show(ctx=axs[1]);
dls.show_batch(nrows=3, ncols=3)
pets1 = DataBlock(blocks = (ImageBlock, CategoryBlock),
get_items=get_image_files,
splitter=RandomSplitter(seed=42),
get_y=using_attr(RegexLabeller(r'(.+)_\d+.jpg$'), 'name'))
pets1.summary(path/"images")
pets1.summary(path/"images")
Check the summary above; it has lots of detail. It is natural to get an error in this example because we are trying to put images of different sizes into the same batch, and this DataBlock has no item_tfms to resize them first.
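A minimal sketch of a fix (my own variant, called `pets2` here): the same DataBlock, but with an `item_tfms` resize so every image has the same size before it is collated into a batch:
pets2 = DataBlock(blocks=(ImageBlock, CategoryBlock),
                  get_items=get_image_files,
                  splitter=RandomSplitter(seed=42),
                  get_y=using_attr(RegexLabeller(r'(.+)_\d+.jpg$'), 'name'),
                  item_tfms=Resize(460))  # resize every item so batching works
pets2.summary(path/"images")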
For every project, just start with a baseline. A baseline is a good point from which to think about the project, domain, and problem; from there you can start improving and experimenting with the architecture, hyperparameters, and so on.
learn = cnn_learner(dls, resnet34, metrics=error_rate)
learn.fine_tune(2)
learn.loss_func
learn.lr
fastai picked a suitable default `loss_func` and learning rate `lr` for us.
first(dls.train)
x,y = dls.one_batch()
dls.vocab
dls.vocab[0]
`vocab` gives us all the labels as text.
y
x
preds,_ = learn.get_preds(dl=[(x,y)])
preds[0]
_
len(preds[0]),preds[0].sum()
Predictions for 37 categories that add up to one.
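As an extra check of my own, we can take the index of the largest probability and map it back to a breed name with the vocab:
# Illustrative: which breed gets the highest probability for the first item?
pred_idx = preds[0].argmax().item()
dls.vocab[pred_idx]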
For classifying more than two categories, we need to employ a new function. It is not totally different from sigmoid; in fact, it starts from the sigmoid function.
plot_function(torch.sigmoid, min=-4,max=4)
torch.sigmoid
squishes values between 0 and 1.
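A tiny check of my own that sigmoid is just 1/(1 + e^-x):
# sigmoid(x) = 1 / (1 + exp(-x)); should match torch.sigmoid
xs = torch.linspace(-4, 4, 9)
torch.allclose(1/(1 + torch.exp(-xs)), torch.sigmoid(xs))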
torch.random.manual_seed(42);
acts = torch.randn((6,2))*2
acts
acts.sigmoid()
(acts[:,0]-acts[:,1]).sigmoid()
This part is a bit different in the lesson video, so check the video at 1:35:20.
sm_acts = torch.softmax(acts, dim=1)
sm_acts
`torch.softmax` does that in one step. Now the results for each row add up to one, and the first column is identical to the sigmoid of the difference we computed above.
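A manual version of the same computation (my own check): exponentiate each activation and divide by the row sum, then compare against `torch.softmax`:
# softmax per row: exp(x_i) / sum_j exp(x_j)
manual_sm = acts.exp() / acts.exp().sum(dim=1, keepdim=True)
torch.allclose(manual_sm, sm_acts)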
targ = tensor([0,1,0,1,1,0])
These are our softmax activations:
sm_acts
idx = range(6)
sm_acts[idx, targ]
Let's see everything in a table:
from IPython.display import HTML
df = pd.DataFrame(sm_acts, columns=["3","7"])
df['targ'] = targ
df['idx'] = idx
df['loss'] = sm_acts[range(6), targ]
t = df.style.hide_index()
#To have html code compatible with our script
html = t._repr_html_().split('</style>')[1]
html = re.sub(r'<table id="([^"]+)"\s*>', r'<table >', html)
display(HTML(html))
-sm_acts[idx, targ]
PyTorch's way of doing the same:
F.nll_loss(sm_acts, targ, reduction='none')
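A quick sanity check of my own that the two give exactly the same numbers:
# With no log applied yet, nll_loss just picks out -input[i, targ[i]] for each row
torch.allclose(F.nll_loss(sm_acts, targ, reduction='none'), -sm_acts[idx, targ])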
See the "Taking the Log" part below: `F.nll_loss` (negative log likelihood loss) expects inputs to which the log has already been applied in order for the loss calculation to be right.
nll_loss
stands for "negative log likelihood," but it doesn’t actually take the log at all! It assumes you have already taken the log. PyTorch has a function called log_softmax
that combines log
and softmax
in a fast and accurate way. nll_loss
is designed to be used after log_softmax
.
When we first take the softmax, and then the log likelihood of that, that combination is called cross-entropy loss. In PyTorch, this is available as nn.CrossEntropyLoss
(which, in practice, actually does log_softmax
and then nll_loss
):
PyTorch's CrossEntropyLoss:
loss_func = nn.CrossEntropyLoss()
loss_func(acts, targ)
or:
F.cross_entropy(acts, targ)
And these are all the results without taking the mean:
nn.CrossEntropyLoss(reduction='none')(acts, targ)
First log_softmax:
log_sm_acts = torch.log_softmax(acts, dim=1)
log_sm_acts
Then negative log likelihood:
F.nll_loss(log_sm_acts, targ, reduction='none')
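And a final check of my own: the manual log_softmax + nll_loss path gives the same per-item losses as PyTorch's built-in cross entropy:
# log_softmax followed by nll_loss == cross entropy, per item
torch.allclose(F.nll_loss(log_sm_acts, targ, reduction='none'),
               F.cross_entropy(acts, targ, reduction='none'))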
interp = ClassificationInterpretation.from_learner(learn)
interp.plot_confusion_matrix(figsize=(12,12), dpi=60)
interp.most_confused(min_val=5)
This is our baseline; we can start improving from this point.
Fine-tune the model again, this time with a deliberately high learning rate (base_lr=0.1), to see what happens when the learning rate is too large:
learn = cnn_learner(dls, resnet34, metrics=error_rate)
learn.fine_tune(1, base_lr=0.1)
learn = cnn_learner(dls, resnet34, metrics=error_rate)
suggested_lr= learn.lr_find()
suggested_lr
print(f"suggested: {suggested_lr.valley:.2e}")
learn = cnn_learner(dls, resnet34, metrics=error_rate)
learn.fine_tune(2, base_lr=8.32e-04)
This time the loss decreases steadily.
When we create a model from a pretrained network fastai automatically freezes all of the pretrained layers for us. When we call the fine_tune
method fastai does two things:
- Trains the randomly added layers for one epoch, with all other layers frozen
- Unfreezes all of the layers, and trains them all for the number of epochs requested
Let's do it manually:
learn = cnn_learner(dls, resnet34, metrics=error_rate)
learn.fit_one_cycle(3, 8.32e-04)
learn.unfreeze()
Run the lr_find
again, because having more layers to train, and weights that have already been trained for three epochs, means our previously found learning rate isn't appropriate any more:
learn.lr_find()
Train again with the new lr.
learn.fit_one_cycle(6, lr_max=0.0001)
So far so good, but there is still room to improve: discriminative learning rates. Basically we use different learning rates across the model, a bigger rate for the later layers and a smaller one for the early layers.
learn = cnn_learner(dls, resnet34, metrics=error_rate)
learn.fit_one_cycle(3, 8.32e-04)# first lr
learn.unfreeze()
learn.fit_one_cycle(12, lr_max=slice(0.00005,0.0005))#second lr with a range
It is better most of the time (sometimes I don't get good results and need to choose the `slice` values more carefully).
learn.recorder.plot_loss()
As you can see, the training loss keeps getting better and better. But notice that eventually the validation loss improvement slows, and sometimes even gets worse! This is the point at which the model is starting to overfit. In particular, the model is becoming overconfident of its predictions. But this does not mean that it is getting less accurate, necessarily. Take a look at the table of training results per epoch, and you will often see that the accuracy continues improving, even as the validation loss gets worse. In the end what matters is your accuracy, or more generally your chosen metrics, not the loss. The loss is just the function we've given the computer to help us to optimize.
In general, a bigger model has the ability to better capture the real underlying relationships in your data, and also to capture and memorize the specific details of your individual images. However, using a deeper model is going to require more GPU RAM, so you may need to lower the size of your batches to avoid an out-of-memory error. This happens when you try to fit too much inside your GPU and looks like:
Cuda runtime error: out of memory
You may have to restart your notebook when this happens. The way to solve it is to use a smaller batch size, which means passing smaller groups of images at any given time through your model. You can pass the batch size you want to the call creating your DataLoaders
with bs=
.
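For example (a sketch; bs=32 is just an illustrative value, the right number depends on your GPU memory):
# Recreate the DataLoaders with a smaller batch size to avoid CUDA out-of-memory errors
dls_small = pets.dataloaders(path/"images", bs=32)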
The other downside of deeper architectures is that they take quite a bit longer to train. One technique that can speed things up a lot is mixed-precision training. This refers to using less-precise numbers (half-precision floating point, also called fp16) where possible during training. As we are writing these words in early 2020, nearly all current NVIDIA GPUs support a special feature called tensor cores that can dramatically speed up neural network training, by 2-3x. They also require a lot less GPU memory. To enable this feature in fastai, just add to_fp16()
after your Learner
creation (you also need to import the module).
You can't really know ahead of time what the best architecture for your particular problem is—you need to try training some. So let's try a ResNet-50 now with mixed precision:
from fastai.callback.fp16 import *
learn = cnn_learner(dls, resnet50, metrics=error_rate).to_fp16()
learn.fine_tune(12, freeze_epochs=3)
learn.recorder.plot_loss()
As above, the training time has not changed much.