Tutorial: Classification Using Mahotas
New in version 0.8: Before version 0.8, texture was under mahotas, not under mahotas.features.
Here is an example of using mahotas and milk for image classification (but most of the code can easily be adapted to use another machine learning package). I assume that there are three important directories: positives/ and negatives/ contain the manually labeled examples, and the rest of the data is in an unlabeled/ directory.
Here is the simple algorithm:
- Compute features for all of the images in positives/ and negatives/
- Learn a classifier
- Use that classifier on the unlabeled images
In the code below I used jug to give you the possibility of running it on multiple processors, but the code also works if you remove every line that mentions TaskGenerator.
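If you would rather not delete the decorator lines, one possible alternative is to define an identity stand-in with the same name. This is only a sketch: the stand-in below is not part of jug, and the function double is a hypothetical placeholder for the tutorial's functions.

```python
def TaskGenerator(function):
    """Identity stand-in (not jug's TaskGenerator): returns the function unchanged."""
    return function


@TaskGenerator
def double(x):
    # Hypothetical example function; with the stand-in it runs immediately
    return 2 * x


result = double(21)
```

With this stand-in, decorated functions run eagerly in a single process, so you lose jug's parallelism and result caching.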
We start with a bunch of imports:
from glob import glob
import mahotas
import mahotas.features
import milk
from jug import TaskGenerator
Now, we define a function which computes features. In general, texture features are very fast and give very decent results:
@TaskGenerator
def features_for(imname):
    # Load the image, then average the Haralick texture features over directions
    img = mahotas.imread(imname)
    return mahotas.features.haralick(img).mean(0)
mahotas.features.haralick returns one row of features per direction (4 directions for a 2D image). We just take the mean over the directions (sometimes you use the spread, ptp(), too).
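To make the mean and spread reductions concrete, here is a small sketch that uses a synthetic array in place of real Haralick output. It assumes the 2D case, with 4 directions and 13 features per direction; the random values are placeholders, not texture measurements. It calls the function np.ptp rather than the array method, since the method was removed in NumPy 2.0.

```python
import numpy as np

# Synthetic stand-in for haralick output on a 2D image:
# one row per direction (4), one column per feature (13 assumed here).
rng = np.random.default_rng(0)
per_direction = rng.random((4, 13))

# Average over directions, as in features_for
mean_features = per_direction.mean(axis=0)

# Spread (max minus min) over directions
spread_features = np.ptp(per_direction, axis=0)

# Optionally use both as a single, longer feature vector
combined = np.concatenate([mean_features, spread_features])
```

Both reductions collapse the direction axis, so the result no longer depends on the order in which the directions are listed.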
Now a pair of functions to learn a classifier and apply it. These are just milk functions:
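@TaskGenerator
def learn_model(features, labels):
    # Train milk's default classifier on the labeled feature vectors
    learner = milk.defaultclassifier()
    return learner.train(features, labels)

@TaskGenerator
def classify(model, features):
    # Apply the trained model to one feature vector
    return model.apply(features)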
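The two functions above only rely on a learner with a train(features, labels) method that returns a model with an apply(features) method. As a rough illustration of adapting the code to another package, here is a toy nearest-centroid learner with that same shape. CentroidLearner and CentroidModel are hypothetical names invented for this sketch; they are not part of milk, and this is a much simpler classifier than milk's default.

```python
from math import dist
from statistics import fmean


class CentroidModel:
    """Toy model: assigns the label whose mean feature vector is closest."""

    def __init__(self, centroids):
        self.centroids = centroids  # maps label -> mean feature vector

    def apply(self, features):
        return min(self.centroids,
                   key=lambda label: dist(features, self.centroids[label]))


class CentroidLearner:
    """Toy learner exposing train(features, labels), as used in learn_model."""

    def train(self, features, labels):
        centroids = {}
        for label in sorted(set(labels)):
            rows = [f for f, l in zip(features, labels) if l == label]
            # Column-wise mean of all feature vectors with this label
            centroids[label] = [fmean(column) for column in zip(*rows)]
        return CentroidModel(centroids)


# Made-up two-dimensional feature vectors, for illustration only
negatives = [[0.0, 0.1], [0.2, 0.0]]
positives = [[1.0, 0.9], [0.8, 1.1]]
features = negatives + positives
labels = [0] * len(negatives) + [1] * len(positives)

model = CentroidLearner().train(features, labels)
predictions = [model.apply(u) for u in [[0.1, 0.1], [0.9, 1.0]]]
```

Swapping in a real package would then mostly mean changing the body of learn_model and classify, while the rest of the script stays the same.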
We assume we have three pre-prepared directories with the images in JPEG format. You will have to adapt this bit to your own setup:
positives = glob('positives/*.jpg')
negatives = glob('negatives/*.jpg')
unlabeled = glob('unlabeled/*.jpg')
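The pattern *.jpg can miss files named .jpeg or .JPG on a case-sensitive file system, and glob returns matches in arbitrary order. If that matters for your data, a small helper along these lines may be useful. The name list_images and its default extensions are assumptions made for this sketch, not part of mahotas or the original script.

```python
import tempfile
from pathlib import Path


def list_images(directory, extensions=(".jpg", ".jpeg")):
    """Return sorted paths of files in directory whose suffix matches, ignoring case."""
    folder = Path(directory)
    return sorted(
        str(path)
        for path in folder.iterdir()
        if path.is_file() and path.suffix.lower() in extensions
    )


# Small demonstration on a temporary directory with empty placeholder files
with tempfile.TemporaryDirectory() as tmp:
    for name in ("b.JPG", "a.jpeg", "notes.txt"):
        (Path(tmp) / name).touch()
    names = [Path(p).name for p in list_images(tmp)]
```

You could then write positives = list_images('positives') and so on, in place of the glob calls.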
Finally, the actual computation. Get features for all training data, learn a model, and apply it to each unlabeled image:
features = [features_for(im) for im in negatives + positives]
labels = [0] * len(negatives) + [1] * len(positives)
model = learn_model(features, labels)
labeled = [classify(model, features_for(u)) for u in unlabeled]
This uses texture features, which are probably good enough, but you can play with other features in mahotas.features if you'd like (or try mahotas.surf, but that gets more complicated).
(This was motivated by a question on Stack Overflow.)