============================
Classification Using Mahotas
============================

.. versionadded:: 0.8
   Before version 0.8, ``texture`` was part of the top-level ``mahotas``
   namespace, not of ``mahotas.features``.

Here is an example of using mahotas and milk for image classification (but
most of the code can easily be adapted to use another machine learning
package). I assume that there are three important directories: ``positives/``
and ``negatives/`` contain the manually labeled examples, and the rest of the
data is in an ``unlabeled/`` directory.

Here is the simple algorithm:

1. Compute features for all of the images in ``positives/`` and ``negatives/``.
2. Learn a classifier.
3. Use that classifier on the unlabeled images.

In the code below, I used jug to give you the possibility of running it on
multiple processors, but the code also works if you remove every line which
mentions ``TaskGenerator``.

We start with a bunch of imports::

    from glob import glob

    import mahotas
    import mahotas.features
    import milk
    from jug import TaskGenerator

Now, we define a function which computes features. In general, texture
features are very fast and give very decent results::

    @TaskGenerator
    def features_for(imname):
        img = mahotas.imread(imname)
        return mahotas.features.haralick(img).mean(0)

``mahotas.features.haralick`` returns features computed in four directions. We
just take the mean (sometimes you use the spread, ``ptp()``, too).

Now a pair of functions to learn a classifier and apply it. These are just
``milk`` functions::

    @TaskGenerator
    def learn_model(features, labels):
        learner = milk.defaultclassifier()
        return learner.train(features, labels)

    @TaskGenerator
    def classify(model, features):
        return model.apply(features)

We assume we have three pre-prepared directories with the images in JPEG
format. This bit you will have to adapt for your own settings::

    positives = glob('positives/*.jpg')
    negatives = glob('negatives/*.jpg')
    unlabeled = glob('unlabeled/*.jpg')

Finally, the actual computation.
Get features for all training data and learn a model::

    features = [features_for(im) for im in negatives + positives]
    labels = [0] * len(negatives) + [1] * len(positives)

    model = learn_model(features, labels)

    labeled = [classify(model, features_for(u)) for u in unlabeled]

This uses texture features, which is probably good enough, but you can play
with other features in ``mahotas.features`` if you'd like (or try
``mahotas.surf``, but that gets more complicated).

(This was motivated by a question on Stackoverflow.)
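As noted above, ``mahotas.features.haralick`` returns one row of features per
direction, and the tutorial takes the mean over directions; combining the mean
with the spread is the common variation mentioned earlier. The combination can
be sketched with plain numpy (the array below is random stand-in data, not real
Haralick features, so mahotas is not required to run it):

```python
import numpy as np

# Stand-in for mahotas.features.haralick(img): 4 directions x 13 features.
per_direction = np.random.default_rng(0).random((4, 13))

mean_features = per_direction.mean(axis=0)        # what the tutorial uses
spread_features = np.ptp(per_direction, axis=0)   # max - min across directions
combined = np.concatenate([mean_features, spread_features])
print(combined.shape)  # (26,)
```

Using ``np.ptp(...)`` rather than the ``.ptp()`` method keeps the sketch
compatible with NumPy 2.0, where the array method was removed.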
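The text above notes that most of the code can be adapted to another machine
learning package. As an illustration of what ``learn_model``/``classify`` need
to do (this is a toy nearest-centroid stand-in, not milk's actual
implementation), here is a numpy-only sketch with made-up 2-D data:

```python
import numpy as np

def learn_model(features, labels):
    # Stand-in for milk.defaultclassifier().train(): one centroid per class.
    features = np.asarray(features)
    labels = np.asarray(labels)
    return {lab: features[labels == lab].mean(axis=0)
            for lab in np.unique(labels)}

def classify(model, feature_vector):
    # Return the label whose class centroid is closest to the feature vector.
    return min(model,
               key=lambda lab: np.linalg.norm(model[lab] - feature_vector))

# Toy data: class 0 clustered near the origin, class 1 near (10, 10).
rng = np.random.default_rng(1)
train = np.vstack([rng.normal(0, 1, (5, 2)), rng.normal(10, 1, (5, 2))])
labels = [0] * 5 + [1] * 5

model = learn_model(train, labels)
print(classify(model, np.array([9.5, 10.2])))  # 1
```

Any classifier exposing this train/apply shape (scikit-learn's
``fit``/``predict``, for instance) can be dropped into the pipeline above in
place of milk.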