This notebook provides a tutorial to explore image similarity. The goal is to design an API that, given an input image, returns a collection of similar images.
Install the required dependencies, download data and model.
# install ml dependencies
! pip install tensorflow
! pip install tensorflow_hub
! pip install opencv-python
# download a python file with helper methods for image similarity
! curl -L https://analytics.wikimedia.org/published/datasets/one-off/image_similarity/image_similarity_tools.py -o image_similarity_tools.py
# download data
! curl -L https://analytics.wikimedia.org/published/datasets/one-off/image_similarity/microtask_data.tar.gz -o microtask_data.tar.gz
! tar -xf microtask_data.tar.gz
! rm microtask_data.tar.gz
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
bokeh 2.4.0 requires typing-extensions>=3.10.0, but you have typing-extensions 3.7.4.3 which is incompatible.
Successfully installed absl-py-0.15.0 astunparse-1.6.3 clang-5.0 flatbuffers-1.12 gast-0.4.0 google-auth-2.3.2 google-auth-oauthlib-0.4.6 google-pasta-0.2.0 grpcio-1.41.1 h5py-3.1.0 keras-2.6.0 keras-preprocessing-1.1.2 numpy-1.19.5 opt-einsum-3.3.0 pyasn1-0.4.8 pyasn1-modules-0.2.8 rsa-4.7.2 six-1.15.0 tensorboard-2.7.0 tensorboard-data-server-0.6.1 tensorboard-plugin-wit-1.8.0 tensorflow-2.6.0 tensorflow-estimator-2.7.0 termcolor-1.1.0 typing-extensions-3.7.4.3 wrapt-1.12.1
Successfully installed tensorflow-hub-0.12.0
Successfully installed opencv-python-4.5.4.58
Import the libraries. If the import fails right after installing the dependencies, restart the kernel and run this cell again.
%load_ext autoreload
%autoreload 2
import cv2
import image_similarity_tools
import os
import random
2021-10-30 18:40:37.106653: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /srv/paws/lib/python3.8/site-packages/cv2/../../lib64:
2021-10-30 18:40:37.106693: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
/tmp/ipykernel_118/1035847415.py in <module>
      3
      4 import cv2
----> 5 import image_similarity_tools
      6 import os
      7 import random

~/image_similarity_tools.py in <module>
      5 import os
      6 import tensorflow as tf
----> 7 import tensorflow_hub as hub
      8
      9 '''

/srv/paws/lib/python3.8/site-packages/tensorflow_hub/__init__.py in <module>
     86
     87
---> 88 from tensorflow_hub.estimator import LatestModuleExporter
     89 from tensorflow_hub.estimator import register_module_for_export
     90 from tensorflow_hub.feature_column import image_embedding_column

/srv/paws/lib/python3.8/site-packages/tensorflow_hub/estimator.py in <module>
     60
     61
---> 62 class LatestModuleExporter(tf.compat.v1.estimator.Exporter):
     63   """Regularly exports registered modules into timestamped directories.
     64

/srv/paws/lib/python3.8/site-packages/tensorflow/python/util/lazy_loader.py in __getattr__(self, item)
     60
     61   def __getattr__(self, item):
---> 62     module = self._load()
     63     return getattr(module, item)
     64

/srv/paws/lib/python3.8/site-packages/tensorflow/python/util/lazy_loader.py in _load(self)
     43     """Load the module and insert it into the parent's globals."""
     44     # Import the target module and insert it into the parent's namespace
---> 45     module = importlib.import_module(self.__name__)
     46     self._parent_module_globals[self._local_name] = module
     47

/usr/lib/python3.8/importlib/__init__.py in import_module(name, package)
    125                 break
    126             level += 1
--> 127     return _bootstrap._gcd_import(name[level:], package, level)
    128
    129

/srv/paws/lib/python3.8/site-packages/tensorflow_estimator/__init__.py in <module>
      8 import sys as _sys
      9
---> 10 from tensorflow_estimator._api.v1 import estimator
     11
     12 del _print_function

/srv/paws/lib/python3.8/site-packages/tensorflow_estimator/_api/v1/estimator/__init__.py in <module>
      8 import sys as _sys
      9
---> 10 from tensorflow_estimator._api.v1.estimator import experimental
     11 from tensorflow_estimator._api.v1.estimator import export
     12 from tensorflow_estimator._api.v1.estimator import inputs

/srv/paws/lib/python3.8/site-packages/tensorflow_estimator/_api/v1/estimator/experimental/__init__.py in <module>
      8 import sys as _sys
      9
---> 10 from tensorflow_estimator.python.estimator.canned.dnn import dnn_logit_fn_builder
     11 from tensorflow_estimator.python.estimator.canned.kmeans import KMeansClustering as KMeans
     12 from tensorflow_estimator.python.estimator.canned.linear import LinearSDCA

/srv/paws/lib/python3.8/site-packages/tensorflow_estimator/python/estimator/canned/dnn.py in <module>
     25 from tensorflow.python.framework import ops
     26 from tensorflow.python.util.tf_export import estimator_export
---> 27 from tensorflow_estimator.python.estimator import estimator
     28 from tensorflow_estimator.python.estimator.canned import head as head_lib
     29 from tensorflow_estimator.python.estimator.canned import optimizers

/srv/paws/lib/python3.8/site-packages/tensorflow_estimator/python/estimator/estimator.py in <module>
     68
     69 @estimator_export(v1=['estimator.Estimator'])
---> 70 @doc_controls.inheritable_header("""\
     71   Warning: Estimators are not recommended for new code.  Estimators run
     72   `v1.Session`-style code which is more difficult to write correctly, and

AttributeError: module 'tensorflow.tools.docs.doc_controls' has no attribute 'inheritable_header'
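The install log reports tensorflow-2.6.0 alongside tensorflow-estimator-2.7.0, and the traceback ends inside tensorflow_estimator, so a mismatch between the two release series is a plausible cause of this AttributeError. The notebook does not confirm this. The sketch below only checks whether the installed versions share the same major.minor series. The helper names (major_minor, same_release_series, installed_version) are made up for this illustration and are not part of image_similarity_tools.

```python
from importlib import metadata


def major_minor(version):
    """Return the (major, minor) pair of a version string such as '2.6.0'."""
    parts = version.split(".")
    return int(parts[0]), int(parts[1])


def same_release_series(version_a, version_b):
    """True when both versions share the same major.minor series."""
    return major_minor(version_a) == major_minor(version_b)


def installed_version(package):
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


tf_version = installed_version("tensorflow")
estimator_version = installed_version("tensorflow-estimator")

if tf_version is None or estimator_version is None:
    print("tensorflow or tensorflow-estimator is not installed")
elif same_release_series(tf_version, estimator_version):
    print(f"versions look consistent: {tf_version} / {estimator_version}")
else:
    print(f"possible mismatch: tensorflow {tf_version}, "
          f"tensorflow-estimator {estimator_version}")
```

If the check reports a mismatch, one thing to try is reinstalling tensorflow-estimator at the version matching tensorflow and then restarting the kernel. This is a suggestion to verify, not a guaranteed fix.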
The dataset contains pictures from three categories: dog, fox and wolf. Each image file name is the name of the file on Wikimedia Commons. We can iterate over the image files in their respective category folders in the data directory.
# walk the data directory and report each category folder that contains files
for root, dirs, files in os.walk("data"):
    if len(files) > 0:
        print(f'category {os.path.basename(root)} contains {len(files)} images')
        print(f'\texample images: {files[:2]}')
category Wolf contains 290 images
    example images: ['Loups_siberie.jpg', 'Canis_lupus_signatus_(Kerkrade_Zoo)_26.jpg']
category Fox contains 142 images
    example images: ['Kew_Gardens_-_London_-_September_2008_(2958753889).jpg', 'Vulpes_vulpes_qtl1.jpg']
category Dog contains 283 images
    example images: ['Henry_Tenré.jpg', '(2)_Isha_female_rajapalayam.jpg']
image_similarity_tools.run_me()
Let's start with an initial analysis of the images in the dataset. This can be done by iterating over the files and using a dict to accumulate results, but feel free to install other libraries.
# TODO compute statistics
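One possible starting point for the statistics is sketched below. It uses only the standard library and accumulates the results in a dict keyed by category folder. The function name dataset_statistics and the choice of statistics (image count, file extensions, mean file size) are assumptions made for illustration; the notebook does not prescribe them.

```python
import os
from collections import Counter


def dataset_statistics(data_dir):
    """Walk data_dir and accumulate per-category statistics in a dict."""
    stats = {}
    for root, _dirs, files in os.walk(data_dir):
        if not files:
            continue  # skip folders that hold only sub-folders
        category = os.path.basename(root)
        sizes = [os.path.getsize(os.path.join(root, name)) for name in files]
        stats[category] = {
            "images": len(files),
            # count file extensions case-insensitively (.jpg and .JPG together)
            "extensions": Counter(
                os.path.splitext(name)[1].lower() for name in files
            ),
            "mean_bytes": sum(sizes) / len(sizes),
        }
    return stats


# only runs when the data directory from the download step is present
if os.path.isdir("data"):
    for category, values in sorted(dataset_statistics("data").items()):
        print(category, values)
```

Pixel-level statistics such as width and height distributions could be added in the same loop by loading each file with cv2.imread and reading its shape.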
The image pixels can be loaded as NumPy arrays using the OpenCV library, and images can be displayed using image_similarity_tools.plot.
fox = 'data/Fox/Sierra_Nevada_red_fox_1_(cropped).jpg'
dog = 'data/Dog/Yacare_De_El_Siledin.jpg'
wolf = 'data/Wolf/WPZ_Gray_Wolf_02.jpg'
images_fn = [fox, dog, wolf]
# load pixels
image_pixels = [cv2.imread(f) for f in images_fn]
for i, im in enumerate(image_pixels):
    print(f'the type of {images_fn[i]} is {type(im)} with shape {im.shape}')
# display
image_similarity_tools.plot(images_fn,(1,3))
the type of data/Fox/Sierra_Nevada_red_fox_1_(cropped).jpg is <class 'numpy.ndarray'> with shape (268, 600, 3)
the type of data/Dog/Yacare_De_El_Siledin.jpg is <class 'numpy.ndarray'> with shape (398, 600, 3)
the type of data/Wolf/WPZ_Gray_Wolf_02.jpg is <class 'numpy.ndarray'> with shape (399, 600, 3)
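Arrays returned by cv2.imread are indexed as (rows, columns, channels), so the first number in each shape is the image height and the second is its width. For colour images the three channels are stored in blue, green, red order. The small helper below only interprets a shape tuple. Its name, describe_shape, is hypothetical and used purely to illustrate the convention.

```python
def describe_shape(shape):
    """Interpret an OpenCV-style array shape as height, width and channels."""
    height, width = shape[0], shape[1]
    # grayscale images have a two-element shape, i.e. a single channel
    channels = shape[2] if len(shape) == 3 else 1
    return {
        "height": height,
        "width": width,
        "channels": channels,
        "aspect_ratio": round(width / height, 2),
    }


# shapes taken from the cell output above
for name, shape in [("fox", (268, 600, 3)),
                    ("dog", (398, 600, 3)),
                    ("wolf", (399, 600, 3))]:
    print(name, describe_shape(shape))
```

All three example images are 600 pixels wide and differ only in height, so any model that expects a fixed input size will need them resized first.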