#+title: IBM Watson Visual Recognition
#+date: 2020-09-01
#+description: Using the IBM Watson Visual Recognition API with Python.
#+filetags: :dev:

* What is IBM Watson?
If you've never heard of [[https://www.ibm.com/watson][Watson]], it's
IBM's suite of enterprise-ready AI services, applications, and tooling.
Watson contains quite a few useful tools for data scientists and
students, including the subject of this post: visual recognition.

If you'd like to view the official documentation for the Visual
Recognition API, visit the
[[https://cloud.ibm.com/apidocs/visual-recognition/visual-recognition-v3?code=python][API
Docs]].

* Prerequisites
To be able to use Watson Visual Recognition, you'll need to complete the
following steps:

1. Create a free account on
   [[https://www.ibm.com/cloud/watson-studio][IBM Watson Studio]].
2. Add the [[https://www.ibm.com/cloud/watson-visual-recognition][Watson
   Visual Recognition]] service to your IBM Watson account.
3. Get your API key and URL. To do this, first go to the
   [[https://dataplatform.cloud.ibm.com/home2?context=cpdaas][profile
   dashboard]] for your IBM account and click on the Watson Visual
   Recognition service you created. This will be listed in the section
   titled *Your services*. Then click the *Credentials* tab and open the
   *Auto-generated credentials* dropdown. Copy your API key and URL so
   that you can use them in the Python script later.
4. *[Optional]* You can also create the Jupyter
   Notebook for this project right inside
   [[https://www.ibm.com/cloud/watson-studio][Watson Studio]]. Watson
   Studio will save your notebooks inside an organized project and allow
   you to use their other integrated products, such as storage
   containers, AI models, documentation, external sharing, etc.

* Calling the IBM Watson Visual Recognition API
Okay, now let's get started.

To begin, we need to install the proper Python package for IBM Watson.

#+begin_src sh
pip install --upgrade --user "ibm-watson>=4.5.0"
#+end_src
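
If you'd like to verify the install, pip can report the version it
pulled down:

#+begin_src sh
pip show ibm-watson
#+end_src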

Next, we need to specify the API key and URL given to us when we created
the Watson Visual Recognition service, as well as the API version date.

#+begin_src python
apikey = "<your-apikey>"
version = "2018-03-19"
url = "<your-url>"
#+end_src
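
Hard-coding credentials is fine for a quick experiment, but it's safer
to keep them out of your source code. Here's a minimal sketch that reads
them from environment variables instead; the =WATSON_APIKEY= and
=WATSON_URL= variable names are placeholders of my own choosing:

#+begin_src python
import os

# Read the credentials from the environment instead of hard-coding
# them (the variable names here are hypothetical placeholders).
apikey = os.environ["WATSON_APIKEY"]
version = "2018-03-19"
url = os.environ["WATSON_URL"]
#+end_src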

Now, let's import the necessary libraries and authenticate our service.

#+begin_src python
from ibm_watson import VisualRecognitionV3
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

authenticator = IAMAuthenticator(apikey)
visual_recognition = VisualRecognitionV3(
  version=version,
  authenticator=authenticator
)

visual_recognition.set_service_url(url)
#+end_src

*[Optional]* If you'd like to tell the API not to use your data to
improve IBM's products, set the following header.

#+begin_src python
visual_recognition.set_default_headers({"x-watson-learning-opt-out": "true"})
#+end_src

Now we have our API all set and ready to go. For this example, I'm going
to include a list of photos, where each photo is a =dict= with a title
and URL, so we can test out the API.

#+begin_src python
data = [
  {
    "title": "Grizzly Bear",
    "url": "https://example.com/photos/image1.jpg"
  },
  {
    "title": "Nature Lake",
    "url": "https://example.com/photos/image2.jpg"
  },
  {
    "title": "Welcome Sign",
    "url": "https://example.com/photos/image3.jpg"
  },
  {
    "title": "Honey Badger",
    "url": "https://example.com/photos/image4.jpg"
  },
  {
    "title": "Grand Canyon Lizard",
    "url": "https://example.com/photos/image5.jpg"
  },
  {
    "title": "Castle",
    "url": "https://example.com/photos/image6.jpg"
  }
]
#+end_src
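
As an aside, if your photos live on disk rather than at public URLs,
the classify endpoint also accepts a file upload through the
=images_file= parameter. A minimal sketch, assuming a hypothetical local
file named =bear.jpg=:

#+begin_src python
# Classify a local image instead of a hosted URL.
# "bear.jpg" is a hypothetical file used for illustration.
with open("bear.jpg", "rb") as images_file:
    classes = visual_recognition.classify(
        images_file=images_file,
        threshold="0.6",
        owners=["IBM"]).get_result()
#+end_src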

Now that we've set up our libraries and have the photos ready, let's
create a loop to call the API for each image. The code below loops over
the photos and sends each image URL to the API, requesting results with
at least 60% confidence. The results are printed to the console with
dashed lines separating each image's results.

In the case of an API error, the status code and explanation are
printed to the console instead.

#+begin_src python
from ibm_watson import ApiException

for image in data:
    try:
        classes = visual_recognition.classify(
            url=image["url"],
            images_filename=image["title"],
            threshold='0.6',
            owners=["IBM"]).get_result()
        print("-----------------------------------------------")
        print("Image Title: ", image["title"], "\n")
        print("Image URL: ", image["url"], "\n")
        classification_results = classes["images"][0]["classifiers"][0]["classes"]
        for result in classification_results:
            print(result["class"], "(", result["score"], ")")
        print("-----------------------------------------------")
    except ApiException as ex:
        print("Method failed with status code " + str(ex.code) + ": " + ex.message)
#+end_src
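
If you want to see everything the API returns rather than just the
class names and scores, you can pretty-print the raw response with the
standard =json= module:

#+begin_src python
import json

# Dump the full response for the most recent classification.
print(json.dumps(classes, indent=2))
#+end_src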

* The Results
Here we can see the results from the loop above. If you view each of
the URLs that we sent to the API, you'll see that the classifications
were remarkably accurate. To be fair, these are clear, high-resolution
photos shot with a professional camera. In reality, you will most
likely be processing images that are lower quality and noisier.

However, we can clearly see the benefit of being able to call this API
instead of attempting to write our own image recognition function. Each
of the classifications returned was a fair description of the image.

If you wanted to restrict the results to those with at least 90%
confidence, you would simply adjust the =threshold= parameter in the
=visual_recognition.classify()= call.
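
For example, a stricter version of the call inside the loop would look
like this:

#+begin_src python
# Only keep classes the model is at least 90% confident about.
classes = visual_recognition.classify(
    url=image["url"],
    images_filename=image["title"],
    threshold='0.9',
    owners=["IBM"]).get_result()
#+end_src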

When your program runs, it should print a block like the one below for
each photo you provide.

#+begin_src txt
-----------------------------------------------
Image Title:  Grizzly Bear
Image URL:  https://example.com/photos/image1.jpg

brown bear ( 0.944 )
bear ( 1 )
carnivore ( 1 )
mammal ( 1 )
animal ( 1 )
Alaskan brown bear ( 0.759 )
greenishness color ( 0.975 )
-----------------------------------------------
#+end_src

* Discussion
Now, this was a very minimal use of the API. We simply supplied some
images and checked how accurate the results were. However, you could
integrate this kind of API into many machine learning (ML) workflows.

For example, suppose you work for a company that scans its warehouses
or inventory using drones. Would you want to pay employees to sit and
watch drone footage all day to identify or count things in the video?
Probably not. Instead, you could use a classification system similar to
this one to train a machine learning model to identify the items that
the drones capture on video. More specifically, you could have the
model watch a drone fly over a field of sheep and count how many sheep
live in that field.
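
To make that concrete, here is a rough sketch of what the counting step
might look like, assuming you have already extracted still frames from
the drone footage and hosted them somewhere reachable; the frame URLs
and the =sheep= class name are purely illustrative:

#+begin_src python
# Hypothetical frame URLs extracted from drone footage.
frame_urls = [
    "https://example.com/frames/frame1.jpg",
    "https://example.com/frames/frame2.jpg",
]

sheep_frames = 0
for frame_url in frame_urls:
    classes = visual_recognition.classify(
        url=frame_url, threshold="0.6", owners=["IBM"]).get_result()
    results = classes["images"][0]["classifiers"][0]["classes"]
    # Count the frames in which the model reports a "sheep" class.
    if any(r["class"] == "sheep" for r in results):
        sheep_frames += 1

print("Frames containing sheep:", sheep_frames)
#+end_src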

There are many ways to implement machine learning functionality, but
hopefully this post helped inspire some deeper thought about the tools
that can help propel us further into the future of machine learning and
AI.