
Image Similarity System

Sparsh Agarwal · 4 min read

/img/content-blog-raw-blog-image-similarity-system-untitled.png

Choice of variables

Image Encoder

We can select any pre-trained image classification model. These models are commonly known as encoders because their job is to encode an image into a feature vector. I analyzed four encoders: 1) MobileNet, 2) EfficientNet, 3) ResNet, and 4) BiT. After basic research, I selected BiT because of its state-of-the-art performance. I chose the BiT-M-R50x3 variant of the model, which is 748 MB in size. More details about this architecture can be found on the official page here.

Vector Similarity System

Images are represented as fixed-length feature vectors. For a given input vector, we need to find the top-K most similar vectors, keeping memory efficiency and real-time retrieval in mind. I explored the most popular techniques and shortlisted five: Annoy, cosine distance, L1 distance, Locality-Sensitive Hashing (LSH), and Image Deep Ranking. I selected Annoy because of its speed and memory efficiency. More details about Annoy can be found on the official page here.

Dataset

I shortlisted three datasets from Kaggle that best fit the criteria of this use case: 1) Fashion Product Images (Small), 2) Food-11 Image Dataset, and 3) Caltech 256 Image Dataset. I selected the Fashion and Food-11 datasets.

Literature review

  • Determining Image Similarity with Quasi-Euclidean Metric (arXiv)
  • CatSIM: A Categorical Image Similarity Metric (arXiv)
  • Central Similarity Quantization for Efficient Image and Video Retrieval (arXiv)
  • Improved Deep Hashing with Soft Pairwise Similarity for Multi-label Image Retrieval (arXiv)
  • Model-based Behavioral Cloning with Future Image Similarity Learning (arXiv)
  • Why do These Match? Explaining the Behavior of Image Similarity Models (arXiv)
  • Learning Non-Metric Visual Similarity for Image Retrieval (arXiv)

Process Flow

Step 1: Data Acquisition

Download the raw image dataset into a directory. Categorize these images into their respective category directories. Make sure that images are of the same type, JPEG recommended. We will also process the metadata and store it in a serialized file, CSV recommended.
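For illustration, here is a minimal sketch of this step, assuming the raw download ships with a labels CSV; all paths and column names are hypothetical:

```python
import csv
import shutil
from pathlib import Path

RAW_DIR = Path("data/raw")          # hypothetical download location
DATASET_DIR = Path("data/images")   # images sorted into category folders

# labels.csv is assumed to map file names to categories: "image,category"
with open("data/labels.csv", newline="") as f:
    rows = list(csv.DictReader(f))

# Copy each image into its category directory
for row in rows:
    target_dir = DATASET_DIR / row["category"]
    target_dir.mkdir(parents=True, exist_ok=True)
    shutil.copy(RAW_DIR / row["image"], target_dir / row["image"])
```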

Step 2: Encoder Fine-tuning

Download the pre-trained image model and add two additional layers on top of that: the first layer is a feature vector layer and the second layer is the classification layer. We will only train these 2 layers on our data and after training, we will select the feature vector layer as the output of our fine-tuned encoder. After fine-tuning the model, we will save the feature extractor for later use.
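A sketch of what this fine-tuning setup could look like with Keras and TensorFlow Hub; the hub handle is the public BiT-M R50x3 module, while the image size, feature dimension, and class count below are assumptions:

```python
import tensorflow as tf
import tensorflow_hub as hub

NUM_CLASSES = 10  # assumption: number of categories in our dataset

# Frozen BiT-M R50x3 backbone from TF Hub
backbone = hub.KerasLayer("https://tfhub.dev/google/bit/m-r50x3/1", trainable=False)

model = tf.keras.Sequential([
    tf.keras.layers.InputLayer(input_shape=(224, 224, 3)),
    backbone,
    # Layer 1: the feature-vector layer we keep as the encoder output
    tf.keras.layers.Dense(256, activation="relu", name="feature_vector"),
    # Layer 2: the classification head, used only during fine-tuning
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax", name="classifier"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_ds, validation_data=val_ds, epochs=5)  # train_ds/val_ds assumed

# Keep only the feature-vector layer as the fine-tuned encoder
encoder = tf.keras.Model(model.input, model.get_layer("feature_vector").output)
encoder.save("models/encoder")
```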

Fig: a screenshot of the encoder fine-tuning process

Step 3: Image Vectorization

Now, we will use the encoder (prepared in step 2) to encode the images (prepared in step 1). We will compute the feature vector of each image and, after processing, save these embeddings for later use.
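A minimal vectorization sketch, reusing the encoder saved in step 2; paths and image size are assumptions:

```python
from pathlib import Path

import numpy as np
import tensorflow as tf

encoder = tf.keras.models.load_model("models/encoder")  # saved in step 2

embeddings = {}
for image_id, path in enumerate(sorted(Path("data/images").rglob("*.jpg"))):
    img = tf.keras.utils.load_img(path, target_size=(224, 224))
    arr = tf.keras.utils.img_to_array(img) / 255.0
    embeddings[image_id] = encoder.predict(arr[None, ...])[0]

np.save("data/embeddings.npy", embeddings)  # dict: image id -> feature vector
```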

Step 4: Metadata and Indexing

We will assign a unique id to each image and create dictionaries to locate the information for that image: 1) image id to image name, 2) image id to image feature vector, and 3) (optional) image id to metadata (e.g., product id). We will also build an index over the image feature vectors, keyed by image id. Then we will save these dictionaries and the index object for later use.
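A sketch of the indexing step with Annoy, assuming the embeddings dictionary saved in step 3; the vector dimension must match the feature-vector layer from step 2:

```python
import numpy as np
from annoy import AnnoyIndex

VECTOR_DIM = 256  # assumption: size of the feature-vector layer

embeddings = np.load("data/embeddings.npy", allow_pickle=True).item()  # step 3

index = AnnoyIndex(VECTOR_DIM, "angular")  # angular distance ~ cosine similarity
for image_id, vector in embeddings.items():
    index.add_item(image_id, vector)
index.build(50)  # more trees -> better recall at the cost of a larger index
index.save("data/index.ann")
```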

Step 5: API Call

We will receive an image from the user, encode it with our image encoder, find the top-K most similar vectors using the index object, and retrieve the images (and metadata) using the dictionaries. We then send these images (and metadata) back to the user.
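A minimal Flask sketch of such an endpoint; the route name and the `encode_image` helper are hypothetical, and `encoder`, `index`, and `id_to_name` are assumed to be loaded at startup exactly as they were saved in steps 2-4:

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/similar", methods=["POST"])  # endpoint name is an assumption
def similar():
    # encode_image() is a hypothetical helper reusing the step-3 preprocessing
    vector = encode_image(request.files["image"])
    neighbor_ids = index.get_nns_by_vector(vector, 10)  # top-10 similar images
    return jsonify([id_to_name[i] for i in neighbor_ids])

if __name__ == "__main__":
    app.run()
```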

Deployment

The API was deployed on AWS cloud infrastructure using AWS Elastic Beanstalk service.

/img/content-blog-raw-blog-image-similarity-system-untitled-2.png

Name & Address Parsing

Sparsh Agarwal · 4 min read

/img/content-blog-raw-blog-name-&-address-parsing-untitled.png

Introduction

Create an API that can parse and classify names and addresses given a string. We tried probablepeople and usaddress. These work well separately, but we need the functionality of both packages combined, with better accuracy than probablepeople provides. For the API, I'd like to mimic these packages with some minor modifications. A few examples:

  • "KING JOHN A 5643 ROUTH CREEK PKWY #1314 RICHARDSON TEXAS 750820146 UNITED STATES OF AMERICA" would return type: person; first_name: JOHN; last_name: KING; middle: A; street_address: 5643 ROUTH CREEK PKWY #1314; city: RICHARDSON; state: TEXAS; zip: 75082-0146; country: UNITED STATES OF AMERICA.
  • "THRM NGUYEN LIVING TRUST 2720 SUMMERTREE CARROLLTON HOUSTON TEXAS 750062646 UNITED STATES OF AMERICA" would return type: entity; name: THRM NGUYEN LIVING TRUST; street_address: 2720 SUMMERTREE CARROLLTON; state: TEXAS; city: HOUSTON; zip: 75006-2646; country: UNITED STATES OF AMERICA.

Modeling Approach

List of Entities

List of Entities A - Person, Household, Corporation

List of Entities B - Person First name, Person Middle name, Person Last name, Street address, City, State, Pincode, Country, Company name

Endpoint Configuration

OOR Endpoint

Input Instance: ANDERSON, EARLINE 1423 NEW YORK AVE FORT WORTH, TX 76104 7522

Output Tags:-

- <Type> - Person/Household/Corporation
- <GivenName>, <MiddleName>, <Surname> - if Type Person/Household
- <Name> - Full Name - if Type Person
- <Name> - Household - if Type Household
- <Name> - Corporation - if Type Corporation
- <Address> - Full Address
- <StreetAddress>, <City>, <State>, <Zipcode>, <Country>
- ~~NameConfidence, AddrConfidence~~

Name Endpoint

Input Instance: ANDERSON, EARLINE

Output Tags:-

- <Type> - Person/Household/Corporation
- <GivenName>, <MiddleName>, <Surname> - if Type Person/Household
- <Name> - Full Name - if Type Person
- <Name> - Household - if Type Household
- <Name> - Corporation - if Type Corporation
- ~~NameConfidence~~

Address Endpoint

Input Instance: 1423 NEW YORK AVE FORT WORTH, TX 76104 7522

Output Tags:-

- <Address> - Full Address
- <StreetAddress>, <City>, <State>, <Zipcode>, <Country>
- ~~AddrConfidence~~

Process Flow

  • PyTorch Flair NER model
  • Pre-trained word embeddings
  • Additional parsing models on top of name tags
  • Tagging of 1,000+ records to create training data
  • Deployment as a REST API with 3 endpoints - name parse, address parse, and whole-string parse

Framework

/img/content-blog-raw-blog-name-&-address-parsing-untitled-1.png

/img/content-blog-raw-blog-name-&-address-parsing-untitled-2.png

Tagging process

I used Doccano (https://github.com/doccano/doccano) for labeling the dataset. This tool is open-source and free to use. I deployed it with the one-click Heroku service (fig 1). After launching the app, log in with the provided credentials and create a project (fig 2). Create the labels and upload the dataset (fig 3). Start the annotation process (fig 4). After enough annotations (you do not need to complete all annotations in one go), go back to projects > edit section and export the data (fig 5). Load the exported JSON file in Python and run the model training code; the model will automatically be trained on the new annotations. To make training faster, you can use Nvidia GPU support.

fig 1: screenshot taken from Doccano's GitHub page

fig 2: Doccano's deployed app homepage

fig 3: create the labels. I defined these labels for my project

fig 5: export the annotations

Model

I first tried a blank Spacy NER model, but it did not give high-quality results. So I moved to the PyTorch Flair NER model. This model was way faster (5-minute training thanks to GPU support, compared to Spacy's 1-hour training time) and also much more accurate. F1 scores for all tags were near perfect (close to 1.0), and should improve further with more labeled data. This model is production-ready.
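A minimal Flair training sketch, assuming the Doccano export has been converted to a two-column CoNLL-style format; the paths and embedding choices shown are illustrative, not necessarily the ones used:

```python
from flair.datasets import ColumnCorpus
from flair.embeddings import FlairEmbeddings, StackedEmbeddings, WordEmbeddings
from flair.models import SequenceTagger
from flair.trainers import ModelTrainer

# Doccano export converted to two-column format: token, NER tag
corpus = ColumnCorpus("data/", {0: "text", 1: "ner"},
                      train_file="train.txt", dev_file="dev.txt")
tag_dictionary = corpus.make_tag_dictionary(tag_type="ner")

embeddings = StackedEmbeddings([
    WordEmbeddings("glove"),          # pre-trained word embeddings
    FlairEmbeddings("news-forward"),  # contextual string embeddings
    FlairEmbeddings("news-backward"),
])
tagger = SequenceTagger(hidden_size=256, embeddings=embeddings,
                        tag_dictionary=tag_dictionary, tag_type="ner")
ModelTrainer(tagger, corpus).train("models/parser", max_epochs=20)
```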

Inference

For OOR, I used the model's output directly for the core tagging, and created the aggregated tags like recipient (aggregation of name tags) and address (aggregation of address tags like city and state) using simple conditional concatenation. For name-only and address-only inference, I appended a dummy address to the name text and a dummy name to the address text. This way, I could pass the text through the same model and later filter out the required tags.
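A sketch of the dummy-padding trick for name-only inference; the model path, dummy address, and tag names are assumptions:

```python
from flair.data import Sentence
from flair.models import SequenceTagger

tagger = SequenceTagger.load("models/parser/final-model.pt")  # path assumed

DUMMY_ADDRESS = "1423 NEW YORK AVE FORT WORTH, TX 76104"  # any valid address
NAME_TAGS = {"GivenName", "MiddleName", "Surname"}        # tag names assumed

def parse_name(name_text: str):
    # Pad the name with a dummy address so the input resembles the full
    # strings the model was trained on, then keep only the name tags.
    sentence = Sentence(f"{name_text} {DUMMY_ADDRESS}")
    tagger.predict(sentence)
    return [(span.text, span.tag) for span in sentence.get_spans("ner")
            if span.tag in NAME_TAGS]
```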

API

I used the Flask framework in Python to build a REST API with 3 endpoints. This API is production-ready.
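A minimal sketch of the three endpoints; route names are assumptions, and `parse_address` / `parse_full_string` are hypothetical helpers analogous to `parse_name` above:

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/parse/name", methods=["POST"])
def name_endpoint():
    return jsonify(parse_name(request.json["text"]))

@app.route("/parse/address", methods=["POST"])
def address_endpoint():
    # parse_address: dummy-name padding, keeps only address tags
    return jsonify(parse_address(request.json["text"]))

@app.route("/parse/oor", methods=["POST"])
def oor_endpoint():
    # parse_full_string: whole-string parse with aggregated tags
    return jsonify(parse_full_string(request.json["text"]))

if __name__ == "__main__":
    app.run()
```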

Results and Discussion

  • 0.99 F1 score on 6 of the 8 tags, and 0.95+ F1 score on the other 2 tags
  • API inference time of less than 1 second on a single CPU