
4 posts tagged with "similarity"


· 2 min read
Sparsh Agarwal

/img/content-blog-raw-blog-document-recommendation-untitled.png

Introduction

Business objective

For the given user query, recommend relevant documents (BRM_ifam)

Technical objective

1-to-N mapping of given input text

Proposed Framework 1 — Hybrid Recommender System

  • Text → Vector (Universal Sentence Embedding with TF Hub)
  • Vector → Content-based Filtering Recommendation
  • Index → Interaction Matrix
  • Interaction Matrix → Collaborative Filtering Recommendation
  • Collaborative + Content-based → Hybrid Recommendation
  • Evaluation: Area-under-curve

Proposed Framework 2 — Content-based Recommender System

  1. Find the A most similar users → Cosine similarity
  2. For each user in A, find the TopK most similar items → Map Argsort
  3. For each item, find the TopL most similar items → Cosine similarity
  4. Display
  5. Implement an evaluation metric
  6. Evaluate
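The similarity lookups in the steps above can be sketched with NumPy. This is a minimal sketch, assuming the user (or item) embeddings are already available as rows of an array; the function name `top_k_similar` and the toy vectors are hypothetical and not taken from the original notebook.

```python
import numpy as np

def top_k_similar(query_vec, matrix, k):
    """Return the indices of the k rows of `matrix` most cosine-similar to `query_vec`."""
    q = query_vec / np.linalg.norm(query_vec)
    m = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    scores = m @ q                      # cosine similarity of every row with the query
    return np.argsort(-scores)[:k]      # argsort on negated scores gives descending order

# Toy embeddings (assumed data): three users in a 2-d space
user_vecs = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
query = np.array([1.0, 0.0])
nearest_users = top_k_similar(query, user_vecs, k=2)
print(nearest_users)
```

The same function can be reused for step 3 by passing an item embedding as the query and the item matrix as `matrix`.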

Results and Discussion

  • build.py → this script will take the training data as input and save all the required files in the same working directory
  • recommend.py → this script will take the user query as input and predict top-K BRM recommendations

Variables (during recommendation, you will be asked to make 2–3 choices; their meanings are as follows)

  • top-K: how many top items to return in the recommendation
  • secondary items: how many similar items to add for consideration for each primary matching item
  • sorted by frequency: since multiple input queries might point to the same output, this option takes the frequency count of each output into account and moves the more frequent items to the top
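To illustrate how the top-K and sorted-by-frequency choices could interact, here is a small sketch. It assumes the candidate items from all matched queries have already been collected into one list; the function name `rank_items` and the document ids are hypothetical.

```python
from collections import Counter

def rank_items(candidates, top_k, sort_by_frequency=True):
    """candidates: item ids collected from all matched queries, in match order."""
    if sort_by_frequency:
        # most_common() orders by count; ties keep first-seen order (Python 3.7+)
        ranked = [item for item, _ in Counter(candidates).most_common()]
    else:
        # De-duplicate while keeping the original match order
        ranked = list(dict.fromkeys(candidates))
    return ranked[:top_k]

candidates = ["doc_b", "doc_a", "doc_c", "doc_a", "doc_b", "doc_a"]
print(rank_items(candidates, top_k=2))                           # frequency-sorted
print(rank_items(candidates, top_k=2, sort_by_frequency=False))  # match order
```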

Code

https://gist.github.com/sparsh-ai/4e5f06ba3c55192b33a276ee67dbd42c#file-text-recommendations-ipynb

· 4 min read
Sparsh Agarwal

/img/content-blog-raw-blog-image-similarity-system-untitled.png

Choice of variables

Image Encoder

We can select any pre-trained image classification model. These models are commonly known as encoders because their job is to encode an image into a feature vector. I analyzed four encoders: 1) MobileNet, 2) EfficientNet, 3) ResNet and 4) BiT. After basic research, I selected BiT because of its state-of-the-art performance. I chose the BiT-M-50x3 variant, which is 748 MB in size. More details about this architecture can be found on its official page.

Vector Similarity System

Images are represented as fixed-length feature vectors. For a given input vector, we need to find the TopK most similar vectors while keeping memory efficiency and real-time retrieval in mind. I explored the most popular techniques and shortlisted five of them: Annoy, Cosine distance, L1 distance, Locality-Sensitive Hashing (LSH) and Image Deep Ranking. I selected Annoy because it is fast and memory-efficient. More details about Annoy can be found on its official page.

Dataset

I shortlisted three Kaggle datasets that best fit the criteria of this use case: 1) Fashion Product Images (Small), 2) Food-11 image dataset and 3) Caltech 256 Image Dataset. I selected the Fashion and Food datasets.

Literature review

  • Determining Image similarity with Quasi-Euclidean Metric arxiv
  • CatSIM: A Categorical Image Similarity Metric arxiv
  • Central Similarity Quantization for Efficient Image and Video Retrieval arxiv
  • Improved Deep Hashing with Soft Pairwise Similarity for Multi-label Image Retrieval arxiv
  • Model-based Behavioral Cloning with Future Image Similarity Learning arxiv
  • Why do These Match? Explaining the Behavior of Image Similarity Models arxiv
  • Learning Non-Metric Visual Similarity for Image Retrieval arxiv

Process Flow

Step 1: Data Acquisition

Download the raw image dataset into a directory. Categorize these images into their respective category directories. Make sure that images are of the same type, JPEG recommended. We will also process the metadata and store it in a serialized file, CSV recommended.

Step 2: Encoder Fine-tuning

Download the pre-trained image model and add two additional layers on top of that: the first layer is a feature vector layer and the second layer is the classification layer. We will only train these 2 layers on our data and after training, we will select the feature vector layer as the output of our fine-tuned encoder. After fine-tuning the model, we will save the feature extractor for later use.

Fig: a screenshot of encoder fine-tuning process


Step 3: Image Vectorization

Now, we will use the encoder (prepared in step 2) to encode the images (prepared in step 1). We will save the feature vector of each image as an array in a directory, so that these embeddings can be reused later.

Step 4: Metadata and Indexing

We will assign a unique id to each image and create dictionaries to look up information about that image: 1) an image id to image name dictionary, 2) an image id to image feature vector dictionary, and 3) (optional) an image id to metadata product id dictionary. We will also build an index from image id to image feature vector. Then we will save these dictionaries and the index object for later use.

Step 5: API Call

We will receive an image from the user, encode it with our image encoder, find the TopK similar vectors using the index object, and retrieve the matching images (and metadata) using the dictionaries. We then send these images (and metadata) back to the user.
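The lookup flow of steps 4 and 5 can be sketched as follows. This is only an illustration: a brute-force cosine search over NumPy arrays stands in for the Annoy index used in the project, random vectors stand in for the encoder output, and the file names and the `query_index` helper are hypothetical.

```python
import numpy as np

# Assumed stand-ins: four image names and random 8-d "feature vectors"
rng = np.random.default_rng(0)
image_names = ["shirt.jpg", "shoe.jpg", "bag.jpg", "watch.jpg"]
vectors = rng.normal(size=(len(image_names), 8))

# Step 4: id -> name and id -> feature vector dictionaries
id_to_name = dict(enumerate(image_names))
id_to_vector = {i: v for i, v in enumerate(vectors)}

def query_index(query_vec, k):
    """Brute-force TopK by cosine similarity (a stand-in for an Annoy index lookup)."""
    ids = np.array(list(id_to_vector))
    mat = np.stack([id_to_vector[i] for i in ids])
    sims = (mat @ query_vec) / (np.linalg.norm(mat, axis=1) * np.linalg.norm(query_vec))
    top_ids = ids[np.argsort(-sims)[:k]]
    return [id_to_name[int(i)] for i in top_ids]

# Step 5: an encoded query image comes in, the names of the closest images go out
matches = query_index(id_to_vector[2], k=2)
print(matches)
```

A brute-force search like this scales linearly with the number of images, which is the cost an approximate index such as Annoy is meant to avoid.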

Deployment

The API was deployed on AWS cloud infrastructure using the AWS Elastic Beanstalk service.

/img/content-blog-raw-blog-image-similarity-system-untitled-2.png

· 2 min read
Sparsh Agarwal

/img/content-blog-raw-blog-semantic-similarity-untitled.png

Introduction

Deliverable - Two paragraph-level distance outputs for L and Q, each with 35 columns.

For each paragraph, we need to calculate the L1 distance between consecutive sentences and then compute the mean and standard deviation of these distances. For example, say paragraph 1 starts at sentence 1 and ends at sentence 5. First, calculate the L1 distances L1(1,2), L1(2,3), L1(3,4) and L1(4,5), and then calculate the mean and standard deviation of these 4 distances. This gives two measures for the paragraph: L1_m and L1_std. Similarly, we need to calculate the mean and standard deviation using the L2 distance, the cosine distance and the cosine similarity. We compute these measures on several embeddings: all dimensions of the BERT embeddings, and the 100-, 200- and 300-dimensional PCA reductions of the BERT embeddings (PCA is a dimensionality reduction technique).

In the end, we will have 35 columns for each paragraph: Paragraph ID + #sentences in the paragraph + (cosine_m, cosine_std, cossimilarity_m, cossimilarity_std, L1_m, L1_std, L2_m, L2_std) by (all, 100, 200, 300) = 3 + 8*4.

Note: for a paragraph that has only one sentence, the std measures are empty.
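The per-paragraph measures can be sketched with NumPy as below. This is a minimal sketch under a few assumptions: the sentence vectors of one paragraph are given in order, the standard deviation is the population one (NumPy's default), and a paragraph without consecutive pairs gets NaN for all measures. The function name `consecutive_distance_stats` is hypothetical.

```python
import numpy as np

def consecutive_distance_stats(sent_vecs):
    """Mean/std of distances between consecutive sentence vectors of one paragraph."""
    sent_vecs = np.asarray(sent_vecs, dtype=float)
    a, b = sent_vecs[:-1], sent_vecs[1:]          # pairs (1,2), (2,3), ...
    cos_sim = np.sum(a * b, axis=1) / (np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1))
    measures = {
        "L1": np.abs(a - b).sum(axis=1),
        "L2": np.linalg.norm(a - b, axis=1),
        "cosine": 1.0 - cos_sim,                  # cosine distance
        "cossimilarity": cos_sim,
    }
    stats = {}
    for name, dists in measures.items():
        # A one-sentence paragraph has no consecutive pairs, so its measures stay empty (NaN)
        stats[f"{name}_m"] = dists.mean() if dists.size else np.nan
        stats[f"{name}_std"] = dists.std() if dists.size else np.nan
    return stats

paragraph = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]  # toy 2-d sentence vectors
stats = consecutive_distance_stats(paragraph)
print(stats["L1_m"], stats["L1_std"])
```

Running the same function on the full BERT vectors and on each PCA-reduced version yields the 8 measures per embedding described above.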

Modeling Approach

Process Flow for Use Case 1

  1. Splitting paragraphs into sentences using 1) the NLTK sentence tokenizer, 2) the spaCy sentence tokenizer and 3) two additional split symbols, ":" and "..."
  2. Text preprocessing: lowercasing, removing non-alphanumeric characters, removing null records, removing sentence records (rows) with fewer than 3 words
  3. TF-IDF vectorization
  4. LSA over document-term matrix
  5. Cosine distance calculation of adjacent sentences (rows)
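Steps 3 to 5 of this flow can be sketched with scikit-learn. This is a hedged sketch: the example sentences, the number of LSA components and the variable names are assumptions, and `TruncatedSVD` is used here as the LSA step.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import paired_cosine_distances

# Assumed input: already split and cleaned sentences, in document order
sentences = [
    "the model encodes each sentence",
    "each sentence is encoded by the model",
    "the weather was sunny yesterday",
    "yesterday the weather stayed sunny",
]

# Step 3: TF-IDF document-term matrix (one row per sentence)
tfidf = TfidfVectorizer(lowercase=True).fit_transform(sentences)

# Step 4: LSA, i.e. a truncated SVD of the TF-IDF matrix
lsa = TruncatedSVD(n_components=2, random_state=0).fit_transform(tfidf)

# Step 5: cosine distance between each sentence and the next one
adjacent_dist = paired_cosine_distances(lsa[:-1], lsa[1:])
print(adjacent_dist)
```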

Process Flow for Use Case 2

  • Split paragraphs into sentences
  • Text cleaning
  • BERT Sentence Encoding
  • BERT PCA 100
  • BERT PCA 200
  • BERT PCA 300
  • Calculate distance between consecutive sentences in the paragraph
  • Distances: L1, L2, cosine distance and cosine similarity
  • Statistics: Mean, SD

Experimental Setup

  1. IncrementalPCA
  2. GPU to speed up
  3. Data chunking
  4. Calculate BERT for a chunk and store in disk
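The chunked setup can be sketched with scikit-learn's `IncrementalPCA`. This is a sketch under assumptions: random arrays stand in for the stored BERT chunks, and the chunk size, the 768-d embedding width and the helper `iter_embedding_chunks` are hypothetical.

```python
import numpy as np
from sklearn.decomposition import IncrementalPCA

def iter_embedding_chunks(n_chunks=4, chunk_size=256, dim=768, seed=0):
    """Hypothetical stand-in for reading stored BERT embedding chunks from disk."""
    rng = np.random.default_rng(seed)
    for _ in range(n_chunks):
        yield rng.normal(size=(chunk_size, dim)).astype(np.float32)

# Fit PCA one chunk at a time so the full matrix never has to sit in memory
ipca = IncrementalPCA(n_components=100)
for chunk in iter_embedding_chunks():
    ipca.partial_fit(chunk)  # each chunk needs at least n_components rows

# Second pass: project every chunk to 100 dimensions
reduced = np.vstack([ipca.transform(chunk) for chunk in iter_embedding_chunks()])
print(reduced.shape)  # (1024, 100)
```

The same loop would be repeated with `n_components` set to 200 and 300, using chunks of at least that many rows.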

· 14 min read
Sparsh Agarwal

/img/content-blog-raw-blog-vehicle-suggestions-untitled.png

Introduction

The customer owns a franchise store for selling Tesla Automobiles. The objective is to predict user preferences using social media data.

Task 1 - Suggest the best vehicle for the given description

Task 2 - Suggest the best vehicle for the given social media id of the user

Customer queries

// car or truck or no mention of vehicle type means Cyber Truck
// SUV mention means Model X
const one = "I'm looking for a fast suv that I can go camping without worrying about recharging";
const two = "cheap red car that is able to go long distances";
const three = "i am looking for a daily driver that i can charge everyday, do not need any extras";
const four = "i like to go offroading a lot on my jeep and i want to do the same with the truck";
const five = "i want the most basic suv possible";
const six = "I want all of the addons";
// mentions of large family or many people means model x
const seven = "I have a big family and want to be able to take them around town and run errands without worrying about charging";
  • Expected output
    const oneJSON = {
    vehicle: 'Model X',
    trim : 'adventure',
    exteriorColor: 'whiteExterior',
    wheels: "22Performance",
    tonneau: "powerTonneau",
    packages: "",
    interiorAddons: "",
    interiorColor: "blackInterior",
    range: "extendedRange",
    software: "",
    }

    const twoJSON = {
    vehicle: 'Cyber Truck',
    trim : 'base',
    exteriorColor: 'whiteExterior',
    wheels: "21AllSeason",
    tonneau: "powerTonneau",
    packages: "",
    interiorAddons: "",
    interiorColor: "blackInterior",
    range: "extendedRange",
    software: "",
    }

    const threeJSON = {
    vehicle: 'Cyber Truck',
    trim : 'base',
    exteriorColor: 'whiteExterior',
    wheels: "21AllSeason",
    tonneau: "powerTonneau",
    packages: "",
    interiorAddons: "",
    interiorColor: "blackInterior",
    range: "standardRange",
    software: "",
    }

    const fourJSON = {
    vehicle: 'Cyber Truck',
    trim : 'adventure',
    exteriorColor: 'whiteExterior',
    wheels: "20AllTerrain",
    tonneau: "powerTonneau",
    packages: "offroadPackage,matchingSpareTire",
    interiorAddons: "",
    interiorColor: "blackInterior",
    range: "extendedRange",
    software: "",
    }

    const fiveJSON = {
    vehicle: 'Model X',
    trim : 'base',
    exteriorColor: 'whiteExterior',
    wheels: "20AllTerrain",
    tonneau: "manualTonneau",
    packages: "",
    interiorAddons: "",
    interiorColor: "blackInterior",
    range: "standardRange",
    software: "",
    }

    const sixJSON = {
    vehicle: 'Cyber Truck',
    trim : 'adventure',
    exteriorColor: 'whiteExterior',
    wheels: "20AllTerrain",
    tonneau: "powerTonneau",
    packages: "offroadPackage,matchingSpareTire",
    interiorAddons: "wirelessCharger",
    interiorColor: "blackInterior",
    range: "extendedRange",
    software: "selfDrivingPackage",
    }

    const sevenJSON = {
    vehicle: 'Model X',
    trim : 'base',
    exteriorColor: 'whiteExterior',
    wheels: "21AllSeason",
    tonneau: "powerTonneau",
    packages: "",
    interiorAddons: "",
    interiorColor: "blackInterior",
    range: "mediumRange",
    software: "",
    }
  • Vehicle model configurations
    const configuration = {
    meta: {
    configurationId: '???',
    storeId: 'US_SALES',
    country: 'US',
    version: '1.0',
    effectiveDate: '???',
    currency: 'USD',
    locale: 'en-US',
    availableLocales: ['en-US'],
    },

    defaults: {
    basePrice: 50000,
    deposit: 1000,
    initialSelection: [
    'adventure',
    'whiteExterior',
    '21AllSeason',
    'powerTonneau',
    'blackInterior',
    'mediumRange',
    ],
    },

    groups: {
    trim: {
    name: { 'en-US': 'Choose trim' },
    multiselect: false,
    required: true,
    options: ['base', 'adventure'],
    },
    exteriorColor: {
    name: { 'en-US': 'Choose paint' },
    multiselect: false,
    required: true,
    options: [
    'whiteExterior',
    'blueExterior',
    'silverExterior',
    'greyExterior',
    'blackExterior',
    'redExterior',
    'greenExterior',
    ],
    },
    wheels: {
    name: { 'en-US': 'Choose wheels' },
    multiselect: false,
    required: true,
    options: ['21AllSeason', '20AllTerrain', '22Performance'],
    },
    tonneau: {
    name: { 'en-US': 'Choose tonneau cover' },
    multiselect: false,
    required: true,
    options: ['manualTonneau', 'powerTonneau'],
    },
    packages: {
    name: { 'en-US': 'Choose upgrades' },
    multiselect: true,
    required: false,
    options: ['offroadPackage', 'matchingSpareTire'],
    },
    interiorColor: {
    name: { 'en-US': 'Choose interior' },
    multiselect: false,
    required: true,
    options: ['greyInterior', 'blackInterior', 'greenInterior'],
    },
    interiorAddons: {
    name: { 'en-US': 'Choose upgrade' },
    multiselect: true,
    required: false,
    options: ['wirelessCharger'],
    },
    range: {
    name: { 'en-US': 'Choose range' },
    multiselect: false,
    required: true,
    options: ['standardRange', 'mediumRange', 'extendedRange'],
    },
    software: {
    name: { 'en-US': 'Choose upgrade' },
    multiselect: true,
    required: false,
    options: ['selfDrivingPackage'],
    },
    specs: {
    name: { 'en-US': 'Specs overview *' },
    attrs: {
    description: {
    'en-US':
    "* Options, specs and pricing may change as we approach production. We'll contact you to review any updates to your preferred build.",
    },
    },
    multiselect: false,
    required: false,
    options: ['acceleration', 'power', 'towing', 'range'],
    },
    },

    options: {
    base: {
    name: { 'en-US': 'Base' },
    attrs: {
    description: { 'en-US': 'Production begins 2022' },
    },
    visual: true,
    price: 0,
    },
    adventure: {
    name: { 'en-US': 'Adventure' },
    attrs: {
    description: { 'en-US': 'Production begins 2021' },
    },
    visual: true,
    price: 10000,
    },

    standardRange: {
    name: { 'en-US': 'Standard' },
    attrs: {
    description: { 'en-US': '230+ miles' },
    },
    price: 0,
    },
    mediumRange: {
    name: { 'en-US': 'Medium' },
    attrs: {
    description: { 'en-US': '300+ miles' },
    },
    price: 3000,
    },
    extendedRange: {
    name: { 'en-US': 'Extended' },
    attrs: {
    description: { 'en-US': '400+ miles' },
    },
    price: 8000,
    },

    greenExterior: {
    name: { 'en-US': 'Adirondack Green' },
    attrs: {
    imageUrl: '/public/images/configurationOptions/exteriorcolors/green.svg',
    },
    visual: true,
    price: 2000,
    },
    blueExterior: {
    name: { 'en-US': 'Trestles Blue' },
    attrs: {
    imageUrl: '/public/images/configurationOptions/exteriorcolors/blue.svg',
    },
    visual: true,
    price: 1000,
    },
    whiteExterior: {
    name: { 'en-US': 'Arctic White' },
    attrs: {
    imageUrl: '/public/images/configurationOptions/exteriorcolors/white.svg',
    },
    visual: true,
    price: 0,
    },
    silverExterior: {
    name: { 'en-US': 'Silver Gracier' },
    attrs: {
    imageUrl: '/public/images/configurationOptions/exteriorcolors/silver.svg',
    },
    visual: true,
    price: 1000,
    },
    blackExterior: {
    name: { 'en-US': 'Cosmic Black' },
    attrs: {
    imageUrl: '/public/images/configurationOptions/exteriorcolors/black.svg',
    },
    visual: true,
    price: 1000,
    },
    redExterior: {
    name: { 'en-US': 'Red Rocks' },
    attrs: {
    imageUrl: '/public/images/configurationOptions/exteriorcolors/red.svg',
    },
    visual: true,
    price: 2000,
    },
    greyExterior: {
    name: { 'en-US': 'Antracite Grey' },
    attrs: {
    imageUrl: '/public/images/configurationOptions/exteriorcolors/grey.svg',
    },
    visual: true,
    price: 1000,
    },

    '21AllSeason': {
    name: { 'en-US': '21" Cast Wheel - All Season' },
    attrs: {
    imageUrl: '/public/images/configurationOptions/wheels/twentyone.svg',
    },
    visual: true,
    price: 0,
    },
    '20AllTerrain': {
    name: { 'en-US': '20" Forged Wheel - All Terrain' },
    attrs: {
    imageUrl: '/public/images/configurationOptions/wheels/twenty.svg',
    },
    visual: true,
    price: 0,
    },
    '22Performance': {
    name: { 'en-US': '22" Cast Wheel - Performance' },
    attrs: {
    imageUrl: '/public/images/configurationOptions/wheels/twentytwo.svg',
    },
    visual: true,
    price: 2000,
    },

    manualTonneau: {
    name: { 'en-US': 'Manual' },
    attrs: {
    description: { 'en-US': 'Description here' },
    },
    price: 0,
    },
    powerTonneau: {
    name: { 'en-US': 'Powered' },
    attrs: {
    description: { 'en-US': 'Description here' },
    },
    price: 0,
    },

    blackInterior: {
    name: { 'en-US': 'Black' },
    attrs: {
    imageUrl: '/public/images/configurationOptions/interiorcolors/black.svg',
    },
    visual: true,
    price: 0,
    },
    greyInterior: {
    name: { 'en-US': 'Grey' },
    attrs: {
    imageUrl: '/public/images/configurationOptions/interiorcolors/grey.svg',
    },
    visual: true,
    price: 1000,
    },
    greenInterior: {
    name: { 'en-US': 'Green' },
    attrs: {
    imageUrl: '/public/images/configurationOptions/interiorcolors/green.svg',
    },
    visual: true,
    price: 2000,
    },

    offroadPackage: {
    name: { 'en-US': 'Off-Road' },
    attrs: {
    description: { 'en-US': 'Lorem ipsum dolor sit amet.' },
    imageUrl: '/public/images/configurationOptions/packages/offroad.png',
    },
    visual: true,
    price: 5000,
    },
    matchingSpareTire: {
    name: { 'en-US': 'Matching Spare Tire' },
    attrs: {
    description: { 'en-US': 'Full sized tire' },
    imageUrl: '/public/images/configurationOptions/packages/spare.png',
    },
    price: 500,
    },

    wirelessCharger: {
    name: { 'en-US': 'Wireless charger' },
    attrs: {
    description: { 'en-US': 'Lorem ipsum dolor sit amet.' },
    imageUrl: '/public/images/configurationOptions/packages/wireless.png',
    },
    price: 100,
    },
    selfDrivingPackage: {
    name: { 'en-US': 'Autonomy' },
    attrs: {
    description: { 'en-US': 'Lorem ipsum dolor sit amet.' },
    imageUrl: '/public/images/configurationOptions/packages/autonomy.png',
    },
    price: 7000,
    },

    acceleration: {
    name: { 'en-US': '0 - 60 mph' },
    attrs: {
    units: { 'en-US': 'sec' },
    decimals: 1,
    },
    value: 3.4,
    },
    power: {
    name: { 'en-US': 'Horsepower' },
    attrs: {
    units: { 'en-US': 'hp' },
    },
    value: 750,
    },
    towing: {
    name: { 'en-US': 'Towing' },
    attrs: {
    units: { 'en-US': 'lbs' },
    },
    value: 10000,
    },
    range: {
    name: { 'en-US': 'Range' },
    attrs: {
    units: { 'en-US': 'mi' },
    },
    value: 400,
    },
    }
    };

Public datasets

  • Instagram: 16539 images from 972 Instagram influencers (link)
  • TechCrunchPosts: (link)
  • Tweets: (link)

Primary (available for academic use only, need university affiliation for access)

Secondary (low quality data, not sure if can be used at all)

Logical Reasoning

  • If I implicitly rate pictures of a blue car, that means I might prefer a blue car.
  • If I like posts of self-driving, that means I might prefer a self-driving option.

Scope

Scope 1

/img/content-blog-raw-blog-vehicle-suggestions-untitled-2.png

Scope 2

media content categories: text and images

platforms: facebook, twitter and instagram

implicit rating categories: like, comment, share

columns: userid, timestamp, platform, type, content, rating
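As an illustration of this schema, the sketch below builds a small interaction table with pandas and turns the implicit rating categories into numeric weights. The example rows and the weights (like = 1, comment = 2, share = 3) are assumptions made for the sketch, not values from the project.

```python
import pandas as pd

# Assumed weights: a stronger interaction counts as a stronger implicit rating
RATING_WEIGHTS = {"like": 1.0, "comment": 2.0, "share": 3.0}

events = pd.DataFrame(
    [
        ("u1", "2021-01-05", "twitter", "text", "Autopilot looks great", "like"),
        ("u1", "2021-01-06", "instagram", "image", "blue_truck.jpg", "share"),
        ("u2", "2021-01-06", "facebook", "text", "Camping trip with the family", "comment"),
    ],
    columns=["userid", "timestamp", "platform", "type", "content", "rating"],
)
events["timestamp"] = pd.to_datetime(events["timestamp"])
events["rating_weight"] = events["rating"].map(RATING_WEIGHTS)

# Total implicit-rating weight per user
per_user = events.groupby("userid")["rating_weight"].sum()
print(per_user)
```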

Model Framework

Model framework 1

  1. Convert user's natural language query into vector using Universal Sentence Embedding model
  2. Create a product specs binary matrix based on different categories
  3. Find TopK similar query vectors using cosine distance
  4. For each TopK vector, Find TopM product specs using interaction table weights
  5. For each TopM specification, find TopN similar specs using binary matrix
  6. Show all the qualified product specifications

Model framework 2

  1. Seed data: 10 users with ground-truth persona, media content and implicit ratings
  2. Inflated data: 10 users with media content and implicit ratings
  3. media content → Implicit rating (A)
  4. media content → feature vector (B) + (A) → weighted pooling → similar users (C)
  5. media content → QA model → slot filling → global pooling → item associations (D)
  6. (C) → content-based filtering → item recommendations → (D) → top-k recommendations

User selection

Model framework 3

User-User Similarity (clustering)

  • User → Media content → Embedding → Average pooling
  • Cosine Similarity of user's social vector with other user's social vector

User-Item Similarity (reranking)

  • User → Implicit Rating on media content M → M's correlation with item features
  • Item features: familySize
  • Cosine Similarity of user's social vector with item's feature vector
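The user-user part of this framework can be sketched as follows. It assumes each user's media content has already been embedded; the toy 2-d embeddings and the helper names `user_vector` and `cosine_similarity_matrix` are hypothetical.

```python
import numpy as np

def user_vector(content_embeddings):
    """Average pooling: one social vector per user from all content embeddings."""
    return np.mean(content_embeddings, axis=0)

def cosine_similarity_matrix(vectors):
    """Pairwise cosine similarity between the rows of `vectors`."""
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    return unit @ unit.T

# Toy content embeddings per user (assumed data)
users = {
    "u1": np.array([[1.0, 0.0], [0.8, 0.2]]),
    "u2": np.array([[0.9, 0.1]]),
    "u3": np.array([[0.0, 1.0], [0.2, 0.8]]),
}
names = list(users)
social = np.stack([user_vector(users[n]) for n in names])
sims = cosine_similarity_matrix(social)
print(sims.round(3))
```

The user-item step follows the same pattern, with an item feature vector in place of the second user's social vector.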

Model framework 4

/img/content-blog-raw-blog-vehicle-suggestions-untitled-3.png

Text → Prepare → Vectorize → Average → Similar Users

Image → Prepare → Vectorize → Average → Similar Users

Text → Prepare → QA → Slot filling

Image → Prepare → VQA → Slot filling

Image → Similar Image from users → Detailed enquiry

Model framework 5

  1. Topic Clusters Text
  2. Topic Clusters Image
  3. Fetch raw text and images
  4. Combine, Clean and Store text in text dataframe
  5. Vectorize Texts
  6. Cosine similarities of texts with topic clusters
  7. Vectorize Images
  8. Cosine similarities of images with topic clusters

Experimental Setup

  • Experiment 1
    import numpy as np
    import pandas as pd
    import tensorflow_hub as hub
    from itertools import product
    from sklearn.preprocessing import OneHotEncoder
    from sklearn.metrics.pairwise import cosine_similarity

    vehicle = ['modelX', 'cyberTruck']
    trim = ['adventure', 'base']
    exteriorColor = ['whiteExterior', 'blueExterior', 'silverExterior', 'greyExterior', 'blackExterior', 'redExterior', 'greenExterior']
    wheels = ['20AllTerrain', '21AllSeason', '22Performance']
    tonneau = ['powerTonneau', 'manualTonneau']
    interiorColor = ['blackInterior', 'greyInterior', 'greenInterior']
    # Named range_options to avoid shadowing the built-in range()
    range_options = ['standardRange', 'mediumRange', 'extendedRange']
    packages = ['offroadPackage', 'matchingSpareTire', 'offroadPackage,matchingSpareTire', 'None']
    interiorAddons = ['wirelessCharger', 'None']
    software = ['selfDrivingPackage', 'None']

    specs_cols = ['vehicle', 'trim', 'exteriorColor', 'wheels', 'tonneau', 'interiorColor', 'range', 'packages', 'interiorAddons', 'software']
    # Every possible configuration, one row per combination of options
    specs_frame = pd.DataFrame(list(product(vehicle, trim, exteriorColor, wheels, tonneau, interiorColor, range_options, packages, interiorAddons, software)),
                               columns=specs_cols)

    # One-hot encode the configurations (scikit-learn < 1.2 uses sparse=False instead of sparse_output=False)
    enc = OneHotEncoder(handle_unknown='error', sparse_output=False)
    specs = pd.DataFrame(enc.fit_transform(specs_frame))

    specs_ids = specs.index.tolist()

    query_list = ["I'm looking for a fast suv that I can go camping without worrying about recharging",
    "cheap red car that is able to go long distances",
    "i am looking for a daily driver that i can charge everyday, do not need any extras",
    "i like to go offroading a lot on my jeep and i want to do the same with the truck",
    "i want the most basic suv possible",
    "I want all of the addons",
    "I have a big family and want to be able to take them around town and run errands without worrying about charging"]

    queries = pd.DataFrame(query_list, columns=['query'])
    query_ids = queries.index.tolist()

    const_oneJSON = {
    'vehicle': 'modelX',
    'trim' : 'adventure',
    'exteriorColor': 'whiteExterior',
    'wheels': "22Performance",
    'tonneau': "powerTonneau",
    'packages': "None",
    'interiorAddons': "None",
    'interiorColor': "blackInterior",
    'range': "extendedRange",
    'software': "None",
    }

    const_twoJSON = {
    'vehicle': 'cyberTruck',
    'trim' : 'base',
    'exteriorColor': 'whiteExterior',
    'wheels': "21AllSeason",
    'tonneau': "powerTonneau",
    'packages': "None",
    'interiorAddons': "None",
    'interiorColor': "blackInterior",
    'range': "extendedRange",
    'software': "None",
    }

    const_threeJSON = {
    'vehicle': 'cyberTruck',
    'trim' : 'base',
    'exteriorColor': 'whiteExterior',
    'wheels': "21AllSeason",
    'tonneau': "powerTonneau",
    'packages': "None",
    'interiorAddons': "None",
    'interiorColor': "blackInterior",
    'range': "standardRange",
    'software': "None",
    }

    const_fourJSON = {
    'vehicle': 'cyberTruck',
    'trim' : 'adventure',
    'exteriorColor': 'whiteExterior',
    'wheels': "20AllTerrain",
    'tonneau': "powerTonneau",
    'packages': "offroadPackage,matchingSpareTire",
    'interiorAddons': "None",
    'interiorColor': "blackInterior",
    'range': "extendedRange",
    'software': "None",
    }

    const_fiveJSON = {
    'vehicle': 'modelX',
    'trim' : 'base',
    'exteriorColor': 'whiteExterior',
    'wheels': "20AllTerrain",
    'tonneau': "manualTonneau",
    'packages': "None",
    'interiorAddons': "None",
    'interiorColor': "blackInterior",
    'range': "standardRange",
    'software': "None",
    }

    const_sixJSON = {
    'vehicle': 'cyberTruck',
    'trim' : 'adventure',
    'exteriorColor': 'whiteExterior',
    'wheels': "20AllTerrain",
    'tonneau': "powerTonneau",
    'packages': "offroadPackage,matchingSpareTire",
    'interiorAddons': "wirelessCharger",
    'interiorColor': "blackInterior",
    'range': "extendedRange",
    'software': "selfDrivingPackage",
    }

    const_sevenJSON = {
    'vehicle': 'modelX',
    'trim' : 'base',
    'exteriorColor': 'whiteExterior',
    'wheels': "21AllSeason",
    'tonneau': "powerTonneau",
    'packages': "None",
    'interiorAddons': "None",
    'interiorColor': "blackInterior",
    'range': "mediumRange",
    'software': "None",
    }

    historical_data = pd.DataFrame([const_oneJSON, const_twoJSON, const_threeJSON, const_fourJSON, const_fiveJSON, const_sixJSON, const_sevenJSON])

    # One-hot encode the first historical configuration and find its closest row in the specs matrix
    input_vec = enc.transform(historical_data[specs_cols].iloc[[0]])
    idx = np.argsort(-cosine_similarity(input_vec, specs.values))[0, :1]
    rslt = enc.inverse_transform(specs.iloc[idx].values)

    interactions = pd.DataFrame(columns=['query_id','specs_id'])
    interactions['query_id'] = queries.index.tolist()
    # Encode all historical configurations, with columns in the order the encoder was fitted on
    input_vecs = enc.transform(historical_data[specs_cols])
    interactions['specs_id'] = np.argsort(-cosine_similarity(input_vecs, specs.values))[:,0]

    module_url = "https://tfhub.dev/google/universal-sentence-encoder/4"
    embed_model = hub.load(module_url)
    def embed(texts):
        # Encode a list of strings into Universal Sentence Encoder vectors
        return embed_model(texts)
    query_vecs = embed(queries['query'].tolist()).numpy()

    _query = input('Please enter query: ') or 'i want the most basic suv possible'
    _query_vec = embed([_query]).numpy()
    _match_qid = np.argsort(-cosine_similarity(_query_vec, query_vecs))[0,:][:1]
    _match_sid = interactions.loc[interactions['query_id']==_match_qid[0], 'specs_id'].values[0]
    # Take the 5 configurations closest to the matched one and decode them back to option names
    idx = np.argsort(-cosine_similarity([specs.iloc[_match_sid].values], specs.values))[0,:][:5]
    results = enc.inverse_transform(specs.iloc[idx].values)
    _temp = pd.DataFrame(results, columns=specs_cols)
    print(_temp)

Experiment 2

Celeb Scraping

Facebook Scraping

/img/content-blog-raw-blog-vehicle-suggestions-untitled-4.png

Twitter Scraping

/img/content-blog-raw-blog-vehicle-suggestions-untitled-5.png

Dataframe

/img/content-blog-raw-blog-vehicle-suggestions-untitled-6.png

Insta Image Grid

/img/content-blog-raw-blog-vehicle-suggestions-untitled-7.png

User Text NER

/img/content-blog-raw-blog-vehicle-suggestions-untitled-8.png

Experiment 3

Topic model

Topic scores

/img/content-blog-raw-blog-vehicle-suggestions-untitled-9.png

JSON rules

/img/content-blog-raw-blog-vehicle-suggestions-untitled-10.png

Results and Discussion

  • API with 3 input fields - Facebook username, Twitter handle & Instagram username
  • The system will automatically scrape the user's publicly available text and images from these 3 social media platforms and provide a list of recommendations, ordered from most to least preferred product