build.py → this script takes the training data as input and saves all the required files in the current working directory
recommend.py → this script takes a user query as input and predicts the top-K BRM recommendations
Variables (during recommendation you will be asked for 2–3 choices; their meanings are as follows, and a sketch of the prompts appears after this list)
top-K — how many items you want returned in the recommendation
secondary items — how many similar items to add to consideration for each primary matching item
sorted by frequency — multiple input queries might point to the same output, so this option takes the frequency count of each output into account and moves the more frequent items to the top
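For illustration, here is a hypothetical sketch of how recommend.py might collect these choices; the artifact name (model.pkl) and the match()/frequency helpers are assumptions, not the actual implementation.

```python
# Hypothetical sketch of the recommendation prompts described above; model.pkl
# and model.match() are placeholders for whatever build.py actually saves.
import pickle

def main():
    query = input("Enter query: ")
    top_k = int(input("top-K (how many recommendations): ") or 5)
    n_secondary = int(input("secondary items per primary match: ") or 2)
    sort_by_freq = input("sort by frequency? [y/N]: ").strip().lower() == "y"

    with open("model.pkl", "rb") as f:      # artifact saved by build.py (assumed name)
        model = pickle.load(f)

    # hypothetical lookup: primary matches plus n_secondary similar items each
    candidates = model.match(query, n_secondary=n_secondary)
    if sort_by_freq:
        # more frequent outputs move to the top
        candidates.sort(key=lambda item: item.frequency, reverse=True)
    for item in candidates[:top_k]:
        print(item)

if __name__ == "__main__":
    main()
```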
Create an API that can parse and classify names and addresses given a string. We tried probablepeople and usaddress; these work well separately, but we need the functionality of the two packages combined, with better accuracy than probablepeople provides.
For the API, I'd like to mimic this with some minor modifications.
A few examples:
"KING JOHN A 5643 ROUTH CREEK PKWY #1314 RICHARDSON TEXAS 750820146 UNITED STATES OF AMERICA" would return type: person; first_name: JOHN; last_name: KING; middle: A; street_address: 5643 ROUTH CREEK PKWY #1314; city: RICHARDSON; state: TEXAS; zip: 75082-0146; country: UNITED STATES OF AMERICA.
"THRM NGUYEN LIVING TRUST 2720 SUMMERTREE CARROLLTON HOUSTON TEXAS 750062646 UNITED STATES OF AMERICA" would return type: entity; name: THRM NGUYEN LIVING TRUST; street_address: 2720 SUMMERTREE CARROLLTON; state: TEXAS; city: HOUSTON; zip: 75006-2646; country: UNITED STATES OF AMERICA.
Input Instance: ANDERSON, EARLINE 1423 NEW YORK AVE FORT WORTH, TX 76104 7522
Output Tags:
- <Type> - Person/Household/Corporation
- <GivenName>, <MiddleName>, <Surname> - if Type Person/Household
- <Name> - Full Name - if Type Person
- <Name> - Household - if Type Household
- <Name> - Corporation - if Type Corporation
- <Address> - Full Address
- <StreetAddress>, <City>, <State>, <Zipcode>, <Country>
- ~~NameConfidence, AddrConfidence~~
Name Endpoint
Input Instance: ANDERSON, EARLINE
Output Tags:
- <Type> - Person/Household/Corporation
- <GivenName>, <MiddleName>, <Surname> - if Type Person/Household
- <Name> - Full Name - if Type Person
- <Name> - Household - if Type Household
- <Name> - Corporation - if Type Corporation
- ~~NameConfidence~~
Address Endpoint
Input Instance: 1423 NEW YORK AVE FORT WORTH, TX 76104 7522
Deployment as a REST API with 3 endpoints: name parse, address parse, and whole-string parse
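A minimal Flask sketch of the three endpoints, assuming the NER model described below is wrapped in a parse_with_model() helper; the routes and helper are placeholders, not the delivered code.

```python
# Minimal sketch of the three REST endpoints; parse_with_model() is a placeholder
# for the trained tagger, and the route names are assumptions.
from flask import Flask, request, jsonify

app = Flask(__name__)

def parse_with_model(text):
    """Placeholder for the NER model call; returns a dict of predicted tags."""
    raise NotImplementedError

@app.route("/parse", methods=["POST"])
def parse_full():
    # whole-string parse: name + address in a single input
    return jsonify(parse_with_model(request.json["text"]))

@app.route("/parse/name", methods=["POST"])
def parse_name():
    return jsonify(parse_with_model(request.json["text"]))

@app.route("/parse/address", methods=["POST"])
def parse_address():
    return jsonify(parse_with_model(request.json["text"]))

if __name__ == "__main__":
    app.run(port=8000)
```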
Framework
Tagging process
I used Doccano (https://github.com/doccano/doccano) for labeling the dataset. This tool is open-source and free to use. I deployed it with the one-click Heroku service (fig 1). After launching the app, log in with the provided credentials and create a project (fig 2). Create the labels and upload the dataset (fig 3). Start the annotation process (fig 4). After enough annotations (you do not need to complete all annotations in one go), go back to projects > edit section and export the data (fig 5). Bring the exported JSON file into Python and run the model training code; the model will automatically be retrained on the new annotations. To make training faster, you can use Nvidia GPU support.
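As a reference, a small sketch of loading the Doccano JSONL export into Python; the span key differs between Doccano versions ("labels" vs "label"), so adjust it to match your export file.

```python
# Sketch of reading a Doccano JSONL export into (text, [(start, end, tag), ...]) pairs.
import json

def load_doccano(path):
    examples = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)
            # key name depends on the Doccano version
            spans = record.get("labels") or record.get("label") or []
            examples.append((record["text"], [(s, e, tag) for s, e, tag in spans]))
    return examples

data = load_doccano("export.jsonl")
print(len(data), "annotated examples")
```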
fig 1: screenshot taken from Doccano's github page
fig 2: Doccano's deployed app homepage
fig 3: create the labels. I defined these labels for my project
fig 5: export the annotations
Model
I first tried a blank spaCy NER model, but it was not giving high-quality results, so I moved to the PyTorch-based Flair NER model. This model was way faster (about 5 minutes of training thanks to GPU support, compared to roughly 1 hour for spaCy) and also much more accurate: F1 scores for all tags were near perfect (close to 1). These scores should improve further with more labeled data. The model is production-ready.
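A generic Flair training setup is sketched below; the embeddings, file names, and hyperparameters are assumptions rather than the exact configuration used, and it expects the annotations converted to CoNLL-style BIO files.

```python
# Generic Flair sequence-tagger training sketch (not the exact configuration used).
from flair.datasets import ColumnCorpus
from flair.embeddings import WordEmbeddings, FlairEmbeddings, StackedEmbeddings
from flair.models import SequenceTagger
from flair.trainers import ModelTrainer

# two-column CoNLL-style files: token, NER tag
corpus = ColumnCorpus("data/", {0: "text", 1: "ner"},
                      train_file="train.txt", dev_file="dev.txt", test_file="test.txt")
tag_dictionary = corpus.make_tag_dictionary(tag_type="ner")  # newer Flair: make_label_dictionary

embeddings = StackedEmbeddings([
    WordEmbeddings("glove"),
    FlairEmbeddings("news-forward"),
    FlairEmbeddings("news-backward"),
])
tagger = SequenceTagger(hidden_size=256, embeddings=embeddings,
                        tag_dictionary=tag_dictionary, tag_type="ner")
ModelTrainer(tagger, corpus).train("models/address-ner",
                                   learning_rate=0.1, mini_batch_size=32, max_epochs=20)
```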
Inference
For OOR, I used the model's output directly for the core tags and created the aggregated tags, such as recipient (an aggregation of the name tags) and address (an aggregation of the address tags like city and state), using simple conditional concatenation. For name-only and address-only inference, I appended a dummy address to the name text and a dummy name to the address text; this way the text passes through the same model, and the required tags are filtered out of the output afterwards.
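A sketch of that dummy-text trick; the dummy strings, tag names, and the predict_tags() placeholder are illustrative, not the actual code.

```python
# Sketch of name-only / address-only inference via dummy padding.
DUMMY_NAME = "JOHN DOE"
DUMMY_ADDRESS = "100 MAIN ST DALLAS TEXAS 75001 UNITED STATES OF AMERICA"
NAME_TAGS = {"Type", "GivenName", "MiddleName", "Surname", "Name"}

def predict_tags(text):
    """Placeholder for the trained NER model; returns a list of (tag, value) pairs."""
    raise NotImplementedError

def parse_name_only(name_text):
    # pad the name with a dummy address so the input looks like a full record,
    # then keep only the name-related tags from the prediction
    tags = predict_tags(f"{name_text} {DUMMY_ADDRESS}")
    return [(t, v) for t, v in tags if t in NAME_TAGS]

def parse_address_only(address_text):
    # pad the address with a dummy name and keep only the address-related tags
    tags = predict_tags(f"{DUMMY_NAME} {address_text}")
    return [(t, v) for t, v in tags if t not in NAME_TAGS]
```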
Deliverable - two paragraph-level distance outputs, one for L and one for Q, each with 35 columns.
For each paragraph, we calculate the L1 distance between consecutive sentences, then take the mean and standard deviation of those distances. For example, if paragraph 1 spans sentences 1 through 5, we first calculate L1(1,2), L1(2,3), L1(3,4) and L1(4,5), and then take the mean and standard deviation of those 4 distances, giving two measures for the paragraph: L1_m and L1_std. We repeat the same calculation for the L2 distance, the cosine distance, and the cosine similarity. We do this for four sets of embeddings: the full-dimensional BERT embeddings and the 100-, 200- and 300-dimensional PCA-reduced BERT embeddings (PCA is a dimension-reduction technique).
In the end, we will have 35 columns for each paragraph: Paragraph ID + #sentences in the paragraph + (cosine_m, cosine_std, cossimilarity_m, cossimilarity_std, L1_m, L1_std, L2_m, L2_std) for each of (all, 100, 200, 300) = 3 + 8×4.
Note: for paragraphs that have only 1 sentence, the std measures are empty.
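A minimal sketch of the per-paragraph statistics, assuming `sent_vecs` holds the ordered sentence embeddings (BERT or PCA-reduced) for one paragraph.

```python
# Per-paragraph mean/std of L1, L2, cosine distance and cosine similarity
# between consecutive sentence embeddings.
import numpy as np
from scipy.spatial.distance import cosine

def paragraph_stats(sent_vecs):
    pairs = list(zip(sent_vecs[:-1], sent_vecs[1:]))      # consecutive sentence pairs
    stats = {"n_sentences": len(sent_vecs)}
    if not pairs:                                          # 1-sentence paragraph: std measures stay empty
        return stats
    l1 = [np.abs(a - b).sum() for a, b in pairs]
    l2 = [np.linalg.norm(a - b) for a, b in pairs]
    cos_dist = [cosine(a, b) for a, b in pairs]            # 1 - cosine similarity
    cos_sim = [1 - d for d in cos_dist]
    for name, vals in [("L1", l1), ("L2", l2), ("cosine", cos_dist), ("cossimilarity", cos_sim)]:
        stats[f"{name}_m"] = float(np.mean(vals))
        stats[f"{name}_std"] = float(np.std(vals))
    return stats
```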
Splitting paragraphs into sentences using 1) the NLTK sentence tokenizer, 2) the spaCy sentence tokenizer, and additionally splitting on two symbols: ":" and "..."
Text preprocessing: lowercasing, removing non-alphanumeric characters, removing null records, removing sentence records (rows) with fewer than 3 words
TF-IDF vectorization
LSA over document-term matrix
Cosine distance calculation between adjacent sentences (rows), as sketched below
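A compact sketch of the TF-IDF → LSA → adjacent-distance steps on a toy list of preprocessed sentences (the real pipeline runs per document, with a larger number of LSA components).

```python
# TF-IDF vectorization, LSA over the document-term matrix, then cosine distance
# between each pair of adjacent sentence rows.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_distances

sentences = ["first preprocessed sentence", "second preprocessed sentence", "third one here"]

tfidf = TfidfVectorizer()
doc_term = tfidf.fit_transform(sentences)            # TF-IDF document-term matrix

lsa = TruncatedSVD(n_components=2, random_state=0)   # LSA (dimensionality chosen per corpus)
vecs = lsa.fit_transform(doc_term)

# cosine distance between adjacent sentences (rows)
adjacent = [cosine_distances(vecs[i:i + 1], vecs[i + 1:i + 2])[0, 0] for i in range(len(vecs) - 1)]
print(adjacent)
```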
```js
// car or truck or no mention of vehicle type means Cyber Truck
// SUV mention means Model X
const one = "I'm looking for a fast suv that I can go camping without worrying about recharging";
const two = "cheap red car that is able to go long distances";
const three = "i am looking for a daily driver that i can charge everyday, do not need any extras";
const four = "i like to go offroading a lot on my jeep and i want to do the same with the truck";
const five = "i want the most basic suv possible";
const six = "I want all of the addons";
// mentions of large family or many people means model x
const seven = "I have a big family and want to be able to take them around town and run errands without worrying about charging";
```
```python
import numpy as np
import pandas as pd
import tensorflow_hub as hub
from itertools import product
from sklearn.preprocessing import OneHotEncoder
from sklearn.metrics.pairwise import cosine_similarity

# Catalogue of all configurable options.
vehicle = ['modelX', 'cyberTruck']
trim = ['adventure', 'base']
exteriorColor = ['whiteExterior', 'blueExterior', 'silverExterior', 'greyExterior',
                 'blackExterior', 'redExterior', 'greenExterior']
wheels = ['20AllTerrain', '21AllSeason', '22Performance']
tonneau = ['powerTonneau', 'manualTonneau']
interiorColor = ['blackInterior', 'greyInterior', 'greenInterior']
range_options = ['standardRange', 'mediumRange', 'extendedRange']  # renamed to avoid shadowing built-in range
packages = ['offroadPackage', 'matchingSpareTire', 'offroadPackage,matchingSpareTire', 'None']
interiorAddons = ['wirelessCharger', 'None']
software = ['selfDrivingPackage', 'None']
specs_cols = ['vehicle', 'trim', 'exteriorColor', 'wheels', 'tonneau', 'interiorColor',
              'range', 'packages', 'interiorAddons', 'software']

# Every possible spec combination, kept both as raw strings and one-hot encoded.
specs_frame = pd.DataFrame(list(product(vehicle, trim, exteriorColor, wheels, tonneau,
                                        interiorColor, range_options, packages,
                                        interiorAddons, software)), columns=specs_cols)
enc = OneHotEncoder(handle_unknown='error', sparse=False)
specs = pd.DataFrame(enc.fit_transform(specs_frame))
specs_ids = specs.index.tolist()

# Historical natural-language queries and the configurations they led to.
query_list = [
    "I'm looking for a fast suv that I can go camping without worrying about recharging",
    "cheap red car that is able to go long distances",
    "i am looking for a daily driver that i can charge everyday, do not need any extras",
    "i like to go offroading a lot on my jeep and i want to do the same with the truck",
    "i want the most basic suv possible",
    "I want all of the addons",
    "I have a big family and want to be able to take them around town and run errands without worrying about charging",
]
queries = pd.DataFrame(query_list, columns=['query'])
query_ids = queries.index.tolist()

const_oneJSON = {'vehicle': 'modelX', 'trim': 'adventure', 'exteriorColor': 'whiteExterior',
                 'wheels': '22Performance', 'tonneau': 'powerTonneau', 'packages': 'None',
                 'interiorAddons': 'None', 'interiorColor': 'blackInterior',
                 'range': 'extendedRange', 'software': 'None'}
const_twoJSON = {'vehicle': 'cyberTruck', 'trim': 'base', 'exteriorColor': 'whiteExterior',
                 'wheels': '21AllSeason', 'tonneau': 'powerTonneau', 'packages': 'None',
                 'interiorAddons': 'None', 'interiorColor': 'blackInterior',
                 'range': 'extendedRange', 'software': 'None'}
const_threeJSON = {'vehicle': 'cyberTruck', 'trim': 'base', 'exteriorColor': 'whiteExterior',
                   'wheels': '21AllSeason', 'tonneau': 'powerTonneau', 'packages': 'None',
                   'interiorAddons': 'None', 'interiorColor': 'blackInterior',
                   'range': 'standardRange', 'software': 'None'}
const_fourJSON = {'vehicle': 'cyberTruck', 'trim': 'adventure', 'exteriorColor': 'whiteExterior',
                  'wheels': '20AllTerrain', 'tonneau': 'powerTonneau',
                  'packages': 'offroadPackage,matchingSpareTire', 'interiorAddons': 'None',
                  'interiorColor': 'blackInterior', 'range': 'extendedRange', 'software': 'None'}
const_fiveJSON = {'vehicle': 'modelX', 'trim': 'base', 'exteriorColor': 'whiteExterior',
                  'wheels': '20AllTerrain', 'tonneau': 'manualTonneau', 'packages': 'None',
                  'interiorAddons': 'None', 'interiorColor': 'blackInterior',
                  'range': 'standardRange', 'software': 'None'}
const_sixJSON = {'vehicle': 'cyberTruck', 'trim': 'adventure', 'exteriorColor': 'whiteExterior',
                 'wheels': '20AllTerrain', 'tonneau': 'powerTonneau',
                 'packages': 'offroadPackage,matchingSpareTire', 'interiorAddons': 'wirelessCharger',
                 'interiorColor': 'blackInterior', 'range': 'extendedRange',
                 'software': 'selfDrivingPackage'}
const_sevenJSON = {'vehicle': 'modelX', 'trim': 'base', 'exteriorColor': 'whiteExterior',
                   'wheels': '21AllSeason', 'tonneau': 'powerTonneau', 'packages': 'None',
                   'interiorAddons': 'None', 'interiorColor': 'blackInterior',
                   'range': 'mediumRange', 'software': 'None'}
historical_data = pd.DataFrame([const_oneJSON, const_twoJSON, const_threeJSON, const_fourJSON,
                                const_fiveJSON, const_sixJSON, const_sevenJSON])

# Sanity check on a single historical configuration: find its closest specs row
# and decode it back to readable option names.
input_vec = enc.transform(historical_data[specs_cols].iloc[[0]])
idx = np.argsort(-cosine_similarity(input_vec, specs.values))[0, :][:1]
rslt = enc.inverse_transform(specs.iloc[idx])

# Map every historical configuration to its closest row in the full specs table.
interactions = pd.DataFrame({'query_id': queries.index.tolist()})
input_vecs = enc.transform(historical_data[specs_cols])   # reorder columns to match the encoder
interactions['specs_id'] = np.argsort(-cosine_similarity(input_vecs, specs.values))[:, 0]

# Universal Sentence Encoder for embedding the free-text queries.
module_url = "https://tfhub.dev/google/universal-sentence-encoder/4"
embed_model = hub.load(module_url)

def embed(texts):
    return embed_model(texts)

query_vecs = embed(queries['query'].tolist()).numpy()

# Match a new query to the most similar historical query, then recommend the
# 5 spec combinations closest to that query's configuration.
_query = input('Please enter query: ') or 'i want the most basic suv possible'
_query_vec = embed([_query]).numpy()
_match_qid = np.argsort(-cosine_similarity(_query_vec, query_vecs))[0, :][:1]
_match_sid = interactions.loc[interactions['query_id'] == _match_qid[0], 'specs_id'].values[0]
idx = np.argsort(-cosine_similarity([specs.iloc[_match_sid].values], specs.values))[0, :][:5]
results = [enc.inverse_transform([specs.iloc[x]]) for x in idx]
_temp = pd.DataFrame(np.array(results).reshape(5, -1), columns=specs_frame.columns)
print(_temp)
```
API with 3 input fields - Facebook username, Twitter handle & Instagram username
The system will automatically scrape the user's publicly available text and images from these 3 social media platforms and return a list of product recommendations ordered from most to least preferred.
A bot that logs daily wellness data to a spreadsheet (using the Airtable API) to help users keep track of their health goals. Connect the assistant to a messaging channel (Twilio) so users can talk to it via SMS and WhatsApp.
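A minimal sketch of the logging piece only: a Twilio SMS/WhatsApp webhook that writes one record to Airtable. The base ID, table name, and field name are placeholders, and the actual assistant would add conversation handling around this.

```python
# Twilio inbound-message webhook that logs the message body to an Airtable table.
import os
import requests
from flask import Flask, request
from twilio.twiml.messaging_response import MessagingResponse

app = Flask(__name__)
AIRTABLE_URL = "https://api.airtable.com/v0/YOUR_BASE_ID/WellnessLog"   # placeholder base/table
HEADERS = {"Authorization": f"Bearer {os.environ['AIRTABLE_API_KEY']}"}

@app.route("/sms", methods=["POST"])
def incoming_message():
    body = request.form["Body"]                      # e.g. "slept 7h, 30 min walk"
    # create one Airtable record; "Entry" is a placeholder field name
    requests.post(AIRTABLE_URL, headers=HEADERS, json={"fields": {"Entry": body}})
    reply = MessagingResponse()
    reply.message("Logged! Keep up the good work.")
    return str(reply)
```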