WS-NLP (Word & Sentence Natural Language Processing) Similarity Service


This research designs a Word & Sentence Natural Language Processing (WS-NLP) Similarity Service that uses the WordNet lexical database as its knowledge graph to calculate the similarity between words, sentences, paragraphs, and documents.

About the Current Project

The similarity service first looks up which synset each given word belongs to in WordNet. With the two synsets identified, the Word Similarity Calculation module uses Uniform-Cost Search to traverse the synsets in WordNet and find the shortest path between them. Once the shortest path is found, the similarity between the two words is the reciprocal of the edge difference of that path.
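
The word-level step can be sketched roughly as follows. This is a minimal illustration, not the service's actual code: the toy graph, the unit edge costs, and the function names are all assumptions, and "edge difference" is assumed here to mean the number of edges on the shortest path (with identical nodes scored as 1.0 to avoid dividing by zero).

```python
import heapq

# Toy stand-in for a fragment of WordNet (hypothetical data); every edge costs 1.
TOY_GRAPH = {
    "dog": {"canine": 1},
    "canine": {"dog": 1, "carnivore": 1},
    "carnivore": {"canine": 1, "feline": 1},
    "feline": {"carnivore": 1, "cat": 1},
    "cat": {"feline": 1},
}

def shortest_path_length(graph, start, goal):
    """Uniform-cost search: always expand the cheapest frontier node first."""
    frontier = [(0, start)]
    best = {start: 0}
    while frontier:
        cost, node = heapq.heappop(frontier)
        if node == goal:
            return cost
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry; a cheaper route was already found
        for neighbor, weight in graph.get(node, {}).items():
            new_cost = cost + weight
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                heapq.heappush(frontier, (new_cost, neighbor))
    return None  # goal not reachable from start

def word_similarity(graph, a, b):
    """Reciprocal of the path length (assumed reading of 'edge difference')."""
    distance = shortest_path_length(graph, a, b)
    if distance is None:
        return 0.0  # assumption: unconnected words score zero
    if distance == 0:
        return 1.0  # assumption: identical nodes score one
    return 1.0 / distance

print(word_similarity(TOY_GRAPH, "dog", "cat"))
```

In this toy graph "dog" and "cat" are four edges apart, so the sketch scores them at 0.25.
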
A matrix representing the similarity between the words of two sentences is then calculated. The matrix is sent to the Sentence Similarity Calculation module, which determines the similarity between the two sentences using Munkres' Assignment Algorithm, a combinatorial optimization algorithm that finds the optimal pairing between two sets, to match the most similar words in the two sentences.
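
The pairing step can be illustrated with a small sketch. To stay dependency-free it uses a brute-force search over permutations as a stand-in for Munkres' algorithm; for small matrices both find the same optimal pairing, but the brute-force version does not scale. The function names and the way the matched scores are averaged into one sentence score (dividing by the longer sentence's word count) are assumptions for illustration, not the service's documented behaviour.

```python
from itertools import permutations

def best_pairing(sim):
    """Return (pairs, total) for the word pairing with the highest summed similarity.

    sim[i][j] is the similarity between word i of sentence A and word j of sentence B.
    Brute force over permutations; a stand-in for Munkres' algorithm on small inputs.
    """
    rows, cols = len(sim), len(sim[0])
    transposed = rows > cols
    if transposed:
        # Work on the transpose so that there are never more rows than columns.
        sim = [list(column) for column in zip(*sim)]
        rows, cols = cols, rows
    best_total, best_cols = -1.0, ()
    for chosen in permutations(range(cols), rows):
        total = sum(sim[i][j] for i, j in enumerate(chosen))
        if total > best_total:
            best_total, best_cols = total, chosen
    pairs = [(j, i) if transposed else (i, j) for i, j in enumerate(best_cols)]
    return pairs, best_total

def sentence_similarity(sim):
    """Assumed aggregation: matched similarity averaged over the longer sentence."""
    _, total = best_pairing(sim)
    return total / max(len(sim), len(sim[0]))

# Hypothetical word-similarity matrix: 2 words in sentence A, 3 words in sentence B.
matrix = [
    [1.0, 0.25, 0.5],
    [0.25, 0.5, 0.25],
]
print(best_pairing(matrix))
print(sentence_similarity(matrix))
```

Here word 0 of sentence A pairs with word 0 of sentence B and word 1 with word 1, for a total of 1.5; one word of the longer sentence stays unmatched, which is why the assumed average divides by three.
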

Services

Version 3.2

Demo version 3.2 includes a field to use the canonical method, which compares the root form of each word (for example, "run" instead of "running" or "ran"). In addition, costs are being precalculated to speed up English comparisons. Checking this option will ignore words whose costs have not been calculated yet.


Version 3.0

Demo version 3.0 includes an email field to track system usage and the ability to use the N-gram/POS service, which will preprocess the sentences to extract valid N-grams and parts of speech before the similarity calculation is made.


Version 2.1

Demo version 2.1 accesses bridge version 2.1 so that it can prepare a similarity calculation request that uses the Maximum Bipartite Matching Algorithm and specifies one of the supported languages: English (en), French (fr), or Hindi (hi).


Version 2.0a

Demo version 2 (alpha) accesses bridge version 1 and implements maxBPM in JavaScript on the client side.


Version 1.0

Demo version 1 accesses bridge version 1 and has no visual output; it only shows the JSON result in the console.


Version 1.0a

Demo version 1 (alpha) accesses bridge v1.alpha and uses the extra data the bridge returns to show a progress indicator, so users know that the system is working.


Terms of Use

The VIP Research Group is a research group led by Prof. Maiga Chang (https://www.athabascau.ca/science-and-technology/our-people/maiga-chang.html) at the School of Computing and Information Systems, Athabasca University. This "multi-sentence similarity calculation web service" (https://ws-nlp.vipresearch.ca/) is one of the research group's works. The research group has follow-up research plans to improve it and to use it in other research projects.

Almost all of Prof. Chang's works are open access (or open source). The web service (https://ws-nlp.vipresearch.ca/) is currently open access, and there is no plan to make it open source. Like all the other research projects (see http://maiga.athabascau.ca/#advanced), it runs on a self-sponsored server and will stay online, improving, and accessible as long as the cost remains affordable and can be covered by Prof. Chang.

Of course, if the access volume of the web service becomes high, or if any business takes commercial advantage of it to make money, the terms of use may change; for example, donations, personal/academic/business licenses, or subscription modes could be introduced. However, it is too early to say.

How to Access

Downloads

LORD Moodle Plugin

This plugin determines the similarity between all the learning activities in a course and uses the similarity to configure a network graph of the activities.

SAS Moodle Plugin

The ShortAnswerSimilarity plugin extracts the text from the answer provided by the teacher and from the student's response. Once the two strings are extracted, the similarity between the two multi-sentence texts is calculated by the VIP Research Group's multi-sentence similarity calculator web service.

About Us

Our Mission

Our research aims to provide a sentence similarity service that measures the closeness of two or more sentences or paragraphs using Natural Language Processing and WordNet.

Our Supervisor

Dr. Maiga Chang is a Full Professor in the School of Computing and Information Systems at Athabasca University, Canada.

Research Goal

This research designs a Word & Sentence Natural Language Processing (WS-NLP) Similarity Service that uses the WordNet lexical database as its knowledge graph to calculate the similarity between words, sentences, paragraphs, and documents.

Our Team

Bhavesh GANDHI
2021 (current)

Bhavesh Gandhi is an undergraduate student pursuing a degree in Electrical and Electronics Engineering at Heritage Institute of Technology, India. His research interests lie in Machine Learning and Natural Language Processing.

Yash SRIVASTAVA
2021

Yash Srivastava is an undergraduate student studying computer engineering at the University of Alberta. Yash wants to pursue a career in Software Engineering and work on projects that use new technologies to improve people's lives. In his spare time, Yash loves to exercise and play basketball.

Theodore KRAHN
2019-2021

Theodore Krahn (Ted Krahn) received his BSc degree in Computing and Information Systems (BSc CIS) from Athabasca University in 2018 and started his MScIS studies at AU in 2019. Ted is the lead developer of the LORD Moodle Plugin.

Radomir WASOWSKI
2020

Radomir Wasowski is a Computer Engineering undergraduate student at the University of Alberta, Canada. He is most interested in interdisciplinary applications of digital technology, such as NLP.

Videos


Presentation Video

Live demonstrations of the outcome of 12 weeks of work (June 2021 to August 2021). This research uses Natural Language Processing basics with DBpedia to identify valid n-gram words and important part-of-speech tags. The research outcome implements services that accept users' requests in JSON to help them verify valid part-of-speech tags and identify valid n-grams. The implementation involves Python, PHP, JavaScript (AJAX and JSON), and DBpedia.

  1. Stage 1: An automated system to extract and store valid N-grams and their POS tags from DBpedia.
  2. Stage 2: Developing the API service.


Stage 1: N-gram Extraction and Storage

Stage 1's major features include (but are not limited to):

  1. To extract and store Valid N-grams and their POS tags from DBpedia.
  2. Cron jobs for the backend services.
  3. Dashboard that shows backend services' working progress.


Stage 2: The API Service

Stage 2's major features include (but are not limited to):

  1. Developing an API service.
  2. Using the stored N-grams and their POS tags to build a service that returns the information users request.
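
A lookup of this kind might look roughly like the sketch below. Everything in it is hypothetical: the stored data, the function names, and the JSON field names ("sentence", "valid_ngrams") are invented for illustration, since the real service's request and response formats are not described here. It only shows the general idea of matching a sentence's candidate n-grams against a stored table of valid n-grams and their POS tags.

```python
import json

# Hypothetical stand-in for the N-grams and POS tags stored in Stage 1.
VALID_NGRAMS = {
    "natural language processing": "NOUN",
    "machine learning": "NOUN",
    "new york": "PROPN",
}

def candidate_ngrams(tokens, max_n=3):
    """Yield every run of 2..max_n consecutive tokens, longest runs first."""
    for n in range(max_n, 1, -1):
        for start in range(len(tokens) - n + 1):
            yield " ".join(tokens[start:start + n])

def handle_request(request_json):
    """Take a JSON request and return a JSON response listing valid n-grams."""
    request = json.loads(request_json)
    tokens = request["sentence"].lower().split()
    found = [
        {"ngram": gram, "pos": VALID_NGRAMS[gram]}
        for gram in candidate_ngrams(tokens)
        if gram in VALID_NGRAMS
    ]
    return json.dumps({"valid_ngrams": found})

print(handle_request('{"sentence": "I study natural language processing in New York"}'))
```

A real service would also need proper tokenization and error handling for malformed requests, which this sketch leaves out.
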

Frequently Asked Questions