About the Current Project
The similarity service first looks up which synset each given word belongs to in WordNet. With the two synsets identified, the Word Similarity Calculation module uses Uniform-Cost Search to traverse the synsets in WordNet and find the shortest path between them. Once the shortest path is found, the similarity between the two words is the reciprocal of the edge distance of that path.
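As an illustration only, the word-level step can be sketched in Python. The graph below is a hypothetical toy graph, not real WordNet data, and the function names are made up for this sketch. Treating every edge as cost 1, returning 1.0 when both words fall in the same synset, and returning 0.0 when no path exists are assumptions; the service's actual handling of those cases is not described here.

```python
import heapq

# Hypothetical toy graph standing in for WordNet: each key is a synset
# label and each value lists its neighbouring synsets.
TOY_GRAPH = {
    "dog": ["canine"],
    "canine": ["dog", "carnivore"],
    "cat": ["feline"],
    "feline": ["cat", "carnivore"],
    "carnivore": ["canine", "feline"],
}


def shortest_path_length(graph, start, goal):
    """Uniform-cost search with unit edge costs.

    Returns the number of edges on the cheapest path, or None if the
    goal cannot be reached from the start.
    """
    frontier = [(0, start)]          # priority queue of (cost so far, node)
    best = {start: 0}                # cheapest known cost per node
    while frontier:
        cost, node = heapq.heappop(frontier)
        if node == goal:
            return cost
        if cost > best.get(node, float("inf")):
            continue                 # stale queue entry, a cheaper one was seen
        for neighbour in graph.get(node, []):
            new_cost = cost + 1      # assumption: every edge costs 1
            if new_cost < best.get(neighbour, float("inf")):
                best[neighbour] = new_cost
                heapq.heappush(frontier, (new_cost, neighbour))
    return None


def word_similarity(graph, synset_a, synset_b):
    """Reciprocal of the shortest path's edge distance."""
    distance = shortest_path_length(graph, synset_a, synset_b)
    if distance is None:
        return 0.0                   # assumption: unreachable means no similarity
    if distance == 0:
        return 1.0                   # assumption: same synset means identical
    return 1.0 / distance


print(word_similarity(TOY_GRAPH, "dog", "cat"))
```

In this toy graph the path from "dog" to "cat" has four edges, so the sketch yields a similarity of 0.25.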
A matrix representing the similarity between the words of two sentences is then calculated. The matrix is sent to the Sentence Similarity Calculation module, which determines the similarity between the two sentences with Munkres' Assignment Algorithm, a combinatorial optimization algorithm that finds the optimal pairing between two sets, to match the most similar words in the two sentences.
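To make the pairing step concrete, here is a minimal Python sketch. It does not implement Munkres' algorithm itself: it swaps in a brute-force search over all pairings, which finds the same optimal assignment on small matrices but is far slower on large ones. The function name and the example matrix values are hypothetical, and how the service turns the matched pairs into a final sentence score is not covered here.

```python
from itertools import permutations


def best_pairing(sim):
    """Find the word pairing with the highest total similarity.

    sim[i][j] is the similarity between word i of the first sentence and
    word j of the second. Brute-force stand-in for Munkres' algorithm.
    Returns (pairs, total), where pairs is a list of (i, j) tuples.
    """
    n_rows, n_cols = len(sim), len(sim[0])
    transposed = n_rows > n_cols
    if transposed:
        # Work on the transpose so that rows are always the shorter side.
        sim = [list(col) for col in zip(*sim)]
        n_rows, n_cols = n_cols, n_rows

    best_total, best_cols = float("-inf"), ()
    for cols in permutations(range(n_cols), n_rows):
        total = sum(sim[row][col] for row, col in enumerate(cols))
        if total > best_total:
            best_total, best_cols = total, cols

    # Map indices back to the original orientation if we transposed.
    pairs = [(col, row) if transposed else (row, col)
             for row, col in enumerate(best_cols)]
    return pairs, best_total


# Hypothetical similarities: a 2-word sentence against a 3-word sentence.
matrix = [
    [1.0, 0.2, 0.25],
    [0.2, 0.5, 1.0],
]
pairs, total = best_pairing(matrix)
print(pairs, total)
```

Each word is matched at most once, so the longer sentence keeps one unmatched word in this example.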
Services
Version 3.2
Demo version 3.2 includes a field to use the canonical method, which uses the root of each word (run instead of running or ran). Also, to speed up the English comparisons, costs are being precalculated. Checking this option will ignore words whose costs have not been calculated yet.
Version 3.0
Demo version 3.0 includes an email field to track system usage and the ability to use the N-gram/POS service, which will preprocess the sentences to extract valid N-grams and parts of speech before the similarity calculation is made.
Version 2.1
Demo version 2.1 accesses bridge version 2.1 so it can prepare a similarity calculation request that includes the use of the Maximum Bipartite Matching algorithm as well as the languages English (en), French (fr), and Hindi (hi).
Version 2.0a
Demo version 2 (alpha) accesses bridge version 1 and implements maxBPM in JavaScript on the client side.
Version 1.0
Demo version 1 accesses bridge version 1 and has no visual response; it only shows the JSON result in the console.
Version 1.0a
Demo version 1 (alpha) accesses bridge v1.alpha and uses the extra data the bridge returns to show a progress indicator, so users know that the system is working.
Terms of Use
The VIP Research Group is a research group led by Prof. Maiga Chang (https://www.athabascau.ca/science-and-technology/our-people/maiga-chang.html) at the School of Computing and Information Systems, Athabasca University. This "multi-sentence similarity calculation web service" (https://ws-nlp.vipresearch.ca/) is one of the research group's works. The research group has follow-up research plans to improve it and to use it further in other research projects.
Almost all of Prof. Chang's works are open access (or open source). The web service (https://ws-nlp.vipresearch.ca/) is now open access, and there is no plan to make it open source. It runs on a self-sponsored server; like all of the other research projects (see http://maiga.athabascau.ca/#advanced), it will stay online, improving, and accessible as long as the cost remains affordable and is covered by Prof. Chang.
Of course, if the access volume of the web service becomes high, or if any business or commercial entity takes advantage of it to make money, the terms of use may change; examples include donations, personal/academic/business licenses, and subscription modes. However, it is really too early to say.
How to Access
Downloads
LORD Moodle Plugin
This plugin determines the similarity between all the learning activities in a course and uses the similarity to configure a network graph of the activities.
SAS Moodle Plugin
The ShortAnswerSimilarity plugin extracts the text from the answer provided by the teacher and from the student's response. Once the two strings are extracted, the similarity between the two multi-sentences is calculated by the VIP Research Group's multi-sentence similarity calculator web service.
About Us
Our Mission
Our research aims to provide a sentence similarity service that measures the closeness of two or more sentences or paragraphs using Natural Language Processing and WordNet.
Our Supervisor
Dr. Maiga Chang is a Full Professor in the School of Computing and Information Systems at Athabasca University, Canada.
Research Goal
This research designs a Word & Sentence Natural Language Processing (WS-NLP) Similarity Service that uses the WordNet lexical database as the knowledge graph to calculate the similarity between words, sentences, paragraphs, and documents.
Our Team
Bhavesh GANDHI
2021 (current)
Bhavesh Gandhi is an undergraduate student pursuing a degree in Electrical and Electronics Engineering at Heritage Institute of Technology, India. His research interests lie in Machine Learning and Natural Language Processing.
Yash SRIVASTAVA
2021
Yash Srivastava is an undergraduate student studying computer engineering at the University of Alberta. Yash wants to pursue a career in Software Engineering and work on projects that use new technologies to improve people's lives. In his spare time, Yash loves to exercise and play basketball.
Theodore KRAHN
2019-2021
Theodore Krahn (Ted Krahn) received the BSc degree in Computing and Information Systems (BSc CIS) from Athabasca University in 2018 and started his MScIS study at AU in 2019. Ted is the lead developer of the LORD Moodle Plugin.
Radomir WASOWSKI
2020
Radomir Wasowski is a Computer Engineering undergraduate student at the University of Alberta, Canada. He is most interested in interdisciplinary applications of digital technology, such as NLP.
Videos
Presentation Video
Live demonstrations of the outcome of 12 weeks of work (June 2021 to August 2021). This research uses Natural Language Processing basics with DBpedia to identify valid n-gram words and important part-of-speech tags. The research outcome implements services that take users' requests in JSON to help them verify valid part-of-speech tags and identify valid n-grams. The research outcome involves Python, PHP, JavaScript (AJAX and JSON), and DBpedia.
- Stage 1: Automated system to extract and store valid N-grams and their POS tags from DBpedia.
- Stage 2: Developing the API service.
Stage 1: N-gram Extraction and Storage
Stage 1's major features include (but are not limited to):
- Extracting and storing valid N-grams and their POS tags from DBpedia.
- Cron jobs for the backend services.
- Dashboard that shows backend services' working progress.
Stage 2: The API Service
Stage 2's major features include (but are not limited to):
- Developing an API service.
- Using the stored N-grams and their POS tags to build a service from which users can get the desired information.
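The lookup idea behind the two stages can be sketched roughly as follows. This is a hypothetical illustration, not the service's implementation: the in-memory dictionary stands in for whatever storage Stage 1 actually uses, the POS tags shown are illustrative Penn Treebank-style labels, and the function names and the whitespace tokenization are assumptions.

```python
# Hypothetical store mapping a valid n-gram to its most frequent POS tag.
# In the real system this data would come from the DBpedia extraction.
STORED_NGRAMS = {
    "new york": "NNP",
    "machine learning": "NN",
    "learning": "NN",
}


def candidate_ngrams(tokens, max_n=4):
    """Yield every contiguous n-gram of 1 to max_n tokens as a string."""
    for n in range(1, max_n + 1):
        for start in range(len(tokens) - n + 1):
            yield " ".join(tokens[start:start + n])


def valid_ngrams(sentence, store=STORED_NGRAMS, max_n=4):
    """Return the stored n-grams found in the sentence with their POS tag."""
    tokens = sentence.lower().split()   # assumption: whitespace tokenization
    return {gram: store[gram]
            for gram in candidate_ngrams(tokens, max_n)
            if gram in store}


print(valid_ngrams("Machine learning in New York"))
```

Candidates that are not in the store, such as "in new", are simply dropped, which mirrors the idea of keeping only valid N-grams.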
Frequently Asked Questions
- What can the service be used for?
The service can be used for extracting and validating N-grams and their most frequent POS (part-of-speech) tags.
- What does the sentence similarity service do?
The sentence similarity service calculates the similarity between two sentences and assigns a score to the overall result.
- How is the current service different from other services?
The current service uses the valid N-gram learning service to filter the sentences: if a word makes no sense, it is removed from the sentence while the positions of the remaining words are preserved. The service can also be used to extract N-grams and check the grammatical correctness of sentences (up to 4-grams).
- How can I use this service?
Please go to the How-to section, where you will find the documentation for each service and instructions on how to use it.
- Will there be frequent updates to the services?
Yes, an existing service may be updated or a new service may be added; the webpage will also be updated with the latest information about any new service.