Word Similarity algorithm for Merging Thai Herb Information from Heterogeneous Data Sources

Online Herbs Shopping Project

This paper proposes two processes for merging Thai herb information obtained from heterogeneous data sources. The objective is to combine different formats of Thai herb information into one consistent representation. The processes are implemented in the Sourcing and Merging Agent (SMA) of a Multi-Agent Thai Herb Recommendation system (MA_THR). The first process finds and merges records of the same Thai herb that appear under different names. The second process finds synonyms of symptoms. Experiments give 93% accuracy for merging Thai herb information by name and 97% accuracy for finding the similarity between symptoms.

Thai herb information is publicly available from various data sources [1-5]. However, these sources present the information at different levels of detail and in different formats. For example, most sources give a name for each Thai herb, but some give only a common name (or names) without a scientific name. More seriously, while most sources list the symptoms that can be treated by each Thai herb or part of a Thai herb, these symptom lists differ because the same symptom can be called by different names. The differences between these lists can be reduced by finding synonyms of symptom names. This paper proposes two processes: one for merging Thai herb names and another for finding similarities between symptoms.

We implement the proposed algorithm in our Multi-Agent Thai Herb Recommendation system (MA_THR) [6, 7], which is briefly explained here. The MA_THR system has components related to the proposed work, as shown in Figure 1. A number of Wrapper Agents (WA) retrieve Thai herb information from various databases. In addition, one Web Extraction Agent (WEA) extracts Thai herb information from multiple websites. The information is sent to a Sourcing and Merging Agent (SMA), which merges it into one knowledge base that is then stored at a Thai Herb Management Agent (THMA). The Center Agent (CA) is responsible for communication among these agents.
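As a rough illustration of the first process, the sketch below merges herb records from several sources whenever they share a name. It is not the MA_THR implementation: the record layout (`names`, `symptoms`), the helper names `normalize` and `merge_herb_records`, the case and whitespace normalization, and the sample records are all assumptions made for this example.

```python
def normalize(name: str) -> str:
    """Lower-case a name and collapse whitespace (an assumed normalization)."""
    return " ".join(name.lower().split())


def merge_herb_records(records):
    """Merge records that share at least one exactly matching name.

    Each input record is a dict with a list of "names" (common and/or
    scientific) and a list of "symptoms". The result is a list of dicts
    holding the union of names and symptoms for each distinct herb.
    """
    merged = []
    for rec in records:
        names = {normalize(n) for n in rec["names"]}
        symptoms = set(rec["symptoms"])
        keep = []
        for entry in merged:
            if entry["names"] & names:
                # Same herb seen before under a shared name: absorb it.
                names |= entry["names"]
                symptoms |= entry["symptoms"]
            else:
                keep.append(entry)
        keep.append({"names": names, "symptoms": symptoms})
        merged = keep
    return merged


# Illustrative records only, not data from the paper.
records = [
    {"names": ["Fa Thalai Chon", "Andrographis paniculata"], "symptoms": ["fever"]},
    {"names": ["fa thalai chon"], "symptoms": ["sore throat"]},
    {"names": ["Curcuma longa"], "symptoms": ["indigestion"]},
]
merged = merge_herb_records(records)
```

In this sketch the first two records collapse into one entry because they share a common name, so the source that lacks a scientific name inherits it from the other source.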

This paper proposes a process for merging Thai herb names and finding similarity between symptoms from heterogeneous data sources. An exact string matching algorithm is used to merge Thai herb names from different sources, with an accuracy of 93%. To find similar symptoms, sub-organ similarity tables and symptom word similarity tables, together with a list of symptoms affecting the same sub-organ, are used as references to calculate similarity. The symptom similarity calculation is based on a modified edit distance computed with dynamic programming. When the algorithm is applied to 1139 symptoms (71,417 pairs), it achieves an accuracy of 97%.
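The text here does not spell out how the edit distance is modified, so the following is only one plausible reading: a word-level edit distance in which the substitution cost is lowered for word pairs found in a similarity table. The function names, the table format (a dict keyed by `frozenset` pairs), the normalization to a score between 0 and 1, and the sample table entry are assumptions for this sketch, and the sub-organ tables are left out.

```python
def word_similarity(a: str, b: str, table: dict) -> float:
    """Return 1.0 for identical words, else the table value (default 0.0)."""
    if a == b:
        return 1.0
    return table.get(frozenset((a, b)), 0.0)


def symptom_similarity(s1: str, s2: str, table: dict) -> float:
    """Word-level edit distance with table-weighted substitutions,
    converted to a similarity score in [0, 1]."""
    w1, w2 = s1.lower().split(), s2.lower().split()
    n, m = len(w1), len(w2)
    if n == 0 and m == 0:
        return 1.0
    # dist[i][j] = cost of turning the first i words of w1
    # into the first j words of w2.
    dist = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = float(i)
    for j in range(1, m + 1):
        dist[0][j] = float(j)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Similar words are cheap to substitute; unrelated words cost 1.
            sub = 1.0 - word_similarity(w1[i - 1], w2[j - 1], table)
            dist[i][j] = min(
                dist[i - 1][j] + 1.0,      # delete a word
                dist[i][j - 1] + 1.0,      # insert a word
                dist[i - 1][j - 1] + sub,  # substitute (or match)
            )
    return 1.0 - dist[n][m] / max(n, m)


# Hypothetical similarity table entry, not taken from the paper.
table = {frozenset(("pain", "ache")): 0.8}
score = symptom_similarity("stomach pain", "stomach ache", table)
```

With this table, "stomach pain" and "stomach ache" score close to 1 because only one word differs and that pair is listed as similar, while two symptoms with no shared or related words score 0.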
