Please use this identifier to cite or link to this item:
https://scholar.ptuk.edu.ps/handle/123456789/156
cc-by
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Badawi, Dima | - |
dc.date.accessioned | 2018-12-03T12:23:14Z | - |
dc.date.available | 2018-12-03T12:23:14Z | - |
dc.date.issued | 2015-06 | - |
dc.identifier.citation | Termset Selection and Weighting in Binary Text Classification | en_US |
dc.identifier.uri | https://scholar.ptuk.edu.ps/handle/123456789/156 | - |
dc.description.abstract | In this dissertation, a new framework based on the joint occurrence statistics of terms is proposed for termset selection and weighting. Each termset is evaluated by taking into account both the simultaneous and the individual occurrences of the terms within the termset. Based on the idea that the occurrence of one term but not the others may also convey valuable information for discrimination, conventional term selection schemes are adapted for termset selection. Similarly, the weight of a given termset is computed as a function of the terms that occur in the document under consideration. This weight estimation scheme evaluates the individual occurrences of the terms and their co-occurrences separately to compute the document-specific weight of each termset. The proposed termset-based representation is concatenated with the bag-of-words representation to construct the document vectors. As an extension to the proposed scheme, the cardinality statistic of a termset, which quantifies the number of member terms that occur in the document under consideration, is also used for termset weight computation. When termsets of length greater than two are employed, cardinality-based weighting is observed to provide further improvements. | en_US |
dc.language.iso | en | en_US |
dc.publisher | Eastern Mediterranean University | en_US |
dc.subject | Co-occurrence features, Cardinality statistics, Termset selection, Termset weighting, Document representation, Binary text classification. | en_US |
dc.title | Termset Selection and Weighting in Binary Text Classification | en_US |
dc.type | Thesis | en_US |
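
The cardinality statistic mentioned in the abstract (the number of a termset's member terms that occur in a document) and the concatenation with a bag-of-words vector can be illustrated with a minimal sketch. This is not the dissertation's implementation: the function names, the binary bag-of-words encoding, the toy vocabulary and termsets, and the use of the raw count as the termset feature are all assumptions made for illustration, since the record does not give the actual selection and weighting formulas.

```python
from typing import Iterable, List, Sequence, Tuple


def termset_cardinalities(tokens: Iterable[str],
                          termsets: Sequence[Tuple[str, ...]]) -> List[int]:
    """For each termset, count how many of its member terms occur in the document."""
    present = set(tokens)
    return [sum(term in present for term in termset) for termset in termsets]


def document_vector(tokens: Iterable[str],
                    vocabulary: Sequence[str],
                    termsets: Sequence[Tuple[str, ...]]) -> List[int]:
    """Concatenate a binary bag-of-words vector with per-termset cardinalities.

    Assumption: the raw cardinality stands in for the termset weight; the
    dissertation's actual weighting function is not specified in this record.
    """
    tokens = list(tokens)
    present = set(tokens)
    bag_of_words = [int(term in present) for term in vocabulary]
    return bag_of_words + termset_cardinalities(tokens, termsets)


# Hypothetical toy data, chosen only to exercise the functions.
doc = "the stock market fell as oil prices rose".split()
vocabulary = ["stock", "oil", "wheat"]
termsets = [("stock", "market"), ("oil", "wheat"), ("wheat", "corn", "grain")]

vector = document_vector(doc, vocabulary, termsets)
print(vector)
```

In this toy example the first termset occurs in full, the second only partially, and the third not at all, so the three cases the abstract distinguishes (co-occurrence, individual occurrence, and absence) each produce a different feature value.
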
Appears in Collections: | PH.D |
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
BadawiDimaدكتوراه ديما بدوي العروب.pdf | | 2.76 MB | Adobe PDF | View/Open |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.