INFORMATICA JOURNAL

Title	Selective Oversampling Approach for Strongly Imbalanced Data
Paper ID	JnuQP
Keywords	imbalanced data, oversampling, outlier detection, SMOTE
Abstract	Challenges posed by imbalanced data are encountered in many real-world applications. One possible approach to improving classifier performance on imbalanced data is oversampling. In this paper, we propose a new selective oversampling approach (SOA) that first isolates the most representative samples from the minority classes using an outlier detection technique and then uses these samples for synthetic oversampling. We show that the proposed approach improves the performance of two state-of-the-art oversampling methods, namely the synthetic minority oversampling technique (SMOTE) and adaptive synthetic sampling (ADASYN). Prediction performance is evaluated on two synthetic datasets and one real-world bankruptcy dataset, and the proposed SOA methods always achieve the same or better performance than the other existing oversampling methods considered.
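The two-stage idea in this abstract (filter out outlying minority samples, then oversample only from the rest) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not name its outlier detector, so a simple mean k-nearest-neighbour distance score stands in for it, and the oversampling step is plain SMOTE-style interpolation. The function names `knn_outlier_scores` and `selective_oversample` and the `outlier_frac` parameter are hypothetical.

```python
import math
import random


def knn_outlier_scores(points, k):
    """Mean distance from each point to its k nearest neighbours.
    Assumption: this simple k-NN score stands in for the paper's outlier detector."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores


def selective_oversample(minority, n_new, k=3, outlier_frac=0.2, seed=0):
    """Keep the least outlying minority samples, then create SMOTE-style
    synthetic samples by interpolating between a kept sample and one of
    its k nearest kept neighbours."""
    rng = random.Random(seed)
    scores = knn_outlier_scores(minority, k)
    n_keep = max(2, round(len(minority) * (1 - outlier_frac)))
    ranked = sorted(range(len(minority)), key=lambda i: scores[i])
    core = [minority[i] for i in ranked[:n_keep]]  # the "representative" samples

    synthetic = []
    for _ in range(n_new):
        b = rng.randrange(len(core))
        # indices of the k nearest kept neighbours of core[b]
        others = sorted((j for j in range(len(core)) if j != b),
                        key=lambda j: math.dist(core[b], core[j]))[:k]
        nb = core[rng.choice(others)]
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(tuple(x + gap * (y - x) for x, y in zip(core[b], nb)))
    return core, synthetic


# Hypothetical minority class with one obvious outlier at (5, 5).
minority = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.15, 0.05), (5.0, 5.0)]
core, new_points = selective_oversample(minority, n_new=10)
```

Because the outlier is dropped before interpolation, no synthetic point is generated on the long segment between the cluster and (5, 5), which is the failure mode that plain SMOTE can exhibit on noisy minority data. Swapping the interpolation step for ADASYN's density-weighted sampling would give the other SOA variant the abstract mentions.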

Title	A Novel Integrated Hybrid MCDM Approach for Logistics Performance Index
Paper ID	s88bM
Keywords	logistics performance evaluation, multi-criteria decision making, group decision making, fuzzy logic, Pythagorean fuzzy sets
Abstract	There is no doubt that technological change and globalization have increased the importance of the logistics industry. The effectiveness of logistics services is directly related to expanding trade networks between countries, increasing foreign direct investment, and economic growth. The World Bank evaluates the logistics competitiveness of countries using the Logistics Performance Index (LPI). However, logistics performance evaluation involves multiple criteria, so a realistic assessment requires a multi-criteria decision making (MCDM) technique. In this study, a comparative analysis of hybrid MCDM methods is used to evaluate the logistics performance of 160 OECD countries with a group decision making approach. The importance levels of the logistics performance criteria considered in the study are calculated with the AHP, Fuzzy AHP, and Pythagorean fuzzy AHP (PFAHP) methods, and the logistics performance of the countries is analyzed with the TOPSIS, VIKOR, and CODAS methods. Finally, the Borda Count Method (BCM) is applied to the results of the different hybrid MCDM methods to obtain the final rankings.
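To illustrate two of the ranking steps named in this abstract, the sketch below implements textbook TOPSIS (vector normalization, weighting, distances to the ideal and anti-ideal solutions) and a Borda count that merges several rankings. It is a sketch, not the paper's pipeline: the criterion weights would come from AHP, Fuzzy AHP, or PFAHP in the study but are hypothetical inputs here, VIKOR and CODAS are not shown, and the function names `topsis`, `rank`, and `borda` are illustrative.

```python
import math


def topsis(matrix, weights, benefit):
    """TOPSIS closeness coefficient for each alternative (row of `matrix`).
    `benefit[j]` is True if larger values of criterion j are better."""
    n_crit = len(weights)
    # vector-normalise each criterion column, then apply the criterion weights
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_crit)]
    v = [[weights[j] * row[j] / norms[j] for j in range(n_crit)] for row in matrix]
    cols = list(zip(*v))
    ideal = [max(c) if benefit[j] else min(c) for j, c in enumerate(cols)]
    anti = [min(c) if benefit[j] else max(c) for j, c in enumerate(cols)]
    scores = []
    for row in v:
        d_plus, d_minus = math.dist(row, ideal), math.dist(row, anti)
        scores.append(d_minus / (d_plus + d_minus))  # 1 = ideal, 0 = anti-ideal
    return scores


def rank(scores):
    """Indices of alternatives ordered from best (highest score) to worst."""
    return sorted(range(len(scores)), key=lambda i: -scores[i])


def borda(rankings):
    """Borda count: in a ranking of n alternatives, position p earns n - 1 - p points."""
    n = len(rankings[0])
    points = [0] * n
    for r in rankings:
        for pos, alt in enumerate(r):
            points[alt] += n - 1 - pos
    return rank(points)


# Hypothetical scores of three countries on two benefit criteria.
scores = topsis([[4, 4], [3, 3], [2, 2]], weights=[0.6, 0.4], benefit=[True, True])
print(rank(scores))                                  # [0, 1, 2]
print(borda([[0, 1, 2], [0, 2, 1], [1, 0, 2]]))      # [0, 1, 2]
```

In a full study, each hybrid method (for example AHP-TOPSIS or PFAHP-VIKOR) would contribute one ranking list, and `borda` would fuse those lists into the final order. The sketch assumes the alternatives are not all identical; otherwise the TOPSIS denominator is zero.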

Title	miRNA Identification from Jatropha curcas L. whole genome shotgun assembly sequences
Paper ID	aEUfD
Keywords	Jatropha curcas, miRNA, C-mii, psRNATarget
Abstract	miRNAs are endogenous, noncoding RNA molecules 20-25 nucleotides in length that are found in eukaryotic organisms and take part in regulating gene expression. Jatropha curcas L. oil has drawn attention as one of the most promising biofuels and a potential answer to the ongoing fuel crisis. Six miRNA candidates, each belonging to a different miRNA family, were predicted by screening 41,790 whole genome shotgun assembly sequences with a comparative genomics approach. The targets of the identified miRNAs were also predicted; the predicted target genes have diverse functions, including metabolism, transport, and fatty acid metabolism. Several targets are involved in fatty acid metabolism: miR414 targets the genes for acetyl-CoA carboxylase and O-acyltransferase WSD1; miR846 targets the FAD omega-3 fatty acid and FAD plastid fatty acid desaturase genes; miR5658 targets long chain acyl-CoA synthetase 1; miR407 targets trigalactosyldiacylglycerol 2; and miR2938 targets GDSL esterase/lipase and glycolipid transfer protein 1.

Title	A Unified Entropic Pricing Framework of Option: Using Cressie-Read Family of Divergences
Paper ID	NyQa6
Keywords	Unified valuation framework, Generalized entropic pricing, Cressie-Read divergence, Risk-neutral distribution, Risk-neutral moment
Abstract	The entropy valuation of options (Stutzer, 1996) obtains a risk-neutral probability distribution (RND, an equivalent martingale measure) as the pricing measure by minimizing the Kullback–Leibler (KL) divergence between the empirical probability distribution and its risk-neutral counterpart. This article establishes a unified entropic pricing framework by developing a class of generalized entropy pricing models based on the Cressie-Read (CR) family of divergences, which nests KL. The main contributions are: (1) the unified framework can readily incorporate a set of informative risk-neutral moments (RNMs) of the underlying return, extracted from the option market, that accurately capture the characteristics of the underlying distribution; (2) the classic KL-based entropy pricing model is extended to a unified entropic pricing framework built on the CR family of divergences. For each model in the proposed family, the optimal RND is derived using the dual method. Simulations show that every model in the family prices options with high accuracy relative to the true price. The pricing biases nevertheless differ across models, so we conduct theoretical analysis and experimental investigations to explore their causes.
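The KL member of this family (Stutzer's canonical valuation) can be sketched in a few lines. Minimizing the divergence sum_i q_i log(q_i / p_i) subject to a single martingale constraint sum_i q_i R_i = R_f gives exponentially tilted weights q_i proportional to p_i exp(gamma R_i), where gamma is the dual variable. This is only an illustration under assumptions: the empirical distribution p is taken as uniform over hypothetical gross returns, only the martingale moment is imposed (no additional RNMs), gamma is found by bisection rather than a general dual optimizer, and the other Cressie-Read members are not covered. The function names are hypothetical. A solution exists only when R_f lies strictly between the smallest and largest observed return.

```python
import math


def kl_risk_neutral(gross_returns, rf_gross, lo=-50.0, hi=50.0):
    """KL (canonical) risk-neutral weights q_i ∝ exp(gamma * R_i), with the
    empirical distribution assumed uniform and gamma chosen so that
    sum_i q_i * R_i = rf_gross (martingale condition)."""
    def mean_under(gamma):
        w = [math.exp(gamma * r) for r in gross_returns]
        return sum(wi * r for wi, r in zip(w, gross_returns)) / sum(w)

    # mean_under is non-decreasing in gamma (its derivative is a variance),
    # so bisection on [lo, hi] locates the root of mean_under(gamma) = rf_gross
    for _ in range(200):
        mid = (lo + hi) / 2
        if mean_under(mid) < rf_gross:
            lo = mid
        else:
            hi = mid
    gamma = (lo + hi) / 2
    w = [math.exp(gamma * r) for r in gross_returns]
    total = sum(w)
    return [wi / total for wi in w]


def call_price(gross_returns, probs, s0, strike, rf_gross):
    """Discounted risk-neutral expectation of the call payoff max(S_T - K, 0)."""
    payoff = sum(p * max(s0 * r - strike, 0.0) for p, r in zip(probs, gross_returns))
    return payoff / rf_gross


returns = [0.90, 1.00, 1.10, 1.20]  # hypothetical gross returns of the underlying
q = kl_risk_neutral(returns, rf_gross=1.02)
price = call_price(returns, q, s0=100.0, strike=100.0, rf_gross=1.02)
```

Adding further risk-neutral moments would add one dual variable per constraint, and replacing KL with another Cressie-Read divergence changes the exponential tilt into a power-type weighting; both extensions are beyond this sketch.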

Title	Enhance the predictive accuracy of default credit cardholders using machine-learning techniques
Paper ID	ExdCT
Keywords	Credit card default, Machine learning, K-means algorithm, Artificial Neural Network
Abstract	In today's data and digital age, the credit card has become the most well-known mode of payment for both online and in-store purchases. A credit card is a flexible tool that lets the holder use the bank's money for a short period. However, as credit card use has expanded, credit card defaults have also increased, so improving the prediction of credit card default is an important issue for any financial organization. Analyzing the dataset and comparing the results of other studies shows that some default customers are classified as non-default and vice versa. We therefore cluster the dataset into two clusters (default and non-default customers) with an unsupervised machine-learning technique and use the result as a new label (target). We then apply an Artificial Neural Network prediction model to the dataset with both the original label and the new label. Finally, the results are compared using seven accuracy metrics to select the best model. The prediction model using the new label achieves the best accuracy, 99.7%.
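The relabeling step described in this abstract (cluster the customers into two groups, then use the cluster assignment as a new target) can be sketched with plain K-means. This is a sketch under assumptions, not the authors' code: the abstract does not say how the two clusters are matched to "default" and "non-default", so a majority vote over the original labels is assumed here, the features are hypothetical, and the Artificial Neural Network training step is omitted. `kmeans` and `relabel_with_clusters` are illustrative names.

```python
import math
import random


def kmeans(points, k=2, iters=100, seed=0):
    """Plain Lloyd's k-means; returns a cluster index for every point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        # assignment step: nearest center for each point
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        # update step: mean of each cluster (keep the old center if a cluster empties)
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(tuple(sum(xs) / len(members) for xs in zip(*members)))
            else:
                new_centers.append(centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def relabel_with_clusters(points, original, seed=0):
    """New target: each cluster takes the majority original label of its members.
    (Assumed mapping; the abstract does not specify how clusters become labels.)"""
    clusters = kmeans(points, 2, seed=seed)
    mapping = {}
    for c in set(clusters):
        votes = [y for y, lab in zip(original, clusters) if lab == c]
        mapping[c] = max(set(votes), key=votes.count)
    return [mapping[c] for c in clusters]


# Hypothetical 2-feature customers; the third one (label 1) sits with the label-0 group.
features = [(0, 0), (0.1, 0), (0, 0.1), (5, 5), (5.1, 5), (5, 5.1)]
labels = [0, 0, 1, 1, 1, 1]
new_target = relabel_with_clusters(features, labels)
print(new_target)  # [0, 0, 0, 1, 1, 1]
```

The relabeled target agrees with the cluster structure rather than with the original annotation, which helps explain why a classifier trained on it can score much higher: it is partly learning to reproduce the clustering.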