In the age of Big Data, companies want to benefit from the large amounts of data at their disposal. These data help them understand their internal and external environment and anticipate associated phenomena, since the data are turned into knowledge that can later be used for prediction. This knowledge thus becomes a great asset in companies' hands, and extracting it is precisely the objective of data mining. With data and knowledge now produced in large amounts and at a faster pace, we speak of Big Data mining. Our work therefore aims at addressing the problems of volume, veracity, validity, and velocity when classifying big data using distributed and parallel processing techniques. The problem we raise is the following: how can machine learning algorithms work in a distributed and parallel way at the same time without losing classification accuracy? To solve this problem, we propose a system called Dynamic Distributed and Parallel Machine Learning (DDPML). We built it in two parts. In the first, we propose a distributed architecture controlled by a MapReduce algorithm that in turn depends on a random sampling technique; this architecture is designed specifically for big data processing and operates coherently and efficiently with the sampling strategy proposed in this work. It also allows us to verify the classification results obtained using the representative learning base (RLB). In the second part, we extract the representative learning base by sampling at two levels with the stratified random sampling method. The same method is applied to extract the shared learning base (SLB) and the partial learning bases for the first and second levels (PLBL1 and PLBL2). Our experimental results show the efficiency of our solution without significant loss in classification quality. In practical terms, DDPML is dedicated to big data mining and works effectively in distributed systems with a simple structure, such as client-server networks.
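The abstract does not give implementation details for the two-level sampling; the sketch below only illustrates the core stratified random sampling step with pandas, under our own assumptions (a DataFrame input with a hypothetical "label" column as the stratification variable).

```python
# Minimal sketch of stratified random sampling, as used to draw a
# representative learning base from a large dataset.
# Assumptions (not from the paper): the data fit in a pandas DataFrame
# and "label" is the stratification variable.
import pandas as pd

def stratified_sample(df: pd.DataFrame, strata_col: str, frac: float,
                      seed: int = 0) -> pd.DataFrame:
    """Draw the same fraction from every stratum so class proportions
    in the sample match those of the full dataset."""
    return (df.groupby(strata_col, group_keys=False)
              .apply(lambda g: g.sample(frac=frac, random_state=seed)))

# Illustrative two-level use: a first base, then a second drawn from
# the remaining rows (hypothetical file and fractions).
# df = pd.read_csv("big_data.csv")
# slb = stratified_sample(df, "label", frac=0.10)                 # level 1
# plbl1 = stratified_sample(df.drop(slb.index), "label", 0.05)    # level 2
```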
This paper addresses how natural language processing (NLP) works with deep learning models to understand the meaning of words in text. Vector space models that map words to continuous vector representations are employed to identify semantic and syntactic similarity between words in text articles. The model is trained and evaluated on unlabeled news articles [29], [30], and implemented with the continuous bag-of-words (CBOW) and skip-gram (SG) architectures combined with the negative sampling (NES) and hierarchical softmax (HS) techniques. It is evaluated on word similarity tasks, analogy tasks, and vector compositionality to identify the linear structure of the word vector representations. Computationally, the training time and memory requirements of the two architectures under each training technique are compared: architectures trained with HS prove more expensive to train and more memory-intensive than those trained with NES. The findings of the evaluations on the different tasks show that the word embeddings capture both semantic and syntactic regularities.
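As a concrete illustration of the four training configurations (CBOW/SG crossed with NES/HS), here is a minimal sketch using gensim 4.x; the library choice, toy corpus, and hyperparameter values are our assumptions, not the paper's setup.

```python
# Sketch: training CBOW / skip-gram with negative sampling (NES) or
# hierarchical softmax (HS) in gensim 4.x. Corpus and hyperparameters
# are illustrative only.
from gensim.models import Word2Vec

sentences = [["stock", "markets", "fell", "sharply"],
             ["markets", "rallied", "after", "the", "announcement"]]

# sg=0 -> CBOW, sg=1 -> skip-gram; hs=1 -> hierarchical softmax,
# hs=0 with negative>0 -> negative sampling.
cbow_nes = Word2Vec(sentences, vector_size=100, min_count=1,
                    sg=0, hs=0, negative=5)
sg_hs = Word2Vec(sentences, vector_size=100, min_count=1,
                 sg=1, hs=1, negative=0)

# Word similarity and analogy queries on the learned vectors:
print(cbow_nes.wv.similarity("stock", "markets"))
# Analogy query (needs a vocabulary containing these words):
# sg_hs.wv.most_similar(positive=["king", "woman"], negative=["man"])
```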
Classical statistics deals only with crisp, determinate data and fails when the data contain uncertainty or indeterminacy. Neutrosophic statistics, a generalization of both classical and fuzzy statistics, is the best substitute for them when dealing with such uncertainty or indeterminacy in data. In this manuscript, we propose a neutrosophic generalized class of estimators for estimating the population mean under indeterminacy using neutrosophic subsidiary information, where a neutrosophic observation is presented as Z_N = Z_L + Z_U I_N with I_N ∈ [I_L, I_U]. We also give neutrosophic ratio- and product-type estimators, which are members of the proposed class. The expressions for the bias and mean square error (MSE) of the proposed estimator and its member estimators are derived to the first order of approximation. The proposed estimator is compared with its member estimators and the unbiased estimator under the MSE criterion, and theoretical conditions for its supremacy over them are obtained. An empirical study on neutrosophic solar energy data and a Monte Carlo simulation study in RStudio also demonstrate the soundness of our proposed estimator.
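For orientation, the classical (determinate) first-order bias and MSE of the ratio estimator are the standard expressions below; on our reading, the neutrosophic analogues replace each quantity by its interval form [·_L, ·_U]. A sketch in LaTeX:

```latex
% Classical first-order bias and MSE of the ratio estimator
% \bar{y}_R = \bar{y}\,(\bar{X}/\bar{x}); the neutrosophic analogues
% replace each quantity by an interval form [._L, ._U].
\[
  \operatorname{Bias}(\bar{y}_R) \approx \frac{1-f}{n}\,\bar{Y}
      \left(C_x^2 - \rho\, C_y C_x\right),
  \qquad
  \operatorname{MSE}(\bar{y}_R) \approx \frac{1-f}{n}\,\bar{Y}^2
      \left(C_y^2 + C_x^2 - 2\rho\, C_y C_x\right),
\]
where $f = n/N$, $C_y$ and $C_x$ are the coefficients of variation of
the study and auxiliary variables, and $\rho$ is their correlation.
```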
An overwhelming focus on the development of smart city services such as intelligent transportation, healthcare notification, and air/water quality analysis has steered application developers away from cloud-only infrastructures toward combined cloud-edge infrastructures. While edge nodes deliver low latency and improved data privacy in edge-enabled applications, perceptions differ on how to adopt efficient learning algorithms in terms of energy efficiency and collaborative learning. This paper proposes a Blockchain-Enabled Energy-Aware Prediction Plugins (BEE-APP) framework for smart city applications. The BEE-APP framework is multi-faceted: i) it enables edge nodes to collaboratively validate efficient learning algorithms using blockchains, and ii) it suggests efficient learning algorithms to edge nodes through Minikube plugins. Experimental results show that the BEE-APP framework can predict water quality parameters at different locations in an energy-efficient manner.
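The abstract does not describe the validation protocol; the toy sketch below only conveys the idea of hash-chaining the validation results that edge nodes report for candidate learning algorithms. All names, fields, and values are hypothetical, not the BEE-APP protocol.

```python
# Toy sketch of blockchain-style validation records for edge nodes.
# Each block hash-chains one node's reported validation result for a
# candidate learning algorithm; illustrative only.
import hashlib, json, time

class Block:
    def __init__(self, prev_hash: str, record: dict):
        self.prev_hash = prev_hash
        self.record = record          # e.g. node id, algorithm, metrics
        self.timestamp = time.time()
        self.hash = self._digest()

    def _digest(self) -> str:
        payload = json.dumps(
            {"prev": self.prev_hash, "rec": self.record, "ts": self.timestamp},
            sort_keys=True)
        return hashlib.sha256(payload.encode()).hexdigest()

chain = [Block("0" * 64, {"genesis": True})]
# An edge node appends the accuracy and energy cost it measured:
chain.append(Block(chain[-1].hash,
                   {"node": "edge-03", "algo": "random_forest",
                    "accuracy": 0.91, "energy_joules": 4.2}))

# Any peer can re-verify the chain before trusting the suggestion:
ok = all(b.prev_hash == p.hash for p, b in zip(chain, chain[1:]))
print("chain valid:", ok)
```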
This paper uses the finite element method to build a 2D coupled turbulence model of investor sentiment for industries, and computes and analyzes the flow velocity, direction, and spatial distribution of the “informational cycle cascade” among industries, which can describe and express the dynamics of investor sentiment intuitively. Five large up or down phases of industry-average investor sentiment contagion appear over the whole period. Two typical phases of the industry average are chosen to compute and analyze the flow velocity, direction, and spatial distribution of the “informational cycle cascade”. Bad-information contagion under the influence of the global financial crisis and good-information contagion under the influence of the large-scale economic stimulus plan show that the finite element method is a good tool for expressing the dynamics of investor sentiment contagion intuitively. Bad/good information contagion in the “informational cycle cascade” is clockwise/anticlockwise, respectively, and information contagion proceeds in multiple directions at the same time.
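The paper's coupled turbulence model is not reproduced here; the sketch below shows only the generic finite element building block such a 2D model rests on: assembling a global stiffness matrix from linear triangular elements. The mesh and all values are made up.

```python
# Sketch: assembling the global stiffness matrix for linear (P1)
# triangular elements on a 2D mesh -- the generic FEM step underlying
# a 2D diffusion/transport field model. Mesh data are made up.
import numpy as np

nodes = np.array([[0., 0.], [1., 0.], [1., 1.], [0., 1.]])  # 4 nodes
tris = np.array([[0, 1, 2], [0, 2, 3]])                     # 2 triangles

K = np.zeros((len(nodes), len(nodes)))
for tri in tris:
    x, y = nodes[tri, 0], nodes[tri, 1]
    # Signed area and the constant gradients of the three hat functions
    area = 0.5 * ((x[1]-x[0])*(y[2]-y[0]) - (x[2]-x[0])*(y[1]-y[0]))
    b = np.array([y[1]-y[2], y[2]-y[0], y[0]-y[1]]) / (2*area)
    c = np.array([x[2]-x[1], x[0]-x[2], x[1]-x[0]]) / (2*area)
    Ke = area * (np.outer(b, b) + np.outer(c, c))  # 3x3 element matrix
    K[np.ix_(tri, tri)] += Ke                      # scatter into global K

print(K)  # global stiffness matrix of the diffusion (Laplacian) term
```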