Data analysis has gained the interest of the statistical community in recent years, and familiarity with the R statistical computing environment encourages that community (students and teachers in the field) to carry out reproducible statistical analyses with R. However, for years there has been a gap between large-scale matrix computation and what is now called "big data"; in this work, the Normalized Cut algorithm is applied to images. Contrary to what one might expect, R's support for image analysis is poor in comparison with other computing platforms such as Python or specialized software such as OpenCV.
Given the well-known absence of such a function, in this work we share an implementation of the Normalized Cut algorithm in the R environment, with extensions to programs and routines written in C++, to provide users with a friendly R interface for segmenting images. The article concludes by evaluating the current implementation and exploring ways to generalize it to a large-scale context and to reuse the developed code.
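To make the segmentation step concrete, here is a minimal Python sketch of a two-way Normalized Cut, assuming a Gaussian similarity matrix built from pixel intensity and position. It is not the authors' R/C++ code: the function name `ncut_bipartition` and the parameters `sigma_i`, `sigma_x`, and `radius` are illustrative assumptions, and the sketch calls NumPy's dense `eigh` where the keywords point to the Lanczos algorithm, which scales better to large sparse similarity matrices.

```python
import numpy as np


def ncut_bipartition(features, coords, sigma_i=0.1, sigma_x=4.0, radius=5.0):
    """Split n pixels into two segments via the spectral relaxation of Normalized Cut.

    features: (n, d) array of pixel features, e.g. grey-level intensity
    coords:   (n, 2) array of pixel positions
    Hypothetical helper: parameter names and defaults are illustrative assumptions.
    """
    # Squared distances between every pair of pixels, in feature and image space
    df = np.sum((features[:, None, :] - features[None, :, :]) ** 2, axis=-1)
    dx = np.sum((coords[:, None, :] - coords[None, :, :]) ** 2, axis=-1)

    # Similarity matrix W: Gaussian weights, zeroed beyond the spatial radius
    W = np.exp(-df / sigma_i**2) * np.exp(-dx / sigma_x**2)
    W[dx > radius**2] = 0.0
    d = W.sum(axis=1)  # node degrees (diagonal of D)

    # (D - W) y = lambda D y is solved through the symmetric matrix
    # D^{-1/2} (D - W) D^{-1/2}, whose eigenvectors z give y = D^{-1/2} z
    d_inv_sqrt = 1.0 / np.sqrt(d)
    L_sym = np.eye(len(d)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(L_sym)  # dense solver; eigenvalues ascending
    y = d_inv_sqrt * eigvecs[:, 1]  # second-smallest generalized eigenvector

    # Simplest splitting rule: the sign of y
    return y > 0


# Toy 10x10 image: dark left half, bright right half
rows, cols = np.indices((10, 10))
coords = np.column_stack([rows.ravel(), cols.ravel()]).astype(float)
intensity = (cols.ravel() >= 5).astype(float)[:, None]
segments = ncut_bipartition(intensity, coords, sigma_i=0.5).reshape(10, 10)
print(segments.astype(int))
```

Thresholding the eigenvector at zero is the simplest splitting rule; searching over several thresholds for the smallest Ncut value, or recursing on each segment, are common refinements. Very small `sigma_i` values make regions of different intensity numerically disconnected, so the two smallest eigenvalues coincide and the eigenvector may no longer separate them, which is why the toy example passes `sigma_i=0.5`.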
Key words: Normalized Cut, image segmentation, Lanczos algorithm, eigenvalues and eigenvectors, graphs, similarity matrix, R (the statistical computing environment), open source, large scale, big data.
The Observatory of Public Spending (ODP, in Portuguese) is a special unit of Brazil's Ministry of Transparency, Monitoring and Office of the Comptroller-General (CGU, in Portuguese) responsible for monitoring public spending and gathering managerial and audit information to support the work of CGU internal auditors. One of the most important themes monitored by this unit is public procurement and the government suppliers that win these procurement processes. Image analysis of many of these suppliers' headquarters revealed suspicious landscapes, such as rural areas, isolated places, or slums. These landscapes could indicate fake suppliers with little capacity to deliver public goods and services. However, checking thousands of landscapes to find these fake suppliers would be very expensive. Our objective, then, is to discover, as automatically as possible, the possible groups of scenes involving government suppliers, given that these images were not previously labeled. To that end, we used Places CNN, a pretrained convolutional neural network for scene recognition presented by Zhou et al. and trained on 205 scene categories with 2.5 million images, to recognize scenes involving Brazilian government suppliers.
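One simple way to turn a scene classifier's outputs into groups of unlabeled images, assuming each image has already been scored and its softmax probabilities are available as an array, is to group images by their most probable category. This sketch is hypothetical: the abstract does not describe the grouping step, and the function name `group_by_top_scene`, the `min_confidence` cut-off, and the three toy category labels are placeholders, not real Places category names or outputs.

```python
from collections import defaultdict

import numpy as np


def group_by_top_scene(image_ids, probs, category_names, min_confidence=0.0):
    """Group images by their most probable scene category (hypothetical helper).

    probs: (n_images, n_categories) array of softmax scores from a scene
    classifier such as a Places CNN (the scoring step itself is not shown).
    Images whose top score is below min_confidence are left ungrouped.
    """
    groups = defaultdict(list)
    top_index = probs.argmax(axis=1)
    top_score = probs.max(axis=1)
    for image_id, k, score in zip(image_ids, top_index, top_score):
        if score >= min_confidence:
            groups[category_names[k]].append(image_id)
    # Largest groups first, so reviewers see the dominant scene types first
    return dict(sorted(groups.items(), key=lambda item: -len(item[1])))


# Toy scores for four images over three placeholder categories
names = ["street", "rural", "slum"]
scores = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.6, 0.3, 0.1],
    [0.2, 0.3, 0.5],
])
print(group_by_top_scene(["a", "b", "c", "d"], scores, names, min_confidence=0.6))
```

Grouping by the top-1 category is the crudest option; clustering the full probability vectors or the network's intermediate features would let groups emerge that do not match any single category.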
Overview of a study carried out on mica-epoxy for circuit boards. The study consisted of endurance tests to measure the material's wear over time and thereby estimate its approximate service life.
In this paper, we evaluate a baseline word embedding model on a set of clinical notes derived from patient records. For the baseline, we extract embedding features using the Word2Vec module of the gensim package. We also build two models by training on the processed clinical notes: a word2vec skip-gram model with negative sampling and a positive pointwise mutual information (PPMI) model. Our evaluation shows that both the PPMI and skip-gram models give improved results for medically related terms compared with the baseline, and PPMI gives the best results of the three models.
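The PPMI model can be illustrated on a small co-occurrence matrix. The sketch below assumes already-tokenized notes and a symmetric context window, and computes PPMI(w, c) = max(log[P(w, c) / (P(w) P(c))], 0), then compares words by cosine similarity of their PPMI rows. The helper names, the window size of 2, and the toy note fragments are hypothetical and not taken from the paper's data. The skip-gram counterpart would typically be trained with gensim's `Word2Vec` class using `sg=1` and a positive `negative` value; it is not shown here because it requires gensim.

```python
import numpy as np


def cooccurrence_counts(sentences, window=2):
    """Count how often each word appears within `window` tokens of another."""
    vocab = sorted({w for s in sentences for w in s})
    index = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(vocab)))
    for s in sentences:
        for i, w in enumerate(s):
            for j in range(max(0, i - window), min(len(s), i + window + 1)):
                if j != i:
                    counts[index[w], index[s[j]]] += 1
    return vocab, index, counts


def ppmi(counts):
    """PPMI(w, c) = max(log(P(w, c) / (P(w) P(c))), 0)."""
    p_wc = counts / counts.sum()
    p_w = p_wc.sum(axis=1, keepdims=True)  # word marginals
    p_c = p_wc.sum(axis=0, keepdims=True)  # context marginals
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(p_wc / (p_w * p_c))
    pmi[~np.isfinite(pmi)] = 0.0  # unseen pairs (log 0) contribute nothing
    return np.maximum(pmi, 0.0)


def cosine(u, v):
    """Cosine similarity between two word vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


# Hypothetical, already tokenized note fragments
notes = [
    ["patient", "denies", "chest", "pain"],
    ["patient", "reports", "chest", "pain"],
    ["patient", "denies", "fever"],
]
vocab, index, counts = cooccurrence_counts(notes)
M = ppmi(counts)
print(round(cosine(M[index["denies"]], M[index["reports"]]), 3))
```

Clipping negative PMI values to zero discards unreliable "less often than chance" evidence, which is noisy for rare terms; in practice the PPMI rows are often reduced with a truncated SVD before computing similarities.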
This is an IEEE-based template for presenting your work on the Open Science Data Cloud. Use it for the PIRE Workshop challenge and for other submissions, such as to the Supercomputing 2014 conference.