This post is part of our ongoing series of guest posts from the students, mentors and organization administrators who participated in Google Summer of Code (GSoC). GSoC is a program that pairs university students with mentors for a summer where they apply their computer science skills to building open source software.
Orange Data Mining is a data mining suite with visual programming and interactive data analysis at its core. Orange was developed at the Bioinformatics Lab at the University of Ljubljana, Slovenia. It is written mainly in Python, and you can find it hosted on GitHub.
This was our third Google Summer of Code. We were given five slots and selected students based on two criteria: the quality of their proposal combined with their coding skills, and the importance of the project to our organization.
Great work was done over the summer and we are proud to present our students’ projects!
Recommender Systems add-on by Salva Carrion
Salva independently implemented a new Orange3 add-on for recommender systems. At its core is a scripting library for collaborative filtering that includes a number of published matrix factorization algorithms. On top of this library, he built GUI widgets for visual programming.
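To give a flavour of what matrix factorization for collaborative filtering involves, here is a minimal sketch in plain Python. It assumes the simplest variant, stochastic gradient descent on the observed ratings with L2 regularization (in the style of Funk SVD). It is not the add-on's code, and the names `factorize` and `predict`, as well as all parameter defaults, are hypothetical.

```python
import random

# Hypothetical sketch: plain SGD matrix factorization (Funk-SVD style),
# not the recommendation add-on's implementation.

def factorize(ratings, n_users, n_items, k=2, lr=0.02, reg=0.02, epochs=1000, seed=0):
    """Learn user factors P and item factors Q so that dot(P[u], Q[i]) ~ rating.

    ratings: (user, item, rating) triples for the observed entries only.
    """
    rng = random.Random(seed)
    P = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_items)]
    data = list(ratings)
    for _ in range(epochs):
        rng.shuffle(data)
        for u, i, r in data:
            err = r - predict(P, Q, u, i)
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # SGD step on the squared error plus an L2 penalty on both factors.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q

def predict(P, Q, u, i):
    """Predicted rating: dot product of user u's and item i's latent factors."""
    return sum(p * q for p, q in zip(P[u], Q[i]))
```

Only the observed ratings drive training; calling `predict` on a (user, item) pair that was never rated is what turns the learned factors into recommendations.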
Educational add-on by Primož Godec
Primož took on the task of developing a series of educational widgets for Orange3. The end result was a complete Orange3-Educational add-on with four widgets that can be used to demonstrate key data mining and machine learning procedures in the classroom. These widgets help beginners understand the inner workings of key data mining algorithms and let teachers explain the various methods visually. They include interactive, step-by-step visualizations of k-means, polynomial classification, and gradient descent.
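For readers curious what "step-by-step k-means" means in practice, here is a minimal sketch, assuming 2-D points given as tuples and initial centroids chosen by hand. The function names are hypothetical and unrelated to the widget's code. It alternates the assignment and update steps and yields the state after each iteration, the way an interactive visualization might redraw it.

```python
import math

# Hypothetical sketch of k-means run one iteration at a time.

def assign(points, centroids):
    """Assignment step: index of the nearest centroid for each point."""
    return [min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points]

def update(points, labels, centroids):
    """Update step: move each centroid to the mean of its assigned points."""
    new = []
    for c, old in enumerate(centroids):
        members = [p for p, lab in zip(points, labels) if lab == c]
        if members:
            new.append(tuple(sum(coord) / len(members) for coord in zip(*members)))
        else:
            new.append(old)  # keep an empty cluster's centroid where it was
    return new

def kmeans_steps(points, centroids, max_iter=100):
    """Yield (labels, centroids) after each iteration until assignments stop changing."""
    labels = None
    for _ in range(max_iter):
        new_labels = assign(points, centroids)
        if new_labels == labels:
            break
        labels = new_labels
        centroids = update(points, labels, centroids)
        yield labels, centroids
```

Because `kmeans_steps` is a generator, each iteration can be inspected (or drawn) before the next one runs, which mirrors how a classroom demonstration advances one click at a time.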
Text add-on by Aliaxey Sukharevich
The Orange3-Text add-on was already an active project before GSoC, but Aliaxey took it to another level. He added widgets that acquire data from new sources through the public Twitter and Wikipedia RESTful services. Many existing widgets gained new functionality and methods (e.g. the HDP, LDA and LSI methods in the Topic Modelling widget). Preprocessing was redesigned and reimplemented so that it now handles n-grams and part-of-speech (POS) tagging.
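As an illustration of what handling n-grams during preprocessing involves, the sketch below turns a list of tokens into all contiguous n-grams within a given length range. The helper `ngrams` is hypothetical, written for this post, and is not part of the Orange3-Text API.

```python
# Hypothetical helper: contiguous n-grams of a token list, joined with spaces.

def ngrams(tokens, n_min=1, n_max=2):
    """Return all contiguous n-grams with n_min <= n <= n_max, shortest first."""
    out = []
    for n in range(n_min, n_max + 1):
        # A sequence of length L has L - n + 1 windows of length n (none if n > L).
        out.extend(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return out
```

For example, `ngrams(["orange", "data", "mining"])` gives the three unigrams followed by the bigrams `"orange data"` and `"data mining"`, so phrases can be counted as features alongside single words.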
CN2 Rule Induction by Matevž Kren
The goal of this project was to implement the CN2 rule induction algorithm, together with Orange widgets for learning and exploring the inferred classification rules. At the heart of the project is a scripting library that can easily be extended with additional separate-and-conquer algorithms or their components.
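To illustrate the separate-and-conquer (sequential covering) idea behind rule learners such as CN2, here is a deliberately simplified sketch in plain Python. It is not the Orange implementation and not full CN2: it grows each rule greedily by Laplace-corrected accuracy instead of using CN2's beam search with a significance test, and all names are hypothetical. Examples are assumed to be `(dict_of_attribute_values, class_label)` pairs with discrete values.

```python
from collections import Counter

# Hypothetical, simplified separate-and-conquer (sequential covering) learner.
# Not Orange's CN2: rules are grown greedily rather than by beam search.

def covers(conditions, x):
    """True if example x satisfies every attribute=value test in the rule."""
    return all(x.get(a) == v for a, v in conditions.items())

def laplace(counts, n_classes):
    """Laplace-corrected accuracy of predicting the majority class."""
    return (max(counts.values()) + 1) / (sum(counts.values()) + n_classes)

def grow_rule(data, n_classes):
    """Greedily add attribute=value tests while the Laplace estimate improves."""
    conditions, covered = {}, data
    best = laplace(Counter(y for _, y in covered), n_classes)
    while True:
        candidates = {(a, v) for x, _ in covered for a, v in x.items()
                      if a not in conditions}
        best_cand = None
        for a, v in sorted(candidates):  # sorted for deterministic tie-breaking
            sub = [(x, y) for x, y in covered if x.get(a) == v]
            score = laplace(Counter(y for _, y in sub), n_classes)
            if score > best:
                best, best_cand, best_sub = score, (a, v), sub
        if best_cand is None:
            break
        conditions[best_cand[0]] = best_cand[1]
        covered = best_sub
    target = Counter(y for _, y in covered).most_common(1)[0][0]
    return conditions, target

def learn_rules(data):
    """Return an ordered rule list plus a default class."""
    n_classes = len({y for _, y in data})
    rules, remaining = [], list(data)
    while remaining:
        conditions, target = grow_rule(remaining, n_classes)
        if not conditions:  # no test improves on the remaining examples
            break
        rules.append((conditions, target))
        # Separate: drop the examples this rule covers, then conquer the rest.
        remaining = [(x, y) for x, y in remaining if not covers(conditions, x)]
    default = Counter(y for _, y in (remaining or data)).most_common(1)[0][0]
    return rules, default

def classify(rules, default, x):
    """Apply the first matching rule, else fall back to the default class."""
    for conditions, target in rules:
        if covers(conditions, x):
            return target
    return default
```

The two pieces a library like this would let you swap out are visible here: the rule-growing search (`grow_rule`) and the rule-quality measure (`laplace`); the covering loop in `learn_rules` stays the same.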
The fifth project, migrating Orange's data core to Pandas, was a gargantuan task, and Sašo handled it beautifully. The goal was to consolidate Orange data structures and data management routines to support data from Pandas. Sašo redesigned Orange's data management core, did a massive amount of refactoring and improvements, and removed legacy and unused code. The biggest challenge was, of course, preserving as much compatibility with the existing Orange interface as possible while providing the full flexibility of Pandas. The result is a functional Pandas-based Orange core.
All contributions were committed to GitHub (Orange3, Orange3-Text, Orange3-Recommendation and Orange3-Educational repositories), and most of them are already pip-installable. The only contribution not yet merged is the migration to Pandas, which will require adapting other components of the system and carefully checking their compatibility.
We are extremely grateful to have been given the chance to participate in Google Summer of Code and to have had such amazing students at our lab. We can’t wait to apply again next year!
By Ajda Pretnar, Organization Administrator for Orange