Presentation at COPA 2019

Ola Spjuth, Co-PI in the HASTE project, presented two accepted HASTE papers at the [8th Symposium on Conformal and Probabilistic Prediction with Applications](http://clrc.rhul.ac.uk/copa2019) in Varna, Bulgaria, on 9-11 September 2019. Both papers are now published in [Proceedings of Machine Learning Research (PMLR) volume 105](https://proceedings.mlr.press/v105/).

Paper 1: Split Knowledge Transfer in Learning Under Privileged Information Framework

Gauraha, N., Söderdahl, F. and Spjuth, O.
Split Knowledge Transfer in Learning Under Privileged Information Framework. 
Proceedings of Machine Learning Research (PMLR). 105, 43-52. (2019).
ABSTRACT
Learning Under Privileged Information (LUPI) enables the inclusion of additional (privileged) information when training machine learning models, data that is not available when making predictions. The methodology has been successfully applied to a diverse set of problems from various fields. SVM+ was the first realization of the LUPI paradigm, which showed fast convergence but did not scale well. To address the scalability issue, knowledge transfer approaches were proposed to estimate privileged information from standard features in order to construct improved decision rules. Most available knowledge transfer methods use regression techniques and the same data for approximating the privileged features as for learning the transfer function. Inspired by the cross-validation approach, we propose to partition the training data into K folds and use each fold for learning a transfer function and the remaining folds for approximations of privileged features; we refer to this as split knowledge transfer. We evaluate the method using four different experimental setups comprising one synthetic and three real datasets. The results indicate that our approach leads to improved accuracy as compared to LUPI with standard knowledge transfer.
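To make the split knowledge transfer idea more concrete, here is a minimal Python sketch under simplifying assumptions that are ours, not the paper's: one standard feature, one privileged feature, a linear least-squares transfer function, round-robin fold assignment, and averaging of the K-1 estimates each training point receives. The function names `fit_line` and `split_knowledge_transfer` are hypothetical, and this is not the authors' implementation.

```python
from statistics import mean


def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; stands in for a regression-based transfer function."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx  # assumes the fold contains at least two distinct x values
    return my - b * mx, b


def split_knowledge_transfer(x, x_priv, k):
    """Estimate the privileged feature for every training point using k folds.

    The transfer function learned on fold j is applied only to points outside
    fold j, so no point's estimate comes from a model fitted on that point.
    Each point receives k-1 estimates, which this sketch simply averages
    (the averaging step is an assumption of the sketch).
    """
    n = len(x)
    folds = [list(range(j, n, k)) for j in range(k)]  # round-robin fold assignment
    estimates = [[] for _ in range(n)]
    for fold in folds:
        a, b = fit_line([x[i] for i in fold], [x_priv[i] for i in fold])
        in_fold = set(fold)
        for i in range(n):
            if i not in in_fold:
                estimates[i].append(a + b * x[i])
    return [mean(e) for e in estimates]
```

Calling `split_knowledge_transfer(x, x_priv, 5)` returns one estimated privileged value per training point. The contrast with standard knowledge transfer is that there the same data would be used both to fit the transfer function and to produce the estimates; how the estimates then feed into the final decision rule is left out of this sketch.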

Paper 2: Combining Prediction Intervals on Multi-Source Non-Disclosed Regression Datasets

Spjuth, O., Brännström, R.C., Carlsson, L. and Gauraha, N.
Combining Prediction Intervals on Multi-Source Non-Disclosed Regression Datasets.
Proceedings of Machine Learning Research (PMLR). 105, 53-65. (2019).
ABSTRACT
Conformal Prediction is a framework that produces prediction intervals based on the output from a machine learning algorithm. In this paper, we explore the case when training data is made up of multiple parts available in different sources that cannot be pooled. We here consider the regression case and propose a method where a conformal predictor is trained on each data source independently, and where the prediction intervals are then combined into a single interval. We call the approach Non-Disclosed Conformal Prediction (NDCP), and we evaluate it on a regression dataset from the UCI machine learning repository using support vector regression as the underlying machine learning algorithm, with a varying number of data sources and sizes. The results show that the proposed method produces conservatively valid prediction intervals, and while we cannot retain the same efficiency as when all data is used, efficiency is improved through the proposed approach as compared to predicting using a single arbitrarily chosen source.
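The general NDCP workflow (train a conformal predictor per source, then combine only the resulting intervals) can be sketched in Python as follows. This is a hedged illustration under stated assumptions: a least-squares line with a single feature stands in for the support vector regression used in the paper, each source uses inductive conformal prediction with absolute residuals as nonconformity scores, and the combination rule (median of the lower bounds, median of the upper bounds) is a hypothetical choice for illustration, not necessarily the one evaluated in the paper. All names are hypothetical.

```python
import math
from statistics import mean, median


def fit_line(xs, ys):
    """Least-squares line y = a + b*x, standing in for the paper's support vector regression."""
    mx, my = mean(xs), mean(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b


class SourceConformalRegressor:
    """Inductive conformal regressor that only ever sees one source's data."""

    def __init__(self, x_train, y_train, x_cal, y_cal):
        self.a, self.b = fit_line(x_train, y_train)
        # Nonconformity score: absolute residual on the held-out calibration set.
        self.scores = sorted(abs(y - self.predict_point(x)) for x, y in zip(x_cal, y_cal))

    def predict_point(self, x):
        return self.a + self.b * x

    def interval(self, x, epsilon):
        """Prediction interval at significance level epsilon (target coverage 1 - epsilon)."""
        n = len(self.scores)
        rank = math.ceil((n + 1) * (1 - epsilon))
        # Too few calibration scores for this epsilon: the interval is unbounded.
        half_width = math.inf if rank > n else self.scores[rank - 1]
        y_hat = self.predict_point(x)
        return y_hat - half_width, y_hat + half_width


def combine_intervals(intervals):
    """Hypothetical combination rule: median of lower bounds, median of upper bounds."""
    lows, highs = zip(*intervals)
    return median(lows), median(highs)
```

In this setup each data owner would build its own `SourceConformalRegressor` and share only the pair returned by `interval(x, epsilon)`; a coordinator then passes the collected pairs to `combine_intervals`, so no raw training data leaves any source.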

Ankit Gupta joins HASTE team as PhD student

We welcome Ankit Gupta as a new PhD student in the Wählby Lab at the Department of Information Technology, Uppsala University.

Ankit obtained his Bachelor’s in Electrical Engineering at the Indian Institute of Technology Indore in 2014. He then completed his Master’s in Medical Imaging and Informatics at the Indian Institute of Technology Kharagpur in 2017. Before moving to Uppsala, he was employed as a Research Engineer at the University of Bern, where he worked on developing a video-based instrument tracking system for stereoscopic laparoscopic surgery.

About the PhD project within HASTE:  

Within the project, he will work on developing measurements for the early detection of informative data from large-scale spatial and temporal experiments.

Successful HASTE ‘all hands’ at Uppsala (Nov 7-9)

Johan makes a start on the fika…
Everyone presented their latest work and discussed the most recent image datasets from AstraZeneca and Vironova. During the software workshop session, we discussed linking the HASTE cloud pipeline to the Vironova MiniTEM.

Thanks to: Carolina Wählby, Ola Spjuth, Andreas Hellander, Ida-Maria Sintorn, Alan Sabirsh, Ernst Ahlberg Helgee, Johan Karlsson, Håkan Wieslander, Philip Harrison, Salman Toor, Ben Blamey, Håkan Öhrn, Markus M. Hilscher, Niharika Gauraha, Magnus Larsson, Oliver Stein, Andy Ishak

Oliver Stein joins the HASTE team to work on intelligent operator placement and auto-scaling in streaming frameworks

Oliver’s MSc thesis will investigate intelligent ways to manage and place Docker containers in a VM environment, in order to use physical resources more efficiently while maintaining performance. A controller implementing this will be developed together with the HarmonicIO streaming framework used in HASTE; it will enable automatic scaling of the containers running in the system and allow the design to be evaluated on a real use case.

Phil Harrison joins the HASTE team to work on predictive modeling with confidence

We welcome Phil Harrison as a new PhD student in the Spjuth lab. Phil obtained his first PhD in marine biology in 2006, studying the population dynamics of grey seals. Between 2006 and 2016 he undertook several research projects modelling wildlife populations and analysing trends in biodiversity. In the HASTE project, Phil will develop machine learning methods for online, large-scale analysis of microscopy image data based on statistical learning, such as conformal prediction and probabilistic prediction.

Ben Blamey joins the team to work on intelligent cloud services

We are nearing the end of an intensive recruitment period, in which we have been looking for excellent established and emerging scientists to help us realize the goals of this interdisciplinary project.

This week we are very pleased to welcome Dr. Ben Blamey to the team. He will work in the Hellander lab, in close collaboration with Dr. Salman Toor, and focus on computer science challenges in designing and developing smart and efficient systems for managing scientific data, and image data in particular, in distributed computing infrastructures such as hybrid and fog clouds.

With a background in research on machine learning and natural language processing, and in developing cloud services in both academia and industry, Dr. Blamey brings critical experience to the team.

In the featured image Dr. Blamey (right) is busy discussing a potential design of an intelligent system to manage information hierarchies in distributed environments with Dr. Toor (left).

Håkan Wieslander starts a PhD position

We are happy to welcome Håkan Wieslander to the team and to PhD education at the Department of Information Technology, Uppsala University!

Håkan grew up in Lund, Sweden, and moved to Uppsala in 2011 to study Engineering Physics. In 2017 he obtained a master’s degree in computational science. His MSc thesis was on the classification of malignant cells using deep learning.

About the PhD project within HASTE:  

Collecting large amounts of data often results in high-quality, highly informative data intermixed with data that is either of poor quality or of little interest to the question at hand. Wieslander’s thesis work will focus on developing computationally inexpensive measurements that identify non-informative data early in the analysis process, either online during data collection or offline before full data analysis. The challenge is to use minimal computational time and power to extract a broad range of informative measurements from spatial, temporal, and multi-parametric image data that are useful as input for conformal prediction and efficient enough to work well in a streaming setting.

Visit to AstraZeneca to learn about high-content imaging workflows

Lovisa Lugnegård, working with Andreas Hellander and Carolina Wählby, is developing a cloud-based simulator of data generation in high-content imaging experiments. She recently visited Alan Sabirsh and Johan Karlsson at AstraZeneca to learn about their microscopy pipelines and which parameters determine the rate at which experiments produce data, as well as to collect an example dataset to drive her development.

Johan shows Lovisa their ImageXpress microscope, to work out how different settings affect the resulting images.