Label Studio

We provide a setup for Label Studio (a data-labelling tool for machine learning) with an ML Backend that suggests, at a minimum, whether an image shows plankton or detritus.

There’s an example docker-compose.yml file in this project for building and running Label Studio and its backend.

The full configuration for the running service is in this private project <https://github.com/ukceh-rse/podman-host> - please contact a member of the RSE group if you would like access.

ML Backend notes

src/label_studio_cyto_ml/model.py contains our custom model code.

It runs two models:

  • A ResNet (could be any deep learning model) that extracts embeddings from an image

  • A k-means clustering model that assigns each resulting embedding a specific label
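The second step can be sketched without any ML libraries. This is a minimal illustration, not the code in model.py: it assumes the embeddings and the fitted cluster centroids are already available, and it uses short made-up vectors in place of real ResNet output. The function name nearest_centroid is hypothetical.

```python
import math


def nearest_centroid(embedding, centroids):
    """Return the index of the centroid closest to the embedding (Euclidean distance)."""
    distances = [math.dist(embedding, centroid) for centroid in centroids]
    return distances.index(min(distances))


# Hypothetical 3-dimensional values; real image embeddings are much longer.
centroids = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]
embeddings = [[0.1, 0.0, 0.2], [0.9, 1.1, 1.0]]

# Each embedding gets the index of its nearest cluster as its label.
labels = [nearest_centroid(e, centroids) for e in embeddings]
```

This covers only the assignment step of an already fitted k-means model; fitting the centroids themselves is left to the clustering library.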

Return format

It took a bit of figuring out to get the return format right. A single prediction needs to be returned as an array of results, like this:

PredictionValue(result=[{"id": int(label), "text": "test", "type": "Choices"}])

The ModelResponse then wraps an array of these PredictionValue objects:

ModelResponse(predictions=predictions)
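To make the nesting explicit, here is a sketch that builds the same shape from plain dictionaries. The dictionaries are stand-ins for PredictionValue and ModelResponse, used so the example runs without the Label Studio packages; the helper names build_prediction and build_response are hypothetical.

```python
import json


def build_prediction(label):
    """Wrap one label in the single-prediction shape described above."""
    # int() converts library integer types (e.g. NumPy's) to a plain int,
    # which the standard json module can serialise.
    return {"result": [{"id": int(label), "text": "test", "type": "Choices"}]}


def build_response(labels):
    """Collect one prediction per label, mirroring ModelResponse(predictions=...)."""
    return {"predictions": [build_prediction(label) for label in labels]}


response = build_response([0, 2])
payload = json.dumps(response)  # the structure serialises cleanly to JSON
```

The point to notice is the two levels of list: each prediction holds a list of results, and the response holds a list of predictions.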

The prediction needs a type value, which internally is a control tag <https://labelstud.io/tags/choices>. There are many types of these for different media; our checkbox / radio buttons are Choices.

The input to the annotation task looks like this (defined when setting up the project):

{'organism_type': {'type': 'Choices', 'to_name': ['image'], 'inputs': [{'type': 'Image', 'valueType': None, 'value': 'image'}], 'labels': ['Not-plankton', 'Plankton', 'Debris'], 'labels_attrs': {'Not-plankton': {'value': 'Not-plankton'}, 'Plankton': {'value': 'Plankton'}, 'Debris': {'value': 'Debris'}}},
 'morphology': {'type': 'Choices', 'to_name': ['image'], 'inputs': [{'type': 'Image', 'valueType': None, 'value': 'image'}], 'labels': ['Mucilage', 'Flagella', 'Cilia', 'Aerotopes', 'Akinetes', 'Heterocytes', 'Theca/test/exoskeletal structures', 'Eggs', 'Ephippia'], 'labels_attrs': {'Mucilage': {'value': 'Mucilage'}, 'Flagella': {'value': 'Flagella'}, 'Cilia': {'value': 'Cilia'}, 'Aerotopes': {'value': 'Aerotopes'}, 'Akinetes': {'value': 'Akinetes'}, 'Heterocytes': {'value': 'Heterocytes'}, 'Theca/test/exoskeletal structures': {'value': 'Theca/test/exoskeletal structures'}, 'Eggs': {'value': 'Eggs'}, 'Ephippia': {'value': 'Ephippia'}}},
 'life_form': {'type': 'Choices', 'to_name': ['image'], 'inputs': [{'type': 'Image', 'valueType': None, 'value': 'image'}], 'labels': ['Unicellular', 'Colony', 'Filament'], 'labels_attrs': {'Unicellular': {'value': 'Unicellular'}, 'Colony': {'value': 'Colony'}, 'Filament': {'value': 'Filament'}}},
 'shape': {'type': 'Choices', 'to_name': ['image'], 'inputs': [{'type': 'Image', 'valueType': None, 'value': 'image'}], 'labels': ['Spiky', 'Round', 'Rod-like'], 'labels_attrs': {'Spiky': {'value': 'Spiky'}, 'Round': {'value': 'Round'}, 'Rod-like': {'value': 'Rod-like'}}},
 'ta': {'type': 'TextArea', 'to_name': ['image'], 'inputs': [{'type': 'Image', 'valueType': None, 'value': 'image'}], 'labels': [], 'labels_attrs': {}}}
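When deciding which control a prediction should target, it can help to pull out just the Choices controls and their labels from this structure. The sketch below assumes the configuration arrives as a Python dictionary shaped like the one above; the function name choices_by_control is hypothetical, and the sample is a trimmed copy with two of the five controls.

```python
def choices_by_control(label_config):
    """Map each Choices control name to its list of allowed labels."""
    return {
        name: control["labels"]
        for name, control in label_config.items()
        if control["type"] == "Choices"
    }


# Trimmed copy of the project configuration, keeping only the keys used here.
sample_config = {
    "organism_type": {
        "type": "Choices",
        "to_name": ["image"],
        "labels": ["Not-plankton", "Plankton", "Debris"],
    },
    "ta": {"type": "TextArea", "to_name": ["image"], "labels": []},
}

controls = choices_by_control(sample_config)
```

The TextArea control is skipped because it carries free text rather than a fixed set of labels.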

See also: Troubleshooting pre-annotations <https://labelstud.io/guide/troubleshooting#Pre-annotations>

Connection to Label Studio

Each Label Studio project needs to be configured to use an ML backend service.

This could be our custom one or one of a range of off-the-shelf options (like SAM for segmentation).

  • Navigate to Project/Settings/Model

  • Add the URL, referring to the container by its service name as it appears in docker-compose.yml

For example, our docker-compose.yml has three services, one of which is named ml-backend, so this is the URL that goes in the project settings:

http://ml-backend:9090/
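If the connection fails, a quick reachability check from inside the Label Studio container can narrow things down. The sketch below is an assumption-laden example: it assumes the backend answers on a /health path (check this against your backend version), and the function names health_url and backend_is_up are hypothetical. The ml-backend hostname only resolves inside the compose network.

```python
from urllib.parse import urljoin
from urllib.request import urlopen


def health_url(base_url):
    """Build the health-check URL from the backend base URL (assumed /health path)."""
    return urljoin(base_url, "health")


def backend_is_up(base_url, timeout=5):
    """Return True if the backend answers the health check with HTTP 200."""
    try:
        with urlopen(health_url(base_url), timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers DNS failures, refused connections, timeouts and HTTP errors.
        return False


url = health_url("http://ml-backend:9090/")
```

Calling backend_is_up("http://ml-backend:9090/") from the host rather than from a container on the same network will normally return False, because the service name is not resolvable there.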

Label Studio analytics

Label Studio enables analytics by default, and we have seen page loads stall because the analytics service throttles requests.

At the time of writing, working around this needs a build from source as well as configuration options, but it should be fixed once version 1.17.1 becomes the default Docker build (see this issue <https://github.com/HumanSignal/label-studio/issues/6430>):

git clone https://github.com/HumanSignal/label-studio.git
cd label-studio
podman build -t heartexlabs/label-studio:latest .

Label Studio account management

One downside of the free edition is that there is no password reset option in the interface; the only way to reset a password is via the command line. When running in podman with a sqlite backend, this involves starting a shell in the container.

Open a shell in the running container:

podman exec -it label-studio bash

Use the label-studio utility to change the password:

label-studio reset_password --username <username> --password <new_password>