Published in Voxel51

FiftyOne Aggregation Tips and Tricks — Nov 25, 2022

Welcome to our weekly FiftyOne tips and tricks blog where we give practical pointers for using FiftyOne on topics inspired by discussions in the open source community. In this Thanksgiving week installment, in the spirit of togetherness, we’ll cover aggregations.

Wait, What’s FiftyOne?

FiftyOne is an open source machine learning toolset that enables data science teams to improve the performance of their computer vision models by helping them curate high quality datasets, evaluate models, find mistakes, visualize embeddings, and get to production faster.

Ok, let’s dive into this week’s tips and tricks!

An Aggregations Primer

Datasets are the core data structure in FiftyOne, allowing you to represent your raw data, labels, and associated metadata. When you query or manipulate a Dataset using view stages, FiftyOne returns a DatasetView object, which represents a filtered view into a subset of the underlying dataset’s contents.

Complementary to this data model, you will often want to compute aggregate statistics about your datasets, such as label counts, distributions, and ranges, in which the values from many samples are reduced to a single summary quantity.

The fiftyone.core.aggregations module offers a declarative and highly efficient approach to computing summary statistics about your datasets and views. Continue reading for some tips and tricks to help you do just that.

One Aggregation; Multiple Datasets

If you want to compute a single aggregation on a single dataset, you can call the aggregation method directly on the dataset. For instance, to compute the Bounds aggregation on the uniqueness field, you could write:

dataset.bounds("uniqueness")
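Bounds reduces a numeric field to its (min, max) pair. The following is only a plain-Python illustration of that reduction, not FiftyOne’s implementation; the uniqueness scores are made up, and skipping samples that lack the field is an assumption of the sketch.

```python
from typing import Optional, Sequence, Tuple

def bounds(values: Sequence[Optional[float]]) -> Tuple[Optional[float], Optional[float]]:
    """Return (min, max) of the non-None values, or (None, None) if there are none."""
    present = [v for v in values if v is not None]
    if not present:
        return (None, None)
    return (min(present), max(present))

# Hypothetical per-sample uniqueness scores; one sample lacks the field.
uniqueness = [0.42, 0.87, None, 0.15, 0.66]
print(bounds(uniqueness))  # (0.15, 0.87)
```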

However, if you want to use the same aggregation (on the same field) on multiple datasets or views, you can define the aggregation once, on its own, and then compute it on each collection using the aggregate() method:
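The point of this pattern is that the aggregation becomes a reusable object that is independent of any particular collection. The plain-Python sketch below mimics that idea: a hypothetical BoundsAgg class stores only the field name, and a hypothetical aggregate() function applies it to any list of samples (lists of dicts standing in for a dataset and a view). In FiftyOne itself, the reusable object would be an aggregation such as fo.Bounds("uniqueness"), passed to the aggregate() method of each dataset or view.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Tuple

Sample = Dict[str, Any]

@dataclass(frozen=True)
class BoundsAgg:
    """Hypothetical stand-in for a Bounds aggregation: it only records which field to reduce."""
    field: str

    def compute(self, samples: List[Sample]) -> Tuple[Optional[float], Optional[float]]:
        values = [s[self.field] for s in samples if s.get(self.field) is not None]
        return (min(values), max(values)) if values else (None, None)

def aggregate(samples: List[Sample], agg: BoundsAgg):
    """Hypothetical stand-in for calling aggregate() on a dataset or view."""
    return agg.compute(samples)

# Define the aggregation once...
uniqueness_bounds = BoundsAgg("uniqueness")

# ...and reuse it on several hypothetical collections.
dataset = [{"uniqueness": 0.2}, {"uniqueness": 0.9}, {"uniqueness": 0.5}]
view = [s for s in dataset if s["uniqueness"] > 0.3]  # a filtered subset, like a view

print(aggregate(dataset, uniqueness_bounds))  # (0.2, 0.9)
print(aggregate(view, uniqueness_bounds))     # (0.5, 0.9)
```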

Learn more about the aggregate() method in the FiftyOne Docs.

Multiple Aggregations

Conversely, if you want to compute multiple aggregations on the same dataset, whether or not they target the same field, you can do so efficiently by batching them into a single call to the aggregate() method:
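Batching means one call returns several results, in the order the aggregations were listed. The plain-Python sketch below imitates that contract with a hypothetical aggregate_batch() helper that computes a bounds, a count, and a set of distinct labels in a single pass over hypothetical samples. In FiftyOne, you would instead pass a list of aggregation objects (for example fo.Bounds, fo.Count, and fo.Distinct) to aggregate() and receive a list of results back.

```python
from typing import Any, Dict, List

# Hypothetical samples: a uniqueness score plus a list of detected labels.
samples: List[Dict[str, Any]] = [
    {"uniqueness": 0.3, "labels": ["cat", "dog"]},
    {"uniqueness": 0.8, "labels": ["cat"]},
    {"uniqueness": 0.5, "labels": []},
]

def aggregate_batch(samples: List[Dict[str, Any]]) -> list:
    """Hypothetical helper: compute three statistics in ONE pass over the samples.

    Returns [(min, max) of uniqueness, total number of labels, sorted distinct labels],
    always in that order.
    """
    lo, hi = float("inf"), float("-inf")
    count = 0
    distinct = set()
    for s in samples:  # a single traversal shared by all three statistics
        lo = min(lo, s["uniqueness"])
        hi = max(hi, s["uniqueness"])
        count += len(s["labels"])
        distinct.update(s["labels"])
    bounds = (lo, hi) if samples else (None, None)
    return [bounds, count, sorted(distinct)]

bounds, count, labels = aggregate_batch(samples)
print(bounds, count, labels)  # (0.3, 0.8) 3 ['cat', 'dog']
```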

Learn more about batching aggregations in the FiftyOne Docs.

Unwinding Lists of Lists

If we want to compute aggregations that are not built into FiftyOne, such as the median, we can do so by first extracting the values for the relevant field and then applying our aggregation to this result. Due to the unstructured nature of computer vision data, where different samples may contain different numbers of detected objects, this result may be a list of lists of differing sizes.

For instance, if we get the prediction confidence values for every sample as a list of per-sample lists, pred_confs_jagged, we can see that the first ten sublists have differing lengths by running:

print([len(p) for p in pred_confs_jagged[:10]])

If we wanted to compute the median from this jagged list of values, we would first need to flatten it. Fortunately, FiftyOne does this for us when we pass the argument unwind=True to the values() method:
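Conceptually, unwinding turns the per-sample lists into one flat list of values. A plain-Python equivalent on hypothetical data uses itertools.chain.from_iterable; treating a sample without predictions (None here) as contributing nothing is an assumption of this sketch.

```python
from itertools import chain

# Hypothetical jagged confidences: one inner list per sample, None for a sample without predictions.
pred_confs_jagged = [[0.91, 0.45, 0.30], [0.88], None, [0.77, 0.62]]

# Flatten one level, treating a missing list as empty.
pred_confs = list(chain.from_iterable(p or [] for p in pred_confs_jagged))
print(pred_confs)  # [0.91, 0.45, 0.3, 0.88, 0.77, 0.62]
```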

From there, we can pass the resulting flat list straight to NumPy:

import numpy as np
median_conf = np.median(pred_confs)
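Flattening before taking the median matters: the median over all detections is generally not the same as the median of per-sample medians, because the latter weights a sample with one detection as heavily as a sample with many. A small check on hypothetical data, using the standard library’s statistics.median (np.median returns the same values for these lists):

```python
from statistics import median

# Hypothetical jagged confidences: one sample with three low-confidence detections, two with one each.
pred_confs_jagged = [[0.1, 0.2, 0.3], [0.9], [0.8]]

# Median over every detection, i.e. over the flattened list.
flat = [c for confs in pred_confs_jagged for c in confs]
median_all = median(flat)

# Median of per-sample medians, which weights each sample equally instead.
per_sample = [median(confs) for confs in pred_confs_jagged if confs]
median_of_medians = median(per_sample)

print(median_all, median_of_medians)  # 0.3 0.8
```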

Learn more about unwinding and the values() aggregation in the FiftyOne Docs.

Aggregations on Transformed Field Values

When we combine FiftyOne’s aggregation classes with ViewField expressions, we can easily perform aggregations over transformed field values. For instance, to compute the mean of the squared prediction confidence values, we can write either of the following:
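The arithmetic behind such an aggregation is simple: square every confidence, across all detections in all samples, and then average. The plain-Python sketch below uses hypothetical values; in FiftyOne, the squaring would instead be expressed as a ViewField expression handed to the mean aggregation. It also shows why the order matters: transforming first and then aggregating is not the same as aggregating first and then transforming.

```python
from statistics import fmean

# Hypothetical confidences, one inner list per sample.
pred_confs_jagged = [[0.5, 1.0], [0.5], [0.0]]

# Transform each value first (square it), then average over all detections.
squared = [c ** 2 for confs in pred_confs_jagged for c in confs]
mean_of_squares = fmean(squared)

# For comparison: averaging first and squaring afterwards gives a different number.
square_of_mean = fmean(c for confs in pred_confs_jagged for c in confs) ** 2

print(mean_of_squares, square_of_mean)  # 0.375 0.25
```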

Learn more about expressions and ViewField in the FiftyOne Docs.

Going Beyond Aggregations

While aggregations over an entire Dataset or DatasetView are powerful, sometimes they are not sufficient to fully understand your model or your data. In fact, it is precisely this need for better transparency during data-model co-design that led to the creation of the FiftyOne App and FiftyOne Teams. Beyond dataset-level and view-level aggregations, FiftyOne provides class-specific reports for classification and multi-class object detection tasks via the print_report() method of the results returned by its evaluation methods.

For multi-class detection tasks, this looks like:
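In that report, each class gets a precision, recall, F1-score, and support (the number of ground-truth objects of that class). Once predicted objects have been matched to ground truth, these values follow from per-class true positive (TP), false positive (FP), and false negative (FN) counts. The sketch below computes them from hypothetical counts only; it does not reproduce FiftyOne’s matching step, and its table only loosely resembles the real report.

```python
from typing import Dict, Tuple

def prf(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision, recall, and F1 from match counts (0.0 when a ratio is undefined)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical per-class counts after matching predictions to ground truth.
counts: Dict[str, Tuple[int, int, int]] = {  # class -> (TP, FP, FN)
    "cat": (8, 2, 2),
    "dog": (3, 1, 6),
}

print(f"{'class':<8}{'precision':>10}{'recall':>10}{'f1-score':>10}{'support':>10}")
for label, (tp, fp, fn) in counts.items():
    p, r, f = prf(tp, fp, fn)
    support = tp + fn  # number of ground-truth objects of this class
    print(f"{label:<8}{p:>10.2f}{r:>10.2f}{f:>10.2f}{support:>10}")
```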

For a binary classification task, this looks like:
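For a binary task, the same metrics are computed with respect to a single positive class. The sketch below derives the confusion counts from hypothetical paired ground-truth and predicted labels, assuming "cat" is the positive class and every other label is negative; the binary_counts() helper is hypothetical. In FiftyOne, this mode corresponds to evaluating classifications with the binary method, for which you name the negative and positive classes.

```python
from typing import List, Tuple

def binary_counts(y_true: List[str], y_pred: List[str], positive: str) -> Tuple[int, int, int, int]:
    """Return (TP, FP, FN, TN), treating `positive` as the positive class and all else as negative."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

# Hypothetical ground-truth and predicted labels for six samples.
y_true = ["cat", "cat", "other", "other", "cat", "other"]
y_pred = ["cat", "other", "cat", "other", "cat", "other"]

tp, fp, fn, tn = binary_counts(y_true, y_pred, positive="cat")
precision = tp / (tp + fp) if tp + fp else 0.0
recall = tp / (tp + fn) if tp + fn else 0.0
print(tp, fp, fn, tn)  # 2 1 1 2
print(f"precision={precision:.2f} recall={recall:.2f}")
```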

Learn more about print_report() and evaluating detection and classification tasks in the FiftyOne Docs.


Jacob Marks

Machine Learning Engineer and Developer Evangelist @ Voxel51 | Stanford Theoretical Physics PhD | Ex-Google X https://www.linkedin.com/in/jacob-marks