# k-means clustering algorithm implementation written in Go

### Related tags

Machine Learning hacktoberfest

# kmeans

k-means clustering algorithm implementation written in Go

## What It Does

k-means clustering partitions a multi-dimensional data set into `k` clusters, where each data point belongs to the cluster with the nearest mean, serving as a prototype of the cluster.

## When Should I Use It?

• When you have numeric, multi-dimensional data sets
• You don't have labels for your data
• You know exactly how many clusters you want to partition your data into

## Example

```import (
"github.com/muesli/kmeans"
"github.com/muesli/clusters"
)

// set up a random two-dimensional data set (float64 values between 0.0 and 1.0)
var d clusters.Observations
for x := 0; x < 1024; x++ {
d = append(d, clusters.Coordinates{
rand.Float64(),
rand.Float64(),
})
}

// Partition the data points into 16 clusters
km := kmeans.New()
clusters, err := km.Partition(d, 16)

for _, c := range clusters {
fmt.Printf("Centered at x: %.2f y: %.2f\n", c.Center[0], c.Center[1])
fmt.Printf("Matching data points: %+v\n\n", c.Observations)
}```

## Complexity

If `k` (the amount of clusters) and `d` (the dimensions) are fixed, the problem can be exactly solved in time O(ndk+1), where `n` is the number of entities to be clustered.

The running time of the algorithm is O(nkdi), where `n` is the number of `d`-dimensional vectors, `k` the number of clusters and `i` the number of iterations needed until convergence. On data that does have a clustering structure, the number of iterations until convergence is often small, and results only improve slightly after the first dozen iterations. The algorithm is therefore often considered to be of "linear" complexity in practice, although it is in the worst case superpolynomial when performed until convergence.

## Options

You can greatly reduce the running time by adjusting the required delta threshold. With the following options the algorithm finishes when less than 5% of the data points shifted their cluster assignment in the last iteration:

`km, err := kmeans.NewWithOptions(0.05, nil)`

The default setting for the delta threshold is 0.01 (1%).

If you are working with two-dimensional data sets, kmeans can generate beautiful graphs (like the one above) for each iteration of the algorithm:

`km, err := kmeans.NewWithOptions(0.01, kmeans.SimplePlotter{})`

Careful: this will generate PNGs in your current working directory.

You can write your own plotters by implementing the `kmeans.Plotter` interface.

• #### Evaluation and assessment

Nice work on the kmeans implementation! Do you plan on adding functions for evaluating the clusters? Here is a wikipedia article that has some metrics that may be worth adding.

https://en.wikipedia.org/wiki/Cluster_analysis#Evaluation_and_assessment

opened by vosmith 6
• #### Add benchmarks for partitioning with different Cluster sizes

Based on issue #2.
I'm only benchmarking `Kmeans.Partition` in this PR. I consider also adding cases for `Clusters.Nearest` and `Points.Mean` but `Kmeans.Partition` already exercises those code paths. The choice of cluster sizes in these benchmarks -- 32, 512, 4096, 65536 -- is also somewhat arbitrary but I'm open to changing it if you can think of some more representative values.

opened by philangist 5
• #### feat: less dependencies

go-colorful is only used in the example packages, and its not needed anywhere else.

previous versions of go-colorful were also bringing a sql mock package as a dependency, which has been fixed... this also upgrades that dep, fwiw.

we can remove that dep by making the examples folder another go module, which is what this pr does.

Signed-off-by: Carlos A Becker [email protected]

opened by caarlos0 2
• #### Fix indirect dependency go-sdk

Currently (6 dec 2020) this module cannot be imported in a Go 1.15.5 based project using modules because the `go-sdk` indirect dependency (which I was not able to find as a dependency of this module's dependencies) does not appear to have a `v2.0.0` version, making `go mod` and `go get` refuse to work properly, even when generting a new `go.mod` for my project (paletter), whith the following error:

``````go: github.com/muesli/[email protected] requires
github.com/blend/[email protected]+incompatible: reading github.com/blend/go-sdk/go.mod at revision v2.0.0: unknown revision v2.0.0
``````

As a sidenote, `go-chart` v2 as a module has to imported as `github.com/wcharczuk/go-chart/v2`

bug
opened by Baldomo 1

In certain cases, we want to apply the algorithm to get the clusters but, once we have the observations, we want other information associated with the observations of the clusters.

Imagine that we have a lot of circles. Each circle has a unique colour and a radius. We want to apply clustering by the radius. In the end, we want to check in which cluster is the green circle.

Is it possible?

opened by luisfmelo 1
• #### Issue in plotter.go file

When I install lantern I get an error. I think in plotter.go, where the chart.Style struct is used, the first attribute must be changed because the "Show" attribute changed to "Hidden" in wcharczuk/go-chart.

opened by PhilVince96 1

# github.com/muesli/kmeans

../../../../muesli/kmeans/plotter.go:45:5: unknown field 'Show' in struct literal of type chart.Style ../../../../muesli/kmeans/plotter.go:58:4: unknown field 'Show' in struct literal of type chart.Style ../../../../muesli/kmeans/plotter.go:73:5: unknown field 'Show' in struct literal of type chart.Style ../../../../muesli/kmeans/plotter.go:78:5: unknown field 'Show' in struct literal of type chart.Style

opened by evanoberholster 1
• #### Random_state: Control of random state

Thank you very much for your work, I would like to ask if there is a parameter in the algorithm similar to sklearn random_state in Python, which can control the random state, because my output results are always inconsistent due to the randomness of the initial cluster center, thank you for taking the time to see my question!

opened by slomo28 1
• #### breaking: add seed to NewWithOptions

hello @muesli, I tried many approaches but did not see an obvious way to keep backwards compatibility, so I am proposing this breaking change. I am open to other approaches.

I was a bit torn on this one, because, although there is a constructor, the seed of the default PRNG is set each time that Partition is called (via clusters.new), so I wanted a way to keep the existing semantics. I came out with the idea of passing the seed to the NewWithOptions ctor and to use the zero value to mean default behavior.

If a client explicitly sets the seed, then each time it calls Partition() on a given Kmean struct, the PRG will be reset to that seed.

I have to say I am not completely convinced. Opinions?

Together with https://github.com/muesli/clusters/pull/1, closes #19.

opened by marco-m 0
• #### Feature Request: Allow deterministic random seed value on initialization.

Hi,

Currently the initialization of Clusters slice randomly seeds their initial positions.

I would like to request for a new initialization method with option to specify the seed value in order to obtain a deterministic result from the Partition function.

Thank you.

enhancement
opened by c2wei 3
• #### Feature Request: Allow observations with gaps

Hi,

do you mind if we add the option to your kmeans to ignore gaps in observations.

The main issue we have is: We have observations with a few hundred dimensions. Unfortunately, some of the observations have gaps but we still want to consider those observations. Currently, we just fill gaps with 0 what is actually not the right way because it falsifies the result. What we actually would like to have is, to ignore gaps, so we don't consider the particular dimension for that particular observation at all.

Does this make sense to you?

enhancement
opened by guenhter 1
• #### All observations the same point

Hi, I don't know if this is a bug but I think that it should not be the expected outcome.

If I have, for example, 10 unidimensional points with the same value (example: 5), and ask for 4 clusters (k = 4), the result is:

``````Cluster 0 with 10 data points
Cluster 1 with 1 data points
Cluster 2 with 1 data points
Cluster 3 with 1 data points
``````

So, We have in fact 4 clusters but there are 3 extra data points that may be part of more than 1 cluster.

The code you can use to simulate:

``````package main

import (
"fmt"
"github.com/muesli/clusters"
"github.com/muesli/kmeans"
)

func main() {
const k = 4

km := kmeans.New()
var o clusters.Observations
for i := 0; i < 10; i++ {
o = append(o, clusters.Coordinates{5})
}
c, err := km.Partition(o, k)
if err != nil {
fmt.Printf("error: %s", err.Error)
return
}

for i, cluster := range c {
fmt.Printf("\nCluster %v with %v data points", i, len(cluster.Observations))
}
}
``````
bug
opened by luisfmelo 1

## What's Changed

• ci: update lint workflows by @muesli in https://github.com/muesli/kmeans/pull/23
• feat: less dependencies by @caarlos0 in https://github.com/muesli/kmeans/pull/22

## New Contributors

• @caarlos0 made their first contribution in https://github.com/muesli/kmeans/pull/22

Full Changelog: https://github.com/muesli/kmeans/compare/v0.3.0...v0.3.1

Source code(tar.gz)
Source code(zip)

## Fixes

• Ensure another iteration after randomly assigning a point to a cluster
• Never panic in `Plot`, return an error instead
• Handle errors calling `Plot`

## Changes

• Prefix plot filename with current k
Source code(tar.gz)
Source code(zip)
• #### v0.1.0(Jun 2, 2018)

###### Christian Muehlhaeuser
Geek, Gopher, Software Developer, Maker, Opensource Advocate, Tech Enthusiast, Photographer, Board and Card Gamer
###### a simple & tiny scrapy clustering solution, considered a drop-in replacement for scrapyd

scrapyr a very simple scrapy orchestrator engine that could be distributed among multiple machines to build a scrapy cluster, under-the-hood it uses r

50 Nov 24, 2021
###### A native Go clean room implementation of the Porter Stemming algorithm.

Go Porter Stemmer A native Go clean room implementation of the Porter Stemming Algorithm. This algorithm is of interest to people doing Machine Learni

183 Jan 3, 2023
###### Golang implementation of the Paice/Husk Stemming Algorithm

##Golang Implementation of the Paice/Husk stemming algorithm This project was created for the QUT course INB344. Details on the algorithm can be found

29 Sep 27, 2022
###### Fast (linear time) implementation of the Gaussian Blur algorithm in Go.

Song2 Fast (linear time) implementation of the Gaussian Blur algorithm in Go.

52 Oct 25, 2022
###### a* pathfinding algorithm written in go

astar a* (a-star) pathfinding algorithm written in go Wikipedia: EN: A* search algorithm DE: A*-Algorithmus Install go get github.com/jpierer/[email protected]

26 Mar 21, 2022
###### Genetic Algorithm and Particle Swarm Optimization

evoli Genetic Algorithm and Particle Swarm Optimization written in Go Example Problem Given f(x,y) = cos(x^2 * y^2) * 1/(x^2 * y^2 + 1) Find (x,y) suc

26 Dec 22, 2022
###### Golang Genetic Algorithm

goga Golang implementation of a genetic algorithm. See ./examples for info on how to use the library. Overview Goga is a genetic algorithm solution wr

177 Dec 19, 2022
###### An iterative algorithm to generate high-quality triangulated images.

An iterative algorithm to generate high quality triangulated images. Introduction Triangula uses a modified genetic algorithm to triangulate images. I

3.8k Dec 29, 2022
###### An implementation of Neural Turing Machines

Neural Turing Machines Package ntm implements the Neural Turing Machine architecture as described in A.Graves, G. Wayne, and I. Danihelka. arXiv prepr

398 Sep 13, 2022
###### Implementation of E(n)-Equivariant Graph Neural Networks, in Pytorch

EGNN - Pytorch Implementation of E(n)-Equivariant Graph Neural Networks, in Pytorch. May be eventually used for Alphafold2 replication.

257 Dec 23, 2022
###### A high performance go implementation of Wappalyzer Technology Detection Library

wappalyzergo A high performance port of the Wappalyzer Technology Detection Library to Go. Inspired by https://github.com/rverton/webanalyze. Features

378 Jan 8, 2023
###### Go implementation of the yolo v3 object detection system

Go YOLO V3 This repository provides a plug and play implementation of the Yolo V3 object detection system in Go, leveraging gocv. Prerequisites Since

58 Dec 14, 2022
###### Golang k-d tree implementation with duplicate coordinate support

Golang k-d tree implementation with duplicate coordinate support

46 Nov 9, 2022
###### Genetic Algorithms library written in Go / golang

Description Genetic Algorithms for Go/Golang Install \$ go install git://github.com/thoj/go-galib.git Compiling examples: \$ git clone git://github.com

193 Sep 27, 2022
###### Neural Networks written in go

gobrain Neural Networks written in go Getting Started The version 1.0.0 includes just basic Neural Network functions such as Feed Forward and Elman Re

531 Dec 20, 2022
###### A recommender system service based on collaborative filtering written in Go

Language: English | 中文 gorse: Go Recommender System Engine Build Coverage Report GoDoc RTD Demo gorse is an offline recommender system backend based o

6.4k Dec 29, 2022
###### A Naive Bayes SMS spam classifier written in Go.

Ham (SMS spam classifier) Summary The purpose of this project is to demonstrate a simple probabilistic SMS spam classifier in Go. This supervised lear

13 Sep 9, 2022
###### A simple utility, written in Go, for interacting with Salesforce.

Salesforce CLI A simple utility, written in Go, for interacting with Salesforce. Currently only specific functionality is implemented, and the output

0 Dec 14, 2021
###### A simple yet customisable program written in go to make hackerman-like terminal effects.

stuntman a simple program written in go to make you look like a hackerman Demo stuntman -binar -width 90 -color cyan stuntman -text -width 90 -vertgap

10 Aug 4, 2022