Concurrency Lab examples on Go

Overview

Concurrency Lab 1

If you're stuck look at examples on Go by Example

Using the lab sheet

There are two ways to use the lab sheet, you can either:

Each question is rated to help you balance your work:

  • 🔴 ⚪ ⚪ ⚪ ⚪ - Easy, strictly necessary.
  • 🔴 🔴 ⚪ ⚪ ⚪ - Medium, still necessary.
  • 🔴 🔴 🔴 ⚪ ⚪ - Hard, necessary if you're aiming for higher marks.
  • 🔴 🔴 🔴 🔴 ⚪ - Hard, useful for coursework extensions.
  • 🔴 🔴 🔴 🔴 🔴 - Hard, beyond what you need for any part of the coursework.

Question 1 - Median Filter 🔴 🔴 ⚪ ⚪ ⚪

This lab has been designed to take two weeks. You are encouraged to focus on Q1 in the first week, and Q2 in the second week.

Open the images in the filter directory. As you can see, they all have 'salt and pepper' noise. Now open medianFilter.go. It's a single-threaded implementation of a Median Filter, which removes the salt and pepper noise. It runs through the image pixel by pixel and replaces each value with the median of neighbouring values. In this implementation, the algorithm has hardcoded radius 2 and therefore it looks at 24 neighbours (as well as its own value).

Question 1a

Usage of ./filter:
  -in string
    	Specify the input file. (default "ship.png")
  -out string
    	Specify the output file. (default "out.png")
  -threads int
    	Specify the number of worker threads to use. (default 1)

Read the source code, run the filter on the provided images and verify that the noise has been removed.

Make sure you run this program using either go build or go run. IDE autorun tools may not read the image files correctly.

Question 1b

The median filter is an example of a problem that is "embarrassingly parallelisable" Consider having 4 worker threads. We could split the image into 4 parts and ask each worker to process their chunk in parallel to other workers. At the end, we can collect the result and output the final image.

Your task will be to write code that supports 2, 4, 8 and 16 worker threads.

Sending

The first problem we encounter when parallelising is how to send the workers their respective chunks of the image:

We could use channels of type chan [][]uint8 to send parts of the image to the worker as pure 2D slices. For example using the notation image[0:128]. This is not a good solution. Recall from lab 1 that passing a slice to a function simply passes a pointer. If we passed the same slice to multiple workers we could end up concurrently modifying the slice causing an incorrect resulting image.

We could use channels of type chan uint8 to pass uint8 values one by one, rather than slices. While this is a valid and safe solution, it isn't particularly fast.

Instead, today we will explore how to create a closure that will make our slice immutable.

The variable immutableData in the function filter() is a closure. It is not a slice or any other data type. The function makeImmutableMatrix(...) returns a function. It stores a reference to a 2D slice and given y and x coordinates it returns a uint8 from the slice that it wraps. Such a function with hidden state is called a closure. This mechanism is similar to an object with a getter in java.

The use of a closure means that the slice it wraps effectively becomes immutable. As a programmer, you now have no direct access to the pointer and therefore no way of modifying the slice. This will allow us to pass the closure to multiple goroutines without causing any potential race conditions since concurrent read operations are perfectly safe - concurrent writes or reads when a write may be happening almost always aren't.

Receiving

The second problem is receiving and reconstructing the image back in the function filter():

We could use channels of type chan uint8 to pass uint8 values one by one. While this is a valid and safe solution, it isn't particularly fast.

We could use a channel of type chan func(y, x int) uint8 to send back a closure. However, to put together the final image in a single 2D slice we will need to use append, which is not supported by a closure. We would, therefore, end up extracting the uint8 values one by one which is slow.

Our solution will use a channel of type chan [][]uint8 to send the resulting image back to filter(). We will send a slice (~pointer) over a channel, but in this case, there are no race conditions, because the worker exits immediately after sending the slice. This does not invalidate the memory (like it could in C). As a result, the slice will only be owned by a single goroutine (the filter() one) and there will be no race conditions.

Task

Start parallelising the median filter by creating a worker(...) function. Given a closure and some y and x bounds it should apply the median filter between these bounds and 'return' the newly created slice.

Hint 1

The signature of the worker function could be:

func worker(startY, endY, startX, endX int, data func(y, x int) uint8, out chan<- [][]uint8) {

}
Hint 2

You only need two lines of code in the worker. One to execute the median filter and one to send the resulting slice back on the out channel.

Question 1c

Now that we've created a worker we need to change how filter() works. It needs to distribute the image between the correct number of workers, wait for them to finish, and then reconstruct the image in a single 2D slice so that it can be saved to a file.

Change filter() so that if more than 1 worker is requested:

  • It starts the requested number of workers with the go keyword (using a for loop).
  • It collects the resulting parts into a single 2D slice (using a for loop and append).

Your solution needs to work with 2, 4, 8 and 16 workers. You only need to support the ship.png input image.

Hint 1

You may find it helpful to start by hardcoding 4 worker threads - and then expanding to support 2, 8 and 16.

Hint 2

Each worker needs a channel to send their output. In case of 4 workers, you need to create a slice of 4 channels - one for each worker.

You need to make a slice of type []chan [][]uint8 and then, in a for loop, make individual channels of type chan [][]uint8.

Hint 3

In case of 4 workers, you will need to start 4 worker goroutines using the go command. For an image of size 512x512 (such as ship.png) they need to work on following y-coordinates:

  • Worker 1: 0-128
  • Worker 2: 128-256
  • Worker 3: 256-384
  • Worker 3: 384-512

x-coordinates would be 0-512 for all workers

(The ranges here are same as for the slices - e.g for range 128-256, 128 is included but 256 isn't.)

Hint 4

To reconstruct the image in a single 2D slice you can collect all the parts from the workers and use append to attach the matrices together. E.g.:

newData = append(newData, part...)

Where both newData and part are 2D slices. Note that ... is part of the syntax.

Run the filter with different values for the -threads flag and make sure the resulting images are all correct. They should look like this:

Parallel median filter result

Question 1d

When parallelising, our main aim is to make the processing faster. Using benchmarks you can measure just how fast your program is. A benchmark can be considered to be a type of test. In a unit test the result is pass or fail. In a benchmark, the result is the time taken to execute it. Unit tests allow us to check correctness, benchmarks can be used to measure and compare performance.

There is one benchmark function defined for you in medianFilter_test.go. To run this benchmark in Go use the command go test -bench .. Example results for the median filter are below:

BenchmarkFilter/1_workers-8         	       3	 377014480 ns/op
BenchmarkFilter/2_workers-8         	       5	 218908324 ns/op
BenchmarkFilter/4_workers-8         	       7	 144866283 ns/op
BenchmarkFilter/8_workers-8         	       9	 124873239 ns/op
BenchmarkFilter/16_workers-8        	       9	 121746826 ns/op

The benchmark function we defined runs 5 sub benchmarks. Each line of the output represents one sub-benchmark. For example:

BenchmarkFilter/4_workers-8         	       7	 144866283 ns/op
  • BenchmarkFilter is the name of the main benchmark function.
  • 4_workers is the name of the sub-benchmark. Our benchmark function runs 5 sub-benchmarks, each with a different number of worker threads.
  • 8 is the number of logical cores of your CPU.
  • 7 is the number of times this sub-benchmark was repeated to obtain a stable result. Here it was automatically decided by the Go runtime, later we will learn how to change this ourselves.
  • 144866283 is the average time taken to run this sub-benchmark (not the total time to run 7 iterations). This is always reported in nanoseconds. It's often useful to convert this back to seconds.

We will now use benchmarks to determine the performance improvement associated with adding more worker threads. We will use Python to visualise the results. First, we need to setup the required Go and Python libraries. We recommend doing this on a lab machine. We will not be helping you with Python installations on your local machines.

Download the benchstat Go library.

GO111MODULE=off go get golang.org/x/perf/cmd/benchstat

If you are working on a lab machine, active the Python 3 Anaconda environment.

module add anaconda/2020.07

(You can also install Anaconda on your own machine by following the official guide. This will install the required dependencies: seaborn, numpy, matplotlib and pandas.)

We are now ready to run our Median Filter benchmark. This time we will use two new flags. This will repeat each sub-benchmark 5 times, but each result will be reported individually.

go test -run ^$ -bench . -benchtime 1x -count 5 | tee results.out

The results should look like this.

BenchmarkFilter/1_workers-12         	       1	 294462262 ns/op
BenchmarkFilter/1_workers-12         	       1	 293813452 ns/op
BenchmarkFilter/1_workers-12         	       1	 293680576 ns/op
BenchmarkFilter/1_workers-12         	       1	 294184722 ns/op
BenchmarkFilter/1_workers-12         	       1	 295367359 ns/op
BenchmarkFilter/2_workers-12         	       1	 171195123 ns/op
BenchmarkFilter/2_workers-12         	       1	 172171790 ns/op
BenchmarkFilter/2_workers-12         	       1	 171684903 ns/op
BenchmarkFilter/2_workers-12         	       1	 171855058 ns/op
BenchmarkFilter/2_workers-12         	       1	 190077089 ns/op
BenchmarkFilter/4_workers-12         	       1	 137330430 ns/op
BenchmarkFilter/4_workers-12         	       1	 117784086 ns/op
BenchmarkFilter/4_workers-12         	       1	 125862648 ns/op
BenchmarkFilter/4_workers-12         	       1	 124865701 ns/op
BenchmarkFilter/4_workers-12         	       1	 121237712 ns/op
BenchmarkFilter/8_workers-12         	       1	  98534965 ns/op
BenchmarkFilter/8_workers-12         	       1	 127489070 ns/op
BenchmarkFilter/8_workers-12         	       1	  97566680 ns/op
BenchmarkFilter/8_workers-12         	       1	  98990243 ns/op
BenchmarkFilter/8_workers-12         	       1	 136017044 ns/op
BenchmarkFilter/16_workers-12        	       1	  89829133 ns/op
BenchmarkFilter/16_workers-12        	       1	 134415442 ns/op
BenchmarkFilter/16_workers-12        	       1	 121134810 ns/op
BenchmarkFilter/16_workers-12        	       1	  89106639 ns/op
BenchmarkFilter/16_workers-12        	       1	  91232779 ns/op

We can now use our benchstat library to convert raw benchmark output to a 'Comma Separated Values' (CSV) file.

go run golang.org/x/perf/cmd/benchstat -csv results.out | tee results.csv

The output should look like this.

name,time/op (ns/op),±
Filter/1_workers-12,2.94302E+08,0%
Filter/2_workers-12,1.71727E+08,0%
Filter/4_workers-12,1.25416E+08,9%
Filter/8_workers-12,1.11720E+08,22%
Filter/16_workers-12,1.05144E+08,28%

Finally, we are ready to plot a bar chart of our results using Python.

python plot.py

You should see a bar chart similar to this.

Bar chart

Question 1e

Write about 200 words explaining and analysing how your Median Filter implementation scales as more workers are used.

It is not enough to state what you have done. Remember that you are trying to demonstrate your in-depth understanding. You must explain why the effect your are observing occurs. Use your knowledge of concurrency and computer architecture to do this.

Question 1f

Use the code from main() in ping.go to generate a trace of your parallel filter. Verify that the filter is processing the image via the worker goroutines.

OPTIONAL Question 1g

Go has built-in support for collecting CPU execution profiles. Collect a CPU profile of your median filter. Use the results to understand which functions take the longest to execute. Explain why the filter execution time doesn't halve when number of workers is doubled.

$ go test -bench /8_ -benchtime 1x -count 20 -cpuprofile cpu.prof
$ go tool pprof -pdf -nodefraction=0 -unit=ms cpu.prof

OPTIONAL Question 1h

If your processed images have black lines like the ones we have shown above it is because you divided the original input image exactly into the number of parts required. For each given pixel of the image, the filter needs neighbours in radius 2 - e.g. given bounds 0-128 it will only process pixels in bounds 2-126.

Fix your code so that the resulting images look the same as it used to with the single-threaded filter.

OPTIONAL Question 1i

Go traces are quite powerful. You can define tasks and regions in your code and log messages.

Read this article and experiment with logging messages, and defining tasks and regions in either your filter or ping-pong program. The trace generated from the example program from the article looks like this:

(note the Say Hello task, and sayHello/sayGoodbye regions in the 2 goroutines)

Annotated trace

Try to achieve something similar in ping.go or medianFilter.go.

Question 2 - Parallel tree reduction 🔴 🔴 🔴 ⚪ ⚪

So far we explored concurrency with only a handful of goroutines. In this question, you will try using a very large number of goroutines and you will analyse any costs and benefits of doing so. For example, the trace below illustrates over 8000 goroutines working on sorting a slice of size 10,000,000:

Merge sort trace

Question 2a

Open merge.go. It's a working merge sort program. Your task will be to parallelise the merge sort in the parallelMergeSort() function.

Write a parallel merge sort that creates 2 new goroutines at each split point. Run the main() function and verify that the printed slice is correctly sorted.

Note 1: If you wish to modify the signature of the parallelMergeSort() function, make sure you also modify the calls to that function that happen in merge_test.go.

Note 2: Doing parallel merge sort on a slice will involve concurrent writes. Normally, this is not recommended. However, in this particular problem, each goroutine will have own section of the slice to work on. Hence passing the slice is correct but still very bug-prone. You have to make sure that there is no overlap between the sections that the goroutines are working on. We provided you with the merge() function which was carefully written to avoid bugs and race conditions.

Hint 1

Start by copying the sequential merge sort into parallelMergeSort().

Hint 2

You have to wait for both workers to finish before calling merge(...). This can be done using channels of type chan bool or with WaitGroups.

Question 2b

Now compare your parallel implementation against the sequential one. We provided you with benchmarks as well as a slightly modified Python script. Use them to to plot a bar chart of Array Length against Time.

Question 2c

Click here to see our bar chart

Parallel vs Sequential

You may have noticed that the parallel version is slower than the sequential one! While goroutines are quite lightweight, this experiment shows that they still have an associated overhead. We can make the parallel merge sort faster by reducing the number of goroutines created.

Firstly, at every split, only one new goroutine is needed rather than two. This concept is illustrated below. It's a parallel tree reduction where the operation is addition (rather than a merge sort) on 8 elements.

Note: Although the animation shows 8 threads for clarity all odd threads are redundant as they only pass a message to a thread to their left and don't perform any computation.

Parallel tree reduction

Run benchmarks and analyse the performance of your new algorithm. Given a slice of size n state how many goroutines your first version would've used and how many your new version now uses.

Hint

When splitting right start a new goroutine. When splitting left, do a simple recursive function call. Make sure you do the splitting in that order (new goroutine first and function call second).

Question 2d

Modify your parallelMergeSort() so that below a certain length of the slice it stops spawning new goroutines and instead calls the sequential mergeSort().

Experiment with different thresholds and try to empirically find an optimal one that provides the biggest speed-up.

You should use benchmarks and plot graphs to find the optimal value.

Hint 1

Fix the array length to 1 << 22 (4194304).

Vary the size of the limit from 1 to 1 << 22, doubling the limit on each iteration.

Plot bar charts of Limit against Time and find the most appropriate limit.

Hint 2

If you define your sub-benchmark name as fmt.Sprintf("%d_%d", size, limit), you can extract the limit length in Python using:

benchmark_data['limit'] = benchmark_data['name'].str.extract('_(\d+)')
Our analysis...

In our experiments on an 8 core machine we have found that for an array of length 1 << 22 (4194304), limits between 1 << 9 (512) and 1 << 19 (524288) performed very well, with small differences likely due to noise. The limit of 1 << 13 (8192) seems like the best candidate. The Intel Core i5 processor that we ran the merge sort on has one block of 32KiB L1 cache for each core. This means we can fit exactly 8192 32-bit integers in that space. However, to make this conclusion, we would need to measure performance more thoroughly, especially with different array sizes.

Our results are presented below. Yours may be quite different, due to differences in implementation and hardware as well as random noise.

Thresholds lower than the optimum (left side of the graph) cause a bottleneck due to goroutine creation and communication. Thresholds greater than the optimum (right side of the graph) cause more cache misses and eventually reduce the level of parallelism. When the limit is equal to the size of the initial slice the algorithm stops being parallel.

Graph

Golang/Gin codebase containing real world examples that adheres to the RealWorld spec and API.

Golang/Gin codebase containing real world examples (CRUD, auth, advanced patterns, etc) that adheres to the RealWorld spec and API. This codebase was

null 0 Dec 3, 2021
Concurrency Lab 2 Go Example

Concurrency Lab 2 If you're stuck look at examples on Go by Example Using the lab sheet There are two ways to use the lab sheet, you can either: creat

null 0 Oct 12, 2021
Concurrency Lab 1 Go Example

Concurrency Lab 1 If you're stuck look at examples on Go by Example Using the lab sheet There are two ways to use the lab sheet, you can either: creat

null 0 Oct 11, 2021
A faster RWLock primitive in Go, 2-3 times faster than RWMutex. A Go implementation of concurrency control algorithm in paper

Go Left Right Concurrency A Go implementation of the left-right concurrency control algorithm in paper <Left-Right - A Concurrency Control Technique w

wangyi 40 Apr 4, 2022
Go-concurrency-patterns - Sample concurrency patterns with Goroutines

About This sample project provides some concurrency pattern examples in Go using

George Batagiannis ⚡ 1 Feb 21, 2022
Concurrency in Go video course with in depth explanations & examples

Concurrency in Go Summary Coding Examples Introduction to Concurrency Go Routines Channels Select Concurrency Patterns Atomics Wait Groups - sync.Wait

Go Basics 167 Jun 26, 2022
Distributed Lab 2: RPC in Go

Distributed Lab 2: RPC in Go Using the lab sheet There are two ways to use the lab sheet, you can either: create a new repo from this template - this

null 0 Oct 18, 2021
Distributed Lab 2: RPC in Go

Distributed Lab 2: RPC in Go Using the lab sheet There are two ways to use the lab sheet, you can either: create a new repo from this template - this

null 0 Oct 22, 2021
Distributed Lab 3: Message Broker in Go

Distributed Lab 3: Message Broker in Go Using the lab sheet There are two ways to use the lab sheet, you can either: create a new repo from this templ

null 0 Oct 29, 2021
TNO MPC Lab - Shamir Secret Sharing

TNO MPC Lab - Shamir Secret Sharing The TNO MPC lab consists of generic software components, procedures, and functionalities developed and maintained

TNO - MPC Lab 1 Jun 26, 2022
TNO MPC Lab - Paillier

TNO MPC Lab - Paillier The TNO MPC lab consists of generic software components, procedures, and functionalities developed and maintained on a regular

TNO - MPC Lab 0 Nov 3, 2021
A simple and sussy project is an implementation of SOMMIP Lab 1 written in Golang

SOMMIP Lab 1 Isac Arthur Table of Contents About The Project Getting Started Prerequisites Installation Supported commands About The Project This very

Arthur 1 Nov 10, 2021
A go tasks for students lab

Go tasks one first tasks for students lab Palindrome Given the string, check if it is a palindrome. Example For inputString = "aabaa", the output shou

Denis 0 Nov 20, 2021
Kong and Consul Lab For Golang

Kong and Consul Lab Prerequisites docker Deploy Consul Server docker run -d -p 8500:8500 -p 8600:8600/udp --name=consul-server consul agent -server -u

Ridwan Dwi Siswanto 1 Nov 25, 2021
Gophercon lab "Get Go-ing with GitHub Actions"

GitHub Action Using Go This is a starting point for a GitHub Action based in Go. This repo provides all the structure needed to build a robust GitHub

Doug Graiser 0 Dec 8, 2021
Explores GitHub Actions in Go Lab from GopherCon 2021

Gopher A Tweet An action that tweets. Gopher A Tweet was created based on GopherCon 2021s Gophers of Microsoft: GitHub Action in Go Lab to explore bui

Autumn Martin 0 Dec 10, 2021
mit 6.824 lab

lab-6.824 0. How to run? go版本: 1.13+ 在高于1.11的版本中报unexpected directory layout, 但是用较低版本goLand无法调试... unexpected dir layout 原因是不支持相对路径包引入, 遇到时在import 删掉.

JinyuChata 1 Dec 9, 2021
The kitchen part of the Restaurant simulation of the first lab at the Network Programming course

KitchenServerGo This is the kitchen part of the Restaurant simulation of the first lab at the Network Programming course. The dining hall part of the

null 0 Oct 13, 2021
learn mit 6.824 lab

MIT6.824 lab1 MapReduce timeout: command not found -> brew install coreutils panic data race -> 加锁 内层变量会屏蔽外部同名变量 test1 word-count 测试基本功能 test2 indexer

GoodbyeYesterDay 0 Jan 5, 2022
Os-signal-lab - Experiments with os signals and process trees

os-signal-lab Experiments with os signals and process trees There's a lot to rea

carl-fredrik grimberg 0 Feb 5, 2022
Concurrency-safe Go caching library with expiration capabilities and access counters

cache2go Concurrency-safe golang caching library with expiration capabilities. Installation Make sure you have a working Go environment (Go 1.2 or hig

Christian Muehlhaeuser 1.8k Jul 5, 2022
A simple logging interface that supports cross-platform color and concurrency.

WLog Package wlog creates simple to use UI structure. The UI is used to simply print to the screen. There a wrappers that will wrap each other to crea

Will Dixon 57 May 31, 2022
Concurrency-safe Go caching library with expiration capabilities and access counters

cache2go Concurrency-safe golang caching library with expiration capabilities. Installation Make sure you have a working Go environment (Go 1.2 or hig

Christian Muehlhaeuser 1.8k Jul 3, 2022
Code Generation for Functional Programming, Concurrency and Generics in Golang

goderive goderive derives mundane golang functions that you do not want to maintain and keeps them up to date. It does this by parsing your go code fo

Walter Schulze 987 Jun 29, 2022
A sync.WaitGroup with error handling and concurrency control

go-waitgroup How to use An package that allows you to use the constructs of a sync.WaitGroup to create a pool of goroutines and control the concurrenc

Pieter Claerhout 31 Jun 17, 2022
golang worker pool , Concurrency limiting goroutine pool

golang worker pool 中文说明 Concurrency limiting goroutine pool. Limits the concurrency of task execution, not the number of tasks queued. Never blocks su

xxj 398 Jun 21, 2022
gpool - a generic context-aware resizable goroutines pool to bound concurrency based on semaphore.

gpool - a generic context-aware resizable goroutines pool to bound concurrency. Installation $ go get github.com/sherifabdlnaby/gpool import "github.c

Sherif Abdel-Naby 84 Feb 21, 2022
Structured Concurrency in Go

nursery: structured concurrency in Go RunConcurrently( // Job 1 func(context.Context, chan error) { time.Sleep(time.Millisecond * 10)

null 46 Jul 4, 2022
Concurrency limiting goroutine pool

workerpool Concurrency limiting goroutine pool. Limits the concurrency of task execution, not the number of tasks queued. Never blocks submitting task

Andrew Gillis 842 Jun 28, 2022