Test steps
- Create and attach a 1-replica volume.
- Create 2 snapshots with a large amount of data so that rebuilding each snapshot takes some time.
- Disable scheduling for the nodes so that only one node can accept new replicas of the volume.
- Update the replica count of the volume to 2 and wait for the rebuilding to start.
- While the 1st snapshot file is syncing, create a directory named after the other snapshot's meta file in the data directory of the rebuilding replica. The rebuilding replica will later fail to create this meta file and error out.
- Verify there is no NPE (nil pointer dereference) issue in the instance manager pod when the failure above is triggered, that is, no log like the following appears:
  ```
  2021/12/24 16:29:02 http: panic serving 10.42.2.251:42464: runtime error: invalid memory address or nil pointer dereference
  goroutine 88514 [running]:
  net/http.(*conn).serve.func1(0xc00032e000)
      /usr/local/go/src/net/http/server.go:1772 +0x139
  panic(0xd73a40, 0x168d100)
      /usr/local/go/src/runtime/panic.go:975 +0x3e3
  github.com/longhorn/sparse-tools/sparse/rest.(*SyncServer).close(0xc00041c050, 0x108cf60, 0xc00015c000, 0xc000364200)
      /go/src/github.com/longhorn/longhorn-engine/vendor/github.com/longhorn/sparse-tools/sparse/rest/handlers.go:119 +0x63
  net/http.HandlerFunc.ServeHTTP(0xc0001f6830, 0x108cf60, 0xc00015c000, 0xc000364200)
      /usr/local/go/src/net/http/server.go:2012 +0x44
  github.com/gorilla/mux.(*Router).ServeHTTP(0xc0004f6000, 0x108cf60, 0xc00015c000, 0xc00020a300)
      /go/src/github.com/longhorn/longhorn-engine/vendor/github.com/gorilla/mux/mux.go:212 +0xe2
  net/http.serverHandler.ServeHTTP(0xc000428000, 0x108cf60, 0xc00015c000, 0xc00020a300)
      /usr/local/go/src/net/http/server.go:2807 +0xa3
  net/http.(*conn).serve(0xc00032e000, 0x10915a0, 0xc0003ae240)
      /usr/local/go/src/net/http/server.go:1895 +0x86c
  created by net/http.(*Server).Serve
      /usr/local/go/src/net/http/server.go:2933 +0x35c
  ```
- Verify that the rebuilding is restarted and succeeds.
- Verify that the volume data is intact.
GitHub Issue: https://github.com/longhorn/longhorn/issues/2820