Golang Advanced Programming Summary

紫藤庄园, January 10, 2025

Basic Programming

https://gopl-zh.github.io/ch1/ch1-01.html The Go Programming Language (Chinese translation, Go语言圣经)

Common Data Structures

Benchmarking

1 second (s) = 1,000 milliseconds (ms) = 1,000,000 microseconds (μs) = 1,000,000,000 nanoseconds (ns)
https://github.com/yangliangxii/golangspace.git is the repository that holds the test code.

Run the benchmarks in the package whose names end in Fib, first with 2 CPUs and then with 4:

go test -bench='Fib$' -cpu=2,4 .

goos: windows
goarch: amd64
pkg: golangspace/aaron
cpu: 11th Gen Intel(R) Core(TM) i7-11700 @ 2.50GHz
BenchmarkFib-2               375           3165111 ns/op
BenchmarkFib-4               379           3137674 ns/op
PASS
ok      golangspace/aaron    3.161s
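
A benchmark matching the name above might look like the following minimal sketch. The recursive fib and its argument 30 are assumptions, since the repository's actual function is not shown here. In a real project BenchmarkFib lives in a _test.go file and is run by go test -bench; testing.Benchmark is used only so the sketch runs as a standalone program.

```go
package main

import (
	"fmt"
	"testing"
)

// fib is a deliberately slow recursive Fibonacci (hypothetical workload).
func fib(n int) int {
	if n < 2 {
		return n
	}
	return fib(n-1) + fib(n-2)
}

// BenchmarkFib runs the workload b.N times; the framework chooses b.N.
func BenchmarkFib(b *testing.B) {
	for i := 0; i < b.N; i++ {
		fib(30)
	}
}

func main() {
	// testing.Benchmark drives the function outside of "go test".
	result := testing.Benchmark(BenchmarkFib)
	fmt.Println("iterations:", result.N, "ns/op:", result.NsPerOp())
}
```

The suffix in a result name such as BenchmarkFib-2 is the GOMAXPROCS value used for that run, which is what the -cpu flag controls.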

Run the benchmarks whose names end in Fib for 5 seconds:

go test -bench='Fib$' -benchtime=5s .

goos: windows
goarch: amd64
pkg: golangspace/aaron
cpu: 11th Gen Intel(R) Core(TM) i7-11700 @ 2.50GHz
BenchmarkFib-16             1878           3170029 ns/op
PASS
ok      golangspace/aaron    6.423s

Run the benchmarks whose names end in Fib for exactly 50 iterations:

go test -bench='Fib$' -benchtime=50x .

goos: windows
goarch: amd64
pkg: golangspace/aaron
cpu: 11th Gen Intel(R) Core(TM) i7-11700 @ 2.50GHz
BenchmarkFib-16               50           3100336 ns/op
PASS
ok      golangspace/aaron    0.290s

Run the benchmarks whose names end in Fib for 5 seconds, 3 rounds:

go test -bench='Fib$' -benchtime=5s -count=3 .

goos: windows
goarch: amd64
pkg: golangspace/aaron
cpu: 11th Gen Intel(R) Core(TM) i7-11700 @ 2.50GHz
BenchmarkFib-16             1876           3163577 ns/op
BenchmarkFib-16             1930           3140367 ns/op
BenchmarkFib-16             1890           3126634 ns/op
PASS
ok      golangspace/aaron    19.019s

Run the benchmarks whose names contain Generate:

go test -bench='Generate' .

goos: windows
goarch: amd64
pkg: golangspace/aaron
cpu: 11th Gen Intel(R) Core(TM) i7-11700 @ 2.50GHz
BenchmarkGenerateWithCap-16           73          14738748 ns/op
BenchmarkGenerate-16                  67          20847016 ns/op
PASS
ok      golangspace/aaron    2.677s

Run the benchmarks whose names contain Generate and report memory allocations:

go test -bench='Generate' -benchmem .

goos: windows
goarch: amd64
pkg: golangspace/aaron
cpu: 11th Gen Intel(R) Core(TM) i7-11700 @ 2.50GHz
BenchmarkGenerateWithCap-16           72          14573425 ns/op         8003613 B/op          1 allocs/op
BenchmarkGenerate-16                  49          20905673 ns/op        41678229 B/op         38 allocs/op
PASS
ok      golangspace/aaron    2.261s
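
The gap between the two results (1 allocation versus 38) is what you would expect from preallocating the slice. The sketch below shows the idea; the function names and bodies are assumptions modeled on the benchmark names, not the repository's code.

```go
package main

import (
	"fmt"
	"math/rand"
)

// generate appends n random ints to an empty slice. append has to
// reallocate and copy the backing array each time capacity runs out.
func generate(n int) []int {
	nums := make([]int, 0)
	for i := 0; i < n; i++ {
		nums = append(nums, rand.Int())
	}
	return nums
}

// generateWithCap reserves the full capacity up front, so append
// never reallocates.
func generateWithCap(n int) []int {
	nums := make([]int, 0, n)
	for i := 0; i < n; i++ {
		nums = append(nums, rand.Int())
	}
	return nums
}

func main() {
	a, b := generate(1000), generateWithCap(1000)
	fmt.Println(len(a), len(b), cap(b)) // prints: 1000 1000 1000
}
```

With one million 8-byte ints, a single preallocation is about 8 MB, which lines up with the 8003613 B/op reported for BenchmarkGenerateWithCap.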

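Run every benchmark in the package. In the Generate1000 to Generate1000000 results, each 10x increase in input size costs roughly 10x more time, which indicates O(n) growth (rather than O(1) or O(n^2)):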

go test -bench .

goos: windows
goarch: amd64
pkg: golangspace/aaron
cpu: 11th Gen Intel(R) Core(TM) i7-11700 @ 2.50GHz
BenchmarkFib-16                      378           3184793 ns/op
BenchmarkGenerateWithCap-16           82          14581967 ns/op
BenchmarkGenerate-16                  62          21137939 ns/op
BenchmarkGenerate1000-16           49927             23592 ns/op
BenchmarkGenerate10000-16           6822            177876 ns/op
BenchmarkGenerate100000-16           649           1903959 ns/op
BenchmarkGenerate1000000-16           51          21438167 ns/op
PASS
ok      golangspace/aaron    11.058s

Run BenchmarkFibWithTimeSleep. Time spent outside the code under test (here, a sleep) is counted and distorts the result:

go test -bench='BenchmarkFibWithTimeSleep' -benchtime=50x .

goos: windows
goarch: amd64
pkg: golangspace/aaron
cpu: 11th Gen Intel(R) Core(TM) i7-11700 @ 2.50GHz
BenchmarkFibWithTimeSleep-16                  50          63144668 ns/op
PASS
ok      golangspace/aaron    6.321s

Run BenchmarkFibResetTimer, which excludes the slow part from the measurement. The per-operation time is back to normal:

go test -bench='BenchmarkFibResetTimer' -benchtime=50x .

goos: windows
goarch: amd64
pkg: golangspace/aaron
cpu: 11th Gen Intel(R) Core(TM) i7-11700 @ 2.50GHz
BenchmarkFibResetTimer-16             50           3152508 ns/op
PASS
ok      golangspace/aaron    6.347s
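
The pattern behind this result can be sketched as follows. The 50 ms sleep stands in for whatever slow setup the real benchmark performs, and fib(20) is an assumed workload; neither is taken from the repository.

```go
package main

import (
	"fmt"
	"testing"
	"time"
)

// fib is a hypothetical workload.
func fib(n int) int {
	if n < 2 {
		return n
	}
	return fib(n-1) + fib(n-2)
}

// BenchmarkFibResetTimer performs slow setup, then discards the time
// accumulated so far so that only the loop is measured.
func BenchmarkFibResetTimer(b *testing.B) {
	time.Sleep(50 * time.Millisecond) // stand-in for expensive setup
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		fib(20)
	}
}

func main() {
	r := testing.Benchmark(BenchmarkFibResetTimer)
	// The reported per-op time should not include the 50 ms setup.
	fmt.Println("ns/op:", r.NsPerOp())
}
```

Note that the wall-clock time of the whole run still includes the setup, which is why the total at the bottom of the output above stays above 6 seconds even though ns/op drops.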

Run BenchmarkBubbleSort, pausing the timer around the expensive setup so that only the sort itself is measured:

go test -bench='Sort$' .
goos: windows
goarch: amd64
pkg: golangspace/aaron
cpu: 11th Gen Intel(R) Core(TM) i7-11700 @ 2.50GHz
BenchmarkBubbleSort-16                24          58052742 ns/op
PASS
ok      golangspace/aaron    1.589s
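
When the setup has to be repeated on every iteration, as with generating fresh input for a sort, b.StopTimer() and b.StartTimer() can bracket it. A minimal sketch, assuming a plain bubble sort and a hypothetical randomInts helper with 1000 elements:

```go
package main

import (
	"fmt"
	"math/rand"
	"testing"
)

// bubbleSort sorts nums in place in ascending order.
func bubbleSort(nums []int) {
	for i := 0; i < len(nums); i++ {
		for j := 1; j < len(nums)-i; j++ {
			if nums[j] < nums[j-1] {
				nums[j], nums[j-1] = nums[j-1], nums[j]
			}
		}
	}
}

// randomInts is a hypothetical helper that builds unsorted input.
func randomInts(n int) []int {
	nums := make([]int, n)
	for i := range nums {
		nums[i] = rand.Intn(n)
	}
	return nums
}

// isSorted reports whether nums is in ascending order.
func isSorted(nums []int) bool {
	for i := 1; i < len(nums); i++ {
		if nums[i] < nums[i-1] {
			return false
		}
	}
	return true
}

func BenchmarkBubbleSort(b *testing.B) {
	for i := 0; i < b.N; i++ {
		b.StopTimer() // exclude input generation from the measurement
		nums := randomInts(1000)
		b.StartTimer() // time only the sort
		bubbleSort(nums)
	}
}

func main() {
	nums := randomInts(100)
	bubbleSort(nums)
	fmt.Println(isSorted(nums)) // prints: true
	fmt.Println("ns/op:", testing.Benchmark(BenchmarkBubbleSort).NsPerOp())
}
```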

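Summary:

When benchmarking, keep the test environment as stable as possible.

Writing a benchmark
• It lives in a _test.go file
• The function name starts with Benchmark
• The parameter is b *testing.B
• b.ResetTimer() resets the elapsed time and allocation counters
• b.StopTimer() pauses timing
• b.StartTimer() resumes timing

Running a benchmark
• go test -bench . runs all benchmarks in the current package
• b.N is the number of iterations; the framework adjusts it until the run is long enough
• -bench takes a regular expression that selects benchmarks by name
• -cpu sets the number of CPUs (GOMAXPROCS) to run with
• -benchtime sets the run duration (e.g. 5s) or an exact iteration count (e.g. 50x)
• -count sets how many rounds each benchmark runs
• -benchmem reports bytes allocated and number of allocations per operation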

CPU Profiling

If Graphviz is not installed, pprof prints the following:

go tool pprof -http=:9999 cpu.pprof

Serving web UI on http://localhost:9999
Failed to execute dot. Is Graphviz installed?
exec: "dot": executable file not found in %PATH%
Failed to execute dot. Is Graphviz installed?
exec: "dot": executable file not found in %PATH%

After installing Graphviz and adding it to the PATH environment variable:

go tool pprof -http=:9999 cpu.pprof

Serving web UI on http://localhost:9999
open C:\Users\YANGLI~1\AppData\Local\Temp\go-build1575250374\b001\exe\main.exe: The system cannot find the path specified.

http://localhost:9999/ui/ shows the profile graphically, which helps when analyzing performance problems.

Memory Profiling

Install the package: go get github.com/pkg/profile

go run memory.go

2025/01/21 16:12:29 profile: cpu profiling enabled, C:\Users\YANGLI~1\AppData\Local\Temp\profile480846933\cpu.pprof
2025/01/21 16:12:29 profile: cpu profiling disabled, C:\Users\YANGLI~1\AppData\Local\Temp\profile480846933\cpu.pprof

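Summary:

Profiling types

CPU profiling: the runtime interrupts the program every 10 ms and records the stack traces of the goroutines running at that moment.
Memory profiling: records the stack trace at sampled heap allocations and ignores stack allocations. By default it samples on average one allocation per 512 KiB allocated (runtime.MemProfileRate), not every allocation.
Block profiling: specific to Go; records how long a goroutine spends waiting for a shared resource.
Mutex profiling: records waits and delays caused by lock contention.

CPU profiling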

With the standard runtime/pprof package, adding the following code to main makes the program produce a profile when it runs:

pprof.StartCPUProfile(os.Stdout)
defer pprof.StopCPUProfile()
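Because this writes the profile to standard output, redirect it to a file when running the program, for example go run main.go > cpu.pprof.

go tool pprof -http=:9999 cpu.pprof opens the data in a web page.

go tool pprof cpu.pprof opens the data in interactive mode; type help to list the supported commands and options.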
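
As an alternative to redirecting standard output, the profile can be written straight to a file. The sketch below uses only the standard library; busyWork, the profileCPU helper, and the file name cpu.pprof are illustrative assumptions.

```go
package main

import (
	"fmt"
	"os"
	"runtime/pprof"
)

// busyWork is a hypothetical CPU-bound workload that gives the
// profiler something to sample.
func busyWork(n int) int {
	sum := 0
	for i := 0; i < n; i++ {
		sum += i % 7
	}
	return sum
}

// profileCPU runs work() while recording a CPU profile into path.
func profileCPU(path string, work func()) error {
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	defer f.Close()
	if err := pprof.StartCPUProfile(f); err != nil {
		return err
	}
	// Deferred calls run in reverse order: the profile is stopped
	// (and flushed) before the file is closed.
	defer pprof.StopCPUProfile()
	work()
	return nil
}

func main() {
	err := profileCPU("cpu.pprof", func() { busyWork(200_000_000) })
	fmt.Println("profile written:", err == nil)
}
```

The resulting file can be opened with the same go tool pprof commands described above.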

Memory profiling

With the pkg/profile library, adding the following code to main makes the program produce a memory profile when it runs:

defer profile.Start(profile.MemProfile, profile.MemProfileRate(1)).Stop()
The data can be viewed the same way, in a web page or in interactive mode.
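
pkg/profile is a third-party wrapper. A similar heap profile can be produced with the standard library alone, which is a swapped-in alternative rather than the approach used above. In this sketch, allocate, writeHeapProfile, and the file name mem.pprof are hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"runtime"
	"runtime/pprof"
)

// allocate creates n heap-allocated 1 KiB buffers (hypothetical workload).
func allocate(n int) [][]byte {
	out := make([][]byte, 0, n)
	for i := 0; i < n; i++ {
		out = append(out, make([]byte, 1024))
	}
	return out
}

// writeHeapProfile dumps the current heap profile into path.
func writeHeapProfile(path string) error {
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	defer f.Close()
	runtime.GC() // refresh allocation statistics before dumping
	return pprof.WriteHeapProfile(f)
}

func main() {
	// Sample every allocation. Set this once, as early as possible.
	runtime.MemProfileRate = 1
	data := allocate(10_000)
	err := writeHeapProfile("mem.pprof")
	fmt.Println("buffers:", len(data), "profile written:", err == nil)
}
```

Setting runtime.MemProfileRate to 1 plays the same role as profile.MemProfileRate(1) in the pkg/profile call: it trades overhead for complete allocation data.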

Generating profiles from benchmarks

Adding -cpuprofile=$FILE, -memprofile=$FILE, or -blockprofile=$FILE to go test writes the corresponding profile file.
These profile files can also be viewed in a web page or in interactive mode.

String Concatenation Performance

go test -bench="Concat$" -benchmem .

goos: windows
goarch: amd64
pkg: golangspace/aaron/string
cpu: 11th Gen Intel(R) Core(TM) i7-11700 @ 2.50GHz
BenchmarkPlusConcat-16                        25          46436144 ns/op        530998230 B/op     10027 allocs/op
BenchmarkSpritfConcat-16                      13          98884962 ns/op        833461521 B/op     34162 allocs/op
BenchmarkBuilderConcat-16                  10000            109509 ns/op          514801 B/op         23 allocs/op
BenchmarkBuilderAdvConcat-16               27162             38357 ns/op          106496 B/op          1 allocs/op
BenchmarkBuffConcat-16                     15727             77942 ns/op          368577 B/op         13 allocs/op
BenchmarkByteConcat-16                     13934             82607 ns/op          621298 B/op         24 allocs/op
BenchmarkPreByteConcat-16                  28651             42797 ns/op          212993 B/op          2 allocs/op
PASS
ok      golangspace/aaron/string     10.974s

go test -run='TestBuilderConcat' . -v

=== RUN   TestBuilderConcat
16 32 64 128 256 512 896 1408 2048 3072 4096 5376 6912 9472 12288 16384 21760 28672 40960 57344 73728 98304 131072 --- PASS: TestBuilderConcat (0.00s)
PASS
ok      golangspace/aaron/string     0.171s

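Summary:

The most efficient way to concatenate strings is strings.Builder combined with Grow to preallocate memory.
Each + concatenation produces a new string, which requires allocating new memory and copying the contents.
strings.Builder, bytes.Buffer, and []byte grow their backing memory in multiplicative steps, extending what is already there. The sequence printed by TestBuilderConcat above appears to be the builder's capacity after each growth step.
strings.Builder is faster than bytes.Buffer. One important difference: bytes.Buffer allocates a new block to hold the result when it is converted to a string, whereas strings.Builder converts its underlying []byte to a string directly and returns it.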
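
The two ends of the benchmark table can be sketched as follows. The function names are modeled on the benchmark names and are assumptions, as is the sample string.

```go
package main

import (
	"fmt"
	"strings"
)

// plusConcat builds the result with +=; every step allocates a new
// string and copies everything written so far.
func plusConcat(n int, s string) string {
	result := ""
	for i := 0; i < n; i++ {
		result += s
	}
	return result
}

// builderConcat reserves n*len(s) bytes with Grow, so the writes
// that follow never need to reallocate.
func builderConcat(n int, s string) string {
	var b strings.Builder
	b.Grow(n * len(s))
	for i := 0; i < n; i++ {
		b.WriteString(s)
	}
	return b.String()
}

func main() {
	a := plusConcat(1000, "golang")
	b := builderConcat(1000, "golang")
	fmt.Println(len(a), a == b) // prints: 6000 true
}
```

Grow only helps when the final size can be estimated in advance; without it, strings.Builder still works but reallocates as it fills up.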

Slice Performance

go test -run=^TestLastChars  -v

=== RUN   TestLastCharsBySlice
    slice_test.go:48: 100.24 MB
--- PASS: TestLastCharsBySlice (0.21s)
=== RUN   TestLastCharsByCopy
    slice_test.go:52: 3.24 MB
--- PASS: TestLastCharsByCopy (0.21s)
PASS
ok      golangspace/aaron/advance    0.583s

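When keeping only a small part of many large slices, copying that part into a new slice uses far less memory than re-slicing, because a re-slice keeps the entire original backing array alive.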

go test -run=^TestLastChars  -v
=== RUN   TestLastCharsBySlice
    slice_test.go:50: 100.24 MB
--- PASS: TestLastCharsBySlice (0.24s)
=== RUN   TestLastCharsByCopy
    slice_test.go:54: 0.24 MB
--- PASS: TestLastCharsByCopy (0.22s)
PASS
ok      golangspace/aaron/advance    0.637s

After adding a garbage collection before measuring, the difference is even clearer: the copy version uses only 0.24 MB.
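
The difference between the two approaches comes down to whether the result shares the original backing array. A small sketch, with hypothetical function names and a 5-element input standing in for the large slices used in the test:

```go
package main

import "fmt"

// lastTwoBySlice re-slices origin. The result shares origin's backing
// array, so the whole array stays reachable. Requires len(origin) >= 2.
func lastTwoBySlice(origin []int) []int {
	return origin[len(origin)-2:]
}

// lastTwoByCopy copies the last two elements into a fresh 2-element
// array, so origin can be garbage collected once nothing else uses it.
func lastTwoByCopy(origin []int) []int {
	result := make([]int, 2)
	copy(result, origin[len(origin)-2:])
	return result
}

func main() {
	origin := []int{1, 2, 3, 4, 5}
	s, c := lastTwoBySlice(origin), lastTwoByCopy(origin)

	// Writing to origin shows which result still shares its memory.
	origin[4] = 99
	fmt.Println(s, c) // prints: [4 99] [4 5]
}
```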

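A new backing array is allocated only when an append exceeds the slice's capacity; the slice's pointer then refers to the new array. As long as the capacity is sufficient, element writes made through a slice passed to a function are visible in the caller's slice, because both share the same backing array. [Once the capacity changes, I am me and you are you, and we no longer have anything to do with each other.]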
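
This behavior is easy to observe directly. The helper names below are made up for the illustration.

```go
package main

import "fmt"

// setFirst writes through the shared backing array, so the caller
// sees the change.
func setFirst(s []int) {
	s[0] = 100
}

// appendAndSet appends before writing. If len(s) == cap(s), append
// allocates a new array and the write no longer reaches the caller.
func appendAndSet(s []int) []int {
	s = append(s, 4)
	s[0] = 200
	return s
}

func main() {
	a := []int{1, 2, 3} // len 3, cap 3
	setFirst(a)
	fmt.Println(a) // prints: [100 2 3]

	grown := appendAndSet(a) // capacity exceeded: new backing array
	fmt.Println(a, grown)    // prints: [100 2 3] [200 2 3 4]

	b := make([]int, 3, 10) // spare capacity: append stays in place
	appendAndSet(b)
	fmt.Println(b) // prints: [200 0 0]
}
```

In the last case the write to element 0 is visible, but the appended element is not, because the caller's slice header still has length 3.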

for vs. range Performance

go test -bench=IntSlice$ .

[1 4 9]
16 32 64 128 256 512 896 1408 2048 3072 4096 5376 6912 9472 12288 16384 21760 28672 40960 57344 73728 98304 131072 goos: windows
goarch: amd64
pkg: golangspace/aaron/advance
cpu: 11th Gen Intel(R) Core(TM) i7-11700 @ 2.50GHz
BenchmarkForIntSlice-16             5202            228768 ns/op
BenchmarkRangeIntSlice-16           5077            236501 ns/op
PASS
ok      golangspace/aaron/advance    4.410s

go test -bench=Struct$ .

[1 4 9]
16 32 64 128 256 512 896 1408 2048 3072 4096 5376 6912 9472 12288 16384 21760 28672 40960 57344 73728 98304 131072 goos: windows
goarch: amd64
pkg: golangspace/aaron/advance
cpu: 11th Gen Intel(R) Core(TM) i7-11700 @ 2.50GHz
BenchmarkForStruct-16                    4892674               233.4 ns/op
BenchmarkRangeIndexStruct-16             5064908               236.2 ns/op
BenchmarkRangeStruct-16                     4838            249716 ns/op
PASS
ok      golangspace/aaron/advance    4.688s

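The results show that when iterating over a []int slice, for and range perform almost identically.
For a slice of large structs, for and range are also nearly identical when range iterates over the index only.
When range iterates over both index and value, it copies every struct. In this run the indexed for loop is about 1,000 times faster (233.4 ns/op vs. 249,716 ns/op).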

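Summary
range returns a copy of each element as it iterates. If each element is small, as in []int, for and range perform almost identically. If each element is large, for example a struct with many fields, for is significantly faster than range, sometimes by a factor of a thousand or more. In that case prefer for. If you do use range, iterate over the index only and access each element through its index, which is equivalent to for. To iterate over both index and value with range without the performance cost, change the slice or array elements to pointers.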
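
The three iteration styles can be compared side by side. The Item type and its 4 KiB payload are assumptions chosen to make the element expensive to copy; whether the copy is actually performed can depend on the compiler version.

```go
package main

import "fmt"

// Item is a hypothetical element type large enough that copying it matters.
type Item struct {
	id  int
	val [4096]byte
}

// sumByIndex reads each element in place through its index.
func sumByIndex(items []Item) int {
	sum := 0
	for i := range items {
		sum += items[i].id
	}
	return sum
}

// sumByValue asks range for the value, which copies the whole Item
// into item on every iteration.
func sumByValue(items []Item) int {
	sum := 0
	for _, item := range items {
		sum += item.id
	}
	return sum
}

// sumByPointer ranges over pointers, so only a pointer is copied per step.
func sumByPointer(items []*Item) int {
	sum := 0
	for _, item := range items {
		sum += item.id
	}
	return sum
}

func main() {
	items := make([]Item, 100)
	ptrs := make([]*Item, len(items))
	for i := range items {
		items[i].id = i
		ptrs[i] = &items[i]
	}
	fmt.Println(sumByIndex(items), sumByValue(items), sumByPointer(ptrs))
	// prints: 4950 4950 4950
}
```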

Reflection Performance

go test -bench='Benchmark.*Reflect.*' -benchmem .

[1 4 9]
16 32 64 128 256 512 896 1408 2048 3072 4096 5376 6912 9472 12288 16384 21760 28672 40960 57344 73728 98304 131072 goos: windows
goarch: amd64
pkg: golangspace/jikett/advance
cpu: 11th Gen Intel(R) Core(TM) i7-11700 @ 2.50GHz
BenchmarkNoReflectNew-16        46277370                25.08 ns/op           64 B/op          1 allocs/op
BenchmarkReflect-16             36037658                33.16 ns/op           64 B/op          1 allocs/op
PASS
ok      golangspace/jikett/advance      3.028s

go test -bench='Benchmark.*Set.*' -benchmem .

[1 4 9]
16 32 64 128 256 512 896 1408 2048 3072 4096 5376 6912 9472 12288 16384 21760 28672 40960 57344 73728 98304 131072 goos: windows
goarch: amd64
pkg: golangspace/jikett/advance
cpu: 11th Gen Intel(R) Core(TM) i7-11700 @ 2.50GHz
BenchmarkReflectSet-16                  1000000000               0.2202 ns/op          0 B/op          0 allocs/op
BenchmarkReflectByFieldSet-16           92530477                12.26 ns/op            0 B/op          0 allocs/op
BenchmarkReflectByFiledNameSet-16        6076742               196.8 ns/op            32 B/op          4 allocs/op
PASS
ok      golangspace/jikett/advance      3.405s

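In summary, for an ordinary four-field struct Config, assigning each field through reflection is far slower than direct assignment. In the run above it is about 56 times slower with Field (12.26 ns/op vs. 0.22 ns/op) and about 900 times slower with FieldByName (196.8 ns/op), so FieldByName is about 16 times slower than Field. The 0.22 ns/op baseline is small enough that the compiler may have optimized most of the direct assignment away, so treat these ratios as rough.
Inside reflection, fields are stored in declaration order, so lookup by index is O(1), while lookup by name has to scan the fields and is O(N). The more fields (and methods) a struct has, the larger the gap between the two.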
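
The two lookup styles look like this in code. The Config field names follow the program output shown elsewhere in this article; the field types and the helper names are assumptions.

```go
package main

import (
	"fmt"
	"reflect"
)

// Config is a four-field struct; all fields are assumed to be strings.
type Config struct {
	Name    string
	IP      string
	URL     string
	Timeout string
}

// setByIndex sets field i by position, a direct lookup.
func setByIndex(c *Config, i int, v string) {
	reflect.ValueOf(c).Elem().Field(i).SetString(v)
}

// setByName sets a field by name; reflect has to search for the name
// first. It panics if the name does not exist.
func setByName(c *Config, name, v string) {
	reflect.ValueOf(c).Elem().FieldByName(name).SetString(v)
}

func main() {
	var c Config
	setByIndex(&c, 0, "global_server")
	setByName(&c, "IP", "10.0.0.1")
	fmt.Printf("%+v\n", c)
	// prints: {Name:global_server IP:10.0.0.1 URL: Timeout:}
}
```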

go test -bench='Benchmark.*Set.*' -benchmem .

[1 4 9]
16 32 64 128 256 512 896 1408 2048 3072 4096 5376 6912 9472 12288 16384 21760 28672 40960 57344 73728 98304 131072 goos: windows
goarch: amd64
pkg: golangspace/jikett/advance
cpu: 11th Gen Intel(R) Core(TM) i7-11700 @ 2.50GHz
BenchmarkReflectSet-16                          1000000000               0.2176 ns/op          0 B/op          0 allocs/op
BenchmarkReflectByFieldSet-16                   92516209                12.13 ns/op            0 B/op          0 allocs/op
BenchmarkReflectByFiledNameSet-16                5964534               199.2 ns/op            32 B/op          4 allocs/op
BenchmarkReflectByFieldNameCacheSet-16          32516446                35.65 ns/op            0 B/op          0 allocs/op
PASS
ok      golangspace/jikett/advance      4.583s

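With the field lookup cached (BenchmarkReflectByFieldNameCacheSet), setting a field by name drops from about 16 times the cost of Field (199.2 vs. 12.13 ns/op) to about 3 times (35.65 ns/op).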
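
One way to get this kind of improvement is to build a name-to-index map once and then use Field(i). This is a sketch of the idea, not necessarily how the benchmarked code does it; fieldIndex, configFields, and setCached are hypothetical names, and the Config field types are assumed.

```go
package main

import (
	"fmt"
	"reflect"
)

// Config is a four-field struct; all fields are assumed to be strings.
type Config struct {
	Name    string
	IP      string
	URL     string
	Timeout string
}

// fieldIndex builds a name -> index map once, so later lookups are a
// map access plus Field(i) instead of a FieldByName search.
func fieldIndex(t reflect.Type) map[string]int {
	idx := make(map[string]int, t.NumField())
	for i := 0; i < t.NumField(); i++ {
		idx[t.Field(i).Name] = i
	}
	return idx
}

// configFields is computed once at program start.
var configFields = fieldIndex(reflect.TypeOf(Config{}))

// setCached sets the named field and reports whether the name exists.
func setCached(c *Config, name, v string) bool {
	i, ok := configFields[name]
	if !ok {
		return false
	}
	reflect.ValueOf(c).Elem().Field(i).SetString(v)
	return true
}

func main() {
	var c Config
	fmt.Println(setCached(&c, "URL", "geektutu.com"), setCached(&c, "Missing", "x"))
	fmt.Println(c.URL)
	// prints:
	// true false
	// geektutu.com
}
```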

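Struct Performance

An instance of the empty struct struct{} occupies no memory, so it is widely used as a placeholder. This saves memory, and the empty struct also carries a clear meaning: no value is needed here, only a placeholder.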

fmt.Println("空结构体不占内存", unsafe.Sizeof(struct{}{}))

go run main.go

&{Name:global_server IP:10.0.0.1 URL:geektutu.com Timeout:} 
空结构体不占内存 0
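
A common use of the empty struct as a placeholder is a set built on a map. The sketch below is an assumption about what such a type looks like; only the method names Add and Has appear in the compiler output later in this article.

```go
package main

import "fmt"

// Set stores keys only; the struct{} values take up no space.
type Set map[string]struct{}

// Has reports whether key is in the set.
func (s Set) Has(key string) bool {
	_, ok := s[key]
	return ok
}

// Add inserts key into the set.
func (s Set) Add(key string) {
	s[key] = struct{}{}
}

// Delete removes key from the set.
func (s Set) Delete(key string) {
	delete(s, key)
}

func main() {
	s := make(Set)
	s.Add("Tom")
	s.Add("Sam")
	fmt.Println(s.Has("Tom"), s.Has("Jack")) // prints: true false
}
```

Using map[string]bool would work too, but each value would then cost a byte and leave it unclear what a false entry means.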

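The size of a struct instance equals the sum of its field sizes plus any padding added for memory alignment.

Effect of memory alignment on performance

A CPU does not access memory byte by byte but in units of its word size. A 32-bit CPU, for example, has a 4-byte word and accesses memory 4 bytes at a time.
In short, proper alignment improves memory read and write performance and makes atomic operations on variables easier to implement.
When struct{} is the last field of another struct, extra padding is added so that a pointer to that field cannot point past the end of the struct's memory.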
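
Both effects, padding from field order and padding after a trailing struct{}, can be measured with unsafe.Sizeof. The struct types here are made up for the illustration, and the exact numbers depend on the platform.

```go
package main

import (
	"fmt"
	"unsafe"
)

// padded orders its fields so that padding is needed twice.
type padded struct {
	a int8
	b int64
	c int8
}

// packed holds the same fields with the widest one first.
type packed struct {
	b int64
	a int8
	c int8
}

// emptyFirst places the empty struct at the start: no extra space.
type emptyFirst struct {
	e struct{}
	n int32
}

// emptyLast places the empty struct at the end: padding is added.
type emptyLast struct {
	n int32
	e struct{}
}

// sizes returns the sizes of the four types in declaration order.
func sizes() (p, q, f, l uintptr) {
	return unsafe.Sizeof(padded{}), unsafe.Sizeof(packed{}),
		unsafe.Sizeof(emptyFirst{}), unsafe.Sizeof(emptyLast{})
}

func main() {
	p, q, f, l := sizes()
	// On 64-bit platforms such as amd64 this prints: 24 16 4 8
	fmt.Println(p, q, f, l)
}
```

Reordering fields from widest to narrowest is a cheap way to shrink structs that are allocated in large numbers.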

Concurrent Programming

Build Optimization

Reducing Binary Size

go build -o server.exe main.go
The resulting file is 9.83 MB.

go build -ldflags="-s -w" -o server1.exe main.go
The resulting file is 6.85 MB.

go build -o server.exe main.go 
upx.exe -9 server.exe
Running the two commands above in sequence produces a 5.34 MB file.

go build -ldflags="-s -w" -o server3.exe main.go
upx.exe -9 server3.exe
Running the two commands above in sequence produces a 2.49 MB file.

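https://github.com/upx/upx.git UPX supports Windows, macOS, Linux, and other platforms.
On Windows, install it and add it to the PATH environment variable; it can then be run from cmd.
If binary size is not a concern, there is no need to compress with upx. Backend services that run standalone on a server generally do not need it.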

Escape Analysis and Performance

Adding -gcflags=-m at build time prints the compiler's escape analysis decisions.

go build -o ./compile/demo.exe -ldflags="-s -w" -gcflags=-m main.go

# command-line-arguments
./main.go:16:12: inlining call to fmt.Printf
./main.go:17:13: inlining call to fmt.Println
./main.go:19:13: inlining call to fmt.Println
./main.go:21:7: inlining call to advance.Set.Add
./main.go:22:7: inlining call to advance.Set.Add
./main.go:23:34: inlining call to advance.Set.Has
./main.go:23:13: inlining call to fmt.Println
./main.go:24:34: inlining call to advance.Set.Has
./main.go:24:13: inlining call to fmt.Println
./main.go:29:13: inlining call to fmt.Println
./main.go:35:25: inlining call to compile.CreateDemo
./main.go:36:13: inlining call to fmt.Println
./main.go:16:12: ... argument does not escape
./main.go:17:13: ... argument does not escape
./main.go:17:14: "=====上面是反射=====" escapes to heap
./main.go:19:13: ... argument does not escape
./main.go:19:14: "空结构体不占内存" escapes to heap
./main.go:19:55: unsafe.Sizeof(struct{}{}) escapes to heap
./main.go:20:11: make(advance.Set) does not escape
./main.go:23:13: ... argument does not escape
./main.go:23:14: "有张3? = " escapes to heap
./main.go:23:34: ~R0 escapes to heap
./main.go:24:13: ... argument does not escape
./main.go:24:14: "有李4? = " escapes to heap
./main.go:24:34: ~R0 escapes to heap
./main.go:29:13: ... argument does not escape
./main.go:29:14: "=====上面是空Struct=====" escapes to heap
./main.go:35:25: new(compile.Demo) does not escape
./main.go:36:13: ... argument does not escape
./main.go:36:15: d.Name escapes to heap
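
Two patterns explain most of the lines above: values that may outlive their function, and values passed to fmt.Println, whose parameters are interfaces. The sketch below reproduces both in a small program; Demo, createDemo, and sumLocal are hypothetical stand-ins, and the exact messages depend on the compiler version and on inlining.

```go
package main

import "fmt"

// Demo is a hypothetical struct standing in for compile.Demo.
type Demo struct {
	Name string
}

// createDemo returns a pointer to a value it creates. If the call is
// not inlined, the value outlives this stack frame and is placed on
// the heap. If it is inlined and the pointer stays local to the
// caller, the compiler can report "does not escape".
func createDemo(name string) *Demo {
	d := new(Demo)
	d.Name = name
	return d
}

// sumLocal keeps its array entirely inside the function, so the
// array can live on the stack.
func sumLocal() int {
	nums := [4]int{1, 2, 3, 4}
	total := 0
	for _, n := range nums {
		total += n
	}
	return total
}

func main() {
	d := createDemo("demo")
	// Arguments to fmt.Println are converted to interface values,
	// which is why such arguments are reported as escaping to the heap.
	fmt.Println(d.Name, sumLocal()) // prints: demo 10
}
```

Building this file with go build -gcflags=-m prints the decisions for each of these lines.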