Stack and heap

Stack memory is generally allocated and released automatically by the operating system and compiler-generated code, while heap memory is requested and released by the programmer through the programming language.

The stack stores function parameters, return values, local variables, and the temporary context of function calls. The heap stores dynamically allocated objects whose lifetime is not tied to a single call; global variables normally live in the static data segment rather than on the heap.
Stack access is generally faster than heap access.
In general, each thread is allocated a stack and each process is allocated a heap: a stack is private to its thread, while the heap is shared by all threads of the process.
A stack's size is fixed when it is created, and data that exceeds it causes a stack overflow error. The heap has no fixed size and can grow as needed. (Goroutine stacks are a special case: the Go runtime starts them small and grows them on demand, up to a limit.)
On most platforms the stack grows from high addresses toward low addresses, while the heap grows from low addresses toward high addresses.

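Stack and heap allocation in Go

A variable exists as long as there are references to it, and where it is stored has no effect on the semantics of the language. When possible, the compiler allocates a variable in the stack frame of its function. If the compiler cannot prove that the variable is no longer referenced after the function returns, it must allocate the variable on the garbage-collected heap to avoid dangling pointers. A very large local variable may also be stored on the heap.

In the compiler, a variable whose address is taken is a candidate for heap allocation, but if escape analysis can establish that it does not live past the return of the function, it is allocated on the stack.

In short, whether a variable is allocated on the heap or the stack is decided entirely by the compiler.

If every variable were allocated on the heap, nothing would be cleaned up automatically the way stack frames are. Go would have to run the garbage collector frequently, and garbage collection carries a significant runtime cost.

For a variable to be allocated on the stack, its lifetime must be determinable at compile time; otherwise it is allocated on the heap.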

Do not blindly pass pointers to variables as function parameters just because it reduces copying. When the parameter is the variable itself, the copy is made on the stack, which costs far less than dynamically allocating heap memory once the variable escapes.
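
A rough way to see the difference is to count heap allocations. The sketch below is illustrative only: payload, retained, sumByValue, and sumByPointer are hypothetical names, and testing.AllocsPerRun is used outside a test merely as a convenient counter. A pointer parameter forces a heap allocation only when the callee lets the pointer escape, which is simulated here by storing it in a package-level variable.

```go
package main

import (
	"fmt"
	"testing"
)

// payload is a hypothetical 32-byte struct used only for this illustration.
type payload struct {
	values [4]int64
}

// retained simulates a callee that keeps the pointer it was given.
var retained *payload

// sumByValue receives a copy of the struct; the copy lives in stack memory.
//
//go:noinline
func sumByValue(p payload) int64 {
	return p.values[0] + p.values[1] + p.values[2] + p.values[3]
}

// sumByPointer avoids the copy but leaks the pointer, so the caller's
// variable can no longer stay in the caller's stack frame.
//
//go:noinline
func sumByPointer(p *payload) int64 {
	retained = p
	return p.values[0] + p.values[1] + p.values[2] + p.values[3]
}

// allocsByValue reports the average number of heap allocations per call.
func allocsByValue() float64 {
	return testing.AllocsPerRun(100, func() {
		p := payload{values: [4]int64{1, 2, 3, 4}}
		_ = sumByValue(p)
	})
}

// allocsByPointer does the same for the pointer-taking variant.
func allocsByPointer() float64 {
	return testing.AllocsPerRun(100, func() {
		p := payload{values: [4]int64{1, 2, 3, 4}}
		_ = sumByPointer(&p)
	})
}

func main() {
	fmt.Println("allocations per call, by value:  ", allocsByValue())
	fmt.Println("allocations per call, by pointer:", allocsByPointer())
}
```

The by-value variant should report no allocations and the leaking pointer variant at least one per call. Building with -gcflags=-m shows the decisions the compiler actually made.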

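Memory escape

In Go, whether a variable is allocated on the stack or the heap is decided entirely by the compiler. When a variable that looks as if it should live on the stack has its lifetime extended and is allocated on the heap instead, it is said to escape.

You can inspect the escape analysis of Go code by compiling with the -gcflags=-m flag. The compiler prints its escape-analysis decisions, which helps developers optimize the code.

If engineers could allocate exactly the right space for every variable, the program would reach the best possible run-time and memory efficiency. In practice, manual memory allocation leads to the following two problems:

Objects that do not need the heap are allocated on the heap, which wastes memory;
Objects that need the heap are allocated on the stack, which produces dangling pointers and compromises memory safety.

In compiler optimization, escape analysis is a method for determining the dynamic scope of pointers. The Go compiler uses escape analysis to decide which variables are allocated on the stack and which on the heap, including memory allocated implicitly through new, make, and literals. Go's escape analysis maintains the following two invariants:

A pointer to a stack object cannot be stored in the heap;
A pointer to a stack object cannot outlive that stack object.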

package main

import "fmt"

func main() {
	var a [1]int
	c := a[:]
	fmt.Println(c)
}

$ go tool compile -m main.go
main.go:8:13: inlining call to fmt.Println
main.go:6:6: moved to heap: a  # variable a on line 6 is allocated on the heap
main.go:8:13: c escapes to heap # variable c escapes to the heap
main.go:8:13: []interface {} literal does not escape
<autogenerated>:1: .this does not escape
<autogenerated>:1: .this does not escape

# Escape analysis
go build -gcflags=-m main.go


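The cases in which a variable escapes can be summarized as follows:

Returning a pointer to a local variable from a function: the pointer is still referenced by the caller after the return, so the variable's lifetime exceeds that of the stack frame and the variable escapes.
Sending a pointer, or a value containing pointers, to a channel: at compile time there is no way to know which goroutine will receive from the channel, so the compiler cannot know when the variable will be released.
Storing pointers, or values containing pointers, in a slice: the contents of the slice escape. Even if the backing array is allocated on the stack, the values it references end up on the heap.
Appending to a slice beyond its capacity (cap): the slice may initially be allocated on the stack, but if its backing store has to grow based on run-time data, it is allocated on the heap.
Calling a method on an interface type: such calls are dispatched dynamically, so the real implementation is known only at run time. Arguments passed to functions such as Printf and Sprintf typically escape for this reason.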
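
The cases above can be sketched in a few lines. The type and function names (user, newUser, sendUser, collectUsers, describe) are hypothetical, and the comments state what one would expect the compiler to decide rather than recorded compiler output; go build -gcflags=-m shows the actual result for a given toolchain.

```go
package main

import "fmt"

type user struct{ name string }

// newUser returns a pointer to a local variable, so u must outlive the call.
func newUser(name string) *user {
	u := user{name: name}
	return &u
}

// sendUser sends a pointer on a channel; the receiver is unknown at compile time.
func sendUser(ch chan<- *user, name string) {
	ch <- &user{name: name}
}

// collectUsers stores pointers in a slice that is returned to the caller.
// append may also have to grow the backing array at run time.
func collectUsers(names ...string) []*user {
	var users []*user
	for _, n := range names {
		users = append(users, &user{name: n})
	}
	return users
}

// describe passes a value through an interface parameter (fmt.Sprintf takes ...any).
func describe(u user) string {
	return fmt.Sprintf("user %s", u.name)
}

func main() {
	ch := make(chan *user, 1)
	sendUser(ch, "bob")
	fmt.Println(newUser("alice").name, (<-ch).name)
	fmt.Println(len(collectUsers("carol", "dave")), describe(user{name: "erin"}))
}
```

All four functions remain memory-safe: the values stay valid after the functions return because the compiler moves them to the heap where needed.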

Goroutine scheduling: GMP

G: Goroutine, the unit of execution created with the go keyword in a Go program. It represents a lightweight thread of execution.
M: Machine, or worker thread, that is, an operating-system thread in the traditional sense. It is responsible for executing Go code.
P: Processor, the context required to execute Go code. P is mainly responsible for scheduling and managing goroutines. The maximum number of Ps determines how many goroutines can run in parallel and is set through the GOMAXPROCS environment variable or the runtime.GOMAXPROCS function.
Sched: the scheduler. It maintains the queue of idle Ms, the queue of idle Ps, the queue of runnable Gs, the list of free Gs, and the scheduler's own state.

An M can execute Go code only while it is associated with a P. An M is left without a P only when it is blocked or has spent too long in a system call.

At most GOMAXPROCS threads can be actively running Go code at the same time. By default the runtime sets GOMAXPROCS to the number of cores on the machine; a program can also call runtime.GOMAXPROCS to change the maximum number of active threads.
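
A minimal sketch of querying and changing the setting. The helper names currentProcs and setProcs are hypothetical; runtime.GOMAXPROCS returns the previous value and leaves the setting unchanged when its argument is less than 1.

```go
package main

import (
	"fmt"
	"runtime"
)

// currentProcs returns the current GOMAXPROCS value without changing it.
func currentProcs() int {
	return runtime.GOMAXPROCS(0)
}

// setProcs changes GOMAXPROCS and returns the previous value.
func setProcs(n int) int {
	return runtime.GOMAXPROCS(n)
}

func main() {
	fmt.Println("logical CPUs:", runtime.NumCPU())
	fmt.Println("GOMAXPROCS:  ", currentProcs())

	prev := setProcs(2) // allow at most 2 threads to run Go code in parallel
	fmt.Println("previous:", prev, "now:", currentProcs())
	setProcs(prev) // restore the original setting
}
```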

Spinning thread: a thread that is running but has no goroutine to execute. At most GOMAXPROCS threads may spin; threads beyond that number go to sleep.

Non-spinning thread: a running thread that has a goroutine to execute.

A spinning thread is running without executing any G, which wastes CPU. Would it not be better to destroy the thread and save the CPU? In practice, creating and destroying threads both cost time. When a new goroutine is created we want an M to run it immediately; destroying a thread and then creating a new one would add latency and reduce efficiency. Too many spinning threads would also waste CPU, so the system keeps at most GOMAXPROCS spinning threads and puts the remaining idle threads to sleep (the function notesleep() implements this idea).

Thread reuse: goroutines run on top of a set of threads, so threads are reused instead of being created and destroyed frequently. The scheduler reuses threads in two further ways:

Work stealing: when the current thread has no runnable G, it tries to steal Gs from a P bound to another thread instead of destroying the thread.
Hand off: when the current thread blocks in a system call made by its G, it releases the P it is bound to, and the P is handed off to another idle thread.

Exploiting parallelism: GOMAXPROCS sets the number of Ps. When GOMAXPROCS is greater than 1, up to GOMAXPROCS threads can be running, possibly on several CPU cores at the same time, so concurrency takes advantage of parallelism. GOMAXPROCS also caps the degree of parallelism: with GOMAXPROCS = cores/2, for example, at most half of the CPU cores are used in parallel.

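A goroutine can be in one of the following 9 states:

State	Description
`_Gidle`	Just allocated and not yet initialized
`_Grunnable`	Not executing code, does not own its stack, stored on a run queue
`_Grunning`	May execute code, owns its stack, has been assigned a kernel thread M and a processor P
`_Gsyscall`	Executing a system call, owns its stack, not executing user code, assigned a kernel thread M but not on a run queue
`_Gwaiting`	Blocked in the runtime, not executing user code and not on a run queue, but may be on a wait queue such as a channel's
`_Gdead`	Currently unused, not executing code, may have an allocated stack
`_Gcopystack`	Its stack is being copied, not executing code, not on a run queue
`_Gpreempted`	Stopped because of preemption, not executing user code and not on a run queue, waiting to be resumed
`_Gscan`	The GC is scanning its stack, not executing code, can be combined with the other states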

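Enqueueing

runtime.runqput puts a goroutine on a run queue, which may be either the global run queue or the processor's local run queue:

When next is true, the goroutine is stored in the processor's runnext slot as the next task that processor will execute;
When next is false and the local run queue still has room, the goroutine is appended to the processor's local run queue;
When the processor's local run queue is full, part of the local queue and the goroutine being added are moved to the scheduler's global run queue through runtime.runqputslow.

The processor's local run queue is a circular queue backed by an array; it can hold at most 256 pending tasks.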
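
The three enqueue cases can be modelled with a small single-threaded toy. This is a sketch, not the runtime's code: the real queue is lock-free and stores goroutine pointers, and every name here (task, processor, scheduler, put) is hypothetical. Two details are assumptions of the sketch: a task displaced from runnext falls back to the local queue, and an overflow moves half of the local queue.

```go
package main

import "fmt"

const localCap = 256 // capacity of a P's local run queue, as stated above

type task int // stand-in for a goroutine

// processor models a P: a runnext slot plus a fixed-size ring buffer.
type processor struct {
	runnext    task
	hasNext    bool
	queue      [localCap]task
	head, tail uint32 // tail-head is the number of queued tasks
}

// scheduler holds the global run queue.
type scheduler struct {
	global []task
}

func (p *processor) size() int { return int(p.tail - p.head) }

// put follows the three cases described above.
func (s *scheduler) put(p *processor, t task, next bool) {
	if next {
		old, had := p.runnext, p.hasNext
		p.runnext, p.hasNext = t, true
		if !had {
			return
		}
		t = old // assumption: the displaced task falls back to the local queue
	}
	if p.size() < localCap {
		p.queue[p.tail%localCap] = t
		p.tail++
		return
	}
	// Local queue is full: move half of it, plus t, to the global queue.
	for n := p.size() / 2; n > 0; n-- {
		s.global = append(s.global, p.queue[p.head%localCap])
		p.head++
	}
	s.global = append(s.global, t)
}

func main() {
	s, p := &scheduler{}, &processor{}
	for i := 0; i <= localCap; i++ { // one more task than the local queue can hold
		s.put(p, task(i), false)
	}
	fmt.Println("local:", p.size(), "global:", len(s.global)) // local: 128 global: 129
}
```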

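Dequeueing

To keep scheduling fair, when the global run queue has pending goroutines, schedtick ensures that the scheduler periodically looks for a goroutine in the global run queue first;
Otherwise it looks for a pending goroutine in the processor's local run queue;
If neither of these finds a goroutine, runtime.findrunnable blocks until it finds one, searching in turn:
the local run queue and the global run queue;
the network poller, for goroutines that are waiting to run;
other processors chosen at random, from which runtime.runqsteal tries to steal runnable goroutines (this function may also steal a processor's timers).

The priority is local > global > network poller > stealing.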

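Triggering scheduling

Voluntary suspension

runtime.park_m switches the current goroutine from _Grunning to _Gwaiting and calls runtime.dropg to remove the association between the thread and the goroutine. After that, runtime.schedule can be called to start a new round of scheduling.

Once the condition the goroutine is waiting for is satisfied, the runtime calls runtime.goready to wake the goroutine that was put to sleep by runtime.gopark.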
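
From user code, the most common way to hit this park and wake-up path is a blocking channel operation. A minimal sketch, assuming a hypothetical helper waitForValue: the worker goroutine blocks on a receive until the caller sends, at which point it becomes runnable again.

```go
package main

import "fmt"

// waitForValue starts a worker that blocks on an unbuffered channel,
// wakes it by sending v, and returns the worker's reply.
func waitForValue(v int) int {
	in := make(chan int)
	out := make(chan int)

	go func() {
		x := <-in    // no sender yet: the worker is parked here
		out <- x * 2 // runs only after the worker has been woken up
	}()

	in <- v      // completing the send makes the worker runnable again
	return <-out // the caller now blocks until the worker replies
}

func main() {
	fmt.Println(waitForValue(21)) // 42
}
```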

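System calls (syscall)
Cooperative scheduling: the runtime.Gosched() function voluntarily yields the processor so that other goroutines can run.
I/O and select
Channel operations
Waiting for a lock

When a G makes a blocking I/O system call that blocks its M, the P detaches from that M and either takes an M from the scheduler's idle M queue or creates a new M to pair with, so that the other runnable Gs in the P's queue still get executed. Because the number of Ms executing in parallel does not change, the program keeps CPU utilization high.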

GC

Mark and sweep

Mark phase: starting from the root objects, find and mark all live objects in the heap;
Sweep phase: traverse all objects in the heap, reclaim the unmarked garbage objects, and add the reclaimed memory to the free list.

Starting from the root objects, the collector traverses each object's children in turn and marks every object reachable from the roots as live; unreachable objects are treated as garbage.
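
The two phases can be illustrated on a toy heap. This is only a model of the idea, not how the Go runtime lays out memory; object, heap, mark, sweep, and exampleHeap are hypothetical names, and objects refer to each other by slice index.

```go
package main

import "fmt"

// object is a toy heap cell; refs holds the indices of the objects it points to.
type object struct {
	refs   []int
	marked bool
	free   bool
}

type heap struct {
	objects  []object
	freeList []int // indices of reclaimed cells
}

// mark sets the mark bit on everything reachable from the roots.
func (h *heap) mark(roots []int) {
	pending := append([]int(nil), roots...)
	for len(pending) > 0 {
		i := pending[len(pending)-1]
		pending = pending[:len(pending)-1]
		if h.objects[i].marked {
			continue
		}
		h.objects[i].marked = true
		pending = append(pending, h.objects[i].refs...)
	}
}

// sweep walks the whole heap, reclaims unmarked cells, and clears mark bits.
func (h *heap) sweep() {
	for i := range h.objects {
		o := &h.objects[i]
		switch {
		case o.free: // already on the free list
		case o.marked:
			o.marked = false // live: reset for the next cycle
		default:
			o.refs, o.free = nil, true
			h.freeList = append(h.freeList, i)
		}
	}
}

// exampleHeap builds 0 -> 1 -> 2 (reachable from root 0) and an
// unreachable cycle 3 <-> 4.
func exampleHeap() *heap {
	return &heap{objects: []object{
		{refs: []int{1}}, {refs: []int{2}}, {}, {refs: []int{4}}, {refs: []int{3}},
	}}
}

func main() {
	h := exampleHeap()
	h.mark([]int{0})
	h.sweep()
	fmt.Println("reclaimed cells:", h.freeList) // reclaimed cells: [3 4]
}
```

The unreachable cycle is reclaimed even though its two objects still reference each other, because liveness is decided by reachability from the roots.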

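Tri-color marking

White objects: potential garbage, whose memory may be reclaimed by the garbage collector;
Gray objects: live objects that may still hold pointers to white objects, so the garbage collector will scan their children;
Black objects: live objects that have already been scanned, including objects that hold no pointers to other objects and objects reachable from the roots.

First create three sets: white, gray, and black. Put every object in the white set. Then traverse from the roots and move each object that is reached from the white set to the gray set. Next, take an object from the gray set, move every white object it references to the gray set, and move the object itself to the black set. Repeat this last step until the gray set is empty; whatever is still white is unreachable and can be reclaimed.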
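
The procedure above, written out as a stop-the-world toy (no concurrency and no write barriers). The graph representation and the names triColorMark, exampleGraph, and whiteObjects are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

type color int

const (
	white color = iota // not reached yet: potential garbage
	gray               // reached, children not scanned yet
	black              // reached and fully scanned
)

// triColorMark colors every object in graph (object name -> referenced names).
func triColorMark(graph map[string][]string, roots []string) map[string]color {
	colors := make(map[string]color, len(graph))
	for name := range graph {
		colors[name] = white
	}
	var grayQueue []string
	shade := func(name string) { // white -> gray
		if colors[name] == white {
			colors[name] = gray
			grayQueue = append(grayQueue, name)
		}
	}
	for _, r := range roots {
		shade(r)
	}
	for len(grayQueue) > 0 {
		obj := grayQueue[0]
		grayQueue = grayQueue[1:]
		for _, child := range graph[obj] {
			shade(child)
		}
		colors[obj] = black // all children are now at least gray
	}
	return colors
}

// exampleGraph: A reaches B, C, and D; E and F only reference each other.
func exampleGraph() map[string][]string {
	return map[string][]string{
		"A": {"B", "C"}, "B": {"D"}, "C": nil, "D": nil,
		"E": {"F"}, "F": {"E"},
	}
}

// whiteObjects lists the objects left white, in sorted order.
func whiteObjects(colors map[string]color) []string {
	var out []string
	for name, c := range colors {
		if c == white {
			out = append(out, name)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	colors := triColorMark(exampleGraph(), []string{"A"})
	fmt.Println("garbage:", whiteObjects(colors)) // garbage: [E F]
}
```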

Incremental garbage collection: mark and sweep garbage incrementally to reduce the longest pause of the application;
Concurrent garbage collection: use the computing resources of multiple cores to mark and sweep garbage concurrently while the user program is executing.

Because both incremental and concurrent collection interleave with the user program, barrier techniques are needed to keep the collection correct. The application also cannot wait until memory is exhausted before triggering a collection: at that point it can no longer allocate memory, which is no different from pausing the program outright. Incremental and concurrent collectors must therefore be triggered early and finish the whole cycle before memory runs out, to avoid long pauses.

Triggers

Heap allocation reaches the trigger heap size computed by the controller;
No collection is currently running and none has been triggered for a certain period of time, in which case a new cycle is started.
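
A program can also observe and influence these triggers through the standard library. The sketch below uses runtime.ReadMemStats, runtime.GC, and debug.SetGCPercent; the helper names gcCount and forceGC are hypothetical, and the value 50 is an arbitrary example. debug.SetGCPercent adjusts the same setting as the GOGC environment variable, which feeds into the heap size at which the next cycle starts.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
)

// gcCount returns the number of completed GC cycles so far.
func gcCount() uint32 {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return m.NumGC
}

// forceGC runs a collection explicitly and returns how many cycles
// completed during the call.
func forceGC() uint32 {
	before := gcCount()
	runtime.GC() // blocks until the collection has finished
	return gcCount() - before
}

func main() {
	// Collect more eagerly than the default; SetGCPercent returns the old value.
	old := debug.SetGCPercent(50)
	defer debug.SetGCPercent(old)

	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	fmt.Println("heap in use:", m.HeapAlloc, "bytes; next GC target:", m.NextGC, "bytes")
	fmt.Println("cycles completed by an explicit runtime.GC():", forceGC())
}
```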

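Copying

Deep copy

A deep copy copies the data itself and creates an identical new object. The new object does not share memory with the original: it occupies a newly allocated address, so modifying the new object does not affect the original. Because the addresses differ, the two can be released independently.

Value types are deep-copied by default: Array, Int, String, Struct, Float, Bool. (A struct or array that contains reference-typed fields still shares whatever those fields point to.)

Shallow copy

A shallow copy copies only the address of the data, that is, the pointer to the underlying object. The new object and the old object refer to the same memory, so a change made through the new object is also visible through the old one, and the memory is released once for both.

Reference types are shallow-copied by default: Slice, Map, Channel, Function, Interface.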
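
A short illustration of the difference, using hypothetical helper names. Assigning an array copies its elements, assigning a slice copies only the slice header, and the built-in copy produces an independent slice.

```go
package main

import "fmt"

// copyArray assigns an array: the elements themselves are copied.
func copyArray() (orig, dup [3]int) {
	orig = [3]int{1, 2, 3}
	dup = orig
	dup[0] = 100 // does not affect orig
	return orig, dup
}

// aliasSlice assigns a slice: both headers share one backing array.
func aliasSlice() (orig, alias []int) {
	orig = []int{1, 2, 3}
	alias = orig
	alias[0] = 100 // also visible through orig
	return orig, alias
}

// cloneSlice makes an independent copy of a slice with the built-in copy.
func cloneSlice() (orig, clone []int) {
	orig = []int{1, 2, 3}
	clone = make([]int, len(orig))
	copy(clone, orig)
	clone[0] = 100 // does not affect orig
	return orig, clone
}

func main() {
	a, b := copyArray()
	fmt.Println(a, b) // [1 2 3] [100 2 3]

	s, t := aliasSlice()
	fmt.Println(s, t) // [100 2 3] [100 2 3]

	u, v := cloneSlice()
	fmt.Println(u, v) // [1 2 3] [100 2 3]
}
```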

nil slices vs non-nil slices vs empty slices

nil slices and empty slices are different, but their observable behavior is largely the same:

Both can be passed to the built-in len() and cap() functions
Both can be iterated with for range
Both can be sliced (as long as the restrictions outlined at Spec: Slice expressions are respected; the result is also an empty slice)
Since their length is 0, you cannot change their content (appending a value creates a new slice value)

var s1 []int         // nil slice
s2 := []int{}        // non-nil, empty slice
s3 := make([]int, 0) // non-nil, empty slice

fmt.Println("s1", len(s1), cap(s1), s1 == nil, s1[:], s1[:] == nil)
fmt.Println("s2", len(s2), cap(s2), s2 == nil, s2[:], s2[:] == nil)
fmt.Println("s3", len(s3), cap(s3), s3 == nil, s3[:], s3[:] == nil)

for range s1 {}
for range s2 {}
for range s3 {}


s1 0 0 true [] true
s2 0 0 false [] false
s3 0 0 false [] false

A slice value is represented by a struct defined in reflect.SliceHeader:

type SliceHeader struct {
    Data uintptr
    Len  int
    Cap  int
}

Slicing a nil slice yields a nil slice; slicing a non-nil slice yields a non-nil slice.

nil slice: the struct has its zero value, so every field is zero (Data, Len, and Cap are all 0).
Non-nil slice with length and capacity both 0: Len and Cap are 0, but the Data pointer is not, and that is what differentiates it from a nil slice. Data points to a zero-sized underlying array, and zero-sized values of different types may share the same memory address.

Inspecting the array pointer value

var s1 []int
s2 := []int{}
s3 := make([]int, 0)

fmt.Printf("s1 (addr: %p): %+8v\n",
    &s1, *(*reflect.SliceHeader)(unsafe.Pointer(&s1)))
fmt.Printf("s2 (addr: %p): %+8v\n",
    &s2, *(*reflect.SliceHeader)(unsafe.Pointer(&s2)))
fmt.Printf("s3 (addr: %p): %+8v\n",
    &s3, *(*reflect.SliceHeader)(unsafe.Pointer(&s3)))


s1 (addr: 0x1040a130): {Data:       0 Len:       0 Cap:       0}
s2 (addr: 0x1040a140): {Data: 1535812 Len:       0 Cap:       0}
s3 (addr: 0x1040a150): {Data: 1535812 Len:       0 Cap:       0}

All slices (slice headers) have different memory addresses
The nil slice has 0 data pointer
s2 and s3 slices do have the same data pointer, sharing / pointing to the same 0-sized memory value
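
One place where the difference becomes visible in practice is JSON encoding: encoding/json encodes a nil slice as null and an empty slice as []. A minimal sketch, with encode as a hypothetical helper:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// encode returns the JSON encoding of s as a string.
func encode(s []int) string {
	b, err := json.Marshal(s)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(b)
}

func main() {
	var nilSlice []int      // nil slice
	emptySlice := []int{}   // non-nil, empty slice

	fmt.Println(encode(nilSlice))   // null
	fmt.Println(encode(emptySlice)) // []

	// Both still behave the same for len and append.
	fmt.Println(len(append(nilSlice, 1)), len(append(emptySlice, 1))) // 1 1
}
```

When the distinction does not matter, testing len(s) == 0 rather than s == nil treats both kinds of slice alike.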

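Limiting the number of concurrent goroutines

Too much concurrency crashes the program; put simply, system resources are exhausted.

The application should actively limit the number of concurrent goroutines.

Using a channel's buffer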

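// main_chan.go
package main

import (
	"log"
	"sync"
	"time"
)

func main() {
	var wg sync.WaitGroup
	// The buffered channel acts as a semaphore with 3 slots.
	ch := make(chan struct{}, 3)
	for i := 0; i < 10; i++ {
		ch <- struct{}{} // blocks while 3 goroutines are already running
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			defer func() { <-ch }() // release the slot even if the task panics
			log.Println(i)
			time.Sleep(time.Second)
		}(i)
	}
	wg.Wait()
}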

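At most 3 tasks run at any moment (about 3 per second here, since each task sleeps for one second), which achieves the goal of limiting goroutine concurrency.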
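
To check that the limit really holds, the same pattern can be wrapped in a function that records the peak number of tasks running at once. This is a sketch under the same idea; runLimited and its counters are hypothetical names, and the 10 ms sleep stands in for real work.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// runLimited runs the given number of tasks with at most limit of them
// in flight and returns the highest concurrency it observed.
func runLimited(tasks, limit int) int32 {
	var (
		wg      sync.WaitGroup
		running int32
		peak    int32
	)
	sem := make(chan struct{}, limit) // buffered channel used as a semaphore

	for i := 0; i < tasks; i++ {
		sem <- struct{}{} // acquire a slot before starting the goroutine
		wg.Add(1)
		go func() {
			defer wg.Done()
			defer func() { <-sem }() // release the slot

			n := atomic.AddInt32(&running, 1)
			for { // record a new peak if n exceeds the current one
				old := atomic.LoadInt32(&peak)
				if n <= old || atomic.CompareAndSwapInt32(&peak, old, n) {
					break
				}
			}
			time.Sleep(10 * time.Millisecond) // simulated work
			atomic.AddInt32(&running, -1)
		}()
	}
	wg.Wait()
	return atomic.LoadInt32(&peak)
}

func main() {
	fmt.Println("peak concurrency:", runLimited(10, 3))
}
```

The slot is acquired in the loop rather than inside the goroutine, so the loop itself blocks and no more than limit goroutines exist at a time.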

Third-party goroutine pool libraries

Performance tuning

Go 语言高性能编程 (High-Performance Go Programming)