Beyang Liu for the GopherCon 2019 Liveblog on July 26, 2019
Presenter: Mike Seplowitz
Liveblogger: Beyang Liu
Using a service-oriented architecture instead of a monolithic architecture adds flexibility but also increases complexity. How will we keep track of which parts of our system depend on each other? To help solve this problem, I've been working on a project to analyze Go code (using Go!) to detect inter-process dependencies in our system. In this talk, I'll give a brief introduction to the static analysis libraries I used and share a few tips about how to use them effectively. I'll discuss obstacles I encountered along the way and my attempts to overcome them.
Mike works at Bloomberg on the Deployment Solutions team. Slides for this talk are online here.
Outline of this talk:
When Mike joined the Deployment Solutions team 2 years ago, they made the decision to start using Go and also move to a microservices architecture. Their microservice only had a single endpoint, so perhaps it was more a "nanoservice".
He embarked a project to accomplish the following:
Determine the dependencies of a single service
See inter-service dependencies across the entire system
Here was his initial plan:
What are some example "interesting" calls?
net.Dial, net/http.Get, (*net/http.Client).Get
Higher level:
database/sql.OpenDB
google.golang.org/grpc.Dial
github.com/go-redis/redis.NewClient
github.com/streadway/amqp.Dial (RabbitMQ)
github.com/Shopify/sarama.NewAsyncProducer (Kafka)
(*os/exec.Cmd).Run, os/exec.FindProcess, os.Pipe
os.Create, os.Open, os.Mkdir
Who (which services) are we calling?
http.Get("https:#/myservice.example.com/path/to/resource")
http.Get(myserviceURL)
url #: os.Getenv("MYSERVICE_URL")
...
http.Get(url)
http.Get(app.Config.MyService.URL)
So which call do we "mark" (as indicating an external process dependency)?
You can have a direct call (super straightforward):
You can have a specific helper (i.e., 1 level removed):
You can also have generic helpers:
So you are looking at the call chain:
The question is where in the call chain do you place the marker.
You can look at where the last place where the service URL appears. And you mark that with a comment:
To take a more complex example, you want to look at the call graph and determine where to place the
markers. If you mark func A
in the example below, you've covered all the ingoing paths to func A
,
and you can disregard all outgoing paths, as well. Then you need to mark other functions that haven't yet been "covered":
Source diving... He did a quick search of GoDoc for "call graph" and found the following packages:
golang.org/x/tools/go/callgraph
golang.org/x/tools/go/callgraph/cha
golang.org/x/tools/go/callgraph/rta
golang.org/x/tools/go/callgraph/static
Here are the key functions in each package, all of which take a *ssa.Program
pointer:
A lot of packages in golang.org/x/tools
that are prefixed with analysis/
:
"The analysis package defines the interface between a modular static analysis and an analysis driver program."
A "Pass" is running an Analyzer on a single package. There were a few such reusable passes:
analysis/passes/inspect -> *inspector.Inspector
analysis/passes/ctrlflow -> *cfg.CFG (basically)
analysis/passes/buildssa -> *ssa.Package, []*ssa.Function
The Plan: For each package Pass,
This section is a quick primer on static analysis packages in the Go compiler and standard library.
Relevant packages:
go/
token
ast
parser
golang.org/x/tools/go/
packages
ssa
callgraph
analysis
Here's the rough control flow in the Go compiler:
Here's the data structure that represents locations in code:
ASTs and parsing:
Here's an example AST for a simple code snippet:
Here are the key structures in the go/ast
package:
go/parser
parses Go expressions and returns an AST.
golang.org/x/tools/go/packages
loads packages (this replaces the old golang/x/tools/go/loader
package)
SSA is the intermediate code representation. A lot of analysis programs build the SSA representation and then analyze that:
The callgraph
package constructs call graph data structures from SSA structures.
When you're traversing a call graph, use recursion, and maps are useful to keep track of what you've seen.
The analysis package defines the interface between a modular static analysis and an analysis driver program:
You can store a "Fact"s for later use.
The plan: For each package Pass,
There were a few bumps:
If you don't declare Facts, you don't get some things:
analysis/passes/buildssa
hiccups:
He hit a point of return: singlecheck.Main()
calls os.Exit
, so you can't report results after an invocation to that function.
He tried some workarounds:
if pass.Pkg.Name #= "main" { ...
stdout
or a file as you go (locked with sync.Mutex
)But it was just not working well:
Rebuild from the ground up:
Simple driver:
packages.Load
ssautil.AllPackages and (*ssa.Program).Build)
pointer.Analyze
Did it work? Well... he got this:
And this:
It's a lot of stuff, but it's useful and interesting! He started going through this call graph. Some observations:
There were a few false positives, because call graph analyses don't precisely follow the actual runtime execution path of the code:
It looks like readSource
and ParseFile
use a file, but the example shows we don't actually use a file.
Another class of false positive is what he calls the (*sync.Once).Do
problem. (*sync.Once).Do
invokes a function exactly once. There are many functions that invoke (*sync.Once).Do
:
In callgraph, there is a 1-1 mapping:
So the previous call graph really looks like this:
And because everything calls os.Getenv
, you have a loop:
But if you special case handling of (*sync.Once).Do
, you can rewrite the call graph to look like
what it should:
Aside from a handful of these false positives, it works pretty well. The example below highlights all marked functions in red.
They ran it on real code:
A couple more caveats:
If you're going to do static analysis, consider the following: