The cycle of development we’re most familiar with is: write code, compile your code, then run this code on the same machine you were writing it on. On most desktop OSes, you pick up a compiler by downloading one from your package manager. Xcode and Visual Studio are toolchains (actually IDEs) that leverage being platform-specific, each including tools tailored around the platform your code will run on and heavily showcasing the parent OS’s design language.
Yet you can also write code that runs on platforms you aren't simultaneously coding on. Every modern computer architecture supports a C compiler you can download and run on your PC, usually a binary plus utils for a new gcc or llvm backend. In practice, using these tools means setting several non-obvious environment variables like `CC` and searching the internet for magic command line arguments (it takes a lot of work to convince a Makefile not to default to running `gcc`). Installing a compiler for another machine is easy, but getting a usable result takes trial and error.

If you've picked up Rust and are learning systems programming, you might ask: does Rust, a language whose design addresses C's inadequacies for developing secure software, also address its shortcomings in generating code for other platforms?
Let's start with the tiered platform support system Rust maintains to track which platforms it supports and how complete that support is (from “it actually might not build at all” to “we test it each release”). On its own, this is a useful reference for the target identifiers of popular consumer OSes, embedded platforms, and some experimental ones (like Redox!). The majority of these platforms don't actually support running `rustc` or `cargo` from the command line, though.

Rust makes up for this by advertising a strong cross-compilation story. Quoting from the announcement post for `rustup target`:

> In practice, there are a lot of ducks you have to get in a row to make [cross-compilation] work: the appropriate Rust standard library, a cross-compiling C toolchain including linker, headers and binaries for C libraries, and so on. This typically involves pouring over various blog posts and package installers to get everything “just so”. And the exact set of tools can be different for every pair of host and target platforms.
>
> The Rust community has been hard at work toward the goal of “push-button cross-compilation”. We want to provide a complete setup for a given host/target pair with the run of a single command.
This is an excellent goal given the infrastructure and design challenges ahead. And I wanted to learn more about this part of the article:
Cross-compilation is an imposing term for a common kind of desire:
- …
- You want to write, test and build code on your Mac, but deploy it to your Linux server.
- …
This is exactly the scenario I’m in! But, understandably, the article doesn’t actually include an example of how to do this, because cross-compiling for another OS requires making several assumptions about the target platform that may not apply to everyone. Here is a recap of how I made it work for my project.
In my case, the need to build binaries for Linux came up while working on my project edit-text, a collaborative text editor for the web written in Rust. I regularly test changes out in a sandbox environment, since you can’t rely on local testing to catch behavior that might only appear in production. Yet the issue I kept running into was how long it was taking to deploy to my $10 DigitalOcean server. I spent a long time rereading the same compiler logs before it actually dawned on me: I was compiling on my web server and not my laptop. And that’s really slow.
If you have a githook that takes new source code pushed via `git` and loads it into a Docker container, deploying via `git` just sends up your source directory and points it at a `rustc` compiler. On each new deploy, your server has to rebuild from your Dockerfile from scratch, and unless you configure it to support caching, this throws away the benefit of quickly iterating on your code. If you want faster builds by having cargo incrementally cache compiled files between builds, you’ll find this complex to get right in your Dockerfile configuration but entirely natural to manage in your local development environment.

The approach I have the most experience with is to take the compilation environment and just run it locally on my machine. With Docker, we have an easy way to run Linux environments (even on a Mac) and to pin them to the same development environment I have on my server. Since Docker on my machine will be running Linux in a hypervisor, local performance should beat what I can do on my server, even with the overhead of not being the host OS.
Did you know Rust has a first-party story for cross-compiling for Linux using Docker? The Rust cross-compilation image `rustlang/rust:nightly` can help you generate binaries for the Linux target compiled with nightly Rust, and can be invoked on demand from the command line using `docker run`. I developed a set of command line arguments to get cross-compilation with caching working. The binaries this produced, amazingly, worked when I copied them to and ran them on my Linux server. Compiling locally was marginally faster. But there were drawbacks to this approach for cross-compilation:
- I had to manage and run Docker locally, which on macOS requires a Docker daemon to be running.
- I had to cache each cargo and rust directory individually, and these caches, managed separately from `rustup` on my machine, seemingly never got garbage collected. I accrued huge directories of cached files just for compiling for Linux.
- Intermediate artifacts from successive builds seemed less likely to be cached between builds, meaning builds took longer than they should on my machine.
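For reference, the invocation had roughly this shape (the image tag comes from the text above; the mount points, cache path, and working directory are assumptions on my part):

```shell
# run nightly Rust in Docker, mounting the source tree and persisting
# cargo's registry cache on the host between runs (paths are assumptions)
docker run --rm \
  -v "$PWD":/usr/src/edit-text -w /usr/src/edit-text \
  -v "$HOME/.cargo-docker-cache":/usr/local/cargo/registry \
  rustlang/rust:nightly \
  cargo build --release
```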
I like that Docker gives a reproducible environment to build in—building Debian binaries on the Debian kernel makes things easy—but Rust’s cross-compilation support might allow me to manage all the compilation artifacts that make modern Rust compile times tolerable.
“rustup target add”
So far I’ve only mentioned the Rust compiler’s support for cross-compiling. There are actually a handful of components you need to make cross-compiling work:

1. A compiler that supports your target
2. Library headers to compile your program against (if any)
3. Shared library files to link against (if any)
4. A “linker” for the target platform
Let’s start with compiler support. Passing `--target` when running `cargo build|run` changes the machine code the compiler emits, bundled into object files in the format that OS supports. But we have to install the new toolchain that adds this capability to Cargo, which is done with the command `rustup target add`. Installing support for another “target” is something you do once on your machine, and from then on cargo can support it via the `--target` argument. To install a Linux target on my Mac, I ran:
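The command, using the target triplet explained just below:

```shell
# one-time setup: install the standard library and support
# files for the musl Linux target
rustup target add x86_64-unknown-linux-musl
```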
This installs the new compilation target based on its target triplet, which means:

- We’re compiling for the x86-64 processor family (AMD64)
- from an unknown (generic) vendor, targeting the Linux OS
- compiled with the musl toolchain.
A note on musl: the alternative to musl is GNU, as in GNU libc, which almost every Linux environment has installed anyway. So why choose musl? musl is designed to be statically linked into a binary rather than dynamically linked; this means I could compile a single binary and deploy it to any server without any requirements as to what libraries were installed on that OS. It might even mean we can skip steps 2) and 3) above. And I could fall back to GNU if it didn’t work out.
Finally, we need a program that links our compiled objects together. On macOS, you can use brew to download a community-provided binary for Linux + musl cross-compilation. Just run this to install the toolchain including the command “x86_64-linux-musl-gcc”:
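The tap and formula name here are an assumption on my part; a community tap along these lines provides the toolchain:

```shell
# install a musl cross-compilation toolchain from a community tap
# (tap/formula name assumed; check `brew search musl` first)
brew install FiloSottile/musl-cross/musl-cross
```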
At this point we need to tell Rust about the linker. The official way to do this is to add a new file named `.cargo/config` in the root of your project and set its contents to something like this:
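A minimal version, using Cargo’s standard per-target linker setting and the linker we just installed:

```toml
# .cargo/config — use the musl cross-linker for this target
[target.x86_64-unknown-linux-musl]
linker = "x86_64-linux-musl-gcc"
```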
This should instruct Rust, whenever the target is set to `--target=x86_64-unknown-linux-musl`, to use the executable `x86_64-linux-musl-gcc` to link the compiled objects. But it seems that if you have any C code compiled by a Rust build script, you also have to set environment variables like `TARGET_CC` to get it working. So when my code started throwing linking errors, I set those in my shell, and thankfully that made the compilation steps with linker errors work consistently.
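The variable in question follows the cc crate’s convention; your build scripts may consult others:

```shell
# point build scripts' C compilation at the musl cross-compiler
export TARGET_CC=x86_64-linux-musl-gcc
```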
Libraries and Linking
musl doesn’t support shared libraries, that is, libraries that are independently installed and versioned on your system via a package manager. Shared libraries are linked at runtime (once the program starts), rather than literally embedded in your binary at compile time (static libraries).
Sometimes you can work around the constraint of requiring static libraries without leaving the `cargo` ecosystem: some crates support a “bundled” feature, as in libsqlite3-sys, which compiles a static library during your build step and links it into your project. For example, the SQLite drivers I was relying on had no problem being compiled with musl once I enabled the “bundled” feature; I didn’t have to `apt-get install libsqlite3` on the remote platform, nor did I have to find headers that matched it. An app with only this requirement would be a solid use case for deploying binaries compiled with musl.

If your project depends on openssl, though, you’ll see this type of error midway through `cargo build`:
No need to check the exit code; this clearly isn’t building correctly. The error says “Could not find directory of OpenSSL installation”. This doesn’t mean I didn’t have OpenSSL installed on my computer (I did); it means the build can’t find its headers and source code to compile against. Compiling a static binary with musl is now much more complicated if I need to download and compile an arbitrary OpenSSL dependency.
Why I need OpenSSL as a dependency to begin with: one of the libraries I depend on in edit-text’s software stack, `reqwest`, relies on `native-tls` to support encryption for downloading data over HTTPS. `reqwest` is a programmatic HTTP client for Rust, and it uses `native-tls` to link against the OS’s native SSL implementation and expose an agnostic interface to it in order to support HTTPS. I can imagine a future `reqwest` feature that substitutes a `rust-tls` backend for `native-tls`, allowing me to compile all my crypto code without needing to touch `gcc`. But for now, since I don’t want the heavy lift of compiling OpenSSL myself, dynamic linking looks like the only way forward.

Debian Packaging
New plan: compile against the GNU toolchain and use dynamic linking. If we don’t want to cross-compile libraries ourselves, then we have to find a source of pre-compiled libraries and headers (which are sometimes distinct things). Since we’re moving away from musl, we’ll even need to bring our own copy of `libc`!

Luckily, we just have to recreate the same environment as a Linux compiler, which turns out to be straightforward. When I’m compiling code on Linux and need to, say, link against OpenSSL, I can run the following:
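On Debian, that’s the dev package (the same one named later in this post):

```shell
# install OpenSSL headers (/usr/include) and shared libraries (/usr/lib)
apt-get install libssl-dev
```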
Now I can compile any binary that relies on OpenSSL headers, because they were installed to my system. Where are these files? One way to find them is to run `dpkg-query -L libssl-dev` to list which files were installed by my package manager. In this case, most of the header files are placed in `/usr/include` and the Linux libraries in `/usr/lib`.
If we have the `.deb` file itself, we can confirm this by dumping its archive contents and looking for `aes.h`, the header we might require linking against.

We can essentially reuse these packages on other platforms. Package managers extract files to specific locations on your machine. If we can extract these same archives locally, we can tell the compiler to look in these folders for headers instead of the OS folders.
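Dumping a package’s contents can be done with `dpkg-deb` (the package filename here is a placeholder):

```shell
# list the files inside a .deb archive without installing it
dpkg-deb -c libssl-dev_1.0.1t-1+deb8u8_amd64.deb
```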
Let’s describe which archives we want. First, my choice of a broadly-accessible Linux distribution with good tooling is Debian, of which Ubuntu is a fork, and which has a straightforward packaging system built around apt and its `.deb` package format. Second, we need to pick a sufficiently old version of Debian to get ABI-compatible libraries. I chose *jessie*, the version of Debian immediately prior to its current stable release, *stretch*.

We can’t just fetch a `.deb` archive via `apt-get install` on a Mac, though. Downloading library headers directly means navigating a maze of architecture- and CDN-specific hyperlinks. I poked around for a while to see if there was an obvious way to compute the URL of any Debian package, but it looks like to retrieve a package URL you basically need to reimplement all of aptitude (the package manager used by Debian). Because there were no `brew` formulae for libapt, and no standalone Rust bindings either, I assumed any solution would be more complicated than just referencing the direct URL. As such, my build script fetches each package URL in sequence and extracts them into a local folder. You can see the dependencies my build script relies on; note that we only install the packages we need to build with:
`backtrace-rs` requires the libc library headers, and `openssl-sys` requires not only the headers in `libssl-dev` but also the shared library in `libssl1.1`. Other than that, these are all the packages I required when cross-compiling.
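The fetch-and-extract step can be sketched as follows (the mirror URL and version string are placeholders; a `.deb` is an `ar` archive whose `data.tar.*` member holds the files to install):

```shell
# download a package directly from a Debian mirror (URL is a placeholder)
curl -LO http://deb.debian.org/debian/pool/main/o/openssl/libssl-dev_1.0.1t-1+deb8u8_amd64.deb
# unpack its file tree into ./linux-root instead of /
mkdir -p linux-root
ar p libssl-dev_1.0.1t-1+deb8u8_amd64.deb data.tar.xz | tar -xJ -C linux-root
```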
Building Linux binaries on macOS
We again need to install a linker, this time one that targets GNU/Linux. Again this is made easy with brew thanks to another community contribution:
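The tap name here is an assumption on my part; the formula along these lines provides the GNU/Linux cross-toolchain:

```shell
# install a x86_64-unknown-linux-gnu cross toolchain (tap name assumed)
brew install SergioBenitez/osxct/x86_64-unknown-linux-gnu
```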
Now the executable “x86_64-unknown-linux-gnu-gcc” is available on our PATH.
We next make a series of environment variable updates:
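A sketch, assuming the Debian packages were extracted into `./linux-root` and the toolchain above (`TARGET_CC`/`TARGET_CFLAGS` follow the cc crate’s conventions; the `OPENSSL_*` variables are read by openssl-sys):

```shell
# cross C compiler for build scripts
export TARGET_CC=x86_64-unknown-linux-gnu-gcc
# search the extracted Debian headers instead of the macOS system ones
export TARGET_CFLAGS="-isystem $(pwd)/linux-root/usr/include"
# tell openssl-sys where the extracted headers and shared libraries live
export OPENSSL_DIR="$(pwd)/linux-root/usr"
export OPENSSL_LIB_DIR="$(pwd)/linux-root/usr/lib/x86_64-linux-gnu"
```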
This specifies the linker headers and shared library locations, plus some OpenSSL-specific flags required by openssl-sys. Take note of `-isystem`, which changes where `gcc` looks for system headers. Because we are using only Debian packages, the OpenSSL-specific build flags refer to the same folders as our other system libraries.

Now we can run the Cargo build command to cross-compile for Linux:
The “standalone” feature is part of the project, and configures everything that can be built without relying on system libraries (like SQLite).
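Assembled, the build command is roughly (the “standalone” feature name comes from the project; the rest is the standard `--target` mechanism):

```shell
# cross-compile for GNU/Linux with the project's "standalone" feature
cargo build --release --target=x86_64-unknown-linux-gnu --features standalone
```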
Now, in my project’s `./target/x86_64-unknown-linux-gnu/debug/` folder, I can run `file` on the edit-server binary. It says it’s a shared object (in this case, an executable) and mentions we compiled it for GNU/Linux. Next, I created an example Dockerfile based on Debian that, when the binary is placed in the same directory, just launches it:
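A minimal sketch of such a Dockerfile (the base image tag is an assumption; the binary name comes from the text):

```dockerfile
# run the cross-compiled edit-server binary on a Debian base image
FROM debian:stretch
COPY edit-server /usr/local/bin/edit-server
CMD ["/usr/local/bin/edit-server"]
```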
I tried it out with `docker run` on my machine, and saw the server successfully boot up. This is Debian, running locally on my machine, successfully running the binary we compiled on my Mac. Since this is the same Dockerfile we send to the server, this means the server will be able to deploy it too!
There is one more step here: the binary now has to be sent to the server along with each deploy. Checking a large binary into Git just so Dokku could receive it via `git push` performs very poorly, and really is not what Git is built for. What worked for me: I switched to creating an archive of my Dockerfile’s directory and piping it into `ssh` running `dokku tar:in -` on the remote server. This Dokku command loads a tarball from stdin (in this case) and deploys it, making it possible to push new code to the server without needing to check anything into git each time.

Rust advantages in webdev
And the result: it is now much faster to update code running on a server. Compilation speed improved immensely between compiling remotely, where each compile felt as slow as a full rebuild, and compiling locally, where cargo’s incremental cache makes builds feel as fast as targeting your default OS. It’s fast enough that I can deploy new code to a remote test server when it’s too annoying to set up a local one. Yet Rust’s cross-compilation story can’t by itself eliminate the clumsy ritual of setting arbitrary environment variables to get compilation to succeed.
If `rustup target` is a blueprint for the future, I imagine an ecosystem of cross-compilation tools will inevitably spring up that makes bundling for other OSes straightforward and configurable. Even though Rust isn’t an interpreted language, if deploying code no longer means compiling code on your server, and local recompilation is fast, deployment in Rust feels much more like modern web development. Cross-compilation support is an undersold factor in Rust’s webdev story.
Go has good support for calling into assembly, and a lot of the fast cryptographic code in the stdlib is carefully optimized assembly, bringing speedups of over 20 times.
However, writing assembly code is hard, reviewing it is possibly harder, and cryptography is unforgiving. Wouldn't it be nice if we could write these hot functions in a higher level language?
This post is the story of a slightly-less-than-sane experiment to call Rust code from Go fast enough to replace assembly. No need to know Rust, or compiler internals, but knowing what a linker is would help.
Why Rust
I'll be upfront: I don't know Rust, and don't feel compelled to do my day-to-day programming in it. However, I know Rust is a very tweakable and optimizable language, while still more readable than assembly. (After all, everything is more readable than assembly!)
Go strives to find defaults that are good for its core use cases, and only accepts features that are fast enough to be enabled by default, in a constant and successful fight against knobs. I love it for that. But for what we are doing today we need a language that won't flinch when asked to generate stack-only functions with manually hinted away safety checks.
So if there's a language that we might be able to constrain enough to behave like assembly, and to optimize enough to be as useful as assembly, it might be Rust.
Finally, Rust is safe, actively developed, and not least, there's already a good ecosystem of high-performance Rust cryptography code to tap into.
Why not cgo
Go has a Foreign Function Interface, cgo. cgo allows Go programs to call C functions in the most natural way possible—which is unfortunately not very natural at all. (I know more than I'd like to about cgo, and I can tell you it's not fun.)
By using the C ABI as lingua franca of FFIs, we can call anything from anything: Rust can compile into a library exposing the C ABI, and cgo can use that. It's awkward, but it works.
We can even use reverse-cgo to build Go into a C library and call it from random languages, like I did with Python as a stunt. (It was a stunt folks, stop taking me seriously.)
But cgo does a lot of things to enable that bit of Go naturalness it provides: it will set up a whole stack for C to live in, and it makes defer calls to prepare for a panic in a Go callback.. this could be a whole post of its own.
As a result, the performance cost of each cgo call is way too high for the use case we are thinking about: small, hot functions.
Linking it together
So here's the idea: if we have Rust code that is as constrained as assembly, we should be able to use it just like assembly, and call straight into it. Maybe with a thin layer of glue.
We don't have to work at the IR level: since Go 1.3, the Go compiler converts both Go code and its high-level assembly into machine code before linking.
This is confirmed by the existence of “external linking”, where the system linker is used to put together a Go program. It's how cgo works, too: it compiles C with the C compiler, Go with the Go compiler, and links it all together with `clang` or `gcc`. We can even pass flags to the linker with `CGO_LDFLAGS`. Underneath all the safety features of cgo, surely we'll find a cross-language function call, after all.
It would be nice if we could figure out how to do this without patching the compiler, though. First, let's figure out how to link a Go program with a Rust archive.
I could not find a decent way to link against a foreign blob with `go build` (why should there be one?) except using `#cgo` directives. However, invoking cgo makes `.s` files go to the C compiler instead of the Go one, and my friends, we will need Go assembly.

Thankfully, go/build is nothing but a frontend! Go offers a set of low-level tools to compile and link programs; `go build` just collects files and invokes those tools. We can follow what it does by using the `-x` flag. I built a small Makefile by following a `go build -x -ldflags '-v -linkmode=external -extldflags=-v'` invocation of a cgo build. It compiles a simple main package composed of a Go file (`hello.go`) and a Go assembly file (`hello.s`).

Now, if we want to link in a Rust object, we first build it as a static library..
.. and then just tell the external linker to link it together.
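In commands, roughly (file names from the example; exact linker flag placement varies):

```shell
# compile the Rust source into a C-style static archive, libhello.a
rustc --crate-type staticlib -o libhello.a hello.rs
# ...then include it in the external link step, e.g. via
#   -ldflags '-linkmode=external -extldflags=libhello.a'
```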
Jumping into Rust
Alright, so we linked it, but the symbols are not going to do anything just by sitting next to each other. We need to somehow call the Rust function from our Go code.
We know how to call a Go function from Go. In assembly, the same call looks like `CALL hello(SB)`, where SB is a virtual register that all global symbols are relative to. If we want to call an assembly function from Go, we make the compiler aware of its existence, like a C header does, by writing `func hello()` without a function body.

I tried all combinations of the above to call an external (Rust) function, but they all complained that they couldn't find either the symbol name or the function body.
But cgo, which at the end of the day is just a giant code generator, somehow manages to eventually invoke that foreign function! How?
I stumbled upon the answer a couple days later.
That looks like an interesting pragma!
`//go:linkname` just creates a symbol alias in the local scope (which can be used to call private functions!), and I'm pretty sure the `byte` trick is only cleverness to have something to take the address of, but `//go:cgo_import_static`.. this imports an external symbol!

Armed with this new tool and the Makefile above, we have a chance to invoke this Rust function (`hello.rs`). (The no-mangle-pub-extern incantation is from this tutorial.)
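As a sketch, the no-mangle-pub-extern shape looks like this (the function body here is a stand-in; the attributes are what matter — an unmangled symbol with the C ABI):

```rust
// hello.rs, sketched: export an unmangled, C-ABI function for the linker.
#[no_mangle]
pub extern "C" fn hello(x: u64) -> u64 {
    x + 1
}

// main() is only here so this sketch runs standalone; the real crate is
// compiled with --crate-type staticlib and has no main.
fn main() {
    println!("{}", hello(41)); // prints 42
}
```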
We call it from this Go program (`hello.go`), with the help of this assembly snippet (`hello.s`). `CALL` was a bit too smart to work, but using a simple `JMP`..?
Well, it crashes when it tries to return. Also, that `$2048` value is the whole stack size Rust is allowed (if it's even putting the stack in the right place), and don't ask me what happens if Rust tries to touch a heap.. but hell, I'm surprised it works at all!

Calling conventions
Now, to make it return cleanly, and take some arguments, we need to look more closely at the Go and Rust calling conventions. A calling convention defines where arguments and return values sit across function calls.
The Go calling convention is described here and here. For Rust we'll look at the default for FFI, which is the standard C calling convention.
To keep going we're going to need a debugger. (LLDB supports Go, but breakpoints are somehow broken on macOS, so I had to play inside a privileged Docker container.)
![Zelda dangerous to go alone](/content/images/2017/08/zelda-2.png)
The Go calling convention
The Go calling convention is mostly undocumented, but we'll need to understand it to proceed, so here is what we can learn from a disassembly (amd64-specific). Let's look at a very simple function: `foo` has 256 (0x100) bytes of local frame, 16 bytes of arguments, 8 bytes of return value, and it returns its first argument.
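In Go source terms, such a function might look like this (a sketch matching the described shape; the frame comes from a local buffer, and `//go:noinline` keeps the call from being optimized away):

```go
package main

import "fmt"

// foo: 0x100 bytes of local frame, two 8-byte arguments,
// an 8-byte return value, returning its first argument.
//go:noinline
func foo(x, y uint64) uint64 {
	var frame [256]byte // 0x100 bytes of local frame
	frame[0] = byte(y)  // touch the buffer so it isn't elided
	_ = frame
	return x
}

func main() {
	fmt.Println(foo(42, 7)) // prints 42
}
```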
The caller does very little: it places the arguments on the stack in reverse order, at the bottom of its own frame (`rsp` to `16(rsp)`; remember that the stack grows down) and executes `CALL`. The `CALL` pushes the return pointer onto the stack and jumps. There's no caller cleanup, just a plain `RET`. Notice that `rsp` is fixed, and we have `movq`s, not `push`es.

The first 4 and last 2 instructions of the function check whether there is enough space for the stack, and if not, call `runtime.morestack`. They are probably skipped for `NOSPLIT` functions.

Then there's the `rsp` management, which subtracts 0x108, making space for the entire 0x100 bytes of frame in one go, plus the 8 bytes of frame pointer. So `rsp` points to the bottom (the end) of the function frame, and is callee-managed. Before returning, `rsp` is restored to where it was (just past the return pointer).

Then the frame pointer is effectively pushed to the stack just after the return pointer, and updated at `rbp`. So `rbp` is also callee-saved, and should be updated to point at where the caller's `rbp` is stored, to enable stack trace unrolling.

Finally, from the body itself we learn that return values go just above the arguments.
Virtual registers
The Go docs say that `SP` and `FP` are virtual registers, not just aliases of `rsp` and `rbp`.

Indeed, when accessing `SP` from Go assembly, the offsets are adjusted relative to the real `rsp` so that `SP` points to the top, not the bottom, of the frame. That's convenient because it means not having to change all offsets when changing the frame size, but it's just syntactic sugar. Naked access to the register (like `MOVQ SP, DX`) accesses `rsp` directly.

The `FP` virtual register is simply an adjusted offset over `rsp`, too. It points to the bottom of the caller frame, where the arguments are, and there's no direct access.

Note: Go maintains `rbp` and frame pointers to help debugging, but then uses a fixed `rsp` and omit-frame-pointer-style `rsp` offsets for the virtual `FP`. You can learn more about frame pointers and not using them from this Adam Langley blog post.

The C calling convention
'sysv64', the default C calling convention on x86-64, is quite different:
- The arguments are passed via registers: RDI, RSI, RDX, RCX, R8, and R9.
- The return value goes in RAX.
- Some registers are callee-saved: RBP, RBX, and R12–R15. (We care little about this, since in Go all registers are caller-saved.)
- The stack must be aligned to 16 bytes. (I think this is why `JMP` worked and `CALL` didn't: we failed to align the stack!)
Frame pointers work the same way (and are generated by `rustc` with `-g`).

Gluing them together
Building a simple trampoline between the two conventions won't be hard. We can also look at `asmcgocall` for inspiration, since it does approximately the same job, but for cgo. We need to remember that we want the Rust function to use the stack space of our assembly function, since Go ensured for us that it's present. To do that, we have to roll back `rsp` from the end of the stack.

CALL on macOS
`CALL` didn't quite work on macOS. For some reason, there the function call was replaced with an intermediate call to `_cgo_thread_start`, which is not that incredible considering we are using something called `cgo_import_static` and that `CALL` is virtual in Go assembly. We can bypass that “helper” by using the full `//go:linkname` incantation we found in the standard library to take a pointer to the function, and then calling the function pointer.

Is it fast?
The point of this whole exercise is to be able to call Rust instead of assembly for cryptographic operations (and to have fun). So a rustgo call will have to be almost as fast as an assembly call to be useful.
Benchmark time!
We'll compare incrementing a uint64 inline, with a `//go:noinline` function, with the rustgo call above, and with a cgo call to the exact same Rust function. Rust was compiled with `-g -O`, and the benchmarks were run on macOS on a 2.9GHz Intel Core i5.

rustgo is 11% slower than a Go function call, and almost 15 times faster than cgo!
The performance is even better when run on Linux without the function pointer workaround, with only a 2% overhead.
A real example
For a real-world demo, I picked the excellent curve25519-dalek library, and specifically the task of multiplying the curve basepoint by a scalar and returning its Edwards representation.
The Cargo benchmarks swing widely between executions because of CPU frequency scaling, but they suggest the operation will take 22.9µs ± 17%.
On the Go side, we'll expose a simple API.
On the Rust side, it's no different from building an interface for normal FFI.
I'll be honest, it took me forever to figure out enough Rust to make this work.
To build the `.a` we use `cargo build --release` with a `Cargo.toml` that defines the dependencies, enables frame pointers, and configures curve25519-dalek to use its most efficient math and no standard library.

Finally, we need to adjust the trampoline to take two arguments and return no value.
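The `Cargo.toml` described above might look roughly like this (version numbers and feature names are assumptions on my part):

```toml
[package]
name = "ed25519-dalek-rustgo"
version = "0.1.0"

[lib]
crate-type = ["staticlib"]

[dependencies]
# no_std build with the fastest available arithmetic (feature names assumed)
curve25519-dalek = { version = "0.12", default-features = false, features = ["nightly"] }

[profile.release]
debug = true  # keep debug info so frame pointers are generated
```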
The result is a transparent Go call with performance that closely resembles the pure Rust benchmark, and is almost 6% faster than cgo!
For comparison, similar functionality is provided by github.com/agl/ed25519/edwards25519, and that pure-Go library takes almost 3 times as long.
Packaging up
Now we know it actually works, and that's exciting! But to be usable it will have to be an importable package, not forced into `package main` by a weird build process.

This is where `//go:binary-only-package` comes in! That annotation allows us to tell the compiler to ignore the source of the package and only use the pre-built `.a` library file in `$GOPATH/pkg`.
If we can manage to build a `.a` file that works with Go's native linker (cmd/link, also referred to as the internal linker), we can redistribute that, and it will let our users import the package as if it were a native one, including cross-compiling (provided we include a `.a` for that platform)!

The Go side is easy, and pairs with the assembly and Rust we already have. We can even include docs for `go doc`'s benefit. The Makefile will have to change quite a bit, though: since we aren't building a binary anymore, we don't get to keep using `go tool link`.

A `.a` archive is just a pack of `.o` object files in an ancient format with a symbol table. If we could get the symbols from the Rust `libed25519_dalek_rustgo.a` library into the `edwards25519.a` archive that `go tool compile` made, we should be golden. `.a` archives are managed by the `ar` UNIX tool, or by its Go-internal counterpart, cmd/pack (as in `go tool pack`). The two formats are ever-so-subtly different, of course. We'll need to use the platform `ar` for `libed25519_dalek_rustgo.a` and the Go cmd/pack for `edwards25519.a`.
(For example, the platform `ar` on my macOS uses the BSD convention of calling files `#1/LEN` and then embedding the filename of length LEN at the beginning of the file, to exceed the 16-byte max file length. That was confusing.)

To bundle the two libraries I tried doing the simplest (read: hackish) thing: extract `libed25519_dalek_rustgo.a` into a temporary folder, and then pack the objects back into `edwards25519.a`.
.Imagine my surprise when it worked!
With the `.a` in place, it's just a matter of making a simple program using the package. And running `go build`!

Well, it almost worked. We cheated. The binary would not compile unless we linked it to `libresolv`. To be fair, the Rust compiler tried to tell us. (But who listens to everything the Rust compiler tells you anyway?)

Now, linking against system libraries would be a problem: it will never work with internal linking or cross-compilation.
But hold on a minute, libresolv?! Why does our `no_std`, "should be like assembly", stack-only Rust library want to resolve DNS names?

I really meant no_std
The problem is that the library is not actually `no_std`. Look at all that stuff in there! We want nothing to do with allocators!

So how do we actually make it `no_std`? This turned out to be an entire side-quest, but I'll give you a recap.

- If any dependency is not `no_std`, your `no_std` flag is nullified. One of the `curve25519-dalek` dependencies had this problem; `cargo update` fixed that.
- Actually making a `no_std` staticlib (that is, a library for external use, as opposed to for inclusion in a Rust program) is more like making a `no_std` executable, which is much harder, as it must be self-contained.
- The docs on how to make a `no_std` executable are sparse. I mostly used an old version of the Rust book, and eventually found this section in the lang_items chapter. This blog post was useful.
- For starters, you need to define "lang_items" functions to handle functionality that is normally in the stdlib, like `panic_fmt`.
- Then you are without the Rust equivalents of compiler-rt, so you have to import the crate compiler_builtins. (rust-lang/rust#43264)
- Then there's a problem with `rust_begin_unwind` being unexported, which, don't ask me why, is solved by marking `panic_fmt` as `no_mangle`, which the linter is not happy about. (rust-lang/rust#38281)
- Then you are without `memcpy`, but thankfully there's a native Rust reimplementation in the rlibc crate. Super useful learning that `nm -u` will tell you what symbols are missing from an object.
This all boils down to a bunch of arcane lines at the top of our `lib.rs`. And with that, `go build` works (!!!) on macOS.
Linux
On Linux nothing works.
External linking complains about `fmax` and other symbols missing, and it seems to be right.

A friend thankfully suggested making sure that I was using `--gc-sections` to strip dead code, which might reference things I don't actually need. And sure enough, this worked. (That's three layers of flag-passing right there.)

But umh, in the Makefile we aren't using a linker at all, so where do we put `--gc-sections`? The answer is to stop hacking `.a`s together and actually read the linker man page.

We can build a `.o` containing a given symbol and all the symbols it references with `ld -r --gc-sections -u $SYMBOL`. `-r` makes the object reusable for a later link, and `-u` marks a symbol as needed, or everything would end up garbage collected. `$SYMBOL` is `scalar_base_mult` in our case.

Why wasn't this a problem on macOS? It would have been if we linked manually, but the macOS compiler apparently does dead-symbol stripping by default.
This is also the part where we learn, painfully, that the macOS platform prepends a `_` to all symbol names, because reasons.

So here's the Makefile portion that will work with external linking out of the box.
The last missing piece is internal linking on Linux. In short, it was not linking the Rust code, even if the compilation seemed to succeed: the relocations were not happening, and the `CALL` instructions in our Rust function were left pointing at meaningless addresses.

At that point I felt like it had to be a silent linker bug, the final boss in implementing rustgo, and reached out to people much smarter than me. One of them was guiding me in debugging cmd/link (which was fascinating!) when Ian Lance Taylor, the author of cgo, helpfully pointed out that `//go:cgo_import_static` is not enough for internal linking, and that I also wanted `//go:cgo_import_dynamic`.

I still have no idea why leaving it out would result in that issue, but adding it finally made our rustgo package compile both with external and internal linking, on Linux and macOS, out of the box.
Redistributable
Now that we can build a `.a`, we can take the suggestion in the `//go:binary-only-package` spec, and build a tarball with `.a`s for `linux_amd64`/`darwin_amd64` and the package source, to untar into a GOPATH to install.

Once installed like that, the package will be usable just like a native one, cross-compilation included (as long as we packaged a `.a` for the target)!

The only thing we have to worry about is that if we build Rust with `-Ctarget-cpu=native`, it might not run on older CPUs. Thankfully, benchmarks (and the curve25519-dalek authors) tell us that the only real difference is between pre- and post-Haswell processors, so we only have to make a universal build and a Haswell one.

As the cherry on top, I made the Makefile obey GOOS/GOARCH, converting them as needed into Rust target triples, so if you have Rust set up for cross-compilation you can even cross-compile the `.a` itself.

Here's the result: github.com/FiloSottile/ed25519-dalek-rustgo/edwards25519. It's even on godoc.
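The GOOS/GOARCH-to-triple conversion is just a table lookup. A hedged sketch of what it might look like as shell (the real Makefile's mapping may differ, for example in choosing gnu vs musl on Linux):

```shell
# Map Go's GOOS/GOARCH pair to a Rust target triple (illustrative subset).
GOOS="${GOOS:-linux}"
GOARCH="${GOARCH:-amd64}"
case "$GOOS/$GOARCH" in
  linux/amd64)  RUST_TARGET=x86_64-unknown-linux-gnu ;;
  darwin/amd64) RUST_TARGET=x86_64-apple-darwin ;;
  *) echo "unsupported GOOS/GOARCH: $GOOS/$GOARCH" >&2; exit 1 ;;
esac
echo "$RUST_TARGET"
# The build would then run: cargo build --release --target "$RUST_TARGET"
```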
Turning it into a real thing
Well, this was fun.
But to be clear, rustgo is not a real thing that you should use in production. For example, I suspect I should be saving `g` before the jump, the stack size is completely arbitrary, and shrinking the trampoline frame like that will probably confuse the hell out of debuggers. Also, a panic in Rust might get weird.

To make it a real thing, I'd start by calling `morestack` manually from a `NOSPLIT` assembly function to ensure we have enough goroutine stack space (instead of rolling back `rsp`), with a size obtained maybe from static analysis of the Rust function (instead of, well, made up).

It could all be analyzed, generated, and built by some "rustgo" tool, instead of hardcoded in Makefiles and assembly files. cgo itself is little more than a code-generation tool, after all. It might make sense as a `go:generate` thing, but I know someone who wants to make it a cargo command. (Finally some Rust-vs-Go fighting!) Also, a Rust-side collection of FFI types like, say, `GoSlice` would be nice.

Or maybe a Go or Rust adult will come and tell us to stop before we get hurt.
In the meantime, you might want to follow me on Twitter.
EDIT: It was pointed out to me that if we simply named the Rust object file `libed25519_dalek_rustgo.syso`, we could skip all the `go tool` invocations and simply use `go build`, which automatically links `.syso` files found in the package. But what's the fun in that?

Thanks (in no particular order) to David, Ian, Henry, Isis, Manish, Zaki, Anna, George, Kaylyn, Bill, David, Jess, Tony and Daniel for making this possible. Don't blame them for the mistakes and horrors, those are mine.
Calling Rust from Go without cgo.
Is it fast? Yes. Should you do it? Probably not. Was it fun to hack? Extremely. https://t.co/kcvbnYcDl5 pic.twitter.com/2Vv0dMC3Ob
— Filippo Valsorda (@FiloSottile) 15 August 2017
P.S. Before anyone tries to compare this to cgo (which has many more safety features) or pure Go, it's not meant to replace either. It's meant to replace manually written assembly with something much safer and more readable, with comparable performance. Or better yet, it was meant to be a fun experiment.