S01 E07
21 Jun 2021/01:06:13

What's a bug? What's a debugger?


Oxide and Friends Twitter Space: June 21, 2021

We’ve been holding a Twitter Space weekly on Mondays at 5p Pacific Time for about an hour. Even though recording is not (yet?) a feature of Twitter Spaces, we have been recording them all; here is the recording for our Twitter Space for June 21, 2021.
In addition to Bryan Cantrill and Adam Leventhal, speakers on June 21st included Dan Cross, Sean Klein, Aram Hăvărneanu, and the mononymous Nate. (Did we miss your name and/or get it wrong? Drop a PR!)
Some of the topics we hit on, in the order that we hit them:
  • Adam’s toddler (being chased by a rooster)
    > Don’t get me wrong, some of my best friends are three-year-olds.
  • [@3:12](https://youtu.be/UOucW3F7nCg?t=192) Sy Brand’s tutorial Writing a Debugger
  • [@4:34](https://youtu.be/UOucW3F7nCg?t=274) Bryan’s debuggers
    • MDB (Modular Debugger)
      > Adam: I think people are using cargo-cult debugging, rather than getting to the root cause of these things, or being satisfied until they get to the root cause.
      > Bryan: I think with software systems, it’s really hard to know what they’re actually doing.
    • Procedure Linkage Table aka “the plits”
    • “Runtime Performance Analysis of the M-to-N Scheduling Model” (pdf) 1996 undergrad thesis (Brown CS dept website)
    • [@6:29](https://youtu.be/UOucW3F7nCg?t=389) Threadmon website and 1997 paper (a retooling of the ’96 paper)
      > When I built that tooling, it revealed this thing is not doing at all what anyone thought it was doing.
    • TNF (Trace Normal Form)
      > Part of the problem with debuggers… debuggers are historically written by compiler folks, and not system folks. As a result, debuggers are designed to debug the problem that compiler folks have the most familiarity with, and that’s a compiler.
      > Debuggers are designed for reproducible problems, way too frequently.
  • I view in situ breakpoint debugging as one sliver of debugging that’s useful for one particular and somewhat unusual class of bugs. That’s actually not the kind of debugger I want to use most of the time.
  • Software breakpoints
  • [@11:59](https://youtu.be/UOucW3F7nCg?t=719)
    > libdis was my intern project in 2000. The idea was to take the program text, and interpret it in some structural form, and try to infer different things about the program.
  • [@14:59](https://youtu.be/UOucW3F7nCg?t=899) I meant this question earnestly: what is a debugger?
    • The first bug
      > The term is somewhat regrettable… It implies a problem, when there may not be a problem. It may just be I want to understand how the system is operating, independent of whether it’s doing it badly.
  • Wikipedia on Observability (control theory)
  • Oxide’s embedded OS and companion debugger: Hubris and Humility
  • [@19:01](https://youtu.be/UOucW3F7nCg?t=1141) Using DTrace to help customers understand their systems.
    > If you strings the DTrace binary, you’re not gonna find any mention of raincoats.
  • [@22:13](https://youtu.be/UOucW3F7nCg?t=1333) Cardinal rule of debuggers: Don’t kill the patient! (see also: Do No Harm)
    > Not killing the patient is really important, this was always an Ur principle for us.
  • The notion that the debugger has now become load-bearing in the execution of the program is a pretty grave responsibility.
  • [@26:54](https://youtu.be/UOucW3F7nCg?t=1614) Post-mortem debugging
    > It is a tragedy of our domain that we do not debug post-mortem, routinely.
  • Heisenbug (when the act of observing the problem hides the problem)
  • [@31:11](https://youtu.be/UOucW3F7nCg?t=1871)
    > What’s going on in the system? It’s not crashing, there’s no core dump. But the system is behaving in a way I didn’t expect it to, and I want to know why.
  • [@33:51](https://youtu.be/UOucW3F7nCg?t=2031) Pre-production reliability techniques
    > All of our pre-production work has gotten way better than it was, and I think that’s compensation for the fact we can’t understand these systems when we deploy them.
  • [@37:58](https://youtu.be/UOucW3F7nCg?t=2278)
    > The move to testing has in fact obviated some of the need for what we consider traditional debuggers.
    > (Bryan audibly cringes)
  • [@39:08](https://youtu.be/UOucW3F7nCg?t=2348) Automated and Algorithmic Debugging conference AADEBUG 2003
    • HOPL (History of Programming Languages)
      > There was a test suite of excellence when it comes to automated program debugging. And it was some pile of C programs with known bugs, and you would throw your new paper at it, and it would find 84% of the bugs, and there would be a lot of slapping each other on the back on that. Really focused on the simplest of simple bugs.
    • [@43:15](https://youtu.be/UOucW3F7nCg?t=2595) Bryan’s Postmortem Object Type Identification paper
      > Who is my neighbor in memory? Because my neighbor just burned down my house basically.
    • mdb’s ::kgrep
      > I need to pause you there because it’s so crazy, and I want to emphasize that he means what he’s saying. We look for the 64-bit value, and see where we find it. This is a game of bingo across the entire address space.
  • We can follow the pointers and propagate types.
  • [@48:49](https://youtu.be/UOucW3F7nCg?t=2929) printf/println debugging – everyone’s doing it
    > I think it’s a mistake for people to denigrate printf debugging. If you’ve got a situation that you can debug quickly with printf, you should do that.
    • Early, sometimes student-friendly IDEs
      > These poor students are weeping in the Sun lab at two in the morning because they can’t debug their programs, because they’re not allowed to use printf.
  • [@54:14](https://youtu.be/UOucW3F7nCg?t=3254) Research on statistical debugging from Ben Liblit
  • [@57:32](https://youtu.be/UOucW3F7nCg?t=3452)
    > The disposition towards tooling changes once you’ve found your first bug with it.
  • “I’m dealing with a house fire right now, it’s not time for me to learn something new, my house is burning and I want to focus on that.”
  • NOVA hypervisor debugging by inspecting registers
    > There’s nothing quite like driving one of these unknown issues to the root cause; so satisfying.
  • [@1:02:10](https://youtu.be/UOucW3F7nCg?t=3730)
    > I buy the argument that some of the lack of observability has been one of the strong motivators for rooting out some of these problems earlier with CI/CD and test-driven development.
  • [@1:03:04](https://youtu.be/UOucW3F7nCg?t=3784)
    > Dynamically instrumenting dynamic languages effectively requires VM cooperation.
  • Adam’s ten-year prediction: the end of Moore’s Law will precipitate a culture of observability and debugging.
If we got something wrong or missed something, please file a PR! Our next Twitter Space will likely be on Monday at 5p Pacific Time; stay tuned to our Twitter feeds for details. We’d love to have you join us, as we always love to hear from new speakers!