Lossless Semantic Trees: The key to restoring creativity to software engineering

Jonathan Schneider
January 17, 2023
Be free!

Key Takeaways

Regardless of our role, we are always looking to accomplish more with the finite time and resources at our disposal. As a software engineer, I can roughly divide my work into creative work (feature development) and maintenance activities (migrations, chores).

When we first learn to write software, it is close to 100% a creative activity. Everything is new and interesting. As we mature in our careers, the split slides to maybe 70/30 creative to maintenance, and sometimes even worse.

Maintenance work, though, has long been frustratingly resistant to automation. I used to think St. Louis Cardinals baseball was the answer: a game could occupy only about half of my attention, and so could maintenance work. Together they were bearable.

Now, I have a better solution.

What is a Lossless Semantic Tree?

We are all familiar with the shortcomings of plain text, of course. What engineer hasn't written a long sequence of regexes to perform some menial task en masse and felt like the master of everything?
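As a small illustration (a Python sketch over a hypothetical Java snippet; the names `foo` and `greet` are made up for the example), a regex rename cannot tell a method declaration or call apart from the same characters inside a comment or a string literal, nor which class a call belongs to:

```python
import re

# A hypothetical Java fragment, treated as plain text.
java_src = """\
class Greeter {
    // call foo() to greet
    void foo() { System.out.println("foo() was called"); }
    void bar() { foo(); }
}
"""

# Naive text-level rename: every standalone "foo(" becomes "greet(".
renamed = re.sub(r"\bfoo\(", "greet(", java_src)
print(renamed)
# The declaration and the call are renamed as intended, but so are the
# comment and the string literal, which changes what the program prints.
```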

Abstract Syntax Tree manipulation feels more promising, but:

  • Typically ASTs don't preserve formatting
  • ASTs may or may not carry type information for the code they represent
  • They are heavy and not designed for round-trip serialization.
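The formatting loss is easy to see with Python's standard `ast` module, used here only as a convenient, runnable stand-in for ASTs in general (it is not what OpenRewrite uses). Parsing and regenerating a small function keeps its meaning but drops the comment and normalizes the spacing:

```python
import ast

source = """\
def area(w,   h):  # width times height
    return (w * h)
"""

tree = ast.parse(source)
regenerated = ast.unparse(tree)  # ast.unparse requires Python 3.9+
print(regenerated)

# The meaning survives the round trip: re-parsing yields an identical tree...
same_meaning = ast.dump(ast.parse(regenerated)) == ast.dump(tree)
# ...but the original text, including the comment, does not.
same_text = regenerated == source
print(same_meaning, same_text)
```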

A Concrete Syntax Tree (CST) is a further evolution geared toward IDE support: it preserves formatting, but it also optimizes for error tolerance, since the code we type in the IDE is often in a grammatically incomplete state.

The Lossless Semantic Tree has a different set of characteristics:

  • It preserves formatting. In fact, it infers local and global formatting preferences by observing the existing code.
  • It preserves all of the type information the compiler holds in its intermediate representation before producing its final output.
  • It is designed for round-trip serialization.
  • It is not optimized for tolerating grammatically incomplete code, but it is designed to tolerate incomplete type information.
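To make the round-trip and formatting-inference properties concrete, here is a deliberately tiny Python sketch. It is not OpenRewrite's data model; the `Token` type and the `parse`, `print_source`, and `infer_indent_unit` functions are hypothetical. Each token keeps the whitespace that precedes it as a prefix, so printing the tokens back reproduces the input exactly, and the indentation style is inferred by observing the existing lines rather than imposed:

```python
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Token:
    prefix: str  # whitespace preceding the token, kept verbatim
    text: str    # the token itself: a run of word characters or a single symbol


# Leading whitespace goes into the prefix; then one word or one other symbol.
_TOKEN = re.compile(r"(\s*)(\w+|\S)")


def parse(source: str) -> tuple[list[Token], str]:
    """Split source into prefix-carrying tokens, plus any trailing whitespace."""
    tokens, end = [], 0
    for m in _TOKEN.finditer(source):
        tokens.append(Token(m.group(1), m.group(2)))
        end = m.end()
    return tokens, source[end:]


def print_source(tokens: list[Token], trailing: str) -> str:
    """Print tokens back out; because nothing was discarded, this is the original text."""
    return "".join(t.prefix + t.text for t in tokens) + trailing


def infer_indent_unit(source: str) -> str:
    """Guess the indentation unit by observation: tabs vs. spaces, then the most common step."""
    lines = [ln for ln in source.splitlines() if ln.strip()]
    if sum(ln.startswith("\t") for ln in lines) > sum(ln.startswith(" ") for ln in lines):
        return "\t"
    steps, previous = Counter(), 0
    for ln in lines:
        indent = len(ln) - len(ln.lstrip(" "))
        if indent > previous:
            steps[indent - previous] += 1
        previous = indent
    # Assumption: fall back to 4 spaces when there is nothing indented to observe.
    return " " * (steps.most_common(1)[0][0] if steps else 4)


java_src = "class A {\n  void m() {\n    int x  =  1;\n  }\n}\n"
tokens, trailing = parse(java_src)
assert print_source(tokens, trailing) == java_src  # lossless round trip
print(repr(infer_indent_unit(java_src)))
```

A real implementation would also keep comments in the prefix and attach type information to nodes; this sketch only shows why keeping every character makes round-tripping trivial.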

The Lossless Semantic Tree is the data model upon which we can build recipes for code transformations. Recipes encapsulate our desired outcomes at progressively larger scales, from simple method renames all the way to major framework migrations. OpenRewrite, an Apache-licensed implementation of the Lossless Semantic Tree together with a set of recipes, is paving the way for engineers to focus on creative rather than mundane tasks!
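Here is a toy version of the simplest recipe mentioned above, a method rename, again as a hypothetical Python sketch rather than OpenRewrite's actual recipe API. The tokens are built by hand, and `owner_type` stands in for the type attribution a compiler would supply (None where no type applies or none could be resolved). Because the recipe consults types, it renames only `foo` calls on the made-up `com.example.Greeter` type, and because every token keeps its prefix, the surrounding formatting is untouched:

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class Token:
    prefix: str                       # whitespace preceding the token, preserved verbatim
    text: str
    owner_type: Optional[str] = None  # declaring type of a method name, if resolved


def print_tokens(tokens: List[Token]) -> str:
    return "".join(t.prefix + t.text for t in tokens)


def rename_method(tokens: List[Token], owner_type: str, old: str, new: str) -> List[Token]:
    """Toy 'recipe': rename a method only where type attribution says it belongs to owner_type."""
    return [
        replace(t, text=new) if t.text == old and t.owner_type == owner_type else t
        for t in tokens
    ]


# Hand-built tokens standing in for:  greeter.foo();\n    other.foo( );
tokens = [
    Token("", "greeter"), Token("", "."), Token("", "foo", "com.example.Greeter"),
    Token("", "("), Token("", ")"), Token("", ";"),
    Token("\n    ", "other"), Token("", "."), Token("", "foo", "com.example.Other"),
    Token("", "("), Token(" ", ")"), Token("", ";"),
]

after = rename_method(tokens, "com.example.Greeter", "foo", "greet")
print(print_tokens(after))
```

The recipe returns a new list rather than mutating its input, so the before and after versions can be compared or diffed; that is a design choice of this sketch.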

The Moderne platform takes the Lossless Semantic Tree to another level, enabling organizations to have the full fidelity of their enterprise codebases in hand. They can then enact repeatable, automation-driven refactoring and remediation at scale with minimal developer disruption.

Contact us to learn how you can use this unique technology with your codebase.