
Metadata

Highlights

  • I transform data every day, and I usually do two kinds of transformation: changing the data format so I can use it in a tool (CSV to Parquet), or changing the shape, like running an aggregation so I can understand it. I’m using LLMs more and more for this because it saves me a lot of time (and it’s cool).
  • If you don’t know what the llm command is, please go check out the fantastic llm CLI tool from Simon Willison. The second one has many benefits:
      • The code will run way faster; LLMs are still slow compared to regular CPUs
      • The transformation can be audited and fixed
  • Let’s test it. I have a file with NMEA records from a GPS. NMEA, according to Wikipedia, “is a combined electrical and data specification for communication between marine electronics such as echo sounder, sonars, anemometer, gyrocompass, autopilot, GPS receivers and many other types of instruments”. If NMEA were invented today it would have been NDJSON, but at that time machines were sending data over a 9600 baud comm line, so they needed to optimize. Parsing is also super easy (they probably couldn’t afford to spend a lot of code on parsing), but let’s get back to the transformation thing.
  • I have some data I got from my car’s GPS (which still sends the info using NMEA these days) in a file. I grep the GPRMC sentences (the ones that have the coordinates) and pipe them into the llm command (using gemini-2.0 code execution). This would be the command (I shortened it for clarity)
  • It sounds like it did the right transformation (indeed, checking the data, it’s accurate). In case you are checking the data carefully: the speed attribute looks too high, but it’s a car on a race track, so it’s expected.
  • But how can we make sure it’s doing it right? I wouldn’t trust the transformed data right away, but I can use what we have been using in software development for years: tests. So let’s ask the LLM to generate not just the transform, but also a test with the backwards transformation:
  • It fails to run because of the pynmea2 dependency, but if you run it locally it manages to do it. Running that self-test gives me some confidence in the transformation function, and I’d trust it enough to put it in a pull request.
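To make the "transform plus backwards test" idea concrete, here is a minimal sketch of what such a pair could look like for GPRMC sentences. It is not the code the LLM generated in the post (that one depended on pynmea2); it uses only the standard library, and every name in it (`rmc_to_record`, `record_to_rmc`, `nmea_checksum`, the record keys) is hypothetical. It assumes minutes carry three decimals, speed and course carry one, and the magnetic variation fields are empty, so the round trip can be exact.

```python
import json
from functools import reduce


def nmea_checksum(body: str) -> str:
    """XOR of every character between '$' and '*', as two uppercase hex digits."""
    return f"{reduce(lambda acc, ch: acc ^ ord(ch), body, 0):02X}"


def _to_decimal(value: str, hemisphere: str, deg_digits: int) -> float:
    """Convert NMEA (d)ddmm.mmm plus hemisphere into signed decimal degrees."""
    degrees = int(value[:deg_digits])
    minutes = float(value[deg_digits:])
    sign = -1 if hemisphere in ("S", "W") else 1
    return sign * (degrees + minutes / 60)


def _to_nmea(decimal: float, deg_digits: int) -> str:
    """Convert decimal degrees back to (d)ddmm.mmm (assumes 3 decimals of minutes)."""
    thousandths = round(abs(decimal) * 60 * 1000)  # total minutes, in 1/1000ths
    degrees, rem = divmod(thousandths, 60_000)
    return f"{degrees:0{deg_digits}d}{rem / 1000:06.3f}"


def rmc_to_record(sentence: str) -> dict:
    """Forward transform: one $GPRMC sentence -> a JSON-friendly dict."""
    body, _, checksum = sentence.strip().lstrip("$").partition("*")
    if checksum.upper() != nmea_checksum(body):
        raise ValueError(f"bad checksum in {sentence!r}")
    f = body.split(",")
    return {
        "time": f[1],            # hhmmss, kept as a string
        "status": f[2],          # 'A' = valid fix, 'V' = warning
        "lat": _to_decimal(f[3], f[4], 2),
        "lon": _to_decimal(f[5], f[6], 3),
        "speed_knots": float(f[7]),
        "course_deg": float(f[8]),
        "date": f[9],            # ddmmyy, kept as a string
    }


def record_to_rmc(rec: dict) -> str:
    """Backwards transform: dict -> $GPRMC sentence (magnetic variation left empty)."""
    body = ",".join([
        "GPRMC", rec["time"], rec["status"],
        _to_nmea(rec["lat"], 2), "N" if rec["lat"] >= 0 else "S",
        _to_nmea(rec["lon"], 3), "E" if rec["lon"] >= 0 else "W",
        f"{rec['speed_knots']:05.1f}", f"{rec['course_deg']:05.1f}",
        rec["date"], "", "",
    ])
    return f"${body}*{nmea_checksum(body)}"


# Illustrative sample; the checksum is computed rather than hard-coded.
SAMPLE_BODY = "GPRMC,123519,A,4807.038,N,01131.000,E,022.4,084.4,230394,,"
SAMPLE = f"${SAMPLE_BODY}*{nmea_checksum(SAMPLE_BODY)}"

# Self-test: NMEA -> NDJSON line -> NMEA must reproduce the input exactly.
ndjson_line = json.dumps(rmc_to_record(SAMPLE))
assert record_to_rmc(json.loads(ndjson_line)) == SAMPLE
```

The round-trip assertion is the part that buys the confidence: if the forward transform mangles a field, the backwards transform will not be able to rebuild the original sentence. It only proves the two functions are consistent with each other, so spot-checking a few values against the raw data is still worthwhile.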