Data Notes From A Recent Solution

SqlInSix Tech Blog
2 min read · Jul 16, 2024

Some thoughts from a recent data solution:

  • If you can't test data accuracy during development (i.e., the baseline data source is not available), development time will lengthen because the solution stays theoretical until it can be checked against real data. Test-driven development saves time.
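  • Know when to use ZORDER BY in Databricks, as it can be extremely useful. It is a clause of OPTIMIZE on Delta tables that colocates related values in the same files, so queries that filter on those columns can skip more data.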
  • Label data and infrastructure appropriately as early as possible. Avoid overly complex terms, or terms that may carry other meanings, in labels. For example, if you have five notebooks for the same data set, referring to them as if they're the same will confuse everyone. Give each one a label that states its role: Ingestion notebook, Transformation notebook, Historical notebook, Testing notebook, Reporting notebook.
  • It only takes one minor detail for an AI solution to come with a huge cost. Related to this: people don't like to admit they used AI. Unfortunately, we found this out late in the solution; had we known earlier, it might have helped. This is a new business challenge that I expect to see more of.
  • The more general an error is, the more troubleshooting it requires. Make sure errors are detailed: if you're tying out an identity in a data warehouse and a missing value is causing a foreign key issue, throw an error that says exactly that. The reason we can solve this quickly in a database is the specificity of the error.
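
The point about detailed errors can be sketched in a few lines of Python. This is a minimal illustration, not the check used in the solution: the function name, the exception class, and the table and column names (fact_orders, dim_customer, customer_id) are all hypothetical. It shows an error that names the table, the column, and the offending values rather than only reporting that a check failed.

```python
class OrphanedKeyError(ValueError):
    """Raised when child rows reference keys that the parent table lacks."""


def check_foreign_keys(child_rows, parent_keys, key_column, child_table, parent_table):
    """Raise a specific error listing every key that breaks the relationship.

    child_rows is a list of dicts, parent_keys is any iterable of key values.
    All names here are hypothetical and exist only for illustration.
    """
    parent = set(parent_keys)
    # Positions of child rows whose key is missing (None or absent) altogether.
    null_rows = [i for i, row in enumerate(child_rows) if row.get(key_column) is None]
    # Distinct non-null key values that have no match in the parent.
    orphans = sorted(
        {
            row[key_column]
            for row in child_rows
            if row.get(key_column) is not None and row[key_column] not in parent
        }
    )
    if null_rows or orphans:
        raise OrphanedKeyError(
            f"{child_table}.{key_column} fails the foreign key check against "
            f"{parent_table}: {len(orphans)} value(s) missing from parent {orphans}; "
            f"{len(null_rows)} row(s) with a null key at positions {null_rows}"
        )


# Example data: order 2 points at a customer that does not exist.
orders = [{"order_id": 1, "customer_id": 10}, {"order_id": 2, "customer_id": 99}]
customers = [10, 11]

message = ""
try:
    check_foreign_keys(orders, customers, "customer_id", "fact_orders", "dim_customer")
except OrphanedKeyError as exc:
    message = str(exc)
print(message)
```

The message tells whoever is troubleshooting which table, which column, and which values to look at, which is the same specificity a database foreign key violation gives you.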

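For the ZORDER BY note above, here is a small sketch of how the statement might be assembled before running it in a notebook. The helper name, the table sales.events, and the columns customer_id and event_date are assumptions made for illustration; the only Databricks syntax relied on is OPTIMIZE ... ZORDER BY (...) on a Delta table. Columns that are frequently used in filters are the usual candidates.

```python
import re

# Plain or dot-qualified identifiers only, so the statement cannot be
# altered by unexpected input. This pattern is an assumption of the sketch.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)*$")


def build_zorder_statement(table, columns):
    """Return an OPTIMIZE ... ZORDER BY statement for a Delta table."""
    if not columns:
        raise ValueError("ZORDER BY needs at least one column")
    for name in [table, *columns]:
        if not _IDENTIFIER.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    return f"OPTIMIZE {table} ZORDER BY ({', '.join(columns)})"


# Hypothetical table and columns.
statement = build_zorder_statement("sales.events", ["customer_id", "event_date"])
print(statement)
# In a Databricks notebook the statement would then be run with:
#   spark.sql(statement)
```

Keeping the statement in one helper also makes it easy to label which tables are Z-ordered and on which columns, in line with the labeling point above.
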
Written by SqlInSix Tech Blog

I speak and write about research and data. Given my increased speaking frequency, I write three articles per year here.