Alex Pavluck is our resident AI expert. He has spent months diving into AI tools, creating prototypes, and evaluating what works and what doesn't when it comes to helping schools with their data. Read on for what he's found!
Generative AI tools like ChatGPT, Gemini, and Claude offer powerful capabilities, particularly for content generation and summarization. At Standard Ed, our team wanted to assess the feasibility of leveraging generative AI, potentially using a platform like Azure AI Foundry, to create a secure cloud service that lets users interact with all their data using natural language. Our underlying theory was straightforward: a natural language interface should remove key barriers to using information, such as finding it, reading complex documents, and synthesizing knowledge.
LLM Success: Interpretation and Text
For text documents, this approach proved highly effective: the models reliably generated and summarized content from our source material. We also validated a valuable use case by feeding in data visualizations and asking the LLM to summarize and interpret the key findings of the analysis. In these scenarios, the LLM excels at pattern recognition, turning complex visual data into clear, human-readable language.
LLM Roadblocks: Raw Datasets and Code Generation
However, raw datasets, such as data contained in CSV or Excel files, presented significant roadblocks. We attempted to use the LLM as a "data analyst," asking it to translate user prompts (like "Find the average attendance per school in 2023") into executable code such as SQL or Python. This process proved highly inconsistent, and the generated code often failed to execute correctly.
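To make the task concrete, here is a minimal sketch of what a prompt like "Find the average attendance per school in 2023" demands of the model. The schema, table names, and data below are hypothetical, invented purely for illustration; the point is that the generated SQL must get the join, the filter, and the aggregation all exactly right against a schema the model has never seen.

```python
import sqlite3

# Hypothetical, simplified schema for illustration only; a real
# district data model is far larger and more complex.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE schools (school_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE attendance (school_id INTEGER, year INTEGER, rate REAL);
    INSERT INTO schools VALUES (1, 'North High'), (2, 'South High');
    INSERT INTO attendance VALUES (1, 2023, 0.94), (1, 2023, 0.92),
                                  (2, 2023, 0.88), (2, 2022, 0.91);
""")

# The kind of SQL an LLM would need to generate for the prompt
# "Find the average attendance per school in 2023": a correct join,
# a correct year filter, and a correct aggregation.
query = """
    SELECT s.name, AVG(a.rate) AS avg_attendance
    FROM attendance a
    JOIN schools s ON s.school_id = a.school_id
    WHERE a.year = 2023
    GROUP BY s.name
    ORDER BY s.name;
"""
for name, avg_rate in conn.execute(query):
    print(f"{name}: {avg_rate:.2f}")
```

A hallucinated column name (say, `attendance_rate` instead of `rate`), a missed join condition, or a dropped `WHERE` clause each produces either an error or, worse, a plausible-looking wrong answer.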
The models frequently struggled to manage the complexity of the data schema, often hallucinating incorrect column names or failing to construct the necessary joins or aggregation logic. Another major limiting factor was the context window: the maximum amount of information (measured in tokens) the model can hold in its short-term memory during a request. Because a dataset's entire schema often requires thousands of tokens, the general-purpose model sometimes “chunked” the data and saw only a subset of the available information.
Conclusion: General vs. Specialized AI
Our findings highlight a critical distinction between general-purpose and specialized AI tools. Specialized tools like Julius and PowerDrill do perform these types of analysis successfully, but our inability to replicate those results with general-purpose LLMs leads us to believe the latter are currently best suited to workflows involving generating and summarizing text and interpreting existing data visualizations.
Today, reliably exploring raw datasets and returning accurate, actionable results using LLMs requires the proprietary, specialized execution environments developed by dedicated data-focused platforms. We expect this technology to evolve rapidly, so watch this space!
