This exploratory project examines how token representations evolve through transformer layers by comparing the full attention-block outputs (including residual connections) against the original vocabulary embeddings. By analyzing both content words and function words, it reveals intriguing patterns in how different types of tokens are processed.
Key findings show that function words tend to maintain closer relationships to their original embeddings, while content words drift more significantly through the encoding space. This suggests the attention mechanism may process these token types differently, with particularly dramatic shifts occurring in layers 5 and 11.
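The core measurement can be sketched as a cosine similarity between each layer's hidden state for a token and that token's original vocabulary embedding. This is a minimal numpy sketch, not the project's actual code; the function and argument names are illustrative:

```python
import numpy as np

def drift_from_embedding(hidden_states, input_embedding):
    """Cosine similarity between each layer's representation of a token
    and its original vocabulary embedding.
    hidden_states: (num_layers, d_model); input_embedding: (d_model,).
    Returns one similarity per layer; 1.0 means no drift at all."""
    h = hidden_states / np.linalg.norm(hidden_states, axis=-1, keepdims=True)
    e = input_embedding / np.linalg.norm(input_embedding)
    return h @ e
```

A sharp drop in this curve at a given layer (e.g., layers 5 and 11 in the findings above) indicates the token's representation moving away from its lookup-table starting point.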
The project raises important questions about transformer architecture efficiency and whether computational resources could be better allocated based on token type and position in the network.
An exploration into using sequence-based models for audio data, inspired by NLP techniques. The goal was to create an audio token vocabulary by running K-means clustering on STFT spectrogram slices, allowing text-style sequence models to classify audio. Despite extensive experimentation, the models struggled with the complexity of the AudioSet dataset, achieving a peak mAP of 0.05.
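The tokenization step can be sketched as plain K-means over STFT time slices, where each resulting cluster index becomes one "audio token." This is a self-contained numpy sketch under assumed shapes (each slice is a vector of magnitudes), not the project's implementation:

```python
import numpy as np

def build_audio_vocab(stft_slices, vocab_size, iters=20, seed=0):
    """Cluster STFT time slices into a discrete 'audio token' vocabulary
    via Lloyd's K-means. stft_slices: (num_slices, num_freq_bins).
    Returns (centroids, token_ids); token_ids is the audio-token sequence."""
    rng = np.random.default_rng(seed)
    # initialize centroids from randomly chosen slices
    centroids = stft_slices[rng.choice(len(stft_slices), vocab_size, replace=False)].copy()
    for _ in range(iters):
        # assign each slice to its nearest centroid
        d = np.linalg.norm(stft_slices[:, None, :] - centroids[None], axis=-1)
        tokens = d.argmin(axis=1)
        # move each centroid to the mean of its members (skip empty clusters)
        for k in range(vocab_size):
            members = stft_slices[tokens == k]
            if len(members):
                centroids[k] = members.mean(axis=0)
    # final assignment against the converged centroids
    d = np.linalg.norm(stft_slices[:, None, :] - centroids[None], axis=-1)
    return centroids, d.argmin(axis=1)
```

Feeding the resulting token sequence to a text-style sequence model then only requires an embedding table of size `vocab_size`.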
Although the results were not conclusive, I believe this project demonstrates a novel approach to audio processing and identified several promising directions for future work, such as improving class balancing and experimenting with more complex models.
This project explores transformer interpretability in audio classification using the UrbanSound8K dataset and a fine-tuned Audio Spectrogram Transformer (AST). The aim was to identify the "most important" time and frequency slices in an audio spectrogram for a model's classification decision by finding which slice, when removed, caused the largest drop in the predicted probability of the target class.
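The slice-removal idea is a form of occlusion attribution, and can be sketched independently of any particular model. In this minimal sketch, `predict_proba` is a hypothetical stand-in for the fine-tuned AST's forward pass:

```python
import numpy as np

def slice_importance(spec, predict_proba, target, axis=1):
    """Occlusion attribution: zero out each slice along `axis`
    (1 = time slices, 0 = frequency bands) and record the drop in the
    target class probability. The slice with the largest drop is the
    'most important' one for the classification decision."""
    base = predict_proba(spec)[target]
    drops = np.empty(spec.shape[axis])
    for i in range(spec.shape[axis]):
        occluded = spec.copy()
        if axis == 1:
            occluded[:, i] = 0.0   # silence one time slice
        else:
            occluded[i, :] = 0.0   # silence one frequency band
        drops[i] = base - predict_proba(occluded)[target]
    return drops
```

The same occlusion pass could double as a data-augmentation heuristic, dropping low-importance slices to generate plausible training variants.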
While primarily curiosity-driven, the project offers interesting insights into the influence of different parts of an audio signal on classification outcomes and suggests potential for using the approach in data augmentation for smaller datasets.
This project ropes in a language model to play interactive fiction games alongside the user, acting as an enthusiastic in-game partner. The user can choose to play the game themselves and seek advice from the LLM, or allow the LLM to take over entirely.
The novel part of this project is the management of a three-way conversation between the player, the LLM, and the Z-Machine, creating a dynamic gameplay experience. While this is a proof-of-concept, it's an interesting mix of storytelling, interactivity, and AI.
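The three-way routing can be sketched as a single turn handler that decides whether the player's input is a game command (forwarded to the Z-Machine, with the result shared with the LLM so it can comment) or a question for the LLM. All names here are illustrative, not the project's actual API; a `?` prefix marking advice requests is an assumption of this sketch:

```python
def route_turn(player_input, send_to_zmachine, ask_llm, transcript):
    """One turn of the player / LLM / Z-Machine loop (hypothetical sketch).
    Inputs prefixed with '?' go to the LLM as advice requests; anything
    else is a Z-Machine command whose output the LLM also sees, so it
    can react in character."""
    if player_input.startswith("?"):
        reply = ask_llm(transcript, player_input[1:].strip())
        transcript.append(("llm", reply))
        return reply
    game_output = send_to_zmachine(player_input)
    transcript.append(("game", game_output))
    commentary = ask_llm(transcript, "React briefly to what just happened.")
    transcript.append(("llm", commentary))
    return game_output
```

Keeping one shared transcript is what lets the LLM comment on game output it did not generate, which is the crux of making the conversation feel three-way rather than two parallel chats.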
If you'd like to get in touch, feel free to drop me a line: demo@danavery.com