Last week, OpenAI introduced a feature for ChatGPT called “memory,” which lets users store information that they explicitly ask the program to remember for future reference.
ChatGPT can also analyze text and images through its existing file-upload capabilities. Users drag and drop files, such as PDFs or JPEGs, into the chat window, optionally adding a prompt, and ChatGPT generates text output based on the content of the uploaded files.
This functionality is available to all subscribers of the $20-per-month “Plus” version. Plus also provides access to GPT-4, the latest model, which produces higher-quality output than GPT-3.5, and to DALL-E, OpenAI’s image-generation program.
File upload opens up useful applications such as summarization, outlining, and advanced search. It’s also convenient: all you need to do is drag and drop the file.
One of the most compelling aspects of file upload is its ability to handle long documents and isolate content by theme. This is a form of semantic search that goes beyond mere keyword matching.
For instance, imagine uploading a lengthy 4,500-word report on silicon carbide, a specialized semiconductor material widely used in electric vehicles like those produced by Tesla. You could then ask ChatGPT a question like, “In this report on silicon carbide, are there any references to non-automotive use cases?”
ChatGPT would then summarize the six use cases in the report that are unrelated to automotive applications. A traditional keyword search cannot do this, because the answer depends on what the passages mean rather than on which words they contain.
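To see why plain keyword matching falls short on a question like this, consider a minimal Python sketch. The three-sentence excerpt and the `keyword_search` helper are invented for illustration; they are not taken from the actual report, and they say nothing about how ChatGPT works internally.

```python
import re

# Hypothetical excerpt standing in for the uploaded report (illustrative only).
REPORT = [
    "Silicon carbide inverters extend the driving range of electric vehicles.",
    "Solar farms use silicon carbide switches to cut conversion losses.",
    "Rail operators are testing silicon carbide traction converters.",
]


def keyword_search(paragraphs, query):
    """Return the paragraphs that contain every term of the query (case-insensitive)."""
    terms = re.findall(r"[\w-]+", query.lower())
    return [p for p in paragraphs if all(t in p.lower() for t in terms)]


# The literal question matches nothing, because no paragraph contains
# the word "non-automotive", even though two of them describe such uses.
print(keyword_search(REPORT, "non-automotive use cases"))

# A keyword that does occur in the text is found as expected.
print(keyword_search(REPORT, "solar"))
```

The solar and rail sentences answer the question, but only a reader (or a model) that understands they describe non-automotive uses would surface them.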
Because it can parse lengthy documents and extract meaningful insights from them, many users are considering ChatGPT as their primary tool for initial document analysis and processing.
Summarization is especially useful for lengthy transcripts, such as interviews. Upload a 6,800-word transcript, and ChatGPT returns a concise summary of the most significant topics, which can serve as the basis for an outline of the interview.
Yet such summaries don’t replace the craft of editing and shaping a narrative. Good storytelling demands identifying themes, rephrasing them well, and, crucially, deciding what to omit. ChatGPT currently falls short in this respect, although more nuanced prompts can improve the results.
ChatGPT’s file analysis accepts picture files but does not yet support video. With images, the program identifies the contents and can provide descriptive commentary, which is useful for tasks such as captioning.
For instance, when presented with images of the New York City skyline, ChatGPT accurately identified landmarks such as the Empire State Building and offered insights into the architectural juxtaposition of old and modern styles.
ChatGPT also generates descriptive captions for other kinds of scenes, such as a bustling street in midtown Manhattan.
ChatGPT provided a fitting yet straightforward description of a photo featured in a ZDNET article from November, showcasing OpenAI executives Sam Altman and Mira Murati, without explicitly mentioning the individuals’ names.
Recent advances in AI, such as Google’s Gemini 1.5, show how quickly image and video analysis is evolving. For instance, Gemini 1.5 can pinpoint significant moments in extensive transcripts, like Neil Armstrong’s iconic “one small step” during the Apollo 11 mission, or identify timestamps in silent films featuring actors like Buster Keaton. ChatGPT’s file upload feature cannot yet match that precision.
Still, document analysis will likely be integrated with ChatGPT’s memory function eventually. Entering memories manually through prompts may not be as efficient as providing entire documents that contain the relevant references and background information. A year from now, the convergence of memory and analysis could represent a significant evolution in ChatGPT’s capabilities and usability.