I worked on a Chrome extension a few weeks ago that skips sponsorship sections in YouTube videos by reading through the transcript. I was also experimenting with an LLM to explain a function call chain across languages (in this case Makefile, Python, and Bash). And I've tried running a few Telegram bots that are pre-prompted to do certain things, like help you with taxes.
What are you building?
What does the stack look like? How do you deploy it?
I ended up using it for more general purpose things because being able to have a hands-free phone call with an AI turned out to be pretty useful.
It's offline now, but here's the code with all the stack and deployment info: https://github.com/kevingduck/ChatGPT-phone/
Edit: forgot to mention this was all running off a $35 Raspberry Pi.
For the dictated recipes, I told him to dictate just the words and numbers, "flat", so that I had plain paragraphs of recipes.
For the scanned recipes, I used Google OCR (I found it was the best one quality-wise).
For both sets of recipes, I then used GPT-4 to "format" the unformatted recipes into well-formatted Markdown. It successfully fixed typos and bad OCR from Google.
We then pasted all that well-formatted text into a big Google Doc and added images. Using OpenAI image generation I generated images for each of the 250+ recipes. For some of them I had to curate the prompts manually, given that some of the recipes are typical Mexican food: for example there's a (delicious) recipe called "PibiPollo" that, to the uninitiated, may look like a stew, so I had to say something like "large corn tamale with a thick, hard crust".
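For anyone curious what that GPT-4 formatting pass can look like, here is a minimal sketch; the prompt wording and model name are assumptions for illustration, not the exact ones used:

```python
# Minimal sketch: turn raw dictated/OCR'd recipe text into clean Markdown.
# Prompt wording and model name are assumptions, not the author's actual code.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def format_recipe(raw_text: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": (
                "You reformat raw recipe text into well-formatted Markdown with a "
                "title, an ingredients list, and numbered steps. Fix obvious typos "
                "and OCR errors, but do not change quantities or ingredients."
            )},
            {"role": "user", "content": raw_text},
        ],
    )
    return response.choices[0].message.content

print(format_recipe("pibipollo 1 kg corn dough half kg chicken achiote ... bake 1 hr"))
```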
In the end, the book was pretty nice! We distributed digital copies within the family and everybody was amazed :) . I loved spending time doing that.
The backend is a Python FastAPI app that uses ChromaDB to store my resume and Q&A pairs, OpenAI, and Airtable to log requests and responses. The UI is SvelteKit.
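Roughly, a backend like that can be sketched as a single retrieval-augmented endpoint; the collection name, prompt, and model below are illustrative guesses, not the actual code:

```python
# Rough sketch of a FastAPI endpoint that retrieves resume/Q&A chunks from
# ChromaDB and asks OpenAI to answer with them. Names and prompt are illustrative.
import chromadb
from fastapi import FastAPI
from openai import OpenAI
from pydantic import BaseModel

app = FastAPI()
chroma = chromadb.PersistentClient(path="./chroma")
collection = chroma.get_or_create_collection("resume")
llm = OpenAI()

class Question(BaseModel):
    text: str

@app.post("/ask")
def ask(question: Question) -> dict:
    # Pull the most relevant resume/Q&A chunks for the question.
    hits = collection.query(query_texts=[question.text], n_results=4)
    context = "\n\n".join(hits["documents"][0])
    response = llm.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "Answer questions about this candidate using only the context provided."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question.text}"},
        ],
    )
    return {"answer": response.choices[0].message.content}
```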
I'm currently building a different tool and will apply some learnings to my Interactive Resume AI. Instead of Airtable, I am going to use LangSmith for observability.
I've started writing, and my Substack articles are also linked from my website. I'm currently working on applying sentence window retrieval, and that article will be out shortly. This is part of a #buildinpublic effort to help build my brand as well.
I've been unemployed since September (I was a Senior Software Engineer). The market is tough, so I'm focusing on the above to help land a job or a contract.
* I built a real life Pokedex to recognize Pokemon [video] https://www.youtube.com/watch?v=wVcerPofkE0
* I used ChatGPT to filter nice comments and print them in my office [video] https://www.youtube.com/watch?v=AonMzGUN9gQ
* I built a general purpose chat assistant into an old intercom [video] https://www.youtube.com/watch?v=-zDdpeTdv84
Again, nothing terribly useful, but all fun.
1. sketch (in notebook, ai for pandas) https://github.com/approximatelabs/sketch
2. datadm (open source, "chat with data", with support for open-source LLMs): https://github.com/approximatelabs/datadm
3. Our main product: julyp. https://julyp.com/ (currently under very active rebrand and cleanup) -- a "chat with data" style app with a lot of specialized features. I'm also streaming myself using it (and sometimes building it) every weekday on Twitch to solve misc data problems (https://www.twitch.tv/bluecoconut)
As for your next question about the stack and deployment: we're using all sorts of different stacks and tooling. We made our own tooling at one point (https://github.com/approximatelabs/lambdaprompt/), but have more recently switched to making the raw requests ourselves and writing out the logic ourselves in the product. For our main product, the code just lives in our Next.js app and deploys on Vercel.
The thing I'm working on now is AI mock interviewing. It's basically scratching my own itch, since I hate leetcode prep, and have found I can learn better through interaction. To paste a blurb from an earlier comment of mine:
I'm building https://comp.lol. It's AI-powered mock coding interviews, FAANG style. Looking for alpha testers when I release, so sign up if you wanna try it out or just wanna do some mock coding. If it's slow to load, sorry, everything runs on free tiers right now.
I really dislike doing leetcode prep, and I can't intuitively understand the solutions by just reading them. I've found the best way for me to learn is to seriously try the problem (timed, interview-like conditions) and be able to 'discuss' it with the interviewer without just jumping to reading the solution. I've been using and building this as an experiment to try prepping in a manner I like.
It's not a replacement for real mock interviews - I think those are still the best, but they're expensive and time-consuming. I'm hoping to get 80% of the benefit in an easier package.
I just put up a waitlist in case anyone wants to try it out and give me feedback when I get it out.
Gonna apologize in advance for the copywriting. I was mostly messing around for my own amusement; it'll probably change later.
Runs on a local LLM, because even GPT-3 costs would have added up quickly.
Currently requires CUDA and uses a 10.7B model, but if anyone wants to try a smaller one and report results, let me know on GitHub and I can give some help.
Built entirely on Vercel & OpenAI. Took about a day, hardest part was configuring Sign In With Google. Had several dozen candidates use it, saved a lot of time and helped prioritize conversations.
I just did a brief writeup about it yesterday: https://www.linkedin.com/pulse/i-built-ai-hiringscreening-as...
I believe a large company like Meta, or any of the other companies with messaging platforms, would find this valuable. Especially because they will be fined by the UK for fraud that takes place on their messaging services.
- Site: https://emergingtrajectories.com/
- GitHub repo: https://github.com/wgryc/emerging-trajectories
I've helped a number of companies build various sorts of LLM-powered apps (chatbots mainly) and found it interesting but not incredibly inspiring. The above is my attempt to build something no one else is working on.
It's been a lot of fun. Not sure if it'll be a "thing" ever, but I enjoy it.
There's also Moss[3], a GPT that acts as a senior, inquisitive, and clever Go pair programmer. I use it almost daily to help me code and it has been a huge help productivity-wise.
Powered by whisper-timestamped [1] using a model trained by the local tech university TTÜ [2]
And it just… works! (with some tweaks and corrections)
Primarily it was a PoC to see if a document based chatbot could work without crossing trust boundaries by calling out to untrusted APIs. It only makes calls to localhost.
If you’re familiar with the novel you will be pleased to know that the chatbot ended a recent answer with, “I must go now as I have an appointment with my chamber pot and I wouldn’t want to keep it waiting.”
[1] https://github.com/FlowiseAI/Flowise
[5] https://www.gutenberg.org/ebooks/1079
Everything runs on a Mac Mini with the M2 Pro CPU/GPU and macOS Sonoma.
Stack is a combination of TypeScript (Next / Node) + Python with a pretty simple deployment setup right now (GHA -> Container -> Cloud Run).
We’re processing the top podcasts in many genres every day (currently thousands of daily episodes) and running them through our pipeline.
From this we’ve made a semantic search engine, for example: https://www.podengine.ai/podcasts/search?search_term=Should+...
We’re soon going to improve and summarise the responses from the raw embeddings in a few ways. Would love some feedback on the experience.
We have also opened up a keyword alerting feature to alert folks when they’ve been talked about in an episode.
I initially built it using llama.cpp for offline LLM inference, but soon discovered mlc-llm and moved to it, because the latter is way faster and more flexible.
Ultimately I wanted a whole marketplace where anybody can create a tour and then sell it.
But the process of creating the tours was quite laborious.
So to speed this up I fed GPT-4 information about local points of interest and had it write the questions and the multiple-choice answers. It also wrote some narrative bits as various personas. For example, there was a Christmas hunt where GPT-4 played the part of an elf and came up with a theme about Santa needing to recruit you as a new elf once you'd answered all the various clues, etc.
The front end is React with TypeScript, the backend is a .NET Core Web API on Linux with MySQL under EF Core, plus integrations with GPT-4 and Stripe.
It’s hosted at treasuretours.org
Only superusers can access the AI tools right now because of cost, but you can try out some of the pre-made hunts, which were partially AI-generated.
I have been hacking together a poor man's Crunchbase that's fueled by GPT.
React / Python / Supabase. The most interesting piece thus far has been the success of the self-correcting loops through GPT: at each turn, the results are basically fed back to another GPT-3.5 prompt whose only job is to review quality. I found that with these loops you can get solid results without having to use the more expensive GPT-4 API.
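In case it helps to picture the loop, here is a minimal sketch of a generate-then-review cycle; the reviewer prompt, the "OK" convention, and the retry limit are my own assumptions, not the actual implementation:

```python
# Minimal sketch of a generate-then-review loop on GPT-3.5.
# The reviewer prompt and the pass/fail criterion are assumptions for illustration.
from openai import OpenAI

client = OpenAI()

def chat(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def generate_with_review(task: str, max_rounds: int = 3) -> str:
    draft = chat(task)
    for _ in range(max_rounds):
        review = chat(
            "Review the following answer strictly for factual and formatting "
            f"problems. Reply OK if it is fine, otherwise list the problems.\n\n{draft}"
        )
        if review.strip().upper().startswith("OK"):
            break
        # Feed the critique back in and regenerate.
        draft = chat(f"{task}\n\nA reviewer raised these issues, fix them:\n{review}")
    return draft

print(generate_with_review("Summarize what Acme Corp does in two sentences."))
```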
(Also loving all the projects in this thread)
I've built some abstract content development tools, generally focused on building larger content somewhat top-down (defining vibes, then details).
I'm working on a general project helper using GPT-Vision, voice, and regular GPT. You set up the camera above your workspace, work on paper, and chat with the LLM while you do it. I think there's a lot of potential, but the voice stuff is quite hard to deal with... there's just a ton of stuff happening in parallel, and I find it very hard to code something reliable.
The stack I use is all in the browser, generally Next.js, Preact Signals, and my own code to call into GPT, Whisper, etc. I like having everything available for inspection, and I generally keep all the working bits visible somewhere. (This can be overwhelming when other people see it.)
But I haven't gotten over the deployment hump... the cost and complexity are a challenge. I've used OpenRouter.ai recently in a project, and I think if I leaned on that more completely I'd find the release process easier.
Here's the story:
At first I was building a tool for stock analysis: the user writes in free language what companies they want to compare, along with a time period, and the requested stocks show up on a graph. They can then iterate further on it (add companies, change the range), all in free language (I had many more analysis functions planned). Due to some unique dev challenges I ran into, I ended up not releasing the product (possibly will sometime in the future), and switched to working on a dev platform to help with those challenges.
I was using what I called an 'LLM structured task': basically instructing the LLM to perform some task on the user input and output JSON that my backend can work with (in the described case, finding the mentioned companies and an optional time range, and returning stock symbols and string-formatted dates). The prompting turned out to be non-trivial and kind of fragile: things broke with even minor iterations on the prompt or model configuration. So I developed a platform to help with that: testing (templated) prompt versions, as well as model configurations, on whole collections of inputs at once, making sure nothing breaks during development (or after). If you're interested, you're welcome to check it out at https://www.promptotype.io
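For concreteness, an "LLM structured task" of that kind can be sketched like this; the schema, prompt, and model are examples I made up, not Promptotype's internals or the original product code:

```python
# Illustrative sketch of an "LLM structured task": ask the model for JSON the
# backend can consume. The schema and prompt are examples, not real product code.
import json
from openai import OpenAI

client = OpenAI()

PROMPT = """Extract the companies to compare and the optional time range from the
user request. Respond with JSON only, in the form:
{"symbols": ["AAPL", "MSFT"], "start": "YYYY-MM-DD", "end": "YYYY-MM-DD"}"""

def parse_request(user_input: str) -> dict:
    resp = client.chat.completions.create(
        model="gpt-4-turbo",
        response_format={"type": "json_object"},  # JSON mode
        messages=[
            {"role": "system", "content": PROMPT},
            {"role": "user", "content": user_input},
        ],
    )
    return json.loads(resp.choices[0].message.content)

print(parse_request("Compare Apple and Microsoft over the last two years"))
```

Even with JSON mode, a prompt tweak or a model swap can quietly change the shape of the output, which is exactly the fragility described above.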
AI concierge for my parents’ vacation rental. Mostly just pulling info from the guest binder, but I’ve also started using some local guides to give better suggestions. Built with NextJs and deployed on Vercel (was really easy and they have a generous free tier).
It's macro + calorie tracking over text message. You just text what you eat and it matches against a food database to estimate your food intake. It's basically an easier alternative to MyFitnessPal.
My stack is OpenAI on Azure, Vercel, Convoy, FatSecret API, Postmark, NextJS.
It’s basically an LLM-based RAG that works over the best blogs and websites covering any topic you provided during onboarding.
The tool still needs a trust mechanism and a coherent incremental publishing strategy to be able to operate in a public fashion. Right now, running one node using my RTX 3060 it would take 1.2 years to do one split of the C4 dataset.
https://arxiv.org/abs/2401.16380
Deployment is usually FastAPI for business logic, LangChain or Microsoft's Guidance library, and the LLM hosted via an HF TGI server.
Then things like:
“Fix My Japanese” - uses LLM to correct Japanese grammar (built with Elixir LiveView): https://fixmyjapanese.com
It has different “Senseis” that are effectively different LLMs, each with slightly different style. One is Claude, one is ChatGPT.
Or a slack bot that summarizes long threads:
Basically, I want to write a book without having to type out the whole thing. I got the dictation idea from an episode of Columbo.
It is very much a work in progress and a proof of concept for another writing tool I want to make.
But I found that LLMs are often wrong and hallucinate, so I have to double-check with Google or other resources.
So I built a Google and ChatGPT alternative that answers any question and makes hallucinations more obvious. I do this by using multiple LLMs, including search-enabled ones, i.e. GPT-4, Gemini, Claude, Perplexity, Mistral, and Llama.
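The fan-out idea is simple to sketch: ask several models the same question in parallel and show the answers side by side so disagreements stand out. The sketch below uses only OpenAI models as stand-ins for the real multi-provider mix, so it is an assumption about the approach rather than the site's actual code:

```python
# Sketch of the fan-out idea: ask several models the same question and show the
# answers side by side so disagreements (likely hallucinations) stand out.
# Only OpenAI models here for brevity; the real site mixes providers.
from concurrent.futures import ThreadPoolExecutor
from openai import OpenAI

client = OpenAI()
MODELS = ["gpt-4o", "gpt-4o-mini", "gpt-3.5-turbo"]  # stand-ins for GPT-4, Gemini, Claude, ...

def ask(model: str, question: str) -> tuple[str, str]:
    resp = client.chat.completions.create(
        model=model, messages=[{"role": "user", "content": question}]
    )
    return model, resp.choices[0].message.content

def ask_all(question: str) -> dict[str, str]:
    with ThreadPoolExecutor() as pool:
        return dict(pool.map(lambda m: ask(m, question), MODELS))

for model, answer in ask_all("Who invented the centrifugal governor?").items():
    print(f"--- {model} ---\n{answer}\n")
```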
It's been growing healthily https://labophase.com
I made it available to the public aisearch.vip
https://github.com/codevideo/codevideo-ai
My goal is definitely NOT to generate the course content itself, but just to take the effort out of recording and editing these courses: you provide (or get help generating) the stuff to speak and the code to write, and the video is deterministically generated. The eventual vision is to convert book- or article-style text into the steps needed to generate the video, as close to one-shot as possible.
I also leverage ElevenLabs' voice cloning (technically not an LLM, but impressive ML models nonetheless).
For anyone more curious: I'm wondering whether what I'm trying to do is in general a tractable problem (generating step-by-step instructions to write functional code, including modifications, refactoring, or whatever else you might do in an actual software course), or whether this truly is something that can't be automated. Any resources on the characteristics of coding itself would be awesome! What I'm trying to say is that, at the end of the day, code in an editor is a state machine: certain characters in a certain order produce certain results. I'd love it if anyone had more information about the meta of programming itself; abstract syntax trees and the work around them come to mind, but I'm not even sure of the question I'm asking yet.
Unfortunately only for German, but I plan on expanding the languages soon.
Tech stack:
- The app is in Flutter.
- Backend is Node.js with TypeScript.
- GPT-4 for generating sentences and explanations.
- GCP text-to-speech for audio.
It doesn't do a ton, but it's kinda cool. Feel free to fix/add anything https://github.com/k-zehnder/gophersignal
GPT-4 excels as a translator, but it often encounters issues with content warnings and formatting errors when translating entire subtitle files via ChatGPT. The solution is straightforward: divide the subtitle file into sections, focusing solely on translating the text and disregarding the timestamps. While it's feasible to have ChatGPT maintain the correct format, I've observed a decline in translation quality when attempting this in a single pass. My preferred approach is a two-phase method: first, translate the text, and then, if necessary, request ChatGPT to adjust the formatting.
The webapp splits the srt file into batches of 20 phrases and translates each batch. It also allows for manual correction of the final translation.
Ah and it's also serverless: you input your OpenAI token & select the model of your choice and the webapp makes the requests to OpenAI directly.
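If it helps to picture the batching, here is a minimal sketch of the split-then-translate approach; the prompt wording, model, target language, and batch handling are my own assumptions, not the app's actual code:

```python
# Sketch of the batching approach: translate only the text lines of an .srt in
# batches of 20, keeping the numbering and timestamps untouched.
import re
from openai import OpenAI

client = OpenAI()
BATCH = 20

def translate_batch(lines: list[str], target: str) -> list[str]:
    numbered = "\n".join(f"{i + 1}. {line}" for i, line in enumerate(lines))
    resp = client.chat.completions.create(
        model="gpt-4-turbo",
        messages=[{"role": "user", "content":
            f"Translate these subtitle lines to {target}. Keep the numbering, "
            f"one line per item, and translate nothing else:\n{numbered}"}],
    )
    out = resp.choices[0].message.content.splitlines()
    return [re.sub(r"^\d+\.\s*", "", line) for line in out if line.strip()]

def translate_srt(srt_text: str, target: str = "French") -> str:
    blocks = [b.splitlines() for b in srt_text.strip().split("\n\n")]
    texts = [" ".join(b[2:]) for b in blocks]  # text only, skip index + timestamp
    translated: list[str] = []
    for i in range(0, len(texts), BATCH):
        translated += translate_batch(texts[i:i + BATCH], target)
    return "\n\n".join(
        "\n".join(block[:2] + [new]) for block, new in zip(blocks, translated)
    )
```

Keeping the timestamps out of the prompt is what preserves translation quality; a second formatting pass (or manual correction, as above) catches the occasional misaligned line.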
- https://github.com/iloveitaly/sql-ai-prompt-generator generate a ChatGPT prompt with example data for a sqlite or postgres DB
- https://github.com/iloveitaly/conventional-notes-summarizati... summarize notes (originally for summarizing raw user interview notes)
- https://mikebian.co/using-chatgpt-to-convert-labcorp-pdfs-in... convert labcorp documents into a google sheet
- https://github.com/iloveitaly/openbook scrape VC websites with AI
It's just a hodgepodge of prototype scripts, but one that I've actually used on a few occasions already. Most of the work is manual, but it does seem like it could easily be run as "fire and forget", with maybe some way to make corrections afterwards.
First, I'm using pyannote for the speech recognition step: it converts audio to text while being able to discern speakers: SPEAKER_01, SPEAKER_02, etc. The diarization provides nice timestamps, with resolution down to parts of words, which I later use in the minimal UI to quickly skip around when a piece of text is selected.
Next, I'm running an LLM prompt to identify speakers; so if SPEAKER_02 said "Hey Greg" to SPEAKER_05, it will identify SPEAKER_05 = Greg. I think it was my first time using Mistral 7B, and I went "wow" out loud once it got it right.
After that, I fill in the holes in the speaker names manually and move on to grouping chunks of text in order to summarize them. That doesn't seem interesting at a glance, but removing the filler words, of which there are a ton in any presentation or meeting, is a huge help. I do it chunk by chunk. Here I lean on the best LLM available and often pick the Dolphin finetune of Mixtral.
Last, I summarize those summaries and slap that on the front of the Google Doc.
I also insert some relevant screenshots in between chunks (might go with some ffmpeg automatic scene change detection in the future).
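For illustration, the speaker-naming pass could look something like the sketch below. The prompt and client are stand-ins: the OpenAI client here could equally point at a local OpenAI-compatible server running Mistral 7B, and the JSON convention is my own assumption.

```python
# Sketch of the speaker-identification pass: give the model a diarized chunk and
# ask for a SPEAKER_XX -> name mapping as JSON. Prompt and model are stand-ins.
import json
from openai import OpenAI

client = OpenAI()  # a local OpenAI-compatible server running Mistral 7B would also work

def identify_speakers(diarized_chunk: str) -> dict:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        response_format={"type": "json_object"},
        messages=[{"role": "user", "content":
            "Based on how people address each other in this transcript, map the "
            "speaker labels to real names. Reply with JSON only, e.g. "
            '{"SPEAKER_02": "Anna", "SPEAKER_05": "Greg"}. Use null when unsure.\n\n'
            + diarized_chunk}],
    )
    return json.loads(resp.choices[0].message.content)

chunk = "SPEAKER_02: Hey Greg, did you send the report?\nSPEAKER_05: Not yet, Anna."
print(identify_speakers(chunk))
```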
aaand that's it: a doc that is easily searchable. Previously I had a bunch of 30- to 90-minute meeting recordings, and any attempt at searching required a linear scan of the files. Now, with a lot of additional prompt massaging, I was able to:
- create meeting notes, with the especially worthwhile "what did I promise to send later" points
- this is huge: TALK with the transcript. I paste the whole transcript into Mistral 7B with 32k context and simply ask questions and follow-ups. No more watching or skimming an hour-long video; just ask the transcript whether there was another round of layoffs or whether the parking space rules changed.
- draw a Mermaid sequence diagram of a request flowing across services. It wasn't perfect, but it got me super excited about future possibilities for creating or updating service documentation based on ad-hoc meetings.
I guess everybody is actually trying to build the same thing; it seems like a no-brainer given current tools' capabilities.
The stack is mostly python running locally, and calling the OpenAI API (although we have plans to support offline models).
For better visual understanding, we use a custom fork of Set-of-Mark prompting (https://github.com/microsoft/SoM) deployed to EC2 (see https://github.com/OpenAdaptAI/SoM/pull/3).
Our backend stack:
- AWS
- SST
- TypeScript
Our clients:
- Next (web)
- Vanilla React Native (mobile)
OpenAI's App Store announcement is what got us interested in building w/ LLMs.
1. Analyze calories/macronutrients from a text description or photo
2. Provide onboarding/feedback/conversations like you'd get from a nutritionist
My stack is Ruby on Rails, PostgreSQL, OpenAI APIs. I chose Rails because I'm very fast in it, but I've found the combination of Rails+Sidekiq+ActionCable is really nice for building conversational experiences on the web. If I stick with this, I'll probably need a native iOS app though.
Vendor stack is: GitHub, Heroku (compute), Neon (DB), Loops.so (email), PostHog (analytics), Honeybadger (errors), and Linear.
The more notable one was experimenting with LLMs as high level task planners for robots (https://hlfshell.ai/posts/llm-task-planner/).
The other is a Golang-based AI assistant, like everyone else is building. It worked over text and had some neat memory features. This was more of a "first pass" at learning about LLM applications. (https://github.com/hlfshell/coppermind)
I plan to revisit LLMs as context enriched planners for robot task planning soon.
And I'm working on a webapp that is a kanban board where an LLM and a human collaborate to build features in code. I just got a cool thing working there: like everyone, I've found that having the LLM generate new code is easy but modifying code is hard. So my attempt at modifying code with an LLM starts with HTML: I have GPT-4 write BeautifulSoup code that then makes the desired modification to the HTML file. I'll do the same with JS, Python via ast, etc. No link for this one yet :) still in development.
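A minimal sketch of that HTML-modification idea is below; the prompt, the edit(soup) convention, and the use of exec are purely illustrative assumptions, not the project's actual design:

```python
# Sketch: ask GPT-4 for a BeautifulSoup edit script, then run it on the HTML.
# Prompt, function name, and the (unsafe) exec are illustrative only.
from bs4 import BeautifulSoup
from openai import OpenAI

client = OpenAI()

def modify_html(html: str, instruction: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content":
            "Write only Python code defining a function edit(soup) that takes a "
            f"BeautifulSoup object and applies this change: {instruction}. "
            "No markdown fences, no explanation."}],
    )
    code = resp.choices[0].message.content
    namespace: dict = {}
    exec(code, namespace)                      # trusted-input sketch only!
    soup = BeautifulSoup(html, "html.parser")
    namespace["edit"](soup)
    return str(soup)

print(modify_html("<button class='btn'>Save</button>",
                  "add a 'btn-primary' class to every button"))
```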
Also, hello HN! If you are interested, use this promo code for 50% off your first purchase ;)
HELLOHACKERNEWS
****
Project 2 - I also built a YouTube summarizer for individual videos called Summary Cat (https://www.summarycat.com). It is not open source for now. The stack is very similar to project 1.
****
And yes I like summarizing YouTube videos:)
It is a Next.js application, calling OpenAI’s API using a plain API route.
[2] https://spliit.app/blog/announcing-receipt-scanning-using-ai
A decentralised AI app store with cross-border micro-transactions.
You will be able to sell your LLM output (which could be multimodal) for dollars, or whatever you decide. (The LLMs run on your infra, so you can keep the weights to yourself forever.)
https://dev.invoker.network/share/9/0 (Dev environment is ready).
1. Games you can play with word2vec or related models (could be drop in replaced with sentence transformer). It's crazy that this is 5 years old now: https://github.com/Hellisotherpeople/Language-games
2. "Constrained Text Generation Studio" - A research project I wrote when I was trying to solve LLM's inability to follow syntactic, phonetic, or semantic constraints: https://github.com/Hellisotherpeople/Constrained-Text-Genera...
3. DebateKG - A bunch of "Semantic Knowledge Graphs" built on my pet debate evidence dataset (LLM backed embeddings indexes synchronized with a graphDB and a sqlDB via txtai). Can create compelling policy debate cases https://github.com/Hellisotherpeople/DebateKG
4. My failed attempt at a good extractive summarizer. My life work is dedicated to one day solving the problems I tried to fix with this project: https://github.com/Hellisotherpeople/CX_DB8
2) https://amiki.app - practise speaking French, Spanish, German or Italian with a 3D partner. Flutter web with Whisper and my own rendering package.
https://github.com/psugihara/FreeChat
I'm also working on a little text adventure game that I hope to release soon.
Now with LLMs it’s simple to extract structured data from emails.
I built [Orderling](https://orderl.ing) that is basically a CRM for your orders. It uses OpenAI api to extract the order information and automatically adds it.
The results were actually hilarious... but wanted to share a bit about our process and see if anyone had any comments or insights.
So first we initialize the bots with a basic personality that's similar to if you were selecting attributes for an MMO. Things like intelligence, toxicity, charisma and the like. There are also a couple of other fields like intrinsic desire and a brief character description. These are fed to the model as a system prompt with each inference.
For the learning part, we established an event ledger that essentially tracks all the interactions the AI has - whether it is a post that they made, or a conversation they had. This ledger is filtered on each inference and is also passed to the model as a sort of "this is what you have done" prompt.
Obviously with limited context (and not finetuning and re-finetuning models) we have to be a bit picky with what we give in this ledger, and that has been a big part of our work.
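To make that concrete, here is a rough sketch of how a system prompt might be assembled from those attributes plus a filtered ledger; the field names, the importance score, and the cut-off are illustrative guesses rather than the project's actual schema:

```python
# Sketch: build a system prompt from MMO-style personality attributes plus the
# most important events from the ledger. Field names and weights are guesses.
from dataclasses import dataclass

@dataclass
class Personality:
    name: str
    intelligence: int          # 1-10 style attributes, like an MMO character sheet
    toxicity: int
    charisma: int
    desire: str                # intrinsic desire
    description: str

@dataclass
class Event:
    summary: str
    importance: int            # used to decide what fits into the context window

def build_system_prompt(p: Personality, ledger: list[Event], max_events: int = 10) -> str:
    recent = sorted(ledger, key=lambda e: e.importance, reverse=True)[:max_events]
    history = "\n".join(f"- {e.summary}" for e in recent)
    return (
        f"You are {p.name}. {p.description}\n"
        f"Attributes: intelligence {p.intelligence}/10, toxicity {p.toxicity}/10, "
        f"charisma {p.charisma}/10. Intrinsic desire: {p.desire}.\n"
        f"This is what you have done so far:\n{history}"
    )
```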
Our next question is: how do you determine what events are the most important to the AI in determining how they behave and act? It's been interesting!
The platform is anotherlife.ai for those curious!
You just export and upload a WhatsApp conversation and it will learn the personality AND voice of your conversation partner. You can send/receive text or voice messages; it was pretty damn spooky to actually have a voice conversation back and forth with an AI standing in for my "friend".
As for the stack, I have Supabase and TypeScript on the frontend, Python on the backend, and k3s as a cluster for my apps (can recommend this if you want to get devops-y on a budget). Next time, I'll just go pure TypeScript, since Python really doesn't add much when working this far away from the base models.
We built it in Kotlin with Ktor server, htmx and tailwind. It uses a mixture of models, including gpt4-turbo, gpt4-vision and gemini-pro-vision. It's deployed using Kamal on bare metal.
Example canvas that provides a roundup of Apple Vision Pro reviews: https://jumprun.ai/share/canvas/01HNXB2K3GM7KPRP45Y2CVVJSC
Our learn more page with some screenshots to show creating a canvas: https://jumprun.ai/learn-more
It's a free closed beta at the moment to control costs, but let me know if you'd like an invite.
The last app, the only one that was deployed anywhere, is https://catchingkillers.com This app is a simple murder mystery game where the witnesses and the killer are ChatGPT bots. The first two stories are complete and active, the third is not complete yet. The first story of the working two is taken from another murder mystery group game https://www.whodunitmysteries.com/sour.html. The second story was highly influenced by ChatGPT.
It's a bit rough because I didn't spend too much time on it, but if anyone does signup to play, I'd love to hear feedback.
I would also, in the future, try to make it generic so that it can crawl any website and store new content in vector databases. Responses to user queries can then be returned by combining the vector search and the LLM.
https://link.springer.com/chapter/10.1007/978-3-031-28238-6_...
Financial earnings calls are important events in investment management: CEOs and CFOs present the results of the recent quarter, and a few invited analysts ask them questions at the end in a Q&A block.
Because this is very different prose from news, traditional summarization methods fail. So we pre-trained a transformer from scratch with a ton of high-quality (REUTERS only) finance news and then fine-tuned with a large (100k sentences) self-curated corpus of expert-created summaries.
We also implemented a range of other systems for comparison.
It looks through past transcripts, topics, view counts, and other metadata so users can quickly learn what a YouTuber is all about.
AgentX (https://theagentx.com), an LLM chat support app is one of the projects I built on this framework. It is a self-updating customer support agent that is trained on your support docs. Not only does this answer your customer questions, it provides summaries of the queries so you get a sense of where your product and/or documentation is deficient.
I’m working for an edTech company. Some students prefer video. So I built a Django app that takes a block of text and formats it into a set of slides, each with a title, some bullet points, an Dalle-3 generated image, and a voiceover.
It then compiles that all into a video.
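For a rough idea of how such a pipeline could be wired up, here is a sketch of the slide-planning step plus per-slide asset generation; the prompt, JSON schema, and the specific image/TTS calls are just one possible way to do it, not necessarily this app's code:

```python
# Rough sketch: turn a block of text into slide specs (title, bullets, image
# prompt, voiceover script), then generate assets per slide. Schema is mine.
import json
from openai import OpenAI

client = OpenAI()

def plan_slides(text: str) -> list[dict]:
    resp = client.chat.completions.create(
        model="gpt-4-turbo",
        response_format={"type": "json_object"},
        messages=[{"role": "user", "content":
            "Split this content into slides. Return JSON like "
            '{"slides": [{"title": "...", "bullets": ["..."], '
            '"image_prompt": "...", "voiceover": "..."}]}\n\n' + text}],
    )
    return json.loads(resp.choices[0].message.content)["slides"]

for slide in plan_slides("Photosynthesis converts light energy into chemical energy..."):
    image = client.images.generate(model="dall-e-3", prompt=slide["image_prompt"])
    audio = client.audio.speech.create(model="tts-1", voice="alloy", input=slide["voiceover"])
    # ...render title + bullets + image, then stitch slides and audio into a video.
```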
The stack is simple: Preact on the frontend with a custom framework on top, and Bun on the backend calling OpenAI. I may port it to Rust in the future.
I plan to try local LLMs when I have some free time.
For now, each user runs the application locally with their own keys[3].
[1] https://www.youtube.com/watch?v=nS1wsif3y94
[2] https://www.youtube.com/watch?v=f-txlMDLfng
[3] Alpha software, check the readme: https://gloodata.com/download/
Example 1: I get an email from a potential customer that says they want [product A]. I can forward that email (or call notes) to Salesforce (or somewhere) and it will understand the preference and the relevant customer and update that customer's profile.
Example 2: In a B2B context, let's say my customer is a company, and there is a news article about them. I could forward a link to the article to the LLM and it would understand that the article is about a customer, and append that article and key info about it to my Salesforce record for that customer. The news item becomes an object that is linked to that customer (for call context, better sales targeting, profiling, etc.).
Can someone help me build that?
So far this is being used for:
- Sales -> guiding new recruits during more complex client calls
- HR -> Capturing responses during screening interviews
If you'd like to try this out feel free to DM me or email me at andrew at sightglass.ai, we're looking for more testers!
Currently running on my little digital ocean droplet. Stack is javascript/python.
It is still a work in progress (early beta), but you can check it out at https://www.bonamiko.com
Currently I have mainly been using it as a tandem conversation partner for a language I'm learning, but it can be used for many more things. As it is right now, you can use it to bounce ideas off, practice interviews, and get quick answers to general questions. You just need to tell it what you want.
The stack is a Next.js application hosted on Vercel using Supabase for the backend. (There is also some plumbing in AWS for email and DNS.) It is automatically deployed via GitHub actions.
Since it is a Python library, we deploy it to PyPI. But for my own use, I run it on an H100 Linux server with the PyTorch Docker image and CUDA. Running it needs only vim and bash. Plus, for running local models I love vLLM: I made my own vLLM Dockerfile and use it to deploy a local model in 5 minutes.
FYI: renting a whole H100 instance is really expensive, but in my hometown the government provides us with the instance for AI research.
I think what will be really powerful is to have a registry for plugins and agents that can be easily installed in the system. Sort of like WordPress in that way. Also similar to an open source GPT store.
https://github.com/runvnc/agenthost
I believe there are several variations of this type of idea out there.
This project is a chatroom application that allows users to join different chat rooms, send messages, and interact with multiple language models in real-time. The backend is built with Flask and Flask-SocketIO for real-time web communication, while the frontend uses HTML, CSS, and JavaScript to provide an interactive user interface.
demo here supports communication with `vllm/openchat`:
It should be cheap enough to deploy that it can be applied to relatively low-value content like video meeting recordings, so it can’t spend a lot of expensive GPU time analyzing video frames.
It also needs to be easily customizable for various content verticals and visual styling like branding and graphics overlays.
And everything is meant to be open sourced, so that’s fun!
I wrote about it on my employer’s blog here:
https://www.daily.co/blog/automatic-short-form-video-highlig...
The stack is: 1. TypeScript/Node/tRPC/Postgres/Redis/OpenAI on the backend 2. SolidJS/Crxjs/tRPC on the front end 3. Astro for the docs/marketing site
And deployment is currently through render.com for the databases and servers, and manually via a zip file to the Chrome webstore for the extension itself.
Heavily inspired by https://humanornot.ai/ (which was a limited-time research project by AI21 Labs), the project is now on its own path to be more than just a test.
My work is to make AI chats sound like real humans, and it's shocking how good the AIs sometimes are.
Even I, as the creator, knowing everything (prompts, fine-tuning data, design, backend, etc.), often can't tell if I'm speaking to a human or to one of the AIs I designed.
A tool to RAG a GitHub repo, so I can ask questions about how a certain library or tool works? Even better if it pulls in issues.
Github Link: https://github.com/joiahq/joia
Benefits vs the original:
- Easy to invite entire teams and centralize billing
- Talks to any Large Language Model (eg: Llama 2, Mixtral, Gemini)
- Collaborative workspace to easily share GPTs within the team, similar to how Notion pages work
- Savings of 50%-70% vs ChatGPT's monthly subscription
Tech stack: Next.js, tRPC, and Postgres. All wonderful technologies that have helped me develop at the speed of thought.
I've built this by using AI as the foundation for everything. I am using LLMs to classify information and extract structured data points for any webpage, or RAG for finding data.
Tech stack:
- Mistral 8x7B and Perplexity API for data processing and GPT-4 input
- GPT-4 for content output
- pgvector in Supabase
- LangChain for the pipeline and RAG stuff
It's a flutter app (in beta on Google play store currently) that uses OpenAI embeddings with Postgres pg_vector DB hosted in Supabase. Any poor matches go to Dalle3 for generation.
Our charity (I am vice-chair on the board) is hoping to use it as part of our program: https://learningo.org/app/
I consult to a law firm as their founder-in-residence. For fun, I trained Llama 2 on all the non-client data of the firm so that people could ask it questions like "Who are the lawyers in Montreal who litigate American securities laws, what are their email addresses and what time is it where they are?" It's a njs app running on linode.
It's extremely simple, but people seem to find it useful.
Check it out here: https://app.commonbase.ai/
It has been a huge help for me when working with certain open-source libraries.
We used Plasmo to build the chrome extension, React for the frontend, and currently OpenAI as the LLM provider.
Currently it only works with Gmail but we plan on adding other email providers as well.
Feel free to check it out: https://chromewebstore.google.com/detail/mysterian-ai-for-gm...
I also wrote PromptPrompt, which is a free and extremely light-weight prompt management system that hosts + serves prompts on CDNs for rapid retrieval (plus version history): https://promptprompt.io
- Nextjs
- Deno Deploy for hosting the apis
- Supabase - postgres / auth
- Shadcn
I want to use the T3 app stack [2] for v2. It's really an MVP, but I want to see if anyone is interested at all before I work on v2: creating GPTs that come with databases!
[1] https://textool.dev
[2] https://create.t3.gg/
The stack is react / cloud run / job queue / LLMs (several) / vector db.
Little demo is up at npcquick.app.
Doesn't look like much rn, but there's no OpenAI involved. Currently it doesn't even use a GPU.
2. An embeddings-based job search engine: https://searchflora.com
3. I used LLMs to caption a training set of 1 million Minecraft skins, then finetuned Stable Diffusion to generate Minecraft skins from a prompt: https://multi.skin
I made a private Discord bot for me and my friends to talk to, that also generates images using SD 1.5 LCM.
The self-hosted backend uses the ComfyUI Python API directly for images, and the LLM part uses oobabooga's web API.
I've tried several models and GPT-4 is currently the one that performs best, but OS LLMs like Mixtral and Mixtral-Nous are quite capable too.
Still in progress at https://www.chathip.com/
It gives back ChatGPT-style answers, but they contain citations to the underlying academic articles so that you know the claims are valid. Clicking on a reference actually takes you directly to the paragraph in the source material where the claim was found.
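One common way to get that kind of paragraph-level citation is to number the retrieved paragraphs and ask the model to cite them inline; the sketch below is an assumption about the approach, not this product's implementation:

```python
# Sketch: number the retrieved paragraphs and ask the model to cite them inline.
# Illustrative only; retrieval and the citation format are assumptions.
from openai import OpenAI

client = OpenAI()

def answer_with_citations(question: str, paragraphs: list[dict]) -> str:
    # paragraphs: [{"id": "doi:10.1000/xyz#p12", "text": "..."}]
    context = "\n".join(f"[{i + 1}] {p['text']}" for i, p in enumerate(paragraphs))
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content":
                "Answer using only the numbered paragraphs below. After every claim, "
                "add the supporting paragraph number in brackets, e.g. [2]."},
            {"role": "user", "content": f"{context}\n\nQuestion: {question}"},
        ],
    )
    # The bracketed numbers can then be mapped back to paragraph ids and rendered
    # as links that jump to the exact paragraph in the source article.
    return resp.choices[0].message.content
```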
https://botsin.space/@jokeunderstander
It's just a bash script that calls ollama on my desktop PC every morning and schedules a handful of posts on the Mastodon server.
Very pleased with how it turned out as it really brings the potential of LLMs to life IMO.
It's currently free to use. It's built using Next.js + Tailwind and is powered by Vercel + Brave + Gemini Pro. https://xplained.vercel.app
There are other projects that I worked on as part of my job, mostly around bots, search, classification, and analytics.
The idea is to use it to identify code that sticks out, because that's usually what's interesting or bad.
I'm building a minimal, cross-browser, and cross-platform UI for Ollama.
Stack: HTML, CSS, JavaScript; in other words, no dependency on React, Bootstrap, etc. Deployment: web server, browser extension, desktop, mobile.
A chrome extension to show processed video overlay on YouTube to highlight motion.
A script to show stories going up and down on HN front page. This one just took 1 prompt.
Homeschoolmate.com
I have no time to read all that generic "vibrant neighborhood" stuff :D
I'm using modal.com as the backend for the AI related micro services.
I own the domain homestocompare and I am working on a project that will use AI to help compare homes. Unfortunately I don't have a working demo yet but please reach out to me if you would be interested in finding out more.
It does the work of understanding questions in the context of a repo, code snippet, or any programming question in general, and pulls in extra context from the internet with self thought + web searches.
You might want to connect that to SponsorBlock
I also use LLMs in some other web apps, but mainly as incidental writing aids, rather than the central feature of the app.
It adds no value beyond entertainment, but I suppose it does do that.
It is https://cmaps.io
Right now I’m working on including automatic fact checking and insights on how each source might be opinionated vs. reporting just the facts.
Super interesting learning exercise since it intersects with many enterprise topics, but the output is of course more fun.
In some ways it is more challenging - a summary is still useful if it misses a point or is a little scrambled, whereas when a story drops a thread it’s much more immediately problematic.
I’m working on a blog post as well as getting a dozen episodes uploaded for “season 1”.
Used SvelteKit and Supabase. Deployed to Cloudflare Pages.
https://chromewebstore.google.com/detail/news-article-summar...
I have all of the docs with summaries on a small webserver here: https://ayylmao.info
Simple Flask site with SQLite as the database.
I hit some interesting challenges, overcoming which was a valuable set of lessons learnt:
1. GPT4 Turbo slowed down to molasses in some Azure regions recently. Microsoft is not admitting this and is telling people to use GPT3.5 instead. The lesson learned is that using a regional API exposes you to slowdowns and queuing caused by local spikes in demand, such as “back to school” or end of year exams.
2. JSON mode won’t robustly stick to higher level schemas. It’s close enough, but parsing and retries are required.
3. The 128K context in GPT4 is only for the input tokens! The output is limited to 4K.
4. Most Asian languages use as many as one token per character. Translating 1 KB of English can blow through the 4 KB token limit all too easily.
5. You can ask GPT to “continue”, but then you have to detect if you received a partial or a complete JSON response, and stitch things together yourself… and validate across message boundaries.
6. The whole process above is so slow that it hits timeouts all over the place. Microsoft didn't bother to adjust any of their default Azure SDK timeouts for HTTP calls. You have to do this yourself. It's easy, just figure out which of the three different documented methods are still valid. (Answer: none are.)
7. You’ll need a persistent cache. Just trust me on this. I simply hashed the input and used that as a file name to store responses that passed the checks.
8. A subtitle file is about 30–100 KB so it needs many small blocks. This makes the AI lose the context. So it’s important to have several passes so it can double check and stitch things together. This is very hard with automatic parsing of outputs.
9. Last but not least: the default mode of Azure is to turn the content policy up to “puritan priest censoring books”. Movies contain swearing, violence, and sex. The delicate mind of the machine can’t handle this, and it will refuse to do as it is asked. You have to dial it down to get it to do anything. There is no “zero censorship” setting. Microsoft says that I can’t feed text to an API that I can watch on Netflix with graphic visuals.
10. The missus says that the AI-translated subtitles are “perfect”, which is a big step up from some fan translated subtitles that have many small errors. Success!
I wrote this as a C# PowerShell module because that makes it easy to integrate the utility as a part of a pipeline. E.g.: I can feed it a directory listing and it’ll translate all of the subtitles.
The performance issues meant I had to process 8x chunks in parallel. Conveniently I already had code lying around to do this in PowerShell with callbacks to the main thread to report progress, etc…
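Point 7 (the persistent cache) is easy to sketch. Here is roughly the same idea in Python rather than the original C#/PowerShell module; the file layout, hashing details, and where validation happens are my own choices, not the author's:

```python
# Sketch of the persistent cache from point 7: hash the request, use the hash
# as a file name, only call the API on a miss. In practice you would only write
# responses that passed your validation checks.
import hashlib
import json
from pathlib import Path
from openai import OpenAI

client = OpenAI()
CACHE_DIR = Path(".translation-cache")
CACHE_DIR.mkdir(exist_ok=True)

def cached_completion(messages: list[dict], model: str = "gpt-4-turbo") -> str:
    key = hashlib.sha256(json.dumps([model, messages], sort_keys=True).encode()).hexdigest()
    cache_file = CACHE_DIR / f"{key}.json"
    if cache_file.exists():
        return json.loads(cache_file.read_text())
    resp = client.chat.completions.create(model=model, messages=messages)
    text = resp.choices[0].message.content
    cache_file.write_text(json.dumps(text))   # cache only after the response checks out
    return text
```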
My process involves generating chapters as markdown, using a script to join chapters together, and then finally converting the markdown to ebooks using Gitbook.
https://github.com/tg12/gpt_jailbreak_status
Let's say you have a row with 4 fields: you chat with your row, then you apply the same conversation to all the other rows!
https://www.youtube.com/watch?v=e550X6R89W4 https://bulkninja.com/