This is actually really cool. I just tried it out using an AI Studio API key and was pretty impressed. One issue I noticed was that the output was a little too much "for dummies". Spending paragraphs explaining what an API is through restaurant analogies is a little unnecessary, and then it's followed up with more paragraphs on what GraphQL is. Every chapter seems to suffer from this. The generated documentation seems more suited to a slightly technical PM than to a software engineer. This can probably be mitigated by refining the prompt.
The prompt might also be better if it encouraged variety in diagrams. For some things, a flowchart would fit better than a sequence diagram (e.g., a durable state machine workflow written using AWS Step Functions).
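A rough sketch of what that prompt refinement could look like; the wording and names below are invented for illustration and are not the project's actual prompt:

    # Hypothetical prompt tweak: pin the audience level and let the model
    # choose the diagram type per concept instead of defaulting to one kind.
    AUDIENCE = "experienced software engineers who already know what APIs and GraphQL are"

    TUTORIAL_PROMPT = f"""
    Write this tutorial chapter for {AUDIENCE}.
    - Skip beginner analogies (no "an API is like a restaurant" explanations).
    - Choose the diagram that fits the concept:
      * sequence diagram for request/response interactions,
      * flowchart for state machines and branching workflows (e.g. AWS Step Functions),
      * class diagram for data models.
    """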
Exactly. It is rather impressive, but at the same time the audience is always going to be engineers, so perhaps it can be curated to still be technical to a degree? I can't imagine a scenario where I have to explain my ETL pipeline to the VP.
I built browser use. Dayum, the results for our lib are really impressive, you didn’t touch outputs at all?
One problem we have is keeping the docs in sync with the current codebase (code examples break sometimes). I wonder if I could use parts of Pocket to help with that.
As a maintainer of a different library, I think there's something here. A revised version of this tool that is also fed the docs and asked to find inaccuracies could be great. Even if false positives and false negatives are, let's say, 20% each, it would still be better than before, since the final decisions are made by a human.
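A minimal sketch of that doc-checking idea, assuming a generic LLM call; call_llm here is a stand-in for whatever client you use and is not part of this project:

    from pathlib import Path

    def call_llm(prompt: str) -> str:
        """Stand-in for your LLM client of choice (Gemini, OpenAI, ...)."""
        raise NotImplementedError

    def find_doc_inaccuracies(doc_path: str, source_paths: list[str]) -> str:
        doc = Path(doc_path).read_text()
        code = "\n\n".join(Path(p).read_text() for p in source_paths)
        prompt = (
            "Below is a documentation page and the source files it describes.\n"
            "List every statement or code example in the docs that no longer matches "
            "the code (renamed functions, changed signatures, removed options). "
            "Quote the doc passage and the conflicting code.\n\n"
            f"--- DOCS ---\n{doc}\n\n--- CODE ---\n{code}"
        )
        return call_llm(prompt)

    # A maintainer reviews the reported mismatches; even with ~20% false
    # positives/negatives, the human still makes the final call.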
Woah, this is really neat.
My first step for many new libraries is to clone the repo, launch Claude code, and ask it to write good documentation for me. This would save a lot of steps for me!
Really nice work, and thank you for sharing. These are great demonstrations of the value of LLMs, which helps push back against the negative view of their impact on junior engineers.
This helps bridge the gap left by most projects lacking up-to-date documentation.
At the top there's some neat high-level stuff, but below that it quickly turns into code-written-in-human-language.
I think it should be possible to extract some more useful usage patterns by poking into the related unit tests. How to use it should be what matters most to tutorial readers.
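One way to mine tests for usage patterns, sketched under the assumption of a pytest-style Python repo; the file and function naming conventions here are just assumptions, not something this tool already does:

    import ast
    from pathlib import Path

    def usage_snippets(repo_root: str, api_name: str) -> list[str]:
        """Collect test functions that exercise `api_name`, as raw source snippets,
        so they can be fed to the tutorial prompt as real usage examples."""
        snippets = []
        for test_file in Path(repo_root).rglob("test_*.py"):
            source = test_file.read_text()
            for node in ast.walk(ast.parse(source)):
                if isinstance(node, ast.FunctionDef) and node.name.startswith("test_"):
                    snippet = ast.get_source_segment(source, node)
                    if snippet and api_name in snippet:
                        snippets.append(snippet)
        return snippets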
Exactly, kudos to the author, because AI didn't come up with that.
But that's what they sell: that AI could do what the author did with AI.
The question is whether it's worth putting all that money and energy into AI. MS sacrificed its CO2 goals for email summaries and better autocomplete, not to mention all the useless things we do with AI.
Do you have plans to expand this to include more advanced topics like architecture-level reasoning, refactoring patterns, or onboarding workflows for large-scale repositories?
Yes! This is an initial prototype. Good to see the interest, and I'm considering digging deeper by creating more tailored tutorials for different types of projects. E.g., if we know it's web dev, we could generate tutorials based more on request flows, API endpoints, database interactions, etc. If we know it's a long-term maintained project, we can focus on identifying refactoring patterns.
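A hypothetical sketch of that tailoring step: detect the project type from its files, then pick a chapter outline to steer the prompt. The detection rules and outlines below are invented, not the project's actual logic:

    from pathlib import Path

    OUTLINES = {
        "web": ["Request flow", "API endpoints", "Database interactions", "Auth"],
        "library": ["Public API", "Core abstractions", "Extension points"],
        "long_term": ["Module boundaries", "Refactoring patterns", "Tech-debt hotspots"],
    }

    def detect_project_type(repo_root: str) -> str:
        files = {p.name for p in Path(repo_root).rglob("*") if p.is_file()}
        if {"manage.py", "routes.py", "urls.py"} & files:
            return "web"
        if {"setup.py", "pyproject.toml"} & files:
            return "library"
        return "long_term"

    chapters = OUTLINES[detect_project_type(".")]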
Have you ever seen komment.ai? If so, did you have any issues with the limitations of the product?
I haven't used it, but it looks like it's in the same space and I've been curious about it for a while.
I've tried my own homebrew solutions: creating embedding databases by having something like aider or simonw's llm produce an ingest JSON for every function, then using that as RAG in qdrant to write an architecture document, then using that for contextual inline function comments and a doxygen build, and then using all of that once again as an MCP with playwright to hook it up through roo.
It's a weird pipeline and it's been OK; not great, but OK.
I'm looking into perplexica as part of the chain, mostly as a negation tool.
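For anyone curious what the embedding step of a pipeline like that looks like, here is a minimal sketch with sentence-transformers and qdrant; the function records are made up, and this is not the commenter's actual setup:

    from qdrant_client import QdrantClient
    from qdrant_client.models import Distance, PointStruct, VectorParams
    from sentence_transformers import SentenceTransformer

    encoder = SentenceTransformer("all-MiniLM-L6-v2")  # 384-dim embeddings
    client = QdrantClient(":memory:")  # or point at a running qdrant server

    client.create_collection(
        collection_name="functions",
        vectors_config=VectorParams(size=384, distance=Distance.COSINE),
    )

    functions = [  # one record per function, e.g. produced by the ingest step
        {"name": "parse_config", "doc": "Load and validate the YAML config."},
        {"name": "run_pipeline", "doc": "Execute all pipeline stages in order."},
    ]
    client.upsert(
        collection_name="functions",
        points=[
            PointStruct(id=i, vector=encoder.encode(f["doc"]).tolist(), payload=f)
            for i, f in enumerate(functions)
        ],
    )

    hits = client.search(
        collection_name="functions",
        query_vector=encoder.encode("how is configuration loaded?").tolist(),
        limit=3,
    )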
One thing to note is that the tutorial generation depends largely on Gemini 2.5 Pro. Its code-understanding ability is very good, and combined with its large 1M context window it gets a holistic understanding of the code, which leads to very satisfactory tutorial results.
However, Gemini 2.5 Pro was released just late last month. Since Komment.ai launched earlier this year, I don't think models at that time could generate results of that quality.
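For reference, calling Gemini with a whole (small enough) repo dumped into the context looks roughly like this with the google-generativeai package; the exact model id and prompt wording here are assumptions, not necessarily what the project uses:

    import google.generativeai as genai

    genai.configure(api_key="YOUR_AI_STUDIO_KEY")
    model = genai.GenerativeModel("gemini-2.5-pro")  # model id may vary by release

    repo_text = open("repo_dump.txt").read()  # concatenated source files
    response = model.generate_content(
        "Here is the full source of a repository:\n\n"
        + repo_text
        + "\n\nWrite a chapter-by-chapter tutorial explaining its core abstractions."
    )
    print(response.text)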
I've been using llama 4 Maverick through openrouter. Gemini was my go to but I switched basically the day it came out to try it out.
I haven't switched back. At least for my use cases it's been meeting my expectations.
I haven't tried Microsoft's new 1.58 bit model but it may be a great swap out for sentencellm, the legendary all-MiniLM-L6-v2.
I've found that when I'm unfamiliar with the knowledge domain I mostly use AI, but as I dive in, the ratio of AI to human shifts until AI is at 0 and it's all human.
Basically, AI wins on day 1 but isn't any better by day 50. If that can change, then it's the next step.
Yeah, I'd recommend trying Gemini 2.5 Pro. I know the early Gemini models weren't great, but the recent one is really impressive in terms of coding ability. This project is kind of designed around that recent breakthrough.
I suppose I'm just a little bit bothered by your saying you "built an AI" when all the heavy lifting is done by a pretrained LLM. Saying you made an AI-based program or hell, even saying you made an AI agent, would be more genuine than saying you "built an AI" which is such an all-encompassing thing that I don't even know what it means. At the very least it should imply use of some sort of training via gradient descent though.
The Linux repository has ~50M tokens, which goes beyond the 1M token limit for Gemini 2.5 Pro.
I think there are two paths forward: (1) decompose the repository into smaller parts (e.g., kernel, shell, file system, etc.), or (2) wait for larger-context models with a 50M+ input limit.
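A rough sketch of option (1), assuming you split by top-level directory and estimate tokens at ~4 bytes each (a crude heuristic, not the model's real tokenizer):

    from collections import Counter
    from pathlib import Path

    def tokens_per_top_dir(repo_root: str, budget: int = 1_000_000) -> None:
        root = Path(repo_root)
        counts = Counter()
        for f in root.rglob("*"):
            if f.is_file() and f.suffix in {".c", ".h", ".py", ".rs"}:
                # Attribute the file's rough token count to its top-level directory.
                counts[f.relative_to(root).parts[0]] += f.stat().st_size // 4
        for subdir, tokens in counts.most_common():
            status = "fits" if tokens <= budget else "needs further splitting"
            print(f"{subdir}: ~{tokens:,} tokens ({status})")

    tokens_per_top_dir("linux")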
Some huge percentage of that is just drivers. The kernel is likely what would be of interest to someone in this regard; moreover, much of that is architecture specific. IIRC the x86 kernel is <1M lines, though probably not <1M tokens.
Put in the postgres or redis codebase, get a good understanding, and get going on contributing.
Like at least one other person in the comments mentioned, I would like a slightly different tone.
Perhaps a good feature would be a "style template" that can be chosen to match your preferred writing style.
I may submit a PR though not if it takes a lot of time.
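A minimal sketch of how the "style template" idea above might work as prompt prefixes; the preset names and wording are invented for illustration:

    # Hypothetical style presets prepended to the tutorial-generation prompt.
    STYLE_TEMPLATES = {
        "concise-engineer": (
            "Write for experienced engineers. No analogies, no restating basics; "
            "prefer code snippets and exact type/function names."
        ),
        "onboarding": (
            "Write for a new team member. Briefly explain domain terms and "
            "link each concept to the file where it lives."
        ),
    }

    def build_prompt(chapter_request: str, style: str = "concise-engineer") -> str:
        return f"{STYLE_TEMPLATES[style]}\n\n{chapter_request}"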
https://news.ycombinator.com/item?id=42542512
The latter is what this thread claims ^
The hype is that AI isn’t a tool but the developer.
https://news.ycombinator.com/item?id=41542497
Can you give an example of what you meant here? The author did use AI. What does "AI coming up with that" mean?
In a few years we will see complaints that it's not AI that built the power station and the datacenter, so it doesn't count either.