Show HN: I ported Tree-sitter to Go

(github.com)

128 points | by odvcencio 2 hours ago

10 comments

acedTrex 1 hour ago
Claude attempted a treesitter to go port
Better title
[-]
- gritzko 58 minutes ago
  I work on a revision control system project, except merge is CRDT. On Feb 22 there was a server break-in (I did not keep unencrypted sources on the client, server login was YubiKey only, but that is not a 100% guarantee). I reported the break-in to my Telegram channel that day.
  My design docs https://replicated.wiki/blog/partII.html
  I used tree-sitter for coarse AST. Some key parts were missing from the server as well, because I expected problems (had lots of adventures in East Asia, evil maids, various other incidents on a regular basis).
  When I saw "tree-sitter in go" title, I was very glad at first. Solves some problems for me. Then I saw the full picture.
  [-]
  - ctmnt 20 minutes ago
    Wait, are you suggesting that OP broke in to your server and stole code and is republishing it as these repos?
    I have questions. Have you reviewed the code here to see if it matches? What, more specifically, do you mean when you say someone broke in? What makes you think that this idea (which is nice but not novel) is worth stealing? If that sounds snarky, it’s not meant to; just trying to understand what’s going on. Why is that more likely than someone using Claude to vibe up some software along the same lines?
  - ctmnt 15 minutes ago
    Also, evil maids, what?
- odvcencio 1 hour ago
  well how did it do?
  [-]
  - ctmnt 17 minutes ago
    Hard to say. Claude’s very good at writing READMEs. In fact, Copilot often complains about docs that sound like they’re about current capabilities when in fact they’re future plans or just plain aspirational.
    Without downloading and testing out your software, how can we know if it’s any good? Why would we do that if it’s obviously vibed? The dilemma.
    I’m not at all against vibe coding. I’m just pointing out that having a nice README is trivial. And the burden of proof is on you.
  - wocram 57 minutes ago
    Shouldn't you be able to answer that?
    [-]
    - odvcencio 56 minutes ago
      yes and if you clicked the links you would know that i did answer it in the readme.
      [-]
      - mathfailure 44 minutes ago
        But how do we know the readme isn't also vibecoded?
      - do_not_redeem 48 minutes ago
        > Pure-Go tree-sitter runtime — no CGo, no C toolchain, WASM-ready.
        No you didn't. The readme is obvious LLM slop. Em-dash, rule of three, "not x, y". Why should anyone spend effort reading something you couldn't be bothered to write? Why did you post it to HN from a burner account?
- red_hare 58 minutes ago
  How is OP using Claude relevant?
  [-]
  - ks2048 17 minutes ago
    People should say what models/tools they used and even show the prompts.
  - gritzko 50 minutes ago
    OK for prototyping. Not OK for prod use if no one actually read it line by line.
    [-]
    - odvcencio 47 minutes ago
      i am trying to not take issue with this comment because im aware of the huge stigma around ai generated code.
      i needed this project so i made it for my use case and had to build on top of it. the only way to ensure quality is to read it all line by line.
      if you give me code that you yourself have not reviewed i will not review it for you.
    - znpy 33 minutes ago
      That ship has sailed, man…
      [-]
      - overfeed 8 minutes ago
        No it has not - if it had, there'd be no need to shout down folk who disagree.
        Not everyone buys into the inevitabilism. Why should I read code "author" didn't bother to write?
  - DeepYogurt 58 minutes ago
    maintenance burden
  - IshKebab 45 minutes ago
    AI often produces nonsense that a human wouldn't. If a project was written using AI the chances that it is a useless mess are significantly higher than if it was written by a human.
sluongng 2 hours ago
Oh this is really neat for the Bazel community, as depending on tree-sitter to build a gazelle language extension, with Gazelle written in Go, requires you to use CGO.
Now perhaps we can get rid of the CGO dependency and make it pure Go instead. I have pinged some folks to take a look at it.
[-]
- odvcencio 2 hours ago
  thanks so much for the note! i really appreciate it. i built this precisely for folks like yourself with this specific pain, thanks again!
3rly 2 hours ago
Wouldn't `got` be confused with OpenBSD's Got? https://gameoftrees.org/index.html
[-]
- odvcencio 2 hours ago
  oh wow! i really thought i was being too clever but i shouldve assumed nothing new under the sun. well im taking name suggestions now!
  [-]
  - allknowingfrog 1 hour ago
    Well, find and sed have modern "fd" and "sd" alternatives. Naming it "gt" allows you to claim that your version saves 33% compared to typing "git".
  - boobsbr 1 hour ago
    Goty McGotface
  - Imustaskforhelp 1 hour ago
    uGOT / uGOTme? (sort of like the idea behind uTorrent) but I will agree that sbankowi's idea of Yet another got is great as well. +1 to that as well.
  - sbankowi 1 hour ago
    YAGOT (Yet Another GOT)
    [-]
    - bityard 1 hour ago
      Probably taken already, better use YAGOT-NG (Next Generation) just to be safe.
      [-]
      - himata4113 1 hour ago
        might be taken too so just YAGOT2 would work
trickypr 54 minutes ago
Do you have an equivalent of TreeCursors or tree-sitter-generate?
There are at least some use cases where neither queries nor walks are suitable. And I have run into cases where being able to regenerate and compile grammars on the fly is immeasurably helpful.
At least for my use cases, this would be unusable.
Also, what the hell is this:
> partial [..] missing external scanner
Why do you have a parsing mode that guarantees incorrect outputs on some grammars (html comes to mind) and then use it as your “90x faster” benchmark figure?
[-]
- odvcencio 49 minutes ago
  the 90x figure is on Go source for apples to apples against CGO bound tree-sitter.
  your use case is not one i designed for although yeah maybe the readme has some sections too close. the only external scanner missing atm is norg. now that i know your use case i can probably think of a way to close it
  [-]
  - trickypr 23 minutes ago
    So your benchmarks are primarily just “how fast is go’s c interop” rather than any algorithmic improvement on tree-sitter?
    Edit: yep, you are just calling a c function in a loop. So your no-op benchmark is just the time it takes for cgo to function. I would not be able to get any perf benefits from e.g. rust
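The objection above comes down to simple arithmetic: if each call across a boundary pays a fixed cost, a benchmark that does almost no work per call mostly measures that cost. A minimal sketch, assuming a fixed per-call overhead; the function name `speedup` and all numbers are hypothetical and are not taken from the project's benchmarks.

```go
package main

import "fmt"

// speedup returns how many times faster a direct call appears compared with
// a call that does the same work but also pays a fixed boundary overhead.
// Both arguments are in nanoseconds; the values used below are hypothetical.
func speedup(overheadNs, workNs float64) float64 {
	return (overheadNs + workNs) / workNs
}

func main() {
	const overhead = 89.0 // assumed fixed cost per boundary crossing

	// As the useful work per call grows, the apparent speedup shrinks.
	for _, work := range []float64{1, 100, 10000} {
		fmt.Printf("work=%6.0fns apparent speedup=%.2fx\n", work, speedup(overhead, work))
	}
}
```

With near-zero work per call the ratio is dominated by the overhead term; with realistic parsing work per call it approaches 1x, which is why the size of the benchmark input matters when reading such a figure.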
shayief 57 minutes ago
This is great, I was looking for something like this, thanks for making this!
I imagine this can be very useful for Go-based forges that need syntax highlighting (e.g. Gitea, Forgejo).
I have a strict no-cgo requirement, so I might use it in my project, which is Git+JJ forge https://gitncoffee.com.
[-]
- odvcencio 54 minutes ago
  thank you for the kind words! Very cool project! Very happy you can find some utility in it
conartist6 1 hour ago
It looks like porting the custom C lexers was a big part of the trouble you had to go through to do this.
[-]
- odvcencio 1 hour ago
  yes basically about 70% of the engineering effort was spent porting the external scanners and ensuring parity with original (C) tree-sitter
gritzko 1 hour ago
That is very very interesting. I work on a similar project https://replicated.wiki/blog/partII.html
I use CRDT merge though, cause 3-way metadata-less merges only provide very incremental improvements over e.g. git+mergiraf.
How do you see got's main improvement over git?
[-]
- odvcencio 1 hour ago
  primarily, got is structural VCS intended for concurrent edits of the same file.
  it does this via gotreesitter and gts-suite abstractions that enable it to have: entity-aware diffs (not line by line but function by function); structural blame (attribution resolution for the lifetime of the entity); semver from structure (it can recommend bumps because it knows what is breaking change vs minor vs patch); and entity history (because entities are tracked independently, file renames or moves dont affect the entity's history).
  when gotreesitter cant parse a language, the 3way text merge happens as a fallback. what the structural merge enables is no conflicts unless same entity has conflicting changes
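The "no conflicts unless same entity has conflicting changes" rule described above can be sketched as a three-way merge keyed by entity rather than by line. This is only an illustration under stated assumptions: the `Entities` type and `mergeEntities` function are hypothetical, entities are reduced to name/text pairs, and nothing here reflects how got actually implements its merge.

```go
package main

import (
	"fmt"
	"sort"
)

// Entities maps a hypothetical entity name (for example a function) to its text.
type Entities map[string]string

// mergeEntities performs a three-way merge per entity. An entity conflicts
// only when both sides changed it relative to base and disagree on the result.
// Conflicting entities are left out of merged and reported by name.
func mergeEntities(base, ours, theirs Entities) (merged Entities, conflicts []string) {
	merged = Entities{}
	names := map[string]bool{}
	for _, m := range []Entities{base, ours, theirs} {
		for n := range m {
			names[n] = true
		}
	}
	for n := range names {
		b, inB := base[n]
		o, inO := ours[n]
		t, inT := theirs[n]
		oursChanged := inO != inB || o != b
		theirsChanged := inT != inB || t != b
		switch {
		case !oursChanged: // only theirs changed (or nobody did): take theirs
			if inT {
				merged[n] = t
			}
		case !theirsChanged: // only ours changed: take ours
			if inO {
				merged[n] = o
			}
		case inO == inT && o == t: // both sides made the same change
			if inO {
				merged[n] = o
			}
		default: // both changed the same entity differently
			conflicts = append(conflicts, n)
		}
	}
	sort.Strings(conflicts)
	return merged, conflicts
}

func main() {
	base := Entities{"Parse": "v1", "Render": "v1"}
	ours := Entities{"Parse": "v2", "Render": "v1"}   // we edited Parse
	theirs := Entities{"Parse": "v1", "Render": "v2"} // they edited Render

	merged, conflicts := mergeEntities(base, ours, theirs)
	fmt.Println(merged["Parse"], merged["Render"], len(conflicts))
}
```

In this sketch two edits to different entities in the same file merge cleanly, while a line-based merge might flag them if the edited regions were adjacent.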
  [-]
  - gritzko 1 hour ago
    I think I understand the situation.
  - odvcencio 1 hour ago
    gah, sincere apologies for the formatting of this post. i have been on HN for basically 10 years now without ever having made a post (:
    [-]
    - dorianmariecom 1 hour ago
      use four spaces "    " in front of a line for <pre> formatting
      like "    this"
      [-]
      - srcreigh 1 hour ago
        It's 2 or more spaces, not four
jbreckmckye 1 hour ago
Interesting. I have a similar use case but intended to use CGo tree-sitter with Zig
Are these pretty up-to-date grammars? I'm awfully tempted to switch to your project
How large are your binaries getting? I was concerned about the size of some of the grammars
[-]
- odvcencio 1 hour ago
  206 binary blobs = 15MB, so not crazy but i built for this use case where you can declare the registry of languages you want to load and not have to own all the grammar binaries by default
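The opt-in registry idea mentioned above might look roughly like the following. This is a minimal sketch, not the project's API: `Grammar`, `Registry`, `Register`, `Get`, and `Languages` are hypothetical names, and the one-byte blobs stand in for real grammar data.

```go
package main

import (
	"fmt"
	"sort"
)

// Grammar stands in for a compiled grammar blob. Hypothetical type.
type Grammar struct {
	Name string
	Blob []byte
}

// Registry holds loaders only for the languages the caller opted into.
type Registry struct {
	loaders map[string]func() Grammar
	cache   map[string]Grammar
}

// NewRegistry returns an empty registry with no languages declared.
func NewRegistry() *Registry {
	return &Registry{
		loaders: map[string]func() Grammar{},
		cache:   map[string]Grammar{},
	}
}

// Register records how to load a grammar without loading it yet.
func (r *Registry) Register(name string, load func() Grammar) {
	r.loaders[name] = load
}

// Get loads a registered grammar on first use and caches the result.
func (r *Registry) Get(name string) (Grammar, bool) {
	if g, ok := r.cache[name]; ok {
		return g, true
	}
	load, ok := r.loaders[name]
	if !ok {
		return Grammar{}, false
	}
	g := load()
	r.cache[name] = g
	return g, true
}

// Languages lists the registered names in sorted order.
func (r *Registry) Languages() []string {
	names := make([]string, 0, len(r.loaders))
	for n := range r.loaders {
		names = append(names, n)
	}
	sort.Strings(names)
	return names
}

func main() {
	r := NewRegistry()
	r.Register("go", func() Grammar { return Grammar{Name: "go", Blob: []byte{0x01}} })
	r.Register("json", func() Grammar { return Grammar{Name: "json", Blob: []byte{0x02}} })

	fmt.Println(r.Languages())
	_, ok := r.Get("html") // never registered, so never loaded
	fmt.Println(ok)
}
```

In a real Go program any binary-size saving would presumably come from importing only the packages that embed the grammars you register; a single-file sketch cannot show that part.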
  [-]
  - jbreckmckye 1 hour ago
    If all the languages together add up to 15MB that is a game changer for me.
    It means the CLI I am working on can ship support for many languages whilst still being a smallish (sub 50mb) download
    I shall definitely check it out!
    [-]
    - odvcencio 1 hour ago
      re: up to date grammars, yes i found the official grammars in use by the original tree-sitter library today
skybrian 1 hour ago
How about making 'got' compatible with git repos like jujutsu? It would be a lot easier to try out.
[-]
- odvcencio 1 hour ago
  it is interoperable with git. we like git when its good but attempted to ease the pains in UX somewhat. you can take advantage of got locally but still push it to git remote forges just the same. when you pull stuff in this way, got will load the entity history into the git repo ensuring that you can still do got stuff locally (inspect entity histories, etc)
irishcoffee 1 hour ago
Is it a go-ism that source for implementation and test code lives in the root of the repo or is this an LLM thing?
[-]
- odvcencio 1 hour ago
  yeah the tests live with the implementation code always (Go thing) and the repo root thing is like a preference, main is an acceptable package to put stuff in (Go thing), i see this a lot with smaller projects or library type projects