Datadog, thank you for blocking us

(deductive.ai)

73 points | by binarylogic 1 day ago

11 comments

  • reactordev 6 hours ago
    Big fan of SigNoz: OTel-native, self-hosted, Prometheus-based, works with Grafana, and it scales (a minimal export sketch follows below).

    https://www.signoz.io

    Datadog is good, Sentry too, but after running a cloud practice for a major global business, I prefer to keep my sensitive system logs and traces in-house.
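
    For reference, here is a minimal sketch of the kind of self-hosted OTel setup being described: a Python service exporting traces over OTLP to a collector you run yourself. This is only an illustration, not anyone's actual configuration; the endpoint and service name are assumptions, and SigNoz-style backends typically expose OTLP over gRPC on port 4317.

        # Minimal OpenTelemetry tracing export to a self-hosted OTLP collector.
        # Assumes: pip install opentelemetry-sdk opentelemetry-exporter-otlp
        # "localhost:4317" and "demo-service" are illustrative values only.
        from opentelemetry import trace
        from opentelemetry.sdk.resources import Resource
        from opentelemetry.sdk.trace import TracerProvider
        from opentelemetry.sdk.trace.export import BatchSpanProcessor
        from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

        # Point the SDK at the locally run collector instead of a vendor SDK endpoint.
        provider = TracerProvider(resource=Resource.create({"service.name": "demo-service"}))
        provider.add_span_processor(
            BatchSpanProcessor(OTLPSpanExporter(endpoint="http://localhost:4317", insecure=True))
        )
        trace.set_tracer_provider(provider)

        tracer = trace.get_tracer("demo")
        with tracer.start_as_current_span("checkout"):
            pass  # application work; spans land in whichever backend sits behind the collector

    Swapping backends then mostly means repointing the collector, not rewriting application code, which is the portability argument commenters in this thread are making for OTel over vendor SDKs.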

  • eddythompson80 9 hours ago
    SRE agents are the worst agents. I totally get why business and management will demand them and love them. After all, they are the n+1 of the customer-support chatbot that you get frustrated talking to before you find the magic way to reach a person.

    We have been using a few different SRE agents and they all fucking suck. The way they are promoted and run always makes them eager to “please” by inventing processes, services, and workarounds that don’t exist or make no sense. Giving examples will always sound petty or “dumb”: every time I have to explain to management where the SRE agent failed, they just hand-wave it and assume it’s a small problem. And the problem is, I totally get it.

    When the SRE agent says “DNS propagation issues are common. I recommend flushing the DNS cache or trying again later” or “The edge proxy held a bad cache entry. The cache will eventually get purged and the issue should resolve itself”, it sounds so reasonable and “smart”. The issue was in DNS or in the proxy configuration; how smart was the SRE agent to get there? They think it’s phenomenal, and maybe it is. But I know that the “DNS issue” isn’t gonna resolve itself, because we have a bug in how we update DNS. I know the edge proxy cache issue is always gonna cause a particular use case to fail, because the way cache invalidation is implemented has a bug.

    Everyone loves deflection (including me) and “self-correcting” systems. But it just means that a certain class of bugs will forever be “fine”, and maybe that’s fine. I don’t know anymore.

    • Eridrus 8 hours ago
      I have no personal experience with the SRE agents, but I used Codex recently when trying to root-cause an incident after we'd put in a stopgap. It did the last-mile debugging for me: once I had assembled a set of facts and log lines, it looked through the code and accurately pointed me to some code I had ignored in my mental model because it was so trivial I didn't think it could be the issue.

      That experience made me think we're getting close to SRE agents being a thing.

      And as the LLM makers like to reiterate, the underlying models will get better.

      Which is to say, I think everyone should have some humility here, because how useful these systems end up being is very uncertain. That of course applies just as much to the execs who are ingesting the AI hype.

    • kevmo314 8 hours ago
      That’s my experience working with most SRE humans too. They’re more than happy to ignore the bug in DNS and build a cron job to flush the cache every day instead.

      So in some sense the agent is doing a pretty good job…

    • BuildItBusk 7 hours ago
    I guess that depends on how you use agents (SRE or in general). If you ask it a question (even implicitly) and blindly trust the answer, I agree. But if you have it help you find the needle in the haystack, and then verify that it did indeed find the needle, suddenly it’s a powerful tool.
    • 0xbadcafebee 8 hours ago
      Have you used Amazon Q? It's actually pretty handy at investigating, diagnosing, and providing solutions for AWS issues. For some reason none of our teams use it, and waste their time googling or opening tickets for me to answer. I go to Q and ask it, it provides the answer, I send it back to the user. I don't think an "SRE Agent" will be useful because it's too generic, but "Agent customized to solve problems for one specific product/service/etc" can actually be very useful.

      That said, I think you're right that you can't really replace an Operations staff, as there will always need to be a human making complex, multi-dimensional decisions around constantly changing scenarios, in order to keep a business operational.

  • matrix12 8 hours ago
    They suck in all your data. Then charge you minibar prices to access it.
  • cebert 9 hours ago
    Wow, BitsAI in Datadog isn’t even good. I didn’t realize Datadog considered it a genuine product offering rather than a mere gimmick.
  • petesergeant 7 hours ago
    If you're looking for somewhere to pipe your logs, Axiom's been great and very cheap.
  • BoorishBears 9 hours ago
    Seems a bit too perfect that the AI SRE gets unfairly blocked
  • antonvs 9 hours ago
    Edit: a couple of comments pointed out that the blog does mention paying Datadog. Leaving my comment as is below, because I still find the whole interaction weird. It makes me wonder if the story is fabricated.

    > we lost visibility into production systems that depend fundamentally on continuous observability signals to operate safely.

    The Datadog message implies that Deductive wasn't paying for any service from Datadog: "We've noticed you're actively evaluating Datadog" and "our Master Subscription Agreement that you accepted by using our service".

    And Deductive apparently did this from Feb to Dec 2025. Quite a long time for a free evaluation, but perhaps they were just using the very limited free tier?

    It's a little strange to be relying on a free tier or evaluation for "production systems that depend fundamentally on continuous observability". Presumably it couldn't have been that important to Deductive, otherwise they would have paid for the service they were "depending fundamentally" on.

    • valinator 9 hours ago
      > Two things, however, were consistently true throughout our usage. First, our Datadog bills were steep, roughly 2-3x of what we would otherwise expect to pay for equivalent telemetry storage and retention. Second, despite the richness of the platform, we rarely used Datadog for anything beyond being a reliable system of record for logs, metrics, and traces. We were paying for workflows we almost never touched.

      This paragraph from the article makes it clear otherwise.

    • chanux 9 hours ago
      > First, our Datadog bills were steep, roughly 2-3x of what we would otherwise expect to pay for equivalent telemetry storage and retention.

      The blog says they were paying(?)

      • antonvs 9 hours ago
        Thanks, I missed that. Strange message from Datadog then. Assuming it’s real.
    • Nextgrid 9 hours ago
      It may very well be a canned message since such issues would typically be detected during the evaluation period.
      • antonvs 8 hours ago
        It could also be a fabricated story.
    • solid_fuel 9 hours ago
      > otherwise they would have paid for the service they were "depending fundamentally" on.

      It's a "*.ai" company. Deductive probably spent more human time on their fancy animated landing page than engineering their actual system. If they vibe coded most of their product, I wouldn't be surprised if they didn't even know they were using Datadog until they got the email.

      • antonvs 5 hours ago
        Yeah, I think this explanation makes the most sense. The story doesn’t add up; it’s just deceptive marketing.
  • Morromist 9 hours ago
    This was clearly written by a bot heavily trained on LinkedIn posts, or by someone horrifically addicted to LinkedIn. It’s nauseating to read.
    • stevage 9 hours ago
      Yeah, there's a lot of repetition of the core thesis and not much about the interesting part: how they switched.
    • nkrisc 9 hours ago
      Even the crappy AI images are distracting.

      Why does the gate say “blocked” if the stuff is clearly still flowing through it? Having no image is better than shitty ones.

    • selestify 9 hours ago
      Why do you say that? It didn’t come across that way to me at all, perhaps because I don’t spend much time on LinkedIn. But even granting that, the content presented was interesting and useful.
      • Morromist 8 hours ago
        The big tell is Blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah

        Hype Phrase! (but like super over the top) "That world no longer exists." "code is cheap" "That future is already here."

        repeat.

        Very AI. AI loves to summarise stuff in cheesy little bits like that. Very LinkedIn. Very bad writing too.

      • chrneu 8 hours ago
        The wording and repetition made me think this was likely, at minimum, written by a non-English speaker who used AI to translate it.

        But looking back at it with an AI nose on, it does have a ton of AI-slop feel to it. LinkedIn slop is kind of interesting to read. The repetition is pretty obvious, though. This reads like it was written for a 10th-grade English class trying to fit a very specific structure, like every section had to check off a list of requirements, which it did because it's AI.

    • sph 2 hours ago
      - Clickbait title

      - AI mentioned in the subtitle

      - AI generated image

      Straight to /dev/null with this slop.

  • echelon 9 hours ago
    Logging, tracing, observability, and control plane (flags, etc.) should be open.

    We built 100% in-house pieces for all of this at a major fintech a decade ago. Everything worked and single teams could manage these systems.

    Someone in leadership said we had to get rid of all "weirdware". Open solutions weren't robust, so we went commercial.

    SignalFx got acquired and immediately 10x'd our prices, so we put all hands on deck to migrate. Unscheduled, stressful, bullshit. We missed the migration date and had to pay anyway.

    LaunchDarkly promised us the moon to replace the system my team built. It didn't work with Ruby or Go, and the Java client sucked. It couldn't sync online changes at runtime the way our five-nines, distributed, fault-tolerant system could. We had to upstream a ton of code, and their system still sucked by the time I left the project.

    These systems need to be open and owned by us. Managed is okay, but they shouldn't be proprietary offerings.

    I could extend that one step further to cloud itself, but that's an argument for another day.

    • pm90 8 hours ago
      > I could extend that one step further to cloud itself, but that's an argument for another day

      Absolutely. OSS platforms like k8s have gotten us a long way. OpenStack was the dream (deeply flawed in execution). If we want to seriously talk about resilience, we can’t accept that almost all major clouds run proprietary systems and that we just have to trust they’ll be around forever.

    • 0xbadcafebee 8 hours ago
      NIH syndrome isn't sustainable, unless you're like Google and have more money than sense.

      > These systems need to be open and owned by us. Managed is okay, but they shouldn't be proprietary offerings.

      You could say this about all software in the world, but good luck with that... people who make money off of making things and selling things are going to keep doing so in non-open ways, because it's advantageous. And customers will keep buying them, because it's better than the alternative.

      • zdc1 8 hours ago
        My last place also rolled their own feature flag service as their business logic around users/orgs/segments didn't neatly match anything off-the-shelf. It did what it was meant to and worked fine. OTOH we used Datadog for telemetry, which was expensive but made sense since we didn't have enough headcount with the skills to support something self-hosted.

        At the end of the day, you just need to make good decisions based on honest analysis of your needs, capabilities, and general context.

      • echelon 7 hours ago
        > NIH syndrome isn't sustainable, unless you're like Google and have more money than sense.

        Control plane and observability are key concerns of a fintech handling billions in daily transaction volume.

        We had teams building and managing our solutions. After the migrations, we had teams managing the integrations. The headcount didn't change; we just wound up paying external vendors and sequencing multiple provider moves and company-wide migrations. The changes caused several outages and shifted OKRs.