• 0 Posts
  • 22 Comments
Joined 2 years ago
Cake day: July 5th, 2023

  • JavaScript seems like the wrong tool for this. The HTTP server itself can usually be configured to serve alternative versions of an image (including different formats) based on what the browser supports: JXL where it’s supported, WebP as the first fallback, and JPEG when WebP isn’t supported either.

    And increased server-side adoption of JXL can run up the stats, encouraging the Chromium team to resume support for JXL and the Firefox team to promote support out of Nightly (where it currently sits behind a flag), especially because one of the most popular competing browsers (Safari on Apple devices) already supports JXL. A rough sketch of that server-side negotiation is below.
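    To make the negotiation concrete, here’s a minimal Python sketch of the decision the server makes per request. The side-by-side file layout (photo.jxl / photo.webp / photo.jpg) and the assumption that the client advertises formats like image/jxl or image/webp in its Accept header are my illustrative choices, not anything from the comment above; real deployments typically express the same logic in nginx or Apache configuration rather than application code.

    ```python
    import os

    # Preference order: most efficient format first, JPEG as the universal fallback.
    CANDIDATES = [
        ("image/jxl", ".jxl"),
        ("image/webp", ".webp"),
        ("image/jpeg", ".jpg"),
    ]

    def pick_variant(base_path: str, accept_header: str) -> tuple[str, str]:
        """Return (file_path, mime_type) for the best variant that exists on disk.

        base_path:     path without extension, e.g. "static/photo" (hypothetical layout)
        accept_header: the request's Accept header,
                       e.g. "image/jxl,image/webp,image/apng,*/*;q=0.8"
        """
        # Collect the bare MIME types the client claims to accept.
        accepted = {part.split(";")[0].strip() for part in accept_header.split(",")}
        for mime, ext in CANDIDATES:
            if mime in accepted and os.path.exists(base_path + ext):
                return base_path + ext, mime
        # JPEG is usually covered by */* rather than listed explicitly.
        return base_path + ".jpg", "image/jpeg"

    # pick_variant("static/photo", "image/webp,*/*;q=0.8")
    # -> ("static/photo.webp", "image/webp") if static/photo.webp exists
    ```

    Whatever serves the response should also send a Vary: Accept header so shared caches don’t hand the JXL variant to a browser that never asked for it.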


  • It’s not too late.

    The current standard on the web for photographic images is JPEG. Everyone agrees that it’s inefficient in terms of quality per file size, and that its 8-bit RGB support isn’t enough for higher dynamic range or transparency. So the different stakeholders have been exploring modern formats for different purposes:

    WebP is open source and royalty free, and has wide support, especially from Google (which controls a major image search engine and the dominant web browser). It’s more efficient than JPEG and PNG for both lossy and lossless compression. But it’s 15 years old and showing its age as cameras capture better dynamic range than the 8-bit limits of WebP (or JPEG, for that matter). It’s still being updated, so things like transparency have been added, but those additions aren’t supported by all WebP software.

    AVIF supports HDR and has even better file-size efficiency than WebP. It’s also open source and royalty free, and is maintained by the Linux Foundation (for those who prefer a format controlled by a nonprofit). It supports transparency and animation out of the box, so it doesn’t run into the same partial-support issues as WebP. One drawback is that AVIF requires a bit more computational power to encode and decode.

    HEIC is more efficient than JPEG and supports high bit depth and transparency, but it’s encumbered by patents, so support requires royalty payments. The only reason it’s in the conversation is that it has extensive hardware-acceleration support by virtue of its reliance on the HEVC/H.265 codec, and because it’s Apple’s default format for new pictures taken by iPhone/iPad cameras.

    JPEG XL has the best of all possible worlds. It supports higher bit depths, transparency, animation, and lossless compression. It’s open source and royalty free. And most importantly, it has a dedicated compression path for taking existing JPEG images and losslessly shrinking their file size (sketched after this comment). That’s really important for the vast majority of digitally stored images, because people tend to only have the compressed JPEG version. Encoding and decoding are less computationally intensive than WebP or AVIF. It’s a robust enough standard not just for web images, but for raw camera captures (potentially replacing DNG and similar formats), raw document scans and other captured imagery (replacing TIFF), and large-scale printing (where TIFF is still often in the workflow).

    So even as WebP, AVIF, and HEIC show up in more and more places, the constant push forward still lets JXL compete on its own merits. If nothing else, JXL is the only drop-in replacement where web servers can silently serve the JXL version of a file when supported, even if the “original” image uploaded to the site was a JPEG, with basically zero drawbacks. And beyond the web, the technical advantages might support whole workflows in JXL, from capture to processing to printing.
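    On the lossless-JPEG-recompression point, here’s a minimal sketch of what that workflow can look like using the libjxl reference tools (cjxl/djxl) from Python. The file names are placeholders and the exact CLI defaults can differ between libjxl versions, so treat it as an illustration rather than a recipe.

    ```python
    import filecmp
    import subprocess

    def jpeg_to_jxl(src_jpg: str, dst_jxl: str) -> None:
        # When given a JPEG, cjxl can store JPEG reconstruction data instead of
        # re-encoding the pixels, which is what makes the transcode reversible.
        subprocess.run(["cjxl", src_jpg, dst_jxl], check=True)

    def roundtrip_is_bit_exact(src_jpg: str, jxl_path: str, rebuilt_jpg: str) -> bool:
        # djxl can rebuild the original JPEG from that reconstruction data.
        subprocess.run(["djxl", jxl_path, rebuilt_jpg], check=True)
        # Byte-for-byte comparison: True means nothing was lost in the transcode.
        return filecmp.cmp(src_jpg, rebuilt_jpg, shallow=False)

    if __name__ == "__main__":
        jpeg_to_jxl("photo.jpg", "photo.jxl")
        print("bit-exact:", roundtrip_is_bit_exact("photo.jpg", "photo.jxl", "roundtrip.jpg"))
    ```

    The same idea is what lets a server keep a single JPEG “original” and transparently hand out the smaller JXL to browsers that can take it.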






  • I was a dual major in Electrical Engineering and Philosophy. The rigorous logic in some branches of philosophy was very helpful for programming principles. And the philosophy of mathematics and philosophy of mind overlap with and supplement modern AI theory pretty well.

    I’m out of the tech world now but if I were hiring entry level software developers, I’d consider a philosophy degree to be a plus, at least for people who have the threshold competency in actual programming.






  • to decide for what purpose it gets used for

    Yeah, fuck everything about that. If I’m a site visitor, I should be able to do what I want with the data you send me. If I bypass your ads, or use your words to write a newspaper article that you don’t like, tough shit. Publishing information means giving up control over what happens to it once it leaves your hands.

    Don’t like it? Make me sign an NDA. And even then, violating an NDA isn’t a crime, much less a felony punishable by years of prison time.

    Interpreting the CFAA to cover scraping is absurd and draconian.


  • What counts as an algorithm? Surely it can’t be the actual definition of an algorithm.

    Because in most forum software (even the older stuff that predates reddit or social media), clicking on a username fetches from the database every comment that user has ever made, usually sorted in reverse chronological order (roughly the query sketched below). That technically fits the definition of an algorithm, and it presents the user’s authored content in a manner that correlates the comments with the same user, regardless of where they originally appeared (in specific threads).

    So if it generates a webpage that shows the person once made a comment in a cooking subreddit that says “I’m a Muslim and I love the halal version” next to a comment posted to a college admissions subreddit that says “I graduated from Harvard in 2019” next to a comment posted to a gardening subreddit that says “I live in Berlin,” does reddit violate the GDPR by assembling this information all in one place?
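    For concreteness, the entire “algorithm” at issue is roughly a single filtered, sorted query. This sketch uses a hypothetical schema and Python’s built-in sqlite3; real forum software obviously differs, but not in any way that matters for the argument.

    ```python
    import sqlite3

    def comments_for_user(db_path: str, username: str):
        """Fetch every comment a user has ever made, newest first.

        Hypothetical schema: comments(author, community, body, created_at).
        """
        conn = sqlite3.connect(db_path)
        try:
            return conn.execute(
                "SELECT community, body, created_at "
                "FROM comments WHERE author = ? "
                "ORDER BY created_at DESC",
                (username,),
            ).fetchall()
        finally:
            conn.close()

    # comments_for_user("forum.db", "some_user")
    # -> [("cooking", "I'm a Muslim and I love the halal version", ...), ...]
    ```

    That single ORDER BY query is the whole “profile page algorithm.”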



  • To be precise, the “lossless” mode is still a compression algorithm. They just didn’t implement the steps that actually make it lossless.

    From the write up:

    JBIG2, the image format used in the affected PDFs, usually has lossless and lossy operation modes. “Pattern Matching & Substitution” (PM&S) is one of the standard operation modes for lossy JBIG2, and “Soft Pattern Matching” (SPM) for lossless JBIG2 (read here or read the paper by Paul Howard et al.). In the JBIG2 standard, the named techniques are called “Symbol Matching”.

    PM&S is lossy, SPM is lossless. Both operation modes have the basics in common: images are cut into small segments, which are grouped by similarity. For each group, only a representative segment is saved and reused in place of the other group members, which may cause character substitution. Unlike PM&S, SPM corrects such errors by additionally saving difference images containing the differences between the reused symbols and the original image. This correction step seems to have been left out by Xerox.
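    To make the difference concrete, here’s a toy sketch of the two modes. This is nothing like the real JBIG2 bitstream; the patch representation (equal-size binary numpy arrays) and the match() similarity test are assumptions for illustration. The point is where the correction step that Xerox left out actually sits.

    ```python
    import numpy as np

    def encode(segments, match):
        """Toy JBIG2-style symbol matching (illustration only, not the real codec).

        segments: list of small binary patches (equal-shape numpy bool arrays)
        match(a, b): True if two patches are "similar enough" to share a symbol
        """
        symbols = []    # representative patches (the symbol dictionary)
        refs = []       # per-segment index into `symbols`
        residuals = []  # per-segment correction data (SPM / lossless mode only)

        for seg in segments:
            # Reuse an existing representative if this patch matches one.
            idx = next((i for i, s in enumerate(symbols) if match(seg, s)), None)
            if idx is None:
                idx = len(symbols)
                symbols.append(seg)
            refs.append(idx)
            # PM&S (lossy) stops here: the representative silently replaces `seg`,
            # which is how a 6 can turn into an 8 on the scanned page.
            # SPM (lossless) also stores the difference image so the original
            # patch can be reconstructed exactly:
            residuals.append(np.logical_xor(seg, symbols[idx]))

        return symbols, refs, residuals

    def decode(symbols, refs, residuals=None):
        """Rebuild the patches; without residuals this is the lossy result."""
        out = []
        for i, idx in enumerate(refs):
            patch = symbols[idx]
            if residuals is not None:
                patch = np.logical_xor(patch, residuals[i])  # undo any substitution
            out.append(patch)
        return out
    ```

    Skipping the residuals is, in effect, the step Xerox omitted: the decoder then has no way to tell a substituted glyph from the real one.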





  • That doesn’t logically follow, so no, that would not make an ad blocker unauthorized under the CFAA.

    The CFAA also criminalizes “exceeding authorized access” in every place it criminalizes accessing without authorization. My position is that mere permission (in a colloquial sense, not necessarily technical IT permissions) isn’t enough to define authorization. Social expectations and even contractual restrictions shouldn’t be enough to define “authorization” in this criminal statute.

    To purposefully circumvent that access would be considered unauthorized.

    Even as a normal non-bot user who sees the Cloudflare landing page because they’re on a VPN or happen to share an IP address with someone who was abusing the network? No, circumventing those gatekeeping functions is no different from circumventing a paywall on a newspaper website by deleting cookies or something. Or using a VPN or relay to get around rate limiting.

    The idea of criminalizing scrapers or scripts would be a policy disaster.


  • gaining unauthorized access to a computer system

    And my point is that defining “unauthorized” to include visitors using unauthorized tools/methods to access a publicly visible resource would be a policy disaster.

    If I put a banner on my site that says “by visiting my site you agree not to modify the scripts or ads displayed on the site,” does that make a visit with an ad blocker “unauthorized” under the CFAA? I think the answer should obviously be “no,” and that the way to define “authorization” is whether the website puts up some kind of login/authentication mechanism to block or allow specific users, not whether it posts a simple request asking the visiting public to please respect the rules of the site.

    To me, a robots.txt is more like a friendly request to unauthenticated visitors than it is a technical implementation of some kind of authentication mechanism.

    Scraping isn’t hacking. I agree with the Third Circuit and the EFF: If the website owner makes a resource available to visitors without authentication, then accessing those resources isn’t a crime, even if the website owner didn’t intend for site visitors to use that specific method.