mirror of https://github.com/thegeneralist01/archivr synced 2026-05-30 08:36:47 +02:00
TheGeneralist cd7dfd7c8a
Merge pull request #3 from thegeneralist01/codex/feat/archiving-twitter-threads
feat: add generic media source handling and local file archiving
2026-04-03 14:46:16 +02:00
docs Flatten tweet archives and rearchive tweet assets 2026-04-01 14:56:39 +02:00
src Rename resolve_from_cwd to absolutize_path 2026-04-02 21:13:55 +02:00
vendor/twitter Add Twitter tweet and thread archiving support 2026-03-31 21:25:24 +02:00
.gitignore Add Twitter tweet and thread archiving support 2026-03-31 21:25:24 +02:00
Cargo.lock feat: add archiving of platform media files (#1) 2026-03-31 12:39:35 +02:00
Cargo.toml feat: add archiving of platform media files (#1) 2026-03-31 12:39:35 +02:00
flake.lock feat: add archiving of platform media files (#1) 2026-03-31 12:39:35 +02:00
flake.nix Add Twitter tweet and thread archiving support 2026-03-31 21:25:24 +02:00

archivr

An open-source self-hosted archiving tool. Work in progress.

Milestones

  • Archiving
    • Archiving media files from social media platforms
      • YouTube Videos
      • Twitter Videos
      • Instagram
      • Facebook
      • TikTok
      • Reddit
      • Snapchat
      • YouTube Posts (postponed)
    • Archiving local files
    • Archiving files from cloud storage services (Google Drive, Dropbox, OneDrive) and from URLs
      • URLs
      • Google Drive
      • Dropbox
      • OneDrive
      • (Some of these may be postponed.)
    • Archiving Twitter threads
    • Archiving web pages (HTML, CSS, JS, images)
    • Archiving emails (tentative)
      • Gmail
      • Outlook
      • Yahoo Mail
  • Management
    • Deduplication
    • Tagging system
    • Search functionality
    • Categorization
    • Metadata extraction and storage
  • User Interface
    • Web-based UI
  • Backup and Sync
    • Cloud backup (AWS S3, Google Cloud Storage)
    • Local backup

Motivation

There are two driving factors behind this project:

  • In the information age, much data is ephemeral. Social media platforms frequently delete content, and cloud storage services can become inaccessible or unreliable. Archiving important data is essential for preserving personal memories and digital history.
  • I plan to build a small encyclopedia for my future family and children, so I want to make sure that all the information I gather stays preserved and accessible for future reference.

This project aims to provide a reliable solution for archiving important data from various sources, ensuring that users can preserve their digital assets for the long term.

Twitter/X Archive Inputs

  • Tweet content TOML: tweet:ID, x:tweet:ID, x:x:ID, twitter:x:ID, twitter:tweet:ID
  • Tweet media/video: tweet:media:ID
  • Thread content TOML: x:thread:ID, twitter:thread:ID
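As an illustration of how these prefixes map to archive targets, here is a minimal Rust sketch. The `TwitterTarget` enum and `parse_twitter_input` function are hypothetical names, not archivr's actual API, and treating the ID as a numeric `u64` is an assumption (tweet IDs are 64-bit integers).

```rust
/// Hypothetical target kinds for the Twitter/X inputs listed above.
#[derive(Debug, PartialEq, Eq)]
pub enum TwitterTarget {
    Tweet(u64),      // tweet content TOML
    TweetMedia(u64), // tweet media/video
    Thread(u64),     // thread content TOML
}

/// Parse inputs such as `x:thread:123`. Hypothetical helper; assumes
/// the ID is the numeric segment after the last ':'.
pub fn parse_twitter_input(input: &str) -> Option<TwitterTarget> {
    // Everything before the last ':' is the prefix, the rest is the ID.
    let (prefix, id) = input.rsplit_once(':')?;
    let id: u64 = id.parse().ok()?;
    match prefix {
        "tweet" | "x:tweet" | "x:x" | "twitter:x" | "twitter:tweet" => {
            Some(TwitterTarget::Tweet(id))
        }
        "tweet:media" => Some(TwitterTarget::TweetMedia(id)),
        "x:thread" | "twitter:thread" => Some(TwitterTarget::Thread(id)),
        _ => None, // unknown prefix
    }
}
```

Splitting on the last colon keeps multi-segment prefixes like `tweet:media` distinct from the plain `tweet` prefix without ordering a list of `strip_prefix` checks.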

Tweet and thread TOMLs are stored directly in raw_tweets/. Downloaded tweet media and avatars are re-archived into the hashed raw/ store, and the TOMLs point at those archived files using store-relative raw/... paths.
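The store-relative `raw/...` paths could be derived along these lines. This is a sketch under assumptions: `store_relative` is a hypothetical helper, and the example layout `raw/ab/abcdef.jpg` is only illustrative, since the README does not specify how the hashed `raw/` store is organised.

```rust
use std::path::{Component, Path};

/// Convert an absolute path inside the archive store into the
/// store-relative `raw/...` form referenced from tweet TOMLs.
/// Hypothetical helper; returns None for files outside `raw/`.
pub fn store_relative(store_root: &Path, archived: &Path) -> Option<String> {
    let rel = archived.strip_prefix(store_root).ok()?;
    // Keep only plain UTF-8 components so the TOML value is portable.
    let parts: Vec<String> = rel
        .components()
        .map(|c| match c {
            Component::Normal(s) => s.to_str().map(str::to_owned),
            _ => None,
        })
        .collect::<Option<Vec<_>>>()?;
    // Only files that live in the hashed raw/ store qualify.
    if parts.first().map(String::as_str) != Some("raw") {
        return None;
    }
    // Join with '/' so the stored path is identical on every OS.
    Some(parts.join("/"))
}
```

Storing relative paths keeps the TOMLs valid if the whole store directory is moved or restored from a backup.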

Twitter tweet/thread scraping requires ARCHIVR_TWITTER_CREDENTIALS_FILE to point to a cookies file for the vendored scraper.
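A startup check for this setting might look like the following sketch. It assumes `ARCHIVR_TWITTER_CREDENTIALS_FILE` is read as an environment variable, and `credentials_path_from` / `twitter_credentials_path` are hypothetical names; it only verifies that the path names an existing file, not that the cookies inside are valid for the scraper.

```rust
use std::ffi::OsString;
use std::path::PathBuf;

/// Setting name taken from the README.
const CREDENTIALS_VAR: &str = "ARCHIVR_TWITTER_CREDENTIALS_FILE";

/// Validate a raw setting value. Taking the value as an argument lets
/// this be tested without mutating the process environment.
pub fn credentials_path_from(value: Option<OsString>) -> Result<PathBuf, String> {
    let raw = value.ok_or_else(|| format!("{CREDENTIALS_VAR} is not set"))?;
    let path = PathBuf::from(raw);
    // Only checks existence; the cookie contents are not inspected.
    if path.is_file() {
        Ok(path)
    } else {
        Err(format!(
            "{CREDENTIALS_VAR} points to {}, which is not a file",
            path.display()
        ))
    }
}

/// Read the setting from the environment (assumed to be an env var).
pub fn twitter_credentials_path() -> Result<PathBuf, String> {
    credentials_path_from(std::env::var_os(CREDENTIALS_VAR))
}
```

Failing early with a clear message is friendlier than letting the scraper fail later on a missing cookies file.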

License

This project is licensed under the MIT License. See the LICENSE file for details.