mirror of https://github.com/thegeneralist01/archivr synced 2026-05-30 08:36:47 +02:00

feat: add archiving of platform media files (#1)

* chore: specify non-ignored `.md` files

* refactor: rename youtube downloader to ytdlp

More generic name since yt-dlp supports many sites beyond YouTube.

* feat: add local file downloader

Supports file:// URLs for archiving local files.

* deps: add regex crate for URL pattern matching

* feat: expand source detection with granular YouTube types

- Split Source::YouTube into YouTubeVideo, YouTubePlaylist, YouTubeChannel
- Add Source::X for Twitter/X posts
- Add Source::Local for file:// URLs
- Add regex-based URL pattern matching for YouTube URLs
- Add shorthand schemes (yt:video/ID, youtube:playlist/ID, etc.)
- Add comprehensive tests for all URL patterns
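
The detection described above could be sketched roughly as follows. This is only an illustration, not the code from the commit: the variant names `YouTubeVideo`, `YouTubePlaylist`, `YouTubeChannel`, `X`, and `Local` come from the list above, while the `Unknown` variant, the function name `detect_source`, and the exact URL shapes accepted are assumptions. The commit uses the `regex` crate; this sketch swaps in plain `std` prefix matching to stay dependency-free.

```rust
// Hypothetical sketch: variant names follow the commit message; the
// `Unknown` variant and `detect_source` are assumed for illustration.
#[derive(Debug, PartialEq, Eq)]
pub enum Source {
    YouTubeVideo,
    YouTubePlaylist,
    YouTubeChannel,
    X,
    Local,
    Unknown,
}

/// Classify an input string. Uses prefix matching from `std` rather than
/// the `regex` crate, so the accepted patterns are deliberately narrow.
pub fn detect_source(input: &str) -> Source {
    // Local files are addressed with file:// URLs.
    if input.starts_with("file://") {
        return Source::Local;
    }

    // Shorthand schemes such as `yt:video/ID` or `youtube:playlist/ID`.
    let shorthand = input
        .strip_prefix("yt:")
        .or_else(|| input.strip_prefix("youtube:"));
    if let Some(rest) = shorthand {
        return match rest.split_once('/') {
            Some(("video", id)) if !id.is_empty() => Source::YouTubeVideo,
            Some(("playlist", id)) if !id.is_empty() => Source::YouTubePlaylist,
            Some(("channel", id)) if !id.is_empty() => Source::YouTubeChannel,
            _ => Source::Unknown,
        };
    }

    // Full URLs: drop the scheme and an optional `www.` before checking the host.
    let rest = input
        .strip_prefix("https://")
        .or_else(|| input.strip_prefix("http://"))
        .unwrap_or(input);
    let rest = rest.strip_prefix("www.").unwrap_or(rest);

    if let Some(path) = rest.strip_prefix("youtube.com/") {
        if path.starts_with("watch?") && path.contains("v=") {
            return Source::YouTubeVideo;
        }
        if path.starts_with("playlist?") && path.contains("list=") {
            return Source::YouTubePlaylist;
        }
        if path.starts_with("channel/") || path.starts_with('@') {
            return Source::YouTubeChannel;
        }
        return Source::Unknown;
    }
    if let Some(id) = rest.strip_prefix("youtu.be/") {
        if !id.is_empty() {
            return Source::YouTubeVideo;
        }
    }

    // Anchoring on the host prefix keeps look-alike hosts from matching.
    let x_path = rest
        .strip_prefix("x.com/")
        .or_else(|| rest.strip_prefix("twitter.com/"));
    if let Some(path) = x_path {
        if path.contains("/status/") {
            return Source::X;
        }
    }

    Source::Unknown
}
```

Under these assumptions, `detect_source("yt:video/abc123")` and `detect_source("https://youtu.be/abc123")` both classify as `Source::YouTubeVideo`, while an unrelated URL falls through to `Source::Unknown`.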

* docs: update README milestones

Mark YouTube videos, Twitter videos, and local files as done.

* chore: update flake.lock

* feat: add shorthand schemes for X/Twitter media

* chore: move docs into docs dir

* Remove temp file using timestamp path

Delete the temp entry at store_path/temp/<timestamp> in both
the hash-exists and success paths. Stop constructing the full filename
with extension and remove the early process::exit to de-duplicate
cleanup.
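
A minimal sketch of the single-cleanup-path idea described above, under stated assumptions: only the `store_path/temp/<timestamp>` location comes from this message. The `raw/<hash>` destination layout, the `Outcome` type, the function name `finalize`, and the assumption that the temp entry is a regular file are all hypothetical.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Outcome of an archive attempt (hypothetical type for this sketch).
#[derive(Debug, PartialEq, Eq)]
pub enum Outcome {
    AlreadyStored,
    Stored(PathBuf),
}

/// Move a downloaded temp file into the store unless its hash already exists.
/// The `raw/<hash>` layout is an assumption, not the project's actual layout.
pub fn finalize(store_path: &Path, timestamp: &str, hash: &str) -> io::Result<Outcome> {
    // The temp entry is addressed by timestamp only, with no extension.
    let temp_entry = store_path.join("temp").join(timestamp);
    let dest = store_path.join("raw").join(hash);

    // Decide the outcome first, without exiting the process early...
    let outcome = if dest.exists() {
        Outcome::AlreadyStored
    } else {
        fs::create_dir_all(store_path.join("raw"))?;
        fs::copy(&temp_entry, &dest)?;
        Outcome::Stored(dest)
    };

    // ...so the hash-exists and success paths share one cleanup step.
    // (An I/O error above still returns before this line is reached.)
    fs::remove_file(&temp_entry)?;
    Ok(outcome)
}
```

Computing the outcome as a value and cleaning up afterwards is what removes the need for a separate deletion in each branch.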

* Add Nix caches and default flake package

* Add social platform source detection and update milestones

* Tighten social URL matching to avoid false positives

* Mark media archiving milestone complete
TheGeneralist 2026-03-31 12:39:35 +02:00 committed by GitHub
parent 553cca99ca
commit 2d59ab0af5
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
12 changed files with 616 additions and 74 deletions

docs/LICENSE.md Normal file

@ -0,0 +1,21 @@
# MIT License

Copyright (c) 2025-present thegeneralist01

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

docs/README.md Normal file

@ -0,0 +1,49 @@
# archivr
An open-source self-hosted archiving tool. Work in progress.
## Milestones
- [ ] Archiving
  - [X] Archiving media files from social media platforms
    - [X] YouTube Videos
    - [X] Twitter Videos
    - [X] Instagram
    - [X] Facebook
    - [X] TikTok
    - [X] Reddit
    - [X] Snapchat
    - [ ] YouTube Posts (postponed)
  - [X] Archiving local files
  - [ ] Archiving files from cloud storage services (Google Drive, Dropbox, OneDrive) and from URLs
    - [ ] URLs
    - [ ] Google Drive
    - [ ] Dropbox
    - [ ] OneDrive
    - (Some of these could be postponed for later.)
  - [ ] Archiving Twitter threads
  - [ ] Archive web pages (HTML, CSS, JS, images)
  - [ ] Archiving emails (???)
    - [ ] Gmail
    - [ ] Outlook
    - [ ] Yahoo Mail
- [ ] Management
  - [ ] Deduplication
  - [ ] Tagging system
  - [ ] Search functionality
  - [ ] Categorization
  - [ ] Metadata extraction and storage
- [ ] User Interface
  - [ ] Web-based UI
- [ ] Backup and Sync
  - [ ] Cloud backup (AWS S3, Google Cloud Storage)
  - [ ] Local backup
## Motivation
There are two driving factors behind this project:
- In the age of information, all data is ephemeral. Social media platforms frequently delete content, and cloud storage services can become inaccessible or unreliable. Archiving important data is *essential* for preserving personal memories and digital history.
- I plan to create a small encyclopedia for my future family and kids, so I want all the information I gather to stay preserved and accessible for future reference.
This project aims to provide a reliable solution for archiving important data from various sources, ensuring that users can preserve their digital assets for the long term.
## License
This project is licensed under the MIT License. See the [LICENSE](LICENSE.md) file for details.