Thank you for using ODGrep! This tool helps you set up a local mirror of your own (or a friend's) Open Diary. It works specifically with www.opendiary.com.

ODGrep is public domain, so you may use it any way you wish and modify it freely, free of charge. You are welcome to write to me (zzzhong@zzzhong.com) whether you like it or dislike it.

The usual disclaimer applies: this software comes with absolutely no implicit or explicit warranty of any kind, including but not limited to suitability, accuracy, and personal or national safety. The author is in no way responsible for any consequences caused by using this piece of work. By using this software you agree to these terms and conditions.

Actually, ODGrep is just one day's work. Writing this documentation took about as long as coding the scripts!

Newsgroup: news.cuhkacs.org/cuhk.acg.comp (English / Big5 Chinese)

--zzzhong

=======
FEATURE
=======
- Runs at any time interval when installed as a cron job
- Scans the OD diary main page and detects updates
- Greps and cleans new entries
- Saves cleaned HTML to a local folder for an offline archive or an online mirror
- Optionally sends new entries as email / Usenet articles
- Optionally notifies people by ICQ when new entries arrive
- Tested on multibyte/Unicode diaries

===========
REQUIREMENT
===========
- Preferably your own (Linux / *nix) machine, or a shell account (this program is not installed as root)
- Internet access on that machine (of course) via wget
- The ability to add cron jobs and execute bash / Perl scripts
- Apache web server, if you want an online mirror
- Licq (optional)
- PHP, to let people control their ICQ notification subscriptions (optional)
- An MTA with a sendmail command interface (optional)
- INN Usenet server (optional)

====
TODO
====
This list is limited only by your imagination. However, because ODGrep was intended for my own use, I will not maintain it further unless there is strong interest in it. Contributions are very welcome.
- Automatic download of past articles
- Cleaner setup by separating code and mirror pages
- RSS-compatible feeds

=============================
Preferred way of installation
=============================
- Extract the tarball to a location. If you want a public mirror, this can be your web path, such as /home/myaccount/public_html
- Edit refresh.sh:
    AUTHORCODE
        For example, if your diary URL is
            http://www.opendiary.com/...?authorcode=A123456
        then enter
            AUTHORCODE=A123456
    SEARCHROW
        You can leave this blank for now. Edit it later if you want to get past articles.
    LICQ_FIFO
        See the ICQ notification section in this file. Otherwise leave it blank / commented out.
- If you need newsgroup (Usenet) synchronization, you may need to edit entrycollect.pl now, before getting past articles. See the Newsgroup mirror section in this file. For an email / mailing-list archive you probably do not want to flood your own or other people's mailboxes with hundreds of old articles, so you may do this later.

=====================
Download past entries
=====================
SEARCHROW

A bit of background: the OD site usually does not list all diary contents on one page. The current setting is 30 entries per page. For example, if you have 1000 entries, they are split across 34 pages (1000 / 30 = 33.3, rounded up). If you want to download all past entries, you have to put in a bit of effort by following the steps below. If you are only interested in the latest 30 entries and future ones, just leave the setting as-is (blank after the equal sign).

A large SEARCHROW means older entries, while a value below 30 means the latest 30 entries. Leaving it blank means it always gets the latest 30 entries.

- On the OD site home, click "diary contents" on the left if you are not already there.
- You see a list of entries on the right side of the page. At the bottom of the list there is a "Beginning" link. Point the mouse at (or click) the link and you see that the URL is something like:
    http://www.opendiary.com/...&searchrow=757&chapter...
  Note the number after "searchrow=".
  Enter this number in refresh.sh.
- Run ./refresh.sh
- Now edit refresh.sh and ***decrement*** SEARCHROW by 30. You may confirm this number (whether it is really 30 per page) by clicking the "Next" link on the original OD site. Then run ./refresh.sh again.
- Repeat the above step to get the remaining articles (change SEARCHROW, run ./refresh.sh) until the number is less than 30.
- Change SEARCHROW to empty (delete everything after the equal sign but keep the equal sign) so that you get the latest articles from now on.
- ODGrep will not get duplicate articles if you run refresh.sh multiple times. However, you can only change SEARCHROW in decreasing order. For example, if you set it to 100 (or leave it empty) and refresh, changing it afterwards to 200 or so has no effect. A quick way to fix this is to edit the file "top.txt" and reset the counter to 10000 (according to current OD, 10000 is the oldest article number).

=====================================
Scheduled synchronization by cron job
=====================================
- Run "crontab -e" to install a cron job on Linux. It opens an editor. If the crontab is empty, you can follow the example below, which refreshes every four hours (at minute 30):

    SHELL=/bin/bash
    30 */4 * * * $HOME/public_html/diarygrep/refresh.sh >> $HOME/public_html/diarygrep/cron.log 2>&1

=================================================================
ADVANCED SETUP  ADVANCED SETUP  ADVANCED SETUP  ADVANCED SETUP
=================================================================

===========================
Email / mailing-list mirror
===========================
- Edit entrycollect.pl. Follow the example there to set the destination email address ($to_addr) and the sender name ($from_header). For example,
    mail to foo@nowhere.com:
        $to_addr = 'foo@nowhere.com';
    mail to a local account:
        $to_addr = 'myaccount';

Sorry, only one destination email address is supported. Consider setting up a simple mailing list (/etc/aliases / .forward / .procmailrc) to send to multiple email destinations.
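As an illustration of the /etc/aliases approach, here is a minimal sketch. The alias name "odgrep-readers" and the addresses are hypothetical placeholders. Sendmail reads aliases from a compiled database, so run "newaliases" as root after editing the file.

```
# /etc/aliases (or /etc/mail/aliases)
# Hypothetical alias: one local name fanned out to several recipients.
odgrep-readers: alice@example.com, bob@example.com, myaccount
```

Then point entrycollect.pl at the alias:
    $to_addr = 'odgrep-readers';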
GNU Mailman is a good, popular mailing-list package. It keeps archives and lets users subscribe/unsubscribe through a web page, among many other features.

Sendmail path ($sendmail): the default is usually OK. Confirm it with the command "which sendmail".

================
Newsgroup mirror
================
Sample setup below:
- Download our mail-to-post gateway script from our web site: http://odgrep.sourceforge.net
  This script (mailpost2) is slightly modified from the original INN mailpost. You may use the original INN script if you prefer. I have not tested it! In my past experience it did not work well, so I made mailpost2 for my own use.
  Put the script in your favorite location, for example /usr/local/bin, then run
    chmod 755 mailpost2
  Edit the script to confirm the INN and Perl paths. In particular, innshellvars.pl is located at:
    older Redhat RPM:      /usr/lib/innshellvars.pl
    newer Redhat / Gentoo: /usr/lib/news/lib/innshellvars.pl
    FreeBSD:               /usr/local/news/lib/innshellvars.pl
- Create a script in the smrsh path:
    Redhat:  /etc/smrsh
    Gentoo:  /usr/adm/sm.rsh
    FreeBSD: /usr/libexec/sm.bin
    Others:  see "man smrsh".
  Our Redhat example uses /etc/smrsh/mailpost. The script content is:

    #!/bin/bash
    exec /usr/local/bin/mailpost2 -o localhost "$@"

  If you need to log in to a password-protected group, it looks like:

    #!/bin/bash
    exec /usr/local/bin/mailpost2 -u login -p 1234 -o news.example.com "$@"

  Especially if you have a password in the script, pay attention to its permissions:
    chown mail:mail /etc/smrsh/mailpost
    chmod 700 /etc/smrsh/mailpost
- If you are the news server administrator, you may need to add a new group:
    # ctlinnd newgroup mirror.opendiary.alice
- Edit the alias file /etc/aliases or /etc/mail/aliases and add the line below, then run "newaliases" to rebuild the alias database:
    mymaildrop.address: "|mailpost mirror.opendiary.alice"
- Follow the Email / mailing-list mirror section to enter the mail destination in entrycollect.pl:
    $to_addr = 'mymaildrop.address';

================
ICQ notification
================
First, set up licq to run as a "daemon" that starts every time you boot.
An example Redhat-style init script is available on our web site: http://odgrep.sourceforge.net

I suggest you register a fresh ICQ number for easy management. You may run the licq "daemon" as your own user, as the odgrep user, or as a separate shell user created for it. If you are currently using licq and want another identity, set up licq with "licq -b ~/.licq.123" and change the init script to "conf=/home/$user/.licq.123".

The script assumes you always use display :1 (the first local X display is usually :0; :1 can be, for example, your vncserver screen). Look through the init script and see whether you have to change other variables such as the path, user name and X screen. The user running the odgrep scripts must have permission to access that ICQ directory and to write to the FIFO file licq_fifo.

To add buddies for notification, edit icq-subscription and enter their ICQ numbers, one per line. (For simplicity, please use the ICQ number as the buddy name too.) These buddies must already be in the buddy list of the sending agent. The file icq.php lets receiving parties subscribe or unsubscribe themselves. But YOU keep control of your buddy list, so no one outside it can receive notifications by mistake or ill will.

You may want to experiment before real production use. Add only your own ICQ number to icq-subscription, then run this command to test:
    # LICQ_FIFO=/home/myaccount/.licq.123/licq_fifo ./sendicq.sh
If it works, edit refresh.sh and uncomment the LICQ_FIFO line as
    LICQ_FIFO=/home/myaccount/.licq.123/licq_fifo

By default the subscription list is hidden from web access using .htpasswd.

=======
Privacy
=======
Some people may need to protect the diary mirror for privacy reasons.
- Sample setup of HTTP authentication:
  Create a user with the command:
    # htpasswd -c /home/myaccount/htpasswd myfriend
  (-c creates a new password file and overwrites any existing one, so omit it when adding further users.)
  Edit htaccess-passwd, change AuthUserFile to /home/myaccount/htpasswd, and save it as .htaccess.
- For Usenet login, refer to readers.conf.
  Example:

    auth "any" {
        hosts: "*"
        auth: "ckpasswd -f /etc/news/passwd.nnrpd"
        default: ""
    }

    access "myfriend" {
        users: "myfriend*"
        newsgroups: "*.test, mirror.opendiary.*"
    }

  You can get an INN password-handling Perl script from our web site: http://odgrep.sourceforge.net
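The HTTP authentication steps in the Privacy section (edit htaccess-passwd, change AuthUserFile, save as .htaccess) can also be scripted. Below is a minimal bash sketch of that step. It assumes the bundled htaccess-passwd template contains a line starting with "AuthUserFile", possibly indented. The helper name make_htaccess is hypothetical and is not part of ODGrep.

```shell
#!/bin/bash
# Minimal sketch: build .htaccess from the htaccess-passwd template by
# pointing its AuthUserFile directive at your own password file.
# Assumption: the template has a (possibly indented) "AuthUserFile" line.
# make_htaccess is a hypothetical helper, not shipped with ODGrep.
make_htaccess() {
    local template=$1 userfile=$2 output=$3

    # Refuse to write anything if there is no directive to rewrite.
    if ! grep -q '^[[:space:]]*AuthUserFile' "$template"; then
        echo "make_htaccess: no AuthUserFile line in $template" >&2
        return 1
    fi

    # Replace the path while keeping any leading indentation.
    # Note: a userfile path containing '|', '&' or '\' would need escaping.
    sed "s|^\([[:space:]]*\)AuthUserFile.*|\1AuthUserFile $userfile|" \
        "$template" > "$output"
}

# Usage (htpasswd -c creates a new file; omit -c for additional users):
#   htpasswd -c "$HOME/htpasswd" myfriend
#   make_htaccess htaccess-passwd "$HOME/htpasswd" .htaccess
```

Failing early when no AuthUserFile line is found avoids silently producing an .htaccess that still points at the template's placeholder path.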