CIA.vc
Bixo
Web Mining toolkit
information
Bixo is an open source web mining toolkit that runs as a series of Cascading pipes on top of Hadoop. By building a customized Cascading pipe assembly, you can quickly create specialized web mining applications that are optimized for a particular use case.
UTC clock
08:10 on May 27, 2012
event counters
The last message was received 1 month ago at 08:29 on Apr 27, 2012
0 messages so far today, 0 messages yesterday
0 messages so far this week, 0 messages last week
0 messages so far this month, 7 messages last month
159 messages since the first one, 1.62 years ago, for an average of 3.72 days between messages
recent messages
10:12 on Apr 26 (bixo)
Commit by vivek on master :: r69a09bf / (4 files in 3 dirs):
Moved the bin/bixo file to the examples directory to make it clear that the script is part of the examples.
Modified the 'dist' target in the main build file so that we now set up the runtime directory within the examples directory, and the examples jar file is also kept at this layer. - http://git.io/21l6Bw
10:10 on Apr 26 (bixo)
Commit by vivek on master :: r41b48bb / examples/build.xml:
Added a 'runtime' target that is used when building the distribution file. - http://git.io/HHfEyQ
08:33 on Apr 26 (bixo)
Commit by vivek on master :: r1a1a061 / examples/build.xml:
Fixed bug where resources weren't getting copied into the jar. - http://git.io/t4eb-g
22:31 on Apr 17 (bixo)
Commit by Ken Krugler on master :: r6f7ff3d / src/test/java/bixo/utils/ThreadedExecutorTest.java:
Make threading test more resilient to being run on a heavily loaded computer

This test would fail if I was converting QT movies - http://git.io/JpD8Dw
22:30 on Apr 17 (bixo)
Commit by Ken Krugler on master :: ra8b5d1d / examples/src/main/java/bixo/examples/crawl/SimpleStatusTool.java:
Fix SimpleStatusTool using the wrong field - http://git.io/Ohaz9A
21:51 on Apr 05 (bixo)
Commit by vivek on master :: r24c09b3 / doc/Releasing.txt:
Minor fix up to releasing notes. - http://git.io/PprFjA
18:31 on Apr 05 (bixo)
Commit by vivek on master :: r34cd850 / doc/Releasing.txt:
Updated releasing notes. (+40 more commits...) - http://git.io/-MBpVA
18:58 on Mar 29 (bixo)
Commit by vivek on webmining :: r35b66ba / (3 files in 2 dirs):
Reset version to 1.0-SNAPSHOT as we are getting ready to merge this branch back into master. - http://git.io/WogU_w
16:53 on Mar 29 (bixo)
Commit by vivek on webmining :: rd0ba144 / (16 files in 11 dirs):
Warning(s) patrol. (+7 more commits...) - http://git.io/r5pH8g
16:48 on Mar 28 (bixo)
Commit by Ken Krugler on webmining :: r6a6230e / src/main/java/bixo/parser/DOMParser.java:
Strip out namespace from XML - http://git.io/XT2YQQ
16:48 on Mar 28 (bixo)
Commit by Ken Krugler on webmining :: r1cc9c0d / src/main/java/bixo/parser/DOMContentExtractor.java:
Get rid of unused DOMContentExtractor - http://git.io/3-ctsQ
01:51 on Mar 28 (bixo)
Commit by Ken Krugler on webmining :: rf5352e5 / (25 files in 7 dirs):
Finish up most changes for webmining

The AnalyzeHtml function still needs to be fixed up. - http://git.io/6zV--Q
18:23 on Mar 27 (bixo)
Commit by vivek on webmining :: r27bc29c / (6 files in 3 dirs):
Fix some javadoc errors. - http://git.io/htFoOg
18:05 on Mar 27 (bixo)
Commit by vivek on webmining :: r4afb8af / (4 files):
Added a UrlImporter to demonstrate how one can populate a crawl db with more than one url/domain. - http://git.io/G8cypg
18:04 on Mar 27 (bixo)
Commit by vivek on webmining :: r068dfae / bin/bixo:
Fix up bin/bixo script to include the bixo-examples jar in the classpath. - http://git.io/NN7Xmw
21:34 on Mar 26 (bixo)
Commit by vivek on webmining :: rd4bde12 / (4 files in 3 dirs):
Some more files that needed to have the license boilerplate updated. - http://git.io/tAJp7Q
21:28 on Mar 26 (bixo)
Commit by vivek on webmining :: rd42f2c4 / examples/doc/hadoop-lists.txt:
Delete unused file. - http://git.io/-BRlHg
21:20 on Mar 26 (bixo)
Commit by vivek on webmining :: rfba03bb / (1198 files in 9 dirs):
Updated to ec2-tools-api 1.5.2.5 and deleted the older versions. - http://git.io/ymE34A
18:42 on Mar 26 (bixo)
Commit by vivek on webmining :: rc0ec993 / (examples/pom.xml pom.xml):
Hadoop ends up pulling in ant 1.6.5 (via jsp), which we don't want. - http://git.io/iYCvnQ
18:00 on Mar 26 (bixo)
Commit by vivek on webmining :: r2b58e38 / (63 files in 17 dirs):
Copyright update - moved to APL 2.0 (I had missed some files earlier). - http://git.io/1dRSwA