Friday, November 21, 2008

Week 11 muddy point

This is regarding the DiLight hands-on point. I registered and everything and was looking around, and a ton of the links didn't work - I kept getting an 'internal server error'. Also, a few links from the slides were outdated and didn't work. For an IT class, this isn't a good thing.

Thursday, November 20, 2008

Week 12 Readings

Weblogs: Their use and application in science and technology libraries

This article provided an interesting history of blogs (in all honesty, I hate that word), and I find it very neat that they sprang from sites that summarized new/interesting websites way back in 1996, when the Internet was obviously much smaller. I wonder if things like Digg are similar to this now, or are there other sites that still do this? I really do not know, but I would think it would be next to impossible because of the sheer size of the Internet.

I like the whole idea of 'co-evolution' instead of control when it comes to interaction between people on blogs. I can only assume that's a main draw of them and I am starting to see how they could really help for members of research projects, especially in medicine and other fields where timeliness of publication, etc. is very important.

I didn't really understand the whole 'having to go to the website' criticism of blogs. This is the only criticism mentioned in the article and it is laughably minor. I wonder if other criticisms exist.

It is going to be more and more important for librarians of all types to know how to create and use blogs effectively. However, creating them is the easy part. Finding the time to keep them updated, etc. isn't so easy.

Using a wiki to manage a library instruction program...

This article provides a good definition of wikis and describes various uses of them in a very specific setting. The main idea I got from the article is that knowledge should be shared, coordinated and built upon, and that's what wikis do. In the long run, I wonder what effect wikis and blogs will have on advances in science and technology.

Creating the academic library folksonomy...

Trying to 'catalog the Internet' seems like an impossible task, but social tagging of items from a variety of web sources (licensed databases, etc.) is a good replacement for the idea of cataloging. The article is right in that there is a lot of 'gray literature' out there. Right now at work, we have a subscription to the New York Academy of Medicine Gray Literature Report, which arrives bimonthly to my email address and provides links to hundreds of gray articles, reports, etc. Here's a link to the most current report

This report helps, but how do we really know what else is out there that would benefit the research being done in my department? Hopefully, because of things like Zotero and CiteULike, which are somewhat easy to use and very helpful, more and more people will get turned on to tagging.
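Just to make the folksonomy idea concrete for myself, here's a rough sketch of how tags from many users could pile up into a bottom-up "catalog". All the item IDs, users and tags below are made up for illustration; it isn't how Zotero or CiteULike actually store things.

```python
from collections import defaultdict

# A minimal folksonomy sketch: many users apply free-form tags to items,
# and the aggregated tags become an informal, bottom-up "catalog".
# All item IDs, users and tags here are made up for illustration.
tagging_events = [
    ("user1", "nyam-grey-lit-report-2008-11", "gray literature"),
    ("user2", "nyam-grey-lit-report-2008-11", "public health"),
    ("user1", "citeulike-article-123", "epidemiology"),
    ("user3", "nyam-grey-lit-report-2008-11", "gray literature"),
]

tags_by_item = defaultdict(lambda: defaultdict(int))
items_by_tag = defaultdict(set)

for user, item, tag in tagging_events:
    tags_by_item[item][tag] += 1      # how often each tag was applied to an item
    items_by_tag[tag].add(item)       # which items a tag retrieves

# "Cataloging" emerges from use: the most-applied tags describe the item...
print(sorted(tags_by_item["nyam-grey-lit-report-2008-11"].items(),
             key=lambda kv: -kv[1]))
# ...and a tag works like a subject heading for retrieval.
print(items_by_tag["gray literature"])
```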

Wikipedia video

It's interesting that this class has been the only one in my MLIS program that had readings from Wikipedia. Most other professors forbid the use of it quite adamantly. Because of this, it was just beaten into my head that Wikipedia was useless for research purposes and that information found on it cannot be trusted. My opinion has changed from watching the video. Wales said that 'everyone should have free access to the sum of human knowledge', and that is quite a great goal to have. I did not realize that changes, etc. were handled by a close-knit community who take their jobs very seriously, and he also points out that people who write an encyclopedia for FUN tend to be smart - I don't think anyone could disagree with that!

The fact that they handle controversial articles well, are quick to take action when someone does something bad (i.e. the skinhead example) and claim not to have the built-in biases that other encyclopedias and textbooks have makes me trust it a lot more. Verifying a Wikipedia entry's references will always be necessary (and just a good practice), but at least I feel like I can use it with more confidence now.

Friday, November 14, 2008

Week 11 comments

Digital Libraries: Challenges and Influential Work

This article discusses the powerful tools we have to access resources and the changes that are happening to make access more efficient. It is a good history lesson in how digital libraries really came about and how federal funds played a big part in what we have to work with now and what we will have in the future. I think it was very forward-thinking (which I don't normally say about the government) of the federal government to work with the NSF and NASA to fund the DLI. A lot of technologies that complement the activities funded by the DLI were not federally funded, either.

Federated searching seems to be an important issue that still needs to be addressed.

One thing I was curious about in the article was how it discussed metadata searching vs. full-text searching. I wonder if Google is doing, or looking into, being able to conduct both types of searching.
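To keep the distinction straight in my own head, here's a toy sketch of the difference: metadata searching only looks at cataloger-supplied fields, while full-text searching looks at every word of the document. The records and field names below are invented for the example.

```python
# Toy illustration of metadata searching vs. full-text searching.
# The records and field names are invented for the example.
records = [
    {"title": "Digital Libraries: Challenges and Influential Work",
     "subjects": ["digital libraries", "federated search"],
     "fulltext": "Federal funding through the DLI shaped today's access tools..."},
    {"title": "Dewey Meets Turing",
     "subjects": ["librarians", "computer science"],
     "fulltext": "Librarians and computer scientists both contributed to the DLI..."},
]

def metadata_search(query, recs):
    # Looks only at cataloger-supplied fields (title, subjects).
    q = query.lower()
    return [r["title"] for r in recs
            if q in r["title"].lower()
            or any(q in s for s in r["subjects"])]

def fulltext_search(query, recs):
    # Looks at every word of the document body instead.
    q = query.lower()
    return [r["title"] for r in recs if q in r["fulltext"].lower()]

print(metadata_search("federated search", records))  # hit via subject metadata
print(fulltext_search("federal funding", records))   # hit only via the full text
```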

The last issue the article mentions is library portals and how the NISO Metasearch Initiative is trying to develop standards for libraries to offer one-search access to multiple resources through an easy, Google-type page. I sometimes have difficulty searching, for instance, the CLP site and think that it's too busy and takes too many clicks to actually get to what is needed. Having an easier way to search all of the databases at one time would be a very good thing, especially since a lot of people are used to this type of searching/retrieving.
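Here's my rough sketch of what that one-search idea boils down to: send the same query to several sources at once and merge the results. The "databases" below are just stand-in functions I made up; a real portal would call each vendor's API or a Z39.50/SRU interface instead, and would also de-duplicate and re-rank.

```python
from concurrent.futures import ThreadPoolExecutor

# A rough sketch of the metasearch idea: send one query to several sources
# at once and merge the results into a single list. The "databases" here
# are stand-in functions, not real vendor APIs.
def search_catalog(query):
    return [("catalog", f"Catalog record about {query}")]

def search_article_db(query):
    return [("articles", f"Journal article mentioning {query}")]

def search_local_repository(query):
    return [("repository", f"Repository item on {query}")]

SOURCES = [search_catalog, search_article_db, search_local_repository]

def federated_search(query):
    with ThreadPoolExecutor() as pool:
        result_lists = pool.map(lambda source: source(query), SOURCES)
    # A real system would also de-duplicate and re-rank the merged hits here.
    return [hit for hits in result_lists for hit in hits]

for source, hit in federated_search("consumer health"):
    print(f"[{source}] {hit}")
```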

Dewey Meets Turing

This article sort of pits computer scientists against librarians and then resolves the issues each discipline faces in working together in the world of the DLI. It appears that both sides want to hold on to their traditional roles and still be able to move forward together and by the end of the article, it is clear that this is possible and is currently happening. Librarians of the future will be working even more closely with computer scientists in the emergent institutional repository realm, for instance. All librarians will have to be more forward-thinking and proactive to help find solutions to some problems that still remain and to know that they still have a very viable and important job to do, just like the computer scientists.

Institutional Repositories

Lynch wrote a very interesting article that stemmed from a talk he gave at a workshop on institutional repositories and their role in scholarship. It does seem like institutional repositories, if handled properly, could really increase collaboration between different universities, especially when it comes to data sharing, etc. Right now, these collaborations can be very costly and unwieldy, especially in medicine, where databases, etc. have to be run and funded (sometimes at very high cost) through the grant. The money that could be saved here could be used for more actual research.

Another interesting thing about the article was the discussion of the increase in traditional journal articles having supplementary materials published online. This is both a blessing and a curse. The New England Journal of Medicine, for instance, has been doing this for a while now and has slowly gotten better at: 1) actually making it very clear in the article that there is some supplementary material available; and 2) making it possible to find and access this information well after the publication date. It appears that the supplementary material will be forever linked with the actual article. However, when one downloads the PDF of the article itself, the supplementary information is not there. A better system would have it all in the same file, with the option for the user to print/save the supplementary material. Right now, the system is still a bit burdensome. Perhaps a better approach would be to have all of this information in the author's institutional repository, but then the question arises: would outsiders have access to it? Would the journal subscribers?

Just like Lynch mentions, institutional repositories have to be set up so as to further and enhance scholarly work, not make it more burdensome.

Thursday, November 13, 2008

Week 10 muddy point

With regards to link analysis and page ranking, I was wondering about Google. I seem to recall that people can pay to have their websites ranked higher in the Google results pages, and within the past 2-3 or so years, the sites at the top of a Google results page have been commercial sites. I guess I'm confused because doesn't this sort of defeat the purpose of link analysis if one can just pay for a higher ranking even though the site isn't good (a low number of backlinks, for instance)?
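To make the link-analysis part concrete for myself, here's a minimal PageRank-style sketch over a made-up link graph. It only models the organic, link-based side of ranking (backlinks passing score to the pages they point at), not paid placement, and real engines use many more signals than this.

```python
# A minimal PageRank-style sketch over a made-up link graph, just to show
# how backlinks feed into link analysis. Real search engines use far more
# signals; this only models the link-based part.
links = {
    "A": ["B", "C"],   # page A links out to B and C
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],        # D also links to C, giving C the most backlinks
}

def pagerank(graph, damping=0.85, iterations=50):
    n = len(graph)
    rank = {page: 1.0 / n for page in graph}
    for _ in range(iterations):
        new_rank = {page: (1 - damping) / n for page in graph}
        for page, outlinks in graph.items():
            if not outlinks:
                continue
            share = damping * rank[page] / len(outlinks)
            for target in outlinks:
                new_rank[target] += share   # each backlink passes along score
        rank = new_rank
    return rank

# C has the most backlinks, so it ends up with the highest score.
for page, score in sorted(pagerank(links).items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```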

Thursday, November 6, 2008

Week 10 Readings

Web Search Engines Parts 1 & 2

These two articles were very helpful in defining a lot of terms and explaining how search engines are able to provide high-quality answers to queries. The explanation of how data centers are set up with clusters, and how those clusters make it possible to distribute the load of answering 2,000+ queries/second, was very interesting. The amount of web data out there that search engines crawl through is staggering, and it makes sense that crawling is carried out by many, many machines dedicated to this purpose. The 'politeness delay' was also interesting to learn about, and it makes a lot of sense to build it into the crawler algorithms.
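Here's a stripped-down sketch of what I understand the politeness delay to be: before fetching a URL, the crawler waits until enough time has passed since its last request to the same host. The two-second delay and the URLs below are made up, and a real crawler would also honor robots.txt and spread this work across many machines.

```python
import time
from urllib.parse import urlparse

# Illustration of a crawler "politeness delay": track the last fetch time
# per host and wait before hitting the same host again. The delay value
# and URLs are made up for the example.
POLITENESS_DELAY = 2.0  # seconds between requests to the same host
last_fetch_time = {}

def polite_fetch(url):
    host = urlparse(url).netloc
    last = last_fetch_time.get(host)
    if last is not None:
        wait = last + POLITENESS_DELAY - time.monotonic()
        if wait > 0:
            time.sleep(wait)            # be polite: don't hammer one server
    last_fetch_time[host] = time.monotonic()
    print(f"fetching {url}")            # a real crawler would download the page here

frontier = [
    "http://example.org/a",
    "http://example.org/b",             # same host: forced to wait ~2 seconds
    "http://example.com/c",             # different host: no wait needed
]
for url in frontier:
    polite_fetch(url)
```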

The second part explained certain indexing tricks that make it possible to process phrases and improve the quality of results. These are all things I take for granted when searching, but it's great to know that there's this massive infrastructure working away when I Google, for instance, "cat paw growth" like I just did today! As always though, the Internet will never replace a vet's examination!
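One of those tricks, as I understand it, is storing word positions in the inverted index so the engine can check that query words sit next to each other. Here's a tiny sketch with two made-up documents; my "cat paw growth" query only matches the document where the words appear as an actual phrase.

```python
from collections import defaultdict

# A positional inverted index in rough form: storing each term's positions
# lets the engine check that query words appear next to each other,
# which is what makes phrase matching work. Documents are made up.
docs = {
    1: "cat paw growth after minor injury",
    2: "growth of the cat population",
}

index = defaultdict(lambda: defaultdict(list))   # term -> doc_id -> [positions]
for doc_id, text in docs.items():
    for pos, term in enumerate(text.split()):
        index[term][doc_id].append(pos)

def phrase_search(phrase):
    terms = phrase.split()
    hits = []
    for doc_id in index[terms[0]]:
        for start in index[terms[0]][doc_id]:
            # every following term must occur at the next position over
            if all(start + i in index[t].get(doc_id, [])
                   for i, t in enumerate(terms[1:], start=1)):
                hits.append(doc_id)
                break
    return hits

print(phrase_search("cat paw growth"))   # [1] -- doc 2 has the words, not the phrase
```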

OAI Protocol for Metadata Harvesting article

This article provides a brief description of the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) and requires a 'relatively high level of familiarity with how the protocol works...' Even though I do not have familiarity with OAI, the article does describe the two parts of OAI (data providers and harvesters) and explains that the protocol provides access to parts of the "invisible web". Since different communities use this protocol to meet their different needs, it obviously has to be nonspecific.

The article lists future enhancements and future directions, and I'm curious whether these enhancements have been made since the article was written. The most interesting future direction they mention is the importance of using a controlled vocabulary. They mention that normalizing the numerous different controlled vocabularies used by different data providers is 'prohibitively resource intensive', but they do mention that in the future 'authority agencies' could use their thesauri to provide access to items in the repository. This probably should be a high priority because there's really no point in having all of this data in one place but not be able to find what you need easily.
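To get a feel for the harvester side of the protocol, here's a bare-bones sketch that asks a data provider for Dublin Core records via the ListRecords verb and pulls out the titles. The base URL is a placeholder I made up; a real harvester would point at an actual repository's OAI endpoint and also follow resumption tokens for large result sets.

```python
import urllib.request
import urllib.parse
import xml.etree.ElementTree as ET

# A bare-bones sketch of an OAI-PMH harvester: request Dublin Core records
# with the ListRecords verb and print the dc:title of each record.
BASE_URL = "http://example.edu/oai"   # placeholder data provider endpoint

OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"

def list_record_titles(base_url):
    params = urllib.parse.urlencode({"verb": "ListRecords",
                                     "metadataPrefix": "oai_dc"})
    with urllib.request.urlopen(f"{base_url}?{params}") as response:
        tree = ET.parse(response)
    # Each <record> carries a Dublin Core block; grab the dc:title elements.
    return [title.text
            for record in tree.iter(f"{OAI}record")
            for title in record.iter(f"{DC}title")]

if __name__ == "__main__":
    for title in list_record_titles(BASE_URL):
        print(title)
```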

Deep Web White Paper

This paper from 2001 describes the Deep Web and is the first known study to examine the Deep Web in quantifiable terms. The Deep Web is the large part of the web that is not searchable using typical search engines like Google. According to their 2001 data, only 0.03% of web pages are searched when using search engines. This number has probably gone up since search engines likely have better crawlers now, but by how much?

It is curious that Deep Web sites, at least in 2001, got far more traffic than surface web sites. At first I found this surprising, since search engines do not typically turn up Deep Web sites based on a searcher's query, and hence I thought they would receive far less traffic. I guess it is safe to assume that they get more hits because they are used by a specific group of people who do not get to them by using search engines. This is also surprising considering that the subject areas of Deep Web sites don't appear to be very different from those of the surface web (Table 6). The fact that the Deep Web has more quality results than the surface web is somewhat worrisome, and I wonder if the number of Deep Web pages has decreased because, compared to 7 years ago, Google, Yahoo, etc. have come up with better crawling algorithms that can reach them. As information seekers, we obviously have to take care and figure out better ways to find relevant information in the Deep Web.