Talk:Information retrieval


Huge gaps in article

This article is missing major concepts. For example, there is no mention of page rank (!) or the receiver operating characteristic in evaluation, or multimedia IR. I agree precision/recall -- ROC -- evaluation might do better in a separate page. 67.101.41.94 (talk) 18:04, 10 July 2008 (UTC)

A very major issue of this article is the fact that the second half of information retrieval is completely ignored: navigation (or browsing). A huge number of IR studies even show that navigation is by far the more important retrieval method. As a hybrid method, faceted search/navigation is also missing. Information retrieval is not the same as search! Novoid (talk) 16:09, 20 August 2012 (UTC)

If you look at the study of information retrieval (e.g., conferences like the ACM Special Interest Group on Information Retrieval), you'll see that navigation and browsing don't get that much attention. I happen to be passionate about Human–computer information retrieval myself, but I think the content in this entry is fairly representative of the field. There are gaps, but I wouldn't call them huge. Dtunkelang (talk) 13:32, 21 August 2012 (UTC)

Precision and Recall: Separate page

Searching for precision/recall, I was surprised to find them "buried" here in the IR page, and not described in a separate wiki article (as they are in German). Precision and Recall are widely used in different fields of computer science, not only IR. Therefore, I have created a precision/recall page, Precision and Recall, mostly adapted from the German page. Tobi Kellner (talk) 07:30, 21 November 2007 (UTC) PLEASE check and correct my Precision and Recall article!

Cleaning

Rather than clean those two paragraphs up again, I chose to revert to Mikkalai's version, which retained the corrections. If the missing paragraph is re-inserted, please make sure that the grammatical corrections are not overwritten. The two paragraphs I had cleaned up looked like they were written in another language and run through a translation program. They used English words and successfully communicated a concept, but were horrible from a readability standpoint. Please do not overwrite corrections that do not affect the substance of the material. SWAdair | Talk 06:15, 13 May 2004 (UTC)

I am planning on removing the lists of open source and other IR tools, perhaps incorporating some into the list of search engines. This is a downwards spiral trend of introducing spam into the article. WP:NOT a directory or place for commercial links. While I understand some may have good intentions, we can't keep some and then hide others and there are all types of issues. I might do this soon given the recent edits to this page. Please let me know of any objections. Josh Froelich 03:29, 7 January 2007 (UTC)

Precision and Recall

  • Current or untidy:
P = (number of relevant documents retrieved) / (number of retrieved documents)
R = (number of retrieved documents) / (number of relevant documents)
  • Correct or tidy:
P = (number of relevant documents retrieved) / (number of documents retrieved)
R = (number of relevant documents retrieved) / (number of relevant documents stored)

Hopefully yours, --KYPark 01:15, 3 Jun 2005 (UTC)

Thank you. I've implemented the corrections you suggested. In general: If you feel a change is needed, feel free to make it yourself! Wikipedia is a wiki, so anyone (yourself included) can edit any article by following the Edit this page link. You don't even need to log in, although there are several reasons why you might want to. Wikipedia convention is to be bold and not be afraid of making mistakes. If you're not sure how editing works, have a look at How to edit a page, or try out the Sandbox to test your editing skills. New contributors are always welcome. --MarkSweep 03:40, 3 Jun 2005 (UTC)
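
For readers following this thread, the corrected definitions above can be written out as a small Python sketch. The function name `precision_recall`, the use of sets of document ids, and the convention of returning 0.0 for an empty denominator are assumptions made for illustration; they are not taken from the article.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single query.

    retrieved: iterable of document ids returned by the system
    relevant:  iterable of document ids judged relevant in the collection
    """
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = len(retrieved & relevant)  # number of relevant documents retrieved

    # Assumed convention: an empty denominator gives 0.0 rather than an error.
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Four documents retrieved, three relevant documents stored, two in common.
p, r = precision_recall(
    retrieved=["d1", "d2", "d3", "d4"],
    relevant=["d1", "d3", "d5"],
)
print(p, r)
```

Both measures share the same numerator (relevant documents retrieved), which is the point of the correction: only the denominators differ.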

Major figures in information retrieval

I wonder if having a subjective list of "major figures" is really a good idea... Sure, there are some recognizable people in the field, but who decides who goes on the list and who doesn't? I have my own list of who I think are "major figures", and I'm sure there might be others who have a completely disjoint list. Just seems too subjective to me. --.msbmsb 19:21, 17 October 2005 (UTC)

F Measure

I've changed the formula for F measure, so that it uses the product of N squared and P, rather than the product of N squared and R, in the denominator. This brings the formula in line with that used by van Rijsbergen (as referenced in the article), and is consistent with the descriptions of F0.5 and F2 as given in the article.
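
A minimal Python sketch of the formula as described in this comment, with the product of N squared and P in the denominator, may help check the claim about F0.5 and F2. The function name `f_measure` and the zero-denominator convention are assumptions for illustration only.

```python
def f_measure(precision, recall, n=1.0):
    """Weighted F measure: (1 + n^2) * P * R / (n^2 * P + R)."""
    denominator = n * n * precision + recall
    if denominator == 0:
        return 0.0  # assumed convention when both P and R are zero
    return (1 + n * n) * precision * recall / denominator


# With high precision and low recall, F2 (recall-weighted) should come out
# lower than F0.5 (precision-weighted).
f_half = f_measure(1.0, 0.5, n=0.5)
f_one = f_measure(1.0, 0.5, n=1.0)
f_two = f_measure(1.0, 0.5, n=2.0)
print(f_half, f_one, f_two)
```

With N squared multiplying R instead, the ordering of F0.5 and F2 in this example would be reversed, which is the inconsistency the change addresses.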

Evaluation of machine translation

Precision/Recall are often used in the automatic evaluation of machine translation, indeed in a lot of NLP evaluation. - 88.96.32.193 13:48, 4 May 2006 (UTC)

Terminology according to ISO

I have talked with many people trained in science and engineering who are initially very confused by the IR terms "precision" and "recall". The confusion is caused by incompatible meanings for IR "precision" versus the other technical meanings of precision. Only when the terms are defined in context of IR do we realize "precision" and "recall" map to relevancy (a form of accuracy) and sensitivity (tests) respectively. In an encyclopedia, it would be considerate to make this clear early and often in discussions using IR's "precision" and "recall".

Please look at the following excerpt from Talk:Accuracy_and_precision#Terminology according to ISO.

The International Organization for Standardization (ISO) provides the following definitions.

  • Accuracy: The closeness of agreement between a test result and the accepted reference value.
  • Trueness: The closeness of agreement between the average value obtained from a large series of test results and an accepted reference value.
  • Precision: The closeness of agreement between independent test results obtained under stipulated conditions.

Reference: International Organization for Standardization. ISO 5725. Accuracy (trueness and precision) of measurement methods and results. Geneve: ISO, 1994.

Clark Mobarry 18:03, 12 May 2006 (UTC)

Sure, the distinction should be made clear, but I don't think it needs to be phrased as '"beware", this definition "conflicts" with others'... .msbmsb 18:52, 12 May 2006 (UTC)

History of IR

How about a (short) section discussing the history and development of IR? It could cover things like the initial impetus for IR (census information etc.), IR establishing itself as its own field, changing descriptions of IR over time, and IR and the WWW.

It's all about the memex. 74.220.78.49 09:46, 19 September 2007 (UTC)

Break IR down into subfields

One thing I missed when I read this article was that information retrieval was not broken down into smaller subfields. It might be helpful to break this field down when developing the article further.

Nybbles 10:29, 17 November 2006 (UTC)

References would be nice

For those of us wanting to cite something other than this article it would give a nice starting point to give, say, the source of each equation.

65.93.206.3 07:43, 20 December 2006 (UTC)

Confusion table

I think the whole article would be much easier to follow in terms of true positives, etc. instead of retrieved documents, relevant documents, etc. --Ben T/C 15:40, 21 May 2007 (UTC)

Absolutely not. These are standard terms. 74.220.78.49 09:55, 19 September 2007 (UTC)

Glimpse / Webglimpse

They are listed in open-source IR systems but according to their respective websites, their licenses do not seem to be open-source anymore. I don't know if they were open-source or not in the past, so I didn't remove them. Maybe someone with more knowledge on the subject could take care of the issue? --Lastrainson 09:57, 5 September 2007 (UTC)

MAP

The following comments were posted on the article page:

Shouldn't the denominator just be N instead of the number of relevant documents? It seems natural to me that if you're summing N occurrences of P, you would then divide by N. This change would make all the problems mentioned below go away.

Please confirm/delete/edit the following.

One version of MAP I've seen is referred to as MAP @ N (where N is an arbitrary retrieval cut-off, typically 5, 10, 20 etc).

In this case, the formula does not seem to be correct. For example, suppose the first 4 ranks are relevant and the 5th is non-relevant. Then AP @ 1 = 1, @ 2 = 1, @ 3 = 1, @ 4 = 1, and surprisingly it is still 1 @ 5 = ((1 + 1 + 1 + 1 + 0) / 4). This is clearly wrong. AP should be calculated even when rel(r) is not true. Thus for this silly example, it would be (1 + 1 + 1 + 1 + 0.8) / 5. Which brings me to my second question: the denominator "Relevant Documents" is retrieved relevant, but actually again, for MAP @ N, it should be "retrieved documents" (i.e., N). The above definition only seems to work when the LAST retrieved item is relevant.

Final comment: what does one do when |all relevant docs| < N AND |all retrieved relevant| == |all relevant documents|? This may happen with some specific test sets. My solution is to stop calculating MAP at the rank of the last, highest relevant document retrieved, and to report that MAP for all following MAP @ x statistics.

Depending on who owns this page, perhaps some Matlab or R code would be nice.

The denominator in the average precision formula is the number of relevant documents because effectively there are exactly that many terms in the sum (rel(r) = 0 for nonrelevant documents).
The (mean) average precision formula considers the list of retrieved documents only up to the last relevant document. It doesn't care if there are nonrelevant documents after it. So if the four relevant documents are on ranks 1-4, the AP is always 1 regardless of possible nonrelevant documents after rank 4. AnAj 07:26, 13 October 2007 (UTC)
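
Since code was requested above, here is a rough sketch in Python (rather than Matlab or R) of average precision as explained in this reply: precision is summed only at ranks where rel(r) is true, and the sum is divided by the number of relevant documents. The function name `average_precision` and the boolean-list input format are assumptions for illustration, not the article's notation.

```python
def average_precision(ranked_relevance, num_relevant):
    """Average precision for one ranked result list.

    ranked_relevance: list of booleans, True where the document at that
                      rank is relevant (rank 1 first)
    num_relevant:     total number of relevant documents for the query
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
        # Nonrelevant ranks contribute nothing, since rel(r) = 0.
    return precision_sum / num_relevant if num_relevant else 0.0


# The example from this thread: ranks 1-4 relevant, rank 5 nonrelevant.
ap_thread_example = average_precision([True, True, True, True, False], 4)

# A second case with a nonrelevant document between two relevant ones.
ap_with_gap = average_precision([True, False, True], 2)
print(ap_thread_example, ap_with_gap)
```

In the thread's example the trailing nonrelevant document does not change the result, whereas a nonrelevant document ranked above a relevant one does lower it.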

New article request

Hi there, I'd like to suggest a new article on the full history of information handling/management/technology (details). I'm not knowledgeable enough to do it myself, but contributors here probably are. Thanks, JackyR | Talk 18:04, 4 December 2007 (UTC)

Information scent

Also add an article about information scent, as also noted on information foraging. Jidanni (talk) 18:35, 18 February 2008 (UTC)

Term Discrimination

I am going to start a new page on TermDiscrimination. I am a noob, so can someone give me some pointers on how to get this article linked to? Dspattison (talk) 18:29, 7 January 2008 (UTC)

Relevance

I just overhauled the Relevance (information_retrieval) entry after an unsuccessful attempt to get it deleted and merged into this one. Please take a look and help improve it and appropriately link it to this one.

Dtunkelang (talk) 14:30, 26 May 2008 (UTC)

External Links

Should we move the "5 Open source systems" and "6 Other retrieval tools" sections somewhere else? I assume those are the sections triggering the external links warning.

Dtunkelang (talk) 04:41, 4 June 2008 (UTC)

ASIS&T Award of Merit

The ASIS&T Award of Merit, established in 1964 by the Delaware Valley Chapter, is now the Society's highest honor, bestowed annually to an individual who has made a noteworthy contribution to the field of information science, including the expression of new ideas, the creation of new devices, the development of better techniques and outstanding service to the profession of information science. The Award of Merit is sponsored directly by the Society.

Past recipients of the Award of Merit:

Cited by --KYPark (talk) 19:22, 7 November 2009 (UTC)

Major figures

My comment

The first two, Bayes and Shannon, were not contributing to information retrieval (hence rather irrelevant) but statistical inference in general.

The next two, Luhn and Salton, were frontal, computational and experimental, helplessly suffering from lack of decent relevant IR theories.

The rest are Cambridge alumni or affiliates. They were especially keen on statistical approaches, which make up a part of what IR is all about as an average science.

The keyword web and the citation web are the two driving wheels of IR, not to mention any information search or research as well as (esp. new) encyclopedism. Hence the pivotal role of hypertext in principle and practice, around which a variety of statistical and computational methods may center. Frankly, the major figures, as biased as above, have little to do with such a central role obvious since late 1970s. --KYPark (talk) 21:21, 7 November 2009 (UTC)

Question answering

The question answering article begins "In information retrieval, question answering (QA) is the task of automatically answering a question posed in natural language." Yet there is no mention of question answering in this article. pgr94 (talk) 17:49, 21 June 2010 (UTC)

'Science'

I'm not sure about the criteria used in these cases, but shouldn't 'IR is the science of searching...' read 'IR is the field of computer science' or something similar? i.e. calling IR a science by itself goes a bit over the top, doesn't it?

94.67.230.10 (talk) 06:57, 3 March 2011 (UTC) (too bored to start an account)

Replaced "science" with "area of study".

Dtunkelang (talk) 07:33, 6 March 2011 (UTC)

Graphic Help

There's a little mistake in adaptation of Kuropka's categorization. It should be "Binary Independence", instead of "Binary Interdependence". I don't know how to edit this part (I'm pretty new to this and it's a graphic). — Preceding unsigned comment added by 186.176.147.82 (talk) 02:42, 13 February 2013 (UTC)

Merger proposal

I propose that Webometrics be merged into this article, or one of its peers which tackles Web specific information retrieval metrics.

The article seems slim to me. It seems poorly referenced, with many researchers appearing in more than one reference at the same time. Most of the see-also links seem to be supporting the article, rather than leading to more information. The article also appears to link only to the kinds of other articles, which I now gather it should be merged into. Its prime content also has to do with "web impact", which is a form of information retrieval metric; an "impact factor".

As such, please consider Webometrics for merge into this article, and others near it, as the case may be. Decoy (talk) 20:20, 3 June 2015 (UTC)

  • Hi, Webometrics is much more related to bibliometrics. Information retrieval is the posh academic term for search engines, webometrics is more about rating an organization based on web link structures pointing to the organization's web site. — Preceding unsigned comment added by 58.161.6.1 (talk) 11:25, 4 June 2015 (UTC)
  • As I commented on Talk:Webometrics, certainly Webometrics can be used as a technique in information retrieval, in addition to general research or other purposes. But the content of that article is not suitable for a general overview of information retrieval, especially given that it is specific to the World Wide Web and is at most about one class of techniques. Certainly even if it was mentioned in the main information retrieval article it would also need its own page with a more detailed explanation. I added Webometrics to Category:Information retrieval techniques and removed the merge tag. There are plenty more references which could be used to improve Webometrics as a standalone article. -- Beland (talk) 16:42, 14 June 2015 (UTC)

Deletion nomination for Gain (information retrieval)

The article Gain (information retrieval) has been nominated for deletion at Wikipedia:Articles for deletion/Gain (information retrieval). As no one has yet commented on the nomination, I'm posting this message here in hopes of attracting editors familiar with the subject matter. —Psychonaut (talk) 08:47, 16 November 2015 (UTC)

evaluation metrics split to Evaluation_measures_(information_retrieval)

The evaluation metrics section is almost an exact duplicate of Evaluation_measures_(information_retrieval). I don't think it should stay like this, because any improvements made to one or the other location won't be copied over. I'm going to propose a split. Proxyma (talk) 02:23, 30 June 2017 (UTC)

External links modified

Hello fellow Wikipedians,

I have just modified 3 external links on Information retrieval. Please take a moment to review my edit. If you have any questions, or need the bot to ignore the links, or the page altogether, please visit this simple FaQ for additional information. I made the following changes:

  • Added archive https://web.archive.org/web/20130928060217/http://www.scribd.com/doc/13742235/Information-Retrieval-Data-Structures-Algorithms-William-B-Frakes to https://www.scribd.com/doc/13742235/Information-Retrieval-Data-Structures-Algorithms-William-B-Frakes
  • Added archive https://web.archive.org/web/20111120074515/http://pascallin.ecs.soton.ac.uk/challenges/VOC/pubs/everingham10.pdf to http://pascallin.ecs.soton.ac.uk/challenges/VOC/pubs/everingham10.pdf
  • Added archive https://web.archive.org/web/20121208201457/http://icpr2010.org/pdfs/icpr2010_ThBCT8.28.pdf to http://icpr2010.org/pdfs/icpr2010_ThBCT8.28.pdf

When you have finished reviewing my changes, you may follow the instructions on the template below to fix any issues with the URLs.

You may set the |checked=, on this template, to true or failed to let other editors know you reviewed the change. If you find any errors, please use the tools below to fix them or call an editor by setting |needhelp= to your help request.

  • If you have discovered URLs which were erroneously considered dead by the bot, you can report them with this tool.
  • If you found an error with any archives or the URLs themselves, you can fix them with this tool.

If you are unable to use these tools, you may set |needhelp=<your help request> on this template to request help from an experienced user. Please include details about your problem, to help other editors.

Cheers.—InternetArchiveBot (Report bug) 21:50, 13 November 2017 (UTC)

Retrieved from "https://en.wikipedia.org/w/index.php?title=Talk:Information_retrieval&oldid=810196620"
This content was retrieved from Wikipedia : http://en.wikipedia.org/wiki/Talk:Information_retrieval
This page is based on the copyrighted Wikipedia article "Talk:Information retrieval"; it is used under the Creative Commons Attribution-ShareAlike 3.0 Unported License (CC-BY-SA). You may redistribute it, verbatim or modified, providing that you comply with the terms of the CC-BY-SA