Integrating SuperMemo with the Internet

SuperMemo Corp. Dr A. Szepieniec, Dr P. Wozniak Oct 25, 1996

This text was an internal document, originally entitled Where do we want to go?, designed as a theoretical basis for the business plan of SuperMemo Corp. seeking venture capital

The progress of mankind is a multifaceted phenomenon that spans the entire spectrum of human activity, across all branches of science and technology, down to the daily routine of a milkman or a housewife. Nothing, however, determines the pace of progress more visibly than mankind's ability to process information. And nothing is so subject to positive feedback as information processing, which has made it explode this century with a ferocity that makes it truly impossible to predict what boundaries it will crash through a mere ten years from now. Forty thousand years ago, humans started communicating orally. It took until four thousand years BC before they were able to put their message down in writing. Only in the 15th century did Gutenberg's invention make it possible to disseminate writing widely. The next breakthrough came with the advent of computers this century, particularly in the 1980s, with the explosive growth of desktop computing on the one hand and networking on the other.

The most recent revolution originated in the research labs of the European Laboratory for Particle Physics in Switzerland. As late as 1993, only the most devoted Internet insiders knew its name: World Wide Web. The creators of the WWW developed several simple application-layer protocols and a document-publishing standard. The three key concepts were URLs, HTML and HTTP. These have unleashed a global hunger for more easily available information and for the benefits of publish-as-you-go. The vision of a global hyperspace has finally become reality, and at its best: without borders and without (or nearly without) government control. Bandwidth limitations permitting, there seems to be no end to the exponential growth of the Web and its technological versatility. It is our strong conviction that the next revolution in information processing will come with cognitive technologies.
Cognitive technologies are a collection of technologies that apply the newest findings in the field of psychophysiology to information processing on the part of the human subject. SMC has pioneered a number of such technologies, most prominently repetition spacing algorithms, known commercially worldwide as SuperMemo. In short, cognitive technologies make documents 'understand' the reader by taking into account the imperfection of his or her memory and cognition. This is made possible by keeping track of the user's navigation in the knowledge space and by creating mathematical models of his or her memory.

SMC’s mission is to provide humans with the most efficient interface to the world of information through the application of all known cognitive technologies.

Cognitive technologies optimize information processing and learning in the following four pivotal areas (in parentheses: concepts developed at SMC):

  • state of the human mind (currently not subject to optimization)
  • access to knowledge (processing attributes, ordinal attributes, semantic attributes, knowledge filters, knowledge charts, knowledge meters, etc.)
  • knowledge representation (topics and items, minimum information principle, etc.)
  • knowledge retention (optimum repetition spacing)

SMC is now working on Project SM-XXI (currently at the design stage), a collection of software components that will make the following vision a reality in the 21st century:

  1. the interface to external sources of information will cover both electronic and non-electronic sources (the latter case will require tools for easy incorporation of information coming from external sources within the knowledge system paradigm)
  2. the electronic sources will be both general and dedicated. SMC will work on promoting standards that will allow publishers of information to comply with cognitive technologies, so as to increase the proportion of dedicated sources over time
  3. for information sources, all platforms should be covered one way or another: desktop operating systems (CD-ROMs), the Internet, handheld devices, dedicated databases, and knowledge systems developed for SuperMemo 7, SuperMemo 8, their successors, and many more
  4. for information publishing: all platforms should be covered as well: stand-alone desktop applications, CD-ROM publishing, Internet, client-server environment, handheld devices, voice-operated systems, etc.
  5. the main application modes will be as follows: (1) stand-alone application (as with earlier versions of SuperMemo), (2) course application (e.g. for CD-ROM title publishing), (3) Internet application (esp. for organizing and learning web knowledge), (4) client-server application (e.g. for education in schools, for corporate training, etc.), and (5) tele-learning application (client-server approach over the Internet).
  6. from the user standpoint, knowledge will dynamically flow from the disorganized collection of items, Web pages, CD-ROMs, and other sources into the knowledge hierarchy: a graph of semantic connections between individual knowledge elements (Note: do not confuse knowledge hierarchy with a collection of hyperlinked pages)
  7. all knowledge elements (leaves of the knowledge hierarchy) having the form of pieces of information or external information sources will be provided with processing attributes that may assume the following values: intact (not yet classified), suppressed (classified as irrelevant and made invisible in the knowledge system), dismissed (classified as not worth committing to memory but valued for future reference), reviewed (reviewed and considered cognitively relevant; perhaps worth the pending status), pending (considered particularly important for its associative or inferential nature and scheduled for later committing) and committed (committed to the memory of the user of the knowledge system).
  8. editable knowledge hierarchy that visualizes processing attributes of knowledge elements forms a knowledge chart that makes it easy to graphically view the user’s progress in wading through a sea of information
  9. all non-primitive elements in dedicated sources will be divided into semantic units that will also be provided with processing attributes. In non-dedicated sources, semantic units will be separated wherever possible by means of available technologies (e.g. HTML tags, parsing tools, or simply highlighting tools; in the latter case, users will be able to assign processing attributes to an equivalent of a semantic unit in the form of a display area highlighted with the mouse)
  10. processing attributes will determine the appearance and behavior of elements or their semantic units. For example, suppressed elements will disappear from view and their URLs will be made unavailable, committed elements will crop up in repetitions scheduled by means of SuperMemo, etc.
  11. ordinal attributes will be used to sort elements and semantic units with a view to their future processing. The following processing attributes will be associated with ordinal attributes: intact (ordinal attributes will determine the order of review; this attribute can only be set automatically, by means of filtering tools, HTTP connection score, elements of information democracy, etc.), reviewed (ordinals will determine the order of the next review), pending (ordinals will determine the order in which items are committed to memory), and committed (ordinals will determine rescheduling priority, or uncommitting priority in cases of repetition overload).
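The processing and ordinal attributes enumerated above lend themselves to a compact data model. The following Python sketch is purely illustrative (all class, field, and function names are our hypothetical choices, not part of any SMC design): elements carry a processing attribute, and ordinals determine the order within each pool.

```python
from enum import Enum
from dataclasses import dataclass

# Hypothetical names; a sketch of the processing attributes described above.
class Processing(Enum):
    INTACT = "intact"          # not yet classified
    SUPPRESSED = "suppressed"  # irrelevant, hidden from view
    DISMISSED = "dismissed"    # kept for future reference only
    REVIEWED = "reviewed"      # relevant; may become pending
    PENDING = "pending"        # queued for committing to memory
    COMMITTED = "committed"    # scheduled for spaced repetitions

@dataclass
class Element:
    title: str
    processing: Processing = Processing.INTACT
    ordinal: float = 0.0       # priority within its processing pool

def pool(elements, status):
    """Elements in one pool, highest priority (lowest ordinal) first."""
    return sorted((e for e in elements if e.processing is status),
                  key=lambda e: e.ordinal)

elements = [
    Element("HTTP overview", Processing.PENDING, ordinal=2.0),
    Element("URL syntax", Processing.PENDING, ordinal=1.0),
    Element("Ad banner", Processing.SUPPRESSED),
]
queue = pool(elements, Processing.PENDING)
print([e.title for e in queue])  # order in which items would be committed
```

Sorting by ordinal within the pending pool, for instance, yields the order in which items would be committed to memory.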

  • semantic attributes will be used to approximate the semantic contents of a semantic unit or element. These are needed for search and filtering purposes. The simplest approach to implementing semantic attributes is a keyword system. Future applications might make use of natural language processing technologies.
  • knowledge filters can be used to determine the visibility or accessibility of elements or semantic units within the knowledge system by making use of processing, ordinal and semantic attributes in dedicated sources, and of word-context analysis in non-dedicated sources. Knowledge filters are a useful tool for addressing disorganized knowledge before it enters the knowledge hierarchy and for automatically determining ordinal attributes in the intact pool. Knowledge filters can also be used in thematic navigation.
  • knowledge meters are tools for diagnosing and controlling the flow of information between element pools tagged with different processing attributes. In general, the flow proceeds from the disorganized/external pool to the intact pool, then to the suppressed, dismissed and reviewed pools, then to the pending pool, and finally to the committed pool. Some stages may be skipped (e.g. committing an element without placing it in the pending queue), and some backflow is not unusual (e.g. dismissing a once-committed item as a result of a loss of relevancy). Knowledge meters make it possible to view the information flow as well as to impose minimum or maximum flow limits.
  • knowledge should be divided into topics (elements that present information, like pages in a help system, web pages, etc.) and items (elements that have a stimulus-response structure, e.g. question and answer, that can be effectively used in the process of learning based on active recall)
  • for training and tele-learning purposes, item subsets should be structured for automatic grading (e.g. with multiple-choice tests, spelling tests, automatic voice recognition tests, etc.)
  • topics will be local or remote in nature (e.g. a web link executed via OLE browser in-place activation)
  • the application of items complying with the minimum information principle is not customary in present information sources. A number of solutions will have to be adopted to facilitate the transition of content providers to the new approach. Most importantly, World Wide Web extensions are inevitable. The adjustments will have to be made on both the server and the client side. The new generation of web browsers provides easy plug-in interfaces that can be addressed with format-independent OLE Documents (to extend or go beyond HTML), language-independent binary ActiveX controls, and scripting languages. On the server side, with standard methods such as CGI, WWW servers can be extended to communicate with back-end scripts, dynamically produce the content of a web page, store information the user has provided, etc. In Microsoft Internet Server, changes are possible via ISAs and other ISAPI extensions.
  • for knowledge retention, Project SM-XXI envisages the application of the most modern SuperMemo algorithms, based on algebraic and algorithmic solutions combined with neural networks
  • to let learning run in the background, without requiring a dedicated time slot from the user, intelligent, self-configuring techniques for detecting the idle state of the user’s terminal will be implemented (e.g. popping up the drilling procedure at download time, printing time, and during other long-drawn processes, or during selected operations in custom-chosen applications)
  • although the theoretical foundations and underlying concepts might seem intricate, the new software solutions should provide an intuitive and fool-proof interface that will open the world of well-structured knowledge to every user, with or without an understanding of cognitive processes
  • particular software components will be used for complementary solutions like: handheld devices, voice-operated devices, redistributable modules for third-party developers, etc.
  • Beyond simple software solutions:
    • to increase the number of dedicated sources, SMC intends to work closely with publishers, courseware developers and standardization bodies. Most importantly, SMC wants to promote new WWW formats and protocols through RFCs and by working with the IETF and W3C. This will primarily concern adherence to context-free semantic units and, wherever possible and sensible, the association of semantic units with knowledge items used in the process of learning
    • for knowledge representation, SMC intends to facilitate or enforce adherence to the minimum information principle not only at the software level but also by promoting new standards and formats (see above), educating information publishers by example, and extensively publishing the rules of effective knowledge representation. The most crucial in this respect are: (1) publishing information in a context-independent, concise and richly hyperlinked manner, (2) separation of semantic units whenever necessary, and (3) associating elements or semantic units with stimulus-response material (e.g. in the form of questions and answers that might be used in learning and training)
    • strategic partnerships with major players in the Internet market may be most fruitful: Microsoft, Netscape, etc. These would make it easier to develop global solutions intertwined with the Internet and operating systems that would make cognitive technologies truly span the globe
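As an illustration of the knowledge filters and knowledge meters described earlier, the following sketch uses plain keyword matching in place of the richer semantic attributes the text envisages; every name and data structure in it is a hypothetical choice of ours, not an SMC specification.

```python
from collections import Counter

# Hypothetical elements: each carries semantic attributes (keywords)
# and a processing attribute.
elements = [
    {"title": "HTML basics", "keywords": {"www", "html"}, "processing": "pending"},
    {"title": "Gardening tips", "keywords": {"hobby"}, "processing": "suppressed"},
    {"title": "HTTP methods", "keywords": {"www", "http"}, "processing": "intact"},
]

def knowledge_filter(elements, wanted):
    """Keep only elements whose keywords intersect the wanted set;
    everything else becomes invisible for thematic navigation."""
    return [e for e in elements if e["keywords"] & set(wanted)]

def knowledge_meter(elements):
    """Count elements per processing pool, visualizing the flow of
    information between pools."""
    return Counter(e["processing"] for e in elements)

visible = knowledge_filter(elements, {"www"})
print([e["title"] for e in visible])  # WWW-related elements only
print(knowledge_meter(elements))      # how much material sits in each pool
```

A real implementation would draw on word-context analysis for non-dedicated sources; the keyword intersection above is only the simplest case named in the text.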

    Marketing strategy and the step-wise education of the public on cognitive technologies

  • introducing SM7 with a couple of catchy databases to educate the public on the most appealing component of cognitive technologies: repetition spacing. Application mode: stand-alone.
  • introducing SM8 as an extension of repetition spacing in the direction of knowledge structuring. Introducing the concepts of knowledge hierarchy, processing attributes, ordinal attributes, topics and items. Application modes: stand-alone, and CD-ROM title publishing.
  • introducing Project SM-XXI with all the remaining components of the cognitive approach. Introducing client-server architecture and encompassing the Internet. Application modes: stand-alone, CD-ROM title, Internet organizer, client-server training, and tele-learning.
  • promoting standard solutions, formats, protocols and interfaces. Working on an equivalent of the Dewey Decimal or Library of Congress system for the WWW within the framework of cognitive technologies
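The repetition spacing that these stages popularize can be illustrated with the published SM-2 algorithm; the commercial SuperMemo algorithms, and those envisaged for SM-XXI, are considerably more advanced, so this is only a minimal sketch.

```python
# A sketch in the spirit of the published SM-2 algorithm: each review
# grade (0-5) updates the item's easiness factor and its next interval.
def sm2_step(quality, repetitions, interval, ef):
    """One review. Returns (repetitions, interval_in_days, easiness_factor)."""
    if quality < 3:                      # forgotten: restart the repetitions
        return 0, 1, ef
    ef = max(1.3, ef + 0.1 - (5 - quality) * (0.08 + (5 - quality) * 0.02))
    repetitions += 1
    if repetitions == 1:
        interval = 1
    elif repetitions == 2:
        interval = 6
    else:
        interval = round(interval * ef)  # intervals grow geometrically
    return repetitions, interval, ef

reps, interval, ef = 0, 0, 2.5           # 2.5 is the standard starting EF
for grade in (5, 5, 4):                  # three successful reviews
    reps, interval, ef = sm2_step(grade, reps, interval, ef)
print(interval)                          # interval grows with each success
```

Each successful review lengthens the interval, so well-known items claim ever less of the user's time, which is the core economy behind repetition spacing.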
    Glossary

    (proprietary terminology is marked with SMC)

  • ActiveX controls
  • encapsulate the WinInet APIs. Programmers can write to these controls using Visual Basic, Delphi, PowerBuilder, etc. (formerly known as OLE Controls, or OCXes)
  • CGI (Common Gateway Interface)
  • the simplest standard mechanism for extending a web server with back-end scripts
  • cognitive technologies
  • (SMC) technologies targeted at applying the findings from the field of psychophysiology in the area of information processing
  • information democracy
  • (SMC) enabling the public to determine ordinals associated with web pages, hyperlinks, sites, etc. on the basis of popularity, reliability, usability, etc. Those ordinals might be useful at the entry of elements into a user’s knowledge system
  • item
  • (SMC) simple stimulus-response formulation of a piece of knowledge for learning purposes (e.g. question and answer)
  • ISA (Internet Server Applications)
  • ISAs are dynamic-link libraries (DLLs) that are similar to CGI scripts. ISAs are loaded into the same address space as the HTTP server. This creates a back-end scripting solution that provides a higher level of performance than CGI and consumes far less RAM
  • ISAPI (Internet Server API)
  • Application Programming Interface developed for MS Internet Server
  • IETF
  • (Internet Engineering Task Force) the protocol engineering and development arm of the Internet. The IETF is a large, open, international community of network designers, operators, vendors, and researchers concerned with the evolution of the Internet architecture and the smooth operation of the Internet. It is open to any interested individual
  • knowledge chart
  • (SMC) knowledge hierarchy that graphically visualizes processing attributes
  • knowledge elements
  • (SMC) a single leaf in the knowledge hierarchy that may take the form of an item, a topic, or an external source of information
  • knowledge filters
  • (SMC) filters that make a subset of knowledge elements invisible for the purposes of thematic navigation, knowledge organization (transition from the disorganized pool to the knowledge hierarchy), etc.
  • knowledge hierarchy
  • (SMC) graph representing semantic relationships between individual pieces or sources of information. Implemented in SM8 in a simplified form as a knowledge tree that corresponds to a table of contents
  • HTML
  • (Hypertext Markup Language) standard file format for distributing hypermedia information on the World Wide Web. HTML allows text to include codes that define fonts, layout, embedded graphics, and hypertext links
  • HTTP (Hypertext Transfer Protocol)
  • the method by which World Wide Web pages are transferred over the network
  • ordinal attributes
  • (SMC) attributes associated with processing attributes that determine the priority of the element or semantic unit in a given processing pool. For example, ordinal attributes of pending elements determine the order in which pending elements are committed to memory
  • processing attributes
  • (SMC) attributes associated with elements or semantic units that determine the degree of processing afforded the element. For example: dismissed, pending, committed, etc.
  • RFC
  • (Request For Comment) documents that describe or propose new standards. All Internet standards are described as RFCs. An RFC is a description of a protocol, procedure, or service; a status report or a summary of research
  • semantic attributes
  • (SMC) attributes that approximate the semantics of a semantic unit (in the simplest case: keywords)
  • semantic unit
  • (SMC) smallest part of an element that conveys the simplest understandable message (e.g. a single sentence)
  • scripting language
  • a simple interpreted language used to enliven web pages (e.g. Perl, Java, LiveScript, Visual Basic Script, etc.). Script applets can be downloaded from the server and run on the client computer
  • topic
  • (SMC) simple representation of a small piece of knowledge (e.g. a short web page)
  • URL (Uniform Resource Locator)
  • increasingly popular standard for addressing resources on the Internet. Developed for the World Wide Web. URLs are essentially an extension of a full pathname
  • W3C (World Wide Web Consortium)
  • the World Wide Web Consortium exists to realize the full potential of the Web. W3C works with the global community to produce specifications and reference software. W3C is funded by industrial members, but its products are freely available to all
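As a minimal illustration of the CGI mechanism defined in this glossary, the sketch below shows the essential contract: the server passes request data in environment variables such as QUERY_STRING, and the script's standard output (headers, blank line, body) becomes the HTTP response. The page content and parameter name are invented for the example, and URL-decoding is omitted for brevity.

```python
import os

def cgi_response(query_string):
    """Build a toy CGI response: headers, blank line, then the HTML body."""
    name = "stranger"
    for pair in query_string.split("&"):
        if pair.startswith("name="):
            name = pair.split("=", 1)[1]  # no URL-decoding: a simplification
    body = f"<html><body>Hello, {name}!</body></html>"
    return "Content-Type: text/html\r\n\r\n" + body

# The web server supplies the query string via the QUERY_STRING variable
# and relays whatever the script prints back to the browser.
print(cgi_response(os.environ.get("QUERY_STRING", "name=WWW")))
```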