SuperMemo Corp. Dr A. Szepieniec, Dr P. Wozniak Oct 25, 1996
This text was an internal document, originally entitled "Where do we want to go?", designed as a theoretical basis for the business plan of SuperMemo Corp. in seeking venture capital
The progress of mankind is a multifaceted phenomenon that spans the entire spectrum of human activity, across all branches of science and technology, down to the daily routine of a milkman or a housewife. Nothing, however, determines the pace of progress more visibly than mankind’s ability to process information. And nothing is so subject to positive feedback as information processing, which made it explode this century with a ferocity that makes it truly impossible to predict what boundaries it will crash through a mere ten years from now. Forty thousand years ago, humans started communicating orally. It took until four thousand years BC before they were able to put their message down in writing. Only in the 15th century did Gutenberg’s invention make it possible to disseminate writing widely. The next breakthrough came with the advent of computers this century, particularly in the 1980s, with the explosive growth of desktop computing on one hand and networking on the other. The most recent revolution originated in the research labs of the European Laboratory for Particle Physics in Switzerland. As late as 1993, only the most devoted Internet insiders knew its name: the World Wide Web. The creators of the WWW developed several simple application-layer protocols and a document-publishing standard. The three key concepts were URLs, HTML and HTTP. These have unleashed the global hunger for more easily available information and for the benefits of publish-as-you-go. The vision of a global hyperspace has finally become a reality, and at its best: without borders and without (or nearly without) government control. Bandwidth limitations permitting, there seems to be no end to the exponential growth of the Web and its technological versatility. It is our strong conviction that the next revolution in information processing will come with cognitive technologies: a collection of technologies that apply the newest findings in the field of psychophysiology to information processing on the part of the human subject. SMC has pioneered a number of such technologies, most prominently repetition spacing algorithms, commercially known worldwide as SuperMemo. In short, cognitive technologies make documents 'understand' the reader by taking into account the imperfection of his or her memory and cognition. This approach is made possible by keeping track of the user’s navigation in the knowledge space and by creating mathematical models of his or her memory.
SMC’s mission is to provide humans with the most efficient interface to the world of information through the application of all known cognitive technologies.
Cognitive technologies optimize information processing and learning in the following four pivotal areas (in parentheses: concepts developed at SMC):
- state of the human mind (currently not subject to optimization)
- access to knowledge (processing attributes, ordinal attributes,
semantic attributes, knowledge filters, knowledge charts, knowledge meters, etc.)
- knowledge representation (topics and items, minimum information
principle, etc.)
- knowledge retention (optimum repetition
spacing)
SMC is currently working on Project SM-XXI (currently at the design stage), a collection of software components that will make the following vision a reality in the 21st century:
- the interface to external sources of information will cover both
electronic and non-electronic sources (the latter case will require tools for easy
incorporation of information coming from external sources within the knowledge system
paradigm)
- the electronic sources will be both general and dedicated. SMC will work on promoting standards that allow publishers of information to comply with cognitive technologies, so as to increase the proportion of dedicated sources over time
- for information sources: all platforms should be covered one way or another: desktop operating systems (CD-ROMs), the Internet, handheld devices, dedicated databases and knowledge systems developed for SuperMemo 7, SuperMemo 8 and their successors, and many more
- for information publishing: all platforms should be covered as well:
stand-alone desktop applications, CD-ROM publishing, Internet, client-server environment,
handheld devices, voice-operated systems, etc.
- the main application modes will be as follows: (1) stand-alone application (as with earlier versions of SuperMemo), (2) course application (e.g. for CD-ROM title publishing), (3) Internet application (esp. for organizing and learning web knowledge), (4) client-server application (e.g. for education in schools, for corporate training, etc.), and (5) tele-learning application (client-server approach over the Internet).
- from the user standpoint, knowledge will dynamically flow from the
disorganized collection of items, Web pages, CD-ROMs, and other sources into the knowledge
hierarchy: a graph of semantic connections between individual knowledge elements (Note: do
not confuse knowledge hierarchy with a collection of hyperlinked pages)
- all knowledge elements (leaves of the knowledge hierarchy), having the form of pieces of information or external information sources, will be provided with processing attributes that may assume the following values: intact (not yet classified), suppressed (classified as irrelevant and made invisible in the knowledge system), dismissed (classified as not relevant for learning but valued for future reference), reviewed (reviewed and considered cognitively relevant; perhaps worth the pending status), pending (considered particularly important for its associative or inferential nature and scheduled for later committing) and committed (committed to the memory of the user of the knowledge system).
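The six processing attributes above form a small state vocabulary. The sketch below is illustrative only (the class and field names are our assumptions, not part of SM-XXI); it shows how a knowledge element might carry such an attribute:

```python
from enum import Enum

class Processing(Enum):
    """The six processing attributes named in the text."""
    INTACT = "intact"          # not yet classified
    SUPPRESSED = "suppressed"  # irrelevant; invisible in the knowledge system
    DISMISSED = "dismissed"    # not for learning, but kept for future reference
    REVIEWED = "reviewed"      # cognitively relevant; candidate for pending
    PENDING = "pending"        # scheduled for committing to memory
    COMMITTED = "committed"    # committed to the user's memory

class KnowledgeElement:
    """Hypothetical leaf of the knowledge hierarchy (name is our assumption)."""
    def __init__(self, content):
        self.content = content
        self.status = Processing.INTACT  # every element starts intact

note = KnowledgeElement("a semantic unit clipped from a web page")
note.status = Processing.PENDING  # the user schedules it for committing
```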
- an editable knowledge hierarchy that visualizes the processing attributes of knowledge elements forms a knowledge chart, which makes it easy to graphically view the user’s progress in wading through a sea of information
- all non-primitive elements in dedicated sources will be divided into
semantic units that will also be provided with processing attributes. In non-dedicated
sources, wherever possible, semantic units will be separated by means of available
technologies (e.g. HTML tags, parsing tools, or simply highlighting tools; in the latter
case, the users will be able to set processing attributes to an equivalent of a semantic
unit in the form of a display area highlighted with a mouse)
- processing attributes will determine the appearance and behavior of elements or their semantic units. For example, suppressed elements will disappear from view and their URLs will be made unavailable, committed elements will crop up in repetitions scheduled by means of SuperMemo, etc.
- ordinal attributes will be used to sort elements and semantic units with a view to their future processing. The following processing attributes will be associated with ordinal attributes: intact (ordinal attributes will determine the order of review; this attribute can only be set automatically by means of filtering tools, HTTP connection score, elements of information democracy, etc.), reviewed (ordinals will determine the order of the next review), pending (ordinals will determine the order in which items are committed to memory), and committed (ordinals will determine rescheduling priority or uncommitting priority in cases of repetition overload).
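As a minimal illustration of ordinal attributes, the sketch below sorts a hypothetical pending pool into commit order; the field names and the convention that a lower ordinal means higher priority are assumptions for the example:

```python
# Hypothetical pending pool; the "ordinal" field name is our assumption.
pending_pool = [
    {"content": "HTTP basics", "ordinal": 2.5},
    {"content": "URL syntax", "ordinal": 1.0},
    {"content": "HTML tags", "ordinal": 4.0},
]

def commit_order(pool):
    """Order in which pending elements would be committed to memory
    (assumed convention: lower ordinal = higher priority)."""
    return sorted(pool, key=lambda e: e["ordinal"])

queue = commit_order(pending_pool)
```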
- semantic attributes will be used to approximate the semantic contents of a semantic unit or element. These are needed for search and filtering purposes. The simplest approach to implementing semantic attributes is a keyword system. Future applications might make use of natural language processing technologies.
- knowledge filters can be used to determine the visibility or accessibility of elements or semantic units within the knowledge system by making use of processing, ordinal and semantic attributes in dedicated sources and word context analysis in non-dedicated sources. Knowledge filters are a useful tool for addressing disorganized knowledge before it enters the knowledge hierarchy and for automatically determining ordinal attributes in the intact pool. Knowledge filters can also be used in thematic navigation.
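A knowledge filter of the kind described could be approximated as a predicate over processing and semantic attributes; the function signature, field names and keyword matching below are illustrative assumptions:

```python
def knowledge_filter(elements, allowed_status=None, keywords=None):
    """Illustrative knowledge filter: keep elements whose processing status
    is allowed and whose semantic attributes match at least one keyword."""
    result = []
    for e in elements:
        if allowed_status and e["status"] not in allowed_status:
            continue  # wrong processing pool: make invisible
        if keywords and not (set(keywords) & set(e["keywords"])):
            continue  # no semantic-attribute overlap: make invisible
        result.append(e)
    return result

elements = [
    {"content": "TCP handshake", "status": "intact", "keywords": {"networking"}},
    {"content": "French verbs", "status": "pending", "keywords": {"language"}},
    {"content": "HTTP headers", "status": "intact", "keywords": {"networking", "web"}},
]
visible = knowledge_filter(elements, allowed_status={"intact"}, keywords=["networking"])
```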
- knowledge meters are tools for diagnostics and control of the flow of information between element pools tagged with different processing attributes. In general, the flow proceeds from the disorganized/external pool to the intact pool, then to the suppressed, dismissed and reviewed pools, then to the pending pool and finally to the committed pool. Some stages may be skipped (e.g. committing an element without placing it in the pending queue), and some backflow is not unusual (e.g. dismissing a once-committed item as a result of the loss of relevancy, etc.). Knowledge meters allow users to view the information flow as well as to impose minimum or maximum flow limits.
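The pool-to-pool flow and its limits might be sketched as follows; the pool names come from the text, while the counter structure and the commit limit are assumptions for illustration:

```python
from collections import Counter

class KnowledgeMeter:
    """Illustrative knowledge meter: records flow between processing pools
    and enforces a maximum inflow to the committed pool (an assumed limit)."""
    def __init__(self, max_commits=20):
        self.flow = Counter()          # (source_pool, target_pool) -> count
        self.max_commits = max_commits

    def record(self, src, dst):
        """Record one element moving from pool src to pool dst.
        Returns False (and refuses the move) once the commit limit is hit."""
        if dst == "committed" and self.flow[(src, dst)] >= self.max_commits:
            return False
        self.flow[(src, dst)] += 1
        return True

meter = KnowledgeMeter(max_commits=2)
meter.record("intact", "reviewed")         # normal forward flow
meter.record("pending", "committed")
meter.record("pending", "committed")
ok = meter.record("pending", "committed")  # third commit exceeds the limit
```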
- knowledge should be divided into topics (elements that present information, like pages in a help system, web pages, etc.) and items (elements that have a stimulus-response structure, e.g. question and answer, that can be effectively used in the process of learning based on active recall)
- for training and tele-learning purposes, item subsets should be structured for automatic grading (e.g. with multiple-choice tests, spelling tests, automatic voice recognition tests, etc.)
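As one hypothetical take on automatic grading, the sketch below grades a spelling test on a 0-5 scale using edit distance; the grading formula is an assumption for illustration, not a specification of SM-XXI:

```python
def grade_spelling(answer, response):
    """Assumed automatic grading for a spelling test: grade 0-5 based on
    the edit distance between the user's response and the correct answer."""
    a, b = answer.lower().strip(), response.lower().strip()
    # classic dynamic-programming (Levenshtein) edit distance
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    distance = prev[-1]
    return max(0, 5 - distance)  # a perfect answer scores 5

grade = grade_spelling("mnemonic", "mnemonic")  # exact match
```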
- topics will have a local or remote nature (e.g. a web link executed via OLE browser in-place activation)
- application of items complying with the minimum information principle is not customary in present information sources. A number of solutions will have to be adopted to facilitate the transition to the new approach by content providers. Most importantly, World Wide Web extensions are inevitable. The adjustments will have to be made on both the server and the client side. The new generation of web browsers provides easy plug-in interfaces that can be addressed with format-independent OLE Documents (to extend or go beyond HTML), language-independent binary ActiveX controls, and scripting languages. On the server side, with standard methods such as CGI, WWW servers can be extended to communicate with back-end scripts, dynamically produce the content of a web page, store information the user has provided, etc. In Microsoft Internet Server, changes are possible via ISAs and other ISAPI extensions.
- for knowledge retention, Project SM-XXI envisages application of the most modern SuperMemo algorithms, based on algebraic and algorithmic solutions combined with neural networks
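For concreteness, the published SM-2 variant of the SuperMemo repetition-spacing algorithm, an earlier and simpler relative of the algorithms mentioned above, can be sketched as follows (a simplified sketch, not the SM-XXI algorithm itself):

```python
def sm2_interval(repetition, ef, quality, prev_interval):
    """Simplified sketch of the published SM-2 repetition-spacing algorithm.
    quality: 0-5 self-grade of recall.
    Returns (next_interval_days, new_ef, next_repetition_number)."""
    if quality < 3:
        return 1, ef, 1  # failed recall: restart the repetition cycle
    # update the easiness factor (EF), clamped at 1.3 as in SM-2
    ef = max(1.3, ef + 0.1 - (5 - quality) * (0.08 + (5 - quality) * 0.02))
    if repetition == 1:
        interval = 1       # first repetition: next day
    elif repetition == 2:
        interval = 6       # second repetition: after six days
    else:
        interval = round(prev_interval * ef)  # later: grow by factor EF
    return interval, ef, repetition + 1

interval, ef, rep = sm2_interval(repetition=1, ef=2.5, quality=5, prev_interval=0)
```

Each successful repetition stretches the interval by the easiness factor, so well-known items surface ever more rarely while difficult ones stay frequent.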
- to let learning run in the background, without the need for a dedicated time slot on the part of the user, intelligent, self-configuring techniques for detecting the idle state of the user’s terminal will be implemented (e.g. popping up the drilling procedure at download time, printing time, and other long-drawn processes, or during selected operations in custom-chosen applications)
- although the theoretical foundations and underlying concepts might seem intricate, the new software solutions should provide an intuitive and fool-proof interface that will open the world of well-structured knowledge to every user, with or without an understanding of cognitive processes
- particular software components will be used for complementary solutions such as handheld devices, voice-operated devices, redistributable modules for third-party developers, etc.
Beyond simple software solutions:
- to increase the number of dedicated sources, SMC intends to work closely with publishers, courseware developers and standardization bodies. Most importantly, SMC wants to promote new WWW formats and protocols through RFCs and by working with the IETF and W3C. This will primarily concern adherence to context-free semantic units and, wherever possible and sensible, associating semantic units with knowledge items used in the process of learning
- for knowledge representation, SMC intends to facilitate or enforce adherence to the minimum information principle not only at the software level but also by promoting new standards and formats (see above), educating information publishers by example, and extensively publishing the rules of effective knowledge representation. The most crucial in this respect are: (1) publishing information in a context-independent, concise and richly hyperlinked manner, (2) separating semantic units whenever necessary, and (3) associating elements or semantic units with stimulus-response material (e.g. in the form of questions and answers that might be used in learning and training)
- strategic partnerships with major players in the Internet market may
be most fruitful: Microsoft, Netscape, etc. These would make it easier to develop global
solutions intertwined with the Internet and operating systems that would make cognitive
technologies truly span the globe
Marketing strategy and the stepwise education of the public on cognitive technologies:
- introducing SM7 with a couple of catchy databases to educate the public on the most appealing component of cognitive technologies: repetition spacing. Application mode: stand-alone.
- introducing SM8 as an extension of repetition spacing in the direction of knowledge structuring, introducing the concepts of knowledge hierarchy, processing attributes, ordinal attributes, topics and items. Application modes: stand-alone and CD-ROM title publishing.
- introducing Project SM-XXI with all the remaining components of the cognitive approach, introducing client-server architecture and encompassing the Internet. Application modes: stand-alone, CD-ROM title, Internet organizer, client-server training, and tele-learning.
- promoting standard solutions, formats, protocols and interfaces, and working on an equivalent of the Dewey Decimal or Library of Congress system for the WWW within the framework of cognitive technologies.
Glossary
(proprietary terminology is marked with SMC)
ActiveX controls encapsulate the WinInet APIs. Programmers can write to these controls using Visual Basic, Delphi, PowerBuilder, etc. (formerly known as OLE Controls, or OCXes)
CGI (Common Gateway Interface) the simplest standard mechanism for server-side support
cognitive technologies (SMC) technologies targeted at applying
the findings from the field of psychophysiology in the area of information processing
information democracy (SMC) enabling the public to determine
ordinals associated with web pages, hyperlinks, sites, etc. on the basis of popularity,
reliability, usability, etc. Those ordinals might be useful at the entry of elements into
a user’s knowledge system
item (SMC) simple stimulus-response formulation of a piece of
knowledge for learning purposes (e.g. question and answer)
ISA (Internet Server Application) ISAs are dynamic-link libraries (DLLs) similar to CGI scripts. ISAs are loaded into the same address space as the HTTP server. This creates a back-end scripting solution that provides a higher level of performance than CGI and consumes far less RAM
ISAPI (Internet Server API) Application Programming Interface
developed for MS Internet Server
IETF (Internet Engineering Task Force) the protocol engineering
and development arm of the Internet. The IETF is a large, open, international community of
network designers, operators, vendors, and researchers concerned with the evolution of the
Internet architecture and the smooth operation of the Internet. It is open to any
interested individual
knowledge chart (SMC) knowledge hierarchy that graphically
visualizes processing attributes
knowledge elements (SMC) single leaf in knowledge hierarchy that
may have a form of: item, topic or external source of information
knowledge filters (SMC) filters that make a subset of knowledge
elements invisible for the purpose of thematic navigation, knowledge organization
(transition from disorganized pool to knowledge hierarchy, etc.)
knowledge hierarchy (SMC) graph representing semantic relationships between individual pieces or sources of information. Implemented in SM8 in a simplified form as a knowledge tree that corresponds to a table of contents
HTML (Hypertext Markup Language) standard file format for
distributing hypermedia information on the World Wide Web. HTML allows text to include
codes that define fonts, layout, embedded graphics, and hypertext links
HTTP (Hypertext Transfer Protocol) the method by which World Wide
Web pages are transferred over the network
ordinal attributes (SMC) attributes associated with processing attributes that determine the priority of the element or semantic unit in a given processing pool. For example, ordinal attributes of pending elements determine the order in which pending elements are committed to memory
processing attributes (SMC) attributes associated with elements
or semantic units that determine the degree of processing afforded the element. For
example: dismissed, pending, committed, etc.
RFC (Request For Comment) documents that describe or propose new
standards. All Internet standards are described as RFCs. An RFC is a description of a
protocol, procedure, or service; a status report or a summary of research
semantic attributes (SMC) attributes that approximate the
semantics of a semantic unit (in the simplest case: keywords)
semantic unit (SMC) smallest part of an element that conveys the simplest understandable message (e.g. a single sentence)
scripting language a simple interpreted language used to enliven web pages (e.g. Perl, Java, LiveScript, Visual Basic Script, etc.). Script applets can be downloaded from the server and run on the client computer
topic (SMC) simple representation of a small piece of knowledge (e.g. a short web page)
URL (Uniform Resource Locator) increasingly popular standard for
addressing resources on the Internet. Developed for the World Wide Web. URLs are
essentially an extension of a full pathname
W3C (World Wide Web Consortium) the World Wide Web Consortium
exists to realize the full potential of the Web. W3C works with the global community to
produce specifications and reference software. W3C is funded by industrial members, but
its products are freely available to all