Background and Purpose
In the earliest stages of the transition from a CD-ROM-based collection to the WWW site, it was clear that the nature and scope of the Perseus resource demanded a flexible, extensible, and powerful data management system. Written mostly in Perl, the production version of the online Perseus text management system evolved and grew over eight years, becoming a uniquely powerful platform capable of ingesting heterogeneous source materials and performing a range of automatic services. With few precedents and examples to follow, however, the code behind this system reflected organic growth and experimentation, and became difficult to sustain, share, and modify. While all versions of the Perseus Digital Library system were designed to be open-source (third parties did make use of the HyperTalk, Tcl/Tk and Perl code), each of the previous incarnations of Perseus was complex and difficult to document, which presented obstacles to new avenues of collaborative research and development.
As digital library systems matured in the early 2000s, the project sought third-party solutions for delivering resources. At the time, most digital libraries concentrated on locating objects and then left it to the users to make sense of what they had found. In contrast, Perseus had increasingly focused on giving users the tools to understand what the digital library gave them: the project depended upon a range of automatic linking, information extraction and visualization services that existing, largely catalog-oriented systems could not support. The project therefore chose to build a new digital library system, designing it from the start to be interoperable, modular, and open-source.
Open-Source Services
The Perseus Hopper is an open-source project providing a suite of services for interacting with textual collections. While as a whole it provides an integrated reading environment, its individual services are designed to be modular and can be grouped into three different classes.
Linguistic support: The Hopper itself is language independent, but the code includes native support for Greek, Latin and Arabic. Given a source text in any one of those three languages (either a text bundled with the code release or a TEI-compliant XML text of the user's own), it provides services for automatic lemmatization (linking inflected word forms to the dictionary entries from which they're derived) and morphological analysis (identifying, for instance, that the Latin word amor is a singular masculine nominative noun). At a broader level, it also enables corpus research by automatically generating word and lemma frequency information for the entire collection of texts supplied to it.
Contextualized reading: Since the Hopper is the underlying code base for the Perseus Digital Library, it reflects that same emphasis on being an integrated reading environment: much of its power derives not simply from isolated textual services, but from the knowledge that emerges from the interaction of the texts themselves. Users can take advantage of this contextualization with the Greco-Roman and Arabic texts provided, or specify the higher-level relationships between their own texts (e.g., that document X is a translation of document Y) in order to create a reading environment where passages in a source text are accompanied by secondary resources such as translations and commentaries. Contextualized reading also intersects with linguistic support -- since dictionaries are also supported as "secondary" resources, a reader can find not simply which dictionary entry a word in a source text is derived from, but also a definition of what that word means. The library environment also includes an architecture for soliciting user contributions in the form of "voting" -- this is implemented online in the Perseus Digital Library as user votes for morphological forms, but can be extended to accommodate other varieties of annotation.
Searching: Users can not only read passages from texts, but also use a suite of search tools to find what they are looking for, in any of the languages the Hopper supports. These search tools include word and phrase searches, in individual texts or collections. These searches include the option to search all possible inflections of a word, making them extremely powerful for morphologically rich languages like Greek, Latin and Arabic (e.g., a lemmatized search for the dictionary form sum would also find documents containing the inflected forms est and sunt). For Classical texts, which have a well-established citation scheme, users can navigate a text by typing canonical abbreviations (e.g., Thuc. 1.24). The Hopper also provides functionality to search and browse the tagged named entities (places, people, dates, and date ranges) in a corpus, and includes an architecture for presenting archaeological artifact and image data, which is separate from the reading environment.
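To make the idea of a lemmatized search and of lemma frequency counts concrete, here is a minimal sketch in Java. It is not the Hopper's own code: the class name LemmaSearchSketch, its methods, and the hand-filled form-to-lemma table are all hypothetical, and the sketch assumes each inflected form maps to exactly one lemma, whereas a real morphological analyzer has to handle forms with several possible lemmas.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical illustration only; none of these names come from the Hopper code base.
class LemmaSearchSketch {
    // Toy form -> lemma table standing in for a real morphological analyzer.
    // Assumption: every form has a single lemma (real data is often ambiguous).
    private final Map<String, String> formToLemma = new HashMap<>();

    void addForm(String form, String lemma) {
        formToLemma.put(form.toLowerCase(), lemma);
    }

    // Lemmatize one token; unknown forms fall back to the lowercased token.
    String lemmaOf(String token) {
        String key = token.toLowerCase();
        return formToLemma.getOrDefault(key, key);
    }

    // Return the ids of all documents containing any inflection of the lemma.
    Set<String> search(String lemma, Map<String, String> documents) {
        Set<String> hits = new TreeSet<>();
        for (Map.Entry<String, String> doc : documents.entrySet()) {
            // Split on runs of non-letter characters to get word tokens.
            for (String token : doc.getValue().split("[^\\p{L}]+")) {
                if (!token.isEmpty() && lemmaOf(token).equals(lemma)) {
                    hits.add(doc.getKey());
                    break; // one match is enough for this document
                }
            }
        }
        return hits;
    }

    // Count how often each lemma occurs across the whole collection.
    Map<String, Integer> lemmaFrequencies(Map<String, String> documents) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String text : documents.values()) {
            for (String token : text.split("[^\\p{L}]+")) {
                if (!token.isEmpty()) {
                    counts.merge(lemmaOf(token), 1, Integer::sum);
                }
            }
        }
        return counts;
    }
}
```

With est, sunt and sum all registered under the lemma sum, a call to search("sum", documents) returns every document that contains any of those three forms, which is the behavior the example above describes. The same table drives lemmaFrequencies, so inflected forms are counted under their shared dictionary entry.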
Extensibility
The code base itself invites two varieties of extensibility. On the one hand, while the code is bundled with a collection of Greco-Roman and Arabic texts around which it has grown, users are able to include their own TEI-compliant XML texts as part of the reading environment and enable the same services for those texts as those that are available online for Perseus' open-source editions. As an API, the Perseus Hopper also includes a number of Java classes for interacting with texts outside of a reading environment -- one can, for instance, use the linguistic services such as automatic lemmatization or morphological analysis as standalone tools for analyzing not simply the bundled Perseus texts, but any text of one's own as well.
On the other hand, the Java code itself is also designed with modularity and extensibility in mind. An example of this is the variety of classes (all ultimately inheriting from CorpusProcessor) that cycle through an entire collection of texts and perform some operation on each one. The workflow to build the library environment relies on these classes to calculate word and lemma statistics for the corpus at large, to map citations between texts, and to index them all in order to make them searchable later. These classes are easily extensible for any task that requires iterating through an entire collection of texts. The Hopper source code also includes a number of services for managing named entities such as people and places, and has served as the foundation for visualization projects, plotting that data both geographically on a map and historically on a timeline. In terms of modularity, the Hopper also includes a number of low-level classes for manipulating text -- from finding all possible lemmas for a given Latin form to delimiting an accented Greek word.
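The pattern of a base class that walks the whole collection while subclasses supply the per-text operation can be sketched as follows. This is an illustration under stated assumptions, not the Hopper's actual CorpusProcessor: the class CorpusProcessorSketch, its hook methods, and the representation of a corpus as a map from document id to plain text are all invented for the example.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the pattern described above; the real
// CorpusProcessor in the Hopper may be shaped quite differently.
abstract class CorpusProcessorSketch {
    // Template method: visit every document in the collection in order.
    // Assumption: a corpus is simply a map of document id -> plain text.
    final void processCorpus(Map<String, String> corpus) {
        startCorpus();
        for (Map.Entry<String, String> doc : corpus.entrySet()) {
            processDocument(doc.getKey(), doc.getValue());
        }
        endCorpus();
    }

    // Optional hooks that run before and after the iteration.
    protected void startCorpus() {}
    protected void endCorpus() {}

    // The one operation each subclass must supply.
    protected abstract void processDocument(String id, String text);
}

// Example subclass: accumulate word frequencies for the corpus at large.
class WordFrequencyProcessor extends CorpusProcessorSketch {
    private final Map<String, Integer> counts = new TreeMap<>();

    @Override
    protected void processDocument(String id, String text) {
        // Lowercase, then split on runs of non-letter characters.
        for (String token : text.toLowerCase().split("[^\\p{L}]+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
    }

    Map<String, Integer> counts() {
        return counts;
    }
}
```

A new task, such as building a search index or mapping citations, would then be another subclass overriding processDocument, with the iteration logic left untouched in the base class.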